// FOR STAFF ENGINEERS & TECH LEADS
You have to read every line because "looks plausible" is how bugs get merged three approvals deep. We built Guardian because we were tired of writing that Slack message with no data behind it.
// 14:47 — YOUR REVIEW QUEUE
13:02 opened PR #4821 "feat: add retry to webhook handler" +312 −47 author: alice (with claude-code)
13:04 agent comment coderabbit: 18 comments, 14 "consider" nits
13:18 your turn. read all 312 lines, because "looks plausible" is how bugs ship
13:41 found it: off-by-one in the backoff. the agent missed it; two juniors had already approved
14:12 opened PR #4822 "refactor: extract billing client" +1,104 −890 author: diego (with copilot)
14:47 you, staring at 9 of these still in the queue. your CTO wants "more AI output."
// THE NUMBERS
Before we show you what Guardian sees on your repo, here's what the people who actually run studies have already found. Sources included because you'll ask.
CodeRabbit, AI vs Human Code Report
1.7x / 8x: 1.7x more issues in AI code than in human code; 8x more performance inefficiencies.
Sonar, 2026 State of Code Developer Survey
96% / 48%: 96% of devs don't fully trust AI output, yet only 48% verify before committing.
METR, Measuring the Impact of Early-2025 AI on Experienced OSS Developers
−19% / +24%: devs were 19% slower with AI. They thought they were 24% faster.
Faros.ai, AI Productivity Paradox Report
+98% / +91% / +9%: PR volume up 98%, review time up 91%, bugs per developer up 9%. Company-level throughput: unchanged.
// These are industry numbers. Guardian shows you yours, on your repo, next week.
// METHODOLOGY
Deterministic Git/PR analytics plus LLM comment classification. The LLM tags the comment type (nit, logic, perf, style) — it does not judge whether the comment is "good." Humans are still the baseline. The methodology is public. You can PR it.
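If you want to see how little the LLM is trusted with, here's a minimal sketch of type-only tagging. It's our illustration, not Guardian's published code: `complete` stands in for whatever LLM completion call you use, and the four labels come straight from the sentence above.

# Sketch: tag the comment type, never score its quality.
# `complete(prompt) -> str` is a placeholder for any LLM provider call.
LABELS = {"nit", "logic", "perf", "style"}

PROMPT = (
    "Classify this code-review comment by type only. "
    "Reply with exactly one word from: nit, logic, perf, style.\n\n"
    "Comment:\n{comment}"
)

def classify_comment(comment: str, complete) -> str:
    # Anything off-taxonomy is surfaced as 'unparsed', not silently coerced.
    raw = complete(PROMPT.format(comment=comment)).strip().lower()
    return raw if raw in LABELS else "unparsed"

Constraining the model to a closed label set is the point: it sorts comments, it never grades reviewers.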
// WHAT YOU SEE
These are real screenshots from a production tenant. Numbers blurred where they identify anyone.
feat(guardian): quantify every agent's signal-to-noise ratio across repos
Every AI review agent on your repos, side by side. Comment volume, non-nit rate, change-adoption rate. You will find out which agent is carrying its weight within the first hour.
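All three metrics reduce to counting over tagged comments. A sketch under assumed names (`ReviewComment` and its fields are ours, not Guardian's schema; "adoption" here means the commented code changed afterward):

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ReviewComment:
    author: str         # agent name or human reviewer login
    label: str          # nit / logic / perf / style, from the classifier
    code_changed: bool  # did the commented lines change after the comment?

def agent_scoreboard(comments: list[ReviewComment]) -> dict[str, dict]:
    by_author = defaultdict(list)
    for c in comments:
        by_author[c.author].append(c)
    # volume, non-nit rate, and change-adoption rate per author
    return {
        author: {
            "volume": len(cs),
            "non_nit_rate": sum(c.label != "nit" for c in cs) / len(cs),
            "adoption_rate": sum(c.code_changed for c in cs) / len(cs),
        }
        for author, cs in by_author.items()
    }

Adoption is deliberately behavioral: a comment only counts if the code actually moved.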
fix(review): drill into the agent that's costing your team minutes per PR
Pick the agent. See every PR it commented on this week, its non-nit rate, whether anyone changed the code in response. This is the view you'll screenshot for your CTO.
chore(baseline): you are the reference. everyone else is measured against you
Your human reviewers on the same axis as every agent on your repos. This is the "humans-first" part of the pitch, made literal: the bar is set by the people who already do the job.
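Made literal, that's one line of arithmetic. Again a sketch on our assumptions (human median as the bar, non-nit rate as the axis):

from statistics import median

def against_human_baseline(board: dict[str, dict], humans: set[str]) -> dict[str, float]:
    # `board` is the per-author scoreboard above; 1.0 means exactly at the human bar.
    human_bar = median(board[h]["non_nit_rate"] for h in humans if h in board)
    if human_bar == 0:
        raise ValueError("human reviewers left only nits this window")
    return {
        author: stats["non_nit_rate"] / human_bar
        for author, stats in board.items()
        if author not in humans
    }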
// 14:47 — #eng-quality
The Slack message you finally get to send with data behind it. Not the one to your VP; the one to the people doing the work.
// THE REST OF THE SUITE
Releezy Loop runs AI coding agents in governed containers. Releezy Reviewer is a project-specific autonomous reviewer. Both are measured by Guardian against your human baseline: no exceptions, no tuning Guardian to flatter them. If they can't meet your team's bar, you'll see it before we do.
// $ releezy init
One repo, one week of data, one Slack message you'll actually believe. If the numbers don't match what you already suspect, we'll close the account ourselves.