Code Review
Jun 2026

GitHub Measures Copilot Adoption. Tenki Measures What Passes Review.

Eddie Wang
Engineering

On May 29, GitHub shipped a new Copilot usage metrics API update that classifies every engaged developer into an "AI adoption phase." Phase 1 means they're using code completions. Phase 3 means they're using multiple agent surfaces. Enterprise admins can now pull these cohorts from the API, segment them by team, and build adoption dashboards.

That's genuinely useful data. It tells you who's touching the tools. But it doesn't tell you the thing most engineering leaders actually need to know: is the AI-generated code any good?

Adoption rate and code quality are different axes. You can have 100% Copilot adoption and still ship the same number of bugs. All that changed is the authorship of the bugs, not the rate. The metric that closes this gap is catch rate: the percentage of AI-generated PRs that get stopped at the review gate before merging. And that number doesn't live in GitHub's API. It lives in your CI data.

What GitHub's New Cohort Data Actually Shows

The update adds an ai_adoption_phase field to user-level reports and a totals_by_ai_adoption_phase array to enterprise- and org-level reports. Each developer is bucketed into one of four phases based on a rolling 28-day window:

  • Phase 0: No engagement. Didn't meet the 2-day threshold on any Copilot surface.
  • Phase 1 (Code first): Using code completions or IDE agent mode.
  • Phase 2 (Agent first): Using one GitHub-based agent surface like Copilot code review, the cloud agent, or Copilot CLI.
  • Phase 3 (Multi-agent): Using two or more agent surfaces, or the GitHub Copilot app.
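
As a rough sketch of how these buckets might be tallied once a user-level report is in hand: the field name ai_adoption_phase comes from the update, but the record shape, the login key, and the integer phase values below are assumptions for illustration, not GitHub's documented schema.

```python
from collections import Counter

# Hypothetical user-level records. Only the `ai_adoption_phase` field name
# comes from GitHub's update; the `login` key and integer encoding are assumed.
user_reports = [
    {"login": "dev-a", "ai_adoption_phase": 0},
    {"login": "dev-b", "ai_adoption_phase": 1},
    {"login": "dev-c", "ai_adoption_phase": 1},
    {"login": "dev-d", "ai_adoption_phase": 3},
]


def count_by_phase(reports):
    """Return {phase: developer count} for phases 0 through 3."""
    counts = Counter(r["ai_adoption_phase"] for r in reports)
    # Fill in empty phases so every bucket appears on the dashboard.
    return {phase: counts.get(phase, 0) for phase in range(4)}


phase_counts = count_by_phase(user_reports)
print(phase_counts)
```

Filling in the empty phases explicitly keeps a bucket with zero developers visible instead of silently dropping it from a chart.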

The org-level aggregates include things like average code generation activity, acceptance rates, PR throughput, and median time-to-merge, all broken out by phase. If you're running a Copilot rollout and need to prove developers are actually using it, this is exactly what you'd want.

But it's an input metric. It measures tool engagement, not output quality.

Adoption Rate Is Not a Quality Signal

Consider what happens when you build a dashboard entirely from GitHub's cohort data. You can show that 80% of your developers reached Phase 2 last quarter. You can show that Phase 3 users merge PRs 15% faster. You can show that code acceptance rates are climbing.

None of that tells you whether the merged code was correct. A team at 100% Copilot adoption could be merging the same classes of bugs they always did. The only difference is that an LLM wrote the bug instead of a human. From the perspective of the user hitting the bug in production, this is a meaningless distinction.

GitHub's metrics are adoption metrics. They belong on the "are people using the tool" slide. They don't belong on the "is the tool improving outcomes" slide. For that, you need a different signal entirely.

The Metric That Matters: Catch Rate

Catch rate answers a simple question: of the PRs that went through automated review, how many got blocked before merging?

This is an output metric. It directly measures the quality gate's impact on what reaches production. If your review tooling blocked 23% of AI-generated PRs last month, you have a concrete number to attach to the question "what would have shipped without this gate?"
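
The arithmetic is simple enough to write down. A minimal sketch, where the function name and the sample counts are illustrative assumptions rather than real data:

```python
def catch_rate(blocked: int, reviewed: int) -> float:
    """Percentage of reviewed PRs that were blocked before merging."""
    if reviewed <= 0:
        raise ValueError("reviewed must be a positive count")
    if not 0 <= blocked <= reviewed:
        raise ValueError("blocked must be between 0 and reviewed")
    return 100.0 * blocked / reviewed


# Illustrative counts only: 23 blocked out of 100 reviewed PRs.
print(catch_rate(23, 100))
```

The denominator is PRs that actually went through review, so PRs that bypassed the gate would need to be tracked separately.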

GitHub's API can't produce this number because it doesn't sit in the review path. GitHub tracks whether Copilot generated the code, not whether someone caught a problem in it before merge. That's the gap.

Tenki's code reviewer fills it. Every PR in a connected repo gets reviewed. Every review that flags a blocking issue is logged as a CI event. Because Tenki runs as a GitHub App integrated into your CI pipeline, the block-or-pass decision is recorded in the same place your other CI data lives: the runner job logs. No additional instrumentation required.

How Tenki Produces Catch Rate Data

The architecture is straightforward. You install the Tenki GitHub App, point it at your repos, and every new PR triggers an automated review. Tenki analyzes the diff against your codebase context, flags issues by severity, and posts its findings as review comments. If it finds blocking issues, the PR gets a request-changes review status, which your branch protection rules can enforce as a merge gate.
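
To make the block-or-pass decision concrete, here is a hedged sketch that reads it off a PR's review list. It assumes review records shaped like the objects GitHub's REST API returns when listing reviews for a pull request (state, user.login, submitted_at); the bot login and the sample data are hypothetical, and this is not a description of Tenki's internals.

```python
# Review states that set or clear a reviewer's verdict. A plain COMMENTED
# review does not change whether that reviewer is blocking the PR.
DECISIVE_STATES = {"APPROVED", "CHANGES_REQUESTED", "DISMISSED"}


def is_blocked_by(reviews, reviewer_login):
    """True if the reviewer's latest decisive review requests changes."""
    decisive = [
        r for r in reviews
        if r["user"]["login"] == reviewer_login and r["state"] in DECISIVE_STATES
    ]
    if not decisive:
        return False
    # Assumes uniform UTC ISO 8601 timestamps, which sort correctly as strings.
    latest = max(decisive, key=lambda r: r["submitted_at"])
    return latest["state"] == "CHANGES_REQUESTED"


BOT = "tenki-reviewer[bot]"  # hypothetical login, for illustration only
reviews = [
    {"user": {"login": BOT}, "state": "CHANGES_REQUESTED",
     "submitted_at": "2026-06-01T10:00:00Z"},
    {"user": {"login": "alice"}, "state": "APPROVED",
     "submitted_at": "2026-06-01T11:00:00Z"},
    {"user": {"login": BOT}, "state": "COMMENTED",
     "submitted_at": "2026-06-01T12:00:00Z"},
]
print(is_blocked_by(reviews, BOT))
```

Here the human approval and the later comment do not lift the block; only a new approval or a dismissal tied to the same reviewer would.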

That review event becomes data. Every block, every pass, every severity level is attributable to a specific PR. Over time this builds into a dataset that answers questions no adoption metric can touch:

  • What percentage of PRs from AI coding agents got blocked vs. human-authored PRs?
  • What's the severity distribution of caught issues? Are they mostly style nits, or are they logic errors and security flaws?
  • After a PR is blocked and revised, what's the merge rate on the second pass?
  • Is catch rate trending up or down as the team adopts new AI coding tools?
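
The first two questions can be sketched in a few lines once review events are exported. The event shape below (pr, author_type, blocked, severities) and the sample values are assumptions made for illustration, not Tenki's export format:

```python
from collections import Counter, defaultdict

# Hypothetical review events: one record per reviewed PR.
review_events = [
    {"pr": 101, "author_type": "ai_agent", "blocked": True, "severities": ["logic"]},
    {"pr": 102, "author_type": "ai_agent", "blocked": False, "severities": ["style"]},
    {"pr": 103, "author_type": "ai_agent", "blocked": True, "severities": ["security", "logic"]},
    {"pr": 104, "author_type": "ai_agent", "blocked": False, "severities": []},
    {"pr": 105, "author_type": "human", "blocked": False, "severities": ["style"]},
    {"pr": 106, "author_type": "human", "blocked": True, "severities": ["logic"]},
    {"pr": 107, "author_type": "human", "blocked": False, "severities": []},
    {"pr": 108, "author_type": "human", "blocked": False, "severities": ["style"]},
]


def block_rate_by_author(events):
    """Return {author_type: fraction of that author type's PRs that were blocked}."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for event in events:
        totals[event["author_type"]] += 1
        if event["blocked"]:
            blocked[event["author_type"]] += 1
    return {author: blocked[author] / totals[author] for author in totals}


def severity_distribution(events):
    """Count flagged issues by severity label across all events."""
    return Counter(s for event in events for s in event["severities"])


rates = block_rate_by_author(review_events)
severities = severity_distribution(review_events)
print(rates)
print(dict(severities))
```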

Tenki's benchmark data gives some context on the baseline. In tests against 122 real production bugs, Tenki's reviewer caught 69% of them. For comparison, Copilot's built-in code review caught 25%. The benchmark measures detection against known bugs rather than the block rate on live PRs, but the gap shows why catch rate data depends on the reviewer behind it: the review tool you use determines how much of the AI-generated risk actually gets intercepted.

What Your Dashboard Should Actually Track

If you're an engineering leader building an AI impact dashboard, here's a practical framework. Use GitHub's cohort data for the adoption layer, and Tenki's review data for the quality layer.

Adoption layer (from GitHub's API):

  • Percentage of developers in each adoption phase, broken down by team
  • Phase progression over time (are teams moving from Phase 1 to Phase 2+?)
  • Code acceptance rates by phase
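
Phase progression can be approximated by comparing two snapshots. A minimal sketch, assuming each snapshot is a hypothetical mapping from developer login to phase number built from the user-level reports:

```python
def phase_progression(previous, current):
    """Count developers who advanced, held, or regressed between two snapshots."""
    moves = {"advanced": 0, "held": 0, "regressed": 0}
    for login, old_phase in previous.items():
        if login not in current:
            continue  # developer missing from the newer snapshot
        new_phase = current[login]
        if new_phase > old_phase:
            moves["advanced"] += 1
        elif new_phase < old_phase:
            moves["regressed"] += 1
        else:
            moves["held"] += 1
    return moves


# Illustrative snapshots only; dev-e is new and has no earlier phase to compare.
last_quarter = {"dev-a": 1, "dev-b": 1, "dev-c": 2, "dev-d": 3}
this_quarter = {"dev-a": 2, "dev-b": 1, "dev-c": 3, "dev-d": 2, "dev-e": 1}
progression = phase_progression(last_quarter, this_quarter)
print(progression)
```

Because phases are computed over a rolling 28-day window, a developer can also move down a phase, which is why regressions are counted separately.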

Quality layer (from Tenki's review data):

  • Review block rate, segmented by PR author type (AI agent vs. human)
  • Severity distribution of caught issues over time
  • Merge rate after an initial block (how often do blocked PRs get fixed and ship?)
  • Catch rate trend line correlated with adoption phase progression
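
Merge rate after an initial block is the least obvious of these to compute, so here is a hedged sketch. The per-PR record shape (initially_blocked, merged) is an assumption for illustration, as are the sample values:

```python
def merge_rate_after_block(prs):
    """Fraction of initially blocked PRs that eventually merged, or None if none were blocked."""
    blocked = [pr for pr in prs if pr["initially_blocked"]]
    if not blocked:
        return None
    merged = sum(1 for pr in blocked if pr["merged"])
    return merged / len(blocked)


# Hypothetical PR outcomes: four PRs were blocked at first, three later merged.
pr_outcomes = [
    {"pr": 201, "initially_blocked": True, "merged": True},
    {"pr": 202, "initially_blocked": True, "merged": True},
    {"pr": 203, "initially_blocked": True, "merged": False},
    {"pr": 204, "initially_blocked": True, "merged": True},
    {"pr": 205, "initially_blocked": False, "merged": True},
]
print(merge_rate_after_block(pr_outcomes))
```

Returning None when nothing was blocked avoids reporting a misleading 0% for a period with no blocks at all.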

The interesting story is in the correlation. If adoption climbs from Phase 1 to Phase 3 and catch rate holds steady or drops, your team is getting better at using AI tools and the output is passing review. If adoption climbs but catch rate spikes, you've got a training problem: developers are using more AI surfaces, but the code quality is getting worse. Without both data sources on the same dashboard, you'd never see that.
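
That reading can be written as a small decision rule. This is a sketch under stated assumptions: the function name, the labels, and the 5-point spike threshold are all hypothetical, and a real dashboard would pick a threshold from its own history.

```python
def read_trend(adoption_delta, catch_rate_delta, spike_threshold=5.0):
    """Label a period from the change in adoption and the change in catch rate.

    adoption_delta: change in share of developers at Phase 2+ (percentage points).
    catch_rate_delta: change in catch rate (percentage points).
    spike_threshold: assumed cutoff for calling a rise a "spike".
    """
    if adoption_delta <= 0:
        return "no adoption growth"
    if catch_rate_delta >= spike_threshold:
        return "training problem"  # more AI usage, more blocked PRs
    if catch_rate_delta <= 0:
        return "healthy"  # more AI usage, catch rate steady or falling
    return "inconclusive"  # small rise, below the spike threshold


print(read_trend(20, -2))
print(read_trend(20, 8))
```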

Two Systems, Two Questions

GitHub's new cohort metrics are a solid addition to the Copilot usage API. If you're running a rollout and need to track which teams are actually engaging with AI tooling, use them. They answer the adoption question well.

But adoption and quality are different questions. You need both answers on the same dashboard. GitHub's data tells you who's using AI. Tenki's data tells you whether it's passing review. Only one of those is a quality signal.

Tags

#copilot-metrics #ai-adoption #catch-rate #copilot-code-review
