
GitHub Measures Copilot Adoption. Tenki Measures What Passes Review.
GitHub shipped native code coverage on pull requests yesterday. Public preview, available on Enterprise Cloud and Team plans, free during the preview period. You add the upload-code-coverage action to your CI workflow, grant the code-quality:write permission, and GitHub posts an aggregate coverage percentage as a comment on every PR.
It's a genuinely useful addition. Coverage as a first-class PR signal means reviewers don't have to context-switch to Codecov or a third-party dashboard just to see whether the new code has tests. But there's a gap between "this code was executed by a test" and "this code is correct," and that gap is where bugs live.
The setup is straightforward. Your CI generates a Cobertura XML report (most test frameworks already support this), and the upload-code-coverage action sends it to GitHub. GitHub then posts a comment on the PR with the aggregate line coverage percentage.
That number tells you one thing: what percentage of lines in the changed files were executed during the test run. If a PR touches 200 lines and 160 of them are hit by some test, coverage is 80%. The reviewer sees this at a glance without leaving the PR.
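To make the arithmetic concrete, here is a minimal sketch of how a line coverage percentage falls out of per-line hit counts. The `LineHit` shape and the `lineCoveragePercent` helper are hypothetical names for illustration; they are not part of GitHub's action or of the Cobertura format.

```typescript
// Hypothetical shape: one entry per line the PR touches.
interface LineHit {
  line: number;
  hits: number; // how many times the test run executed this line
}

// Line coverage = lines executed at least once / total lines, as a percentage.
function lineCoveragePercent(lines: LineHit[]): number {
  if (lines.length === 0) return 0;
  const covered = lines.filter((l) => l.hits > 0).length;
  return (covered * 100) / lines.length;
}

// 200 touched lines, the first 160 of them executed by some test.
const touched: LineHit[] = Array.from({ length: 200 }, (_, i) => ({
  line: i + 1,
  hits: i < 160 ? 1 : 0,
}));

console.log(lineCoveragePercent(touched)); // 80
```

Note what the calculation never looks at: whether any test checked the result of those 160 lines.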
That's valuable. A PR with 12% coverage probably deserves a harder look than one with 90%. As a triage signal, coverage works.
Coverage measures execution, not correctness. Here's where the distinction matters in practice.
A test that calls a function and never checks the return value still counts as coverage. So does a test that asserts expect(result).toBeDefined() on an object that should have been null. The test ran the code. The coverage report says 100%. The test proved nothing.
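A small sketch of that failure mode, with no test framework involved. `findUser` and `isDefined` are hypothetical; `isDefined` stands in for a toBeDefined-style check, which passes for any value other than `undefined`.

```typescript
interface User {
  id: number;
  name: string;
}

// Hypothetical lookup: it should return null for an unknown id.
// Bug: it returns a placeholder object instead.
function findUser(id: number): User | null {
  const users: User[] = [{ id: 1, name: "Ada" }];
  const match = users.find((u) => u.id === id);
  return match ?? { id, name: "" }; // wrong: intended `?? null`
}

// Stand-in for a toBeDefined-style assertion: true for anything but undefined.
function isDefined(value: unknown): boolean {
  return value !== undefined;
}

const result = findUser(999);

// Weak check: passes, and every line of findUser has now been executed.
const weakCheckPasses = isDefined(result);

// Strong check: states the intended behavior, and fails because of the bug.
const strongCheckPasses = result === null;

console.log({ weakCheckPasses, strongCheckPasses });
```

Both checks produce identical line coverage for `findUser`. Only the second one would have caught the bug.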
This isn't a theoretical problem. Mutation testing research consistently shows that test suites with high line coverage often fail to detect injected faults. Coverage tells you the test suite visited the code. It doesn't tell you the test suite would catch a regression.
Consider a function that calculates a discount. The test passes in a standard order and checks the result. Coverage: 100%. But the function uses > instead of >= in a boundary check, so orders exactly at the threshold get the wrong price. The test doesn't exercise that boundary. Coverage can't see the off-by-one.
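Here is roughly what that looks like, as a sketch under assumed numbers: the threshold of 100 and the 10% discount are made up for illustration.

```typescript
// Hypothetical rule: orders of 100 or more get 10% off.
const THRESHOLD = 100;

// Intended comparison is `total >= THRESHOLD`. The bug uses `>` instead.
function discountedTotal(total: number): number {
  return total > THRESHOLD ? total - total / 10 : total;
}

// The "standard order" test: executes the function's only line and passes.
const standardOrder = discountedTotal(200);

// The boundary case the test suite never tried.
const atThreshold = discountedTotal(100);

console.log(standardOrder); // 180, as expected
console.log(atThreshold); // 100, but the intended price is 90
```

The single call with 200 is enough for a line coverage report to mark the function as fully covered. The wrong price at exactly 100 only shows up if a test, or a reviewer, looks at the boundary.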
More broadly, coverage can't evaluate intent. It doesn't know what the code was supposed to do, only that something executed it.
An authentication handler might have 85% coverage because the happy path is well-tested. But the catch block that handles a malformed JWT? It logs the error and returns a 200 instead of a 401. Coverage sees the catch block as "uncovered" (or worse, sees it as covered if a different test triggered it incidentally), but it can't flag that the error handling itself is wrong.
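A stripped-down sketch of that handler shows why the metric is blind here. Everything in it is hypothetical: `verifyToken` is a toy check on token shape, not a real JWT library, and `handleRequest` is not taken from any actual codebase.

```typescript
interface HttpResult {
  status: number;
  body: string;
}

// Toy verifier (assumption): accepts any three non-empty dot-separated parts.
function verifyToken(token: string): { sub: string } {
  const parts = token.split(".");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error("malformed JWT");
  }
  return { sub: "user-1" };
}

function handleRequest(token: string): HttpResult {
  try {
    const claims = verifyToken(token);
    return { status: 200, body: `hello ${claims.sub}` };
  } catch (err) {
    console.error("auth failed:", (err as Error).message);
    return { status: 200, body: "" }; // bug: should be 401
  }
}

// A test that only checks "it returned something" still covers the catch block.
const response = handleRequest("not-a-jwt");
const catchBlockExercised = response !== undefined;

// The assertion that encodes intent is the one that fails.
const rejectsMalformedToken = response.status === 401;

console.log({ catchBlockExercised, rejectsMalformedToken });
```

Covered or uncovered, the catch block reports the same thing to a coverage tool. The wrong status code is only visible to something that reads the return statement and knows what a failed authentication should produce.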
Security-sensitive code needs more than execution confirmation. It needs someone (or something) reading the logic and asking: "Does this actually reject the request it should reject?"
Tenki's code reviewer doesn't look at coverage metrics. It reads the actual implementation in every PR: the diff, the surrounding context, and the codebase it sits in. Then it flags problems that coverage scores mask.
Where coverage answers "did a test run this line?", Tenki answers a different set of questions:
The setup takes about two minutes: install the GitHub App, connect your repositories, and reviews start on your next PR. You can configure severity thresholds, set custom rules that match your team's conventions, and adjust verbosity so the reviewer focuses on what matters to you.
In benchmarks against six other AI reviewers on 122 real production bugs, Tenki detected 69% of issues. The next closest was 36%. That difference comes from reading the implementation itself rather than relying on surface-level metrics.
The best use of GitHub's coverage feature isn't as a standalone merge gate. It's as one signal alongside an actual code review. Here's a practical workflow for teams that want both:
This isn't about replacing one tool with the other. Coverage and code review measure fundamentally different things. A PR with 95% coverage and a critical logic error in the covered path is a common scenario, not an edge case. Without something reading the actual code, that error sails through.
GitHub adding coverage to PRs is a good move. It closes a real gap: too many teams had to leave the PR page to check test coverage, and a lot of teams just didn't bother. Having that number right on the PR is strictly better than not having it.
But coverage answers "what ran?" and review answers "what's wrong?" They're orthogonal. If your merge gate only checks one, you're flying half-blind.
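One way to picture a gate that looks at both signals is the sketch below. The `PrSignals` shape, the `mergeDecision` function, and the default 80% threshold are all assumptions for illustration. Neither GitHub nor Tenki is claimed to expose data in this form.

```typescript
// Hypothetical summary of the two signals on a PR.
interface PrSignals {
  coveragePercent: number; // the aggregate number from the coverage comment
  blockingFindings: number; // review findings at or above a chosen severity
}

type Decision = "merge" | "needs-tests" | "needs-fixes";

// Review findings are checked first: high coverage never overrides a known defect.
function mergeDecision(signals: PrSignals, minCoverage = 80): Decision {
  if (signals.blockingFindings > 0) return "needs-fixes";
  if (signals.coveragePercent < minCoverage) return "needs-tests";
  return "merge";
}

// 95% coverage with one critical finding is still blocked.
console.log(mergeDecision({ coveragePercent: 95, blockingFindings: 1 })); // "needs-fixes"
console.log(mergeDecision({ coveragePercent: 12, blockingFindings: 0 })); // "needs-tests"
console.log(mergeDecision({ coveragePercent: 90, blockingFindings: 0 })); // "merge"
```

The ordering is the design choice worth noting: a gate that checked coverage alone would wave the first PR through.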
If you're already using GitHub Actions for CI, adding both takes about five minutes total. Enable coverage in your workflow, install Tenki's code reviewer, and your next PR gets both signals before a human ever opens it.