What does the Tenki AI code review benchmark measure?

It measures how reliably AI code review tools catch real, production-grade bugs in pull requests. Each tool reviews the same 50 bug-introducing PRs from five major open-source repositories, and three LLM judges decide by majority vote whether a tool's comment correctly identified each bug.

Which AI code reviewers are compared?

Seven tools are compared: Tenki, CodeRabbit, Greptile, Graphite, GitHub Copilot, Cursor, and Devin. Tenki, CodeRabbit, Greptile, Graphite, and Copilot are dedicated PR review services. Cursor and Devin are coding agents shown for reference; they review their own work rather than third-party diffs.

How is a bug counted as 'caught'?

A finding is caught only when a tool posts a line-level comment that explicitly identifies the faulty code and explains its impact. Three independent LLM judges vote per finding; majority rules. Drive-by comments that mention the file but miss the actual defect do not count.

How often is this benchmark updated?

We re-run the full benchmark when new tools launch, existing tools ship major model updates, or our bug corpus grows. The current run was published April 2026. Subscribe to the Tenki changelog to get notified when new results drop.

Are the bugs real or synthetic?

Every bug comes from a real, merged fix in an open-source codebase. We replay the bug-introducing diff against each tool with default settings: no synthetic injections, no custom rules, no repo-specific tuning. Every reviewer is granted full repository context.

Code Review Benchmarks

How well does Tenki do versus other AI code reviewers in catching real bugs?

Last updated: May 20, 2026

TL;DR

Tenki is the #1 reviewer based on finding-level scoring.

In this independent 2026 benchmark against 6 leading AI code review tools, Tenki catches 84 of 122 real production bugs across 50 pull requests from cal.com, Sentry, Grafana, Keycloak, and Discourse, graded per finding by a 3-judge LLM panel. That's 1.9× the next-best reviewer, which caught 44.

84/122 Real Bugs Caught
68.9% Catch Rate
3 LLM Judges
7 Tools
5 Repositories
350 Code Reviews

Methodology

How we benchmarked AI code reviewers.

Every pull request contains a real, merged bug-fix from an open-source codebase. We replay the bug-introducing diff into a clean fork and let each tool review it with its default configuration. Reviews are then graded by a 3-LLM judge panel using majority vote. No synthetic bugs, no repo-specific tuning.

Sources

50 bug-fix PRs from 5 open-source repositories

Real merged fixes from cal.com (TypeScript), sentry (Python), grafana (Go), keycloak (Java), and discourse (Ruby). The bug set spans five major server-side languages.

Replay

Real bugs reintroduced, not invented

The pre-fix diff is replayed against every tool with default settings: no custom rules, no repo-specific tuning, and full repository context for every reviewer. Every tool sees the same code at the same point in history.

Scoring

Per-finding 3-judge LLM majority vote

Bugs are scored individually, not per-PR (which would over-credit drive-by comments). A bug counts as caught only if a line-level comment pinpoints the faulty code and explains its impact, and at least two of three independent LLM judges agree.
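The scoring rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the benchmark's actual harness: the names `is_caught` and `recall`, and the representation of each judge's verdict as a boolean, are assumptions introduced here.

```python
from typing import Sequence


def is_caught(judge_votes: Sequence[bool]) -> bool:
    """Return True when at least two of the three judges accept the comment.

    Hypothetical helper: a vote is True if that judge decided the tool's
    line-level comment pinpointed the faulty code and explained its impact.
    """
    if len(judge_votes) != 3:
        raise ValueError("the panel described above has exactly three judges")
    return sum(judge_votes) >= 2


def recall(findings: Sequence[Sequence[bool]]) -> float:
    """Share of findings caught, scoring each finding on its own (not per PR)."""
    if not findings:
        raise ValueError("need at least one finding to compute recall")
    caught = sum(is_caught(votes) for votes in findings)
    return caught / len(findings)
```

Scoring per finding means a PR with three bugs contributes three rows to `findings`, so a single comment on that PR can earn credit for at most the one bug it actually pinpoints.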

Test Sources

Cal.com: TypeScript · Open source scheduling infrastructure
Sentry: Python · Error tracking & performance monitoring
Grafana: Go · Monitoring & observability platform
Keycloak: Java · Identity & access management
Discourse: Ruby · Community discussion platform

Overall performance

Tenki leads on recall and F1 across all 122 real bugs.

Each individual bug is scored, not each pull request. Data is sorted by F1 and tools were kept at their default configurations. Higher-precision tools post fewer comments overall; Tenki's higher comment volume drives both higher recall and lower precision. See methodology for how each metric is computed.

Reviewer	Recall	Precision	F1
Tenki	68.9% (84/122 bugs)	29.9%	41.7
CodeRabbit	28.7% (35/122 bugs)	25.0%	26.7
Greptile	36.1% (44/122 bugs)	15.9%	22.1
Copilot	24.6% (30/122 bugs)	18.9%	21.4
Graphite	3.3% (4/122 bugs)	50.0%	6.2

Coding agents

Reviewer	Recall	Precision	F1
Devin	36.1% (44/122 bugs)	47.3%	40.9
Cursor	32.0% (39/122 bugs)	51.3%	39.4
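As a quick check on the table above, the third score can be recomputed from the published recall and precision. The sketch below assumes that score is the standard F1, the harmonic mean of precision and recall, expressed on a 0 to 100 scale; the page does not spell the formula out, but this assumption reproduces the published figures from the rounded percentages. The function name `f1_score` is introduced here for illustration.

```python
def f1_score(recall_pct: float, precision_pct: float) -> float:
    """Harmonic mean of recall and precision, both given as percentages."""
    if recall_pct + precision_pct == 0:
        return 0.0
    return 2 * recall_pct * precision_pct / (recall_pct + precision_pct)


# Recall and precision (in percent) as published in the table above.
published = {
    "Tenki": (68.9, 29.9),
    "Devin": (36.1, 47.3),
    "Graphite": (3.3, 50.0),
}

for tool, (recall_pct, precision_pct) in published.items():
    print(f"{tool}: F1 = {f1_score(recall_pct, precision_pct):.1f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tool with very high precision but very low recall (Graphite) still ends up with a low F1, while a tool with high recall and modest precision (Tenki) can lead.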

By Severity

Do AI code reviewers catch the bugs that actually matter?

Catch rate broken down by the severity of the individual finding. Critical bugs cause outages, data loss, or auth bypass. High-severity bugs break major user-facing flows. Medium bugs degrade behavior without breaking it. The severity that matters most for production reliability is the first column.

Tenki	69% (84/122 bugs)
Devin	36% (44/122 bugs)
Greptile	36% (44/122 bugs)
Cursor	32% (39/122 bugs)
CodeRabbit	29% (35/122 bugs)
Copilot	25% (30/122 bugs)
Graphite	3% (4/122 bugs)

By Repository

How each reviewer performs across five production codebases.

Per-repository recall: the share of real bugs each AI code reviewer caught in each codebase.

Cal.com (33 bugs)

Tenki	91% (30/33 bugs)
Devin	33% (11/33 bugs)
Cursor	30% (10/33 bugs)
Greptile	30% (10/33 bugs)
CodeRabbit	21% (7/33 bugs)
Copilot	21% (7/33 bugs)
Graphite	0% (0/33 bugs)

Case Library

Every real bug, every reviewer verdict.

One row per real bug, 122 findings across 50 production pull requests. Each cell shows whether that reviewer flagged that specific defect, decided by 3-LLM majority vote. Click any verdict to see the actual review on GitHub.

Finding	Severity	Repository	Tenki	CodeRabbit	Copilot	Cursor	Devin	Graphite	Greptile
deleteCacheHandler throws generic Error → tRPC surfaces as 500 instead of 403/404	Medium	Cal.com
Checkbox fires onChange twice per click via redundant onClick + onCheckedChange handlers	Low	Cal.com
`getPendingActions` never shows confirm button for paid payment-enabled bookings	High	Cal.com
Missing `revalidateTag('team-features')` leaves settings layout cache stale after role update	Medium	Cal.com
Past pending-unconfirmed bookings lose cancel/edit actions	High	Cal.com
Functional regression: 'Check for recordings' action replaced with disabled button	Medium	Cal.com
afterNthPaintCycle fires callback after n+1 frames, not n frames	Low	Cal.com
no_show action always included in afterEventActions, shown disabled for upcoming bookings	Medium	Cal.com
Charge card action hidden for recurring bookings in recurring tab	Medium	Cal.com
forEach with async callback fire-and-forgets calendar/video deletions	High	Cal.com
Backup-code login bypasses password verification	Critical	Cal.com
Unguarded mainHostDestinationCalendar.integration access crashes booking	High	Cal.com
Silent validation failure in constructor: invalid options return empty results, not errors	Medium	Cal.com
Dead branches in getBaseConditions: `else if (filterConditions)` and `else` unreachable	Low	Cal.com
immediateDelete path does not update the WorkflowReminder DB row	High	Cal.com
Default (non-immediate) path marks DB cancelled but never calls SendGrid	High	Cal.com
Working-hours `end` check uses slotStartTime, missing the end-of-slot boundary	High	Cal.com
Dayjs object compared with `===` instead of `.isSame()`: always `false`	High	Cal.com
UTC offset sign is inverted when checking date-override day match	High	Cal.com
Missing guard for absent `timeZone` in `getSlots` override offset calculation	High	Cal.com
Working-hours check does not reject slots on days with no working-hour entries	High	Cal.com
`availabilityCheckProps` does not include `dateOverrides` / `workingHours`, creating inconsistent behaviour between fixed and loose hosts	Medium	Cal.com
Working-hours day-of-week comparison uses UTC day instead of organizer timezone day	Medium	Cal.com
GoogleCalendarService stores safeParse wrapper {success,data} as credential key	High	Cal.com
Missing `prisma` import in SalesforceCalendarService causes runtime crash	Critical	Cal.com
Computed property keys in minimumTokenResponseSchema use Zod schema objects, not string values	High	Cal.com
refreshOAuthTokens returns incompatible types: Response vs axios response	High	Cal.com
Webhook secret compared with non-constant-time string equality	High	Cal.com
ZohoCRM token expiry calculation adds seconds instead of milliseconds	Medium	Cal.com
Unhandled `ZodError` from `.parse()` returns 500 instead of 400 for invalid request body	Medium	Cal.com
Outbound credential sync request carries no authentication; sync endpoint response is implicitly trusted	Medium	Cal.com
SMS-reminder cleanup deletes non-SMS reminders with retryCount > 1	High	Cal.com
isTeamAdminOrOwner uses && instead of ||: check is unsatisfiable	High	Cal.com
Duplicate self.downsize definition silently overrides 4-arg signature; existing callers crash	High	Discourse
Unsubscribe endpoint NoMethodError when no TopicUser row exists	High	Discourse
CSRF: GET request permanently mutates topic notification state	High	Discourse
Missing `unsubscribe_url` in `template_args` causes `I18n::MissingInterpolationArgument` for non-notification callers	High	Discourse
Duplicate `%{respond_instructions}` placeholder in email notification template causes text to appear twice	Medium	Discourse
email_in_restriction_setting? regex has no end anchor: whitelist bypass via domain suffix	High	Discourse
best.html.erb closes if/else with `<%- end if %>`: invalid Ruby, view crashes	Critical	Discourse
SSRF via user-controlled embed_url triggers open() to arbitrary URL	Critical	Discourse
SSRF and RCE via Kernel#open with attacker-controlled URL from Disqus XML	Critical	Discourse
Non-atomic Redis SETNX + EXPIRE creates permanent throttle lock on crash	High	Discourse
`invalid_host?` silently blocks all retrieval when `embeddable_host` is blank	High	Discourse
XSS via unsanitized Referer header injected into JavaScript postMessage targetOrigin	High	Discourse
`i.content` in PollFeed may be nil, causing NoMethodError on `.scrub`	High	Discourse
`save_reply_relationships` called unconditionally even when `save` fails	Medium	Discourse
Port-stripping logic in `absolutize_urls` uses hardcoded ports instead of scheme-relative defaults	Medium	Discourse
Synchronous `PollFeed.new.execute({})` blocks caller and polls entire feed for one URL	Medium	Discourse
.title float removed without flex replacement; child floats now ignored	Medium	Discourse
Destructive String#<< on literal in website_name domain check	Medium	Discourse
Light-theme lightness values silently changed during scale-color → dark-light-choose conversion	Medium	Discourse
remove_member spec uses PUT but the route is DELETE	Medium	Discourse
I18n Fallbacks included on backend.class instead of I18n::Backend::Simple	Medium	Discourse
category_fabricator.rb and embeddable_host_fabricator.rb contents swapped	Medium	Discourse
Cache-poisoning bypass of device limit	High	Grafana
rowsAffected==0 misclassified as ErrDeviceLimitReached	Medium	Grafana
TOCTOU race between CountDevices and INSERT	Medium	Grafana
Stale denial cache blocks newly granted permissions	Medium	Grafana
In-process authz client loses 30s client-side cache	Low	Grafana
Stream and admission requests lose contextual logging	Medium	Grafana
Empty-after-interpolation queries bypass expr filter (shardQuerySplitting)	Medium	Grafana
Filter precedes interpolation in querySplitting too	Medium	Grafana
Switch from interpolateVariablesInQueries to applyTemplateVariables changes filter semantics	Low	Grafana
Missing React key prop on GrafanaRuleListItem in FilterView	Medium	Grafana
Silence drawer never opens for Grafana rules in v2 list	Medium	Grafana
Storage.Create error records legacy duration	Medium	Grafana
Storage.Update error records legacy duration	Medium	Grafana
DeleteCollection async legacy goroutine records storage duration	Medium	Grafana
Delete success records `name` instead of `options.Kind`	Low	Grafana
Delete passes d.Log instead of contextual log to klog.NewContext	Low	Grafana
CleanUpService ticker reduced from 10 minutes to 1 minute	High	Grafana
Routine cleanup progress logged at Error level	Medium	Grafana
Missing double-checked lock in GetWebAssets	Medium	Grafana
enableSqlExpressions is dead code; FlagSqlExpressions feature flag is non-functional	High	Grafana
BuildIndex no longer serializes concurrent builds for the same key	Medium	Grafana
UsernameForm.authenticate calls isConditionalPasskeysEnabled() with no args; compile error	High	Keycloak
Cache-hit path of getForLogin skips createOrganizationAwareIdentityProviderModel wrapping	High	Keycloak
OrganizationAwareIdentityProviderBean adds defensive isEnabled() re-checks	Low	Keycloak
CryptoProvider.order() added without default: breaks third-party SPI implementations	High	Keycloak
All production CryptoProviders share the same order value: FIPS provider can be silently bypassed	High	Keycloak
`readSequence` silently returns empty list for indefinite-length encoded sequences	Medium	Keycloak
RECREATE_UPGRADE_EXIT_CODE silently changed from 4 to 3 (breaking external contract)	Medium	Keycloak
Cleanup listener guarded by V1 flag while rest of class uses V2	High	Keycloak
GroupPermissionsV2.canManage() accepts VIEW scope: privilege escalation	Critical	Keycloak
GroupPermissionsV2 adds authorization resource IDs instead of group UUIDs to view-permission filter, breaking group-scoped admin user visibility	High	Keycloak
Lithuanian totpStep1/loginTotpStep1 strings replaced with Italian text	Medium	Keycloak
AccessTokenContext constructor null-checks grantType twice instead of rawTokenId	Medium	Keycloak
isAccessTokenId test matcher inverts the grant-shortcut check	Medium	Keycloak
Optional<CredentialModel>.get() called without isPresent check in login bean	Medium	Keycloak
getSubGroupsCount returns null instead of delegating like sibling methods	High	Keycloak
OptimizedCursorPaginator.get_item_key TypeError on datetime values	High	Sentry
Negative-offset queryset slicing raises AssertionError	High	Sentry
Date-range filtering silently removed	Medium	Sentry
AttributeError on organization_context.member when None	Medium	Sentry
enable_advanced gate is effectively a no-op (has_global_access defaults True)	Medium	Sentry
Redirect-chain iteration limit reduced from 10000 to 1000 in add-buffer.lua	High	Sentry
max_segment_spans safety check removed from flush path	High	Sentry
OptimizedCursorPaginator.get_item_key TypeError on datetime values	High	Sentry
Negative-offset queryset slicing raises AssertionError	High	Sentry
Audit-logs date-range filtering removed; permission gate too permissive	Medium	Sentry
Upsampling check uses outer `dataset` instead of `scoped_dataset`	High	Sentry
`timestamp` raw_groupby validation silently removed	Medium	Sentry
Misleading 'cached' comments around upsampling decision	Low	Sentry
OAuth state parameter is a deterministic pipeline.signature, not a per-session nonce	Critical	Sentry
fetch_error_details mispairs error IDs to payloads via zip(error_ids, events.values())	High	Sentry
{title}	High	Sentry
validate_timestamp/validate_age silently skip mutual-exclusion check when age=0 or timestamp=0	High	Sentry
Wrong key 'detector_type' in update() prevents type from ever being updated	High	Sentry
Feature-flagged TableWidgetVisualization renders with empty data instead of query results	High	Sentry
drain_mailbox_parallel processing_deadline_duration (120s) is shorter than its internal loop timeout (180s)	High	Sentry
Organization slug missing from queryKey causes cross-org cache hits	High	Sentry
Removed `retry: false` causes unintended request retries for attribute key fetches	Medium	Sentry
Inconsistent optional chaining in tableData?.meta.fields can throw at runtime	Medium	Sentry
Hook returns `isFetching` labeled as `isLoading`, causing misleading loading state during background refetches	Medium	Sentry
Global max-flush-segments cap is multiplied by flusher process count	High	Sentry
Redis cluster memory polled redundantly once per flusher process buffer	Medium	Sentry
AssignmentSource.queued default is evaluated at class-definition time	Medium	Sentry
schedule_type display transformation is dropped on return	Medium	Sentry
FixedQueuePool.shutdown signals workers before draining queues, dropping messages	High	Sentry
result_processor moved before block-size config in ResultsStrategyFactory.__init__	Low	Sentry
MetricAlertDetectorHandler instantiation raises TypeError due to unimplemented abstract methods	High	Sentry
