May 2026

The Composable AI Coding Stack Is Here

Eddie Wang, Engineering


Everyone wanted to know which AI coding tool to pick. Cursor, Claude Code, or Codex? The comparison articles piled up. The Reddit threads got heated. But while teams argued about which tool was best, the tools themselves started doing something unexpected: they stopped trying to be everything.

In the first week of April 2026, three things happened almost simultaneously. Cursor shipped version 3, rebuilding its entire interface around managing fleets of agents. OpenAI published an official plugin that runs inside Claude Code, Anthropic's terminal-based coding agent. And early adopters started running all three tools together, not as competitors but as complementary layers in a composable AI coding stack.

The pattern should look familiar if you've worked in infrastructure. Nobody runs a single observability tool. You run Prometheus for metrics, Grafana for dashboards, PagerDuty for alerts. Each does one thing well, and value comes from how they compose. AI coding tools are splitting along the same lines.

Three launches, one week, one pattern

On April 2, Cursor launched version 3, codenamed Glass. The release replaced Cursor's Composer pane with a dedicated Agents Window, a standalone interface built for managing multiple AI agents at once. Developers can now run parallel agents across local machines, worktrees, and cloud sandboxes from a single sidebar. The changelog also added Agent Tabs for side-by-side conversations, a /best-of-n command that sends the same prompt to multiple models in isolated worktrees for comparison, and Design Mode for annotating UI elements in a built-in browser.

Three days earlier, OpenAI published codex-plugin-cc on GitHub. The plugin installs directly inside Claude Code and provides six slash commands. Among them: /codex:review runs a standard code review; /codex:adversarial-review pressure-tests around auth, data loss, and race conditions; and /codex:rescue hands a task off to Codex entirely, spinning it up as a subagent. An optional review gate lets Codex automatically review Claude's output before it finalizes, blocking completion if issues are found.

Read that again: OpenAI shipped an official integration into a direct competitor's product. The plugin delegates through the local Codex CLI using the developer's existing auth. No new runtime. No walled garden.
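Mechanically, that delegation is the whole trick. Here's a minimal sketch of the pattern, assuming a local codex binary with a non-interactive exec subcommand; the real plugin's internals, flags, and prompt wiring may well differ:

```typescript
// Illustrative sketch of CLI delegation, not the plugin's actual source.
// Assumes a local `codex` binary with a non-interactive `exec` subcommand;
// check `codex --help` for the real invocation on your machine.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function delegateReview(diff: string): Promise<string> {
  // The codex CLI picks up the developer's existing login on its own,
  // so the host (here, a Claude Code plugin) never touches credentials.
  const { stdout } = await run("codex", [
    "exec",
    `Review this diff for auth, data-loss, and race-condition issues:\n${diff}`,
  ]);
  return stdout;
}
```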

The insight isn't that these tools launched in the same week. It's that they launched in a way that makes them composable. Cursor orchestrates agents that can use any model. Claude Code accepts plugins from rival providers. Codex runs as a subagent inside another company's terminal. They're not converging into one tool. They're layering.

The three-layer model

What early adopters are assembling looks less like a product choice and more like a toolchain. Three distinct layers, each with a different job.

Orchestration: Cursor as the control plane

Cursor 3's Agents Window isn't an editor with AI bolted on. It's a control plane for managing fleets of coding agents. The interface shows all active agents in a sidebar, whether they were kicked off from the desktop, mobile, Slack, GitHub, or Linear. Agent Tabs let developers view multiple conversations side by side. Design Mode lets them annotate UI elements in a built-in browser and point agents at specific interface problems.

The move away from VS Code is deliberate. Cursor forked VS Code in 2023 to get distribution. Now it's building away from VS Code to get differentiation. The bet: if the orchestration layer wins, the text editor becomes secondary. Managing agents matters more than editing files.

Google reached the same conclusion with Antigravity, which splits its interface into an Editor View for hands-on coding and a Manager Surface for spawning and observing multiple agents. Two companies, two architectures, one conclusion: developers need a surface for managing agents, not just writing code.

Execution: Claude Code and Codex do the work

Claude Code and OpenAI Codex are the agents that actually write, review, and debug code. They operate in terminals, cloud sandboxes, or both. They read entire codebases, run tests, commit changes, and manage pull requests.

Claude Code has the stronger developer following right now. A Pragmatic Engineer survey of 906 software engineers in February 2026 gave it a 46% "most loved" rating. SemiAnalysis estimates it accounts for roughly 4% of all public GitHub commits as of March 2026, with projections suggesting 20% by year-end. Codex recently surpassed 3 million weekly active users, up from 2 million a month earlier.

This is where model differences matter most. Practitioners generally report that Claude performs better on nuanced reasoning across long context windows, while Codex handles parallelizable throughput tasks more efficiently. No neutral benchmark has confirmed that cleanly, but the perception is widespread enough to drive multi-tool adoption. Neither dominates across every scenario, which is precisely why developers are reaching for both.

Review: the newest and most underrated layer

This is what the Codex plugin specifically enables. When Claude writes code and Codex reviews it, the reviewer had no hand in the writing. It doesn't share the same internal assumptions. It catches different classes of errors.

Cross-provider review solves a structural problem that single-model workflows can't. When you ask the same model that wrote your code to review it, you're asking someone to grade their own homework. The bias is unavoidable. A second model from a different provider, trained on different data with different optimization targets, applies genuinely independent scrutiny.

The review gate feature takes this further. Enable it, and Codex reviews every Claude output before it finalizes. If issues surface, Claude addresses them before proceeding. OpenAI's documentation warns this can create long-running loops that quickly drain usage limits. That warning is itself a signal: OpenAI expects developers to lean on this feature heavily.
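Conceptually, a review gate is just a bounded generate-review loop. Below is a minimal sketch of that shape, not the plugin's implementation: the generate and review callbacks are placeholders you'd wire to your providers' SDKs, and maxRounds is the knob that keeps a stubborn disagreement from draining usage limits.

```typescript
// Minimal sketch of a cross-provider review gate (illustrative only).
// Callers supply the two model calls, so no provider SDK is assumed here.
interface ReviewResult {
  approved: boolean;
  issues: string[];
}

async function reviewGate(
  task: string,
  generate: (task: string, feedback: string[]) => Promise<string>, // writer model
  review: (code: string) => Promise<ReviewResult>,                 // reviewer model
  maxRounds = 3, // cap the loop so a disagreement can't drain usage limits
): Promise<{ code: string; approved: boolean }> {
  let feedback: string[] = [];
  let code = "";
  for (let round = 0; round < maxRounds; round++) {
    code = await generate(task, feedback);
    const result = await review(code);
    if (result.approved) return { code, approved: true };
    feedback = result.issues; // feed the reviewer's findings back to the writer
  }
  return { code, approved: false }; // after maxRounds, escalate to a human
}
```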

For teams already running AI code review in their CI/CD pipeline, this fits naturally. Tenki's code reviewer, for instance, already provides context-aware PR reviews that catch issues before merge. As the review layer matures, expect these kinds of automated checks to become standard regardless of which model wrote the code.

Why interoperability over lock-in

OpenAI building a plugin for Anthropic's product is the most revealing strategic signal here. The conventional playbook says lock users in, build a walled garden, make switching costly. OpenAI did the opposite.

The economics explain it. Claude Code has a large, enthusiastic installed base among professional developers. Rather than waiting for them to switch, OpenAI embedded Codex where they already work. Every plugin-initiated review generates usage counting against the developer's ChatGPT subscription or API key. Zero acquisition cost. Incremental billing.

Anthropic's open plugin architecture made this possible. Claude Code's MCP-based plugin system supports third-party integrations, including those from competitors. Both sides benefit. Anthropic gets a richer ecosystem. OpenAI gets distribution inside a competitor's installed base. Not altruism. Pragmatism.

The Agent Client Protocol makes it official

If Cursor's Agents Window and OpenAI's Codex plugin show the composable stack forming organically, the Agent Client Protocol (ACP) shows it being formalized. Built by JetBrains and Zed, ACP defines a standard communication interface so any coding agent that implements the protocol can connect to any supporting IDE without a custom integration.

The ecosystem is already substantial. The ACP Registry lists Cursor, Codex, Gemini CLI, GitHub Copilot, Mistral Vibe, and a dozen more agents. On the client side, JetBrains IDEs, Zed, Neovim (via plugin), and several others support it. In March 2026, Cursor joined the ACP Registry and became usable as an agent inside JetBrains IDEs. That means you can orchestrate Cursor from IntelliJ while Cursor, in turn, manages Claude Code and Codex agents.

Think of ACP as the LSP of AI coding agents. Language Server Protocol standardized how editors talk to language services, which unlocked the ecosystem that made VS Code dominant. ACP is doing the same thing for the agent layer. Before ACP, every agent-IDE pair needed a bespoke integration. With ACP, an agent provider writes one implementation and it works everywhere.
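Mechanically, the resemblance to LSP runs deep: an ACP agent speaks JSON-RPC over stdio with whatever client launched it. The sketch below shows the rough shape of such an exchange; the method and field names are paraphrased for illustration, so treat the ACP spec as the authoritative reference.

```typescript
// Illustrative shape of an ACP-style JSON-RPC exchange. Method and field
// names are paraphrased, not copied from the spec; consult the ACP docs.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

// The client (IDE) opens a session with the agent it spawned...
const newSession: JsonRpcRequest<{ cwd: string }> = {
  jsonrpc: "2.0",
  id: 1,
  method: "session/new",
  params: { cwd: "/home/dev/project" },
};

// ...then streams work to it as prompts within that session.
const prompt: JsonRpcRequest<{ sessionId: string; prompt: string }> = {
  jsonrpc: "2.0",
  id: 2,
  method: "session/prompt",
  params: { sessionId: "sess-1", prompt: "Fix the failing test in auth.ts" },
};
```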

A JetBrains survey of 11,000 developers in January 2026 found that 90% now use AI at work and 22% use AI coding agents specifically. That's a big enough installed base to make protocol-level interoperability matter.

Cost and token budgets: the practical tradeoff

Running multiple layers is obviously more expensive than running one tool. That's the honest tradeoff, and teams should plan for it.

Consider the review gate. Every time Codex reviews Claude's output, that's a full inference pass against Codex's model. If the review flags issues, Claude re-generates, and Codex reviews again. OpenAI explicitly warns about long-running loops draining usage limits. For a team running this across 50 PRs a day, the token costs add up fast.
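Back-of-envelope math makes that concrete. Every number in this sketch is a hypothetical placeholder, not a quoted price; substitute your own PR volume, token counts, and rates:

```typescript
// Back-of-envelope review-gate cost model. All figures are hypothetical
// placeholders; plug in your team's actual numbers and current pricing.
const prsPerDay = 50;
const roundsPerPr = 2;          // avg generate+review iterations before approval
const tokensPerReview = 30_000; // diff + context + review output, per pass
const pricePerMTokens = 5;      // USD per million tokens (placeholder rate)

// Counts only reviewer-side passes; writer-side regeneration is extra.
const dailyTokens = prsPerDay * roundsPerPr * tokensPerReview;
const dailyCost = (dailyTokens / 1_000_000) * pricePerMTokens;

console.log(`${dailyTokens.toLocaleString()} review tokens/day`);
console.log(`~$${dailyCost.toFixed(2)}/day, ~$${(dailyCost * 22).toFixed(2)}/month`);
// => 3,000,000 review tokens/day, ~$15.00/day, ~$330.00/month
```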

But there are ways to manage it. Cursor 3's /best-of-n command treats model selection the way you'd treat database selection: pick the right tool for the workload. Claude for complex refactoring with nuanced reasoning. Codex for parallelizable throughput tasks. Cursor's own Composer 2 model (built on Kimi K2.5) for cost-sensitive batch work where frontier-model quality isn't necessary.

The pattern is workload-aware routing. Don't send everything to the most expensive model. Route tasks based on complexity, risk, and required quality. Use the review layer selectively: adversarial review for security-critical paths, lighter review for low-risk changes. Teams that treat every task identically will burn through budgets. Teams that differentiate will get more value per dollar.
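In practice, that routing can start as something embarrassingly simple: a lookup keyed on risk and complexity. Here's a sketch with arbitrary thresholds and tier labels; the tiers echo the examples above, and nothing here is a recommendation:

```typescript
// Illustrative router: map task traits to a model tier. Thresholds and
// tier labels are arbitrary placeholders, not tuned recommendations.
type Tier = "frontier-reasoning" | "throughput" | "budget";

interface Task {
  complexity: number; // 0..1, e.g. blast radius of the change
  securityCritical: boolean;
}

function routeTask(task: Task): Tier {
  if (task.securityCritical) return "frontier-reasoning"; // + adversarial review
  if (task.complexity > 0.7) return "frontier-reasoning"; // nuanced refactors
  if (task.complexity > 0.3) return "throughput";         // parallelizable work
  return "budget";                                        // batch/boilerplate
}

// Usage: routeTask({ complexity: 0.2, securityCritical: false }) => "budget"
```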

What changes for engineering teams

If this composable pattern holds, it reshapes three things about how engineering teams work.

"Which tool" becomes "which layer." The evaluation framework shifts. Instead of benchmarking Cursor vs. Claude Code vs. Codex in a feature matrix, you're asking what your team's weakest layer is. If you already have strong execution agents but no structured review, the next investment is the review layer. If your developers are running agents in five different terminals with no unified view, the orchestration layer is the gap.

The editor starts to recede. For 40 years, the code editor was the center of gravity. From Emacs to VS Code, the assumption was always the same: the developer writes code and tools help. Cursor 3's Agents Window and Antigravity's Manager Surface both challenge that assumption directly. The orchestration layer is competing with the editor as the primary interface. The editor is still there, still useful, but it's no longer guaranteed to be the default view.

Review goes adversarial. Single-model review was always structurally limited. Cross-provider review, where one model writes and a different model challenges, is the most promising mitigation yet for the sycophancy problem in AI-assisted development. As this matures, it could become a standard CI/CD pipeline step, not just a developer workflow choice.

How to think about your stack right now

You don't need to adopt all three layers at once. Here's a practical way to think about it.

If your team is just starting with AI coding tools, pick a strong execution-layer agent. Claude Code or Codex, depending on your model preference and what your subscription covers. Get comfortable with that before adding layers.

If you're already running agents regularly and struggling to keep track of them across repos and environments, the orchestration layer (Cursor 3, Antigravity) will give you the most immediate productivity gain.

If you're generating significant code volume with AI and starting to worry about quality drift, add the review layer. Install the codex-plugin-cc for cross-provider review inside Claude Code, and consider automated review tools like Tenki for PR-level review in your CI/CD pipeline.

The familiar infrastructure analogy applies. Just as you learned to compose Terraform, Docker, and Kubernetes rather than picking one tool for everything, the emerging pattern in AI coding is composition over consolidation. The stack is assembling faster than anyone planned. The question for your team isn't which tool wins. It's which layers you're missing.

Tags

#ai-coding-agents #multi-model-ai #developer-tooling
