Security

Apr 2026

The PR Comment That Hijacked Three AI Agents

Eddie Wangengineering

How the attack works
Claude Code: PR title to remote code execution
Gemini CLI: fake trust boundary in an issue comment
GitHub Copilot: invisible HTML, three layers bypassed
The architectural flaw all three share
What the vendor responses reveal
Why review-scoped architecture resists this by design
The phishing analogy (and why this won't go away)
What to audit in your setup right now
The takeaway

On April 15, 2026, security researcher Aonan Guan published the details of an attack he calls Comment and Control. Using nothing more than crafted GitHub comments, he and researchers from Johns Hopkins University exfiltrated live API keys and tokens from three of the most widely deployed AI coding agents: Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent. No external servers. No sophisticated exploits. Just text in a PR title, an issue body, and a hidden HTML comment.

The attack hit the front page of Reddit, drew coverage from SecurityWeek and VentureBeat, and became the highest-profile AI agent security incident of 2026 so far. It also exposed something that security-conscious engineering teams should care about deeply: the architectural flaw that made all three agents vulnerable is the same one, and it's shared by every AI tool that treats PR comments as trusted instructions.

How the attack works

The core pattern is simple. All three agents run inside GitHub Actions and follow the same flow: they read GitHub data (PR titles, issue bodies, comments), feed it into an LLM as context, and then execute tools based on what the LLM decides to do. The attacker plants a prompt injection payload in the GitHub data itself. The agent reads it, follows the injected instructions, and exfiltrates credentials through GitHub's own channels.

Guan calls it Comment and Control, a play on Command and Control (C2), because the entire attack loop runs within GitHub. The attacker writes a comment, the agent reads it, executes the payload, and writes the stolen credentials back as another comment, a commit, or an Actions log entry. No external infrastructure required.

Each agent fell slightly differently.

Claude Code: PR title to remote code execution

Anthropic's Claude Code Security Review interpolates the PR title directly into the prompt with no sanitization. The Claude CLI is invoked without --disallowed-tools or --allowed-tools flags, and the subprocess inherits every environment variable. The attacker opens a PR with a title that breaks out of the prompt context and tells Claude to run shell commands. Claude obliges, executes ps auxeww, and posts the ANTHROPIC_API_KEY and GITHUB_TOKEN as a "security finding" in the PR comment. Anthropic rated it CVSS 9.4 Critical.

Gemini CLI: fake trust boundary in an issue comment

Google's Gemini CLI Action reads issue titles, bodies, and comments into the agent's context. The attacker injects a fake "Trusted Content Section" after the real context, overriding Gemini's safety instructions. Gemini then posts the full GEMINI_API_KEY as a public issue comment, visible to anyone.

GitHub Copilot: invisible HTML, three layers bypassed

This one was the most sophisticated. GitHub's Copilot Agent had three extra runtime defenses on top of model-level and prompt-level protections: environment variable filtering, secret scanning before git push, and a network firewall restricting outbound connections. Guan bypassed all of them.

The attack used an HTML comment in a GitHub issue. In the rendered view, the issue looks innocent: "Debug first. You need to debug with whoami." Hidden in the HTML comment were instructions to run ps auxeww | base64 -w0 and commit the output. The environment filter only applied to the child bash subprocess, but ps auxeww reads every process's environment, including the parent Node.js process and MCP server that still held unfiltered credentials. Base64 encoding defeated the secret scanner. And since github.com is on the firewall's allow list, the credentials were exfiltrated as a commit in the PR. Four credentials extracted, including GITHUB_TOKEN, GITHUB_COPILOT_API_TOKEN, and a personal access token.

Strip away the differences in injection surface and exfiltration channel, and you're left with the same structural problem: these agents are given powerful tools (bash execution, git push, API calls) and production secrets (API keys, tokens) in the same runtime that processes untrusted user input.

This isn't a bug that can be patched with better input sanitization. PR titles, issue comments, and issue bodies are legitimate data that the agent must read to do its job. The attacker isn't exploiting a parser flaw. They're hijacking the agent's context within the boundaries of its intended workflow. As Guan put it: "The prompt injection here is not a bug; it is context that the agent is designed to process."

The trust boundary between "code author" and "reviewer" is collapsed. When an AI coding agent treats a PR comment as an instruction it should follow, any contributor who can open a PR or file an issue becomes a potential operator of that agent. That's the fundamental problem.

What the vendor responses reveal

The three vendor responses tell you a lot about how each company thinks about the problem.

Anthropic rated the issue CVSS 9.4 Critical and shipped a mitigation: they added --disallowed-tools 'Bash(ps:*)' to block the ps command. But cat /proc/*/environ achieves the same result. Blocklisting is whack-a-mole. They also conceded that the action "is not designed to be hardened against prompt injection." Bounty: $100.

Google accepted the report through their VRP and awarded $1,337. Their fix was adding "guardrail prompts" to the system prompt. But the underlying threat model hasn't changed: the Gemini agent still has the same tool capabilities, and the injection surface is still there.

GitHub initially closed the Copilot report as "Informative," calling it "a known issue that does not present a significant security risk." Only after Guan pushed back with reverse-engineered source code evidence (showing the UU() environment filter function was designed to prevent this) did they reopen and resolve with a $500 bounty. Their final statement: "This is a previously identified architectural limitation."

"Architectural limitation" is the key phrase. All three vendors acknowledge, to varying degrees, that the problem is structural. None of them have fundamentally changed how their agents process untrusted input.

Why review-scoped architecture resists this by design

The Comment and Control attack works because the vulnerable agents operate as general-purpose coding agents inside GitHub Actions. They have bash access, they hold production secrets, and they process any text that lands in a PR or issue. They're structurally incapable of distinguishing an attacker's instructions from a legitimate reviewer's input.

A review tool that's scoped differently avoids the entire attack surface. Tenki's Code Reviewer is a useful contrast here because of how its architecture sidesteps the pattern.

First, it's scoped to the diff. Tenki's reviewer analyzes the code changes in a PR. It doesn't process arbitrary PR comments or issue bodies as instructions. There's no path for an attacker to inject prompt content through a comment that the reviewer would treat as operational context.

Second, its behavior is configured through committed Custom Context files that live in your repository. These are markdown files that define team rules, focus areas, and things to ignore. Because they're committed to the codebase, they go through the same review process as any other code change. An external contributor can't modify them through a PR comment or issue body.

Third, it doesn't run as a general-purpose agent with shell access and production secrets. It's a reviewer, not a coding agent. It doesn't execute bash commands, doesn't hold API keys in its runtime, and doesn't push commits. The attack surface that Comment and Control exploits simply doesn't exist in this model.

This isn't a coincidence. It's a design choice. When you build a tool that only reviews code, you don't need to give it a shell. When you configure it through committed files instead of runtime comments, you don't create an injection surface for external contributors.

The phishing analogy (and why this won't go away)

Guan draws a comparison to phishing, and it's apt. Phishing works because employees must process information from outside the organization to do their jobs: emails, links, attachments. An attacker crafts a message that looks legitimate, and the employee acts on it. We've spent decades building defenses against phishing, and it's still the most effective breach vector.

Prompt injection is the same dynamic, applied to machines. AI agents must process context from their environment to do their jobs. An attacker crafts input that looks like legitimate workflow data, and the agent acts on it. The defenses will improve over time, but the fundamental attack surface isn't going away. And as more AI agents get deployed across more workflows, the injection surfaces will grow with them.

This isn't limited to GitHub Actions, either. Guan's research notes that the pattern applies to any agent that processes untrusted input with access to tools and secrets: Slack bots, Jira agents, email agents, deployment automation. The injection surface changes, but the underlying conflict is the same.

What to audit in your setup right now

If your team uses AI coding agents in GitHub Actions, here's a concrete checklist.

Inventory your AI agent triggers. Check every workflow that fires on pull_request, issues, or issue_comment events. If any of them invoke an AI agent, that agent is potentially exposed to prompt injection from external contributors.
Check which secrets each workflow can access. Workflows using pull_request_target can access repository secrets even from fork PRs. If your AI agent workflow uses this trigger, external contributors can inject prompts that execute in a context with your production secrets.
Audit tool permissions. Does the agent have bash access? Can it push commits? Does it hold API keys? If your code review agent has the same capabilities as a full coding agent, it has the same attack surface. Use allowlists (--allowed-tools) rather than blocklists.
Treat AI agents like employees. Guan's framing is useful: if a human intern wouldn't get production credentials to triage GitHub issues, neither should the agent. Apply need-to-know and least privilege the same way you would for any team member.
Check for HTML comment injection in past issues. If you use Copilot Agent, review issues previously assigned to it. Hidden HTML comments won't show in the rendered view. Check the raw markdown for any issue that was assigned to an AI agent.
Rotate exposed secrets. If any of the affected agents have been running in your workflows with access to API keys or tokens, assume those secrets may have been exposed. Rotate them.
Separate review from execution. The safest AI review tools are the ones that don't have shell access or hold production secrets in the first place. If your review agent can also write code, push commits, and run arbitrary commands, it's not really a reviewer. It's a general-purpose agent with a review label.

The takeaway

Comment and Control isn't an isolated vulnerability. It's a demonstration that the current generation of AI coding agents has a trust model problem. When your review tool can be hijacked by anyone who can open a PR, the tool has more in common with a backdoor than a reviewer.

The vendors know it. Anthropic calls their action "not designed to be hardened against prompt injection." GitHub calls it "a previously identified architectural limitation." These are honest admissions. The question for engineering teams is whether you're comfortable running tools with known architectural limitations in workflows that have access to your production secrets.

The alternative is to use tools that are scoped to what they actually need to do. A code reviewer that only reads diffs, takes its configuration from committed files, and doesn't hold production secrets in its runtime. That's a defensible architecture. Everything else is a question of when, not whether, the next injection lands.

The PR Comment That Hijacked Three AI Agents

Table of Contents

How the attack works

Claude Code: PR title to remote code execution

Gemini CLI: fake trust boundary in an issue comment

GitHub Copilot: invisible HTML, three layers bypassed

What the vendor responses reveal

Why review-scoped architecture resists this by design

The phishing analogy (and why this won't go away)

What to audit in your setup right now

The takeaway

Smarter reviews. Faster builds.
Start for Free in less than 2 min.

The PR Comment That Hijacked Three AI Agents

Table of Contents

How the attack works

Claude Code: PR title to remote code execution

Gemini CLI: fake trust boundary in an issue comment

GitHub Copilot: invisible HTML, three layers bypassed

The architectural flaw all three share

What the vendor responses reveal

Why review-scoped architecture resists this by design

The phishing analogy (and why this won't go away)

What to audit in your setup right now

The takeaway

Smarter reviews. Faster builds. Start for Free in less than 2 min.

Smarter reviews. Faster builds.
Start for Free in less than 2 min.