
Audit Your AI Agent's Commits: GitHub's New Observability Features
AI coding agents are shipping real commits into production repositories. That's not a future scenario. Copilot coding agent has been doing it for months, and GitHub just made it possible to actually verify what those agents did after the fact.
In March 2026, GitHub shipped three related observability features in quick succession: agentic workflow configs visible in Actions run summaries (March 26), commit-to-session tracing (March 20), and improved session visibility (March 19). Individually, each is a nice quality-of-life improvement. Together, they form the foundation of something more useful: a proper audit trail for AI agent activity in your CI pipeline.
But the features alone don't give you governance. You still need to build the observability practices around them. This article walks through what's available now, how to connect the dots into an end-to-end provenance chain, and what a practical governance framework looks like for teams that let agents commit code.
When Copilot coding agent opens a pull request or pushes changes, it triggers GitHub Actions workflows. Before March 26, reviewing what configuration the agent operated under meant navigating to the repository's copilot-setup-steps.yml file, cross-referencing it with the specific run, and hoping nobody changed the config between when the agent ran and when you looked at it.
Now, the Actions run summary shows the exact agentic workflow markdown configs that were active when the workflow ran. Two things matter here: you're seeing the config as it existed at run time, so a later edit to the file can't obscure what the agent actually operated under, and it's attached directly to the run summary, so there's no cross-referencing the repository's current state against a historical run.
For teams using GitHub Agentic Workflows, this is particularly relevant: those configs define what tools the agent can use, what firewall rules apply, and what custom setup steps execute before the agent starts working.
The March 20 update added an Agent-Logs-Url trailer to every commit authored by Copilot coding agent. Every agent commit already lists Copilot as the author and the human who assigned the task as co-author. Now it also includes a permanent link back to the full session logs.
This sounds simple, and it is. But it closes a gap that mattered a lot in practice. Before this, you could see that a commit was agent-authored (the author metadata told you that), but understanding why the agent made that specific change required finding the right session in the Copilot agents tab, scrolling through logs, and correlating timestamps manually.
With the trailer in place, the provenance chain becomes concrete:
The commit's author metadata identifies Copilot as the author and the human who assigned the task as co-author. The Agent-Logs-Url trailer links to the session logs showing exactly what the agent did: which files it read, what tools it called, what tests it ran. And the session connects to the Actions runs it triggered. That's commit → session trace → Actions run → deployment: end-to-end provenance for agent-generated code, using only first-party GitHub features.
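Nothing special is needed to read that trailer back out; git's trailer-aware pretty formats extract it directly. A minimal example, with the commit SHA as a placeholder:

```bash
# Print the session log URL recorded in an agent commit's trailer.
# %(trailers:key=...,valueonly) extracts just the trailer's value.
git log -1 --format='%(trailers:key=Agent-Logs-Url,valueonly)' <commit-sha>
```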
The March 19 session visibility update improved what you see in the session logs themselves. This matters for audit purposes because the session log is what you'll be reviewing when something goes wrong.
Three specific improvements stand out for audit workflows:
Built-in setup step visibility. Before the agent starts working on your task, it clones the repository and starts the agent firewall (if enabled). The logs now show when these steps start and finish. If the firewall didn't initialize correctly, or the clone took unusually long, you'll see it directly in the session timeline.
Custom setup step output. If you've defined custom setup steps in copilot-setup-steps.yml, their output now shows in the session logs. This is critical for debugging environment issues without jumping to the verbose Actions logs.
Subagent activity. Copilot can delegate tasks to subagents (it often spins one up to research the codebase before making changes). Subagent activity is now collapsed by default with a heads-up display showing what it's working on. You can expand the details to see the full output. For audit purposes, this means you can verify the agent didn't go off-script during research phases.
GitHub also shipped a usage metrics update on March 25 that exposes which users have active Copilot coding agent sessions. The API response includes a used_copilot_coding_agent field at the user level, available on both daily and 28-day reports. This distinguishes IDE agent mode usage from cloud coding agent usage.
Combined with the provenance data from session traces and Actions runs, you've got enough raw material to build meaningful observability. Here's what's worth tracking:
Agent session frequency per repository. A spike in agent sessions on a specific repo might mean someone's delegating work that should be reviewed more carefully. Or it might mean the team found a great use case. Either way, you want to see the trend.
File scope per session. How many files does the agent typically touch per session? An agent that modified 3 files in a focused PR is different from one that changed 47 files across 12 directories. You can extract this from the PR diff metadata and correlate it with session IDs.
Test pass rates on agent PRs versus human PRs. Copilot coding agent runs tests in its own environment before pushing. But your CI pipeline runs them again. Comparing pass rates tells you whether the agent's local environment matches your CI environment, and whether the agent is producing code that passes your full test suite at the same rate as human-written code.
Time from session start to PR merge. If agent PRs are merging in under five minutes with minimal review, that's a signal. Maybe the changes are trivial and the fast merge is fine. Maybe the review process needs tightening. The metric by itself doesn't tell you which, but it tells you where to look.
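A rough version of the file-scope and merge-time numbers is available from the GitHub CLI alone. A sketch, assuming agent PRs are authored by the copilot[bot] login used later in this article's workflow example; adjust it to whatever your agent commits under:

```bash
# Rough file-scope and time-to-merge stats for recently merged
# agent-authored PRs. The author login is an assumption.
for pr in $(gh pr list --author 'copilot[bot]' --state merged \
              --limit 20 --json number --jq '.[].number'); do
  files=$(gh pr diff "$pr" --name-only | wc -l)
  times=$(gh pr view "$pr" --json createdAt,mergedAt \
            --jq '"opened \(.createdAt), merged \(.mergedAt)"')
  echo "PR #$pr: $files files changed; $times"
done
```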
You can pull the used_copilot_coding_agent data from the Copilot usage metrics API and pipe it into whatever dashboard tool your team already uses. Datadog, Grafana, a spreadsheet. The data is there; the visualization is up to you.
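Here's one way to do that with gh api and jq. Both the endpoint path and the response shape below are assumptions, so verify the field names against the current Copilot metrics API documentation before depending on them:

```bash
# Pull the org's Copilot usage report and keep entries where the
# coding agent was used. YOUR_ORG is a placeholder, and the endpoint
# path and response shape are assumptions -- check the API docs.
gh api "orgs/YOUR_ORG/copilot/metrics" --paginate \
  --jq '.[] | select(.used_copilot_coding_agent == true)'
```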
One related feature worth discussing here: GitHub also added the option to skip workflow approval for agent-triggered Actions runs (March 13). By default, Copilot is treated like an outside contributor: its PRs require a human to click "Approve and run workflows" before CI executes. The new setting lets repository admins skip that approval so workflows run immediately.
This creates a tension that every team using agents in CI will need to resolve. Skipping approval speeds up the feedback loop. The agent can iterate faster if it doesn't wait for a human to approve each workflow run. But it also means the agent's code triggers your CI pipeline, which may have access to tokens, secrets, and repository permissions, without anyone confirming the change first.
If you do skip approval, the audit trail we've been discussing becomes even more important. You need to be able to answer "what did the agent have access to during that run?" after the fact, because nobody verified it beforehand.
Features give you the raw capability. A governance framework turns that capability into consistent practice. In practice, that breaks down into three habits: log everything by default (archive the session URL, the active workflow config, and the Actions run ID for every agent commit), flag agent PRs that touch sensitive scope (CI workflows, infrastructure code, migrations, auth paths), and require a human review before anything flagged merges.
You can enforce most of these with branch protection rules and CODEOWNERS. The new visibility features don't change that. What they change is your ability to actually investigate when something gets flagged.
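For the sensitive paths, a CODEOWNERS file combined with the "require review from Code Owners" branch protection rule gives you enforcement without any custom tooling. The team names below are hypothetical:

```
# Require a review from the owning team whenever an agent (or anyone
# else) touches these paths. Team names are placeholders.
/.github/workflows/  @your-org/platform-team
/terraform/          @your-org/infra-team
/migrations/         @your-org/data-team
/auth/               @your-org/security-team
```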
You don't need a full observability platform to start. A GitHub Actions workflow that runs on pull_request events can check whether the PR author is copilot[bot], inspect the diff for sensitive file paths, and send a Slack notification if any flags trip.
Here's a rough skeleton:
```yaml
name: Agent PR Audit

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  audit:
    if: github.event.pull_request.user.login == 'copilot[bot]'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check sensitive files
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Paths where an agent-authored change deserves extra scrutiny.
          SENSITIVE_PATTERNS=(
            '.github/workflows/'
            'Dockerfile'
            'terraform/'
            'migrations/'
            'auth/'
          )
          CHANGED=$(gh pr diff ${{ github.event.number }} --name-only)
          for pattern in "${SENSITIVE_PATTERNS[@]}"; do
            if echo "$CHANGED" | grep -q "$pattern"; then
              echo "::warning::Agent PR touches $pattern"
              # Send alert to Slack, PagerDuty, etc.
            fi
          done
```

This is deliberately simple. Extend it with the Copilot usage metrics API to correlate session data, add gh agent-task view calls to capture session details (available in GitHub CLI v2.80.0+), and route alerts based on severity. The point is to start with something, not to build a perfect system on day one.
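If you want to capture session details from the CLI before wiring it into the workflow, the shape is roughly this; the session ID argument and output handling are assumptions, so check gh agent-task view --help for the exact interface:

```bash
# Capture an agent session's details for archival. gh agent-task
# ships with GitHub CLI v2.80.0+; the exact arguments are assumed
# here, so confirm with `gh agent-task view --help`.
gh agent-task view SESSION_ID > "session-SESSION_ID.txt"
```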
These features are a strong start, but they don't solve everything. A few gaps to watch:
Session log retention. GitHub hasn't published a retention policy for session logs. If you need to audit agent behavior from six months ago, you might not find the logs. Consider archiving session URLs and key metadata into your own systems; a sketch follows this list.
Cross-agent correlation. If you're using multiple AI agents (not just Copilot), the tracing story is currently Copilot-specific. Other agents don't add the same commit trailers or integrate with the same session log infrastructure. You'll need separate observability for each agent type.
Structured event export. The session logs are designed for human reading, not machine parsing. Building dashboards requires scraping or using the usage metrics API, which gives you aggregate data rather than per-session tool call details. A structured event stream (think OpenTelemetry for agent sessions) would make the dashboard story much cleaner.
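For the retention gap, a periodic job that sweeps recent agent commits and records their trailers is enough to start. A minimal sketch using only git; the author filter and CSV destination are assumptions to adapt:

```bash
# Archive Agent-Logs-Url trailers from recent agent commits so the
# session links outlive whatever GitHub's log retention turns out to be.
# Assumes agent commits carry "Copilot" as the author name.
git log --since='30 days ago' --author='Copilot' \
    --format='%H,%aI,%(trailers:key=Agent-Logs-Url,valueonly,separator=%x20)' \
  | awk -F',' '$3 != ""' >> agent-session-archive.csv
```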
None of these are dealbreakers. The features shipped in March give you enough to build a workable audit practice today. Just don't mistake the existence of the tools for having the practices in place. The configs in the run summary and the commit trailers are raw material. The governance framework is what your team builds on top of them.