
Audit Your AI Agent's Commits: GitHub's New Observability Features
AI coding agents are shipping real commits into production repositories. That's not a future scenario. Copilot coding agent has been doing it for months, and GitHub just made it possible to actually verify what those agents did after the fact.
In March 2026, GitHub shipped three related observability features in quick succession: agentic workflow configs visible in Actions run summaries (March 26), commit-to-session tracing (March 20), and improved session visibility (March 19). Individually, each is a nice quality-of-life improvement. Together, they form the foundation of something more useful: a proper audit trail for AI agent activity in your CI pipeline.
But the features alone don't give you governance. You still need to build the observability practices around them. This article walks through what's available now, how to connect the dots into an end-to-end provenance chain, and what a practical governance framework looks like for teams that let agents commit code.
When Copilot coding agent opens a pull request or pushes changes, it triggers GitHub Actions workflows. Before March 26, reviewing what configuration the agent operated under meant navigating to the repository's copilot-setup-steps.yml file, cross-referencing it with the specific run, and hoping nobody changed the config between when the agent ran and when you looked at it.
Now, the Actions run summary shows the exact agentic workflow markdown configs that were active when the workflow ran. Two things matter here: you're seeing the config as it existed at run time, so a later edit to the file can't obscure what the agent actually operated under, and it's attached directly to the run summary, so there's no cross-referencing the repository's current state against a historical run.
For teams using GitHub Agentic Workflows, this is particularly relevant: those configs define what tools the agent can use, what firewall rules apply, and what custom setup steps execute before the agent starts working.
The March 20 update added an Agent-Logs-Url trailer to every commit authored by Copilot coding agent. Every agent commit already lists Copilot as the author and the human who assigned the task as co-author. Now it also includes a permanent link back to the full session logs.
This sounds simple, and it is. But it closes a gap that mattered a lot in practice. Before this, you could see that a commit was agent-authored (the author metadata told you that), but understanding why the agent made that specific change required finding the right session in the Copilot agents tab, scrolling through logs, and correlating timestamps manually.
With the trailer in place, the provenance chain becomes concrete:
The commit's author metadata identifies Copilot as the author and the human who assigned the task as co-author. The Agent-Logs-Url trailer links to the session logs showing exactly what the agent did: which files it read, what tools it called, what tests it ran. And the session connects to the Actions runs it triggered. That's commit → session trace → Actions run → deployment: end-to-end provenance for agent-generated code, using only first-party GitHub features.
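Nothing special is needed to read that trailer back out; git's trailer-aware pretty formats extract it directly. A minimal example, with the commit SHA as a placeholder:

```bash
# Print the session log URL recorded in an agent commit's trailer.
# %(trailers:key=...,valueonly) extracts just the trailer's value.
git log -1 --format='%(trailers:key=Agent-Logs-Url,valueonly)' <commit-sha>
```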
The March 19 session visibility update improved what you see in the session logs themselves. This matters for audit purposes because the session log is what you'll be reviewing when something goes wrong.
Three specific improvements stand out for audit workflows:
Built-in setup step visibility. Before the agent starts working on your task, it clones the repository and starts the agent firewall (if enabled). The logs now show when these steps start and finish. If the firewall didn't initialize correctly, or the clone took unusually long, you'll see it directly in the session timeline.
Custom setup step output. If you've defined custom setup steps in copilot-setup-steps.yml, their output now shows in the session logs. This is critical for debugging environment issues without jumping to the verbose Actions logs.
Subagent activity. Copilot can delegate tasks to subagents (it often spins one up to research the codebase before making changes). Subagent activity is now collapsed by default with a heads-up display showing what it's working on. You can expand the details to see the full output. For audit purposes, this means you can verify the agent didn't go off-script during research phases.
GitHub also shipped a usage metrics update on March 25 that exposes which users have active Copilot coding agent sessions. The API response includes a used_copilot_coding_agent field at the user level, available on both daily and 28-day reports. This distinguishes IDE agent mode usage from cloud coding agent usage.
Combined with the provenance data from session traces and Actions runs, you've got enough raw material to build meaningful observability. Here's what's worth tracking:
Agent session frequency per repository. A spike in agent sessions on a specific repo might mean someone's delegating work that should be reviewed more carefully. Or it might mean the team found a great use case. Either way, you want to see the trend.
File scope per session. How many files does the agent typically touch per session? An agent that modified 3 files in a focused PR is different from one that changed 47 files across 12 directories. You can extract this from the PR diff metadata and correlate it with session IDs.
Test pass rates on agent PRs versus human PRs. Copilot coding agent runs tests in its own environment before pushing. But your CI pipeline runs them again. Comparing pass rates tells you whether the agent's local environment matches your CI environment, and whether the agent is producing code that passes your full test suite at the same rate as human-written code.
Time from session start to PR merge. If agent PRs are merging in under five minutes with minimal review, that's a signal. Maybe the changes are trivial and the fast merge is fine. Maybe the review process needs tightening. The metric by itself doesn't tell you which, but it tells you where to look.
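A rough version of the file-scope and merge-time numbers is available from the GitHub CLI alone. A sketch, assuming agent PRs are authored by the copilot[bot] login used later in this article's workflow example; adjust it to whatever your agent commits under:

```bash
# Rough file-scope and time-to-merge stats for recently merged
# agent-authored PRs. The author login is an assumption.
for pr in $(gh pr list --author 'copilot[bot]' --state merged \
              --limit 20 --json number --jq '.[].number'); do
  files=$(gh pr diff "$pr" --name-only | wc -l)
  times=$(gh pr view "$pr" --json createdAt,mergedAt \
            --jq '"opened \(.createdAt), merged \(.mergedAt)"')
  echo "PR #$pr: $files files changed; $times"
done
```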
You can pull the used_copilot_coding_agent data from the Copilot usage metrics API and pipe it into whatever dashboard tool your team already uses. Datadog, Grafana, a spreadsheet. The data is there; the visualization is up to you.
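Here's one way to do that with gh api and jq. Both the endpoint path and the response shape below are assumptions, so verify the field names against the current Copilot metrics API documentation before depending on them:

```bash
# Pull the org's Copilot usage report and keep entries where the
# coding agent was used. YOUR_ORG is a placeholder, and the endpoint
# path and response shape are assumptions -- check the API docs.
gh api "orgs/YOUR_ORG/copilot/metrics" --paginate \
  --jq '.[] | select(.used_copilot_coding_agent == true)'
```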
One related feature worth discussing here: GitHub also added the option to skip workflow approval for agent-triggered Actions runs (March 13). By default, Copilot is treated like an outside contributor: its PRs require a human to click "Approve and run workflows" before CI executes. The new setting lets repository admins skip that approval so workflows run immediately.
This creates a tension that every team using agents in CI will need to resolve. Skipping approval speeds up the feedback loop. The agent can iterate faster if it doesn't wait for a human to approve each workflow run. But it also means the agent's code triggers your CI pipeline, which may have access to tokens, secrets, and repository permissions, without anyone confirming the change first.
If you do skip approval, the audit trail we've been discussing becomes even more important. You need to be able to answer "what did the agent have access to during that run?" after the fact, because nobody verified it beforehand.
Features give you the raw capability. A governance framework turns that capability into consistent practice. In practice, that breaks down into three habits: log everything by default (archive the session URL, the active workflow config, and the Actions run ID for every agent commit), flag agent PRs that touch sensitive scope (CI workflows, infrastructure code, migrations, auth paths), and require a human review before anything flagged merges.
You can enforce most of these with branch protection rules and CODEOWNERS. The new visibility features don't change that. What they change is your ability to actually investigate when something gets flagged.
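For the sensitive paths, a CODEOWNERS file combined with the "require review from Code Owners" branch protection rule gives you enforcement without any custom tooling. The team names below are hypothetical:

```
# Require a review from the owning team whenever an agent (or anyone
# else) touches these paths. Team names are placeholders.
/.github/workflows/  @your-org/platform-team
/terraform/          @your-org/infra-team
/migrations/         @your-org/data-team
/auth/               @your-org/security-team
```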
You don't need a full observability platform to start. A GitHub Actions workflow that runs on pull_request events can check whether the PR author is copilot[bot], inspect the diff for sensitive file paths, and send a Slack notification if any flags trip.
Here's a rough skeleton:
```yaml
name: Agent PR Audit

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  audit:
    if: github.event.pull_request.user.login == 'copilot[bot]'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check sensitive files
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Paths where an agent-authored change deserves extra scrutiny.
          SENSITIVE_PATTERNS=(
            '.github/workflows/'
            'Dockerfile'
            'terraform/'
            'migrations/'
            'auth/'
          )
          CHANGED=$(gh pr diff ${{ github.event.number }} --name-only)
          for pattern in "${SENSITIVE_PATTERNS[@]}"; do
            if echo "$CHANGED" | grep -q "$pattern"; then
              echo "::warning::Agent PR touches $pattern"
              # Send alert to Slack, PagerDuty, etc.
            fi
          done
```

This is deliberately simple. Extend it with the Copilot usage metrics API to correlate session data, add gh agent-task view calls to capture session details (available in GitHub CLI v2.80.0+), and route alerts based on severity. The point is to start with something, not to build a perfect system on day one.
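If you want to capture session details from the CLI before wiring it into the workflow, the shape is roughly this; the session ID argument and output handling are assumptions, so check gh agent-task view --help for the exact interface:

```bash
# Capture an agent session's details for archival. gh agent-task
# ships with GitHub CLI v2.80.0+; the exact arguments are assumed
# here, so confirm with `gh agent-task view --help`.
gh agent-task view SESSION_ID > "session-SESSION_ID.txt"
```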
These features are a strong start, but they don't solve everything. A few gaps to watch:
Session log retention. GitHub hasn't published a retention policy for session logs. If you need to audit agent behavior from six months ago, you might not find the logs. Consider archiving session URLs and key metadata into your own systems; a sketch follows this list.
Cross-agent correlation. If you're using multiple AI agents (not just Copilot), the tracing story is currently Copilot-specific. Other agents don't add the same commit trailers or integrate with the same session log infrastructure. You'll need separate observability for each agent type.
Structured event export. The session logs are designed for human reading, not machine parsing. Building dashboards requires scraping or using the usage metrics API, which gives you aggregate data rather than per-session tool call details. A structured event stream (think OpenTelemetry for agent sessions) would make the dashboard story much cleaner.
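For the retention gap, a periodic job that sweeps recent agent commits and records their trailers is enough to start. A minimal sketch using only git; the author filter and CSV destination are assumptions to adapt:

```bash
# Archive Agent-Logs-Url trailers from recent agent commits so the
# session links outlive whatever GitHub's log retention turns out to be.
# Assumes agent commits carry "Copilot" as the author name.
git log --since='30 days ago' --author='Copilot' \
    --format='%H,%aI,%(trailers:key=Agent-Logs-Url,valueonly,separator=%x20)' \
  | awk -F',' '$3 != ""' >> agent-session-archive.csv
```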
None of these are dealbreakers. The features shipped in March give you enough to build a workable audit practice today. Just don't mistake the existence of the tools for having the practices in place. The configs in the run summary and the commit trailers are raw material. The governance framework is what your team builds on top of them.