TracePact: Catch AI agent tool-call regressions before production

Published: March 8, 2026 at 04:03 AM EDT

Source: Dev.to

What TracePact does

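TracePact is a behavioral testing framework for AI agents. It works at the tool‑call level, not the text level: tests assert on which tools an agent called, in what order, and with what arguments, rather than on the wording of its output.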

1. Write behavior contracts

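// `expect` comes from Vitest; this assumes TracePact's matchers are registered in your Vitest setup
import { expect } from 'vitest';
import { TraceBuilder } from '@tracepact/vitest';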

const trace = new TraceBuilder()
  .addCall('read_file', { path: 'src/service.ts' }, '...')
  .addCall('write_file', { path: 'src/service.ts', content: '...' })
  .addCall('run_tests', {}, 'PASS')
  .build();

// Did it read before writing?
expect(trace).toHaveCalledToolsInOrder([
  'read_file',
  'write_file',
  'run_tests'
]);

// Did it avoid shell?
expect(trace).toNotHaveCalledTool('bash');

No API calls. No tokens. Runs in milliseconds.
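
To make the idea concrete, here is a minimal sketch of what an ordering check and a "never called" check could look like on a plain array of tool calls. The `ContractCall` shape and both helper names are hypothetical, and the sketch assumes other calls may be interleaved between the expected ones; it is not TracePact's implementation.

```typescript
// Hypothetical trace shape for illustration; not TracePact's internal format.
interface ContractCall {
  name: string;
  args: Record<string, unknown>;
  result?: string;
}

// True when `expected` appears in `calls` as an ordered subsequence
// (other calls may sit between the expected ones).
function calledInOrder(calls: ContractCall[], expected: string[]): boolean {
  let next = 0;
  for (const call of calls) {
    if (next < expected.length && call.name === expected[next]) {
      next += 1;
    }
  }
  return next === expected.length;
}

// True when no call in the trace used the given tool.
function neverCalled(calls: ContractCall[], tool: string): boolean {
  return calls.every((call) => call.name !== tool);
}

const contractCalls: ContractCall[] = [
  { name: 'read_file', args: { path: 'src/service.ts' } },
  { name: 'write_file', args: { path: 'src/service.ts' } },
  { name: 'run_tests', args: {}, result: 'PASS' },
];

const orderedOk = calledInOrder(contractCalls, ['read_file', 'write_file', 'run_tests']);
const noShell = neverCalled(contractCalls, 'bash');
```

Both checks are pure functions over recorded data, which is why this style of test needs no model call at all.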

2. Record & replay

# Record a baseline (one‑time, live)
npx tracepact run --live --record

# Replay without API calls (instant, deterministic)
npx tracepact run --replay ./cassettes
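
The general record/replay pattern behind commands like these can be sketched in a few lines. Everything below is an assumption for illustration: the `Cassette` shape, the `record` and `replay` helpers, and the synchronous `fakeModel` stand-in are hypothetical and do not describe TracePact's cassette format (real model calls would also be asynchronous).

```typescript
// Generic record/replay sketch. All names are hypothetical.
type ModelCall = (prompt: string) => string;

interface Cassette {
  entries: { prompt: string; response: string }[];
}

// Wrap a live call so each request/response pair is saved to the cassette.
function record(live: ModelCall, cassette: Cassette): ModelCall {
  return (prompt) => {
    const response = live(prompt);
    cassette.entries.push({ prompt, response });
    return response;
  };
}

// Serve responses from the cassette in recorded order, never calling the live API.
function replay(cassette: Cassette): ModelCall {
  let cursor = 0;
  return (prompt) => {
    const entry = cassette.entries[cursor];
    if (entry === undefined || entry.prompt !== prompt) {
      throw new Error('Replay diverged from cassette at call ' + (cursor + 1));
    }
    cursor += 1;
    return entry.response;
  };
}

// Stand-in for a paid, nondeterministic model API.
let liveCalls = 0;
const fakeModel: ModelCall = (prompt) => {
  liveCalls += 1;
  return 'response to: ' + prompt;
};

const cassette: Cassette = { entries: [] };
const recorded = record(fakeModel, cassette);
recorded('plan the edit');
recorded('apply the edit');

const replayed = replay(cassette);
const firstReplay = replayed('plan the edit'); // served from the cassette
```

Throwing on divergence, rather than falling back to the live call, keeps replay deterministic: a changed prompt shows up as a failing test instead of a silent API request.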

3. Diff runs to catch drift

npx tracepact diff baseline.json latest.json --fail-on warn

Sample output

3 changes detected:
- read_file (seq 1) (removed)
+ write_file (seq 3) (added)
~ bash.cmd: "npm test" -> "npm run build"

Summary: 1 removed, 1 added, 1 arg changed
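
For intuition, a naive trace diff that produces the same three kinds of change can be sketched as follows. The pairing rule (match each baseline call to the first unpaired call to the same tool) and all names, including the `list_dir` tool, are assumptions made for this sketch; TracePact's actual algorithm may differ.

```typescript
// Hypothetical shapes for a naive trace diff.
interface DiffCall {
  tool: string;
  args: Record<string, unknown>;
}

type TraceChange =
  | { kind: 'removed' | 'added'; tool: string; seq: number }
  | { kind: 'arg_changed'; tool: string; key: string; from: unknown; to: unknown };

function diffTraces(baseline: DiffCall[], latest: DiffCall[]): TraceChange[] {
  const changes: TraceChange[] = [];
  const paired = latest.map(() => false);

  baseline.forEach((call, i) => {
    // Pair with the first not-yet-paired call to the same tool in `latest`.
    let other: DiffCall | null = null;
    for (let j = 0; j < latest.length && other === null; j++) {
      const candidate = latest[j];
      if (candidate !== undefined && !paired[j] && candidate.tool === call.tool) {
        other = candidate;
        paired[j] = true;
      }
    }
    if (other === null) {
      changes.push({ kind: 'removed', tool: call.tool, seq: i + 1 });
      return;
    }
    // Compare arguments key by key across both calls.
    const matched: DiffCall = other;
    const keys = Object.keys(call.args).concat(Object.keys(matched.args));
    keys
      .filter((key, idx) => keys.indexOf(key) === idx) // de-duplicate keys
      .forEach((key) => {
        const from = call.args[key];
        const to = matched.args[key];
        if (JSON.stringify(from) !== JSON.stringify(to)) {
          changes.push({ kind: 'arg_changed', tool: call.tool, key, from, to });
        }
      });
  });

  // Anything left unpaired in `latest` is a newly added call.
  latest.forEach((call, j) => {
    if (!paired[j]) changes.push({ kind: 'added', tool: call.tool, seq: j + 1 });
  });
  return changes;
}

const baselineRun: DiffCall[] = [
  { tool: 'read_file', args: { path: 'src/service.ts' } },
  { tool: 'list_dir', args: { path: 'src' } },
  { tool: 'bash', args: { cmd: 'npm test' } },
];
const latestRun: DiffCall[] = [
  { tool: 'list_dir', args: { path: 'src' } },
  { tool: 'bash', args: { cmd: 'npm run build' } },
  { tool: 'write_file', args: { path: 'src/service.ts' } },
];

const traceChanges = diffTraces(baselineRun, latestRun);
```

With these two hypothetical runs, the result mirrors the sample output: `read_file` removed at seq 1, `bash.cmd` changed, and `write_file` added at seq 3.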

Filter noisy args and irrelevant tools

npx tracepact diff baseline.json latest.json \
  --ignore-keys timestamp,requestId \
  --ignore-tools read_file
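
One way to picture these filters is as a normalization pass applied to both traces before they are compared. The sketch below assumes ignored keys are matched at the top level of each call's arguments only; the `applyIgnores` helper and the `FilterableCall` shape are hypothetical, not TracePact's code.

```typescript
// Hypothetical call shape for the normalization sketch.
interface FilterableCall {
  tool: string;
  args: Record<string, unknown>;
}

// Drop ignored tools entirely and strip ignored keys from the remaining calls.
function applyIgnores(
  calls: FilterableCall[],
  ignoreKeys: string[],
  ignoreTools: string[],
): FilterableCall[] {
  return calls
    .filter((call) => ignoreTools.indexOf(call.tool) === -1)
    .map((call) => {
      const args: Record<string, unknown> = {};
      Object.keys(call.args).forEach((key) => {
        if (ignoreKeys.indexOf(key) === -1) args[key] = call.args[key];
      });
      return { tool: call.tool, args };
    });
}

// Two runs that differ only in noisy fields and an extra read.
const runA: FilterableCall[] = [
  { tool: 'read_file', args: { path: 'a.ts', requestId: 'r1' } },
  { tool: 'write_file', args: { path: 'a.ts', timestamp: 1, requestId: 'r2' } },
];
const runB: FilterableCall[] = [
  { tool: 'write_file', args: { path: 'a.ts', timestamp: 2, requestId: 'r9' } },
];

const ignoredKeys = ['timestamp', 'requestId'];
const ignoredTools = ['read_file'];
const normalizedA = applyIgnores(runA, ignoredKeys, ignoredTools);
const normalizedB = applyIgnores(runB, ignoredKeys, ignoredTools);
```

After normalization both runs reduce to a single `write_file` call with only `path` left, so a diff between them would report nothing.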

Severity levels

none – identical
warn – arguments changed
block – tools added/removed

Use --fail-on with one of these levels in CI to gate deployments, for example --fail-on warn as in the diff command above.
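
The mapping from changes to a CI result can be sketched like this. The classification follows the levels listed above; the threshold semantics (fail when the severity is at or above the `--fail-on` level) and every name in the block are assumptions, not confirmed TracePact behavior.

```typescript
type Severity = 'none' | 'warn' | 'block';
type ChangeKind = 'added' | 'removed' | 'arg_changed';

const SEVERITY_RANK: Record<Severity, number> = { none: 0, warn: 1, block: 2 };

// Tools added or removed -> block; only argument changes -> warn; nothing -> none.
function classify(kinds: ChangeKind[]): Severity {
  if (kinds.some((kind) => kind === 'added' || kind === 'removed')) return 'block';
  if (kinds.length > 0) return 'warn';
  return 'none';
}

// Assumed semantics: exit non-zero when severity reaches the fail-on threshold.
function exitCode(severity: Severity, failOn: 'warn' | 'block'): number {
  return SEVERITY_RANK[severity] >= SEVERITY_RANK[failOn] ? 1 : 0;
}

const argsOnly = classify(['arg_changed']);
const toolDrift = classify(['arg_changed', 'removed']);
```

Under these assumptions, a run with only argument changes fails a `warn` gate but passes a `block` gate, which lets a team start strict on tool drift and tighten later.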

Good fit

Coding agents – read before write, run tests before finishing, never edit restricted files
Ops agents – inspect before restarting, check evidence before acting
Workflow agents – validate before mutation, avoid duplicate side effects
Internal assistants – use the correct system for the correct task
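
Two of the coding-agent rules above ("read before write" and "never edit restricted files") can be expressed as small predicates over a trace. This is a sketch under stated assumptions: the `AgentCall` shape, the helper names, and the `secrets/` prefix are hypothetical and not part of TracePact's API.

```typescript
// Hypothetical call shape: only the `path` argument matters for these rules.
interface AgentCall {
  tool: string;
  args: { path?: string };
}

// Every write_file must be preceded by a read_file of the same path.
function readsBeforeWrites(calls: AgentCall[]): boolean {
  const seen: string[] = [];
  return calls.every((call) => {
    const path = call.args.path;
    if (path === undefined) return true;
    if (call.tool === 'read_file') seen.push(path);
    if (call.tool === 'write_file') return seen.indexOf(path) !== -1;
    return true;
  });
}

// No write_file may target a path that starts with a restricted prefix.
function avoidsRestricted(calls: AgentCall[], restrictedPrefixes: string[]): boolean {
  return calls.every((call) => {
    const path = call.args.path;
    if (call.tool !== 'write_file' || path === undefined) return true;
    return restrictedPrefixes.every((prefix) => path.indexOf(prefix) !== 0);
  });
}

const goodRun: AgentCall[] = [
  { tool: 'read_file', args: { path: 'src/service.ts' } },
  { tool: 'write_file', args: { path: 'src/service.ts' } },
  { tool: 'run_tests', args: {} },
];
const badRun: AgentCall[] = [
  { tool: 'write_file', args: { path: 'secrets/prod.env' } },
];
```

Here `goodRun` satisfies both rules, while `badRun` violates both: it writes without reading first and touches a restricted path.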

Less useful for

Pure chatbots, style evaluation, creative tasks, or systems where only text output matters. TracePact is for behavioral guarantees, not response quality.

MCP server for IDEs

TracePact ships an MCP server that works with Claude Code, Cursor, and Windsurf:

{
  "mcpServers": {
    "tracepact": {
      "command": "npx",
      "args": ["@tracepact/mcp-server"]
    }
  }
}

Available tools: tracepact_audit, tracepact_run, tracepact_capture, tracepact_replay, tracepact_diff, tracepact_list_tests.

Get started

npm install @tracepact/core @tracepact/vitest @tracepact/cli
npx tracepact init
npx tracepact

GitHub:

We built this because prompt or model changes silently break agent behavior while the output still looks fine. If you’re testing AI agents, we’d love to hear how you’re handling tool‑call regressions today.
