Overview

AGENT CONTROL PLANE

A local-first developer tool for recording, replaying, and testing AI agent behavior.

What This Alpha Version Proves

01 Agent behavior can be recorded as a deterministic trace: every step is captured with inputs, outputs, and state
02 That trace can be replayed exactly: deterministic replay from recorded data
03 Developers can inspect and test agent behavior, not just outputs: full step inspection and behavioral testing
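
To make "replay from recorded data" concrete, here is a minimal sketch of the idea, not the project's actual replay engine. The `ReplaySource` class and its `next` method are hypothetical names, and the "tool" step type is an assumption (only "llm" appears in the sample trace). Field names follow the trace format documented in this README.

```typescript
// Hypothetical step shape; field names follow the documented trace format.
interface RecordedStep {
  stepNumber: number;
  stepType: string;
  output: unknown;
}

// Serves recorded outputs in order instead of calling a live LLM or tool.
// Because nothing is recomputed, two replays of the same trace are identical.
class ReplaySource {
  private cursor = 0;
  private readonly steps: RecordedStep[];

  constructor(steps: RecordedStep[]) {
    this.steps = steps;
  }

  next(expectedType: string): unknown {
    const step = this.steps[this.cursor];
    if (step === undefined) {
      throw new Error("Replay exhausted: the trace has no more recorded steps");
    }
    if (step.stepType !== expectedType) {
      throw new Error(
        `Replay diverged at step ${step.stepNumber}: trace has "${step.stepType}", agent requested "${expectedType}"`
      );
    }
    this.cursor += 1;
    return step.output;
  }
}

const recorded: RecordedStep[] = [
  { stepNumber: 1, stepType: "llm", output: { action: "search" } },
  { stepNumber: 2, stepType: "tool", output: { results: ["Trattoria"] } }, // "tool" is an assumed step type
];

const replay = new ReplaySource(recorded);
const firstOutput = replay.next("llm");   // recorded LLM output, no model call
const secondOutput = replay.next("tool"); // recorded tool output, no network call
```

Failing loudly on a step-type mismatch is a deliberate choice in this sketch: a replay that silently continues after diverging from the trace would no longer be a faithful reproduction.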

Getting Started

INSTALLATION

npm install

Usage

QUICK START

1. Run the Agent

# Run the restaurant booking agent

npm start

# Run with a broken scenario (for testing failure detection)

npm start -- --broken

2. Inspect the Trace

# Interactive inspection

npm run inspect traces/<trace-file>.json

# View specific step

npm run inspect traces/<trace-file>.json 3

3. Run Behavioral Tests

# Run built-in tests

npm run test traces/<trace-file>.json

# Run custom tests from YAML

npm run test traces/<trace-file>.json tests/basic.yaml

4. Analyze for Issues

npm run analyze traces/<trace-file>.json

5. Replay the Trace

npm run replay traces/<trace-file>.json

Structure

ARCHITECTURE

Agent-Control-Plane/
├── src/
│   ├── core/                 # Core components
│   │   ├── types.ts          # Type definitions
│   │   ├── trace-recorder.ts # Trace Recorder
│   │   ├── agent-runtime.ts  # Agent Runtime
│   │   ├── replay-engine.ts  # Deterministic Replay
│   │   ├── step-inspector.ts # Step Inspector
│   │   ├── test-engine.ts    # Behavioral Test Engine
│   │   └── analyzer.ts       # Memory & Step Analysis
│   │
│   ├── cli/                  # CLI tools
│   │   ├── inspect.ts        # Step inspector CLI
│   │   ├── test-runner.ts    # Test runner CLI
│   │   └── analyze.ts        # Analyzer CLI
│   │
│   └── agent/                # Agent implementation
│       ├── llm-provider.ts   # LLM provider
│       ├── tools.ts          # Agent tools
│       ├── run.ts            # Agent runner
│       └── replay.ts         # Trace replay
│
├── vscode-extension/         # VS Code Extension
│   └── src/extension.ts
│
├── tests/                    # Test definitions
│   ├── basic.yaml
│   └── broken-agent.yaml
│
└── traces/                   # Generated trace files

Data Structure

TRACE FORMAT

Each trace is a JSON file containing:

{
  "traceId": "trace_1234567890_abc",
  "agentId": "restaurant-booking-agent",
  "taskId": "task_1234567890",
  "startTime": "2024-01-15T10:00:00.000Z",
  "endTime": "2024-01-15T10:00:05.000Z",
  "status": "completed",
  "steps": [
    {
      "stepNumber": 1,
      "stepType": "llm",
      "timestamp": "2024-01-15T10:00:01.000Z",
      "input": { "prompt": "..." },
      "output": { "response": "...", "action": "search" },
      "stateSnapshot": { "currentStep": 1, "memory": {} },
      "duration": 150
    }
  ],
  "metadata": {
    "agentVersion": "1.0.0",
    "toolsUsed": ["search_restaurants", "book_restaurant"],
    "totalLLMCalls": 4,
    "totalToolCalls": 3
  }
}
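
The JSON above could be typed roughly as follows. This is a sketch inferred from the sample only; the real definitions live in `src/core/types.ts` and may differ. The `parseTrace` helper is hypothetical, and the unit of `duration` is assumed to be milliseconds.

```typescript
// Types inferred from the sample trace; the actual definitions may differ.
interface TraceStep {
  stepNumber: number;
  stepType: string;   // only "llm" appears in the sample
  timestamp: string;  // ISO 8601
  input: Record<string, unknown>;
  output: Record<string, unknown>;
  stateSnapshot: { currentStep: number; memory: Record<string, unknown> };
  duration: number;   // unit not stated in the sample; assumed milliseconds
}

interface Trace {
  traceId: string;
  agentId: string;
  taskId: string;
  startTime: string;
  endTime: string;
  status: string;
  steps: TraceStep[];
  metadata: {
    agentVersion: string;
    toolsUsed: string[];
    totalLLMCalls: number;
    totalToolCalls: number;
  };
}

// Hypothetical helper: parses a trace file's contents with a shallow sanity
// check. It verifies only traceId and steps, not every field.
function parseTrace(json: string): Trace {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Not a trace: expected a JSON object");
  }
  const candidate = data as Partial<Trace>;
  if (typeof candidate.traceId !== "string" || !Array.isArray(candidate.steps)) {
    throw new Error("Not a trace: missing traceId or steps");
  }
  return candidate as Trace;
}

const parsed = parseTrace('{"traceId":"trace_1234567890_abc","steps":[]}');
```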

Testing

TEST FORMAT

Behavioral tests are defined in YAML:

tests:
  - name: "Tool Should Be Called"
    assertions:
      - type: tool_called
        params:
          tool: search_restaurants
          minTimes: 1

  - name: "Step Limit"
    assertions:
      - type: max_steps
        params:
          count: 10
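
As an illustration of how the two assertions above might be evaluated against a trace, here is a hedged sketch. It is not the project's test engine. It assumes tool steps have `stepType: "tool"` and carry the tool name in `input.tool`, neither of which is confirmed by the sample trace, and the `check` function is a hypothetical name.

```typescript
interface Step {
  stepType: string;
  input: Record<string, unknown>;
}

interface Assertion {
  type: string;
  params: Record<string, unknown>;
}

// Evaluates one assertion against the steps of a trace.
function check(steps: Step[], assertion: Assertion): boolean {
  switch (assertion.type) {
    case "tool_called": {
      // Assumption: a tool call is a step with stepType "tool" and input.tool set.
      const calls = steps.filter(
        (s) => s.stepType === "tool" && s.input["tool"] === assertion.params["tool"]
      ).length;
      const rawMin = assertion.params["minTimes"];
      const minTimes = typeof rawMin === "number" ? rawMin : 1; // default is an assumption
      return calls >= minTimes;
    }
    case "max_steps":
      return steps.length <= Number(assertion.params["count"]);
    default:
      throw new Error(`Unsupported assertion type: ${assertion.type}`);
  }
}

const exampleSteps: Step[] = [
  { stepType: "llm", input: { prompt: "..." } },
  { stepType: "tool", input: { tool: "search_restaurants" } },
  { stepType: "tool", input: { tool: "book_restaurant" } },
];

const toolWasCalled = check(exampleSteps, {
  type: "tool_called",
  params: { tool: "search_restaurants", minTimes: 1 },
});
const withinStepLimit = check(exampleSteps, { type: "max_steps", params: { count: 10 } });
```

Both checks look at what the agent did (which tools ran, how many steps it took) rather than at the exact text it produced, which is what keeps such tests stable across model output variations.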

Reference

AVAILABLE ASSERTIONS

Type	Description	Params
`tool_called`	Verify a tool was called	`tool`, `minTimes`
`tool_not_called`	Verify a tool was NOT called	`tool`
`max_steps`	Maximum step count	`count`
`min_steps`	Minimum step count	`count`
`state_contains`	Final state contains value	`key`, `value`
`state_not_contains`	Final state doesn't contain value	`key`, `value`
`step_type_count`	Count of step type	`stepType`, `count`, `operator`
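
The `step_type_count` and `state_contains` rows could be evaluated along these lines. This is a sketch under stated assumptions: the accepted `operator` values are not documented here, so the `"=="`, `">="`, and `"<="` set below is hypothetical, as are the function names and the choice to compare values by their JSON serialization.

```typescript
interface CountedStep {
  stepType: string;
}

// Hypothetical operator set; the values the tool actually accepts are not documented here.
type Operator = "==" | ">=" | "<=";

function compare(actual: number, operator: Operator, expected: number): boolean {
  switch (operator) {
    case "==":
      return actual === expected;
    case ">=":
      return actual >= expected;
    case "<=":
      return actual <= expected;
    default:
      throw new Error(`Unknown operator: ${String(operator)}`);
  }
}

// step_type_count: compare the number of steps of one type against a count.
function stepTypeCount(
  steps: CountedStep[],
  stepType: string,
  count: number,
  operator: Operator
): boolean {
  const actual = steps.filter((s) => s.stepType === stepType).length;
  return compare(actual, operator, count);
}

// state_contains: the final state holds `value` under `key`.
// Comparing serialized JSON is a simplification; it is sensitive to key order.
function stateContains(
  finalState: Record<string, unknown>,
  key: string,
  value: unknown
): boolean {
  return key in finalState && JSON.stringify(finalState[key]) === JSON.stringify(value);
}

const countedSteps: CountedStep[] = [
  { stepType: "llm" },
  { stepType: "tool" },
  { stepType: "llm" },
];
const atLeastTwoLlmSteps = stepTypeCount(countedSteps, "llm", 2, ">=");
const bookingConfirmed = stateContains({ booked: true }, "booked", true);
```

`state_not_contains` would simply be the negation of `stateContains` in this sketch.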

Debugging

ANALYSIS WARNINGS

The analyzer automatically scans your trace files to detect common issues and anti-patterns in agent behavior. These warnings help identify performance bottlenecks, logic errors, and resource inefficiencies before they become problems in production.

high_step_count

Too many steps executed

Triggers when an agent approaches or exceeds the expected step limit. A "warning" level indicates the agent is approaching the limit, while "critical" means it is significantly over the threshold. This often signals infinite loops, poor decision-making, or a task that should be broken into smaller subtasks.
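
The two severity levels can be sketched as a simple ratio check. The ratios below (warn at 80% of the limit, critical at 150%) are hypothetical values chosen for illustration, not the analyzer's actual thresholds, and `stepCountSeverity` is a hypothetical name.

```typescript
type Severity = "ok" | "warning" | "critical";

// Classifies a step count against a limit.
// warnRatio and criticalRatio are illustrative defaults, not the tool's real values.
function stepCountSeverity(
  stepCount: number,
  limit: number,
  warnRatio = 0.8,
  criticalRatio = 1.5
): Severity {
  if (stepCount >= limit * criticalRatio) return "critical";
  if (stepCount >= limit * warnRatio) return "warning";
  return "ok";
}

const calm = stepCountSeverity(5, 10);     // well under the limit
const nearing = stepCountSeverity(9, 10);  // approaching the limit
const runaway = stepCountSeverity(15, 10); // significantly over the limit
```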

memory_growth

Memory growing without cleanup

Detects when the agent's memory footprint continuously increases without being pruned or cleared. This can lead to context window overflow, increased latency, and degraded performance over long-running tasks. Consider implementing memory summarization or cleanup strategies.
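
One way to detect this pattern from a trace is to track the size of `stateSnapshot.memory` across steps. The sketch below is an assumption about how such a check could work, not the analyzer's implementation: it measures size as serialized JSON length and flags a trace whose memory grew at least `minGrowthSteps` times without ever shrinking. Both the measure and the default of 3 are hypothetical.

```typescript
interface SnapshotStep {
  stepNumber: number;
  stateSnapshot: { memory: Record<string, unknown> };
}

// Approximates memory size as the length of its JSON serialization.
function memorySizes(steps: SnapshotStep[]): number[] {
  return steps.map((s) => JSON.stringify(s.stateSnapshot.memory).length);
}

// True when memory grew at least `minGrowthSteps` times and was never pruned.
function growsWithoutCleanup(steps: SnapshotStep[], minGrowthSteps = 3): boolean {
  const sizes = memorySizes(steps);
  let previous = sizes[0] ?? 0;
  let growthSteps = 0;
  for (const size of sizes.slice(1)) {
    if (size < previous) return false; // memory shrank, so some cleanup happened
    if (size > previous) growthSteps += 1;
    previous = size;
  }
  return growthSteps >= minGrowthSteps;
}

const growingTrace: SnapshotStep[] = [
  { stepNumber: 1, stateSnapshot: { memory: {} } },
  { stepNumber: 2, stateSnapshot: { memory: { a: 1 } } },
  { stepNumber: 3, stateSnapshot: { memory: { a: 1, b: 2 } } },
  { stepNumber: 4, stateSnapshot: { memory: { a: 1, b: 2, c: 3 } } },
];
const flagged = growsWithoutCleanup(growingTrace);
```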

repeated_tool_calls

Same tool called with identical parameters

Identifies when an agent calls the same tool with the same arguments multiple times. This usually indicates the agent isn't properly processing or storing tool results, leading to wasted API calls and increased costs. Implement result caching or improve state management.
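
A minimal sketch of this detection groups tool steps by their serialized input and reports any group with more than one member. It assumes tool steps have `stepType: "tool"` and that `input` holds the tool name and arguments, which the sample trace does not confirm. `findRepeatedToolCalls` is a hypothetical name.

```typescript
interface ToolStep {
  stepNumber: number;
  stepType: string;
  input: Record<string, unknown>;
}

// Maps each repeated call signature to the step numbers where it occurred.
// JSON.stringify is key-order sensitive, so inputs with the same content but
// different key order are treated as distinct in this sketch.
function findRepeatedToolCalls(steps: ToolStep[]): Record<string, number[]> {
  const seen: Record<string, number[]> = {};
  for (const step of steps) {
    if (step.stepType !== "tool") continue; // assumed step type for tool calls
    const signature = JSON.stringify(step.input);
    const hits = seen[signature] ?? [];
    hits.push(step.stepNumber);
    seen[signature] = hits;
  }
  const repeated: Record<string, number[]> = {};
  for (const signature of Object.keys(seen)) {
    const stepNumbers = seen[signature] ?? [];
    if (stepNumbers.length > 1) repeated[signature] = stepNumbers;
  }
  return repeated;
}

const toolSteps: ToolStep[] = [
  { stepNumber: 1, stepType: "tool", input: { tool: "search_restaurants", city: "Rome" } },
  { stepNumber: 2, stepType: "llm", input: { prompt: "..." } },
  { stepNumber: 3, stepType: "tool", input: { tool: "search_restaurants", city: "Rome" } },
  { stepNumber: 4, stepType: "tool", input: { tool: "book_restaurant", id: 7 } },
];
const repeatedCalls = findRepeatedToolCalls(toolSteps);
```

In this example only the duplicated `search_restaurants` call is reported, at steps 1 and 3.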

unused_memory

Memory stored but never accessed

Flags memory entries that were written but never read during the trace execution. This suggests inefficient memory usage: the agent is storing information it doesn't need. Review what data is being persisted and whether decision-making requires it.

long_duration

Steps taking too long to execute

Alerts when individual steps exceed expected duration thresholds. Long-running steps may indicate expensive API calls, network issues, or computationally intensive operations. Consider adding timeouts, optimizing prompts, or parallelizing independent operations.
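
Using the `duration` field from the trace format, a check of this kind might look like the sketch below. The per-type thresholds, the fallback value, and the assumption that `duration` is in milliseconds are all hypothetical; `findSlowSteps` is not a function from the project.

```typescript
interface TimedStep {
  stepNumber: number;
  stepType: string;
  duration: number; // assumed to be milliseconds
}

// Returns the steps whose duration exceeds the threshold for their type.
// `fallbackMs` applies to step types without an explicit threshold.
function findSlowSteps(
  steps: TimedStep[],
  thresholdsMs: Record<string, number>,
  fallbackMs: number
): TimedStep[] {
  return steps.filter((s) => s.duration > (thresholdsMs[s.stepType] ?? fallbackMs));
}

const timedSteps: TimedStep[] = [
  { stepNumber: 1, stepType: "llm", duration: 150 },
  { stepNumber: 2, stepType: "tool", duration: 4200 },
  { stepNumber: 3, stepType: "llm", duration: 9000 },
];

// Illustrative thresholds only.
const slowSteps = findSlowSteps(timedSteps, { llm: 5000, tool: 2000 }, 3000);
const slowStepNumbers = slowSteps.map((s) => s.stepNumber);
```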

error_rate

High percentage of error steps

Triggers when a significant portion of steps result in errors. High error rates suggest problems with tool configurations, invalid inputs, or unreliable external services. Review error patterns to identify root causes and implement better error handling or retry logic.
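
The rate itself is a simple ratio. The sketch below assumes error steps are marked with `stepType: "error"`, which the sample trace does not confirm, and uses a hypothetical 20% threshold; `errorRate` and `hasHighErrorRate` are illustrative names.

```typescript
interface TypedStep {
  stepType: string;
}

// Fraction of steps that are errors; an empty trace has a rate of 0.
function errorRate(steps: TypedStep[]): number {
  if (steps.length === 0) return 0;
  const errors = steps.filter((s) => s.stepType === "error").length; // assumed marker
  return errors / steps.length;
}

// The 0.2 default is illustrative, not the analyzer's real threshold.
function hasHighErrorRate(steps: TypedStep[], threshold = 0.2): boolean {
  return errorRate(steps) > threshold;
}

const mixedSteps: TypedStep[] = [
  { stepType: "llm" },
  { stepType: "error" },
  { stepType: "tool" },
  { stepType: "error" },
];
const rate = errorRate(mixedSteps);        // 2 of 4 steps are errors
const tooMany = hasHighErrorRate(mixedSteps);
```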

IDE Integration

VS CODE EXTENSION

The VS Code extension provides:

→ Traces View: list all traces in the workspace
→ Steps View: browse the steps of the current trace
→ Trace Inspector Panel: visual step-by-step inspection

Commands

Command	Description
ACP: Open Trace File	Load a trace for inspection
ACP: Show Trace Inspector	Open the inspector panel
ACP: Analyze Current Trace	Run analysis on the loaded trace
ACP: Run Agent	Run the agent in the terminal

Validation

SUCCESS METRICS

Metric	Status
Same trace produces same replay behavior	Yes
Same final state after replay	Yes
Any step can be inspected	Yes
Inputs, outputs, state visible	Yes
Behavioral tests exist	Yes
Tests catch logic regressions	Yes
Tests don't depend on exact text	Yes
Tool highlights inefficiencies	Yes
Broken agent scenario exists	Yes
Tool explains why it broke	Yes

DEFINITION OF DONE

✓ You can run an agent
✓ Generate a trace
✓ Replay it deterministically
✓ Inspect steps
✓ Run regression tests
✓ Explain failures using the trace