Overview

AGENT CONTROL PLANE

A local-first developer tool for recording, replaying, and testing AI agent behavior.

What This Alpha Version Proves

  • Agent behavior can be recorded as a deterministic trace: every step is captured with its inputs, outputs, and state.
  • That trace can be replayed exactly: replay is driven entirely by the recorded data.
  • Developers can inspect and test agent behavior, not just outputs: every step can be inspected, and behavioral tests check what the agent did.

Getting Started

INSTALLATION

npm install

Usage

QUICK START

1. Run the Agent

# Run the restaurant booking agent
npm start
# Run with a broken scenario (for testing failure detection)
npm start -- --broken

2. Inspect the Trace

# Interactive inspection
npm run inspect traces/<trace-file>.json
# View a specific step (here, step 3)
npm run inspect traces/<trace-file>.json 3

3. Run Behavioral Tests

# Run built-in tests
npm run test traces/<trace-file>.json
# Run custom tests from YAML
npm run test traces/<trace-file>.json tests/basic.yaml

4. Analyze for Issues

npm run analyze traces/<trace-file>.json

5. Replay the Trace

npm run replay traces/<trace-file>.json
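
Replay can be exact because nothing is regenerated: instead of calling the LLM or tools again, the recorded outputs are served back in order. The TypeScript below is a minimal sketch of that idea, not the code in replay-engine.ts. ReplaySource and next are hypothetical names. The sketch also raises an error when the agent requests a step whose type or input differs from the recording, which is one way to surface divergence.

```typescript
// Subset of a recorded step from the trace format (see Data Structure).
interface RecordedStep {
  stepNumber: number;
  stepType: string;
  input: Record<string, unknown>;
  output: Record<string, unknown>;
}

// Hypothetical replay source: returns recorded outputs instead of making live calls.
class ReplaySource {
  private cursor = 0;

  constructor(private readonly steps: RecordedStep[]) {}

  // Serve the next recorded output. The request must match the recording;
  // comparing with JSON.stringify assumes inputs serialize with a stable key order.
  next(stepType: string, input: Record<string, unknown>): Record<string, unknown> {
    const recorded = this.steps[this.cursor];
    if (recorded === undefined) {
      throw new Error(`Replay diverged: trace has no step ${this.cursor + 1}`);
    }
    if (recorded.stepType !== stepType || JSON.stringify(recorded.input) !== JSON.stringify(input)) {
      throw new Error(`Replay diverged at step ${recorded.stepNumber}`);
    }
    this.cursor += 1;
    return recorded.output;
  }
}

// Usage: replay a one-step trace.
const source = new ReplaySource([
  { stepNumber: 1, stepType: "llm", input: { prompt: "find thai food" }, output: { action: "search" } },
]);
console.log(source.next("llm", { prompt: "find thai food" }).action); // prints: search
```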

Structure

ARCHITECTURE

Agent-Control-Plane/
├── src/
│   ├── core/                 # Core components
│   │   ├── types.ts          # Type definitions
│   │   ├── trace-recorder.ts # Trace Recorder
│   │   ├── agent-runtime.ts  # Agent Runtime
│   │   ├── replay-engine.ts  # Deterministic Replay
│   │   ├── step-inspector.ts # Step Inspector
│   │   ├── test-engine.ts    # Behavioral Test Engine
│   │   └── analyzer.ts       # Memory & Step Analysis
│   │
│   ├── cli/                  # CLI tools
│   │   ├── inspect.ts        # Step inspector CLI
│   │   ├── test-runner.ts    # Test runner CLI
│   │   └── analyze.ts        # Analyzer CLI
│   │
│   └── agent/                # Agent implementation
│       ├── llm-provider.ts   # LLM provider
│       ├── tools.ts          # Agent tools
│       ├── run.ts            # Agent runner
│       └── replay.ts         # Trace replay
│
├── vscode-extension/         # VS Code Extension
│   └── src/extension.ts
│
├── tests/                    # Test definitions
│   ├── basic.yaml
│   └── broken-agent.yaml
│
└── traces/                   # Generated trace files

Data Structure

TRACE FORMAT

Each trace is a JSON file like the following (abridged: only the first step is shown):

{
  "traceId": "trace_1234567890_abc",
  "agentId": "restaurant-booking-agent",
  "taskId": "task_1234567890",
  "startTime": "2024-01-15T10:00:00.000Z",
  "endTime": "2024-01-15T10:00:05.000Z",
  "status": "completed",
  "steps": [
    {
      "stepNumber": 1,
      "stepType": "llm",
      "timestamp": "2024-01-15T10:00:01.000Z",
      "input": { "prompt": "..." },
      "output": { "response": "...", "action": "search" },
      "stateSnapshot": { "currentStep": 1, "memory": {} },
      "duration": 150
    }
  ],
  "metadata": {
    "agentVersion": "1.0.0",
    "toolsUsed": ["search_restaurants", "book_restaurant"],
    "totalLLMCalls": 4,
    "totalToolCalls": 3
  }
}
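
These fields map directly onto TypeScript interfaces. The sketch below mirrors the example and adds a small recorder that numbers steps, times them, and deep-copies the state so later mutations cannot rewrite earlier snapshots. It is illustrative only and may differ from src/core/types.ts and trace-recorder.ts. MiniRecorder is a hypothetical name, and treating timestamp as the step start time and duration as milliseconds are assumptions.

```typescript
// One entry in "steps", mirroring the example trace.
interface TraceStep {
  stepNumber: number;
  stepType: string; // "llm" in the example; other values are not shown there
  timestamp: string; // ISO 8601 (assumed: when the step started)
  input: Record<string, unknown>;
  output: Record<string, unknown>;
  stateSnapshot: Record<string, unknown>;
  duration: number; // assumed milliseconds
}

// The whole trace file.
interface AgentTrace {
  traceId: string;
  agentId: string;
  taskId: string;
  startTime: string;
  endTime: string;
  status: string; // "completed" in the example
  steps: TraceStep[];
  metadata: {
    agentVersion: string;
    toolsUsed: string[];
    totalLLMCalls: number;
    totalToolCalls: number;
  };
}

// Hypothetical recorder: appends numbered, timed steps with frozen state copies.
class MiniRecorder {
  private steps: TraceStep[] = [];

  record(
    stepType: string,
    input: Record<string, unknown>,
    output: Record<string, unknown>,
    state: Record<string, unknown>,
    startedAt: Date,
    finishedAt: Date
  ): TraceStep {
    const step: TraceStep = {
      stepNumber: this.steps.length + 1,
      stepType,
      timestamp: startedAt.toISOString(),
      input,
      output,
      // Deep copy so later changes to the live state don't alter this snapshot.
      stateSnapshot: JSON.parse(JSON.stringify(state)),
      duration: finishedAt.getTime() - startedAt.getTime(),
    };
    this.steps.push(step);
    return step;
  }

  getSteps(): TraceStep[] {
    return this.steps.slice();
  }
}
```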

Testing

TEST FORMAT

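Behavioral tests are defined in YAML. Each test has a name and a list of assertions, and each assertion has a type plus type-specific params: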

tests:
  - name: "Tool Should Be Called"
    assertions:
      - type: tool_called
        params:
          tool: search_restaurants
          minTimes: 1

  - name: "Step Limit"
    assertions:
      - type: max_steps
        params:
          count: 10

Reference

AVAILABLE ASSERTIONS

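| Type               | Description                                                          | Params                    |
| ------------------ | -------------------------------------------------------------------- | ------------------------- |
| tool_called        | Verify a tool was called at least minTimes times                     | tool, minTimes            |
| tool_not_called    | Verify a tool was never called                                       | tool                      |
| max_steps          | Verify the trace has at most count steps                             | count                     |
| min_steps          | Verify the trace has at least count steps                            | count                     |
| state_contains     | Verify the final state contains the given key and value              | key, value                |
| state_not_contains | Verify the final state does not contain the given key and value      | key, value                |
| step_type_count    | Compare the number of steps of stepType against count using operator | stepType, count, operator |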
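
Because assertions check the structure of a trace rather than the text of LLM responses, each one reduces to plain data access. The sketch below evaluates six of the types above. It assumes tool steps use stepType "tool" and name the tool in input.tool, and that the final state is the last step's stateSnapshot. The example trace shows only an llm step, so these layouts are guesses. checkAssertion is a hypothetical name, and step_type_count is omitted because its operator values are not documented here.

```typescript
// Minimal step shape for assertions (tool-step layout is assumed).
interface Step {
  stepNumber: number;
  stepType: string;
  input: Record<string, unknown>;
  stateSnapshot: Record<string, unknown>;
}

// One assertion as it appears under "assertions:" in the YAML.
interface Assertion {
  type: string;
  params: Record<string, unknown>;
}

// Count steps that invoked the named tool.
function countToolCalls(steps: Step[], tool: string): number {
  return steps.filter((s) => s.stepType === "tool" && s.input.tool === tool).length;
}

// Hypothetical evaluator for a subset of the assertion types.
function checkAssertion(steps: Step[], a: Assertion): boolean {
  const last = steps[steps.length - 1];
  const finalState: Record<string, unknown> = last ? last.stateSnapshot : {};
  switch (a.type) {
    case "tool_called": {
      const minTimes = a.params.minTimes === undefined ? 1 : Number(a.params.minTimes);
      return countToolCalls(steps, String(a.params.tool)) >= minTimes;
    }
    case "tool_not_called":
      return countToolCalls(steps, String(a.params.tool)) === 0;
    case "max_steps":
      return steps.length <= Number(a.params.count);
    case "min_steps":
      return steps.length >= Number(a.params.count);
    case "state_contains":
      // Assumed semantics: the final state holds exactly `value` under `key`.
      return JSON.stringify(finalState[String(a.params.key)]) === JSON.stringify(a.params.value);
    case "state_not_contains":
      return JSON.stringify(finalState[String(a.params.key)]) !== JSON.stringify(a.params.value);
    default:
      throw new Error(`Not covered by this sketch: ${a.type}`);
  }
}

// The "Tool Should Be Called" test from the YAML example, expressed as a call:
// checkAssertion(steps, { type: "tool_called", params: { tool: "search_restaurants", minTimes: 1 } });
```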

Debugging

ANALYSIS WARNINGS

The analyzer automatically scans your trace files to detect common issues and anti-patterns in agent behavior. These warnings help identify performance bottlenecks, logic errors, and resource inefficiencies before they become problems in production.

high_step_count

Too many steps executed

Triggers when an agent's step count approaches or exceeds the expected limit. A "warning" level indicates the agent is approaching the limit, while "critical" means it is significantly over it. This often signals infinite loops, poor decision-making, or tasks that should be broken into smaller subtasks.
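
A check with the two severity levels described above might look like this. The default limit of 10 steps, the warning band starting at 80% of the limit, and the critical cutoff at 150% are invented for illustration; the analyzer's actual thresholds are not documented here, and checkStepCount is a hypothetical name.

```typescript
type Severity = "warning" | "critical";

interface StepCountWarning {
  type: "high_step_count";
  severity: Severity;
  stepCount: number;
  limit: number;
}

// Hypothetical thresholds: "warning" from 80% of the limit upward,
// "critical" once the count is more than 50% over the limit.
function checkStepCount(stepCount: number, limit = 10): StepCountWarning | null {
  let severity: Severity | null = null;
  if (stepCount > limit * 1.5) {
    severity = "critical";
  } else if (stepCount >= limit * 0.8) {
    severity = "warning";
  }
  return severity === null ? null : { type: "high_step_count", severity, stepCount, limit };
}
```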

memory_growth

Memory growing without cleanup

Detects when the agent's memory footprint continuously increases without being pruned or cleared. This can lead to context window overflow, increased latency, and degraded performance over long-running tasks. Consider implementing memory summarization or cleanup strategies.
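
From a trace, one way to spot this is to count memory entries in each stateSnapshot and flag runs where the count never drops and ends above where it started. The sketch assumes memory lives in stateSnapshot.memory as an object, as in the example trace. detectMemoryGrowth and its minSteps parameter are hypothetical, and this heuristic may differ from what analyzer.ts does.

```typescript
interface SnapshotStep {
  stepNumber: number;
  stateSnapshot: { memory?: Record<string, unknown> };
}

// Flags memory that only grows: entry counts never decrease between steps and
// the final count exceeds the first. Needs at least `minSteps` steps to judge.
function detectMemoryGrowth(steps: SnapshotStep[], minSteps = 3): boolean {
  const sizes = steps.map((s) => Object.keys(s.stateSnapshot.memory || {}).length);
  if (sizes.length < minSteps) return false;
  for (let i = 1; i < sizes.length; i++) {
    if (sizes[i] < sizes[i - 1]) return false; // something was pruned
  }
  return sizes[sizes.length - 1] > sizes[0];
}
```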

repeated_tool_calls

Same tool called with identical parameters

Identifies when an agent calls the same tool with the same arguments multiple times. This usually indicates the agent isn't properly processing or storing tool results, leading to wasted API calls and increased costs. Implement result caching or improve state management.
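
Identical calls can be found by keying each tool step on its tool name plus serialized parameters and reporting keys seen more than once. The sketch assumes tool steps have stepType "tool" with input.tool and input.params, a shape the example trace does not show; findRepeatedToolCalls is a hypothetical name. JSON.stringify is sensitive to key order, so {a: 1, b: 2} and {b: 2, a: 1} would count as different calls here.

```typescript
interface ToolStep {
  stepNumber: number;
  stepType: string;
  input: { tool?: string; params?: unknown }; // assumed tool-step layout
}

interface RepeatedCall {
  tool: string;
  params: string; // parameters serialized with JSON.stringify
  stepNumbers: number[];
}

// Group tool steps by (tool, serialized params); report groups seen more than once.
function findRepeatedToolCalls(steps: ToolStep[]): RepeatedCall[] {
  const groups: Record<string, RepeatedCall> = {};
  for (const step of steps) {
    const tool = step.input.tool;
    if (step.stepType !== "tool" || !tool) continue;
    const params = JSON.stringify(step.input.params === undefined ? null : step.input.params);
    const key = tool + "\u0000" + params; // NUL separator keeps tool and params apart
    if (!groups[key]) {
      groups[key] = { tool, params, stepNumbers: [] };
    }
    groups[key].stepNumbers.push(step.stepNumber);
  }
  return Object.keys(groups)
    .map((key) => groups[key])
    .filter((group) => group.stepNumbers.length > 1);
}
```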

unused_memory

Memory stored but never accessed

Flags memory entries that were written but never read during the trace execution. This suggests inefficient memory usage: the agent is storing information it doesn't need. Review what data is being persisted and whether it is required for decision-making.
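
This check needs to know which keys were read, and the trace format above records memory snapshots but not reads. The sketch therefore assumes a hypothetical per-step memoryReads list and reports keys that appear in any snapshot but in no step's reads. findUnusedMemory is likewise a hypothetical name.

```typescript
interface MemoryStep {
  stateSnapshot: { memory?: Record<string, unknown> };
  memoryReads?: string[]; // hypothetical field: memory keys this step read
}

// Return memory keys present in any snapshot but never read by any step.
function findUnusedMemory(steps: MemoryStep[]): string[] {
  // Prototype-free objects so keys like "constructor" are not treated as present.
  const written: Record<string, boolean> = Object.create(null);
  const read: Record<string, boolean> = Object.create(null);
  for (const step of steps) {
    for (const key of Object.keys(step.stateSnapshot.memory || {})) written[key] = true;
    for (const key of step.memoryReads || []) read[key] = true;
  }
  return Object.keys(written).filter((key) => !read[key]).sort();
}
```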

long_duration

Steps taking too long to execute

Alerts when individual steps exceed expected duration thresholds. Long-running steps may indicate expensive API calls, network issues, or computationally intensive operations. Consider adding timeouts, optimizing prompts, or parallelizing independent operations.
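
Since every step records a duration, the check reduces to a filter. The sketch assumes durations are in milliseconds and uses a single hypothetical 5000 ms threshold; the analyzer may use different or per-step-type limits, and findSlowSteps is a hypothetical name.

```typescript
interface TimedStep {
  stepNumber: number;
  stepType: string;
  duration: number; // assumed milliseconds
}

// Return steps slower than the (hypothetical) threshold, slowest first.
// filter() builds a new array, so sort() does not reorder the caller's steps.
function findSlowSteps(steps: TimedStep[], thresholdMs = 5000): TimedStep[] {
  return steps
    .filter((s) => s.duration > thresholdMs)
    .sort((a, b) => b.duration - a.duration);
}
```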

error_rate

High percentage of error steps

Triggers when a significant portion of steps result in errors. High error rates suggest problems with tool configurations, invalid inputs, or unreliable external services. Review error patterns to identify root causes and implement better error handling or retry logic.
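
The error rate is the share of steps that ended in an error. The sketch assumes error steps are marked with stepType "error" and flags rates above a hypothetical 20%. Neither the marker nor the threshold comes from this documentation, and checkErrorRate is a hypothetical name.

```typescript
interface TypedStep {
  stepType: string;
}

interface ErrorRateResult {
  errorSteps: number;
  totalSteps: number;
  rate: number; // fraction between 0 and 1
  flagged: boolean;
}

// Compute the share of error steps and compare it to a hypothetical threshold.
function checkErrorRate(steps: TypedStep[], threshold = 0.2): ErrorRateResult {
  const errorSteps = steps.filter((s) => s.stepType === "error").length;
  const rate = steps.length === 0 ? 0 : errorSteps / steps.length;
  return { errorSteps, totalSteps: steps.length, rate, flagged: rate > threshold };
}
```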

IDE Integration

VS CODE EXTENSION

The VS Code extension provides:

  • Traces View: lists all traces in the workspace
  • Steps View: browses the steps of the current trace
  • Trace Inspector Panel: visual step-by-step inspection

Commands

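| Command                    | Description                      |
| -------------------------- | -------------------------------- |
| ACP: Open Trace File       | Load a trace for inspection      |
| ACP: Show Trace Inspector  | Open the inspector panel         |
| ACP: Analyze Current Trace | Run analysis on the loaded trace |
| ACP: Run Agent             | Run the agent in a terminal      |

Validation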

SUCCESS METRICS

| Metric                                  | Status |
| --------------------------------------- | ------ |
| Same trace produces same replay behavior | Yes    |
| Same final state after replay           | Yes    |
| Any step can be inspected               | Yes    |
| Inputs, outputs, state visible          | Yes    |
| Behavioral tests exist                  | Yes    |
| Tests catch logic regressions           | Yes    |
| Tests don't depend on exact text        | Yes    |
| Tool highlights inefficiencies          | Yes    |
| Broken agent scenario exists            | Yes    |
| Tool explains why it broke              | Yes    |

DEFINITION OF DONE

  • You can run an agent
  • Generate a trace
  • Replay it deterministically
  • Inspect steps
  • Run regression tests
  • Explain failures using the trace