AGENT CONTROL PLANE
A local-first developer tool for recording, replaying, and testing AI agent behavior.
What This Alpha Version Proves
1. Agent behavior can be recorded as a deterministic trace: every step is captured with inputs, outputs, and state
2. That trace can be replayed exactly: deterministic replay from recorded data
3. Developers can inspect and test agent behavior, not just outputs: full step inspection and behavioral testing
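The claims above rest on one idea: if every step's input and output is recorded, a replay can serve the recorded outputs instead of calling the live LLM or tool. The following is a minimal sketch of that idea only. The names `RecordedStep`, `record`, and `replay` are hypothetical and are not taken from the project's `trace-recorder.ts` or `replay-engine.ts`.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
interface RecordedStep {
  stepNumber: number;
  input: unknown;
  output: unknown;
}

type StepFn = (input: unknown) => unknown;

// Record: run the live function and append what went in and what came out.
function record(steps: RecordedStep[], fn: StepFn, input: unknown): unknown {
  const output = fn(input);
  steps.push({ stepNumber: steps.length + 1, input, output });
  return output;
}

// Replay: return recorded outputs in order without calling the live function.
// Throws if the trace is exhausted or the replayed input diverges from the
// recorded one.
function replay(steps: RecordedStep[]): StepFn {
  let cursor = 0;
  return (input: unknown) => {
    const step = steps[cursor];
    if (step === undefined) {
      throw new Error(`No recorded step at position ${cursor + 1}`);
    }
    if (JSON.stringify(step.input) !== JSON.stringify(input)) {
      throw new Error(`Input mismatch at step ${step.stepNumber}`);
    }
    cursor += 1;
    return step.output;
  };
}
```

Because the replay function never touches the live dependency, nondeterministic sources such as model sampling or network responses cannot change the result of a replayed run.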
INSTALLATION
    npm install
QUICK START
1. Run the Agent
    npm start
    npm start -- --broken
2. Inspect the Trace
    npm run inspect traces/<trace-file>.json
    npm run inspect traces/<trace-file>.json 3
3. Run Behavioral Tests
    npm run test traces/<trace-file>.json
    npm run test traces/<trace-file>.json tests/basic.yaml
4. Analyze for Issues
    npm run analyze traces/<trace-file>.json
5. Replay the Trace
    npm run replay traces/<trace-file>.json
ARCHITECTURE
Agent-Control-Plane/
├── src/
│   ├── core/                     # Core components
│   │   ├── types.ts              # Type definitions
│   │   ├── trace-recorder.ts     # Trace Recorder
│   │   ├── agent-runtime.ts      # Agent Runtime
│   │   ├── replay-engine.ts      # Deterministic Replay
│   │   ├── step-inspector.ts     # Step Inspector
│   │   ├── test-engine.ts        # Behavioral Test Engine
│   │   └── analyzer.ts           # Memory & Step Analysis
│   │
│   ├── cli/                      # CLI tools
│   │   ├── inspect.ts            # Step inspector CLI
│   │   ├── test-runner.ts        # Test runner CLI
│   │   └── analyze.ts            # Analyzer CLI
│   │
│   └── agent/                    # Agent implementation
│       ├── llm-provider.ts       # LLM provider
│       ├── tools.ts              # Agent tools
│       ├── run.ts                # Agent runner
│       └── replay.ts             # Trace replay
│
├── vscode-extension/             # VS Code Extension
│   └── src/extension.ts
│
├── tests/                        # Test definitions
│   ├── basic.yaml
│   └── broken-agent.yaml
│
└── traces/                       # Generated trace files
TRACE FORMAT
Each trace is a JSON file containing:
{
  "traceId": "trace_1234567890_abc",
  "agentId": "restaurant-booking-agent",
  "taskId": "task_1234567890",
  "startTime": "2024-01-15T10:00:00.000Z",
  "endTime": "2024-01-15T10:00:05.000Z",
  "status": "completed",
  "steps": [
    {
      "stepNumber": 1,
      "stepType": "llm",
      "timestamp": "2024-01-15T10:00:01.000Z",
      "input": { "prompt": "..." },
      "output": { "response": "...", "action": "search" },
      "stateSnapshot": { "currentStep": 1, "memory": {} },
      "duration": 150
    }
  ],
  "metadata": {
    "agentVersion": "1.0.0",
    "toolsUsed": ["search_restaurants", "book_restaurant"],
    "totalLLMCalls": 4,
    "totalToolCalls": 3
  }
}
TEST FORMAT
Behavioral tests are defined in YAML:
tests:
  - name: "Tool Should Be Called"
    assertions:
      - type: tool_called
        params:
          tool: search_restaurants
          minTimes: 1
  - name: "Step Limit"
    assertions:
      - type: max_steps
        params:
          count: 10
AVAILABLE ASSERTIONS
| Type | Description | Params |
|---|---|---|
| tool_called | Verify a tool was called | tool, minTimes |
| tool_not_called | Verify a tool was NOT called | tool |
| max_steps | Maximum step count | count |
| min_steps | Minimum step count | count |
| state_contains | Final state contains value | key, value |
| state_not_contains | Final state doesn't contain value | key, value |
| step_type_count | Count of step type | stepType, count, operator |
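To make the table concrete, here is a rough sketch of how four of these assertion types could be evaluated against a trace. It is not the project's `test-engine.ts`. The reduced `Trace` shape, the `"tool"` step type, the `input.tool` field, the default of `minTimes` to 1, and the inclusive step limits are all assumptions, since the documented trace example only shows an `"llm"` step.

```typescript
// Hypothetical, reduced trace shape. The "tool" step type and `input.tool`
// field are assumptions; the documented example only shows an "llm" step.
interface Step {
  stepNumber: number;
  stepType: string;
  input: Record<string, unknown>;
}

interface Trace {
  steps: Step[];
}

type Assertion =
  | { type: "tool_called"; params: { tool: string; minTimes?: number } }
  | { type: "tool_not_called"; params: { tool: string } }
  | { type: "max_steps"; params: { count: number } }
  | { type: "min_steps"; params: { count: number } };

// Counts the steps that invoked the named tool.
function countToolCalls(trace: Trace, tool: string): number {
  return trace.steps.filter(
    (s) => s.stepType === "tool" && s.input["tool"] === tool
  ).length;
}

// Returns true when the trace satisfies the assertion.
function evaluate(trace: Trace, a: Assertion): boolean {
  switch (a.type) {
    case "tool_called":
      // Assumed default: at least one call when minTimes is omitted.
      return countToolCalls(trace, a.params.tool) >= (a.params.minTimes ?? 1);
    case "tool_not_called":
      return countToolCalls(trace, a.params.tool) === 0;
    case "max_steps":
      return trace.steps.length <= a.params.count;
    case "min_steps":
      return trace.steps.length >= a.params.count;
  }
}
```

Each check looks at what the agent did (which tools ran, how many steps it took) and never at the wording of an LLM response, which is what lets such tests survive changes in model phrasing.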
ANALYSIS WARNINGS
The analyzer automatically scans your trace files to detect common issues and anti-patterns in agent behavior. These warnings help identify performance bottlenecks, logic errors, and resource inefficiencies before they become problems in production.
high_step_count: Too many steps executed
Triggers when an agent exceeds expected step limits. A "warning" level indicates the agent is approaching limits, while "critical" means it's significantly over threshold. This often signals infinite loops, poor decision-making, or tasks that should be broken into smaller subtasks.
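The two severity levels can be pictured as ratios of an expected step limit. The sketch below is illustrative only: the function name `checkStepCount` and the ratios 0.8 and 1.5 are assumptions, not the analyzer's actual thresholds.

```typescript
type Severity = "warning" | "critical";

// Classifies a step count against an expected limit.
// The default ratios are hypothetical: "warning" when the count reaches 80%
// of the limit, "critical" when it reaches 150% of the limit.
function checkStepCount(
  stepCount: number,
  limit: number,
  warnRatio = 0.8,
  criticalRatio = 1.5
): Severity | null {
  if (stepCount >= limit * criticalRatio) return "critical";
  if (stepCount >= limit * warnRatio) return "warning";
  return null;
}
```

For example, with a limit of 10, a 5-step trace produces no warning, an 8-step trace produces `"warning"`, and a 15-step trace produces `"critical"`.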
memory_growth: Memory growing without cleanup
Detects when the agent's memory footprint continuously increases without being pruned or cleared. This can lead to context window overflow, increased latency, and degraded performance over long-running tasks. Consider implementing memory summarization or cleanup strategies.
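One way to approximate this check is to compare the size of the memory object across consecutive state snapshots. The sketch assumes memory is the `stateSnapshot.memory` object shown in the trace format and that its serialized length is an acceptable proxy for footprint. The function names and the growth factor of 2 are hypothetical.

```typescript
// Reduced snapshot shape based on `stateSnapshot.memory` in the trace format.
interface Snapshot {
  memory: Record<string, unknown>;
}

// Serialized length of each snapshot's memory, used as a rough size proxy.
function memorySizes(snapshots: Snapshot[]): number[] {
  return snapshots.map((s) => JSON.stringify(s.memory).length);
}

// True when memory never shrinks between consecutive steps and the final
// size is at least `growthFactor` times the initial size.
function hasUnboundedGrowth(snapshots: Snapshot[], growthFactor = 2): boolean {
  const sizes = memorySizes(snapshots);
  if (sizes.length < 2) return false;
  for (let i = 1; i < sizes.length; i++) {
    if (sizes[i] < sizes[i - 1]) return false; // a cleanup happened
  }
  return sizes[sizes.length - 1] >= sizes[0] * growthFactor;
}
```

A single shrink anywhere in the sequence is treated as evidence of cleanup, which is a deliberately lenient choice; a stricter check could look at the overall trend instead.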
repeated_tool_calls: Same tool called with identical parameters
Identifies when an agent calls the same tool with the same arguments multiple times. This usually indicates the agent isn't properly processing or storing tool results, leading to wasted API calls and increased costs. Implement result caching or improve state management.
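Detecting this comes down to grouping calls by tool name plus a canonical form of the arguments, so that key order does not hide a duplicate. The sketch below shows one possible approach; `ToolCall`, `canonical`, and `findRepeatedToolCalls` are hypothetical names, and the way tool calls are extracted from a trace is left out.

```typescript
// Hypothetical shape of one extracted tool call.
interface ToolCall {
  stepNumber: number;
  tool: string;
  args: Record<string, unknown>;
}

// Serializes a value with object keys sorted, so that {a:1,b:2} and
// {b:2,a:1} produce the same string.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonical(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return String(JSON.stringify(value));
}

// Maps "tool:args" keys to the step numbers where that exact call occurred,
// keeping only calls that occurred more than once.
function findRepeatedToolCalls(calls: ToolCall[]): Record<string, number[]> {
  const seen: Record<string, number[]> = {};
  for (const call of calls) {
    const key = `${call.tool}:${canonical(call.args)}`;
    (seen[key] = seen[key] ?? []).push(call.stepNumber);
  }
  const repeated: Record<string, number[]> = {};
  for (const key of Object.keys(seen)) {
    if (seen[key].length > 1) repeated[key] = seen[key];
  }
  return repeated;
}
```

Reporting the step numbers alongside each duplicate makes it easy to jump to the offending steps in the inspector.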
unused_memory: Memory stored but never accessed
Flags memory entries that were written but never read during the trace execution. This suggests inefficient memory usage—the agent is storing information it doesn't actually need. Review what data is being persisted and whether it's actually required for decision-making.
long_duration: Steps taking too long to execute
Alerts when individual steps exceed expected duration thresholds. Long-running steps may indicate expensive API calls, network issues, or computationally intensive operations. Consider adding timeouts, optimizing prompts, or parallelizing independent operations.
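A duration check can be as simple as filtering steps on the `duration` field from the trace format. In this sketch the unit is assumed to be milliseconds, and `TimedStep` and `findSlowSteps` are hypothetical names; the threshold is supplied by the caller because the analyzer's actual value is not documented here.

```typescript
// Reduced step shape; `duration` appears in the trace format, and
// milliseconds is an assumption about its unit.
interface TimedStep {
  stepNumber: number;
  stepType: string;
  duration: number;
}

// Returns the steps whose duration exceeds the threshold, slowest first.
// `filter` creates a new array, so the input is not reordered.
function findSlowSteps(steps: TimedStep[], thresholdMs: number): TimedStep[] {
  return steps
    .filter((s) => s.duration > thresholdMs)
    .sort((a, b) => b.duration - a.duration);
}
```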
error_rate: High percentage of error steps
Triggers when a significant portion of steps result in errors. High error rates suggest problems with tool configurations, invalid inputs, or unreliable external services. Review error patterns to identify root causes and implement better error handling or retry logic.
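The rate itself is the number of error steps divided by the total number of steps. The sketch below assumes a failed step is recorded with `stepType: "error"`, which the trace example does not confirm, and uses a hypothetical threshold of 0.2.

```typescript
interface StepLike {
  stepType: string;
}

// Fraction of steps that are errors, in the range 0 to 1.
// Assumption: a failed step is recorded with stepType "error".
function errorRate(steps: StepLike[]): number {
  if (steps.length === 0) return 0;
  const errors = steps.filter((s) => s.stepType === "error").length;
  return errors / steps.length;
}

// True when the error rate is strictly above the threshold.
// The default of 0.2 is a placeholder, not the analyzer's documented value.
function hasHighErrorRate(steps: StepLike[], threshold = 0.2): boolean {
  return errorRate(steps) > threshold;
}
```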
VS CODE EXTENSION
The VS Code extension provides:
- Traces View: List all traces in workspace
- Steps View: Browse steps of current trace
- Trace Inspector Panel: Visual step-by-step inspection
Commands
| Command | Description |
|---|---|
| ACP: Open Trace File | Load a trace for inspection |
| ACP: Show Trace Inspector | Open the inspector panel |
| ACP: Analyze Current Trace | Run analysis on loaded trace |
| ACP: Run Agent | Run the agent in terminal |
SUCCESS METRICS
| Metric | Status |
|---|---|
| Same trace produces same replay behavior | Yes |
| Same final state after replay | Yes |
| Any step can be inspected | Yes |
| Inputs, outputs, state visible | Yes |
| Behavioral tests exist | Yes |
| Tests catch logic regressions | Yes |
| Tests don't depend on exact text | Yes |
| Tool highlights inefficiencies | Yes |
| Broken agent scenario exists | Yes |
| Tool explains why it broke | Yes |
DEFINITION OF DONE
- ✓ You can run an agent
- ✓ Generate a trace
- ✓ Replay it deterministically
- ✓ Inspect steps
- ✓ Run regression tests
- ✓ Explain failures using the trace