A local-first, time-travel debugger for AI agent runs. "Redux DevTools for AI agents." Point it at any OpenTelemetry-emitting agent, step through the run locally, diff two runs to see what changed. No cloud, no account.
AI agents fail silently — a confident, wrong answer with no crash and no stack trace. tracebird captures the OpenTelemetry GenAI spans your agent already emits, reconstructs them into an inspectable decision tree, and lets you step through and diff runs locally.
Inspect a run, scrub through it in time, then diff two runs. (static screenshot)
Just want to see it? Open the UI pre-loaded with sample runs — no agent needed:

    npx @tracebird/cli demo

Otherwise, start the receiver:

    npx @tracebird/cli

Then point your agent's OpenTelemetry exporter at the local receiver and run it:

    export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

That's the whole integration — zero code changes to your agent. tracebird accepts OTLP/HTTP in both JSON and protobuf (the SDK default).
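For readers who want to see what travels over that endpoint, here is a minimal sketch of an OTLP/HTTP JSON trace body carrying one GenAI chat span. The envelope (`resourceSpans`, `scopeSpans`, `spans`) and the `gen_ai.*` attribute names follow the public OTLP and OpenTelemetry GenAI conventions as I understand them. The `ChatCall` input type, the helper names, and the `demo-agent` / `manual-sketch` labels are hypothetical, invented for illustration; an instrumented SDK normally builds this for you.

```typescript
// OTLP/JSON attribute values: 64-bit integers are carried as decimal strings.
type AnyValue = { stringValue: string } | { intValue: string };
interface KeyValue { key: string; value: AnyValue }

interface OtlpSpan {
  traceId: string;           // 32 hex characters
  spanId: string;            // 16 hex characters
  name: string;
  kind: number;              // 3 = SPAN_KIND_CLIENT
  startTimeUnixNano: string;
  endTimeUnixNano: string;
  attributes: KeyValue[];
}

interface OtlpTracePayload {
  resourceSpans: {
    resource: { attributes: KeyValue[] };
    scopeSpans: { scope: { name: string }; spans: OtlpSpan[] }[];
  }[];
}

// Hypothetical input shape, invented for this sketch.
interface ChatCall {
  traceId: string;
  spanId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  startMs: number;
  endMs: number;
}

// Milliseconds since epoch -> nanoseconds as a decimal string.
const msToNanoString = (ms: number): string => String(Math.trunc(ms)) + "000000";

function buildChatSpanPayload(call: ChatCall): OtlpTracePayload {
  const span: OtlpSpan = {
    traceId: call.traceId,
    spanId: call.spanId,
    name: `chat ${call.model}`,
    kind: 3,
    startTimeUnixNano: msToNanoString(call.startMs),
    endTimeUnixNano: msToNanoString(call.endMs),
    attributes: [
      { key: "gen_ai.operation.name", value: { stringValue: "chat" } },
      { key: "gen_ai.request.model", value: { stringValue: call.model } },
      { key: "gen_ai.usage.input_tokens", value: { intValue: String(call.inputTokens) } },
      { key: "gen_ai.usage.output_tokens", value: { intValue: String(call.outputTokens) } },
    ],
  };
  return {
    resourceSpans: [
      {
        // "demo-agent" and "manual-sketch" are placeholder labels.
        resource: { attributes: [{ key: "service.name", value: { stringValue: "demo-agent" } }] },
        scopeSpans: [{ scope: { name: "manual-sketch" }, spans: [span] }],
      },
    ],
  };
}
```

Serialized with `JSON.stringify`, such a body would be sent as a `POST` with `Content-Type: application/json`. OTLP/HTTP exporters conventionally append `/v1/traces` to the configured endpoint; that the local receiver listens on exactly that path is an assumption here, not something this README states.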
Have a saved session from a coworker? Replay it with no receiver:

    npx @tracebird/cli open ./session.jsonl

- Execution tree — flat spans reconstructed into run → agent → LLM/tool.
- Inspector — prompt, completion, tool args/result, tokens, cost, model, timing.
- Time-travel scrubber — drag through the run; the selection follows time.
- Diff — pick two runs; see the structural + word-level text diff ("worked yesterday").
- Search — filter runs by status or full-text across prompts, tools, and models.
- Share — export any run as a self-contained HTML page a coworker opens offline.
- Live — runs stream into the UI the moment they complete (SSE, no refresh).
- Terminal tree — `live` prints each reconstructed run as it arrives.
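To make the word-level part of the Diff feature concrete, here is a minimal sketch of a word diff built on a longest-common-subsequence table. It is a textbook approach chosen for illustration, not necessarily the algorithm tracebird uses; `DiffOp` and `diffWords` are hypothetical names.

```typescript
type DiffOp = { kind: "same" | "removed" | "added"; word: string };

function diffWords(before: string, after: string): DiffOp[] {
  const a = before.split(/\s+/).filter(Boolean);
  const b = after.split(/\s+/).filter(Boolean);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= b.length; j++) row.push(0);
    lcs.push(row);
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both word lists, following the table to emit the edit script.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "same", word: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "removed", word: a[i] });
      i++;
    } else {
      ops.push({ kind: "added", word: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "removed", word: a[i++] });
  while (j < b.length) ops.push({ kind: "added", word: b[j++] });
  return ops;
}
```

Applied to two completions such as "the answer is 42" and "the answer is 41", the sketch keeps the first three words and reports "42" as removed and "41" as added. The table costs O(n·m) time and memory, which is acceptable for prompt-sized text but would need a smarter algorithm for very long outputs.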
tracebird ingests the vendor-neutral OpenTelemetry GenAI conventions and auto-normalizes the popular dialects, so the tree, tokens, cost, and prompts render without configuration:
| Source | Guide |
|---|---|
| OpenLLMetry / Traceloop (OpenAI, Anthropic, LangChain, LlamaIndex, …) | docs |
| OpenInference (Arize Phoenix) | docs |
| Vercel AI SDK (`experimental_telemetry`) | docs |
| Claude Code (enhanced telemetry, beta) | docs |
| LangChain / LangGraph | docs |
| Raw OpenTelemetry GenAI | docs |
Full setup per framework is in docs/integrations/.
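As a rough illustration of what normalizing dialects can mean, the sketch below maps a few differently named span attributes onto one shape. The attribute keys are the ones these dialects emit as far as I know (current and older GenAI conventions, OpenInference, Vercel AI SDK); treat the lists as assumptions. `normalizeCall` and its types are hypothetical and are not tracebird's actual mapping.

```typescript
type AttrValue = string | number;
type Attrs = Record<string, AttrValue>;

interface NormalizedCall {
  model: string | undefined;
  inputTokens: number | undefined;
  outputTokens: number | undefined;
}

// Candidate keys per field, checked in order. Assumed, not exhaustive.
const MODEL_KEYS = ["gen_ai.request.model", "llm.model_name", "ai.model.id"];
const INPUT_TOKEN_KEYS = [
  "gen_ai.usage.input_tokens",
  "gen_ai.usage.prompt_tokens",
  "llm.token_count.prompt",
  "ai.usage.promptTokens",
];
const OUTPUT_TOKEN_KEYS = [
  "gen_ai.usage.output_tokens",
  "gen_ai.usage.completion_tokens",
  "llm.token_count.completion",
  "ai.usage.completionTokens",
];

// Return the value of the first key that is present on the span.
function firstPresent(attrs: Attrs, keys: string[]): AttrValue | undefined {
  for (const key of keys) {
    const value = attrs[key];
    if (value !== undefined) return value;
  }
  return undefined;
}

// Token counts may arrive as numbers or as decimal strings.
function toCount(value: AttrValue | undefined): number | undefined {
  if (value === undefined) return undefined;
  const n = Number(value);
  return isFinite(n) ? n : undefined;
}

function normalizeCall(attrs: Attrs): NormalizedCall {
  const model = firstPresent(attrs, MODEL_KEYS);
  return {
    model: model === undefined ? undefined : String(model),
    inputTokens: toCount(firstPresent(attrs, INPUT_TOKEN_KEYS)),
    outputTokens: toCount(firstPresent(attrs, OUTPUT_TOKEN_KEYS)),
  };
}
```

An ordered key list per field keeps the mapping declarative: supporting another dialect means appending a key rather than adding a branch.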
See examples/ for runnable agents — including a keyless one
that needs no API key.

    your agent ──OTLP/HTTP──▶ @tracebird/cli ──▶ @tracebird/core ──▶ @tracebird/ui
    (instrumented)            receiver + UI      span → run tree     inspect + diff

- `@tracebird/core` — pure span → agent-tree reconstruction. No I/O.
- `@tracebird/cli` — the `npx` entrypoint: OTLP receiver + static UI server.
- `@tracebird/ui` — React app: run list, execution tree, inspector, diff, scrubber.
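The span → tree step can be sketched as a pure function that links flat spans through their parent ids. This is a minimal illustration under stated assumptions, not the code in `@tracebird/core`: `FlatSpan`, `TreeNode`, and `buildTree` are hypothetical names, and start times are simplified to a millisecond number.

```typescript
// Hypothetical, simplified span shape for this sketch.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  startMs: number;
}

interface TreeNode {
  span: FlatSpan;
  children: TreeNode[];
}

function buildTree(spans: FlatSpan[]): TreeNode[] {
  // Index every span by id so parents can be found in O(1).
  const nodes = new Map<string, TreeNode>();
  for (const span of spans) nodes.set(span.spanId, { span, children: [] });

  // Attach each node to its parent. Spans with no parent, or whose parent
  // never arrived, become roots so nothing is silently dropped.
  const roots: TreeNode[] = [];
  nodes.forEach((node) => {
    const parentId = node.span.parentSpanId;
    const parent = parentId ? nodes.get(parentId) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  });

  // Order siblings by start time so the tree reads chronologically.
  const sortByStart = (list: TreeNode[]): void => {
    list.sort((x, y) => x.span.startMs - y.span.startMs);
    list.forEach((n) => sortByStart(n.children));
  };
  sortByStart(roots);
  return roots;
}
```

Because the function performs no I/O and depends only on its input array, it can be unit-tested with plain fixtures, which matches the "pure" description above.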
Read-only forensics on agent runs — live as they complete, or loaded from a saved
session. Not yet (see ROADMAP.md): replay-execution, cloud
sync / hosted version, auth / multi-user, eval scoring, multi-agent topology
graphs, gRPC ingest, SQLite persistence, VS Code extension.
This is an Nx + pnpm integrated monorepo.

    pnpm install

    # Run it locally (builds the CLI + UI, then launches):
    pnpm demo          # serve the UI with bundled sample runs — best first look
    pnpm start         # start the OTLP receiver + UI on :4318 (point a real agent at it)

    # Quality gates:
    pnpm build         # nx run-many -t build (all packages)
    pnpm test          # nx run-many -t test (80 unit/integration tests)
    pnpm lint          # nx run-many -t lint
    pnpm e2e           # end-to-end against the built CLI binary
    npx nx test core   # a single project
    npx nx dev ui      # UI dev server with HMR (proxies /api to :4318)

`pnpm demo` / `pnpm start` build first, so they work from a fresh clone.
Releases are managed with changesets;
CI (.github/workflows) runs build + test + lint + e2e on every PR.
| Package | Path | Description |
|---|---|---|
| `@tracebird/core` | `packages/core` | Span ingest + tree reconstruction (pure). |
| `@tracebird/cli` | `packages/cli` | OTLP receiver, session storage, UI server. |
| `@tracebird/ui` | `packages/ui` | React/Vite front-end. |
| `@tracebird/fixtures` | `libs/fixtures` | Sample OTLP payloads + recorded sessions. |
MIT
