Agent runtime observability and provenance layer for multi-agent AI systems.
When agents delegate, loop, and fan out across tools and models, the final output tells you nothing. AgentWeave makes the decision chain the first-class artifact — every span carries W3C PROV-O provenance on OpenTelemetry: which agent acted, which model ran, what was consumed, what was generated, and how much it cost.
Three paths to instrumentation: decorators, auto-instrumentation, or zero-code proxy. Any OTLP backend.
```
agent.nix 94ms
├── llm.claude-sonnet-4-6 81ms          ← prompt_tokens=847, completion_tokens=312
├── tool.delegate_to_max 312ms
│   └── agent.max 298ms
│       ├── llm.gemini-2.0-flash 187ms  ← prompt_tokens=1203, completion_tokens=89
│       └── tool.web_search 94ms
├── llm.claude-sonnet-4-6 80ms          ← found it
└── tool.deploy_portfolio 48ms
```
```mermaid
graph LR
    subgraph Agents["Agents — proxy mode"]
        A1["Anthropic Agent"]
        A2["Gemini Agent"]
        A3["OpenAI Agent"]
    end

    SDK["Any Agent — SDK decorators / auto_instrument()"]

    subgraph Proxy["AgentWeave Proxy :4000"]
        P["Multi-Provider Proxy"]
    end

    subgraph LLMs["Upstream LLMs"]
        AN[api.anthropic.com]
        GO[generativelanguage.googleapis.com]
        OA[api.openai.com]
    end

    subgraph Observability
        OT["OTLP Collector — Tempo / Jaeger / Langfuse"]
        GR["AgentWeave Dashboard"]
    end

    A1 -- "ANTHROPIC_BASE_URL" --> P
    A2 -- "GOOGLE_GENAI_BASE_URL" --> P
    A3 -- "OPENAI_BASE_URL" --> P
    SDK -- "OTel spans" --> OT
    P -- "/v1/messages" --> AN
    P -- "/v1beta/models/*" --> GO
    P -- "/v1/chat/completions" --> OA
    P -- "OTel spans" --> OT
    OT --> GR
```
Three paths to instrumentation:

- **Auto-instrumentation** (`auto_instrument()`) — one call patches the Anthropic and OpenAI SDKs. No decorators needed.
- **Decorators** (`@trace_agent`, `@trace_llm`, `@trace_tool`) — wrap your functions directly in Python, TypeScript, or Go. Zero infrastructure needed.
- **Proxy** — point any agent's base URL at AgentWeave. It auto-detects the provider, forwards upstream, extracts token counts, and emits OTel spans. No code changes.
Main dashboard overview (KPIs, latency, token/cost trends, and agent/model breakdowns)
Tools like OpenLIT, Langfuse, and LangSmith are good at answering: what did my LLM do? Token counts, latency, cost per request, prompt logging. If you have a single agent or a single app making LLM calls, those tools cover the problem well.
AgentWeave answers a different question: what did my agent system do?
When one agent delegates to another across different machines, frameworks, or providers, you lose the thread. A trace that stops at the process boundary tells you nothing about why the overall task failed, which agent introduced the bad output, or where the cost actually went.
| Capability | OpenLIT / Langfuse / LangSmith | AgentWeave |
|---|---|---|
| Single-agent LLM tracing | Great | Basic |
| Cost and token tracking per request | Great | Supported |
| Prompt management, evals, playground | Yes (varies) | Out of scope |
| Cross-agent delegation traces | No | Core feature |
| Traces spanning multiple machines | No | Core feature |
| Proxy-based, zero code changes | No | Yes |
| Open source, self-hosted, no SaaS tier | Varies | Yes (MIT) |
The intended use: run OpenLIT or Langfuse inside each agent for deep per-agent observability, and point them all at AgentWeave for the system view above that. The delegation graph, cross-agent cost rollups, and traces that span process boundaries are what AgentWeave adds.
No cloud, no SaaS, no enterprise tier. Just the tool.
| SDK | Language | Install |
|---|---|---|
| sdk/python | Python | pip install agentweave-sdk |
| sdk/js | TypeScript / JavaScript | npm install agentweave-sdk |
| sdk/go | Go | go get github.com/arniesaha/agentweave-go |
Auto-instrumentation:

```python
from agentweave import auto_instrument

auto_instrument()  # patches Anthropic + OpenAI SDKs automatically

# Every client.messages.create() and client.chat.completions.create()
# now emits OTel spans with token counts — no wrappers needed.
```

Or with explicit decorators:

```python
from agentweave import AgentWeaveConfig, trace_agent, trace_llm, trace_tool

AgentWeaveConfig.setup(
    agent_id="my-agent-v1",
    agent_model="claude-sonnet-4-6",
    otel_endpoint="http://localhost:4318",
)

@trace_llm(provider="anthropic", model="claude-sonnet-4-6",
           captures_input=True, captures_output=True)
def call_claude(messages: list) -> ...:
    return client.messages.create(...)

@trace_tool(name="web_search", captures_input=True, captures_output=True)
def web_search(query: str) -> str:
    ...

@trace_agent(name="my-agent")
async def handle(message: str) -> str:
    response = call_claude(messages=[{"role": "user", "content": message}])
    return web_search(response.content[0].text)
```

All three spans link to the same trace ID. Open any OTLP backend and you see the waterfall.
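To inspect that waterfall locally without standing up a backend, you can use the console exporter listed in the backends table further down. A minimal sketch:

```python
from agentweave import add_console_exporter

# Dev-only: print spans to the console so you can check the trace
# waterfall from a terminal before wiring up an OTLP backend.
add_console_exporter()
```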
| Framework | Example |
|---|---|
| LangGraph | examples/langgraph |
| CrewAI | examples/crewai |
| AutoGen | examples/autogen |
| OpenAI Agents SDK | examples/openai-agents-sdk |
Patch LLM SDK client methods with a single call — no decorators needed.
```python
from agentweave import auto_instrument, uninstrument

auto_instrument()                         # patch all detected SDKs
auto_instrument(providers=["anthropic"])  # selective
auto_instrument(captures_output=True)     # include response preview
uninstrument()                            # restore originals
```

- Supports Anthropic (`Messages.create`) and OpenAI (`Completions.create`), sync + async
- Composes with explicit `@trace_llm` — auto-instrumentation detects existing spans and skips to avoid double-tracing
- Idempotent — calling `auto_instrument()` twice is safe
- Streaming support deferred to a follow-up
Root span for an agent turn. Nests all downstream tool and LLM calls.
```python
@trace_agent(name="nix")
def handle(message: str) -> str: ...
```

Pass `session_id` to group all spans from a single user conversation together. The value is attached as `session.id` on every span, making it a filterable dimension in Grafana / Tempo.

```python
@trace_agent(name="nix", session_id="conv-abc123")
def handle(message: str) -> str: ...
```

The proxy also accepts the `X-AgentWeave-Session-Id` header for zero-code session tagging — see `docs/session-grouping.md`.
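For example, a client pointed at the proxy can tag its session with that header. A minimal sketch, assuming the Anthropic Python SDK's `default_headers` option and the proxy on `localhost:4000`:

```python
import anthropic

# Route calls through the AgentWeave proxy and tag every resulting span
# with a session id via the X-AgentWeave-Session-Id header.
client = anthropic.Anthropic(
    base_url="http://localhost:4000",
    default_headers={"X-AgentWeave-Session-Id": "conv-abc123"},
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "hello"}],
)
```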
Span for any tool call — file ops, API calls, shell commands, A2A delegation.
```python
@trace_tool(name="delegate_to_max", captures_input=True, captures_output=True)
def delegate_to_max(task: str) -> dict: ...
```

Span for LLM invocations. Auto-extracts token counts and stop reason from Anthropic, OpenAI, and Google Gemini response shapes.

```python
@trace_llm(provider="anthropic", model="claude-sonnet-4-6", captures_output=True)
def call_claude(messages: list) -> anthropic.Message: ...
```

Captured automatically:

- `prov.llm.prompt_tokens` / `prov.llm.completion_tokens` / `prov.llm.total_tokens`
- `prov.llm.stop_reason`
- `prov.llm.response_preview` (first 512 chars, when `captures_output=True`)
When agents delegate to sub-agents, use the sub-agent attribution parameters to link child sessions to their parent and distinguish agent roles in traces.
```python
# Main agent — tags itself as the root session
@trace_agent(name="nix", session_id="sess-main-123", agent_type="main", turn_depth=1)
def main_agent(msg: str) -> str:
    return delegate_to_sub(msg)

# Sub-agent — linked to parent session
@trace_agent(name="max", parent_session_id="sess-main-123",
             agent_type="subagent", turn_depth=2)
def sub_agent(task: str) -> str:
    return call_llm(task)
```

Set `AGENTWEAVE_PARENT_SESSION_ID` and the SDK auto-populates `prov.parent.session.id`, defaults `agent_type` to `"subagent"`, and `turn_depth` to `2`:

```bash
export AGENTWEAVE_PARENT_SESSION_ID=sess-main-123
```

When using the proxy, pass sub-agent context via HTTP headers:
| Header | Span attribute | Example |
|---|---|---|
| `X-AgentWeave-Parent-Session-Id` | `prov.parent.session.id` | `sess-main-123` |
| `X-AgentWeave-Agent-Type` | `prov.agent.type` | `subagent` |
| `X-AgentWeave-Turn-Depth` | `prov.session.turn` | `2` |
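For instance, a sub-agent can attach those headers on its client so every proxied call is attributed back to the parent session. A minimal sketch, assuming the OpenAI Python SDK's `default_headers` option and the proxy on `localhost:4000` (the model name is illustrative):

```python
from openai import OpenAI

# Sub-agent client: every request through the proxy carries the
# parent-session, agent-type, and turn-depth headers, so the emitted
# spans are linked back to the main agent's session.
client = OpenAI(
    base_url="http://localhost:4000",
    default_headers={
        "X-AgentWeave-Parent-Session-Id": "sess-main-123",
        "X-AgentWeave-Agent-Type": "subagent",
        "X-AgentWeave-Turn-Depth": "2",
    },
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "summarize the delegated task"}],
)
```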
The TypeScript SDK exposes the same parameters:

```typescript
import { traceAgent } from 'agentweave-sdk';

const subAgent = traceAgent({
  name: 'max',
  parentSessionId: 'sess-main-123',
  agentType: 'subagent',
  turnDepth: 2,
})(async (task: string) => {
  return callLlm(task);
});
```

| Attribute | Description |
|---|---|
| `prov.parent.session.id` | ID of the parent session that spawned this sub-agent |
| `prov.agent.type` | `"main"`, `"subagent"`, or `"delegated"` |
| `prov.session.turn` | Turn depth: `1` = main session, `2` = first-level sub-agent |
| Attribute | Description |
|---|---|
| `prov.activity.type` | `tool_call`, `agent_turn`, or `llm_call` |
| `prov.agent.id` | Agent identifier |
| `prov.agent.model` | Model name |
| `prov.used` | Serialized inputs consumed by the activity |
| `prov.wasGeneratedBy` | Output produced by the activity |
| `prov.wasAssociatedWith` | Agent responsible for the activity |
| `prov.llm.provider` | `anthropic`, `openai`, or `google` |
| `prov.llm.prompt_tokens` | Input token count |
| `prov.llm.completion_tokens` | Output token count |
| `prov.llm.total_tokens` | Total tokens |
| `prov.llm.stop_reason` | Why the model stopped |
| `prov.task.label` | Human-readable label for the task this agent is executing |
Full schema: sdk/python/agentweave/schema.py
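To make the mapping concrete, here is a hypothetical attribute payload for a single `llm_call` span, using only keys from the table above (values are illustrative):

```python
# Illustration only: what one llm_call span's attributes might look like.
llm_call_span_attributes = {
    "prov.activity.type": "llm_call",
    "prov.agent.id": "my-agent-v1",
    "prov.agent.model": "claude-sonnet-4-6",
    "prov.wasAssociatedWith": "my-agent-v1",
    "prov.llm.provider": "anthropic",
    "prov.llm.prompt_tokens": 847,
    "prov.llm.completion_tokens": 312,
    "prov.llm.total_tokens": 1159,
    "prov.llm.stop_reason": "end_turn",
}
```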
For agents you can't instrument with decorators (Claude Code, Node.js, any runtime), run the AgentWeave proxy — a transparent HTTP server that sits between your agents and their LLM providers. Works with Claude Code out of the box — just set ANTHROPIC_BASE_URL in ~/.claude/settings.json (setup guide).
```bash
pip install "agentweave[proxy]"

agentweave proxy start --port 4000 --endpoint http://localhost:4318 --agent-id my-agent

# Point agents at the proxy — no code changes needed
export ANTHROPIC_BASE_URL=http://localhost:4000
export GOOGLE_GENAI_BASE_URL=http://localhost:4000
export OPENAI_BASE_URL=http://localhost:4000
```

OpenAI/Codex streaming note:

- `/v1/chat/completions` needs `stream_options.include_usage=true` for token usage
- `/v1/responses` and `/codex/responses` do not support `stream_options`; usage should arrive in the final `response.completed` event
- AgentWeave handles this difference in the proxy so the traced spans still get tokens/cost when upstream provides them
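As a reference point, here is a minimal sketch of a streaming chat-completions call through the proxy with usage reporting enabled (the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000")

# stream_options.include_usage asks the API to append a final chunk that
# carries token usage, which the proxy can attach to the emitted span.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.usage is not None:  # only the final chunk carries usage
        print(chunk.usage.total_tokens)
```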
One port, all providers. Every LLM call gets a span automatically.
Docker / k8s setup: see `deploy/docker/Dockerfile`
AgentWeave emits standard OTLP HTTP — works with any compatible backend:
| Backend | Endpoint |
|---|---|
| Grafana Tempo | http://tempo:4318 — recommended for self-hosted |
| Jaeger | http://jaeger:4318 |
| Langfuse v3 | https://cloud.langfuse.com/api/public/otel |
| Console (dev) | from agentweave import add_console_exporter; add_console_exporter() |
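For example, to point the SDK's setup call (shown earlier) at a self-hosted Tempo instance from the table above:

```python
from agentweave import AgentWeaveConfig

# Ship spans to Grafana Tempo over OTLP HTTP; swap the endpoint for
# Jaeger or Langfuse as listed in the table above.
AgentWeaveConfig.setup(
    agent_id="my-agent-v1",
    agent_model="claude-sonnet-4-6",
    otel_endpoint="http://tempo:4318",
)
```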
| Topic | Doc |
|---|---|
| Claude Code proxy setup | docs/claude-code-proxy.md |
| Session grouping | docs/session-grouping.md |
| Proxy setup | docs/proxy-setup.md |
| Production hardening | docs/production-hardening.md |
| Provider compatibility | docs/compatibility.md |
| Deterministic trace IDs | docs/deterministic-trace-ids.md |
| Span linking design | docs/span-linking-design.md |
| Proxy benchmarks | docs/benchmarks.md |
| Versioning policy | docs/versioning.md |
```bash
git clone https://github.com/arniesaha/agentweave && cd agentweave

pip install -e "./sdk/python[dev]"

pytest sdk/python                            # 237 Python tests
(cd sdk/js && npm ci && npx jest --verbose)  # 10 TypeScript tests
(cd sdk/go && go test ./... -v)              # 4 Go tests
```

MIT



