fix: key_injection fallback + OTel links + session_id propagation#182

Merged
arniesaha merged 4 commits into main from
feat/session-id-propagation
May 2, 2026

Conversation

@arniesaha
Owner

@arniesaha arniesaha commented Apr 27, 2026

Summary

Three fixes on feat/session-id-propagation:

#174 — key_injection fallback (fixes sub-agent auth)

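The proxy read API keys only from the AGENTWEAVE_* env vars. When those were empty (key_injection=false on /health), it had nothing to inject, even if the standard SDK vars were populated. Sub-agents routing through the proxy therefore had no keys, and all their LLM requests failed with auth errors.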

Fix: When AGENTWEAVE_ANTHROPIC_API_KEY / AGENTWEAVE_OPENAI_API_KEY / AGENTWEAVE_GOOGLE_API_KEY are empty, fall back to standard SDK env vars (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY). Resolution: explicit AGENTWEAVE_* var > standard SDK var.
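
The resolution order above can be sketched as follows. This is a minimal illustration, not the code in proxy.py: the `resolve_api_key` helper, the `_KEY_VARS` table, and the optional `env` parameter are hypothetical names introduced here; only the env var names come from this PR.

```python
import os
from typing import Mapping, Optional

# Hypothetical table: provider -> (explicit proxy var, standard SDK var),
# listed in priority order.
_KEY_VARS = {
    "anthropic": ("AGENTWEAVE_ANTHROPIC_API_KEY", "ANTHROPIC_API_KEY"),
    "openai": ("AGENTWEAVE_OPENAI_API_KEY", "OPENAI_API_KEY"),
    "google": ("AGENTWEAVE_GOOGLE_API_KEY", "GOOGLE_API_KEY"),
}


def resolve_api_key(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the first non-empty key for `provider`, or None if neither var is set."""
    env = os.environ if env is None else env
    for name in _KEY_VARS[provider]:
        value = env.get(name, "").strip()
        if value:  # an empty or whitespace-only value counts as unset
            return value
    return None
```

Passing a plain dict as `env` keeps the precedence rule testable without touching the real process environment.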

#178 — OTel links[] for cross-process span lineage

610 of 627 llm_call spans show up as trace roots because the openclaw HTTP client doesn't propagate the W3C traceparent as a per-request header.

Fix: Bridge sets x-agentweave-parent-trace-id + x-agentweave-parent-span-id headers on outbound LLM requests. Proxy reads these and creates OTel Link[] from llm_call spans to parent agent_turn spans.

#176 follow-up — session_id propagation (already in this branch)

Adds agentweave.context with ContextVar-based session scope. auto_instrument() and @trace_tool now stamp session.id / prov.session.id from active context.

Files changed

  • sdk/python/agentweave/proxy.py — key_injection fallback + OTel Link creation
  • plugins/openclaw-agentweave-bridge/src/service.ts — parent span/trace ID header injection
  • sdk/python/agentweave/context.py — session_id ContextVar
  • sdk/python/tests/test_proxy.py — new tests for both fixes

arniesaha and others added 3 commits April 26, 2026 23:50
…pans

~40% of lakehouse.spans rows had NULL session_id because auto_instrument()
LLM wrappers and @trace_tool spans never read any session context. Only
@trace_agent(session_id=...) and the proxy server were stamping it.

Introduces agentweave.context with a ContextVar-based session_id source of
truth. Resolution order: explicit @trace_agent kwarg > ContextVar
(set_session_id / session_scope / @trace_agent body) > AGENTWEAVE_SESSION_ID
env var > None.

Auto-instrumented LLM wrappers (Anthropic / OpenAI / Google) and
@trace_tool now stamp session.id and prov.session.id when any source is
set. @trace_agent wraps its body in session_scope so nested LLM/tool
spans inherit the active session. Failure mode is unchanged when no
source is set — span emits without session.id, no exception. A
debug-gated (AGENTWEAVE_DEBUG=1) one-shot warning surfaces missing
session_id during development.
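
For illustration, the resolution order described above might look roughly like this. It is a sketch under assumptions, not the contents of agentweave/context.py: `set_session_id` and `session_scope` are named in this commit message, but their signatures here are guesses, and `resolve_session_id` is a hypothetical helper.

```python
import os
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Context-local session id; each thread or asyncio task sees its own value.
_session_id: ContextVar[Optional[str]] = ContextVar(
    "agentweave_session_id", default=None
)


def set_session_id(session_id: Optional[str]) -> None:
    """Set the session id for the current context (assumed signature)."""
    _session_id.set(session_id)


@contextmanager
def session_scope(session_id: str) -> Iterator[None]:
    """Bind session_id for the duration of the block, then restore the old value."""
    token = _session_id.set(session_id)
    try:
        yield
    finally:
        _session_id.reset(token)


def resolve_session_id(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: explicit kwarg > ContextVar > env var > None."""
    return (
        explicit
        or _session_id.get()
        or os.environ.get("AGENTWEAVE_SESSION_ID")
        or None
    )
```

A span wrapper would call `resolve_session_id()` when the span starts and skip the `session.id` attribute if the result is None, which matches the unchanged failure mode described above.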

Tracks nexus#29. Phase 2 (JS SDK) deferred — same spec covers it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…loses #174)

When AGENTWEAVE_ANTHROPIC_API_KEY / AGENTWEAVE_GOOGLE_API_KEY / AGENTWEAVE_OPENAI_API_KEY
are empty or unset (i.e. key_injection=false on /health), the proxy now falls back to
the standard SDK env vars (ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_KEY).

This fixes sub-agent LLM requests failing with auth errors when the k8s Secret only
populates the standard SDK vars. The AGENTWEAVE_* variants still take priority when set.

Key resolution order:
  1. AGENTWEAVE_ANTHROPIC_API_KEY (explicit proxy injection var)
  2. ANTHROPIC_API_KEY            (standard SDK var, now used as fallback)

Added TestEnvKeyFallback with 5 tests covering all three providers plus
precedence and /health reflection.
@arniesaha arniesaha changed the title fix(sdk-python): propagate session_id to auto-instrumented and tool spans fix: key_injection fallback + OTel links + session_id propagation May 2, 2026
@arniesaha
Owner Author

Closes #174, Closes #178

The previous fix set process.env.AGENTWEAVE_PARENT_TRACE_ID/SPAN_ID on the
bridge process and read them via os.environ in the proxy — but the bridge
(JS, Backstage) and proxy (Python, k3s) are separate processes so the env
vars never crossed.  No links[] were ever produced.

Wire it through the existing /session POST channel that already pushes
session_id from bridge to proxy:

- Bridge: include parent_trace_id/parent_span_id in the /session payload
  alongside session_id.
- Proxy: extend POST /session to store (trace_id, span_id) keyed by
  session_id in a new LRU-bounded map; LLM handler looks them up at
  request time using the resolved session_id.  Request headers still take
  precedence for direct callers.
- Proxy: drop the os.environ.AGENTWEAVE_PARENT_*_ID fallback on the LLM
  handler — process-static env on a multi-tenant server would mis-link
  every concurrent request.
- Proxy: hoist `Link` import to module scope; use TraceFlags(0) on link
  contexts since links are independent of the parent's sampling decision.
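
A rough sketch of the session-keyed lookup described in the list above, using only the standard library. The `ParentSpanStore` class, the `resolve_parent` function, and the 1024-entry default are hypothetical; the header names are the ones from the PR summary, and the sketch assumes the headers mapping has lowercased keys.

```python
from collections import OrderedDict
from typing import Mapping, Optional, Tuple

ParentIds = Tuple[str, str]  # (trace_id hex, span_id hex)


class ParentSpanStore:
    """LRU-bounded map from session_id to the parent (trace_id, span_id)."""

    def __init__(self, max_entries: int = 1024) -> None:
        self._max = max_entries
        self._items: "OrderedDict[str, ParentIds]" = OrderedDict()

    def put(self, session_id: str, trace_id: str, span_id: str) -> None:
        self._items[session_id] = (trace_id, span_id)
        self._items.move_to_end(session_id)  # mark as most recently used
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, session_id: str) -> Optional[ParentIds]:
        ids = self._items.get(session_id)
        if ids is not None:
            self._items.move_to_end(session_id)
        return ids


def resolve_parent(
    headers: Mapping[str, str],
    session_id: Optional[str],
    store: ParentSpanStore,
) -> Optional[ParentIds]:
    """Request headers win for direct callers; otherwise look up by session_id."""
    trace_id = headers.get("x-agentweave-parent-trace-id")
    span_id = headers.get("x-agentweave-parent-span-id")
    if trace_id and span_id:
        return (trace_id, span_id)
    if session_id:
        return store.get(session_id)
    return None
```

Keying the map by session_id rather than reading process-wide state means concurrent requests from different sessions each resolve their own parent span. A real server would likely also guard the map with a lock if handlers run on multiple threads.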

Validated end-to-end against the deployed proxy on the NAS k3s cluster:
pushed (trace_id, span_id) via POST /session, sent an LLM request with a
matching session_id, and confirmed the resulting llm_call span in Tempo
carries links[0] with byte-exact matching trace/span IDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arniesaha arniesaha merged commit b368426 into main May 2, 2026
5 checks passed
@arniesaha arniesaha deleted the feat/session-id-propagation branch May 2, 2026 10:38