fix: key_injection fallback + OTel links + session_id propagation #182
Merged
…pans

~40% of lakehouse.spans rows had NULL session_id because auto_instrument() LLM wrappers and @trace_tool spans never read any session context. Only @trace_agent(session_id=...) and the proxy server were stamping it.

Introduces agentweave.context with a ContextVar-based session_id source of truth. Resolution order:

1. explicit @trace_agent kwarg
2. ContextVar (set_session_id / session_scope / @trace_agent body)
3. AGENTWEAVE_SESSION_ID env var
4. None

Auto-instrumented LLM wrappers (Anthropic / OpenAI / Google) and @trace_tool now stamp session.id and prov.session.id when any source is set. @trace_agent wraps its body in session_scope so nested LLM/tool spans inherit the active session.

Failure mode is unchanged when no source is set: the span emits without session.id and no exception is raised. A debug-gated (AGENTWEAVE_DEBUG=1) one-shot warning surfaces missing session_id during development.

Tracks nexus#29. Phase 2 (JS SDK) deferred; the same spec covers it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
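The resolution order described in this commit can be sketched as follows. This is a minimal illustration, not the actual contents of agentweave.context: set_session_id and session_scope are named in the commit message, while resolve_session_id, the ContextVar name, and all signatures are assumptions made for the example.

```python
import os
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Hypothetical module-level state; the real agentweave.context may differ.
_session_id: ContextVar[Optional[str]] = ContextVar(
    "agentweave_session_id", default=None
)


def set_session_id(session_id: Optional[str]) -> None:
    """Set the session for the current context (thread or asyncio task)."""
    _session_id.set(session_id)


@contextmanager
def session_scope(session_id: str) -> Iterator[None]:
    """Bind session_id for the duration of the block, then restore the old value."""
    token = _session_id.set(session_id)
    try:
        yield
    finally:
        _session_id.reset(token)


def resolve_session_id(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: explicit kwarg > ContextVar > env var > None."""
    if explicit:
        return explicit
    active = _session_id.get()
    if active:
        return active
    # An empty env var is treated the same as an unset one.
    return os.environ.get("AGENTWEAVE_SESSION_ID") or None
```

Under these assumptions, a span wrapper would call resolve_session_id() when the span starts and skip the session.id attribute when the result is None, which matches the no-exception failure mode described above.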
…loses #174)

When AGENTWEAVE_ANTHROPIC_API_KEY / AGENTWEAVE_GOOGLE_API_KEY / AGENTWEAVE_OPENAI_API_KEY are empty or unset (i.e. key_injection=false on /health), the proxy now falls back to the standard SDK env vars (ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_KEY).

This fixes sub-agent LLM requests failing with auth errors when the k8s Secret only populates the standard SDK vars. The AGENTWEAVE_* variants still take priority when set.

Key resolution order:

1. AGENTWEAVE_ANTHROPIC_API_KEY (explicit proxy injection var)
2. ANTHROPIC_API_KEY (standard SDK var, now used as fallback)

Added TestEnvKeyFallback with 5 tests covering all three providers plus precedence and /health reflection.
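The key resolution order above amounts to a two-step lookup per provider. A minimal sketch, assuming a hypothetical resolve_api_key helper (the proxy's real function name and structure are not shown in this PR); only the environment variable names come from the commit message.

```python
import os
from typing import Mapping, Optional

# Providers named in the commit message.
_PROVIDERS = ("ANTHROPIC", "GOOGLE", "OPENAI")


def resolve_api_key(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Hypothetical helper: AGENTWEAVE_<P>_API_KEY first, then <P>_API_KEY."""
    env = os.environ if env is None else env
    name = provider.upper()
    if name not in _PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    # `or` treats an empty string the same as an unset variable,
    # matching the "empty or unset" wording above.
    return (
        env.get(f"AGENTWEAVE_{name}_API_KEY")
        or env.get(f"{name}_API_KEY")
        or None
    )
```

Passing the environment as a mapping keeps the helper easy to test without mutating os.environ; for example, resolve_api_key("anthropic", {"ANTHROPIC_API_KEY": "sk-std"}) returns the standard SDK key because the AGENTWEAVE_* variant is absent.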
…gent_turn spans (closes #178)
The previous fix set process.env.AGENTWEAVE_PARENT_TRACE_ID/SPAN_ID on the bridge process and read them via os.environ in the proxy, but the bridge (JS, Backstage) and proxy (Python, k3s) are separate processes, so the env vars never crossed. No links[] were ever produced.

Wire it through the existing /session POST channel that already pushes session_id from bridge to proxy:

- Bridge: include parent_trace_id/parent_span_id in the /session payload alongside session_id.
- Proxy: extend POST /session to store (trace_id, span_id) keyed by session_id in a new LRU-bounded map; LLM handler looks them up at request time using the resolved session_id. Request headers still take precedence for direct callers.
- Proxy: drop the os.environ.AGENTWEAVE_PARENT_*_ID fallback on the LLM handler, since process-static env on a multi-tenant server would mis-link every concurrent request.
- Proxy: hoist `Link` import to module scope; use TraceFlags(0) on link contexts since links are independent of the parent's sampling decision.

Validated end-to-end against the deployed proxy on the NAS k3s cluster: pushed (trace_id, span_id) via POST /session, sent an LLM request with a matching session_id, and confirmed the resulting llm_call span in Tempo carries links[0] with byte-exact matching trace/span IDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
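The proxy-side storage and lookup described above could look roughly like this. It is a stdlib-only sketch under stated assumptions: ParentSpanStore, resolve_parent, the 1024-entry default, and the hex-string ID format are all hypothetical, and the OTel Link construction itself is left out so the example runs without the opentelemetry package.

```python
from collections import OrderedDict
from typing import Optional, Tuple

ParentIds = Tuple[int, int]  # (trace_id, span_id) as integers


class ParentSpanStore:
    """Hypothetical LRU-bounded map from session_id to parent IDs."""

    def __init__(self, max_entries: int = 1024) -> None:
        self._max = max_entries
        self._items: "OrderedDict[str, ParentIds]" = OrderedDict()

    def put(self, session_id: str, trace_id_hex: str, span_id_hex: str) -> None:
        # Assumes the bridge sends IDs as hex strings.
        self._items[session_id] = (int(trace_id_hex, 16), int(span_id_hex, 16))
        self._items.move_to_end(session_id)
        # Evict least recently used entries once over capacity.
        while len(self._items) > self._max:
            self._items.popitem(last=False)

    def get(self, session_id: str) -> Optional[ParentIds]:
        parent = self._items.get(session_id)
        if parent is not None:
            self._items.move_to_end(session_id)  # mark as recently used
        return parent


def resolve_parent(
    store: ParentSpanStore,
    session_id: Optional[str],
    header_trace_id: Optional[str],
    header_span_id: Optional[str],
) -> Optional[ParentIds]:
    """Request headers win for direct callers; otherwise look up by session_id."""
    if header_trace_id and header_span_id:
        return int(header_trace_id, 16), int(header_span_id, 16)
    if session_id is None:
        return None
    return store.get(session_id)
```

Keying the map by session_id rather than reading process-wide state is what keeps concurrent requests from different sessions from sharing a parent. The integer pair returned here is in the form that the OpenTelemetry Python SpanContext takes for trace_id and span_id, so a handler could wrap it in a Link when starting the llm_call span.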
Summary

Three fixes on feat/session-id-propagation:

#174: key_injection fallback (fixes sub-agent auth)

Proxy had API keys in AGENTWEAVE_* env vars but key_injection=false meant it wouldn't use them. Sub-agents routing through proxy had no keys, so all LLM requests failed.

Fix: When AGENTWEAVE_ANTHROPIC_API_KEY / AGENTWEAVE_OPENAI_API_KEY / AGENTWEAVE_GOOGLE_API_KEY are empty, fall back to standard SDK env vars (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY). Resolution: explicit AGENTWEAVE_* var > standard SDK var.

#178: OTel links[] for cross-process span lineage

610/627 llm_call spans show as root-of-trace because openclaw HTTP client doesn't propagate W3C traceparent as per-request header.

Fix: Bridge sets x-agentweave-parent-trace-id + x-agentweave-parent-span-id headers on outbound LLM requests. Proxy reads these and creates OTel Link[] from llm_call spans to parent agent_turn spans.

#176 follow-up: session_id propagation (already in this branch)

Adds agentweave.context with ContextVar-based session scope. auto_instrument() and @trace_tool now stamp session.id / prov.session.id from active context.

Files changed

- sdk/python/agentweave/proxy.py: key_injection fallback + OTel Link creation
- plugins/openclaw-agentweave-bridge/src/service.ts: parent span/trace ID header injection
- sdk/python/agentweave/context.py: session_id ContextVar
- sdk/python/tests/test_proxy.py: new tests for both fixes