[BOT ISSUE] Anthropic integration does not capture time_to_first_token metric #53
Description
Summary
The Anthropic (anthropic) integration does not capture the time_to_first_token metric in either streaming or non-streaming paths. Both OpenAI tracers in this same SDK (chat completions and responses) capture this metric consistently.
What is missing
1. No startTime field on messagesTracer
The messagesTracer struct in trace/contrib/anthropic/messages.go (lines 20–24) has no startTime field:
```go
type messagesTracer struct {
	cfg       *middlewareConfig
	streaming bool
	metadata  map[string]any
}
```

Compare with the OpenAI chatCompletionsTracer in trace/contrib/openai/chatcompletions.go (lines 20–25):
```go
type chatCompletionsTracer struct {
	cfg       *middlewareConfig
	streaming bool
	metadata  map[string]any
	startTime time.Time // <-- missing from Anthropic
}
```

2. Streaming path does not track first chunk arrival
In the Anthropic streaming handler parseStreamingResponse (messages.go lines 115–189), there is no tracking of when the first SSE data chunk arrives. The OpenAI streaming handler (chatcompletions.go lines 109–166) captures this:
```go
if timeToFirstToken == 0 {
	timeToFirstToken = time.Since(ct.startTime)
}
```

and writes it to metrics:

```go
metrics["time_to_first_token"] = timeToFirstToken.Seconds()
```

The Anthropic handler has no equivalent logic.
3. Non-streaming path also missing
The OpenAI non-streaming handler records TTFT as the full response latency (chatcompletions.go line 277), giving a consistent metric across streaming and non-streaming modes. The Anthropic non-streaming handler (parseResponse / handleMessageResponse, lines 314–369) records no timing metric at all.
4. OpenAI has test coverage for TTFT; Anthropic does not
The OpenAI integration has explicit test assertions for time_to_first_token (traceopenai_test.go lines 247–252, 344; chatcompletions_test.go line 1092). The Anthropic test suite has no TTFT assertions.
Impact
time_to_first_token is a key latency metric for LLM observability, especially for streaming use cases. Users tracing Anthropic calls through Braintrust see token counts and content but not TTFT, while equivalent OpenAI traces include it. This creates an inconsistent observability experience across providers.
Braintrust docs status
Braintrust docs state the Anthropic integration provides "metric collection (including cached tokens)" during streaming. TTFT is not explicitly mentioned for any provider. The Braintrust observability docs mention "Token counts, latency, and cost" as viewable metrics. Status: unclear (latency metrics are mentioned generically but TTFT is not called out by name).
Upstream sources
- Anthropic streaming API docs: https://docs.anthropic.com/en/api/messages-streaming
- Anthropic streaming uses SSE with event types `message_start`, `content_block_start`, `content_block_delta`, `message_delta`, etc. The first `content_block_delta` event marks the arrival of the first generated token, making TTFT measurable from the same SSE stream already being parsed.
Braintrust docs sources
- https://www.braintrust.dev/docs/integrations/ai-providers/anthropic (mentions streaming metric collection)
- https://www.braintrust.dev/docs/observability (mentions "latency" as a viewable metric)
Local repo files inspected
- trace/contrib/anthropic/messages.go — messagesTracer struct (no startTime), parseStreamingResponse (no TTFT), handleMessageResponse (no TTFT)
- trace/contrib/openai/chatcompletions.go — reference implementation with startTime and time_to_first_token in both paths
- trace/contrib/openai/responses.go — reference implementation with startTime and time_to_first_token in both paths
- trace/contrib/openai/traceopenai_test.go — TTFT test assertions (lines 247–252, 344)
- trace/contrib/openai/chatcompletions_test.go — TTFT test assertion (line 1092)
- trace/contrib/anthropic/traceanthropic_test.go — no TTFT assertions