[BOT ISSUE] Anthropic integration does not capture time_to_first_token metric #53

@braintrust-bot

Summary

The Anthropic integration (trace/contrib/anthropic) does not capture the time_to_first_token metric in either the streaming or the non-streaming path. Both OpenAI tracers in the same SDK (chat completions and responses) capture this metric consistently.

What is missing

1. No startTime field on messagesTracer

The messagesTracer struct in trace/contrib/anthropic/messages.go (lines 20–24) has no startTime field:

type messagesTracer struct {
    cfg       *middlewareConfig
    streaming bool
    metadata  map[string]any
}

Compare with the OpenAI chatCompletionsTracer in trace/contrib/openai/chatcompletions.go (lines 20–25):

type chatCompletionsTracer struct {
    cfg       *middlewareConfig
    streaming bool
    metadata  map[string]any
    startTime time.Time  // <-- missing from Anthropic
}

2. Streaming path does not track first chunk arrival

In the Anthropic streaming handler parseStreamingResponse (messages.go lines 115–189), there is no tracking of when the first SSE data chunk arrives. The OpenAI streaming handler (chatcompletions.go lines 109–166) captures this:

if timeToFirstToken == 0 {
    timeToFirstToken = time.Since(ct.startTime)
}

and writes it to metrics:

metrics["time_to_first_token"] = timeToFirstToken.Seconds()

The Anthropic handler has no equivalent logic.

3. Non-streaming path also missing

The OpenAI non-streaming handler records TTFT as full response latency (chatcompletions.go line 277), providing a consistent metric across modes. The Anthropic non-streaming handler (parseResponse / handleMessageResponse, lines 314–369) does not record any timing metric.
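Following the OpenAI convention of recording full response latency as TTFT in the non-streaming path, the equivalent Anthropic fix could look like the sketch below. recordLatency and the blocking call are hypothetical stand-ins for the real handler and Messages request.

```go
package main

import (
	"fmt"
	"time"
)

// recordLatency times a blocking call and stores the full response
// latency as time_to_first_token, matching the OpenAI non-streaming
// behavior so the metric is consistent across modes.
func recordLatency(metrics map[string]any, call func() string) {
	start := time.Now()
	_ = call()
	metrics["time_to_first_token"] = time.Since(start).Seconds()
}

func main() {
	metrics := map[string]any{}
	recordLatency(metrics, func() string {
		time.Sleep(time.Millisecond) // stand-in for the Messages request
		return "response"
	})
	fmt.Println(metrics["time_to_first_token"].(float64) > 0) // true
}
```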

4. OpenAI has test coverage for TTFT; Anthropic does not

The OpenAI integration has explicit test assertions for time_to_first_token (traceopenai_test.go lines 247–252, 344; chatcompletions_test.go line 1092). The Anthropic test suite has no TTFT assertions.

Impact

time_to_first_token is a key latency metric for LLM observability, especially for streaming use cases. Users tracing Anthropic calls through Braintrust see token counts and content but not TTFT, while equivalent OpenAI traces include it. This creates an inconsistent observability experience across providers.

Braintrust docs status

Braintrust docs state the Anthropic integration provides "metric collection (including cached tokens)" during streaming. TTFT is not explicitly mentioned for any provider. The Braintrust observability docs mention "Token counts, latency, and cost" as viewable metrics. Status: unclear (latency metrics are mentioned generically but TTFT is not called out by name).

Upstream sources

  • Anthropic streaming API docs: https://docs.anthropic.com/en/api/messages-streaming
  • Anthropic streaming uses SSE with event types message_start, content_block_start, content_block_delta, message_delta, etc.
  • The first content_block_delta event marks the arrival of the first generated token, making TTFT measurable from the same SSE stream already being parsed

Local repo files inspected

  • trace/contrib/anthropic/messages.go — messagesTracer struct (no startTime), parseStreamingResponse (no TTFT), handleMessageResponse (no TTFT)
  • trace/contrib/openai/chatcompletions.go — reference implementation with startTime and time_to_first_token in both paths
  • trace/contrib/openai/responses.go — reference implementation with startTime and time_to_first_token in both paths
  • trace/contrib/openai/traceopenai_test.go — TTFT test assertions (lines 247–252, 344)
  • trace/contrib/openai/chatcompletions_test.go — TTFT test assertion (line 1092)
  • trace/contrib/anthropic/traceanthropic_test.go — no TTFT assertions
