Summary
I'm proposing a vendor-neutral, OpenTelemetry-compatible semantic convention for LLM observability metrics and would love to understand whether Google ADK's telemetry layer would support emitting data conformant with this convention.
The Problem
Every LLM agent framework and observability tool today uses its own metric names and attribute schemas. Teams building multi-provider or multi-framework LLM apps — including those built on ADK — have to adapt their instrumentation for every backend change. There's no shared language at the metric layer.
The Proposal
A canonical gen_ai.* metric schema built on top of the OpenTelemetry GenAI semantic conventions that standardizes:
Required core metrics (MUST be emitted):
| Metric | Instrument | Unit | Description |
|---|---|---|---|
| gen_ai.client.operation.duration | Histogram | s | End-to-end latency |
| gen_ai.client.time_to_first_token | Histogram | s | Streaming TTFT |
| gen_ai.usage.input_tokens | Counter | {token} | Prompt tokens |
| gen_ai.usage.output_tokens | Counter | {token} | Completion tokens |
| gen_ai.usage.cost | Counter | usd | Estimated cost |
| gen_ai.client.error_rate | Gauge | 1 | Error ratio |
| gen_ai.client.retry_count | Counter | {request} | Retries |
| gen_ai.client.rate_limit.events | Counter | {event} | HTTP 429s |
Required span attributes: gen_ai.system, gen_ai.request.model, gen_ai.operation.name
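As a rough illustration of what conformance could mean in practice, the sketch below encodes the core metrics and required span attributes as plain Python data and reports which ones an exporter's output is missing or declares differently. It uses only the standard library. The names `MetricSpec`, `CORE_METRICS`, `REQUIRED_SPAN_ATTRIBUTES`, `nonconformant_core_metrics`, and `missing_span_attributes` are hypothetical helpers for this example, not part of ADK or the OpenTelemetry SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    name: str
    instrument: str  # "Histogram", "Counter", or "Gauge"
    unit: str


# Required core metrics, copied from the proposal.
CORE_METRICS = (
    MetricSpec("gen_ai.client.operation.duration", "Histogram", "s"),
    MetricSpec("gen_ai.client.time_to_first_token", "Histogram", "s"),
    MetricSpec("gen_ai.usage.input_tokens", "Counter", "{token}"),
    MetricSpec("gen_ai.usage.output_tokens", "Counter", "{token}"),
    MetricSpec("gen_ai.usage.cost", "Counter", "usd"),
    MetricSpec("gen_ai.client.error_rate", "Gauge", "1"),
    MetricSpec("gen_ai.client.retry_count", "Counter", "{request}"),
    MetricSpec("gen_ai.client.rate_limit.events", "Counter", "{event}"),
)

REQUIRED_SPAN_ATTRIBUTES = (
    "gen_ai.system",
    "gen_ai.request.model",
    "gen_ai.operation.name",
)


def nonconformant_core_metrics(emitted):
    """Return core metric names that are absent or declared differently.

    `emitted` maps metric name -> (instrument, unit), as an exporter
    might report it. This input shape is an assumption of the sketch.
    """
    return [
        spec.name
        for spec in CORE_METRICS
        if emitted.get(spec.name) != (spec.instrument, spec.unit)
    ]


def missing_span_attributes(attributes):
    """Return required span attribute keys not present in `attributes`."""
    return [key for key in REQUIRED_SPAN_ATTRIBUTES if key not in attributes]
```

A check like this could run in a framework's test suite: an empty list from both functions would mean the emitted data covers the required core set, while any returned names point at gaps or mismatched instrument types.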
Agent-specific extension pack (gen_ai.agent.*):
| Metric | Instrument | Description |
|---|---|---|
| gen_ai.agent.duration | Histogram | End-to-end agent execution time |
| gen_ai.agent.steps | Histogram | Number of reasoning/tool steps per run |
| gen_ai.agent.tool_calls | Counter | Tool invocations per agent run |
| gen_ai.agent.handoffs | Counter | Agent-to-agent handoff events |
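To make the per-run semantics concrete, here is a minimal sketch of how a single agent run might be tallied into values for these four metrics. `AgentRunTally` and its methods are hypothetical, and the event model is an assumption the proposal does not define: reasoning and tool events each count as one step, tool events also count as tool calls, and handoffs are tracked separately. Reporting the duration in seconds is likewise assumed, by analogy with the core latency metrics.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentRunTally:
    """Accumulates the gen_ai.agent.* values for one agent run."""

    started_at: float = field(default_factory=time.monotonic)
    steps: int = 0
    tool_calls: int = 0
    handoffs: int = 0

    def record(self, kind: str) -> None:
        """Record one event; `kind` is "reasoning", "tool", or "handoff"."""
        if kind == "reasoning":
            self.steps += 1
        elif kind == "tool":
            # Assumption: a tool invocation is both a step and a tool call.
            self.steps += 1
            self.tool_calls += 1
        elif kind == "handoff":
            self.handoffs += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    def finish(self, ended_at=None) -> dict:
        """Return metric name -> value for this run (duration in seconds)."""
        end = time.monotonic() if ended_at is None else ended_at
        return {
            "gen_ai.agent.duration": end - self.started_at,
            "gen_ai.agent.steps": self.steps,
            "gen_ai.agent.tool_calls": self.tool_calls,
            "gen_ai.agent.handoffs": self.handoffs,
        }
```

In an OpenTelemetry-based exporter, the duration and step values would presumably be recorded once per run on their histograms, and the tool-call and handoff values added to their counters.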
Why This Matters for ADK Specifically
ADK is one of the most capable multi-agent frameworks available, and it already has a telemetry layer built on OpenTelemetry. If ADK emitted the canonical gen_ai.* metrics natively, any observability backend (Langfuse, Arize, Grafana, Datadog, GCP Cloud Monitoring) could consume ADK agent traces and metrics without a custom adapter. That would substantially lower the operational burden for teams running production agents on ADK.
This is also directly aligned with the OTel GenAI SIG's direction: an OTel maintainer (@trask) has already engaged on this proposal and pointed to https://github.com/open-telemetry/semantic-conventions-genai as the right upstream venue for standardization.
Specific Questions for the ADK Team
- Does ADK's current telemetry layer already emit any gen_ai.*-namespaced attributes or metrics? If so, which ones?
- Are the proposed metric names and instrument types (Histogram in seconds for latency, Counter with unit {token} for tokens) compatible with how ADK's OpenTelemetry integration currently works?
- Would the ADK team be open to adopting these canonical names in ADK's OTel export layer once the spec matures through the OTel GenAI SIG?
- For multi-agent scenarios (agent handoffs, subagent calls), does the proposed gen_ai.agent.* extension pack cover the signals ADK currently captures?
Links
Happy to answer questions or adjust the RFC based on ADK's telemetry architecture. The goal is for ADK-instrumented agents to be observable on any OTel-compatible backend without custom adapter code.