litellm/trace_logger: normalize tool definitions + input messages to OTel spec by elronbandel · Pull Request #230 · Exgentic/exgentic

elronbandel · 2026-06-01T08:19:39Z

Summary

LiteLLM hands the trace logger whatever the underlying provider emitted — OpenAI native `{type:function, function:{...}}` for tool definitions, Anthropic native `[{tool_use_id, type:tool_result, content:[...]}]` for tool-result-bearing messages, and so on. The current logger passes those through verbatim, producing OTel spans that violate the GenAI semantic conventions in two ways:

Violation	Where it surfaced
`gen_ai.tool.definitions` from Anthropic providers ships with no top-level `type` field (spec requires `["type", "name"]`)	Exgentic `agent-llm-traces-v2` v2.11 retroactively fixed 9.27M entries across 98k chat spans
`gen_ai.input.messages` ships in raw OpenAI/Anthropic shapes (not OTel `[{role, parts: [...]}]`), causing vendor-specific blocks (`tool_use`, `tool_result`, `thinking`) to leak through as text	The Issue #14 family — also fixed retroactively in the dataset

What's new

Two pure normalizer helpers wired into `_set_content_attributes`:

`_normalize_tool_definitions`

OpenAI envelope `{type, function:{name, description, parameters}}` → flat spec shape
Anthropic native `{name, description, input_schema}` (no `type`) → `{type:"function", name, description, parameters}` (renaming `input_schema` → `parameters` per spec)
Provider-extension `type` values (e.g. Anthropic's `computer_20241022`, OpenAI's `code_interpreter`/`file_search`) preserved as-is

`_convert_input_messages_to_parts`

role=user/assistant with string content → single text part
Anthropic blocks → matching OTel parts:
- `text` → text part
- `tool_use` → `tool_call` part
- `tool_result` → `tool_call_response` part; `result` is always a JSON-stringified blocks list (the v2.10 contract, so `json.loads(result)` always yields `[{type, ...}]`)
- `thinking` → `thinking` part (signature preserved)
OpenAI `role=tool` → `tool_call_response` part
OpenAI assistant `tool_calls` → `tool_call` parts (arguments JSON-decoded so downstream can read `part["arguments"]["command"]` directly)

Also: JSON-decode `tool_calls[].function.arguments` in output messages

Matches the dataset v2.9 fix that retroactively unwrapped `{"_value": ""}` shapes on 74k entries — same root cause when LiteLLM serialized arguments as a JSON string and the consumer expected a dict.

Tests

14 new tests covering every conversion path:

OpenAI envelope unwrap, Anthropic native type default + `input_schema` rename, idempotency on already-spec shape, provider-extension type preservation, non-list passthrough
Plain-string content, multi-block content preservation, single-string `tool_result` content wrap, OpenAI tool message, assistant tool_calls, Anthropic thinking with signatures

Why this matters

The Exgentic team has spent the past several days retroactively patching `agent-llm-traces-v2` to fix spans this logger produced (commits v2.7 through v2.11 on the HF dataset). Without this PR, every new run would re-introduce the same bugs. The fixes here are paired with tracelab PR #17 which adds the corresponding `validate_genai` checks to catch any future drift.

Test plan

CI green on the existing suite
`pytest tests/integrations/litellm/` passes
Run one bench end-to-end and `tracelab validate <spans.jsonl> --allow-prefix exgentic.` returns 0

Link to the results dataset, leaderboard space, and submission process so contributors can find where to submit evaluation results. Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

LiteLLM hands `_set_content_attributes` whatever the underlying provider emitted — OpenAI native `{type:function, function:{...}}` for tool definitions, Anthropic native `[{tool_use_id, type:tool_result, content:[...]}]` for tool-result-bearing messages, and so on. The previous logger passed those through verbatim, producing OTel spans that violate the GenAI semantic conventions: - `gen_ai.tool.definitions` from Anthropic providers shipped without the spec-required top-level `type` field (Exgentic agent-llm-traces-v2 v2.11 had to retroactively fix 9.27M entries across 98k chat spans for exactly this). - `gen_ai.input.messages` shipped in raw OpenAI / Anthropic shapes instead of OTel `[{role, parts: [...]}]`, causing vendor-specific blocks (`tool_use`, `tool_result`, `thinking`) to leak through as text — the Issue #14 anti-pattern. Adds two pure normalizer helpers wired into `_set_content_attributes`: _normalize_tool_definitions: - OpenAI envelope → flat spec shape - Anthropic native → adds `type:function`, renames `input_schema` → `parameters` - Provider-extension `type` values (e.g. `computer_20241022`) preserved _convert_input_messages_to_parts: - role=user/assistant string content → text parts - Anthropic blocks (text, tool_use, tool_result, thinking) → matching OTel parts - tool_call_response.result always a JSON-stringified blocks list (v2.10 contract — `json.loads(result)` always yields `[{type, ...}]`) - OpenAI role=tool / tool_calls → tool_call_response / tool_call parts Also: JSON-decode tool_calls[].function.arguments in the existing output_messages builder, matching the v2.9 fix that retroactively unwrapped `{"_value": <JSON-string>}` shapes on 74k entries in the dataset. 14 tests covering every conversion path. Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

…ive copies) Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

elronbandel added 4 commits May 18, 2026 15:55

Add HuggingFace resources and submission link to README

51b167a

Link to the results dataset, leaderboard space, and submission process so contributors can find where to submit evaluation results. Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

trace_logger: tighten normalizers (block dispatch helper, drop defens…

4c725f1

…ive copies) Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

trace_logger: ruff fixups (no semicolons / inline-if-colon)

d94becc

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

litellm/trace_logger: normalize tool definitions + input messages to OTel spec#230

litellm/trace_logger: normalize tool definitions + input messages to OTel spec#230
elronbandel wants to merge 4 commits into
mainfrom
elron/trace-logger-spec-conformance

elronbandel commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

elronbandel commented Jun 1, 2026

Summary

What's new

`_normalize_tool_definitions`

`_convert_input_messages_to_parts`

Also: JSON-decode `tool_calls[].function.arguments` in output messages

Tests

Why this matters

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant