Skip to content

litellm/trace_logger: normalize tool definitions + input messages to OTel spec#230

Open
elronbandel wants to merge 4 commits into
mainfrom
elron/trace-logger-spec-conformance
Open

litellm/trace_logger: normalize tool definitions + input messages to OTel spec#230
elronbandel wants to merge 4 commits into
mainfrom
elron/trace-logger-spec-conformance

Conversation

@elronbandel

Copy link
Copy Markdown
Contributor

Summary

LiteLLM hands the trace logger whatever the underlying provider emitted — OpenAI native `{type:function, function:{...}}` for tool definitions, Anthropic native `[{tool_use_id, type:tool_result, content:[...]}]` for tool-result-bearing messages, and so on. The current logger passes those through verbatim, producing OTel spans that violate the GenAI semantic conventions in two ways:

Violation Where it surfaced
`gen_ai.tool.definitions` from Anthropic providers ships with no top-level `type` field (spec requires `["type", "name"]`) Exgentic `agent-llm-traces-v2` v2.11 retroactively fixed 9.27M entries across 98k chat spans
`gen_ai.input.messages` ships in raw OpenAI/Anthropic shapes (not OTel `[{role, parts: [...]}]`), causing vendor-specific blocks (`tool_use`, `tool_result`, `thinking`) to leak through as text The Issue #14 family — also fixed retroactively in the dataset

What's new

Two pure normalizer helpers wired into `_set_content_attributes`:

`_normalize_tool_definitions`

  • OpenAI envelope `{type, function:{name, description, parameters}}` → flat spec shape
  • Anthropic native `{name, description, input_schema}` (no `type`) → `{type:"function", name, description, parameters}` (renaming `input_schema` → `parameters` per spec)
  • Provider-extension `type` values (e.g. Anthropic's `computer_20241022`, OpenAI's `code_interpreter`/`file_search`) preserved as-is

`_convert_input_messages_to_parts`

  • role=user/assistant with string content → single text part
  • Anthropic blocks → matching OTel parts:
    • `text` → text part
    • `tool_use` → `tool_call` part
    • `tool_result` → `tool_call_response` part; `result` is always a JSON-stringified blocks list (the v2.10 contract, so `json.loads(result)` always yields `[{type, ...}]`)
    • `thinking` → `thinking` part (signature preserved)
  • OpenAI `role=tool` → `tool_call_response` part
  • OpenAI assistant `tool_calls` → `tool_call` parts (arguments JSON-decoded so downstream can read `part["arguments"]["command"]` directly)

Also: JSON-decode `tool_calls[].function.arguments` in output messages

Matches the dataset v2.9 fix that retroactively unwrapped `{"_value": ""}` shapes on 74k entries — same root cause when LiteLLM serialized arguments as a JSON string and the consumer expected a dict.

Tests

14 new tests covering every conversion path:

  • OpenAI envelope unwrap, Anthropic native type default + `input_schema` rename, idempotency on already-spec shape, provider-extension type preservation, non-list passthrough
  • Plain-string content, multi-block content preservation, single-string `tool_result` content wrap, OpenAI tool message, assistant tool_calls, Anthropic thinking with signatures

Why this matters

The Exgentic team has spent the past several days retroactively patching `agent-llm-traces-v2` to fix spans this logger produced (commits v2.7 through v2.11 on the HF dataset). Without this PR, every new run would re-introduce the same bugs. The fixes here are paired with tracelab PR #17 which adds the corresponding `validate_genai` checks to catch any future drift.

Test plan

  • CI green on the existing suite
  • `pytest tests/integrations/litellm/` passes
  • Run one bench end-to-end and `tracelab validate <spans.jsonl> --allow-prefix exgentic.` returns 0

Link to the results dataset, leaderboard space, and submission
process so contributors can find where to submit evaluation results.

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
LiteLLM hands `_set_content_attributes` whatever the underlying provider
emitted — OpenAI native `{type:function, function:{...}}` for tool
definitions, Anthropic native `[{tool_use_id, type:tool_result, content:[...]}]`
for tool-result-bearing messages, and so on. The previous logger passed
those through verbatim, producing OTel spans that violate the GenAI
semantic conventions:

  - `gen_ai.tool.definitions` from Anthropic providers shipped without
    the spec-required top-level `type` field (Exgentic agent-llm-traces-v2
    v2.11 had to retroactively fix 9.27M entries across 98k chat spans
    for exactly this).
  - `gen_ai.input.messages` shipped in raw OpenAI / Anthropic shapes
    instead of OTel `[{role, parts: [...]}]`, causing vendor-specific
    blocks (`tool_use`, `tool_result`, `thinking`) to leak through as
    text — the Issue #14 anti-pattern.

Adds two pure normalizer helpers wired into `_set_content_attributes`:

  _normalize_tool_definitions:
    - OpenAI envelope → flat spec shape
    - Anthropic native → adds `type:function`, renames `input_schema` → `parameters`
    - Provider-extension `type` values (e.g. `computer_20241022`) preserved

  _convert_input_messages_to_parts:
    - role=user/assistant string content → text parts
    - Anthropic blocks (text, tool_use, tool_result, thinking) → matching OTel parts
    - tool_call_response.result always a JSON-stringified blocks list
      (v2.10 contract — `json.loads(result)` always yields `[{type, ...}]`)
    - OpenAI role=tool / tool_calls → tool_call_response / tool_call parts

Also: JSON-decode tool_calls[].function.arguments in the existing
output_messages builder, matching the v2.9 fix that retroactively
unwrapped `{"_value": <JSON-string>}` shapes on 74k entries in the dataset.

14 tests covering every conversion path.

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
…ive copies)

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant