litellm/trace_logger: normalize tool definitions + input messages to OTel spec #230
Open
elronbandel wants to merge 4 commits into
Link to the results dataset, leaderboard space, and submission process so contributors can find where to submit evaluation results. Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
LiteLLM hands `_set_content_attributes` whatever the underlying provider
emitted — OpenAI native `{type:function, function:{...}}` for tool
definitions, Anthropic native `[{tool_use_id, type:tool_result, content:[...]}]`
for tool-result-bearing messages, and so on. The previous logger passed
those through verbatim, producing OTel spans that violate the GenAI
semantic conventions:
- `gen_ai.tool.definitions` from Anthropic providers shipped without
the spec-required top-level `type` field (Exgentic agent-llm-traces-v2
v2.11 had to retroactively fix 9.27M entries across 98k chat spans
for exactly this).
- `gen_ai.input.messages` shipped in raw OpenAI / Anthropic shapes
instead of OTel `[{role, parts: [...]}]`, causing vendor-specific
blocks (`tool_use`, `tool_result`, `thinking`) to leak through as
text — the Issue #14 anti-pattern.
Adds two pure normalizer helpers wired into `_set_content_attributes`:
_normalize_tool_definitions:
- OpenAI envelope → flat spec shape
- Anthropic native → adds `type:function`, renames `input_schema` → `parameters`
- Provider-extension `type` values (e.g. `computer_20241022`) preserved
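
The three rules above can be sketched roughly as follows. This is an illustration only, not the PR's actual `_normalize_tool_definitions`: the function name `normalize_tool_definition` is hypothetical, and the flat output shape (`type`, `name`, `description`, `parameters`) is assumed from the bullets.

```python
from __future__ import annotations

from typing import Any


def normalize_tool_definition(tool: dict[str, Any]) -> dict[str, Any]:
    """Return one tool definition in a flat shape (hypothetical helper)."""
    # OpenAI envelope: {"type": "function", "function": {...}} -> lift the inner dict.
    if isinstance(tool.get("function"), dict):
        inner = tool["function"]
        flat: dict[str, Any] = {
            "type": tool.get("type", "function"),
            "name": inner.get("name"),
        }
        for key in ("description", "parameters"):
            if key in inner:
                flat[key] = inner[key]
        return flat

    # Anthropic native: no top-level type, schema stored under "input_schema".
    flat = dict(tool)  # copy so the caller's dict is not mutated
    if "input_schema" in flat:
        flat["parameters"] = flat.pop("input_schema")
    # Provider-extension types (e.g. "computer_20241022") are kept as-is;
    # only a missing type is defaulted to "function".
    flat.setdefault("type", "function")
    return flat
```

Copying the input instead of editing it in place keeps the helper pure, which matches the description of the normalizers above.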
_convert_input_messages_to_parts:
- role=user/assistant string content → text parts
- Anthropic blocks (text, tool_use, tool_result, thinking) → matching OTel parts
- tool_call_response.result always a JSON-stringified blocks list
(v2.10 contract — `json.loads(result)` always yields `[{type, ...}]`)
- OpenAI role=tool / tool_calls → tool_call_response / tool_call parts
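
A minimal sketch of the conversion paths listed above, under stated assumptions: `convert_message` and its two helpers are hypothetical names, the part field names (`content`, `id`, `name`, `arguments`, `result`) are inferred from the bullets, and mapping `thinking` to a `reasoning` part is a guess at what "matching OTel parts" means. It is not the PR's `_convert_input_messages_to_parts`.

```python
from __future__ import annotations

import json
from typing import Any


def _result_as_blocks_json(content: Any) -> str:
    """Serialize a tool result as a JSON string that always holds a list of blocks."""
    if content is None:
        blocks: list[Any] = []
    elif isinstance(content, list):
        blocks = content
    else:
        blocks = [{"type": "text", "text": str(content)}]
    return json.dumps(blocks)


def _block_to_part(block: dict[str, Any]) -> dict[str, Any]:
    """Map one Anthropic content block to an OTel-style part (field names assumed)."""
    kind = block.get("type")
    if kind == "text":
        return {"type": "text", "content": block.get("text", "")}
    if kind == "tool_use":
        return {"type": "tool_call", "id": block.get("id"),
                "name": block.get("name"), "arguments": block.get("input")}
    if kind == "tool_result":
        return {"type": "tool_call_response", "id": block.get("tool_use_id"),
                "result": _result_as_blocks_json(block.get("content"))}
    if kind == "thinking":
        return {"type": "reasoning", "content": block.get("thinking", "")}
    return dict(block)  # unknown block types pass through unchanged


def convert_message(message: dict[str, Any]) -> dict[str, Any]:
    """Convert one OpenAI- or Anthropic-shaped message to {role, parts}."""
    role = message.get("role")
    content = message.get("content")
    if role == "tool":  # OpenAI tool-result message
        return {"role": role, "parts": [{
            "type": "tool_call_response",
            "id": message.get("tool_call_id"),
            "result": _result_as_blocks_json(content),
        }]}
    parts: list[dict[str, Any]] = []
    if isinstance(content, str):
        if content:
            parts.append({"type": "text", "content": content})
    elif isinstance(content, list):
        parts.extend(_block_to_part(block) for block in content)
    for call in message.get("tool_calls") or []:  # OpenAI assistant tool calls
        fn = call.get("function") or {}
        parts.append({"type": "tool_call", "id": call.get("id"),
                      "name": fn.get("name"), "arguments": fn.get("arguments")})
    return {"role": role, "parts": parts}
```

Under these assumptions, `json.loads(part["result"])` yields a list for both the Anthropic `tool_result` path and the OpenAI `role=tool` path, which is the contract the bullet describes.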
Also: JSON-decode tool_calls[].function.arguments in the existing
output_messages builder, matching the v2.9 fix that retroactively
unwrapped `{"_value": <JSON-string>}` shapes on 74k entries in the dataset.
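
The decoding step could look something like the sketch below. `decode_tool_arguments` is a hypothetical name, and falling back to the raw string on malformed JSON is an assumption rather than confirmed behavior of the PR.

```python
import json
from typing import Any


def decode_tool_arguments(arguments: Any) -> Any:
    """Return parsed arguments when given a JSON string, else the value unchanged."""
    if not isinstance(arguments, str):
        return arguments  # already a dict (or None): nothing to decode
    try:
        return json.loads(arguments)
    except json.JSONDecodeError:
        # Malformed or truncated JSON: keep the raw string rather than drop data.
        return arguments
```

For example, `decode_tool_arguments('{"city": "Paris"}')` returns a dict, so a consumer no longer has to parse the arguments a second time.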
14 tests covering every conversion path.
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
…ive copies) Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Summary
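LiteLLM hands the trace logger whatever the underlying provider emitted: OpenAI native `{type:function, function:{...}}` for tool definitions, Anthropic native `[{tool_use_id, type:tool_result, content:[...]}]` for tool-result-bearing messages, and so on. The current logger passes those through verbatim, producing OTel spans that violate the GenAI semantic conventions in two ways:
- `gen_ai.tool.definitions` from Anthropic providers shipped without the spec-required top-level `type` field.
- `gen_ai.input.messages` shipped in raw OpenAI / Anthropic shapes instead of OTel `[{role, parts: [...]}]`, so vendor-specific blocks (`tool_use`, `tool_result`, `thinking`) leaked through as text.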
What's new
Two pure normalizer helpers wired into `_set_content_attributes`:
`_normalize_tool_definitions`
`_convert_input_messages_to_parts`
Also: JSON-decode `tool_calls[].function.arguments` in output messages
Matches the dataset v2.9 fix that retroactively unwrapped `{"_value": <JSON-string>}` shapes on 74k entries. The root cause is the same: LiteLLM serialized the arguments as a JSON string while the consumer expected a dict.
Tests
14 new tests covering every conversion path.
Why this matters
The Exgentic team has spent the past several days retroactively patching `agent-llm-traces-v2` to fix spans this logger produced (commits v2.7 through v2.11 on the HF dataset). Without this PR, every new run would reintroduce the same bugs. The fixes here are paired with tracelab PR #17, which adds the corresponding `validate_genai` checks to catch any future drift.
Test plan