fix: preserve non-ASCII text in provider request formatting #2776
Open
kimnamu wants to merge 1 commit into
Provider request-formatting paths serialize tool-call arguments and tool-result json content blocks with json.dumps(), which defaults to ensure_ascii=True and escapes non-ASCII (CJK, emoji) to \uXXXX in the request sent to the model. This inflates token usage for non-Latin scripts and hurts readability/debuggability. This is the provider-side counterpart to the @tool decorator path fixed in strands-agents#2653, and matches what the SDK already does in telemetry/tracer.py and the session managers (ensure_ascii=False). Fixes strands-agents#2660
Thank you to the Strands maintainers for the excellent SDK, and for the recent #2653, which fixed the `@tool` decorator path. This PR addresses the provider-side counterpart of the same issue.
Fixes #2660
Problem
Provider request-formatting paths serialize tool-call arguments and tool-result `{"json": ...}` content blocks with `json.dumps()`, which defaults to `ensure_ascii=True`. Non-ASCII text (CJK, emoji) is therefore escaped to `\uXXXX` in the request sent to the model. For non-Latin scripts this inflates token usage (and cost) and hurts readability and debuggability.
This is the provider-side counterpart to the `@tool` decorator path fixed in #2653, and it matches what the SDK already does in `telemetry/tracer.py` and the session managers (`ensure_ascii=False`).
Reproduction (no network / API key needed)
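The escaping described above can be reproduced with the standard library alone. This is a minimal sketch, not the SDK's code: the `tool_result` payload is a hypothetical example, and character count is used only as a rough proxy for request size, not as a token count.

```python
import json

# Hypothetical payload mirroring the examples in this PR.
tool_result = {"city": "東京"}

# Default behavior: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(tool_result)
# With ensure_ascii=False the text is emitted as-is.
preserved = json.dumps(tool_result, ensure_ascii=False)

print(escaped)    # {"city": "\u6771\u4eac"}
print(preserved)  # {"city": "東京"}

# Both strings decode to the same object; only the wire form differs.
assert json.loads(escaped) == json.loads(preserved) == tool_result

# The escaped form is longer: each escaped character takes six ASCII characters.
print(len(escaped), len(preserved))  # 24 14
```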
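| Input | Before | After |
| --- | --- | --- |
| `{"json": {"city": "東京"}}` | `{"city": "\u6771\u4eac"}` ❌ | `{"city": "東京"}` ✅ |
| `{"query": "東京"}` | `{"query": "\u6771\u4eac"}` ❌ | `{"query": "東京"}` ✅ |

`\uXXXX`-escaped ❌ `bedrock.py`, `ollama.py`, `gemini.py` stream→internal)
Change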
`json.dumps(x)` → `json.dumps(x, ensure_ascii=False)` in the model-visible `format_request_*` paths only, across the affected providers (openai, anthropic, mistral, llamaapi, writer, llamacpp, ollama, openai_responses). Response→internal conversion paths are intentionally left untouched.
Tests
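A regression test for this kind of change might look roughly like the sketch below. `format_tool_call_arguments` is a hypothetical stand-in for a provider's `format_request_*` helper, not a function from the SDK, and the test body is an assumption about the shape of the check rather than the test added in this PR.

```python
import json
from typing import Any


def format_tool_call_arguments(arguments: dict[str, Any]) -> str:
    """Hypothetical stand-in for a provider's format_request_* helper."""
    # ensure_ascii=False keeps CJK and emoji readable in the outgoing request.
    return json.dumps(arguments, ensure_ascii=False)


def test_tool_call_arguments_preserve_non_ascii() -> None:
    formatted = format_tool_call_arguments({"query": "東京"})
    # No \uXXXX escape sequences should appear in the model-visible string.
    assert "\\u" not in formatted
    assert "東京" in formatted
    # The payload must still round-trip to the original arguments.
    assert json.loads(formatted) == {"query": "東京"}


test_tool_call_arguments_preserve_non_ascii()
```

Dropping `ensure_ascii=False` from the helper makes the first assertion fail, which is the property a regression test here needs.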
Added regression tests on the issue's primary paths (`test_openai.py`, `test_anthropic.py`). Verified they catch the bug: `assert '\u' not in '{"city": "東京"}'`)
`ruff check` / `ruff format --check` / `mypy` (changed files) all clean.
This contribution was prepared with the help of an AI agent (Claude Code); a human reviewed the change, rationale, and test results before submission.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.