fix(invoke): encode request body as UTF-8 bytes to prevent Latin-1 corruption #194
Merged
Abhijeet Prasad (AbhiPrasad) merged 2 commits into main on Apr 2, 2026
## Summary
`invoke()` passed the JSON body as a Python `str` to `requests.post(data=...)`,
which encodes `str` bodies as Latin-1 by default. Any non-Latin-1 character in
the payload (em dashes, smart quotes, etc.) either raised a `UnicodeEncodeError`
or was silently corrupted before the request was sent.
- Encode `bt_dumps()` output to UTF-8 bytes before passing to `data=`
- Add `Content-Type: application/json` header (was missing; only `Accept` was set)
`bt_dumps` is kept — it handles Pydantic models, dataclasses, and NaN/Inf values
that stdlib `json` cannot serialize. Other SDK paths (`logger.py`) already use
`.encode("utf-8")` correctly; this brings `invoke()` in line.
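The change described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: `encode_body` and `latin1_encodable` are hypothetical helper names, and `json.dumps(..., ensure_ascii=False)` stands in for `bt_dumps()` on the assumption that it emits non-ASCII characters unescaped.

```python
import json


def encode_body(payload):
    """Serialize a payload and return (body_bytes, headers) for an HTTP POST.

    Hypothetical helper; json.dumps(ensure_ascii=False) is a stand-in for
    bt_dumps(), assumed here to leave non-ASCII characters unescaped.
    """
    text = json.dumps(payload, ensure_ascii=False)
    # Encode explicitly so the HTTP stack never has to pick an encoding for a str.
    body = text.encode("utf-8")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return body, headers


def latin1_encodable(text):
    """Return True if text survives a strict Latin-1 encode."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


payload = {"input": "result \u2014 excellent"}
body, headers = encode_body(payload)
```

Passing `body` (bytes) as `data=` means the bytes on the wire are exactly the UTF-8 bytes produced here, regardless of how the HTTP stack would treat a `str`.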
## Test plan
- [ ] Unit test: assert `data=` arg to `requests.post` is `bytes`, contains correct
UTF-8 encoding of em dash (`\xe2\x80\x94`), and `Content-Type` header is set
- [ ] Manually run `invoke()` with Unicode input (`"result \u2014 excellent"`) and
confirm no `UnicodeEncodeError` and payload reaches the API intact
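A rough sketch of the unit test from the first item, using only the standard library. `invoke_sketch` is a hypothetical stand-in for `invoke()` that takes the POST callable as an argument so the example runs without `requests` installed; a real test would more likely patch `requests.post` where `invoke()` looks it up. The URL is a placeholder, and `json.dumps(..., ensure_ascii=False)` again stands in for `bt_dumps()`.

```python
import json
from unittest import mock


def invoke_sketch(post, url, payload):
    """Hypothetical stand-in for invoke(); `post` is called like requests.post."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return post(url, data=body, headers=headers)


def check_request_encoding():
    """Call invoke_sketch with a mock and assert on the arguments it received."""
    fake_post = mock.Mock()
    invoke_sketch(
        fake_post,
        "https://example.invalid/invoke",  # placeholder URL, never contacted
        {"input": "result \u2014 excellent"},
    )
    _, kwargs = fake_post.call_args
    assert isinstance(kwargs["data"], bytes)
    assert b"\xe2\x80\x94" in kwargs["data"]  # UTF-8 bytes of the em dash
    assert kwargs["headers"]["Content-Type"] == "application/json"
    return kwargs


captured = check_request_encoding()
```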
Fixes BT-4620
Member
Abhijeet Prasad (AbhiPrasad)
left a comment
Can we add a VCR test instead of mocking if possible? If it's too annoying, no worries.
Abhijeet Prasad (AbhiPrasad)
approved these changes
Apr 1, 2026
Member
Abhijeet Prasad (AbhiPrasad)
left a comment
Thanks ekeith (@evanmkeith)!!