fix(invoke): encode request body as UTF-8 bytes to prevent Latin-1 corruption #194

Merged
Abhijeet Prasad (AbhiPrasad) merged 2 commits into main from 04-01-fix(invoke)-encode-request-body-as-UTF-8-bytes
Apr 2, 2026

Conversation

@evanmkeith
Contributor

Summary

invoke() passed the JSON body as a Python str to requests.post(data=...),
which encodes str bodies as Latin-1 by default. Any non-Latin-1 character in
the payload (em dashes, smart quotes, etc.) either raised a UnicodeEncodeError
or was silently corrupted before the request was sent.
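
Both failure modes can be reproduced without a network call. The sketch below uses only the standard library and assumes, as described above, that a `str` body ends up encoded as Latin-1 somewhere in the HTTP stack. The helper `encode_like_latin1_transport` is hypothetical, and `json.dumps(..., ensure_ascii=False)` stands in for `bt_dumps()`.

```python
import json


def encode_like_latin1_transport(body: str) -> bytes:
    """Mimic a transport that encodes str bodies as Latin-1 (assumption)."""
    return body.encode("latin-1")


# ensure_ascii=False keeps non-ASCII characters literal in the JSON text.
body = json.dumps({"input": "result \u2014 excellent"}, ensure_ascii=False)

# Case 1: the em dash (U+2014) has no Latin-1 representation, so encoding raises.
try:
    encode_like_latin1_transport(body)
    em_dash_error = None
except UnicodeEncodeError as exc:
    em_dash_error = exc

# Case 2: a character inside Latin-1 (U+00E9) encodes without error, but the
# resulting byte is not valid UTF-8, so a server decoding as UTF-8 sees garbage.
latin1_bytes = encode_like_latin1_transport(
    json.dumps({"input": "caf\u00e9"}, ensure_ascii=False)
)
decoded_by_server = latin1_bytes.decode("utf-8", errors="replace")
```

In this sketch the em dash takes the error path, while the accented character takes the silent-corruption path.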

  • Encode bt_dumps() output to UTF-8 bytes before passing to data=
  • Add Content-Type: application/json header (was missing; only Accept was set)

bt_dumps is kept — it handles Pydantic models, dataclasses, and NaN/Inf values
that stdlib json cannot serialize. Other SDK paths (logger.py) already use
.encode("utf-8") correctly; this brings invoke() in line.
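
A minimal sketch of the shape of the change, not the actual diff: `build_invoke_request` is a hypothetical helper, `json.dumps` stands in for `bt_dumps()`, and the `Accept` value is assumed.

```python
import json
from typing import Dict, Tuple


def build_invoke_request(payload: dict) -> Tuple[bytes, Dict[str, str]]:
    """Return the body and headers that would be handed to requests.post()."""
    # Stand-in for bt_dumps(); encode explicitly so the transport never
    # has to pick an encoding for a str body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Accept": "application/json",        # assumed value
        "Content-Type": "application/json",  # header added by this PR
    }
    return body, headers


body, headers = build_invoke_request({"input": "result \u2014 excellent"})
```

The returned values would then be passed as `requests.post(url, data=body, headers=headers)`.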

Test plan

  • Unit test: assert data= arg to requests.post is bytes, contains correct UTF-8 encoding of em dash (\xe2\x80\x94), and Content-Type header is set
  • Manually run invoke() with Unicode input ("result \u2014 excellent") and confirm no UnicodeEncodeError and payload reaches the API intact
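
The unit test in the first item could look roughly like this. It is a self-contained sketch: `invoke` here is a hypothetical stand-in that takes the HTTP client as an argument, whereas the real test would patch `requests.post` in the SDK module.

```python
import json
from unittest.mock import MagicMock


def invoke(http, url, payload):
    """Hypothetical stand-in for the SDK's invoke(); `http` plays the role of requests."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    return http.post(url, data=body, headers=headers)


def test_invoke_sends_utf8_bytes():
    http = MagicMock()
    invoke(http, "https://example.invalid/invoke", {"input": "result \u2014 excellent"})

    # Inspect the keyword arguments of the single post() call.
    _, kwargs = http.post.call_args
    assert isinstance(kwargs["data"], bytes)
    assert b"\xe2\x80\x94" in kwargs["data"]  # UTF-8 encoding of the em dash
    assert kwargs["headers"]["Content-Type"] == "application/json"
```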

Fixes BT-4620

Member

Can we add a VCR test instead of mocking, if possible? If it's too annoying, no worries.

@AbhiPrasad Abhijeet Prasad (AbhiPrasad) merged commit 19ecb8a into main Apr 2, 2026
29 checks passed
@AbhiPrasad Abhijeet Prasad (AbhiPrasad) deleted the 04-01-fix(invoke)-encode-request-body-as-UTF-8-bytes branch April 2, 2026 15:25