CodeInterpreter.upload_file / download_file double-base64 binary content (boto3 already handles blob shapes) #458

@kevin-riste

Describe the bug

CodeInterpreter.upload_file / upload_files and download_file / download_files are symmetrically broken: each side does a redundant base64.b64encode / base64.b64decode on top of botocore's existing blob-shape handling, so binary content is corrupted on both upload and download.

The two halves of the bug:

  • Upload: upload_file(path, content=<bytes>) (and upload_files) pre-base64-encodes bytes content into a str before passing it as the blob field to invoke("writeFiles", ...). InputContentBlock.blob is a Body blob shape in the AgentCore service model, which botocore's JSONSerializer._serialize_type_blob then base64-encodes a second time for the JSON wire format. The server decodes once; the file lands on disk as the SDK's pre-encoded base64 ASCII rather than the original bytes.
  • Download: download_file(path) (and download_files) calls base64.b64decode(resource["blob"]) on the response, but botocore has already base64-decoded that field (ResourceContent.blob is a Blob blob shape). The SDK is decoding already-raw bytes a second time, so the call either raises binascii.Error: Incorrect padding (when the raw bytes don't form valid base64) or returns nonsense (when they coincidentally do).
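
Both halves can be reproduced without AWS access using only the standard library. The sketch below is a simplified model, not botocore's code: `wire_encode` and `wire_decode` are hypothetical stand-ins for the single base64 step that the blob-shape handling performs in each direction.

```python
import base64
import binascii

PAYLOAD = b"\x89PNG\r\n\x1a\n" + bytes(range(59))  # same 67-byte probe as the repro


def wire_encode(blob):
    """Stand-in for the request side: one base64 pass (a str is UTF-8-encoded first)."""
    if isinstance(blob, str):
        blob = blob.encode("utf-8")
    return base64.b64encode(blob).decode("ascii")


def wire_decode(wire_value):
    """Stand-in for the single decode done by the service or the response parser."""
    return base64.b64decode(wire_value)


# Upload, buggy: the SDK pre-encodes, then the wire layer encodes again.
pre_encoded = base64.b64encode(PAYLOAD).decode("utf-8")
on_disk_buggy = wire_decode(wire_encode(pre_encoded))

# Upload, fixed: raw bytes go straight to the wire layer.
on_disk_fixed = wire_decode(wire_encode(PAYLOAD))

# Download: the parser has already decoded once; the SDK then decodes again.
from_parser = wire_decode(wire_encode(PAYLOAD))
try:
    base64.b64decode(from_parser)
    second_decode_error = None
except binascii.Error as exc:
    second_decode_error = exc

print(len(on_disk_buggy), on_disk_buggy[:16])
print(len(on_disk_fixed), on_disk_fixed[:8])
print(type(second_decode_error).__name__)
```

In this model the buggy upload leaves the base64 ASCII text on disk, the fixed upload leaves the original bytes, and the second decode on the download side raises `binascii.Error`.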

Both bugs are present on main HEAD and unchanged across v1.1.5 and v1.8.0. PR #257 was an incomplete fix for the download side; see Additional context.

The buggy lines (v1.8.0):

| Method | Source | Problem |
| --- | --- | --- |
| upload_file | L516–517 | {"blob": base64.b64encode(content).decode("utf-8")}: passes a pre-encoded str to a blob shape |
| upload_files | L566–567 | same pattern |
| download_file | L652 | base64.b64decode(resource["blob"]) on already-decoded bytes |
| download_files | L693 | same pattern |

The single-line fix on each side:

upload_file:

-        file_content = {"path": path, "blob": base64.b64encode(content).decode("utf-8")}
+        file_content = {"path": path, "blob": content}

download_file:

 elif "blob" in resource:
-    raw = base64.b64decode(resource["blob"])
-    try:
-        return raw.decode("utf-8")
-    except (UnicodeDecodeError, ValueError):
-        return raw
+    blob = resource["blob"]  # already bytes from botocore's blob deserializer
+    try:
+        return blob.decode("utf-8")
+    except (UnicodeDecodeError, ValueError):
+        return blob

Same change on the _files plural variants. The text branches (InputContentBlock.text and ResourceContent.text are both String shapes) are unaffected.
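
Pulled out of the client class, the two corrected branches might look like the sketch below. `build_file_content` and `extract_resource_content` are hypothetical helper names used only for illustration; the SDK inlines this logic in its methods, and the `text` handling shown here is an assumption based on the String shapes mentioned above.

```python
from typing import Union


def build_file_content(path: str, content: Union[str, bytes]) -> dict:
    """Build one writeFiles entry; bytes are passed through untouched."""
    if isinstance(content, bytes):
        # No SDK-side base64: the blob shape is encoded once by botocore.
        return {"path": path, "blob": content}
    return {"path": path, "text": content}


def extract_resource_content(resource: dict) -> Union[str, bytes, None]:
    """Return file content from an already-parsed resource dict."""
    if "text" in resource:
        return resource["text"]
    if "blob" in resource:
        blob = resource["blob"]  # already bytes from the blob deserializer
        try:
            return blob.decode("utf-8")
        except (UnicodeDecodeError, ValueError):
            return blob
    return None
```

With these helpers a PNG header such as `b"\x89PNG"` survives unchanged in both directions, while UTF-8 text stored as a blob still comes back as `str`, matching the behaviour PR #257 intended.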

To Reproduce

import boto3
from bedrock_agentcore.tools import code_session

# 67 bytes, deliberately chosen so the base64 encoding is length 92
# (== ceil(67/3) * 4). Starting with PNG magic for recognisability.
PAYLOAD = b"\x89PNG\r\n\x1a\n" + bytes(range(59))
assert len(PAYLOAD) == 67

session = boto3.Session(profile_name="<your-profile>")
with code_session(region="us-east-1", session=session) as client:
    # --- Upload side ---
    # Buggy: upload_file with bytes content
    client.upload_file(path="probe_buggy.png", content=PAYLOAD)
    # Correct (workaround): invoke("writeFiles", ...) with raw bytes
    client.invoke("writeFiles", {"content": [{"path": "probe_correct.png", "blob": PAYLOAD}]})

    # Inspect both files on disk:
    inspect = client.invoke("executeCode", {
        "language": "python",
        "code": (
            "import os\n"
            "for path in ('probe_buggy.png', 'probe_correct.png'):\n"
            "    size = os.path.getsize(path)\n"
            "    with open(path, 'rb') as f:\n"
            "        head = f.read(16)\n"
            "    print(f'{path}: size={size}, head={head!r}')\n"
        ),
    })
    for event in inspect["stream"]:
        for item in event.get("result", {}).get("content", []):
            if "text" in item:
                print(item["text"])

    # --- Download side ---
    # Inspect what botocore hands the SDK before any SDK-side decoding:
    raw = client.invoke("readFiles", {"paths": ["probe_correct.png"]})
    for event in raw["stream"]:
        for content_item in event.get("result", {}).get("content", []):
            if content_item.get("type") == "resource":
                blob = content_item.get("resource", {}).get("blob")
                if blob is not None:
                    print(f"raw resource['blob']: type={type(blob).__name__}, len={len(blob)}, head={blob[:24]!r}")

    # Buggy: download_file decodes again
    try:
        out = client.download_file("probe_correct.png")
        print(f"download_file result: type={type(out).__name__}, len={len(out)}, head={out[:24]!r}")
    except Exception as exc:
        print(f"download_file raised: {type(exc).__name__}: {exc}")

Expected behavior

Both files round-trip cleanly: the upload writes 67 raw bytes to disk, and download_file returns those same 67 bytes (or a UTF-8-decoded str for text content). download_file should never raise on valid binary data.

Actual behavior (literal terminal output, run against code_session on us-east-1)

=== versions ===
bedrock-agentcore: 1.1.5    # lines L516, L566, L652, L693 are unchanged in v1.8.0 and on main HEAD
boto3: 1.42.94
botocore: 1.42.94

=== buggy path: upload_file(content=<bytes>) ===
size=92
head=b'iVBORw0KGgoAAQID'

=== correct path: invoke('writeFiles', ...) with raw bytes ===
size=67
head=b'\x89PNG\r\n\x1a\n\x00\x01\x02\x03\x04\x05\x06\x07'

=== response-side: what does resource['blob'] look like out of botocore? ===
type(resource['blob']) = bytes
len(resource['blob']) = 67
resource['blob'][:24] = b'\x89PNG\r\n\x1a\n\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'

=== via SDK download_file (what does the user actually get back?) ===
raised: Error: Incorrect padding

probe_buggy.png lands as 92 bytes of ASCII starting iVBORw0KGgo... — exactly the base64 encoding of the original 67 bytes (92 == ceil(67/3) * 4). The correctly-written probe_correct.png is 67 raw bytes on disk. On the read side, botocore hands the SDK the already-decoded raw bytes (type=bytes, len=67), confirming the blob deserializer fired correctly. download_file then attempts a second base64.b64decode on those raw bytes, which fails with Incorrect padding because the PNG payload isn't valid base64. (When the raw bytes happen to form valid base64 — e.g., some text payloads — the SDK silently returns a different, garbage value instead of raising. That case is harder to spot; PR #257's UTF-8 try/except masks it as garbage rather than an error, but the data is still wrong.)
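
The numbers in that output can be checked offline. The sketch below uses only the standard library; the 4-byte `b"abcd"` input is an illustrative choice, not taken from the report, picked because it happens to be valid base64.

```python
import base64
import math

PAYLOAD = b"\x89PNG\r\n\x1a\n" + bytes(range(59))

encoded = base64.b64encode(PAYLOAD)

# Padded base64 emits 4 output characters per 3 input bytes, rounded up.
expected_len = math.ceil(len(PAYLOAD) / 3) * 4
print(len(encoded), expected_len)   # 92 92
print(encoded[:16])                 # b'iVBORw0KGgoAAQID'

# Silent-corruption case: raw bytes that are themselves valid base64
# decode "successfully" into different, shorter data.
raw_from_parser = b"abcd"
twice_decoded = base64.b64decode(raw_from_parser)
print(twice_decoded)                # b'i\xb7\x1d'
```

The last case is the one that never raises: the caller gets three unrelated bytes back in place of the four that were stored.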

Root cause

The bedrock-agentcore service model (botocore's bundled botocore/data/bedrock-agentcore/2024-02-28/service-2.json) declares both blob fields explicitly:

"InputContentBlock": {
  "type": "structure",
  "members": {
    "path": {"shape": "MaxLenString"},
    "text": {"shape": "MaxLenString"},
    "blob": {"shape": "Body"}
  }
}

"Body": {"type": "blob", "max": 100000000, "min": 0, "sensitive": true}

"ResourceContent": {
  "type": "structure",
  "members": {
    "type": {"shape": "ResourceContentType"},
    "uri": {"shape": "String"},
    "mimeType": {"shape": "String"},
    "text": {"shape": "String"},
    "blob": {"shape": "Blob"}
  }
}

"Blob": {"type": "blob"}

The service uses the rest-json protocol, so botocore's RestJSONSerializer handles serialization and RestJSONParser handles deserialization. Both walk request/response structures shape-aware via JSONSerializer._serialize_type_structure / BaseJSONParser._handle_structure, recursing into nested members. When the walk reaches a blob shape:

  • Serializer side (_serialize_type_blob, which calls _get_base64) accepts bytes and base64-encodes them for the wire. If the caller passes a str instead, _get_base64 UTF-8-encodes it and then base64-encodes it, which is what produces the double encoding when the SDK pre-encodes.
  • Parser side (_handle_blob, which calls _default_blob_parser) calls base64.b64decode on the wire value and returns bytes. So resource["blob"] is already raw bytes by the time the SDK code runs, as the repro above confirms empirically (type=bytes, len=67).

(Per the rest-json blob convention; this is consistent across all rest-json services in botocore.)
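
As a rough illustration of that shape-aware walk, the toy model below recurses through a request using a shape table and base64-encodes only the blob members. It is a simplified sketch, not botocore's implementation: `SHAPES`, `serialize`, `encode_blob`, and the `WriteFilesInput` wrapper shape are hypothetical names, and the table is a hand-written reduction of the model excerpt above.

```python
import base64

# Hand-reduced shape table: only the members relevant here.
SHAPES = {
    "WriteFilesInput": {"type": "structure", "members": {"content": "ContentList"}},
    "ContentList": {"type": "list", "member": "InputContentBlock"},
    "InputContentBlock": {
        "type": "structure",
        "members": {"path": "String", "text": "String", "blob": "Body"},
    },
    "String": {"type": "string"},
    "Body": {"type": "blob"},
}


def encode_blob(value):
    """One base64 pass; a str is UTF-8-encoded first, as described above."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return base64.b64encode(value).decode("utf-8")


def serialize(shape_name, value):
    """Walk the value according to its shape, encoding blob members only."""
    shape = SHAPES[shape_name]
    kind = shape["type"]
    if kind == "structure":
        return {k: serialize(shape["members"][k], v) for k, v in value.items()}
    if kind == "list":
        return [serialize(shape["member"], item) for item in value]
    if kind == "blob":
        return encode_blob(value)
    return value


raw = b"\x89PNG\r\n\x1a\n"
ok = serialize("WriteFilesInput", {"content": [{"path": "a.png", "blob": raw}]})
pre = base64.b64encode(raw).decode("utf-8")
double = serialize("WriteFilesInput", {"content": [{"path": "a.png", "blob": pre}]})

# One server-side decode recovers the bytes only when the caller passed raw bytes.
print(base64.b64decode(ok["content"][0]["blob"]) == raw)      # True
print(base64.b64decode(double["content"][0]["blob"]) == raw)  # False
```

Because the walk keys off the shape rather than the Python type of the value, a pre-encoded str in a blob member is treated as payload and encoded again, with no error raised.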

The convenience wrappers in code_interpreter_client.py add an extra base64 round on each side, on top of these built-in steps. Removing those extra rounds — the diffs in the Describe the bug section — restores correctness.

Workaround

Skip the convenience methods and call invoke() directly:

# Upload: pass raw bytes; botocore base64-encodes the blob once for the wire
payload = b"\x89PNG\r\n\x1a\n"  # example raw bytes; substitute the real file content
client.invoke("writeFiles", {"content": [{"path": "x", "blob": payload}]})

# Download (returns the boto3 EventStream; iterate and pull resource["blob"])
result = client.invoke("readFiles", {"paths": ["x"]})
for event in result["stream"]:
    for ci in event.get("result", {}).get("content", []):
        if ci.get("type") == "resource":
            blob = ci.get("resource", {}).get("blob")  # already bytes

This is what we ship in production. Reference implementations — _write_binary_to_sandbox for upload and _fetch_from_sandbox for download, including session-reclaim error handling — are in jamf/genai-ask-jamf-assistant#802 (private repo; happy to share via gist if useful).
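
For readers adopting the same bypass, the download loop can be wrapped in a small helper. This is a sketch under assumptions, not the reference implementation mentioned above: `collect_blobs` is a hypothetical name, the event layout is taken from the loop shown in this issue, and `fake_stream` is a hand-built stand-in for the parsed event stream.

```python
from typing import Iterable, List


def collect_blobs(stream: Iterable[dict]) -> List[bytes]:
    """Gather resource blobs from a readFiles-style event stream, in order."""
    blobs = []
    for event in stream:
        for item in event.get("result", {}).get("content", []):
            if item.get("type") != "resource":
                continue
            blob = item.get("resource", {}).get("blob")
            if blob is not None:
                blobs.append(blob)  # already bytes; no base64 decoding here
    return blobs


# Exercised against a hand-built stand-in for the parsed event stream.
fake_stream = [
    {"result": {"content": [
        {"type": "resource", "resource": {"uri": "file:///x", "blob": b"\x89PNG"}},
        {"type": "text", "text": "ignored"},
    ]}},
    {"result": {"content": []}},
]
print(collect_blobs(fake_stream))
```

Against a live session the call would be `collect_blobs(client.invoke("readFiles", {"paths": ["x"]})["stream"])`.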

Environment

  • OS: macOS 15.4 (host); reproduced symptom on Linux AgentCore-managed runtime as well
  • Python: 3.12.x
  • SDK: confirmed against bedrock-agentcore 1.1.5. The buggy lines are unchanged in v1.8.0 and on main HEAD as of 2026-05-07; the only difference between 1.1.5 and 1.8.0 on the download side is the try/except wrapper from PR #257 ("fix: download_file/download_files crash on binary content with UnicodeDecodeError"), which doesn't remove the redundant b64decode.
  • botocore: 1.42.94 (the Body/Blob blob-shape handling is stable across released versions)

Additional context

  • Docstring at L484–485 says "Binary content will be base64 encoded automatically." That's currently true twice — worth updating to "Binary content is transmitted as-is; boto3 handles wire-format base64 encoding for the blob shape" alongside the fix.
  • Companion bug already filed: #243 ("Downloading binary files fails") reported only the surface symptom of the download-side bug, a UnicodeDecodeError on binary download. #257 addressed that symptom by wrapping the UTF-8 decode in try/except but didn't remove the redundant b64decode, so the underlying bug is still live in v1.8.0. #243 may need to be reopened, or this issue can supersede it.
  • Introduced in: PR #202 ("feat(code-interpreter): Add convenience methods for file operations and package management").
  • Independent confirmation that the bypass pattern is correct: strands-agents/tools #304 and PR #462 hit the upload-side ergonomic gap from their tool layer. Their fix bypasses upload_file/upload_files entirely and calls client.invoke("writeFiles", {"content": [{"path": ..., "blob": <bytes>}]}) directly — exactly the pattern we use as a workaround. Their PR description explicitly assumes "the BedrockAgentCore SDK already supports blob on writeFiles" — true at the invoke layer; broken on the convenience wrappers.
  • We're pinned transitively to <1.2 via strands-agents-tools[agent_core_code_interpreter]==0.5.2. Even after this is fixed in a newer SDK release, downstream consumers stuck behind that pin would still need the workaround. Filing this so the fix lands and is referenceable, and so future consumers searching for the symptom find it.

Labels

bug (Something isn't working)
