fix: stream large file uploads from disk to avoid memory exhaustion and proxy timeouts#147
Open
devin-ai-integration[bot] wants to merge 2 commits into
Open
Conversation
When write_binary receives a file path for files > 5 MB, read chunks directly from disk using os.pread() instead of loading the entire file into memory with Path.read_bytes(). This prevents OOM for multi-GB files and ensures the multipart upload path is used without requiring the full file content in RAM. At most MAX_PARALLEL_UPLOADS * CHUNK_SIZE (15 MB) is held in memory at any time during upload. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
If _upload_part throws inside a thread, capture the exception and re-raise it after all threads join. This prevents _complete_multipart_upload from being called with incomplete parts on partial failure. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
LGTM
The previous bug (silently swallowed thread exceptions) is fixed — exceptions list is now populated and re-raised after join(). Both async and sync paths correctly abort the multipart upload on failure. No new issues found.
Tag @mendral-app with feedback or questions. View session
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When
write_binaryis called with a file path (string) for files larger than 5 MB, the SDK previously loaded the entire file into memory withPath.read_bytes()before passing it to the multipart upload path. For multi-GB files (e.g., 17 GB), this causes:Fix: Added
_upload_file_with_multipart()which usesos.pread()to read 5 MB chunks directly from the file descriptor at absolute offsets, uploading them in parallel batches without ever holding more than ~15 MB in memory (3 parallel × 5 MB).Changes applied to both async (
default/filesystem.py) and sync (sync/filesystem.py) versions.Review & Testing Checklist for Human
sandbox.fs.write_binary(\"/remote/path\", \"/local/path/to/bigfile\")Notes
os.pread()is used for positional reads (thread-safe, no seek needed) — available on all POSIX systems and Windows (Python 3.3+)_upload_with_multipart(data: bytes)method is preserved for callers passing in-memorybytesdirectlyLink to Devin session: https://app.devin.ai/sessions/25c0534a826d4e088768cf2c15fa1d53
Requested by: @Joffref
Note
Adds streaming multipart upload for large files (>5 MB) in both async and sync filesystem clients. Uses
os.pread()to read chunks directly from disk, avoiding loading multi-GB files into memory. A follow-up commit adds thread exception propagation in the sync version to prevent_complete_multipart_uploadfrom being called with missing parts on partial failure.Written by Mendral for commit 2f79407.