First stab at streaming refactor #33
Merged
ebranlard
requested changes
Nov 7, 2025
Owner
ebranlard
left a comment
Thanks, that looks pretty good.
I have a few small comments to avoid code duplication and to make use of inheritance from the parent class.
But otherwise, that seems pretty good.
ebranlard
requested changes
Nov 7, 2025
…other requested edits
…actor binary reading
- Remove nlines parameter from base class methods (use **kwargs for flexibility)
- Fix File.__init__() to not read when streaming=True
- CSVFile and FASTOutputFile now properly call parent __init__()
- Remove duplicate context manager methods (use inheritance)
- Refactor FASTOutputFile binary reading to eliminate ~160 lines of duplication
- Update _NEWFILE_TEMPLATE.py with correct **kwargs pattern
- All 89 tests pass with 100% backward compatibility
ebranlard
requested changes
Nov 13, 2025
Owner
ebranlard
left a comment
I'm happy to keep the code as it is, but I feel like some things can be simplified a bit more. Feel free to take a stab or not.
ebranlard
approved these changes
Nov 14, 2025
Summary
This PR adds streaming mode support to weio, enabling memory-efficient reading of large output files (GB-sized) by allowing users to inspect file headers without loading all data into memory.
Motivation
As discussed in [issue/discussion link if applicable], large simulation output files can cause memory issues when only metadata is needed. This implementation provides up to 10,000x memory reduction for header-only inspection.
Implementation
- streaming parameter with context manager enforcement (Option B from our discussion)
- readAll()
- readAll() after header inspection
- readChunk(nlines=N)
Changes
- weio/file.py - Added context manager and streaming infrastructure
- weio/fast_output_file.py - Streaming for ASCII and binary formats
- weio/csv_file.py - Streaming with chunk reading support
- weio/hawc2_dat_file.py - Streaming for HAWC2 formats
- weio/_NEWFILE_TEMPLATE.py - Template for future formats
- weio/tests/test_streaming.py - 33 comprehensive tests
- README.md - Added streaming mode examples
Testing
Documentation
Example Usage
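The sketch below illustrates the pattern this PR describes; it is not the code from the PR. It assumes a base class that skips reading in __init__() when streaming=True, a context manager that the subclass inherits, header-only reading on entry, and readChunk(nlines=N) / readAll() for the data. The class names StreamingFile and SimpleCSVFile, the helpers _read_header(), _parse_line() and _require_open(), and the demo() function are hypothetical stand-ins. Only the streaming parameter, readAll() and readChunk(nlines=N) come from this PR, and weio's actual signatures may differ.

```python
import os
import tempfile


class StreamingFile:
    """Hypothetical base class (not weio's actual File implementation)."""

    def __init__(self, filename, streaming=False, **kwargs):
        self.filename = filename
        self.streaming = streaming
        self.header = None
        self.data = []
        self._fid = None
        if not streaming:
            # Non-streaming mode: read header and all data at construction.
            with open(filename) as fid:
                self._fid = fid
                self._read_header()
                self.readAll()
            self._fid = None

    def __enter__(self):
        # Streaming mode: open the file and read only the header.
        if self.streaming:
            self._fid = open(self.filename)
            self._read_header()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._fid is not None:
            self._fid.close()
            self._fid = None
        return False

    def _require_open(self):
        # Enforce that streaming reads happen inside a 'with' block.
        if self._fid is None:
            raise RuntimeError("streaming=True requires a 'with' block")

    def _read_header(self):
        raise NotImplementedError

    def _parse_line(self, line):
        raise NotImplementedError

    def readChunk(self, nlines=1000, **kwargs):
        """Return up to nlines parsed rows from the current file position."""
        self._require_open()
        rows = []
        for _ in range(nlines):
            line = self._fid.readline()
            if not line:
                break
            rows.append(self._parse_line(line))
        return rows

    def readAll(self, **kwargs):
        """Read all remaining rows into self.data and return it."""
        self._require_open()
        self.data.extend(self._parse_line(line) for line in self._fid)
        return self.data


class SimpleCSVFile(StreamingFile):
    """Hypothetical subclass: only format-specific parsing lives here."""

    def __init__(self, filename, sep=",", **kwargs):
        self.sep = sep  # set before the parent may start reading
        super().__init__(filename, **kwargs)

    def _read_header(self):
        self.header = self._fid.readline().strip().split(self.sep)

    def _parse_line(self, line):
        return [float(v) for v in line.strip().split(self.sep)]


def demo():
    # Write a small temporary CSV file to read back.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        tmp.write("t,x\n")
        for i in range(5):
            tmp.write(f"{i},{i * i}\n")
        path = tmp.name
    try:
        with SimpleCSVFile(path, streaming=True) as f:
            header = f.header              # available without loading data
            first = f.readChunk(nlines=2)  # first two rows only
            rest = f.readAll()             # remaining rows
        full = SimpleCSVFile(path).data    # default mode reads everything
    finally:
        os.remove(path)
    return header, first, rest, full
```

In this sketch, the subclass defines no __enter__() or __exit__() of its own and forwards streaming through **kwargs to the parent, which mirrors the review request to rely on inheritance rather than duplicating the context manager methods.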