First stab at streaming refactor #33

Merged
ebranlard merged 4 commits into ebranlard:dev from xflow-ben:partial_read_refactor
Nov 14, 2025

Conversation

@xflow-ben

Summary

This PR adds a streaming mode to weio. It enables memory-efficient reading of large (GB-sized) output files by letting users inspect file headers without loading the data into memory.

Motivation

As discussed in [issue/discussion link if applicable], large simulation output files can cause memory issues when only metadata is needed. This implementation provides up to 10,000x memory reduction for header-only inspection.

Implementation

  • Design: Single streaming parameter with context manager enforcement (Option B from our discussion)
  • Formats supported: OpenFAST (.out, .outb), CSV (.csv), HAWC2 (.dat, .sel)
  • Key features:
    • Header-only reading: exit context without calling readAll()
    • Full streaming: call readAll() after header inspection
    • Chunk reading: CSV files support readChunk(nlines=N)
    • Strong OOP: Base File class provides infrastructure
    • 100% backward compatible
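The three access patterns listed above (header-only, full read after inspection, chunked read) can be sketched with a toy reader. This is a minimal illustration under stated assumptions, not weio's code: the class `ToyStreamingCSV`, its `header` attribute, and the use of an in-memory `io.StringIO` are all hypothetical; only the `streaming` flag and the method names `readAll()` and `readChunk()` come from the description above.

```python
import io


class ToyStreamingCSV:
    """Illustrative stand-in for a streaming reader; not weio's CSVFile."""

    def __init__(self, fileobj, streaming=False):
        self._fh = fileobj
        self.streaming = streaming
        self.data = None
        # The header is cheap, so it is always read up front.
        self.header = self._fh.readline().strip().split(",")
        if not streaming:
            self.readAll()  # classic behaviour: load everything on construction

    @staticmethod
    def _parse(line):
        return [float(v) for v in line.strip().split(",")]

    def readChunk(self, nlines):
        """Return up to `nlines` further rows, leaving the rest unread."""
        rows = []
        while len(rows) < nlines:
            line = self._fh.readline()
            if not line:          # end of file
                break
            if line.strip():      # skip blank lines
                rows.append(self._parse(line))
        return rows

    def readAll(self):
        """Read every remaining row and store it in `self.data`."""
        self.data = [self._parse(line) for line in self._fh if line.strip()]
        return self.data

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._fh.close()
        return False  # do not swallow exceptions


text = "t,x\n0,1\n1,2\n2,3\n"

with ToyStreamingCSV(io.StringIO(text), streaming=True) as f:
    names = f.header          # header only, f.data is still None here
    first = f.readChunk(2)    # next two rows
    rest = f.readAll()        # whatever is left
```

Leaving the `with` block without calling `readAll()` gives the header-only case: the handle is closed and no data rows were ever parsed.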

Changes

  • Updated weio/file.py - Added context manager and streaming infrastructure
  • Updated weio/fast_output_file.py - Streaming for ASCII and binary formats
  • Updated weio/csv_file.py - Streaming with chunk reading support
  • Updated weio/hawc2_dat_file.py - Streaming for HAWC2 formats
  • Updated weio/_NEWFILE_TEMPLATE.py - Template for future formats
  • Added weio/tests/test_streaming.py - 33 comprehensive tests
  • Updated README.md - Added streaming mode examples

Testing

  • ✅ All 89 tests pass (33 new + 56 existing)
  • ✅ 100% backward compatible - no breaking changes
  • ✅ Test coverage: header-only, full streaming, chunk reading, error handling

Documentation

  • README updated with 3 usage examples
  • Added STREAMING_QUICKREF.md for developers
  • Template file updated with streaming pattern

Example Usage

import weio

# Header-only inspection (memory efficient!)
with weio.read('large_output.outb', streaming=True) as f:
    print(f['attribute_names'])  # Channel names
    print(f['attribute_units'])  # Units
    # f.data is None - no data loaded

# Load data after inspection
with weio.read('large_output.out', streaming=True) as f:
    print(f"Channels: {len(f['attribute_names'])}")
    f.readAll()  # Now load data
    df = f.toDataFrame()

Owner

@ebranlard ebranlard left a comment


Thanks, that looks pretty good.

I have small comments to avoid code duplication and to make better use of inheritance from the parent class.

But otherwise, that seems pretty good.

Comment thread .claude/settings.local.json Outdated
Comment thread weio/_NEWFILE_TEMPLATE.py Outdated
Comment thread weio/file.py Outdated
Comment thread weio/csv_file.py
Comment thread weio/fast_output_file.py
Comment thread weio/file.py Outdated
…actor binary reading

- Remove nlines parameter from base class methods (use **kwargs for flexibility)
- Fix File.__init__() to not read when streaming=True
- CSVFile and FASTOutputFile now properly call parent __init__()
- Remove duplicate context manager methods (use inheritance)
- Refactor FASTOutputFile binary reading to eliminate ~160 lines of duplication
- Update _NEWFILE_TEMPLATE.py with correct **kwargs pattern
- All 89 tests pass with 100% backward compatibility
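The inheritance pattern described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: the class bodies, the `_in_context` flag, the `_readData()` hook, the `nrows` keyword, and the `RuntimeError` raised outside a `with` block are all hypothetical. What it tries to show is the shape of the change: the parent `__init__()` skips the data read when `streaming=True`, the context manager methods live only on the base class, and extra options travel through `**kwargs`.

```python
class File(dict):
    """Hypothetical base class holding the shared streaming infrastructure."""

    def __init__(self, filename=None, streaming=False, **kwargs):
        super().__init__()
        self.filename = filename
        self.streaming = streaming
        self.data = None
        self._in_context = False
        if filename is not None:
            self.readHeader(**kwargs)
            if not streaming:
                self.readAll(**kwargs)  # eager read only in non-streaming mode

    # Context manager methods are defined once here and inherited.
    def __enter__(self):
        self._in_context = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self._in_context = False
        return False

    def readAll(self, **kwargs):
        # Assumed form of "context manager enforcement" in streaming mode.
        if self.streaming and not self._in_context:
            raise RuntimeError("streaming=True requires a 'with' block")
        self.data = self._readData(**kwargs)
        return self.data

    def readHeader(self, **kwargs):
        raise NotImplementedError

    def _readData(self, **kwargs):
        raise NotImplementedError


class DemoFile(File):
    """Hypothetical format; fakes its content so the sketch needs no file."""

    def __init__(self, filename=None, streaming=False, **kwargs):
        # Delegate to the parent instead of re-implementing the read logic.
        super().__init__(filename=filename, streaming=streaming, **kwargs)

    def readHeader(self, **kwargs):
        self["attribute_names"] = ["t", "x"]

    def _readData(self, nrows=3, **kwargs):
        # Format-specific option (nrows) arrives via **kwargs.
        return [[float(i), 2.0 * i] for i in range(nrows)]


with DemoFile("demo.txt", streaming=True) as f:
    channels = f["attribute_names"]
    partial = f.readAll(nrows=2)
```

Because the base class never names format-specific parameters, a subclass can accept options such as `nrows` here without the parent signature changing.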
@ebranlard ebranlard self-assigned this Nov 13, 2025
@ebranlard ebranlard added the enhancement New feature or request label Nov 13, 2025
Owner

@ebranlard ebranlard left a comment


I'm happy to keep the code as it is, but I feel like some things can be simplified a bit more. Feel free to take a stab or not.

Comment thread weio/hawc2_dat_file.py Outdated
Comment thread weio/hawc2_dat_file.py Outdated
Comment thread weio/hawc2_dat_file.py Outdated
Comment thread weio/fast_output_file.py
Comment thread weio/fast_output_file.py
Comment thread weio/fast_output_file.py Outdated
Comment thread weio/fast_output_file.py Outdated
Comment thread weio/fast_output_file.py Outdated
Comment thread weio/csv_file.py Outdated
Comment thread weio/csv_file.py Outdated
@ebranlard ebranlard self-requested a review November 13, 2025 03:52
@ebranlard ebranlard changed the base branch from main to dev November 13, 2025 17:23
@ebranlard ebranlard merged commit 38c6229 into ebranlard:dev Nov 14, 2025
4 checks passed
Labels

enhancement New feature or request

2 participants