Refactor26 by bnlawrence · Pull Request #326 · NCAS-CMS/PyActiveStorage

bnlawrence · 2026-04-09T10:30:18Z

Description

This is a massive refactor of pyactivestorage to establish a complete separation of concerns between storage format, active itself, and backend reductions. The objective is shown in a set of UML diagrams, but in particular these two:

and

If agreed this is the right approach, we wouldn't merge this into main, we'd replace main with it!

Key review objectives should be:

Does the testing strategy here cover all the testing that we had in main? (It should, the main testing was pulled in test by test, rather than by merge).
Is the new structure more logical and likely easier to maintain, and in particular to extend?
IS there any signficant performance degradation associated with it.

Note that documentation has not been updated. We can do that when we are happy with the broad structure.

…p.cfg, README.md, .gitignore, environment.yml; fix s3_exploratory import paths

… updated backends and tests - Backport activestorage/reductionist.py with v2 API (axis parameter, interface_type, option_disable_chunk_cache) - Update S3Backend and HttpsBackend to use new reductionist.reduce_chunk() signature - Change API endpoint from v1 to v2 - Backport test_reductionist.py and test_reductionist_json.py from main - Update test_storage_types.py mock to match new signature - Fix relative imports in test_reductionist_json.py This enables: - Reductionist-based backends (S3, HTTPS) to support axis-aware reductions via v2 API - LocalBackend continues to use storage.reduce_chunk() without axis (unchanged) - Hybrid architecture: axis support at backend level without modifying storage.py contract

- Fix S3Backend.reduce_chunk(): pass request.axis instead of hardcoded axis=None - Fix HttpsBackend.reduce_chunk(): pass request.axis instead of hardcoded axis=None - Add sum() method to Active class with axis parameter support - Update test_storage_types.py mock assertion to expect correct axis=(0,1,2) This completes the axis parameter chain: Active.method(axis=N) → _get_selection() → _from_storage() → strategy → ChunkRequest → backend.reduce_chunk() → reductionist.reduce_chunk(axis=...) All 68 tests pass with axis support fully integrated.

Replace all 'from activestorage.config import *' with explicit symbol imports: - tests/test_bigger_data.py: USE_S3 - tests/test_missing.py: USE_S3 - tests/test_byte_order.py: USE_S3 - tests/test_compression.py: USE_S3, S3_ACCESS_KEY, S3_SECRET_KEY, S3_URL, S3_ACTIVE_STORAGE_URL - tests/test_harness.py: USE_S3 - tests/test_reductionist_json.py: USE_S3 - tests/test_compression_remote_reductionist.py: USE_S3, S3_BUCKET, REMOTE_RED, S3_ACTIVE_STORAGE_URL - tests/unit/test_active.py: USE_S3 - tests/unit/test_storage_types.py: S3_ACTIVE_STORAGE_URL, S3_URL - tests/utils.py: USE_S3, S3_URL, S3_ACCESS_KEY, S3_SECRET_KEY, S3_BUCKET - bnl/bnl_test.py: removed unused import - bnl/test_missing_gubbins.py: removed unused import Benefits: - Pylance type checking now works correctly - Code is clearer about dependencies - Reduces IDE warnings about undefined names - All 68 tests still pass

The InvalidHDF5File error is now raised during Active.__init__() when the file is opened (pyfive.File), not later during data access. Update the non-S3 case to wrap the initialization in pytest.raises context, making it consistent with the S3 case behavior. This fixes the test failure after explicit import changes.

- Update active_refactor_v7.puml: Add sum() method to Active class - Add axis_flow_v1.puml: Sequence diagram showing axis parameter flow through the entire reduction pipeline from User → Active → Strategy → Backend → Reductionist v2 API - Add axis_support_architecture_v1.puml: Detailed architecture diagram showing how axis is handled across core classes, backends, and the reductionist API with practical examples The updated documentation clearly shows: - How axis parameter is stored and passed through Active class - Complete data flow from user call to backend reduction - Differences between LocalBackend (no axis) and S3/HTTPS backends - How Reductionist v2 API performs axis-aware reduction - Example usage patterns for different axis configurations

…h, and one that uses the CBOR response/deserialistion

valeriupredoi · 2026-04-09T12:19:51Z

a few more things to tell Mr Claude to take care of @bnlawrence - fairly important 😁

recognize storage_type (which in current main is interface_type), and...
once it knows it as eg S3 not complain bnl/file.nc can not be found hehe

- Restored test_active_axis.py, test_real_https, test_real_s3, test_real_s3_with_axes from origin/main - Added load_from_https to helpers.py and re-exported from active.py - Added axis parameter to Active.__init__() for backward compatibility with axis-on-construction tests - Fixed storage.reduce_chunk() and reduce_opens3_chunk() to preserve array structure for axis-specific reductions - Added _apply_missing_mask() to keep masked arrays without flattening (replaces remove_missing for axis paths) - Updated LocalBackend and S3Backend v1 to pass axis parameter to reduce functions - Added axis validation in Active._get_selection() with proper out-of-range error handling - Fixed encode_result() to properly serialize numpy arrays/scalars to JSON-compatible types for CBOR This ensures consistency between local, S3 v1, and remote Reductionist axis-based reduction paths.

valeriupredoi · 2026-04-10T12:48:38Z

installed from PyPI: Successfully installed ActiveStorage-0.0.1.dev841+g28ede408d cloudpickle-3.1.2 dask-2026.3.0 donfig-0.8.1.post1 google-crc32c-1.8.0 kerchunk-0.2.10 locket-1.0.0 partd-1.4.2 toolz-1.1.0 zarr-3.1.6 - ideally none of these should be installed from PyPI (apart from active storage of course)

Bryan Lawrence added 12 commits April 5, 2026 09:42

Interim step, probably will delete in future

fd57e12

First cut refactoring

abbd405

Not sure why we need this

32fab24

Sync CI config and documentation from main: restore codecov.yml, setu…

f4cf218

…p.cfg, README.md, .gitignore, environment.yml; fix s3_exploratory import paths

fixed axis flow diagram

c992a92

improved docs for axis understanding

d55e300

Figures to accompany puml

501e53b

bnlawrence requested a review from valeriupredoi April 9, 2026 11:25

Bryan Lawrence added 2 commits April 9, 2026 12:35

Adding a second local variant so we can simulation a normal local pat…

1be535a

…h, and one that uses the CBOR response/deserialistion

Putting thread parallelisation back in

749891f

Bryan Lawrence and others added 5 commits April 9, 2026 14:51

Fixing all but one test

7a9ad53

replace very obsolete github actions workflow

6a5f1bf

replace very obsolete github actions workflow

92340e2

turn off flake8 test

28ede40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor26#326

Refactor26#326
bnlawrence wants to merge 19 commits intomainfrom
refactor26

bnlawrence commented Apr 9, 2026 •

edited

Loading

Uh oh!

valeriupredoi commented Apr 9, 2026

Uh oh!

valeriupredoi commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bnlawrence commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

valeriupredoi commented Apr 9, 2026

Uh oh!

valeriupredoi commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bnlawrence commented Apr 9, 2026 •

edited

Loading