Abstract base parser class #5

Merged
ldgibson merged 10 commits into main from abstract-base-parser-class on Dec 11, 2025

@ldgibson

Owner

This PR refactors the xarray backend and parsers to introduce two protocol interfaces (OnDiskTrajectory and OnDiskArray) for a cleaner implementation of the xarray entrypoint. The TrajectoryBackendArray inside the XMDPYBackendEntrypoint class now depends only on the trajectory interface defined by the OnDiskTrajectory protocol.

For each trajectory format, the OnDiskTrajectory implementation holds the trajectory information specific to that format and provides helper functions that generalize the steps taken when creating the final Dataset object.
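To make the two interfaces concrete, here is a minimal sketch of how they could be written with `typing.Protocol`. The member names (`shape`, `__getitem__`, `variables`) and the toy classes are assumptions for illustration only; the actual protocols in this PR may define different members.

```python
from typing import Any, Iterator, Protocol, Tuple, runtime_checkable


@runtime_checkable
class OnDiskArray(Protocol):
    """Array-like view of one kind of data in a trajectory file (hypothetical members)."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    def __getitem__(self, key: Any) -> Any: ...


@runtime_checkable
class OnDiskTrajectory(Protocol):
    """Format-specific trajectory that hands out its data as OnDiskArrays."""

    def variables(self) -> Iterator[Tuple[str, Tuple[str, ...], OnDiskArray]]: ...


class ToyArray:
    """Toy implementation backed by an in-memory list instead of a file."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def shape(self):
        return (len(self._rows), len(self._rows[0]))

    def __getitem__(self, key):
        return self._rows[key]


class ToyTrajectory:
    """Toy trajectory exposing a single variable as (name, dims, array)."""

    def __init__(self, rows):
        self._rows = rows

    def variables(self):
        yield "xyz", ("frame", "coord"), ToyArray(self._rows)


traj = ToyTrajectory([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```

Because the protocols are structural, neither toy class inherits from them; any object with matching members satisfies the interface, which is what lets the backend depend on the protocol rather than on a concrete parser.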

The different types of array-like information in the trajectory file (e.g., XYZ coordinates, cell information, etc.) are provided as OnDiskArrays by the OnDiskTrajectory implementation. Indexing an OnDiskArray instance parses the trajectory file for the data it represents (e.g., the XYZ coordinates) and returns it but, unlike in lazy loading, the result is not cached. Lazy loading is instead performed by the LazilyIndexedArray (provided by xarray), which wraps the TrajectoryBackendArray.

When the trajectory is loaded into the final xarray Dataset object, the data is accessed through several nested objects. From highest to lowest level, the order is as follows:

  • xarray.Dataset
    • xarray.Variable
      • xarray.core.indexing.LazilyIndexedArray
        • TrajectoryBackendArray
          • OnDiskArray (via the OnDiskTrajectory object)
            • raw data in trajectory file
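The lowest two levels of this chain can be sketched with the standard library alone. The example below is an illustration under assumptions, not the PR's parser: `XYZCoordinates` and its `parse_count` attribute are hypothetical, and the file is a two-frame XYZ file written to a temporary location. It shows that every indexing operation goes back to the file, so nothing is cached at this level.

```python
import os
import tempfile

XYZ_TEXT = """2
frame 0
H 0.0 0.0 0.0
H 0.0 0.0 0.74
2
frame 1
H 0.0 0.0 0.1
H 0.0 0.0 0.84
"""


class XYZCoordinates:
    """Hypothetical OnDiskArray-style view of the coordinates in an XYZ file."""

    def __init__(self, path):
        self.path = path
        self.parse_count = 0  # counts how often the file is re-read

    def _read_frames(self):
        self.parse_count += 1
        frames = []
        with open(self.path) as handle:
            while True:
                header = handle.readline()
                if not header.strip():
                    break  # end of file
                n_atoms = int(header)
                handle.readline()  # skip the comment line
                frame = []
                for _ in range(n_atoms):
                    fields = handle.readline().split()
                    frame.append([float(value) for value in fields[1:4]])
                frames.append(frame)
        return frames

    def __getitem__(self, key):
        # No cache: each access parses the file again.
        return self._read_frames()[key]


with tempfile.NamedTemporaryFile("w", suffix=".xyz", delete=False) as tmp:
    tmp.write(XYZ_TEXT)

coords = XYZCoordinates(tmp.name)
first = coords[0]
again = coords[0]
os.remove(tmp.name)
```

After the two accesses, `coords.parse_count` is 2 even though the same frame was requested both times. For brevity this sketch parses the whole file on each access; a real parser would more likely seek to the requested frames.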

As a result, XMDPYBackendEntrypoint relies on the OnDiskTrajectory interface to provide the trajectory data in a standardized, iterable format when creating the final xarray.Dataset object, simplifying the steps needed to assemble everything. Lastly, the backend.names submodule was added to standardize the naming of trajectory variables, coordinates, and dimensions.
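As a rough illustration of how a standardized, iterable interface combined with shared names can simplify assembly, consider the sketch below. All constants, the `variables()` method, and `collect_variables` are hypothetical; they do not reflect the actual contents of backend.names or the entrypoint.

```python
# Hypothetical stand-ins for constants a names module might define.
FRAME_DIM = "frame"
ATOM_DIM = "atom"
SPATIAL_DIM = "xyz"
POSITIONS = "positions"
CELL = "cell"


class SingleFrameTrajectory:
    """Toy trajectory yielding (name, dims, data) using the shared names."""

    def variables(self):
        yield POSITIONS, (FRAME_DIM, ATOM_DIM, SPATIAL_DIM), [[[0.0, 0.0, 0.0]]]
        yield CELL, (FRAME_DIM, SPATIAL_DIM), [[10.0, 10.0, 10.0]]


def collect_variables(trajectory):
    """Gather every variable into a mapping, independent of the file format."""
    return {name: (dims, data) for name, dims, data in trajectory.variables()}


result = collect_variables(SingleFrameTrajectory())
```

Here the assembling function never refers to a specific file format, and the variable and dimension names come from a single place, so every parser labels its data the same way.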

@ldgibson ldgibson merged commit ba59844 into main Dec 11, 2025
1 check passed