
Improving handling of large weather datasets using chunking and alternative storage formats #193

@AjayKumar0403

Description

While working with weather datasets in the Weather Routing Tool, I observed that handling large NetCDF files can lead to high memory usage and slow processing times.

Currently, datasets are typically loaded fully into memory, which does not scale well for long time spans or high-resolution spatial data.

Observations

  • Loading large datasets can be memory-intensive
  • Processing operations may become slower as dataset size increases
  • There is limited guidance on handling large datasets efficiently

Suggested Improvements

It may be useful to explore scalable data handling approaches such as:

  • Using chunked processing (e.g., with Dask) to enable lazy evaluation
  • Evaluating alternative storage formats like Zarr for faster I/O and better scalability
  • Providing examples or documentation on efficient data loading and processing
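As a minimal sketch of the chunked-processing suggestion above, the snippet below builds a small synthetic dataset in place of a large NetCDF file and processes it lazily with xarray backed by Dask. The variable name `temperature`, the chunk size, and the file path in the comment are illustrative assumptions, not part of the tool's current code.

```python
import numpy as np
import xarray as xr

# Small synthetic dataset standing in for a large NetCDF file
# (variable name and dimensions are illustrative).
ds = xr.Dataset(
    {
        "temperature": (
            ("time", "lat", "lon"),
            np.random.rand(48, 90, 180).astype("float32"),
        )
    },
    coords={
        "time": np.arange(48),
        "lat": np.linspace(-89.0, 89.0, 90),
        "lon": np.linspace(-179.0, 179.0, 180),
    },
)

# With a real file, you would open it lazily instead of loading it:
#   ds = xr.open_dataset("weather.nc", chunks={"time": 24})
ds = ds.chunk({"time": 24})  # each variable becomes a lazy Dask array

# Operations stay lazy: no data is computed or held in memory yet.
mean_field = ds["temperature"].mean(dim="time")

# compute() triggers chunk-by-chunk evaluation, so peak memory is
# bounded by the chunk size rather than the full dataset.
result = mean_field.compute()
```

The key design point is that `chunks={"time": 24}` splits the arrays along the time axis, so a long time series can be reduced without ever materializing the whole variable in memory.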

Expected Benefit

These improvements could help:

  • Reduce memory usage
  • Improve performance for large datasets
  • Make the Weather Routing Tool more scalable for real-world applications

Context

This observation is based on experimenting with dataset loading, subsetting, and storage operations during the code challenge.
