From 76a5a9b75b6a463b017e2b84eadc9c3bd40c26a3 Mon Sep 17 00:00:00 2001
From: Aimee Barciauskas
Date: Fri, 5 Jun 2026 15:10:21 -0700
Subject: [PATCH 1/5] Updated roadmap

---
 mkdocs.yml |   4 +-
 roadmap.md | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 175 insertions(+), 2 deletions(-)
 create mode 100644 roadmap.md

diff --git a/mkdocs.yml b/mkdocs.yml
index 2113ff9..be2fa6e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -2,7 +2,7 @@ site_name: Optimized Data Delivery
 repo_name: NASA-IMPACT/veda-odd
 site_author: ODD Team
 docs_dir: docs
-site_url: !ENV [READTHEDOCS_CANONICAL_URL, 'https://nasa-impact.github.io/veda-odd/']
+site_url: !ENV [READTHEDOCS_CANONICAL_URL, "https://nasa-impact.github.io/veda-odd/"]
 
 extra:
   version:
@@ -11,7 +11,7 @@ extra:
 
 nav:
   - "index.md"
-  - FY26 Roadmap: "fy26-roadmap.md"
+  - Roadmap: "roadmap.md"
   - PI Objectives: "objectives.md"
   - ODD Products: "products.md"
   - Tech Tips: "tech-tips.md"
diff --git a/roadmap.md b/roadmap.md
new file mode 100644
index 0000000..c973178
--- /dev/null
+++ b/roadmap.md
@@ -0,0 +1,173 @@
+# ODD roadmap
+
+This page exists to explain the motivations behind ODD's daily work. It connects what
+we're building to why we're building it, and explains how work enters, moves through,
+and may eventually leave our portfolio. The primary audience is the ODD team.
+The secondary audience is peer ODSI teams who want to understand how our work fits the broader picture.
+
+## Vision: who we serve
+
+Our vision is expressed as the experiences users will have when we've succeeded:
+
+1. **Ask in plain language and reproduce response.** As an Earth enthusiast, I want to ask questions like "how did the Gifford fire evolve?" and get an animated visual response — with links to the source code that produced the analysis, so I can verify and reproduce it.
+2. **Explore in the browser.** As an Earth enthusiast, I want to visually explore forest disturbance through NISAR data directly in my browser, with no specialized software or cloud account.
+3. **Research at scale.** As a fire event researcher, I want to evaluate relationships between variables from different data products across many thousands of fires, with minimal data pre-processing for fusion and modeling.
+4. **Operate in near-real time.** As an operational **application**, I need products like HLS for disaster response or sea surface temperature for maritime operations available in near-real time.
+
+## The gap
+
+NASA already serves these users — but current services have limits that grow more acute as data volumes grow:
+
+| User story | Today's services | Where they fall short |
+| ------------------------- | -------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Ask in plain language | Earth Information Explorer | Limited dataset access; datasets must be curated into the system |
+| Explore in the browser | Worldview / GIBS | Not configurable by users; pre-rendered layers don't scale to new datasets or rendering needs |
+| Research at scale | Earthdata Cloud, Harmony, cloud-hosted JupyterHubs | Harmony offloads processing to servers — heavy compute cost rather than a structural fix; users struggle to find the best datasets for their needs |
+| Operate in near-real time | LANCE + HLS | Hard to keep metadata and data in sync; no reliable notification system for new data landing in Earthdata Cloud buckets |
+| All of the above | CMR | Under increasing pressure from rapid archive growth and analytics-scale query traffic |
+
+Across all of these: discovery is hard, and current systems are becoming unsustainable as data volumes grow.
+
+## Our pillars
+
+We address these gaps through four pillars:
+
+1. **Open standards & FAIR data.** NASA data and services are findable, accessible, interoperable, and reusable, built on community standards rather than bespoke systems.
+2. **Performance, cost & scale.** Optimize performance while minimizing cost, with solutions that scale sustainably to new and growing data volumes.
+3. **Empowered users.** Users — both data providers and data consumers — can use and apply the solutions we build without us.
+4. **Trusted & reliable data.** The data products NASA generates are verifiable, consistent, and kept in sync with their metadata.
+
+**Cross-cutting foundation: community developed + adopted.** Every item on this roadmap is built in the open, with and for the community. Open source is the license; community development and adoption is the practice — it's how solutions outlive our involvement, and it underpins all four pillars.
+
+## Roadmap
+
+| Pillar | Now · mature | Next · developing | Later · future |
+| ------------------------------ | --------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
+| **Open standards & FAIR data** | ◆ Array format (Zarr) stewardship · ◆ Geospatial conventions (GeoZarr) | Ecosystem sustainability · Codec re-architecture · variable chunking | Convention + CRS utilities |
+| **Performance, cost & scale** | Data virtualization · Object-store access · Dynamic tiling · In-browser rendering | Virtual stores + lazy array analytics · Analytics-scale metadata · Storage model evaluation | Resampling/warp tooling · Query at scale · Storage cost optimization |
+| **Empowered users** | Cloud-native guidance · Science support · Format evaluation | In-browser rendering · Cloud-optimized decision framework · Improved access & auth libraries · Dataset + tooling coverage metrics | AI-assisted optimization (skills + tooling) · ESRI / ArcGIS integration |
+| **Trusted & reliable data** | ◆ Transactional Zarr (Icechunk) | Remote store access · Live virtual stores · Synchronized metadata + data | Event-driven (object store notifications) for near-real time (NRT) updates |
+
+**◆ Foundational** — a category of work that is ongoing.
+
+**Handed off:** nothing yet — see [How we work](#how-we-work). Building a working handoff path is a goal itself.
+
+Every objective of this team should trace to at least one vision story and one pillar. Each item name links to deeper context below.
+
+## How we work
+
+> "ODD should not be responsible for virtualizing everything! We (and our partners) are responsible for making it easy for NASA to virtualize things though." — Henry
+
+ODD is a research and development team, not an operations or continued-maintenance team. Success for any item on this roadmap is *graduating off of it* — not staying on it indefinitely.
+
+### Lifecycle
+
+Work moves through four stages: **Later** (future, aspirational) → **Next** (developing) → **Now** (mature) → **Handed off** (owned by someone else).
+
+An item is ready to hand off when it passes three tests:
+
+1. **Someone else can do it.** Documentation, tooling, and skills exist so that a data provider or partner can reproduce the work without us.
+2. **Someone else owns it.** A named owner — a DAAC, a mission team, community maintainers — has accepted responsibility.
+3. **We've stopped learning.** Our remaining contribution is maintenance, not discovery.
+
+Virtual data stores are an example: today we generate stores ourselves (learning). Next,
+we will ship developer docs and optimization skills (enabling). Then store generation
+graduates to data providers. Only the underlying tooling remains ours. Several
+roadmap items — virtual store authoring docs, decision tooling, the optimization
+skill/CLI, ecosystem sustainability (maintainer onboarding) — are not just projects but
+handoff mechanisms.
+
+We don't yet have a reliable handoff process. Naming that honestly is the first step; building it is on the roadmap.
+
+### Prioritization
+
+At each planning cycle (PI), we ask two questions of the grid:
+
+- **What promotes?** Which Next items are ready to become Now? Which Later items are ready to become Next?
+- **What graduates?** Which Now items pass the three handoff tests?
+
+Objectives we take on must also balance "utopian" goals — like a unified Zarr model —
+with the necessity of supporting legacy patterns and other formats.
+
+When evaluating new candidate work, we apply these criteria:
+
+- **Traceability.** Does it serve at least one vision story and one pillar?
+- **Adoption readiness.** How quickly can the ecosystem absorb it? Building on familiar interfaces lowers the barrier (VirtualiZarr adopting xarray's data model made it immediately accessible); very new technology carries adoption lag as a risk (zarr-datafusion-search is powerful but the ecosystem may take years to take it on).
+- **Cost.** What does adoption cost — in compute, energy, money, and user capability? Solutions that require cloud compute in a specific region, for example, exclude most users.
+- **Handoff path.** Can we articulate who would eventually own this, even roughly?
+
+## Deeper context
+
+What each roadmap item unlocks, and what success looks like.
+
+### Open standards & FAIR data
+
+**◆ Array format stewardship.** The foundational format for cloud-native array data — Zarr. Ongoing maintenance and stewardship, including convening the community — e.g. Zarr Summit '26/27 — to unblock progress on technical features and convention adoption.
+
+**◆ Geospatial conventions.** Zarr conventions for geospatial metadata (GeoZarr), essential for native and virtual Zarr collections to interoperate across GIS, visualization, and analysis libraries. Closing in on submission of the GeoZarr standard to the OGC architecture board. Success: trust and interoperability for Zarr data from all Earth data providers (NASA, NOAA, ESA), and a consistent, non-ambiguous platform to build client applications on.
+
+**Ecosystem sustainability.** A sustainable maintainer ecosystem for Zarr to support growing, complex use cases — the zarr-python roadmap plus maintainer onboarding. Success: adoption of the roadmap by maintainers and stakeholders, plus one or two new onboarded maintainers making significant contributions — reducing stagnation and broadening design perspectives.
+
+**Codec re-architecture.** The Zarr v2→v3 transition exposed design issues in the codec model. Re-architecting it supports new codec development (vital for virtualization, where archival formats use less-standardized codecs) and alternative client implementations in Rust and TypeScript. Follow-ons: *CF codecs* — capturing CF-convention decoding logic as codecs rather than attribute dictionaries, so clients interacting directly with the Zarr API don't need to duplicate xarray's specialized decoding logic; and *concatenated arrays* — supporting variable compression to unlock virtualization of quirky datasets like MUR SST (pre-design).
+
+**Convention + CRS utilities.** Utilities and guidance for keeping virtual store metadata aligned with CF and GeoZarr conventions. Unblocks tools that rely on those conventions from using compliant virtual stores.
+
+### Performance, cost & scale
+
+**Data virtualization.** Access archival data through the Zarr API without duplicating it — VirtualiZarr. Includes parser improvements (virtual-tiff, obspec-utils, async-hdf5, GRIB) — or transitioning parser maintenance to partners, which is itself a handoff opportunity. This is also our current lever on storage cost (see *Storage cost optimization*).
+
+**Object-store access.** High-performance object storage access for the Python geospatial stack — obstore.
+
+**Dynamic tiling.** Tiling driven by CMR — TiTiler-CMR. Current work: regenerated compatibility report (with group support), OPERA integration into the disasters portal, a distributed cache for S3 credentials (~1s saved per cold-start request), and WMTS GetCapabilities so EGIS can surface HLS vegetation indices in ArcGIS.
+
+**Lazy array analytics.** Instantly materialize massive lazy 4-D arrays (time, band, x, y) from metadata stores — lazycogs, a scalable replacement for stackstac/odc-stac. Success: any collection stored as COGs can be analyzed through a collection-level xarray API.
+
+**Variable chunking.** Variable chunk support in VirtualiZarr + xarray; unlocks virtualizing more datasets. Near-term delivery.
+
+**Analytics-scale metadata.** EOSDIS has identified pressure on CMR as a significant risk. Prototype collection-level stores using GeoParquet/Iceberg and zarr-datafusion to understand performance, cost, and scaling — and contribute to the relevant open-source libraries. Includes STAC in Iceberg: an object-storage-only STAC catalog giving providers API-less metadata access.
+
+**Storage model evaluation.** Understand emerging storage models and their trade-offs — currently the S3 Files synchronization model: compare performance to native S3 for common operations and understand its pricing. Potential to serve both durable shared storage and the low-latency block access that ML and massively parallel array workloads need.
+
+**Resampling/warp tooling.** A composable, Rust-based resampling/warp library reducing dependence on GDAL's monolithic toolchain. Usable from server-side tiling, distributed array frameworks (Dask, Cubed), and WASM in-browser rendering. Pre-design; builds on a full ecosystem assessment.
+
+**Query at scale.** Query and access data at scale through a single interface — zarr-datafusion-search. Paves the way for Zarr as a storage target for Level 0/1 and swath data, and moves EOSDIS toward an Arrow-native ecosystem. High potential, but very new — adoption lag is the known risk.
+
+**Storage cost optimization.** Addressing the growing cost of data volumes in Earthdata Cloud. We are not actively working on this beyond *data virtualization* (accessing archival data through the Zarr API without duplicating it). Avoiding duplication is the lever we pull today; broader storage cost strategies remain future work.
+
+### Empowered users
+
+**Cloud-native guidance.** The CNG guide: unblock people confused about which formats exist, why, and when to use each. Success: people use the guide to build cloud-native datasets, or to explain to stakeholders why a dataset was built a given way.
+
+**Science support.** Direct support for science users, including cloud-optimized data usage guidance (e.g., xarray arguments) in the guide and datacube guide.
+
+**Format evaluation.** Evaluate mission data formats and recommend improvements that enable optimized access patterns — currently NISAR: assess the NISAR HDF5 format and advise the Algorithm Development Team before the official release in summer 2026. Includes a virtualization + data fusion prototype showing a more user-friendly virtual representation.
+
+**In-browser rendering.** In-browser GPU rendering of COGs and Zarr via direct data access (deck.gl-raster + Lonboard) — users customize rendering without re-fetching data. Current work: demonstrations in documentation (band combinations, direct access), initial GeoZarr support in both libraries, and a TypeScript WKB→GeoArrow parser enabling DuckDB-Wasm integration. Current limitation: requires open data access.
+
+**Virtual store authoring.** How to build virtual stores, with or without agents — developer docs. Unblocks DAACs and science teams as virtual store developers — a primary handoff mechanism.
+
+**Cloud-optimized decision framework.** The cloud-optimized data decision tree: a diagram plus explanatory text with examples per branch, guiding format and chunking decisions. Foundation for AI-assisted optimization.
+
+**Improved access & auth libraries.** Libraries that get data and credentials into users' hands — earthaccess v1, notably a modular approach with refreshable credentials in a lightweight earth-auth package; finish opening Icechunk stores via earthaccess.
+
+**AI-assisted optimization (skills + tooling).** A CLI and agentic skill for data structure optimization, plus an agent that walks data providers through chunking and format decisions (CO data AI guidance) — usable across ESDS. Builds on the cloud-optimized decision framework, reducing engineering time to a balanced or optimized data structure.
+
+**Dataset + tooling coverage metrics.** Assess how many NASA datasets work with our tools (VirtualiZarr, datafusion, lazycogs) so we have metrics for improvement and impact.
+
+**ESRI / ArcGIS integration.** A large share of NASA data users work in ArcGIS, so our tools and data need to integrate with ESRI systems rather than require users to leave them. Ensure our cloud-native outputs are consumable there through the open standards ESRI already supports (COG, WMTS, OGC APIs, GeoZarr) — the EGIS/ArcGIS WMTS work in *Dynamic tiling* is the first concrete instance. Meeting users where they are, not requiring new software.
+
+### Trusted & reliable data
+
+**◆ Transactional Zarr.** Checksum verification and ACID transactions for Zarr stores (Icechunk) — the reliability layer.
+
+**Remote store access.** Bearer-token HTTP support unblocks NASA data users without cloud compute in us-west-2 from using virtual stores — PO.DAAC has identified this as the single blocker to rolling out their Icechunk stores. Also: parsing manifests back out of Icechunk (inspection and modification of virtual stores, plus risk mitigation) and prefix-changing utilities.
+
+**Live virtual stores.** Stores kept current as data lands — e.g. MUR SST as native Zarr, rechunked for time series, updated in near-real time as an AWS Public Dataset. Serves anyone doing historical or NRT sea surface temperature analysis, and demonstrates Icechunk's capabilities end to end.
+
+**Synchronized metadata + data.** Keep metadata in sync with data (via zarr-datafusion-search) — addressing the gap where metadata and data drift apart.
+
+**Event-driven NRT updates.** Icechunk makes all store updates trackable by listening to changes in object storage keys, enabling simple event-driven pipelines: dynamically updated pyramids (e.g., for Worldview), summary statistics, pre-computed time series. The path to keeping virtual stores current with incoming data streams — and to the near-real-time vision story.
+
+---
+
+*Open questions for the team: verify the Earth Information Explorer claim in the gap table; align timelines with data services (when do they stop coggifying?) and front-end teams (will tile servers eventually go away?); define our first formal handoff.*
\ No newline at end of file

From df3d18c4f11afb38e98e39fb170d40ce86913c75 Mon Sep 17 00:00:00 2001
From: Aimee Barciauskas
Date: Fri, 5 Jun 2026 15:12:08 -0700
Subject: [PATCH 2/5] Remove old roadmap and move new one

---
 docs/fy26-roadmap.md | 129 --------------------------------
 roadmap.md           | 173 -------------------------------------------
 2 files changed, 302 deletions(-)
 delete mode 100644 docs/fy26-roadmap.md
 delete mode 100644 roadmap.md

diff --git a/docs/fy26-roadmap.md b/docs/fy26-roadmap.md
deleted file mode 100644
index 6c94442..0000000
--- a/docs/fy26-roadmap.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# ODD Fiscal Year (FY) 2026 Roadmap
-
-If you are interested in a better understanding of the ODD service roadmap, and what datasets will be supported when, this document is for you.
-
-This document provides a roadmap for the VEDA Optimized Data Delivery Team (ODD), broken into 4 categories:
-1. Services for granules in CMR
-2. Services for datacubes
-3. Services non-datacube
-4. Foundational Work
-
-It is important to note that this roadmap is a reflection of the team's current plans, written as of November 2025. These are likely to evolve over time. We intend to update the roadmap quarterly.
-
-For a higher-level vision, see also: [Optimized Data Delivery Roadmap for NASA - July 2025](https://docs.google.com/presentation/d/1Ouo_9qJJuDBdrzDHpt2P-o1wGBPS1nvTjLRFAFGsYkU/edit?usp=sharing).
-
----
-
-## Legend
-
-- **✅ Complete** - Already delivered
-- **🚧 In Progress** - Active development
-- **🔄 Ongoing** - Ongoing work
-- **📅 Planned** - Scheduled for specific quarter
-- **🔮 Future** - Planned for future timeline
-
----
-
-![Services for CMR Granules](./category1-granules.svg)
-
-## Roadmap for Service Category 1: Services for CMR Granules
-
-### Access
-*N/A*
-
-### Visualization
-- **✅ Complete** titiler-cmr /tiles API + VEDA UI integration
-
-### Timeseries
-- **✅ Complete** titiler-cmr /timeseries/statistics API + VEDA UI integration
-
-### Additional Features
-- **🚧 26.1** Release /compatibility endpoint
-- **📅 26.2+** Develop support for more datasets, informed by compatibility testing in 26.1.
-
-### Dataset Support
-- **✅ Complete** Demonstrated with GPM IMERG, TROPESS O3 and MiCASA
-- **🚧 26.1** Compile a list of compatible datasets
-- **🚧 26.1** Develop support for EDL-based credential access, as an aternative to requester-pays and role-based access. To support NISAR (ASF) and GEDI L4B (ORNL DAAC) specifically.
-- **📅 26.2+** Test integration of new datasets as requester-pays is enabled for more buckets.
-
-### Performance + Operations
-- **🚧 26.1** Deploy monitoring + performance evaluation via service tracing (OpenTelemetry)
-- **📅 26.1** MCP Production deployment
-- **📅 26.2** Consolidated benchmarking utilities for advising users on zoom levels, AOIs and temporal parameters on a per-dataset basis
-
-### Ecosystem Development
-- **📅 26.2** Share compatible dataset list with NASA product teams for potential integration (i.e. Worldview)
-- **📅 26.2+** Continued documentation to support self-service use of titiler-cmr.
-
----
-
-![Services for Datacubes](./category2-datacubes.svg)
-
-## Roadmap for Service Category 2: Services for Datacubes
-
-### Access
-- **✅ Complete** Lazy loading/intelligent subsetting/intelligent access for varied data formats (GRIB, COG, NetCDF-4, HDF5 via VirtualiZarr)
-- **📅 26.1** Support adoption of Virtual Zarr through library maintenance, improved documentation, and user support
-- **📅 26.2** Support for arbitrary [chunk-grids (variable chunking)](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#chunk-grids)
-- **📅 26.2** Explore virtualization methods for alternate grid structures (i.e., healpix, cubegrid)
-
-### Visualization
-- **📅 26.1** Virtual container (Icechunk) integration in titiler-multidim to support /tiles endpoints
-- **📅 26.1** Identify additional I/O parameters to allow for per-dataset optimizations
-- **📅 26.1** Test VEDA UI integration of /tiles for a virtual dataset (e.g. NLDAS)
-- **📅 26.2** Additional performance improvements (e.g. obstore integration)
-
-### Timeseries
-- **📅 26.1** Design the timeseries/statistics endpoint to support datacubes (i.e. could be an asynchronous API outside the titiler ecosystem)
-- **📅 26.2** Develop the timeseries/statistics endpoint
-- **📅 26.2** Integrate the timeseries/statistics endpoint into VEDA UI
-
-### Datasets
-- **✅ Complete** Prototyped virtual (Icechunk) stores for NLDAS, RASI, HRRR, MUR SST
-- **📅 26.1** Demonstrate publication and tiling of NLDAS virtual store (💧 Water Insight)
-- **📅 26.1** Architecture + documentation for generalizing STAC publication and VEDA UI /tiles integration
-- **📅 26.2** HydroGlobe 5km and 10km virtual stores (💧 Water Insight)
-- **📅 26.2** CarbonTracker-CH₄, EPA Gridded CH₄ Emissions Inventory virtual stores (🏭 GHGCenter)
-- **📅 26.3** Documentation for STAC publication and VEDA UI /timeseries/statistics integration
-- **📅 26.3** CarbonTracker-CH₄, EPA Gridded CH₄ Emissions Inventory tiles and timeseries integrations (🏭 GHGCenter)
-- **📅 26.3** TROPESS NOx, TROPESS O3, JPL MOMO Chem, GEOS CF virtual stores, tiles and timeseries integrations (💨 Air Quality)
-
-### Operations
-- **📅 26.2** Monitoring + Performance evaluation via service tracing (OpenTelemetry)
-- **📅 26.3** MCP deployment
-- **📅 26.2** Consolidated benchmarking utilities for advising users on zoom levels, AOIs and temporal parameters on a per-dataset basis
-
-### Ecosystem Development
-- **📅 26.1** Create template data ingestion pipeline for virtualizing datasets
-- **📅 26.3+** Moving towards self-service integration
-
----
-
-![Services for Non-Datacubes](./category3-nondatacubes.svg)
-
-## Roadmap for Service Category 3: Services for Non-Datacubes
-
-### Access
-- **🚧 26.1-26.3** Prototyping creating a query engine using a Zarr provider for data fusion
-
-### Visualization
-- **🔮 26.4 or FY 27** Tiling endpoints in near-term, direct client approaches in long-term
-
-### Timeseries
-- **🔮 26.4 or FY 27** Timeseries API
-
-### Datasets
-- **📅 26.1** Prototype HLS store
-- **📅 26.3+** Prototype NISAR and/or Opera stores
-
-### Operations
-- **🔮 26.4 or FY 27** Operational deployment + documentation
-- **🔮 26.4 or FY 27** Consolidated benchmarking utilities for advising users on zoom levels, AOIs and temporal parameters on a per-dataset basis
-
-### Ecosystem Development
-- **🔮 26.4 or FY 27** Develop ecosystem, moving towards self-service adoption within VEDA and broader community
-
-## Roadmap for Service Category 4: Foundational Work (including Technical Debt)
-
-- **🔄 26.1+** Establish areas for consolidation in the TiTiler ecosystem. Similar features across applications should rely on shared upstream libraries. The ODD team continuously identifying similar features and proactively DRY up codebases.
diff --git a/roadmap.md b/roadmap.md
deleted file mode 100644
index c973178..0000000
--- a/roadmap.md
+++ /dev/null
@@ -1,173 +0,0 @@
-# ODD roadmap
-
-This page exists to explain the motivations behind ODD's daily work. It connects what
-we're building to why we're building it, and explains how work enters, moves through,
-and may eventually leave our portfolio. The primary audience is the ODD team.
-The secondary audience is peer ODSI teams who want to understand how our work fits the broader picture.
-
-## Vision: who we serve
-
-Our vision is expressed as the experiences users will have when we've succeeded:
-
-1. **Ask in plain language and reproduce response.** As an Earth enthusiast, I want to ask questions like "how did the Gifford fire evolve?" and get an animated visual response — with links to the source code that produced the analysis, so I can verify and reproduce it.
-2. **Explore in the browser.** As an Earth enthusiast, I want to visually explore forest disturbance through NISAR data directly in my browser, with no specialized software or cloud account.
-3. **Research at scale.** As a fire event researcher, I want to evaluate relationships between variables from different data products across many thousands of fires, with minimal data pre-processing for fusion and modeling.
-4. **Operate in near-real time.** As an operational **application**, I need products like HLS for disaster response or sea surface temperature for maritime operations available in near-real time.
-
-## The gap
-
-NASA already serves these users — but current services have limits that grow more acute as data volumes grow:
-
-| User story | Today's services | Where they fall short |
-| ------------------------- | -------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Ask in plain language | Earth Information Explorer | Limited dataset access; datasets must be curated into the system |
-| Explore in the browser | Worldview / GIBS | Not configurable by users; pre-rendered layers don't scale to new datasets or rendering needs |
-| Research at scale | Earthdata Cloud, Harmony, cloud-hosted JupyterHubs | Harmony offloads processing to servers — heavy compute cost rather than a structural fix; users struggle to find the best datasets for their needs |
-| Operate in near-real time | LANCE + HLS | Hard to keep metadata and data in sync; no reliable notification system for new data landing in Earthdata Cloud buckets |
-| All of the above | CMR | Under increasing pressure from rapid archive growth and analytics-scale query traffic |
-
-Across all of these: discovery is hard, and current systems are becoming unsustainable as data volumes grow.
-
-## Our pillars
-
-We address these gaps through four pillars:
-
-1. **Open standards & FAIR data.** NASA data and services are findable, accessible, interoperable, and reusable, built on community standards rather than bespoke systems.
-2. **Performance, cost & scale.** Optimize performance while minimizing cost, with solutions that scale sustainably to new and growing data volumes.
-3. **Empowered users.** Users — both data providers and data consumers — can use and apply the solutions we build without us.
-4. **Trusted & reliable data.** The data products NASA generates are verifiable, consistent, and kept in sync with their metadata.
-
-**Cross-cutting foundation: community developed + adopted.** Every item on this roadmap is built in the open, with and for the community. Open source is the license; community development and adoption is the practice — it's how solutions outlive our involvement, and it underpins all four pillars.
-
-## Roadmap
-
-| Pillar | Now · mature | Next · developing | Later · future |
-| ------------------------------ | --------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
-| **Open standards & FAIR data** | ◆ Array format (Zarr) stewardship · ◆ Geospatial conventions (GeoZarr) | Ecosystem sustainability · Codec re-architecture · variable chunking | Convention + CRS utilities |
-| **Performance, cost & scale** | Data virtualization · Object-store access · Dynamic tiling · In-browser rendering | Virtual stores + lazy array analytics · Analytics-scale metadata · Storage model evaluation | Resampling/warp tooling · Query at scale · Storage cost optimization |
-| **Empowered users** | Cloud-native guidance · Science support · Format evaluation | In-browser rendering · Cloud-optimized decision framework · Improved access & auth libraries · Dataset + tooling coverage metrics | AI-assisted optimization (skills + tooling) · ESRI / ArcGIS integration |
-| **Trusted & reliable data** | ◆ Transactional Zarr (Icechunk) | Remote store access · Live virtual stores · Synchronized metadata + data | Event-driven (object store notifications) for near-real time (NRT) updates |
-
-**◆ Foundational** — a category of work that is ongoing.
-
-**Handed off:** nothing yet — see [How we work](#how-we-work). Building a working handoff path is a goal itself.
-
-Every objective of this team should trace to at least one vision story and one pillar. Each item name links to deeper context below.
-
-## How we work
-
-> "ODD should not be responsible for virtualizing everything! We (and our partners) are responsible for making it easy for NASA to virtualize things though." — Henry
-
-ODD is a research and development team, not an operations or continued-maintenance team. Success for any item on this roadmap is *graduating off of it* — not staying on it indefinitely.
-
-### Lifecycle
-
-Work moves through four stages: **Later** (future, aspirational) → **Next** (developing) → **Now** (mature) → **Handed off** (owned by someone else).
-
-An item is ready to hand off when it passes three tests:
-
-1. **Someone else can do it.** Documentation, tooling, and skills exist so that a data provider or partner can reproduce the work without us.
-2. **Someone else owns it.** A named owner — a DAAC, a mission team, community maintainers — has accepted responsibility.
-3. **We've stopped learning.** Our remaining contribution is maintenance, not discovery.
-
-Virtual data stores are an example: today we generate stores ourselves (learning). Next,
-we will ship developer docs and optimization skills (enabling). Then store generation
-graduates to data providers. Only the underlying tooling remains ours. Several
-roadmap items — virtual store authoring docs, decision tooling, the optimization
-skill/CLI, ecosystem sustainability (maintainer onboarding) — are not just projects but
-handoff mechanisms.
-
-We don't yet have a reliable handoff process. Naming that honestly is the first step; building it is on the roadmap.
-
-### Prioritization
-
-At each planning cycle (PI), we ask two questions of the grid:
-
-- **What promotes?** Which Next items are ready to become Now? Which Later items are ready to become Next?
-- **What graduates?** Which Now items pass the three handoff tests?
-
-Objectives we take on must also balance "utopian" goals — like a unified Zarr model —
-with the necessity of supporting legacy patterns and other formats.
-
-When evaluating new candidate work, we apply these criteria:
-
-- **Traceability.** Does it serve at least one vision story and one pillar?
-- **Adoption readiness.** How quickly can the ecosystem absorb it? Building on familiar interfaces lowers the barrier (VirtualiZarr adopting xarray's data model made it immediately accessible); very new technology carries adoption lag as a risk (zarr-datafusion-search is powerful but the ecosystem may take years to take it on).
-- **Cost.** What does adoption cost — in compute, energy, money, and user capability? Solutions that require cloud compute in a specific region, for example, exclude most users.
-- **Handoff path.** Can we articulate who would eventually own this, even roughly?
-
-## Deeper context
-
-What each roadmap item unlocks, and what success looks like.
-
-### Open standards & FAIR data
-
-**◆ Array format stewardship.** The foundational format for cloud-native array data — Zarr. Ongoing maintenance and stewardship, including convening the community — e.g. Zarr Summit '26/27 — to unblock progress on technical features and convention adoption.
-
-**◆ Geospatial conventions.** Zarr conventions for geospatial metadata (GeoZarr), essential for native and virtual Zarr collections to interoperate across GIS, visualization, and analysis libraries. Closing in on submission of the GeoZarr standard to the OGC architecture board. Success: trust and interoperability for Zarr data from all Earth data providers (NASA, NOAA, ESA), and a consistent, non-ambiguous platform to build client applications on.
- -**Ecosystem sustainability.** A sustainable maintainer ecosystem for Zarr to support growing, complex use cases — the zarr-python roadmap plus maintainer onboarding. Success: adoption of the roadmap by maintainers and stakeholders, plus one or two new onboarded maintainers making significant contributions — reducing stagnation and broadening design perspectives. - -**Codec re-architecture.** The Zarr v2→v3 transition exposed design issues in the codec model. Re-architecting it supports new codec development (vital for virtualization, where archival formats use less-standardized codecs) and alternative client implementations in Rust and TypeScript. Follow-ons: *CF codecs* — capturing CF-convention decoding logic as codecs rather than attribute dictionaries, so clients interacting directly with the Zarr API don't need to duplicate xarray's specialized decoding logic; and *concatenated arrays* — supporting variable compression to unlock virtualization of quirky datasets like MUR SST (pre-design). - -**Convention + CRS utilities.** Utilities and guidance for keeping virtual store metadata aligned with CF and GeoZarr conventions. Unblocks tools that rely on those conventions from using compliant virtual stores. - -### Performance, cost & scale - -**Data virtualization.** Access archival data through the Zarr API without duplicating it — VirtualiZarr. Includes parser improvements (virtual-tiff, obspec-utils, async-hdf5, GRIB) — or transitioning parser maintenance to partners, which is itself a handoff opportunity. This is also our current lever on storage cost (see *Storage cost optimization*). - -**Object-store access.** High-performance object storage access for the Python geospatial stack — obstore. - -**Dynamic tiling.** Tiling driven by CMR — TiTiler-CMR. 
Current work: regenerated compatibility report (with group support), OPERA integration into the disasters portal, a distributed cache for S3 credentials (~1s saved per cold-start request), and WMTS GetCapabilities so EGIS can surface HLS vegetation indices in ArcGIS. - -**Lazy array analytics.** Instantly materialize massive lazy 4-D arrays (time, band, x, y) from metadata stores — lazycogs, a scalable replacement for stackstac/odc-stac. Success: any collection stored as COGs can be analyzed through a collection-level xarray API. - -**Variable chunking.** Variable chunk support in VirtualiZarr + xarray; unlocks virtualizing more datasets. Near-term delivery. - -**Analytics-scale metadata.** EOSDIS has identified pressure on CMR as a significant risk. Prototype collection-level stores using GeoParquet/Iceberg and zarr-datafusion to understand performance, cost, and scaling — and contribute to the relevant open-source libraries. Includes STAC in Iceberg: an object-storage-only STAC catalog giving providers API-less metadata access. - -**Storage model evaluation.** Understand emerging storage models and their trade-offs — currently the S3 Files synchronization model: compare performance to native S3 for common operations and understand its pricing. Potential to serve both durable shared storage and the low-latency block access that ML and massively parallel array workloads need. - -**Resampling/warp tooling.** A composable, Rust-based resampling/warp library reducing dependence on GDAL's monolithic toolchain. Usable from server-side tiling, distributed array frameworks (Dask, Cubed), and WASM in-browser rendering. Pre-design; builds on a full ecosystem assessment. - -**Query at scale.** Query and access data at scale through a single interface — zarr-datafusion-search. Paves the way for Zarr as a storage target for Level 0/1 and swath data, and moves EOSDIS toward an Arrow-native ecosystem. High potential, but very new — adoption lag is the known risk. 
- -**Storage cost optimization.** Addressing the growing cost of data volumes in Earthdata Cloud. We are not actively working on this beyond *data virtualization* (accessing archival data through the Zarr API without duplicating it). Avoiding duplication is the lever we pull today; broader storage cost strategies remain future work. - -### Empowered users - -**Cloud-native guidance.** The CNG guide: unblock people confused about which formats exist, why, and when to use each. Success: people use the guide to build cloud-native datasets, or to explain to stakeholders why a dataset was built a given way. - -**Science support.** Direct support for science users, including cloud-optimized data usage guidance (e.g., xarray arguments) in the guide and datacube guide. - -**Format evaluation.** Evaluate mission data formats and recommend improvements that enable optimized access patterns — currently NISAR: assess the NISAR HDF5 format and advise the Algorithm Development Team before the official release in summer 2026. Includes a virtualization + data fusion prototype showing a more user-friendly virtual representation. - -**In-browser rendering.** In-browser GPU rendering of COGs and Zarr via direct data access (deck.gl-raster + Lonboard) — users customize rendering without re-fetching data. Current work: demonstrations in documentation (band combinations, direct access), initial GeoZarr support in both libraries, and a TypeScript WKB→GeoArrow parser enabling DuckDB-Wasm integration. Current limitation: requires open data access. - -**Virtual store authoring.** How to build virtual stores, with or without agents — developer docs. Unblocks DAACs and science teams as virtual store developers — a primary handoff mechanism. - -**Cloud-optimized decision framework.** The cloud-optimized data decision tree: a diagram plus explanatory text with examples per branch, guiding format and chunking decisions. Foundation for AI-assisted optimization. 
- -**Improved access & auth libraries.** Libraries that get data and credentials into users' hands — earthaccess v1, notably a modular approach with refreshable credentials in a lightweight earth-auth package; finish opening Icechunk stores via earthaccess. - -**AI-assisted optimization (skills + tooling).** A CLI and agentic skill for data structure optimization, plus an agent that walks data providers through chunking and format decisions (CO data AI guidance) — usable across ESDS. Builds on the cloud-optimized decision framework, reducing engineering time to a balanced or optimized data structure. - -**Dataset + tooling coverage metrics.** Assess how many NASA datasets work with our tools (VirtualiZarr, datafusion, lazycogs) so we have metrics for improvement and impact. - -**ESRI / ArcGIS integration.** A large share of NASA data users work in ArcGIS, so our tools and data need to integrate with ESRI systems rather than require users to leave them. Ensure our cloud-native outputs are consumable there through the open standards ESRI already supports (COG, WMTS, OGC APIs, GeoZarr) — the EGIS/ArcGIS WMTS work in *Dynamic tiling* is the first concrete instance. Meeting users where they are, not requiring new software. - -### Trusted & reliable data - -**◆ Transactional Zarr.** Checksum verification and ACID transactions for Zarr stores (Icechunk) — the reliability layer. - -**Remote store access.** Bearer-token HTTP support unblocks NASA data users without cloud compute in us-west-2 from using virtual stores — PO.DAAC has identified this as the single blocker to rolling out their Icechunk stores. Also: parsing manifests back out of Icechunk (inspection and modification of virtual stores, plus risk mitigation) and prefix-changing utilities. - -**Live virtual stores.** Stores kept current as data lands — e.g. MUR SST as native Zarr, rechunked for time series, updated in near-real time as an AWS Public Dataset. 
Serves anyone doing historical or NRT sea surface temperature analysis, and demonstrates Icechunk's capabilities end to end. - -**Synchronized metadata + data.** Keep metadata in sync with data (via zarr-datafusion-search) — addressing the gap where metadata and data drift apart. - -**Event-driven NRT updates.** Icechunk makes all store updates trackable by listening to changes in object storage keys, enabling simple event-driven pipelines: dynamically updated pyramids (e.g., for Worldview), summary statistics, pre-computed time series. The path to keeping virtual stores current with incoming data streams — and to the near-real-time vision story. - ---- - -*Open questions for the team: verify the Earth Information Explorer claim in the gap table; align timelines with data services (when do they stop coggifying?) and front-end teams (will tile servers eventually go away?); define our first formal handoff.* \ No newline at end of file From ff0333fd99c162f6b5eb98bf01203328c742568b Mon Sep 17 00:00:00 2001 From: Aimee Barciauskas Date: Fri, 5 Jun 2026 15:12:16 -0700 Subject: [PATCH 3/5] Remove old roadmap and move new one --- docs/roadmap.md | 173 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 173 insertions(+) create mode 100644 docs/roadmap.md diff --git a/docs/roadmap.md b/docs/roadmap.md new file mode 100644 index 0000000..c973178 --- /dev/null +++ b/docs/roadmap.md @@ -0,0 +1,173 @@ +# ODD roadmap + +This page exists to explain the motivations behind ODD's daily work. It connects what +we're building to why we're building it, and explains how work enters, moves through, +and may eventually leave our portfolio. The primary audience is the ODD team. +The secondary audience is peer ODSI teams who want to understand how our work fits the broader picture. + +## Vision: who we serve + +Our vision is expressed as the experiences users will have when we've succeeded: + +1. 
**Ask in plain language and reproduce response.** As an Earth enthusiast, I want to ask questions like "how did the Gifford fire evolve?" and get an animated visual response — with links to the source code that produced the analysis, so I can verify and reproduce it. +2. **Explore in the browser.** As an Earth enthusiast, I want to visually explore forest disturbance through NISAR data directly in my browser, with no specialized software or cloud account. +3. **Research at scale.** As a fire event researcher, I want to evaluate relationships between variables from different data products across many thousands of fires, with minimal data pre-processing for fusion and modeling. +4. **Operate in near-real time.** As an operational **application**, I need products like HLS for disaster response or sea surface temperature for maritime operations available in near-real time. + +## The gap + +NASA already serves these users — but current services have limits that grow more acute as data volumes grow: + +| User story | Today's services | Where they fall short | +| ------------------------- | -------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | +| Ask in plain language | Earth Information Explorer | Limited dataset access; datasets must be curated into the system | +| Explore in the browser | Worldview / GIBS | Not configurable by users; pre-rendered layers don't scale to new datasets or rendering needs | +| Research at scale | Earthdata Cloud, Harmony, cloud-hosted JupyterHubs | Harmony offloads processing to servers — heavy compute cost rather than a structural fix; users struggle to find the best datasets for their needs | +| Operate in near-real time | LANCE + HLS | Hard to keep metadata and data in sync; no reliable notification system for new data landing in Earthdata Cloud buckets | +| All of the above | CMR | Under 
increasing pressure from rapid archive growth and analytics-scale query traffic | + +Across all of these: discovery is hard, and current systems are becoming unsustainable as data volumes grow. + +## Our pillars + +We address these gaps through four pillars: + +1. **Open standards & FAIR data.** NASA data and services are findable, accessible, interoperable, and reusable, built on community standards rather than bespoke systems. +2. **Performance, cost & scale.** Optimize performance while minimizing cost, with solutions that scale sustainably to new and growing data volumes. +3. **Empowered users.** Users — both data providers and data consumers — can use and apply the solutions we build without us. +4. **Trusted & reliable data.** The data products NASA generates are verifiable, consistent, and kept in sync with their metadata. + +**Cross-cutting foundation: community developed + adopted.** Every item on this roadmap is built in the open, with and for the community. Open source is the license; community development and adoption is the practice — it's how solutions outlive our involvement, and it underpins all four pillars. 
+ +## Roadmap + +| Pillar | Now · mature | Next · developing | Later · future | +| ------------------------------ | --------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | +| **Open standards & FAIR data** | ◆ Array format (Zarr) stewardship · ◆ Geospatial conventions (GeoZarr) | Ecosystem sustainability · Codec re-architecture · variable chunking | Convention + CRS utilities | +| **Performance, cost & scale** | Data virtualization · Object-store access · Dynamic tiling · In-browser rendering | Virtual stores + lazy array analytics · Analytics-scale metadata · Storage model evaluation | Resampling/warp tooling · Query at scale · Storage cost optimization | +| **Empowered users** | Cloud-native guidance · Science support · Format evaluation | In-browser rendering · Cloud-optimized decision framework · Improved access & auth libraries · Dataset + tooling coverage metrics | AI-assisted optimization (skills + tooling) · ESRI / ArcGIS integration | +| **Trusted & reliable data** | ◆ Transactional Zarr (Icechunk) | Remote store access · Live virtual stores · Synchronized metadata + data | Event-driven (object store notifications) for near-real time (NRT) updates | + +**◆ Foundational** — a category of work that is ongoing. + +**Handed off:** nothing yet — see [How we work](#how-we-work). Building a working handoff path is a goal itself. + +Every objective of this team should trace to at least one vision story and one pillar. Each item name links to deeper context below. + +## How we work + +> "ODD should not be responsible for virtualizing everything! We (and our partners) are responsible for making it easy for NASA to virtualize things though." 
— Henry + +ODD is a research and development team, not an operations or continued-maintenance team. Success for any item on this roadmap is *graduating off of it* — not staying on it indefinitely. + +### Lifecycle + +Work moves through four stages: **Later** (future, aspirational) → **Next** (developing) → **Now** (mature) → **Handed off** (owned by someone else). + +An item is ready to hand off when it passes three tests: + +1. **Someone else can do it.** Documentation, tooling, and skills exist so that a data provider or partner can reproduce the work without us. +2. **Someone else owns it.** A named owner — a DAAC, a mission team, community maintainers — has accepted responsibility. +3. **We've stopped learning.** Our remaining contribution is maintenance, not discovery. + +Virtual data stores are an example: today we generate stores ourselves (learning). Next, +we will ship developer docs and optimization skills (enabling). Then store generation +graduates to data providers. Only the underlying tooling remains ours. Several +roadmap items — virtual store authoring docs, decision tooling, the optimization +skill/CLI, ecosystem sustainability (maintainer onboarding) — are not just projects but +handoff mechanisms. + +We don't yet have a reliable handoff process. Naming that honestly is the first step; building it is on the roadmap. + +### Prioritization + +At each planning cycle (PI), we ask two questions of the grid: + +- **What promotes?** Which Next items are ready to become Now? Which Later items are ready to become Next? +- **What graduates?** Which Now items pass the three handoff tests? + +Objectives we take on must also balance "utopian" goals — like a unified Zarr model — +with the necessity of supporting legacy patterns and other formats. + +When evaluating new candidate work, we apply these criteria: + +- **Traceability.** Does it serve at least one vision story and one pillar? +- **Adoption readiness.** How quickly can the ecosystem absorb it? 
Building on familiar interfaces lowers the barrier (VirtualiZarr adopting xarray's data model made it immediately accessible); very new technology carries adoption lag as a risk (zarr-datafusion-search is powerful but the ecosystem may take years to take it on). +- **Cost.** What does adoption cost — in compute, energy, money, and user capability? Solutions that require cloud compute in a specific region, for example, exclude most users. +- **Handoff path.** Can we articulate who would eventually own this, even roughly? + +## Deeper context + +What each roadmap item unlocks, and what success looks like. + +### Open standards & FAIR data + +**◆ Array format stewardship.** The foundational format for cloud-native array data — Zarr. Ongoing maintenance and stewardship, including convening the community — e.g. Zarr Summit '26/27 — to unblock progress on technical features and convention adoption. + +**◆ Geospatial conventions.** Zarr conventions for geospatial metadata (GeoZarr), essential for native and virtual Zarr collections to interoperate across GIS, visualization, and analysis libraries. Closing in on submission of the GeoZarr standard to the OGC architecture board. Success: trust and interoperability for Zarr data from all Earth data providers (NASA, NOAA, ESA), and a consistent, non-ambiguous platform to build client applications on. + +**Ecosystem sustainability.** A sustainable maintainer ecosystem for Zarr to support growing, complex use cases — the zarr-python roadmap plus maintainer onboarding. Success: adoption of the roadmap by maintainers and stakeholders, plus one or two new onboarded maintainers making significant contributions — reducing stagnation and broadening design perspectives. + +**Codec re-architecture.** The Zarr v2→v3 transition exposed design issues in the codec model. 
Re-architecting it supports new codec development (vital for virtualization, where archival formats use less-standardized codecs) and alternative client implementations in Rust and TypeScript. Follow-ons: *CF codecs* — capturing CF-convention decoding logic as codecs rather than attribute dictionaries, so clients interacting directly with the Zarr API don't need to duplicate xarray's specialized decoding logic; and *concatenated arrays* — supporting variable compression to unlock virtualization of quirky datasets like MUR SST (pre-design). + +**Convention + CRS utilities.** Utilities and guidance for keeping virtual store metadata aligned with CF and GeoZarr conventions. Unblocks tools that rely on those conventions from using compliant virtual stores. + +### Performance, cost & scale + +**Data virtualization.** Access archival data through the Zarr API without duplicating it — VirtualiZarr. Includes parser improvements (virtual-tiff, obspec-utils, async-hdf5, GRIB) — or transitioning parser maintenance to partners, which is itself a handoff opportunity. This is also our current lever on storage cost (see *Storage cost optimization*). + +**Object-store access.** High-performance object storage access for the Python geospatial stack — obstore. + +**Dynamic tiling.** Tiling driven by CMR — TiTiler-CMR. Current work: regenerated compatibility report (with group support), OPERA integration into the disasters portal, a distributed cache for S3 credentials (~1s saved per cold-start request), and WMTS GetCapabilities so EGIS can surface HLS vegetation indices in ArcGIS. + +**Lazy array analytics.** Instantly materialize massive lazy 4-D arrays (time, band, x, y) from metadata stores — lazycogs, a scalable replacement for stackstac/odc-stac. Success: any collection stored as COGs can be analyzed through a collection-level xarray API. + +**Variable chunking.** Variable chunk support in VirtualiZarr + xarray; unlocks virtualizing more datasets. Near-term delivery. 
+ +**Analytics-scale metadata.** EOSDIS has identified pressure on CMR as a significant risk. Prototype collection-level stores using GeoParquet/Iceberg and zarr-datafusion to understand performance, cost, and scaling — and contribute to the relevant open-source libraries. Includes STAC in Iceberg: an object-storage-only STAC catalog giving providers API-less metadata access. + +**Storage model evaluation.** Understand emerging storage models and their trade-offs — currently the S3 Files synchronization model: compare performance to native S3 for common operations and understand its pricing. Potential to serve both durable shared storage and the low-latency block access that ML and massively parallel array workloads need. + +**Resampling/warp tooling.** A composable, Rust-based resampling/warp library reducing dependence on GDAL's monolithic toolchain. Usable from server-side tiling, distributed array frameworks (Dask, Cubed), and WASM in-browser rendering. Pre-design; builds on a full ecosystem assessment. + +**Query at scale.** Query and access data at scale through a single interface — zarr-datafusion-search. Paves the way for Zarr as a storage target for Level 0/1 and swath data, and moves EOSDIS toward an Arrow-native ecosystem. High potential, but very new — adoption lag is the known risk. + +**Storage cost optimization.** Addressing the growing cost of data volumes in Earthdata Cloud. We are not actively working on this beyond *data virtualization* (accessing archival data through the Zarr API without duplicating it). Avoiding duplication is the lever we pull today; broader storage cost strategies remain future work. + +### Empowered users + +**Cloud-native guidance.** The CNG guide: unblock people confused about which formats exist, why, and when to use each. Success: people use the guide to build cloud-native datasets, or to explain to stakeholders why a dataset was built a given way. 
+ +**Science support.** Direct support for science users, including cloud-optimized data usage guidance (e.g., xarray arguments) in the guide and datacube guide. + +**Format evaluation.** Evaluate mission data formats and recommend improvements that enable optimized access patterns — currently NISAR: assess the NISAR HDF5 format and advise the Algorithm Development Team before the official release in summer 2026. Includes a virtualization + data fusion prototype showing a more user-friendly virtual representation. + +**In-browser rendering.** In-browser GPU rendering of COGs and Zarr via direct data access (deck.gl-raster + Lonboard) — users customize rendering without re-fetching data. Current work: demonstrations in documentation (band combinations, direct access), initial GeoZarr support in both libraries, and a TypeScript WKB→GeoArrow parser enabling DuckDB-Wasm integration. Current limitation: requires open data access. + +**Virtual store authoring.** How to build virtual stores, with or without agents — developer docs. Unblocks DAACs and science teams as virtual store developers — a primary handoff mechanism. + +**Cloud-optimized decision framework.** The cloud-optimized data decision tree: a diagram plus explanatory text with examples per branch, guiding format and chunking decisions. Foundation for AI-assisted optimization. + +**Improved access & auth libraries.** Libraries that get data and credentials into users' hands — earthaccess v1, notably a modular approach with refreshable credentials in a lightweight earth-auth package; finish opening Icechunk stores via earthaccess. + +**AI-assisted optimization (skills + tooling).** A CLI and agentic skill for data structure optimization, plus an agent that walks data providers through chunking and format decisions (CO data AI guidance) — usable across ESDS. Builds on the cloud-optimized decision framework, reducing engineering time to a balanced or optimized data structure. 
+ +**Dataset + tooling coverage metrics.** Assess how many NASA datasets work with our tools (VirtualiZarr, datafusion, lazycogs) so we have metrics for improvement and impact. + +**ESRI / ArcGIS integration.** A large share of NASA data users work in ArcGIS, so our tools and data need to integrate with ESRI systems rather than require users to leave them. Ensure our cloud-native outputs are consumable there through the open standards ESRI already supports (COG, WMTS, OGC APIs, GeoZarr) — the EGIS/ArcGIS WMTS work in *Dynamic tiling* is the first concrete instance. Meeting users where they are, not requiring new software. + +### Trusted & reliable data + +**◆ Transactional Zarr.** Checksum verification and ACID transactions for Zarr stores (Icechunk) — the reliability layer. + +**Remote store access.** Bearer-token HTTP support unblocks NASA data users without cloud compute in us-west-2 from using virtual stores — PO.DAAC has identified this as the single blocker to rolling out their Icechunk stores. Also: parsing manifests back out of Icechunk (inspection and modification of virtual stores, plus risk mitigation) and prefix-changing utilities. + +**Live virtual stores.** Stores kept current as data lands — e.g. MUR SST as native Zarr, rechunked for time series, updated in near-real time as an AWS Public Dataset. Serves anyone doing historical or NRT sea surface temperature analysis, and demonstrates Icechunk's capabilities end to end. + +**Synchronized metadata + data.** Keep metadata in sync with data (via zarr-datafusion-search) — addressing the gap where metadata and data drift apart. + +**Event-driven NRT updates.** Icechunk makes all store updates trackable by listening to changes in object storage keys, enabling simple event-driven pipelines: dynamically updated pyramids (e.g., for Worldview), summary statistics, pre-computed time series. The path to keeping virtual stores current with incoming data streams — and to the near-real-time vision story. 
+ +--- + +*Open questions for the team: verify the Earth Information Explorer claim in the gap table; align timelines with data services (when do they stop coggifying?) and front-end teams (will tile servers eventually go away?); define our first formal handoff.* \ No newline at end of file From 2bfedd4c1b71dab1a780705e51071aa5b741fcf6 Mon Sep 17 00:00:00 2001 From: Aimee Barciauskas Date: Fri, 5 Jun 2026 15:13:28 -0700 Subject: [PATCH 4/5] Fix link --- docs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 1acf3d2..3faa5d0 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,4 +4,4 @@ Welcome to the documentation for the Optimized Data Delivery (ODD) team, working ## ODD FY26 Roadmap -For a digest of what the team plans to work on this next year, please visit our [Fiscal Year (FY) 2026 Roadmap](./fy26-roadmap.md). +For a digest of what the team plans to work on this next year, please visit our [Roadmap](./roadmap.md). From 9b6d65354ae2e5c0520d4bc7928d18c938293df5 Mon Sep 17 00:00:00 2001 From: Aimee Barciauskas Date: Sun, 7 Jun 2026 21:25:55 -0700 Subject: [PATCH 5/5] Revise ODD roadmap for improved clarity Refactor roadmap content for clarity and conciseness, removing redundant phrases and improving readability. --- docs/roadmap.md | 110 ++++++++++++++++++++---------------------------- 1 file changed, 46 insertions(+), 64 deletions(-) diff --git a/docs/roadmap.md b/docs/roadmap.md index c973178..0767914 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -1,18 +1,15 @@ # ODD roadmap -This page exists to explain the motivations behind ODD's daily work. It connects what -we're building to why we're building it, and explains how work enters, moves through, -and may eventually leave our portfolio. The primary audience is the ODD team. -The secondary audience is peer ODSI teams who want to understand how our work fits the broader picture. +This page explains the motivations behind ODD's daily work. 
It connects what we're building to why we're building it. The primary audience is the ODD team. The secondary audience is peer ODSI teams who want to understand how our work fits the broader picture. -## Vision: who we serve +## Vision -Our vision is expressed as the experiences users will have when we've succeeded: +If we are successful, we imagine users will be able to: -1. **Ask in plain language and reproduce response.** As an Earth enthusiast, I want to ask questions like "how did the Gifford fire evolve?" and get an animated visual response — with links to the source code that produced the analysis, so I can verify and reproduce it. -2. **Explore in the browser.** As an Earth enthusiast, I want to visually explore forest disturbance through NISAR data directly in my browser, with no specialized software or cloud account. -3. **Research at scale.** As a fire event researcher, I want to evaluate relationships between variables from different data products across many thousands of fires, with minimal data pre-processing for fusion and modeling. -4. **Operate in near-real time.** As an operational **application**, I need products like HLS for disaster response or sea surface temperature for maritime operations available in near-real time. +1. **Ask questions in plain language and reproduce the response:** As an Earth enthusiast, I want to ask questions like "how did the Gifford fire evolve?" and get an animated visual. I want to be able to reproduce responses with links to the source code that produced the analysis, so I can verify and reproduce it. +2. **Explore in the browser:** As an Earth enthusiast, I want to visually explore forest disturbance through NISAR data directly in my browser, with no specialized software or cloud account. +3. **Research at scale:** As a fire event researcher, I want to evaluate relationships between variables from different data products across many thousands of fires, with minimal data pre-processing for fusion and modeling. +4. 
**Operate in near-real time:** As an operational application, I need products like HLS for disaster response or sea surface temperature for maritime operations available in near-real time. ## The gap @@ -32,32 +29,28 @@ Across all of these: discovery is hard, and current systems are becoming unsusta We address these gaps through four pillars: -1. **Open standards & FAIR data.** NASA data and services are findable, accessible, interoperable, and reusable, built on community standards rather than bespoke systems. -2. **Performance, cost & scale.** Optimize performance while minimizing cost, with solutions that scale sustainably to new and growing data volumes. -3. **Empowered users.** Users — both data providers and data consumers — can use and apply the solutions we build without us. -4. **Trusted & reliable data.** The data products NASA generates are verifiable, consistent, and kept in sync with their metadata. +1. **Open standards & FAIR data:** NASA data and services are findable, accessible, interoperable, and reusable, built on community standards rather than bespoke systems. +2. **Performance, cost & scale:** Optimize performance while minimizing cost, with solutions that scale sustainably to new and growing data volumes. +3. **Empowered users:** Users — both data providers and data consumers — can use and apply the solutions we build without us. +4. **Trusted & reliable data:** The data products NASA generates are verifiable, consistent, and kept in sync with their metadata. -**Cross-cutting foundation: community developed + adopted.** Every item on this roadmap is built in the open, with and for the community. Open source is the license; community development and adoption is the practice — it's how solutions outlive our involvement, and it underpins all four pillars. +Further, we maintain high standards for the software we develop or reuse, while never intending to duplicate effort. 
All software we develop or use should be of high quality, under an open source license, and developed and adopted by a broad community. ## Roadmap | Pillar | Now · mature | Next · developing | Later · future | | ------------------------------ | --------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | -| **Open standards & FAIR data** | ◆ Array format (Zarr) stewardship · ◆ Geospatial conventions (GeoZarr) | Ecosystem sustainability · Codec re-architecture · variable chunking | Convention + CRS utilities | -| **Performance, cost & scale** | Data virtualization · Object-store access · Dynamic tiling · In-browser rendering | Virtual stores + lazy array analytics · Analytics-scale metadata · Storage model evaluation | Resampling/warp tooling · Query at scale · Storage cost optimization | +| **Open standards & FAIR data** | ◆ Array format (Zarr) stewardship · ◆ Geospatial conventions (GeoZarr) | Zarr Ecosystem sustainability · Codec re-architecture · variable chunking | Conventions + CRS utilities | +| **Performance, cost & scale** | Data virtualization · Object-store access · Dynamic tiling · In-browser rendering | Virtual stores + lazy array analytics · Analytics-scale metadata · Storage model evaluation | Resampling/warp tooling · Query at scale · Storage cost optimization · Caching | | **Empowered users** | Cloud-native guidance · Science support · Format evaluation | In-browser rendering · Cloud-optimized decision framework · Improved access & auth libraries · Dataset + tooling coverage metrics | AI-assisted optimization (skills + tooling) · ESRI / ArcGIS integration | | **Trusted & reliable data** | ◆ Transactional Zarr (Icechunk) | Remote store access · Live virtual stores · Synchronized metadata + data | Event-driven (object store 
notifications) for near-real time (NRT) updates | **◆ Foundational** — a category of work that is ongoing. -**Handed off:** nothing yet — see [How we work](#how-we-work). Building a working handoff path is a goal itself. - -Every objective of this team should trace to at least one vision story and one pillar. Each item name links to deeper context below. +**Handed off:** nothing yet — see [How we work](#how-we-work). ## How we work -> "ODD should not be responsible for virtualizing everything! We (and our partners) are responsible for making it easy for NASA to virtualize things though." — Henry - ODD is a research and development team, not an operations or continued-maintenance team. Success for any item on this roadmap is *graduating off of it* — not staying on it indefinitely. ### Lifecycle @@ -72,29 +65,24 @@ An item is ready to hand off when it passes three tests: Virtual data stores are an example: today we generate stores ourselves (learning). Next, we will ship developer docs and optimization skills (enabling). Then store generation -graduates to data providers. Only the underlying tooling remains ours. Several +graduates to data providers. We would continue to work on underlying tooling. Several roadmap items — virtual store authoring docs, decision tooling, the optimization skill/CLI, ecosystem sustainability (maintainer onboarding) — are not just projects but handoff mechanisms. -We don't yet have a reliable handoff process. Naming that honestly is the first step; building it is on the roadmap. +The above example is notional. We have not yet established a reliable handoff process. ### Prioritization -At each planning cycle (PI), we ask two questions of the grid: - -- **What promotes?** Which Next items are ready to become Now? Which Later items are ready to become Next? -- **What graduates?** Which Now items pass the three handoff tests? 
- Objectives we take on must also balance "utopian" goals — like a unified Zarr model — with the necessity of supporting legacy patterns and other formats. When evaluating new candidate work, we apply these criteria: -- **Traceability.** Does it serve at least one vision story and one pillar? -- **Adoption readiness.** How quickly can the ecosystem absorb it? Building on familiar interfaces lowers the barrier (VirtualiZarr adopting xarray's data model made it immediately accessible); very new technology carries adoption lag as a risk (zarr-datafusion-search is powerful but the ecosystem may take years to take it on). -- **Cost.** What does adoption cost — in compute, energy, money, and user capability? Solutions that require cloud compute in a specific region, for example, exclude most users. -- **Handoff path.** Can we articulate who would eventually own this, even roughly? +- **Traceability:** Does it serve at least one vision story and satisfy all appropriate pillars? +- **Adoption readiness:** How quickly can the ecosystem absorb it? Building on familiar interfaces lowers the barrier (VirtualiZarr adopting xarray's data model made it immediately accessible); very new technology carries adoption lag as a risk. +- **Cost:** What does adoption cost — in compute, energy, money, and user capability? Solutions that require cloud compute in a specific region, for example, exclude many users. +- **Handoff path:** Can we articulate who would eventually own this? ## Deeper context @@ -104,70 +92,64 @@ What each roadmap item unlocks, and what success looks like. **◆ Array format stewardship.** The foundational format for cloud-native array data — Zarr. Ongoing maintenance and stewardship, including convening the community — e.g. Zarr Summit '26/27 — to unblock progress on technical features and convention adoption. 
-**◆ Geospatial conventions.** Zarr conventions for geospatial metadata (GeoZarr), essential for native and virtual Zarr collections to interoperate across GIS, visualization, and analysis libraries. Closing in on submission of the GeoZarr standard to the OGC architecture board. Success: trust and interoperability for Zarr data from all Earth data providers (NASA, NOAA, ESA), and a consistent, non-ambiguous platform to build client applications on. +**◆ Geospatial conventions.** Zarr conventions for geospatial metadata (GeoZarr), essential for native and virtual Zarr collections to interoperate across GIS, visualization, and analysis libraries. Success is trust and interoperability for Zarr data from all Earth data providers (NASA, NOAA, ESA), and a consistent platform to build client applications on. -**Ecosystem sustainability.** A sustainable maintainer ecosystem for Zarr to support growing, complex use cases — the zarr-python roadmap plus maintainer onboarding. Success: adoption of the roadmap by maintainers and stakeholders, plus one or two new onboarded maintainers making significant contributions — reducing stagnation and broadening design perspectives. +**Ecosystem sustainability.** A sustainable maintainer ecosystem for Zarr to support growing, complex use cases — the zarr-python roadmap plus maintainer onboarding. Success is adoption of the roadmap by maintainers and stakeholders, plus one or two new onboarded maintainers making significant contributions, reducing stagnation and broadening design perspectives. -**Codec re-architecture.** The Zarr v2→v3 transition exposed design issues in the codec model. Re-architecting it supports new codec development (vital for virtualization, where archival formats use less-standardized codecs) and alternative client implementations in Rust and TypeScript. 
Follow-ons: *CF codecs* — capturing CF-convention decoding logic as codecs rather than attribute dictionaries, so clients interacting directly with the Zarr API don't need to duplicate xarray's specialized decoding logic; and *concatenated arrays* — supporting variable compression to unlock virtualization of quirky datasets like MUR SST (pre-design). +**Codec re-architecture.** The Zarr v2→v3 transition exposed design issues in the codec model. Re-architecting it supports new codec development (vital for virtualization, where archival formats use less-standardized codecs), alternative client implementations in Rust and TypeScript, and handling of quirky data (CF codecs and concatenating arrays with varied codecs). -**Convention + CRS utilities.** Utilities and guidance for keeping virtual store metadata aligned with CF and GeoZarr conventions. Unblocks tools that rely on those conventions from using compliant virtual stores. +**Conventions + CRS utilities.** Utilities and guidance for keeping virtual store metadata aligned with CF and GeoZarr conventions. Unblocks tools that rely on those conventions from using compliant virtual stores. ### Performance, cost & scale -**Data virtualization.** Access archival data through the Zarr API without duplicating it — VirtualiZarr. Includes parser improvements (virtual-tiff, obspec-utils, async-hdf5, GRIB) — or transitioning parser maintenance to partners, which is itself a handoff opportunity. This is also our current lever on storage cost (see *Storage cost optimization*). +**Data virtualization.** Access archival data through the Zarr API without duplicating it. Includes VirtualiZarr parser improvements (virtual-tiff, obspec-utils, async-hdf5, GRIB) — or transitioning parser maintenance to partners. -**Object-store access.** High-performance object storage access for the Python geospatial stack — obstore. +**Object-store access.** High-performance object storage access for the Python geospatial stack (e.g. obstore). 
-**Dynamic tiling.** Tiling driven by CMR — TiTiler-CMR. Current work: regenerated compatibility report (with group support), OPERA integration into the disasters portal, a distributed cache for S3 credentials (~1s saved per cold-start request), and WMTS GetCapabilities so EGIS can surface HLS vegetation indices in ArcGIS. +**Dynamic tiling.** User-driven dynamic tiling. Potential future work includes supporting additional datasets and integrations, for example WMTS GetCapabilities so EGIS can surface HLS vegetation indices in ArcGIS. -**Lazy array analytics.** Instantly materialize massive lazy 4-D arrays (time, band, x, y) from metadata stores — lazycogs, a scalable replacement for stackstac/odc-stac. Success: any collection stored as COGs can be analyzed through a collection-level xarray API. +**Lazy array analytics.** Instantly materialize massive lazy 4-D arrays (time, band, x, y) from metadata stores (e.g. lazycogs and lazy merge), a scalable replacement for stackstac/odc-stac. Success is when any collection stored as COGs can be analyzed through a collection-level xarray API. -**Variable chunking.** Variable chunk support in VirtualiZarr + xarray; unlocks virtualizing more datasets. Near-term delivery. +**Variable chunking.** Variable chunk support in VirtualiZarr + xarray will unlock virtualizing more datasets. -**Analytics-scale metadata.** EOSDIS has identified pressure on CMR as a significant risk. Prototype collection-level stores using GeoParquet/Iceberg and zarr-datafusion to understand performance, cost, and scaling — and contribute to the relevant open-source libraries. Includes STAC in Iceberg: an object-storage-only STAC catalog giving providers API-less metadata access. +**Analytics-scale metadata.** EOSDIS has identified pressure on CMR as a significant risk. Prototype collection-level stores using GeoParquet/Iceberg and DataFusion to understand performance, cost, and scaling — and contribute to the relevant open-source libraries. 
Includes STAC in Iceberg: an object-storage-only STAC catalog giving providers API-less metadata access. -**Storage model evaluation.** Understand emerging storage models and their trade-offs — currently the S3 Files synchronization model: compare performance to native S3 for common operations and understand its pricing. Potential to serve both durable shared storage and the low-latency block access that ML and massively parallel array workloads need. +**Storage model evaluation.** Understand emerging storage models and their trade-offs, such as the [S3 Files synchronization system](https://aws.amazon.com/s3/features/files/). -**Resampling/warp tooling.** A composable, Rust-based resampling/warp library reducing dependence on GDAL's monolithic toolchain. Usable from server-side tiling, distributed array frameworks (Dask, Cubed), and WASM in-browser rendering. Pre-design; builds on a full ecosystem assessment. +**Resampling/warp tooling.** Build a composable, Rust-based resampling/warp library reducing dependence on GDAL's monolithic toolchain. Usable from server-side tiling, distributed array frameworks (Dask, Cubed), and WASM in-browser rendering. This is a pre-design idea, building on a prior ecosystem assessment. -**Query at scale.** Query and access data at scale through a single interface — zarr-datafusion-search. Paves the way for Zarr as a storage target for Level 0/1 and swath data, and moves EOSDIS toward an Arrow-native ecosystem. High potential, but very new — adoption lag is the known risk. +**Query at scale.** Query and access data at scale through a single interface (e.g. zarr-datafusion-search). Paves the way for Zarr as a storage target for Level 0/1 and swath data, and moves EOSDIS toward an Arrow-native ecosystem. High potential, but very new. -**Storage cost optimization.** Addressing the growing cost of data volumes in Earthdata Cloud. 
We are not actively working on this beyond *data virtualization* (accessing archival data through the Zarr API without duplicating it). Avoiding duplication is the lever we pull today; broader storage cost strategies remain future work. +**Storage cost optimization.** Addressing the growing cost of data volumes in Earthdata Cloud. We are not actively working on this beyond data virtualization (accessing archival data through the Zarr API without duplicating it). Future work includes applying other storage cost strategies as evaluated under *Storage model evaluation*. ### Empowered users -**Cloud-native guidance.** The CNG guide: unblock people confused about which formats exist, why, and when to use each. Success: people use the guide to build cloud-native datasets, or to explain to stakeholders why a dataset was built a given way. +**Cloud-native guidance.** The CNG guide unblocks people confused about which formats exist, why, and when to use each. Success is people using the guide to build cloud-native datasets, or to explain to stakeholders why a dataset was built a given way. -**Science support.** Direct support for science users, including cloud-optimized data usage guidance (e.g., xarray arguments) in the guide and datacube guide. +**Science support.** Direct support for science users, through collaboration with the dedicated science support team, including cloud-optimized data usage guidance in the guide and datacube guide. -**Format evaluation.** Evaluate mission data formats and recommend improvements that enable optimized access patterns — currently NISAR: assess the NISAR HDF5 format and advise the Algorithm Development Team before the official release in summer 2026. Includes a virtualization + data fusion prototype showing a more user-friendly virtual representation. +**Format evaluation.** Evaluate mission data formats and recommend improvements that enable optimized access patterns. For example, the team has assessed and advised on the NISAR HDF5 format. 
-**In-browser rendering.** In-browser GPU rendering of COGs and Zarr via direct data access (deck.gl-raster + Lonboard) — users customize rendering without re-fetching data. Current work: demonstrations in documentation (band combinations, direct access), initial GeoZarr support in both libraries, and a TypeScript WKB→GeoArrow parser enabling DuckDB-Wasm integration. Current limitation: requires open data access. +**In-browser rendering.** In-browser GPU rendering of COGs and Zarr via direct data access (e.g. deck.gl-raster + Lonboard) — users customize rendering without re-fetching data. -**Virtual store authoring.** How to build virtual stores, with or without agents — developer docs. Unblocks DAACs and science teams as virtual store developers — a primary handoff mechanism. +**Virtual store authoring.** How to build virtual stores, with or without agents — developer docs. Unblocks DAACs and science teams as virtual store developers. **Cloud-optimized decision framework.** The cloud-optimized data decision tree: a diagram plus explanatory text with examples per branch, guiding format and chunking decisions. Foundation for AI-assisted optimization. -**Improved access & auth libraries.** Libraries that get data and credentials into users' hands — earthaccess v1, notably a modular approach with refreshable credentials in a lightweight earth-auth package; finish opening Icechunk stores via earthaccess. +**Improved access & auth libraries.** Supporting libraries that get data and credentials into users' hands (e.g. earthaccess). **AI-assisted optimization (skills + tooling).** A CLI and agentic skill for data structure optimization, plus an agent that walks data providers through chunking and format decisions (CO data AI guidance) — usable across ESDS. Builds on the cloud-optimized decision framework, reducing engineering time to a balanced or optimized data structure. 
**Dataset + tooling coverage metrics.** Assess how many NASA datasets work with our tools (VirtualiZarr, datafusion, lazycogs) so we have metrics for improvement and impact. -**ESRI / ArcGIS integration.** A large share of NASA data users work in ArcGIS, so our tools and data need to integrate with ESRI systems rather than require users to leave them. Ensure our cloud-native outputs are consumable there through the open standards ESRI already supports (COG, WMTS, OGC APIs, GeoZarr) — the EGIS/ArcGIS WMTS work in *Dynamic tiling* is the first concrete instance. Meeting users where they are, not requiring new software. +**ESRI / ArcGIS integration.** A large share of NASA data users work in ArcGIS, so our tools and data need to integrate with ESRI systems. Ensure our cloud-native outputs are consumable there through the open standards ESRI already supports (COG, WMTS, OGC APIs, GeoZarr). ### Trusted & reliable data -**◆ Transactional Zarr.** Checksum verification and ACID transactions for Zarr stores (Icechunk) — the reliability layer. - -**Remote store access.** Bearer-token HTTP support unblocks NASA data users without cloud compute in us-west-2 from using virtual stores — PO.DAAC has identified this as the single blocker to rolling out their Icechunk stores. Also: parsing manifests back out of Icechunk (inspection and modification of virtual stores, plus risk mitigation) and prefix-changing utilities. - -**Live virtual stores.** Stores kept current as data lands — e.g. MUR SST as native Zarr, rechunked for time series, updated in near-real time as an AWS Public Dataset. Serves anyone doing historical or NRT sea surface temperature analysis, and demonstrates Icechunk's capabilities end to end. - -**Synchronized metadata + data.** Keep metadata in sync with data (via zarr-datafusion-search) — addressing the gap where metadata and data drift apart. +**◆ Transactional Zarr.** Checksum verification and ACID transactions for Zarr stores (Icechunk) provide reliability. 
-**Event-driven NRT updates.** Icechunk makes all store updates trackable by listening to changes in object storage keys, enabling simple event-driven pipelines: dynamically updated pyramids (e.g., for Worldview), summary statistics, pre-computed time series. The path to keeping virtual stores current with incoming data streams — and to the near-real-time vision story. +**Near-real time virtual stores.** Stores kept current as data arrives. Serves anyone doing historical or NRT sea surface temperature analysis. ---- +**Synchronized metadata + data.** Keep metadata in sync with data (same as "Query at scale"). -*Open questions for the team: verify the Earth Information Explorer claim in the gap table; align timelines with data services (when do they stop coggifying?) and front-end teams (will tile servers eventually go away?); define our first formal handoff.* \ No newline at end of file +**Event-driven NRT updates.** Icechunk makes all store updates trackable by listening to changes in object storage keys. Simple event-driven pipelines will enable dynamically updated pyramids (e.g., for Worldview), summary statistics, and pre-computed time series. This is the path to keeping virtual stores current with incoming data streams — and to the near-real-time vision story.