This repository was archived by the owner on Sep 26, 2023. It is now read-only.

Automated COG generation for OMI Ozone dataset #51

@abarciauskas-bgse

This set of steps should be generalizable to other datasets, but we will start with OMI Ozone.

We will deploy one workflow (an AWS Step Functions state machine) per dataset. Each dataset workflow will run on a schedule to discover new files, generate COGs from them, and publish metadata for those COGs.

We will write infrastructure as code (CDK) to:

  1. Deploy the workflow (Step Functions state machine)
  2. Trigger the workflow to run on a schedule (CloudWatch Events rules)
  3. Deploy the "generate file URLs" step as step 1 of the workflow
  4. Deploy the "generate COG (and publish STAC metadata)" step as step 2 of the workflow

Tasks breakdown

Infra as code for workflow

  • 1. Deploy a skeleton workflow
  • 2. Deploy trigger to schedule workflow
  • 3. Add the workflow steps to the workflow deployment
  • Update workflow to use processing script for OMDOAO3 version 003

We need a way to run the COG generation step in parallel for every file discovered in step 1 of the workflow. AWS Step Functions provides a Map state type that can be used for this.
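As a rough sketch of how the two steps could be wired together with a Map state, the snippet below builds an Amazon States Language definition as a plain Python dict. The state names, the `file_urls` output key, and the use of Lambda ARNs as task resources are assumptions made for illustration; a Fargate task would use a different `Resource` and parameters, and the real definition would be produced by the CDK code.

```python
import json


def build_state_machine_definition(discover_arn, cog_arn, max_concurrency=10):
    """Return an ASL definition: discover file URLs, then fan out one COG task per URL.

    Assumes (hypothetically) that the discovery task returns {"file_urls": [...]}.
    """
    return {
        "Comment": "Discover new files, then generate a COG for each file",
        "StartAt": "GenerateFileUrls",
        "States": {
            "GenerateFileUrls": {
                "Type": "Task",
                "Resource": discover_arn,
                # Keep the original input and attach the task output under $.discovery
                "ResultPath": "$.discovery",
                "Next": "GenerateCogs",
            },
            "GenerateCogs": {
                "Type": "Map",
                # Each element of this array becomes the input of one iteration
                "ItemsPath": "$.discovery.file_urls",
                "MaxConcurrency": max_concurrency,
                "Iterator": {
                    "StartAt": "GenerateCog",
                    "States": {
                        "GenerateCog": {
                            "Type": "Task",
                            "Resource": cog_arn,
                            "End": True,
                        }
                    },
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_state_machine_definition(
        "arn:aws:lambda:us-east-1:123456789012:function:generate-file-urls",
        "arn:aws:lambda:us-east-1:123456789012:function:generate-cog",
    )
    print(json.dumps(definition, indent=2))
```

`MaxConcurrency` caps how many files are processed at once, which may matter if the upstream data provider rate-limits downloads.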

Steps in workflow

1. Docker image that generates a list of file URLs

  • Write a script that queries CMR for a given collection short name, version, and temporal range, and outputs a list of file URLs
  • Check it works for this dataset (OMDOAO3 version 003)
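A minimal sketch of such a discovery script is shown below, assuming the public CMR granule search endpoint and its JSON response layout (`feed.entry[].links[]`). The function names are hypothetical, and only the first page of results is fetched; a real script would need to page through larger result sets.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"
# Link relation CMR uses to mark downloadable data files
DATA_REL = "http://esipfed.org/ns/fedsearch/1.1/data#"


def build_query_url(short_name: str, version: str, start: str, end: str,
                    page_size: int = 200) -> str:
    """Build a granule search URL for one collection and temporal range."""
    params = {
        "short_name": short_name,
        "version": version,
        "temporal": f"{start},{end}",  # ISO 8601 start and end, comma separated
        "page_size": page_size,
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


def extract_file_urls(response: dict) -> List[str]:
    """Collect data-file hrefs from a parsed CMR JSON response."""
    urls = []
    for entry in response.get("feed", {}).get("entry", []):
        for link in entry.get("links", []):
            # Skip links inherited from the collection record (docs, landing pages)
            if link.get("rel") == DATA_REL and not link.get("inherited", False):
                urls.append(link["href"])
    return urls


def discover_file_urls(short_name: str, version: str, start: str, end: str) -> List[str]:
    """Query CMR over the network and return file URLs (first page only)."""
    with urlopen(build_query_url(short_name, version, start, end)) as resp:
        return extract_file_urls(json.load(resp))


if __name__ == "__main__":
    for url in discover_file_urls("OMDOAO3", "003",
                                  "2021-01-01T00:00:00Z", "2021-01-02T00:00:00Z"):
        print(url)
```

Keeping URL construction and response parsing separate from the network call makes the logic easy to check against a saved response for OMDOAO3 version 003.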

2. Docker image that creates a COG from a file URL

  • Check for any existing scripts which might be relevant in this repository and https://github.com/orgs/stactools-packages/repositories
  • In progress: Create a script that takes a file URL as input and generates a COG
  • Validate the output with a dataset expert
  • In progress: Write the COG to S3

3. The Docker image that creates the COG also publishes to STAC
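To illustrate what the publish step might emit, here is a minimal sketch that builds a STAC Item as a plain dict for one COG. The global bounding box, the `cog` asset key, and deriving the item ID from the file name are all assumptions for illustration; the actual footprint and properties should come from the source file and the dataset expert's review.

```python
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: the product is a global grid, so the footprint is the whole globe
GLOBAL_BBOX = [-180.0, -90.0, 180.0, 90.0]
GLOBAL_POLYGON = {
    "type": "Polygon",
    "coordinates": [[[-180.0, -90.0], [180.0, -90.0], [180.0, 90.0],
                     [-180.0, 90.0], [-180.0, -90.0]]],
}


def build_stac_item(cog_url: str, acquired: datetime) -> dict:
    """Build a minimal STAC Item for a COG; `acquired` must be timezone-aware."""
    # Hypothetical ID scheme: the COG file name without its extension
    item_id = PurePosixPath(urlparse(cog_url).path).stem
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": GLOBAL_POLYGON,
        "bbox": GLOBAL_BBOX,
        "properties": {
            "datetime": acquired.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {
            "cog": {
                "href": cog_url,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


if __name__ == "__main__":
    item = build_stac_item(
        "s3://example-bucket/OMDOAO3/example_2021m0101.tif",  # hypothetical location
        datetime(2021, 1, 1, tzinfo=timezone.utc),
    )
    print(json.dumps(item, indent=2))
```

When the item is published into a STAC catalog or API, a `collection` field and the matching `collection` link would be added together.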

Infrastructure details for workflow

Technologies proposed: CDK + AWS Step Functions (Lambda, Fargate) + AWS CloudWatch

We are using the AWS Step Functions Map state to process the discovered files in parallel.

Alternatives: Prefect (cloud-agnostic), AWS Batch (necessary?)
