Add Singularity definitions for DataStreamCLI and dependencies#35
Add Singularity definitions for DataStreamCLI and dependencies#35hariteja-jajula wants to merge 1 commit intoCIROH-UA:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds Singularity/Apptainer container support to DataStreamCLI to enable deployment in HPC environments where Docker is typically not available. The PR introduces a new singularity/ directory with build definitions and scripts to create Singularity Image Format (.sif) files from the existing Docker infrastructure.
Changes:
- Added Singularity definition file (
datastream.def) based on the existingawiciroh/datastream-deps:latestDocker image - Added build script (
build.sh) to automate creation of datastream.sif and related workflow container images - Added
.gitignoreto exclude generated .sif files from version control
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| singularity/datastream.def | Singularity definition that bootstraps from the Docker deps image and installs DataStreamCLI package |
| singularity/build.sh | Shell script to build datastream.sif and pull additional workflow containers (forcingprocessor, merkdir, ngiabinabox) |
| singularity/.gitignore | Excludes .sif binary files from git tracking |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| groupadd -r mygroup && \ | ||
| useradd -r -g mygroup myuser && \ | ||
| chown -R myuser:mygroup /datastreamcli |
There was a problem hiding this comment.
The chown command changes ownership of /datastreamcli to myuser:mygroup, but there's no corresponding USER directive to switch to that user. In the Docker equivalent (Dockerfile.datastream line 12), the COPY command uses --chown to set ownership during copy. Without a USER directive, the container will run as root by default, making the ownership change ineffective. Consider adding 'USER myuser' after the chown command, or remove the user/group creation if root access is required.
| chown -R myuser:mygroup /datastreamcli | ||
|
|
||
| %runscript | ||
| exec "$@" |
There was a problem hiding this comment.
The runscript uses 'exec "$@"' which passes all arguments to be executed. However, this differs from typical container patterns where you'd want a default command or entrypoint. Based on the documentation in scripts/datastream and README.md, users expect to run './scripts/datastream' with arguments. Consider making the runscript default to executing the datastream script, such as 'exec /datastreamcli/scripts/datastream "$@"', to provide a clearer entrypoint that matches user expectations.
| exec "$@" | |
| exec /datastreamcli/scripts/datastream "$@" |
| # Main DataStreamCLI container | ||
| singularity build datastream.sif datastream.def | ||
|
|
||
| # Additional workflow containers (built from Docker images) | ||
| singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest | ||
| singularity build merkdir.sif docker://zwills/merkdir:latest |
There was a problem hiding this comment.
The build script lacks documentation comments explaining what each image is for and what dependencies each provides. Given that this is a new feature adding Singularity support, adding header comments describing the purpose of each image build would improve maintainability. For example, explaining that forcingprocessor, merkdir, and ngiabinabox are workflow dependencies would help users understand what's being built.
| # Main DataStreamCLI container | |
| singularity build datastream.sif datastream.def | |
| # Additional workflow containers (built from Docker images) | |
| singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest | |
| singularity build merkdir.sif docker://zwills/merkdir:latest | |
| # Main DataStreamCLI container: | |
| # - Provides the primary DataStreamCLI runtime environment. | |
| # - Contains the CLI entrypoint and core application dependencies used to | |
| # orchestrate and manage the end-to-end workflow. | |
| singularity build datastream.sif datastream.def | |
| # Workflow dependency container: forcingprocessor | |
| # - Built from the awiciroh/forcingprocessor Docker image. | |
| # - Provides tools and scripts for preprocessing hydrologic/meteorologic | |
| # forcing data that are consumed by downstream workflow steps. | |
| singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest | |
| # Workflow dependency container: merkdir | |
| # - Built from the zwills/merkdir Docker image. | |
| # - Supplies the merkdir utilities used by the workflow for organizing and | |
| # managing model input/output directory structures. | |
| singularity build merkdir.sif docker://zwills/merkdir:latest | |
| # Workflow dependency container: ngiabinabox (NGen environment) | |
| # - Built from the awiciroh/ciroh-ngen-image Docker image. | |
| # - Provides the NGen/NGIA model runtime and related dependencies required | |
| # to execute NGen-based modeling components within the workflow. |
| @@ -0,0 +1,15 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
This script uses '#!/usr/bin/env bash' while other shell scripts in the repository (scripts/datastream, scripts/create_ami.sh, docker/docker_loginNpush.sh) consistently use '#!/bin/bash'. For consistency with the codebase conventions, consider changing to '#!/bin/bash'.
| @@ -0,0 +1 @@ | |||
| *.sif | |||
There was a problem hiding this comment.
The new singularity/ directory lacks a README.md file to document its purpose, usage, and requirements. The docker/ directory has a README.md (docker/README.md) that documents the Docker setup. Following this pattern, a singularity/README.md should be added to explain how to build and use the Singularity images, what each image is for, system requirements (e.g., Singularity/Apptainer version), and how to run DataStreamCLI using the built containers. This is especially important since the PR description mentions this is targeting HPC environments where Singularity is commonly used.
| ../src/ /datastreamcli/src/ | ||
| ../setup.cfg /datastreamcli/setup.cfg | ||
| ../pyproject.toml /datastreamcli/pyproject.toml | ||
| ../ /datastreamcli |
There was a problem hiding this comment.
This line copies the entire parent directory (including .git/, .github/, tests/, docs/, docker/, etc.) after already copying specific subdirectories in lines 5-8. This creates redundancy and significantly increases image size. Additionally, the scripts/ directory is not explicitly copied in lines 5-8, but it's required based on the PR description's testing showing '/datastreamcli/scripts/datastream'. Consider either removing this line and adding 'scripts/' to the explicit file list, or removing lines 5-8 and only using this line if all files are needed.
| ../ /datastreamcli | |
| ../scripts/ /datastreamcli/scripts/ |
| ../ /datastreamcli | ||
|
|
||
| %post | ||
| pip3 install -e /datastreamcli && \ |
There was a problem hiding this comment.
The pip3 install command uses '-e /datastreamcli' which installs the package in editable mode from the root directory. However, looking at the Dockerfile.datastream (line 8), it uses 'pip3 install -e datastreamcli/' without the leading slash. While both may work, this inconsistency could lead to confusion. Additionally, the PR description shows testing uses '/datastreamcli/scripts/datastream', but the scripts/ directory is not explicitly copied in the %files section (lines 5-8). Verify that all required files, including scripts/, are properly available in the container.
| @@ -0,0 +1,20 @@ | |||
| Bootstrap: docker | |||
| From: awiciroh/datastream-deps:latest | |||
There was a problem hiding this comment.
The Singularity definition uses the Docker base image awiciroh/datastream-deps:latest, which is a mutable tag pulled from a third-party registry and creates a supply-chain risk because the image contents can change silently over time or be compromised upstream. An attacker who gains control of that Docker image or namespace could inject malicious code that will be built into datastream.sif and execute wherever the container runs. Pin this dependency to an immutable image digest or a vetted, fixed version tag and establish a process for reviewing and updating it.
| singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest | ||
| singularity build merkdir.sif docker://zwills/merkdir:latest | ||
| singularity build ngiabinabox.sif docker://awiciroh/ciroh-ngen-image:latest |
There was a problem hiding this comment.
The build script pulls Docker images awiciroh/forcingprocessor:latest, zwills/merkdir:latest, and awiciroh/ciroh-ngen-image:latest via docker:// using the mutable latest tag, which introduces a supply-chain risk because these images can change or be compromised without any change in this repository. If an attacker gains control of any of these Docker Hub namespaces or images, they could ship a trojanized image that would be transparently converted into Singularity .sif files and executed in your workflows. Pin each image to an immutable digest or a specific, reviewed version tag and document how updates are validated before rebuilding.
|
Hey! Nice work adding Singularity support One thing I noticed in %files
../configs/ /datastreamcli/configs/
../src/ /datastreamcli/src/
../setup.cfg /datastreamcli/setup.cfg
../pyproject.toml /datastreamcli/pyproject.toml
../ /datastreamcliThe last line ( You can simplify this to just: %files
../ /datastreamcliThis copies everything in one go and keeps the definition cleaner! |
|
This is a first step towards addressing CIROH-UA/ngen-datastream#264 |
JordanLaserGit
left a comment
There was a problem hiding this comment.
@hariteja-jajula Thanks for this work!
- For the docker image tags, can we dynamically read the versions.yml file and build with the latest versioned tags (not latest tag)
- Can you remove the forcingprocessor singularity build and move that to the forcingprocessor repository?
- Could you respond the Copilot comments?
- Do you think we can use Github Actions to test these builds? We won't be deploying anything using these images, but it would be nice to know the singularity containers build before testing them on-premise.
[Short description explaining the high-level reason for the pull request]
Additions
-New singularity/ directory containing:
datastream.def — Singularity definition for DataStreamCLI
build.sh — helper script to build required .sif images from existing Docker images
.gitignore — excludes generated .sif files from version control
Removals
Changes
Testing
singularity exec datastream.sif python3 -c "import datastreamcli; print('OK')"Verified DataStreamCLI shell entrypoint is present and executable:
singularity exec datastream.sif /datastreamcli/scripts/datastream --helpScreenshots
Notes
Todos
Checklist
Testing checklist
Target Environment support
Accessibility
Other