Skip to content

Add Singularity definitions for DataStreamCLI and dependencies#35

Open
hariteja-jajula wants to merge 1 commit intoCIROH-UA:mainfrom
hariteja-jajula:main
Open

Add Singularity definitions for DataStreamCLI and dependencies#35
hariteja-jajula wants to merge 1 commit intoCIROH-UA:mainfrom
hariteja-jajula:main

Conversation

@hariteja-jajula
Copy link

[Short description explaining the high-level reason for the pull request]

Additions

-New singularity/ directory containing:

datastream.def — Singularity definition for DataStreamCLI

build.sh — helper script to build required .sif images from existing Docker images

.gitignore — excludes generated .sif files from version control

Removals

Changes

Testing

  1. Built datastream.sif locally using singularity build via singularity/build.sh
  2. Verified Python package installation inside the container:

singularity exec datastream.sif python3 -c "import datastreamcli; print('OK')"

Verified DataStreamCLI shell entrypoint is present and executable:

singularity exec datastream.sif /datastreamcli/scripts/datastream --help

Screenshots

image

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • [] No linting errors or warnings
  • [] JavaScript tests are passing

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds Singularity/Apptainer container support to DataStreamCLI to enable deployment in HPC environments where Docker is typically not available. The PR introduces a new singularity/ directory with build definitions and scripts to create Singularity Image Format (.sif) files from the existing Docker infrastructure.

Changes:

  • Added Singularity definition file (datastream.def) based on the existing awiciroh/datastream-deps:latest Docker image
  • Added build script (build.sh) to automate creation of datastream.sif and related workflow container images
  • Added .gitignore to exclude generated .sif files from version control

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
singularity/datastream.def Singularity definition that bootstraps from the Docker deps image and installs DataStreamCLI package
singularity/build.sh Shell script to build datastream.sif and pull additional workflow containers (forcingprocessor, merkdir, ngiabinabox)
singularity/.gitignore Excludes .sif binary files from git tracking

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +14 to +16
groupadd -r mygroup && \
useradd -r -g mygroup myuser && \
chown -R myuser:mygroup /datastreamcli
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chown command changes ownership of /datastreamcli to myuser:mygroup, but there's no corresponding USER directive to switch to that user. In the Docker equivalent (Dockerfile.datastream line 12), the COPY command uses --chown to set ownership during copy. Without a USER directive, the container will run as root by default, making the ownership change ineffective. Consider adding 'USER myuser' after the chown command, or remove the user/group creation if root access is required.

Copilot uses AI. Check for mistakes.
chown -R myuser:mygroup /datastreamcli

%runscript
exec "$@"
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runscript uses 'exec "$@"' which passes all arguments to be executed. However, this differs from typical container patterns where you'd want a default command or entrypoint. Based on the documentation in scripts/datastream and README.md, users expect to run './scripts/datastream' with arguments. Consider making the runscript default to executing the datastream script, such as 'exec /datastreamcli/scripts/datastream "$@"', to provide a clearer entrypoint that matches user expectations.

Suggested change
exec "$@"
exec /datastreamcli/scripts/datastream "$@"

Copilot uses AI. Check for mistakes.
Comment on lines +6 to +11
# Main DataStreamCLI container
singularity build datastream.sif datastream.def

# Additional workflow containers (built from Docker images)
singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest
singularity build merkdir.sif docker://zwills/merkdir:latest
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build script lacks documentation comments explaining what each image is for and what dependencies each provides. Given that this is a new feature adding Singularity support, adding header comments describing the purpose of each image build would improve maintainability. For example, explaining that forcingprocessor, merkdir, and ngiabinabox are workflow dependencies would help users understand what's being built.

Suggested change
# Main DataStreamCLI container
singularity build datastream.sif datastream.def
# Additional workflow containers (built from Docker images)
singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest
singularity build merkdir.sif docker://zwills/merkdir:latest
# Main DataStreamCLI container:
# - Provides the primary DataStreamCLI runtime environment.
# - Contains the CLI entrypoint and core application dependencies used to
# orchestrate and manage the end-to-end workflow.
singularity build datastream.sif datastream.def
# Workflow dependency container: forcingprocessor
# - Built from the awiciroh/forcingprocessor Docker image.
# - Provides tools and scripts for preprocessing hydrologic/meteorologic
# forcing data that are consumed by downstream workflow steps.
singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest
# Workflow dependency container: merkdir
# - Built from the zwills/merkdir Docker image.
# - Supplies the merkdir utilities used by the workflow for organizing and
# managing model input/output directory structures.
singularity build merkdir.sif docker://zwills/merkdir:latest
# Workflow dependency container: ngiabinabox (NGen environment)
# - Built from the awiciroh/ciroh-ngen-image Docker image.
# - Provides the NGen/NGIA model runtime and related dependencies required
# to execute NGen-based modeling components within the workflow.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,15 @@
#!/usr/bin/env bash
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script uses '#!/usr/bin/env bash' while other shell scripts in the repository (scripts/datastream, scripts/create_ami.sh, docker/docker_loginNpush.sh) consistently use '#!/bin/bash'. For consistency with the codebase conventions, consider changing to '#!/bin/bash'.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1 @@
*.sif
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new singularity/ directory lacks a README.md file to document its purpose, usage, and requirements. The docker/ directory has a README.md (docker/README.md) that documents the Docker setup. Following this pattern, a singularity/README.md should be added to explain how to build and use the Singularity images, what each image is for, system requirements (e.g., Singularity/Apptainer version), and how to run DataStreamCLI using the built containers. This is especially important since the PR description mentions this is targeting HPC environments where Singularity is commonly used.

Copilot uses AI. Check for mistakes.
../src/ /datastreamcli/src/
../setup.cfg /datastreamcli/setup.cfg
../pyproject.toml /datastreamcli/pyproject.toml
../ /datastreamcli
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line copies the entire parent directory (including .git/, .github/, tests/, docs/, docker/, etc.) after already copying specific subdirectories in lines 5-8. This creates redundancy and significantly increases image size. Additionally, the scripts/ directory is not explicitly copied in lines 5-8, but it's required based on the PR description's testing showing '/datastreamcli/scripts/datastream'. Consider either removing this line and adding 'scripts/' to the explicit file list, or removing lines 5-8 and only using this line if all files are needed.

Suggested change
../ /datastreamcli
../scripts/ /datastreamcli/scripts/

Copilot uses AI. Check for mistakes.
../ /datastreamcli

%post
pip3 install -e /datastreamcli && \
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pip3 install command uses '-e /datastreamcli' which installs the package in editable mode from the root directory. However, looking at the Dockerfile.datastream (line 8), it uses 'pip3 install -e datastreamcli/' without the leading slash. While both may work, this inconsistency could lead to confusion. Additionally, the PR description shows testing uses '/datastreamcli/scripts/datastream', but the scripts/ directory is not explicitly copied in the %files section (lines 5-8). Verify that all required files, including scripts/, are properly available in the container.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,20 @@
Bootstrap: docker
From: awiciroh/datastream-deps:latest
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Singularity definition uses the Docker base image awiciroh/datastream-deps:latest, which is a mutable tag pulled from a third-party registry and creates a supply-chain risk because the image contents can change silently over time or be compromised upstream. An attacker who gains control of that Docker image or namespace could inject malicious code that will be built into datastream.sif and execute wherever the container runs. Pin this dependency to an immutable image digest or a vetted, fixed version tag and establish a process for reviewing and updating it.

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +12
singularity build forcingprocessor.sif docker://awiciroh/forcingprocessor:latest
singularity build merkdir.sif docker://zwills/merkdir:latest
singularity build ngiabinabox.sif docker://awiciroh/ciroh-ngen-image:latest
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build script pulls Docker images awiciroh/forcingprocessor:latest, zwills/merkdir:latest, and awiciroh/ciroh-ngen-image:latest via docker:// using the mutable latest tag, which introduces a supply-chain risk because these images can change or be compromised without any change in this repository. If an attacker gains control of any of these Docker Hub namespaces or images, they could ship a trojanized image that would be transparently converted into Singularity .sif files and executed in your workflows. Pin each image to an immutable digest or a specific, reviewed version tag and document how updates are validated before rebuilding.

Copilot uses AI. Check for mistakes.
@harshavemula-ua
Copy link
Collaborator

harshavemula-ua commented Feb 6, 2026

@hariteja-jajula

Hey! Nice work adding Singularity support

One thing I noticed in datastream.def - the %files section copies specific directories first, then copies the entire repo afterward:

%files
    ../configs/ /datastreamcli/configs/
    ../src/ /datastreamcli/src/
    ../setup.cfg /datastreamcli/setup.cfg
    ../pyproject.toml /datastreamcli/pyproject.toml
    ../ /datastreamcli

The last line (../ /datastreamcli) will overwrite everything copied above it, so the first four lines aren't doing anything.

You can simplify this to just:

%files
    ../ /datastreamcli

This copies everything in one go and keeps the definition cleaner!

@JordanLaserGit
Copy link
Collaborator

This is a first step towards addressing CIROH-UA/ngen-datastream#264

Copy link
Collaborator

@JordanLaserGit JordanLaserGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hariteja-jajula Thanks for this work!

  1. For the docker image tags, can we dynamically read the versions.yml file and build with the latest versioned tags (not latest tag)
  2. Can you remove the forcingprocessor singularity build and move that to the forcingprocessor repository?
  3. Could you respond the Copilot comments?
  4. Do you think we can use Github Actions to test these builds? We won't be deploying anything using these images, but it would be nice to know the singularity containers build before testing them on-premise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request NRDS On-Premise Related to deploying the NRDS on premise at UA

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

3 participants