feat: Add new MPI utilities, libraries and functionality by dgchinner · Pull Request #134 · linux-system-roles/hpc

dgchinner · 2026-05-21T05:01:25Z

This series:

- adds a set of utilities for manipulating files across a cluster using MPI infrastructure.
- adds the MVAPICH MPI library from Ohio State University with CUDA/GPU acceleration enabled.
- implements transparent environment handling so that GPU-enabled OpenMPI libraries can run on machines without GPUs.
- separates installation of the Lmod environment files from the MPI libraries, so they can be developed and tested independently of building/installing the MPI libraries themselves.
- splits the MPI installation rules out into their own tasks file to make MPI support easier to maintain and develop further.

The supported MPI libraries are now:

$ ml -t spider mpi/
mpi/hpcx-2.24.1-pmix-4.2.9
mpi/hpcx-2.24.1
mpi/mvapich-4.0
mpi/openmpi-x86_64
mpi/openmpi-5.0.8-cuda12-gpu
$

MVAPICH cannot run on non-GPU machines due to its built-in UCX library. OpenMPI uses a modular transport infrastructure, so the CUDA/GPU modules can be turned off and not loaded. Hence on a non-GPU machine:

$ ml mpi/mvapich-4.0
Error: MVAPICH 4.0 was built with CUDA/UCX support and requires NVidia GPUs. This machine has no GPUs. Use a different MPI module (e.g. hpcx or openmpi).
$ ml mpi/openmpi-5.0.8-cuda12-gpu
$ env |grep OMPI_MCA
OMPI_MCA_osc=^ucx
OMPI_MCA_btl=^smcuda
OMPI_MCA_pml=^ucx
OMPI_MCA_coll=^ucc,cuda,hcoll
$

The MVAPICH environment refuses to load, whilst the OpenMPI modules turn off all the CUDA modules and the UCX transport functionality that requires CUDA and/or GPU support. Hence the OpenMPI modules now work on machines with and without GPUs without the user having to do anything special.
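The effect on a GPU-less machine can be approximated by hand in a shell. This is a minimal sketch, not the role's Lmod code: the function name `ompi_no_gpu_defaults` and its optional device-path argument are hypothetical, and it assumes (as the GPU detection commit message in this PR states) that the presence of /dev/nvidia0 indicates an NVIDIA GPU. The values are the ones shown in the session above.

```shell
#!/bin/sh
# Sketch only: print the OMPI_MCA_* settings that exclude the CUDA/UCX
# components when no NVIDIA device node is present.
# ompi_no_gpu_defaults is a hypothetical helper, not part of the role.
ompi_no_gpu_defaults() {
    # Device node used as the GPU presence check; overridable for testing.
    gpu_dev="${1:-/dev/nvidia0}"

    if [ -e "$gpu_dev" ]; then
        # A GPU appears to be present: leave Open MPI's defaults alone.
        return 0
    fi

    # A leading ^ tells Open MPI to exclude the listed components.
    echo "export OMPI_MCA_pml='^ucx'"
    echo "export OMPI_MCA_osc='^ucx'"
    echo "export OMPI_MCA_btl='^smcuda'"
    echo "export OMPI_MCA_coll='^ucc,cuda,hcoll'"
}
```

Running `eval "$(ompi_no_gpu_defaults)"` before `mpirun` would apply the settings to the current shell. The modules do the equivalent at load time, so users do not need to.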

Issue Tracker Tickets (Jira or BZ if any): https://redhat.atlassian.net/browse/RHELHPC-105

Summary by CodeRabbit

New Features
- Added MVAPICH MPI distribution support with GPU acceleration.
- Added mpifileutils installation and build capabilities.
- Implemented GPU detection for MPI module environments with conditional configuration.
Refactor
- Reorganized MPI installation workflow for improved modularity.
Tests
- Updated test scenarios for new MPI configuration options.

Move all MPI-related tasks out of tasks/main.yml into a dedicated tasks/mpi.yml file for easier navigation and maintenance. This includes the precondition checks, OpenMPI/HPC-X/PMIx/GDRCopy build and install, etc. This is done in preparation for adding more MPI functionality.

The main.yml file now includes mpi.yml via include_tasks at the point where the MPI blocks previously appeared (after RDMA packages, before Docker).

Signed-off-by: Dave Chinner <dchinner@redhat.com>

mpifileutils provides MPI-based file utilities for parallel file operations including tools like dcp, drm, dsync, dfind, dwalk, dcmp, and dtar. The package is built from source using cmake with HPC-X MPI, matching the upstream azhpc-images build process.

The build uses the same temporary directory pattern as the OpenMPI build: download and extract to a tempdir, build in a separate tempdir, install to the __hpc_azure_resource_dir/mpifileutils directory, then clean up both temp directories.

A parameter check is added to ensure HPC-X MPI is available before attempting to build mpifileutils, since HPC-X provides the MPI compilers required for the cmake build.

The package is only installed in Azure test environments (tests_azure.yml). All other test playbooks explicitly disable it to avoid requiring HPC-X MPI.

Changes:
- Add __hpc_mpifileutils_info to vars/RedHat_9.yml (version 0.12)
- Add __hpc_mpifileutils_build_dependencies and __hpc_mpifileutils_install_dir to vars/main.yml
- Add hpc_install_mpifileutils default (true) to defaults/main.yml
- Add parameter validation check requiring hpc_build_openmpi_w_nvidia_gpu_support
- Add download, build, and install tasks using tempdir pattern
- Add mpifileutils build deps to the build dependency cleanup task
- Disable mpifileutils in tests_default, tests_skip_toolkit, and tests_include_vars_from_parent

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: add mpifileutils package to the HPC system role. You will find the version to install in the versions.json file in the azhpc-images repository, and the way it needs to be built in components/install_mpifileutils.sh. You will install it to the __hpc_azure_resource_dir directory and use the same temporary build area construct as used for building the openmpi code.

Refinements:
- Disable mpifileutils in all non-Azure test playbooks so only tests_azure.yml installs it

Signed-off-by: Dave Chinner <dchinner@redhat.com>
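The temporary directory pattern described in this commit message can be sketched in shell. This is an illustration under stated assumptions, not the role's Ansible tasks: `build_in_tempdirs`, `SRC_DIR` and `PREFIX` are hypothetical names, the caller supplies the actual download and build steps, and cleaning up after a failed build is a choice made for the sketch.

```shell
#!/bin/sh
# Sketch only: run a build using one temp directory for sources and a
# separate one for the build, then remove both.
# $1 is the install prefix; the remaining arguments are the build command.
build_in_tempdirs() {
    prefix="$1"; shift
    src_dir="$(mktemp -d)" || return 1
    build_dir="$(mktemp -d)" || { rm -rf "$src_dir"; return 1; }

    # Run the caller's build steps from inside the build directory, with
    # the source directory and install prefix passed in the environment.
    ( cd "$build_dir" && SRC_DIR="$src_dir" PREFIX="$prefix" "$@" )
    status=$?

    # Clean up both temporary directories, even if the build failed.
    rm -rf "$src_dir" "$build_dir"
    return "$status"
}
```

Only the install prefix survives the call, which mirrors the description above: nothing from the download or build areas is left behind.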

MVAPICH is a high-performance MPI implementation optimised for InfiniBand and other high-speed networks. Version 4.0 is built from source using the same temporary directory pattern as the OpenMPI build. The build uses ./configure with --enable-g=none --enable-fast=yes flags matching the upstream azhpc-images build process, and installs to /opt/mvapich-<version>.

When hpc_build_mpi_w_nvidia_gpu_support is enabled, the build additionally passes --with-ucx and --with-cuda to configure so that MVAPICH is built with GPU-aware MPI support using the same UCX and CUDA paths as OpenMPI.

An Lmod environment module is provided in lua format, consistent with the existing openmpi and hpcx modulefiles, allowing users to load MVAPICH via 'module load mpi/mvapich-4.0'. The module conflicts with other MPI modules so only one can be loaded at a time. When GPU support is enabled, the module also adds the UCX and CUDA library paths to LD_LIBRARY_PATH and PATH, matching the openmpi-cuda module.

Changes:
- Add __hpc_mvapich_info to vars/RedHat_9.yml (version 4.0)
- Add __hpc_mvapich_install_dir to vars/main.yml
- Add hpc_install_mvapich default (true) to defaults/main.yml
- Add download, build, install, and modulefile tasks to tasks/mpi.yml
- Add mvapich-ver.lua.j2 Lmod modulefile template
- Disable hpc_install_mvapich in tests_default, tests_skip_toolkit, and tests_include_vars_from_parent
- Rename hpc_build_openmpi_w_nvidia_gpu_support to hpc_build_mpi_w_nvidia_gpu_support as the flag now guards GPU support for both OpenMPI and MVAPICH builds
- Conditionally pass --with-ucx and --with-cuda to MVAPICH configure when GPU support is enabled

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: add MVAPICH MPI library to the HPC system role. Use version 4.0 as per the reference versions.json, and the build instructions can be derived from components/install_mpis.sh. Ignore the other MPI libraries in that reference file. Add the lmod environment modules using the lua script format to needed to use the MVAPICH libraries similar to those installed by the system role for the openmpi library.

Refinements:
- configure with --with-device=ch4:ucx to use libucx as the network transport instead of the built in libfabrics code.
- Add --with-ucx and --with-cuda configure flags guarded by hpc_build_mpi_w_nvidia_gpu_support for GPU-aware MPI support.
- Rename hpc_build_openmpi_w_nvidia_gpu_support to hpc_build_mpi_w_nvidia_gpu_support since it now applies to multiple MPI library builds.
- Add UCX and CUDA library/bin paths to the MVAPICH Lmod module when GPU support is enabled, matching the openmpi-cuda module.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

Separate the lmod environment module file installation from the MPI library build tasks into standalone task blocks. This allows modulefile changes to be deployed by re-running the playbook without triggering a rebuild of the MPI libraries, which significantly speeds up the iterative development and testing of lmod configuration changes.

The OpenMPI-based module files (PMIx, HPC-X, HPC-X+PMIx, OpenMPI, and the no-GPU defaults helper) are grouped under a single block gated by hpc_build_mpi_w_nvidia_gpu_support. The MVAPICH module file has its own block gated by hpc_install_mvapich. Both blocks ensure the target directories exist before installing files. The template and copy modules are idempotent so these tasks are safe to run on every playbook invocation.

Changes:
- Remove PMIx modulefile install from the PMIx build block
- Remove MPI module directory creation and HPC-X/OpenMPI/no-GPU helper installs from the GPU MPI build block
- Remove MVAPICH module directory creation and modulefile install from the MVAPICH build block
- Add new "Install OpenMPI-based lmod environment module files" block
- Add new "Install MVAPICH lmod environment module file" block

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: having to rebuild the mpi libraries to install and test changes to the lmod configuration takes a long time. extract the lmod configuration file installation from each of the MPI library installs, and implement a single task that installs all of the individual lmod files. trigger the installation of the files if any of the MPI libraries is rebuilt, or if the /usr/share/modulefiles/mpi is missing. install the individual files according to the installation parameters for each of the MPI libraries that already exist.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

MPI libraries built with CUDA/GPU acceleration use UCX-based transports that cause warnings or failures on machines without GPUs. This adds runtime GPU detection to the lmod environment modules so that when no NVidia GPUs are present, the GPU transports are automatically disabled.

For OpenMPI-derived libraries (OpenMPI, HPC-X), a shared Jinja include fragment (openmpi-no-gpu-defaults.lua.j2) checks for /dev/nvidia0 and sets OMPI_MCA environment variables to exclude ucx, smcuda, ucc, cuda, and hcoll transports. The fragment is inlined into each module file at template rendering time via {% include %}.

For MVAPICH (when built with GPU support), the module refuses to load on machines without GPUs. MVAPICH hard-codes HPC-X UCX library paths into libmpi.so at build time so it cannot fall back to system UCX. The module issues an LmodError directing users to an alternative MPI module instead.

Changes:
- Add templates/openmpi-no-gpu-defaults.lua.j2 shared GPU detection fragment
- Add {% include %} to openmpi-ver-cuda12-gpu.lua.j2
- Add {% include %} to hpcx-ver.lua.j2
- Add {% include %} to hpcx-ver-pmix-ver.lua.j2
- Add LmodError to mvapich-ver.lua.j2 to refuse loading on non-GPU machines

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: the MPI libraries that are optimised for CUDA and GPU acceleration need different option sets to run on machines without GPUs. All the OpenMPI derived libraries require mpirun/mpiexec to have "--mca pml ^ucx --mca btl ^smcuda --mca osc ^ucx --mca coll ^ucc,cuda,hcoll" to turn off all the underlying UCX-based GPU accelerations. MVAPICH will require a different set of parameters as it passes environment and config variables in a different manner. These need to be set up in the lmod environment modules for each MPI library. If the system does not have any GPUs in it, they should set up the default mpirun/exec environment to use these "avoid using cuda/GPU transports" mechanisms automatically.

Refinements:
- Use Jinja {% include %} to inline GPU detection at deploy time
- MVAPICH refuses to load on non-GPU machines via LmodError because it hard-codes HPC-X UCX paths into libmpi.so at build time

Signed-off-by: Dave Chinner <dchinner@redhat.com>
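The `--mca` options quoted in the prompt and the OMPI_MCA environment variables set by the modules are two spellings of the same settings, since Open MPI reads an MCA parameter `<name>` from the environment variable `OMPI_MCA_<name>`. The mapping can be sketched as follows; `mca_flags_to_env` is a hypothetical helper written for illustration and is not part of the role.

```shell
#!/bin/sh
# Sketch only: rewrite "--mca <param> <value>" pairs from an mpirun
# command line as OMPI_MCA_<param>=<value> lines.
# mca_flags_to_env is a hypothetical helper, not part of the role.
mca_flags_to_env() {
    while [ "$#" -gt 0 ]; do
        if [ "$1" = "--mca" ] && [ "$#" -ge 3 ]; then
            printf 'OMPI_MCA_%s=%s\n' "$2" "$3"
            shift 3
        else
            # Skip anything that is not a complete --mca pair.
            shift
        fi
    done
}

# The option set quoted in the prompt above.
mca_flags_to_env --mca pml '^ucx' --mca btl '^smcuda' \
                 --mca osc '^ucx' --mca coll '^ucc,cuda,hcoll'
# Prints:
# OMPI_MCA_pml=^ucx
# OMPI_MCA_btl=^smcuda
# OMPI_MCA_osc=^ucx
# OMPI_MCA_coll=^ucc,cuda,hcoll
```

Setting the variables in the module means every `mpirun` or `mpiexec` started from that environment picks them up without extra command-line options.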

coderabbitai · 2026-05-21T05:01:36Z

📝 Walkthrough

Walkthrough

This PR refactors the HPC Ansible role to support multiple MPI distributions (OpenMPI, MVAPICH, mpifileutils) with GPU-aware builds. It extracts the monolithic inline MPI build logic into a dedicated workflow, renames the GPU flag variable for consistency, and adds runtime GPU detection to modulefiles to automatically disable GPU-specific components when no NVIDIA GPU is present.

Changes

MPI Multi-Distribution Support and GPU Detection

Layer / File(s)	Summary
Configuration defaults and source metadata `defaults/main.yml`, `vars/main.yml`, `vars/RedHat_9.yml`, `tests/tests_default.yml`, `tests/tests_include_vars_from_parent.yml`, `tests/tests_skip_toolkit.yml`	New configuration variable `hpc_build_mpi_w_nvidia_gpu_support` replaces `hpc_build_openmpi_w_nvidia_gpu_support`. New toggles `hpc_install_mvapich` and `hpc_install_mpifileutils` enable optional MPI distributions. Build dependencies and source download metadata added for MVAPICH v4.0 and mpifileutils v0.12. Install directories configured. All test playbooks updated to use new variable names.
GPU detection template and modulefile integration `templates/openmpi-no-gpu-defaults.lua.j2`, `templates/hpcx-ver.lua.j2`, `templates/hpcx-ver-pmix-ver.lua.j2`, `templates/openmpi-ver-cuda12-gpu.lua.j2`	New `openmpi-no-gpu-defaults.lua.j2` template provides reusable GPU detection via `/dev/nvidia0` and disables UCX/CUDA-related OpenMPI MCA components when no GPU is present. This template is included by all OpenMPI-based modulefiles (hpcx-ver, hpcx-ver-pmix-ver, openmpi-ver-cuda12-gpu) to provide consistent GPU-aware defaults.
MVAPICH modulefile with GPU detection `templates/mvapich-ver.lua.j2`	New MVAPICH environment module declares MPI module conflicts, includes GPU detection that fails module load if GPU is required but unavailable, conditionally prepends MVAPICH and UCX/CUDA paths to environment, and sets MPI_* variables for compatibility with MPI-aware tools.
MPI build task refactoring and workflow `tasks/main.yml`, `tasks/mpi.yml`	Removes inline MPI build logic from main.yml and precondition failure for incomplete GPU configuration. Replaces with single `include_tasks: tasks/mpi.yml` entry point. New mpi.yml implements complete MPI provisioning workflow: precondition validation for GPU/CUDA consistency and mpifileutils requirements, conditional system package installation, GPU-enabled builds of PMIx, GDRCopy (with systemd integration), HPC-X rebuild, mpifileutils, OpenMPI (with UCX/UCC/Hcoll/PMIx/CUDA), and MVAPICH (with optional UCX/CUDA). Generates lmod modulefiles for all MPI variants and PMIx using Jinja2 templates.

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Format	⚠️ Warning	PR description does not follow required template. Missing "Enhancement:" (or "Feature:"), "Reason:", and "Result:" sections required by .github/pull_request_template.md.	Reformat PR description to include: "Enhancement:" describing changes, "Reason:" explaining why needed, "Result:" describing outcomes, and optional "Issue Tracker Tickets (Jira or BZ if any):" section.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title follows Conventional Commits format with type 'feat' and a clear, descriptive summary of the main changes (adding new MPI utilities, libraries and functionality).
Description check	✅ Passed	The pull request description provides comprehensive context covering the rationale, implementation details, results, and issue tracker reference, exceeding the basic template structure.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tasks/mpi.yml`:
- Around line 295-330: The OpenMPI tasks ("Get stat of openmpi path", "Download
and build OpenMPI") are incorrectly indented inside the hpc_install_mpifileutils
block; move these tasks out of the hpc_install_mpifileutils block and place them
into the hpc_build_mpi_w_nvidia_gpu_support block (i.e., dedent them one level)
so they run as part of hpc_build_mpi_w_nvidia_gpu_support; keep the same task
names, register variable (__hpc_openmpi_path_stat), use the existing
include_tasks (tasks/download_extract_package.yml) and looped configure/make
steps unchanged while removing them from inside hpc_install_mpifileutils.
- Around line 14-17: The current when clause combines three conditions as a list
(AND) so the failure task only runs when both hpc_install_cuda_toolkit and
hpc_install_nvidia_nccl are false; change the logic so the task triggers if
hpc_build_mpi_w_nvidia_gpu_support is true AND either dependency is missing by
replacing the list with a single boolean expression: use
hpc_build_mpi_w_nvidia_gpu_support and (not hpc_install_cuda_toolkit or not
hpc_install_nvidia_nccl) referencing the same variable names to ensure the fail
runs when either CUDA toolkit or NCCL is not installed.

In `@templates/openmpi-no-gpu-defaults.lua.j2`:
- Around line 1-8: This template is missing the required top-of-file headers;
update templates/openmpi-no-gpu-defaults.lua.j2 to prepend the two mandatory
header lines so they are the first lines in the file: add {{ ansible_managed |
comment }} as the very first header and immediately below it add {{
"system_role:hpc" | comment(prefix="", postfix="") }}, keeping the rest of the
file (the shared GPU-detection Lua fragment) unchanged so it remains includable
by other OpenMPI-derived module files.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1f47850b-259e-4f76-858a-e36afbb48cf9

📥 Commits

Reviewing files that changed from the base of the PR and between 3def6ea and c0df118.

📒 Files selected for processing (13)

defaults/main.yml
tasks/main.yml
tasks/mpi.yml
templates/hpcx-ver-pmix-ver.lua.j2
templates/hpcx-ver.lua.j2
templates/mvapich-ver.lua.j2
templates/openmpi-no-gpu-defaults.lua.j2
templates/openmpi-ver-cuda12-gpu.lua.j2
tests/tests_default.yml
tests/tests_include_vars_from_parent.yml
tests/tests_skip_toolkit.yml
vars/RedHat_9.yml
vars/main.yml

coderabbitai · 2026-05-21T05:05:19Z

+  when:
+    - hpc_build_mpi_w_nvidia_gpu_support
+    - not hpc_install_cuda_toolkit
+    - not hpc_install_nvidia_nccl


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Incorrect condition logic allows invalid configurations to pass.

The when conditions are ANDed, so the fail task only triggers when both hpc_install_cuda_toolkit AND hpc_install_nvidia_nccl are false. If only one is missing (e.g., CUDA toolkit disabled but NCCL enabled), this check passes but the MPI build will fail later.

The condition should fail if either dependency is missing.

🐛 Proposed fix

   when:
     - hpc_build_mpi_w_nvidia_gpu_support
-    - not hpc_install_cuda_toolkit
-    - not hpc_install_nvidia_nccl
+    - not hpc_install_cuda_toolkit or not hpc_install_nvidia_nccl
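The difference between the list-form condition and the proposed expression can be checked with a small truth-table sketch. The shell functions below are hypothetical stand-ins for how the `when` clause evaluates; each takes the values of hpc_build_mpi_w_nvidia_gpu_support, hpc_install_cuda_toolkit and hpc_install_nvidia_nccl as `true` or `false` and succeeds when the fail task would trigger.

```shell
#!/bin/sh
# Sketch only: model the two `when` conditions as shell predicates.
# Arguments: gpu_support cuda_toolkit nccl, each "true" or "false".
original_check() {
    # List form: all three conditions must hold (logical AND).
    [ "$1" = true ] && [ "$2" = false ] && [ "$3" = false ]
}

proposed_check() {
    # GPU support requested AND at least one dependency missing.
    [ "$1" = true ] && { [ "$2" = false ] || [ "$3" = false ]; }
}

# CUDA toolkit disabled but NCCL enabled: the case described above.
original_check true false true || echo "original: check passes, build fails later"
proposed_check true false true && echo "proposed: configuration rejected"
```

Both messages are printed for this input: the original condition lets the configuration through, while the proposed one rejects it up front.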

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tasks/mpi.yml` around lines 14 - 17, The current when clause combines three conditions as a list (AND) so the failure task only runs when both hpc_install_cuda_toolkit and hpc_install_nvidia_nccl are false; change the logic so the task triggers if hpc_build_mpi_w_nvidia_gpu_support is true AND either dependency is missing by replacing the list with a single boolean expression: use hpc_build_mpi_w_nvidia_gpu_support and (not hpc_install_cuda_toolkit or not hpc_install_nvidia_nccl) referencing the same variable names to ensure the fail runs when either CUDA toolkit or NCCL is not installed.

this @dgchinner sounds valid

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

coderabbitai · 2026-05-21T05:05:19Z

+    - name: Get stat of openmpi path
+      stat:
+        path: "{{ __hpc_openmpi_path }}"
+      register: __hpc_openmpi_path_stat
+
+    - name: Download and build OpenMPI
+      when: not __hpc_openmpi_path_stat.stat.exists
+      block:
+        - name: Download {{ __hpc_openmpi_info.name }}
+          include_tasks: tasks/download_extract_package.yml
+          vars:
+            __hpc_pkg_info: "{{ __hpc_openmpi_info }}"
+
+        - name: Build {{ __hpc_openmpi_info.name }}
+          command:
+            cmd: "{{ item }}"
+            chdir: "{{ __hpc_pkg_extracted.path }}"
+          changed_when: true
+          loop:
+            - >-
+              ./configure --prefix={{ __hpc_openmpi_path }}
+              --with-ucx={{ __hpc_ucx_path }}
+              --with-ucc={{ __hpc_ucc_path }}
+              --with-hcoll={{ __hpc_hcoll_path }}
+              --with-pmix={{ __hpc_pmix_path }}
+              --enable-prte-prefix-by-default
+              --with-platform=contrib/platform/mellanox/optimized
+              --with-cuda={{ __hpc_cuda_path }}
+            - make -j {{ ansible_facts["processor_nproc"] }}
+            - make install
+
+        - name: Remove extracted tarball
+          file:
+            path: "{{ __hpc_pkg_extracted.path }}"
+            state: absent
+          changed_when: false


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

OpenMPI build tasks incorrectly nested inside mpifileutils block.

The "Get stat of openmpi path" task and subsequent OpenMPI build tasks (lines 295-330) are indented inside the hpc_install_mpifileutils block. This means OpenMPI will only be built when mpifileutils is also requested, which breaks the intended behavior.

These tasks should be moved outside the mpifileutils block to be part of the hpc_build_mpi_w_nvidia_gpu_support block instead.

🐛 Proposed fix - dedent OpenMPI tasks

Move lines 295-330 outside the mpifileutils block by reducing their indentation by one level, placing them after line 236 (end of HPC-X build block) and before line 238 (mpifileutils block):

     - name: Remove extracted tarball
       file:
         path: "{{ __hpc_pkg_extracted.path }}"
         state: absent
       changed_when: false
 
+- name: Get stat of openmpi path
+  stat:
+    path: "{{ __hpc_openmpi_path }}"
+  register: __hpc_openmpi_path_stat
+
+- name: Download and build OpenMPI
+  when: not __hpc_openmpi_path_stat.stat.exists
+  block:
+    - name: Download {{ __hpc_openmpi_info.name }}
+      include_tasks: tasks/download_extract_package.yml
+      vars:
+        __hpc_pkg_info: "{{ __hpc_openmpi_info }}"
+
+    - name: Build {{ __hpc_openmpi_info.name }}
+      command:
+        cmd: "{{ item }}"
+        chdir: "{{ __hpc_pkg_extracted.path }}"
+      changed_when: true
+      loop:
+        - >-
+          ./configure --prefix={{ __hpc_openmpi_path }}
+          --with-ucx={{ __hpc_ucx_path }}
+          --with-ucc={{ __hpc_ucc_path }}
+          --with-hcoll={{ __hpc_hcoll_path }}
+          --with-pmix={{ __hpc_pmix_path }}
+          --enable-prte-prefix-by-default
+          --with-platform=contrib/platform/mellanox/optimized
+          --with-cuda={{ __hpc_cuda_path }}
+        - make -j {{ ansible_facts["processor_nproc"] }}
+        - make install
+
+    - name: Remove extracted tarball
+      file:
+        path: "{{ __hpc_pkg_extracted.path }}"
+        state: absent
+      changed_when: false
+
 - name: Download, build, and install mpifileutils
   when: hpc_install_mpifileutils
   block:
     - name: Get stat of mpifileutils install path

And remove lines 295-330 from inside the mpifileutils block.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tasks/mpi.yml` around lines 295 - 330, The OpenMPI tasks ("Get stat of openmpi path", "Download and build OpenMPI") are incorrectly indented inside the hpc_install_mpifileutils block; move these tasks out of the hpc_install_mpifileutils block and place them into the hpc_build_mpi_w_nvidia_gpu_support block (i.e., dedent them one level) so they run as part of hpc_build_mpi_w_nvidia_gpu_support; keep the same task names, register variable (__hpc_openmpi_path_stat), use the existing include_tasks (tasks/download_extract_package.yml) and looped configure/make steps unchanged while removing them from inside hpc_install_mpifileutils.

coderabbitai · 2026-05-21T05:05:19Z

+--
+-- Shared GPU detection for OpenMPI-based MPI modules.
+--
+-- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
+-- that cause warnings or failures on GPU-less machines. This is sourced
+-- by each OpenMPI-derived module file via dofile().
+--
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required template headers.

This template file is missing the mandatory headers required for all templates. As per coding guidelines, templates must include these headers in order:

{{ ansible_managed | comment }}

{{ "system_role:hpc" | comment(prefix="", postfix="") }}

Even though this template is designed to be included as a fragment by other templates, it still falls under templates/**/*.j2 and must comply with the header requirements.

🔧 Proposed fix

+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

--
-- Shared GPU detection for OpenMPI-based MPI modules.
--
-- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
-- that cause warnings or failures on GPU-less machines. This is sourced
-- by each OpenMPI-derived module file via dofile().
--

{{ ansible_managed | comment(decoration="-- ") }}
{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
--
-- Shared GPU detection for OpenMPI-based MPI modules.
--
-- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
-- that cause warnings or failures on GPU-less machines. This is sourced
-- by each OpenMPI-derived module file via dofile().
--

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@templates/openmpi-no-gpu-defaults.lua.j2` around lines 1 - 8, This template is missing the required top-of-file headers; update templates/openmpi-no-gpu-defaults.lua.j2 to prepend the two mandatory header lines so they are the first lines in the file: add {{ ansible_managed | comment }} as the very first header and immediately below it add {{ "system_role:hpc" | comment(prefix="", postfix="") }}, keeping the rest of the file (the shared GPU-detection Lua fragment) unchanged so it remains includable by other OpenMPI-derived module files.

This is valid

@spetrosi, thanks for confirming! You can apply the proposed fix to add the required headers to templates/openmpi-no-gpu-defaults.lua.j2:

+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --
 -- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
 -- that cause warnings or failures on GPU-less machines. This is sourced
 -- by each OpenMPI-derived module file via dofile().
 --

Note that decoration="-- " is used here to match Lua's comment syntax, keeping the output consistent with the surrounding Lua code in the template.

dgchinner added 5 commits May 20, 2026 09:24

dgchinner requested review from richm and spetrosi as code owners May 21, 2026 05:01

coderabbitai Bot reviewed May 21, 2026


dgchinner mentioned this pull request May 21, 2026

test: Add MPI library test script #135

Open

dgchinner wants to merge 5 commits into linux-system-roles:main from dgchinner:mpi-updates

2 participants
