
feat: Add new MPI utilities, libraries and functionality#134

Open
dgchinner wants to merge 5 commits into
linux-system-roles:mainfrom
dgchinner:mpi-updates


@dgchinner
Collaborator

@dgchinner dgchinner commented May 21, 2026

This series:

  • Adds a set of utilities for manipulating files across a cluster using MPI infrastructure.
  • Adds the MVAPICH MPI library from Ohio State University with CUDA/GPU acceleration enabled.
  • Implements transparent environment handling so that GPU-enabled OpenMPI libraries can run on machines without GPUs.
  • Separates installation of the Lmod environment files from the MPI libraries, so the module files can be developed and tested independently of building and installing the libraries themselves.
  • Splits the MPI installation rules out into their own tasks file to make MPI support easier to maintain and develop further.

The supported MPI libraries are now:

$ ml -t spider mpi/
mpi/hpcx-2.24.1-pmix-4.2.9
mpi/hpcx-2.24.1
mpi/mvapich-4.0
mpi/openmpi-x86_64
mpi/openmpi-5.0.8-cuda12-gpu
$

MVAPICH cannot run on non-GPU machines due to its built-in UCX library. OpenMPI uses a modular transport infrastructure, so the CUDA/GPU modules can be turned off and not loaded. Hence, on a non-GPU machine:

$ ml mpi/mvapich-4.0
Error: MVAPICH 4.0 was built with CUDA/UCX support and requires NVidia GPUs. This machine has no GPUs. Use a different MPI module (e.g. hpcx or openmpi).
$ ml mpi/openmpi-5.0.8-cuda12-gpu
$ env |grep OMPI_MCA
OMPI_MCA_osc=^ucx
OMPI_MCA_btl=^smcuda
OMPI_MCA_pml=^ucx
OMPI_MCA_coll=^ucc,cuda,hcoll
$

The MVAPICH environment refuses to load, whilst the OpenMPI modules turn off all the CUDA modules and the UCX transport functionality that requires CUDA and/or GPU support. Hence the OpenMPI modules now work on machines with and without GPUs, without the user having to do anything special.
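
The OpenMPI side of this behaviour can be approximated in plain shell. This is only a sketch of the logic, not the Lua code in the module files: the function name `ompi_no_gpu_env` and the `GPU_DEV` override are hypothetical, while `/dev/nvidia0` and the four `OMPI_MCA_*` values are taken from the description and commit messages in this PR.

```shell
#!/bin/sh
# Sketch only: mimics what the OpenMPI-derived modules are described as doing.
# GPU_DEV is a hypothetical override so the check can be exercised on any host.
ompi_no_gpu_env() {
    gpu_dev="${GPU_DEV:-/dev/nvidia0}"
    if [ ! -e "$gpu_dev" ]; then
        # No GPU device node: exclude the UCX/CUDA-based components.
        export OMPI_MCA_pml='^ucx'
        export OMPI_MCA_btl='^smcuda'
        export OMPI_MCA_osc='^ucx'
        export OMPI_MCA_coll='^ucc,cuda,hcoll'
    fi
}

ompi_no_gpu_env
env | grep '^OMPI_MCA_' || echo "GPU device found; OMPI_MCA defaults left untouched"
```

A leading `^` in an MCA component list means "exclude these components", so each variable removes the named components from selection rather than choosing a specific replacement.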

Issue Tracker Tickets (Jira or BZ if any): https://redhat.atlassian.net/browse/RHELHPC-105

Summary by CodeRabbit

  • New Features

    • Added MVAPICH MPI distribution support with GPU acceleration.
    • Added mpifileutils installation and build capabilities.
    • Implemented GPU detection for MPI module environments with conditional configuration.
  • Refactor

    • Reorganized MPI installation workflow for improved modularity.
  • Tests

    • Updated test scenarios for new MPI configuration options.


dgchinner added 5 commits May 20, 2026 09:24
Move all MPI-related tasks out of tasks/main.yml into a dedicated
tasks/mpi.yml file for easier navigation and maintenance. This includes
the precondition checks, OpenMPI/HPC-X/PMIx/GDRCopy build and
install, etc. This is done in preparation for adding more MPI
functionality.

The main.yml file now includes mpi.yml via include_tasks at the point
where the MPI blocks previously appeared (after RDMA packages, before
Docker).

Signed-off-by: Dave Chinner <dchinner@redhat.com>

mpifileutils provides MPI-based file utilities for parallel file operations
including tools like dcp, drm, dsync, dfind, dwalk, dcmp, and dtar. The
package is built from source using cmake with HPC-X MPI, matching the
upstream azhpc-images build process.

The build uses the same temporary directory pattern as the OpenMPI build:
download and extract to a tempdir, build in a separate tempdir, install to
the __hpc_azure_resource_dir/mpifileutils directory, then clean up both
temp directories.
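
As a rough illustration of that pattern, the shell sketch below uses two temporary directories, installs the result, and removes both directories afterwards. It is not the role's Ansible code: the function name `build_with_tempdirs`, the dummy files, and the copy steps standing in for download and cmake/make are all placeholders.

```shell
#!/bin/sh
# Sketch of the "download dir + build dir + install + clean up" pattern.
# The file names and the fake "build" step are placeholders only.
build_with_tempdirs() {
    install_dir="$1"
    src_dir="$(mktemp -d)"      # stands in for the download/extract tempdir
    build_dir="$(mktemp -d)"    # separate tempdir for the build itself

    # Placeholder for "download and extract": create a dummy source file.
    echo 'hello' > "$src_dir/source.txt"

    # Placeholder for "cmake + make": produce an artifact in the build dir.
    cp "$src_dir/source.txt" "$build_dir/artifact.txt"

    # Install step: copy the artifact to the final location.
    mkdir -p "$install_dir"
    cp "$build_dir/artifact.txt" "$install_dir/"

    # Clean up both temporary directories.
    rm -rf "$src_dir" "$build_dir"
}
```

Calling `build_with_tempdirs /some/install/dir` leaves only the installed artifact behind; neither temporary directory survives the call.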

A parameter check is added to ensure HPC-X MPI is available before
attempting to build mpifileutils, since HPC-X provides the MPI compilers
required for the cmake build.

The package is only installed in Azure test environments (tests_azure.yml).
All other test playbooks explicitly disable it to avoid requiring HPC-X MPI.

Changes:
- Add __hpc_mpifileutils_info to vars/RedHat_9.yml (version 0.12)
- Add __hpc_mpifileutils_build_dependencies and __hpc_mpifileutils_install_dir to vars/main.yml
- Add hpc_install_mpifileutils default (true) to defaults/main.yml
- Add parameter validation check requiring hpc_build_openmpi_w_nvidia_gpu_support
- Add download, build, and install tasks using tempdir pattern
- Add mpifileutils build deps to the build dependency cleanup task
- Disable mpifileutils in tests_default, tests_skip_toolkit, and tests_include_vars_from_parent

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: add mpifileutils package to the HPC system role. You will find the version to install in the versions.json file in the azhpc-images repository, and the way it needs to be built in components/install_mpifileutils.sh. You will install it to the __hpc_azure_resource_dir directory and use the same temporary build area construct as used for building the openmpi code.

Refinements:
- Disable mpifileutils in all non-Azure test playbooks so only tests_azure.yml installs it

Signed-off-by: Dave Chinner <dchinner@redhat.com>

MVAPICH is a high-performance MPI implementation optimised for InfiniBand
and other high-speed networks. Version 4.0 is built from source using the
same temporary directory pattern as the OpenMPI build.

The build uses ./configure with --enable-g=none --enable-fast=yes flags
matching the upstream azhpc-images build process, and installs to
/opt/mvapich-<version>.

When hpc_build_mpi_w_nvidia_gpu_support is enabled, the build additionally
passes --with-ucx and --with-cuda to configure so that MVAPICH is built
with GPU-aware MPI support using the same UCX and CUDA paths as OpenMPI.

An Lmod environment module is provided in Lua format, consistent with the
existing openmpi and hpcx modulefiles, allowing users to load MVAPICH via
'module load mpi/mvapich-4.0'. The module conflicts with other MPI modules
so only one can be loaded at a time. When GPU support is enabled, the
module also adds the UCX and CUDA library paths to LD_LIBRARY_PATH and
PATH, matching the openmpi-cuda module.

Changes:
- Add __hpc_mvapich_info to vars/RedHat_9.yml (version 4.0)
- Add __hpc_mvapich_install_dir to vars/main.yml
- Add hpc_install_mvapich default (true) to defaults/main.yml
- Add download, build, install, and modulefile tasks to tasks/mpi.yml
- Add mvapich-ver.lua.j2 Lmod modulefile template
- Disable hpc_install_mvapich in tests_default, tests_skip_toolkit, and tests_include_vars_from_parent
- Rename hpc_build_openmpi_w_nvidia_gpu_support to hpc_build_mpi_w_nvidia_gpu_support
  as the flag now guards GPU support for both OpenMPI and MVAPICH builds
- Conditionally pass --with-ucx and --with-cuda to MVAPICH configure when
  GPU support is enabled

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: add MVAPICH MPI library to the HPC system role. Use version 4.0 as per the reference versions.json, and the build instructions can be derived from components/install_mpis.sh. Ignore the other MPI libraries in that reference file. Add the lmod environment modules using the lua script format to needed to use the MVAPICH libraries similar to those installed by the system role for the openmpi library.

Refinements:
- Configure with --with-device=ch4:ucx to use libucx as the
  network transport instead of the built-in libfabric code.
- Add --with-ucx and --with-cuda configure flags guarded by
  hpc_build_mpi_w_nvidia_gpu_support for GPU-aware MPI support.
- Rename hpc_build_openmpi_w_nvidia_gpu_support to
  hpc_build_mpi_w_nvidia_gpu_support since it now applies to
  multiple MPI library builds.
- Add UCX and CUDA library/bin paths to the MVAPICH Lmod module
  when GPU support is enabled, matching the openmpi-cuda module.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

Separate the lmod environment module file installation from the MPI
library build tasks into standalone task blocks. This allows modulefile
changes to be deployed by re-running the playbook without triggering
a rebuild of the MPI libraries, which significantly speeds up the
iterative development and testing of lmod configuration changes.

The OpenMPI-based module files (PMIx, HPC-X, HPC-X+PMIx, OpenMPI,
and the no-GPU defaults helper) are grouped under a single block
gated by hpc_build_mpi_w_nvidia_gpu_support. The MVAPICH module file
has its own block gated by hpc_install_mvapich. Both blocks ensure
the target directories exist before installing files. The template
and copy modules are idempotent so these tasks are safe to run on
every playbook invocation.

Changes:
- Remove PMIx modulefile install from the PMIx build block
- Remove MPI module directory creation and HPC-X/OpenMPI/no-GPU helper
  installs from the GPU MPI build block
- Remove MVAPICH module directory creation and modulefile install from
  the MVAPICH build block
- Add new "Install OpenMPI-based lmod environment module files" block
- Add new "Install MVAPICH lmod environment module file" block

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: having to rebuild the mpi libraries to install and test changes to the lmod configuration takes a long time. extract the lmod configuration file installation from each of the MPI library installs, and implement a single task that installs all of the individual lmod files. trigger the installation of the files if any of the MPI libraries is rebuilt, or if the /usr/share/modulefiles/mpi is missing. install the individual files according to the installation parameters for each of the MPI libraries that already exist.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

MPI libraries built with CUDA/GPU acceleration use UCX-based transports
that cause warnings or failures on machines without GPUs. This adds
runtime GPU detection to the lmod environment modules so that when no
NVidia GPUs are present, the GPU transports are automatically disabled.

For OpenMPI-derived libraries (OpenMPI, HPC-X), a shared Jinja include
fragment (openmpi-no-gpu-defaults.lua.j2) checks for /dev/nvidia0 and
sets OMPI_MCA environment variables to exclude ucx, smcuda, ucc, cuda,
and hcoll transports. The fragment is inlined into each module file at
template rendering time via {% include %}.

For MVAPICH (when built with GPU support), the module refuses to load
on machines without GPUs. MVAPICH hard-codes HPC-X UCX library paths
into libmpi.so at build time so it cannot fall back to system UCX.
The module issues an LmodError directing users to an alternative MPI
module instead.

Changes:
- Add templates/openmpi-no-gpu-defaults.lua.j2 shared GPU detection fragment
- Add {% include %} to openmpi-ver-cuda12-gpu.lua.j2
- Add {% include %} to hpcx-ver.lua.j2
- Add {% include %} to hpcx-ver-pmix-ver.lua.j2
- Add LmodError to mvapich-ver.lua.j2 to refuse loading on non-GPU machines

Created-by-AI: Claude Opus 4.6 (1M context)

Prompt: new modification: the MPI libraries that are optimised for CUDA and GPU acceleration need different option sets to run on machines without GPUs. All the OpenMPI derived libraries require mpirun/mpiexec to have "--mca pml ^ucx --mca btl ^smcuda --mca osc ^ucx --mca coll ^ucc,cuda,hcoll" to turn off all the underlying UCX-based GPU accelerations. MVAPICH will require a different set of parameters as it passes environment and config variables in a different manner. These need to be set up in the lmod environment modules for each MPI library. If the system does not have any GPUs in it, they should set up the default mpirun/exec environment to use these "avoid using cuda/GPU transports" mechanisms automatically.

Refinements:
- Use Jinja {% include %} to inline GPU detection at deploy time
- MVAPICH refuses to load on non-GPU machines via LmodError because it
  hard-codes HPC-X UCX paths into libmpi.so at build time

Signed-off-by: Dave Chinner <dchinner@redhat.com>
@dgchinner dgchinner requested review from richm and spetrosi as code owners May 21, 2026 05:01
@coderabbitai

coderabbitai Bot commented May 21, 2026

📝 Walkthrough


This PR refactors the HPC Ansible role to support multiple MPI distributions (OpenMPI, MVAPICH, mpifileutils) with GPU-aware builds. It extracts the monolithic inline MPI build logic into a dedicated workflow, renames the GPU flag variable for consistency, and adds runtime GPU detection to modulefiles to automatically disable GPU-specific components when no NVIDIA GPU is present.

Changes

MPI Multi-Distribution Support and GPU Detection

| Layer / File(s) | Summary |
|---|---|
| Configuration defaults and source metadata (defaults/main.yml, vars/main.yml, vars/RedHat_9.yml, tests/tests_default.yml, tests/tests_include_vars_from_parent.yml, tests/tests_skip_toolkit.yml) | New configuration variable hpc_build_mpi_w_nvidia_gpu_support replaces hpc_build_openmpi_w_nvidia_gpu_support. New toggles hpc_install_mvapich and hpc_install_mpifileutils enable optional MPI distributions. Build dependencies and source download metadata added for MVAPICH v4.0 and mpifileutils v0.12. Install directories configured. All test playbooks updated to use new variable names. |
| GPU detection template and modulefile integration (templates/openmpi-no-gpu-defaults.lua.j2, templates/hpcx-ver.lua.j2, templates/hpcx-ver-pmix-ver.lua.j2, templates/openmpi-ver-cuda12-gpu.lua.j2) | New openmpi-no-gpu-defaults.lua.j2 template provides reusable GPU detection via /dev/nvidia0 and disables UCX/CUDA-related OpenMPI MCA components when no GPU is present. This template is included by all OpenMPI-based modulefiles (hpcx-ver, hpcx-ver-pmix-ver, openmpi-ver-cuda12-gpu) to provide consistent GPU-aware defaults. |
| MVAPICH modulefile with GPU detection (templates/mvapich-ver.lua.j2) | New MVAPICH environment module declares MPI module conflicts, includes GPU detection that fails module load if GPU is required but unavailable, conditionally prepends MVAPICH and UCX/CUDA paths to environment, and sets MPI_* variables for compatibility with MPI-aware tools. |
| MPI build task refactoring and workflow (tasks/main.yml, tasks/mpi.yml) | Removes inline MPI build logic from main.yml and precondition failure for incomplete GPU configuration. Replaces with single include_tasks: tasks/mpi.yml entry point. New mpi.yml implements complete MPI provisioning workflow: precondition validation for GPU/CUDA consistency and mpifileutils requirements, conditional system package installation, GPU-enabled builds of PMIx, GDRCopy (with systemd integration), HPC-X rebuild, mpifileutils, OpenMPI (with UCX/UCC/Hcoll/PMIx/CUDA), and MVAPICH (with optional UCX/CUDA). Generates lmod modulefiles for all MPI variants and PMIx using Jinja2 templates. |
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Format | ⚠️ Warning | PR description does not follow required template. Missing "Enhancement:" (or "Feature:"), "Reason:", and "Result:" sections required by .github/pull_request_template.md. | Reformat PR description to include: "Enhancement:" describing changes, "Reason:" explaining why needed, "Result:" describing outcomes, and optional "Issue Tracker Tickets (Jira or BZ if any):" section. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title follows Conventional Commits format with type 'feat' and a clear, descriptive summary of the main changes (adding new MPI utilities, libraries and functionality). |
| Description check | ✅ Passed | The pull request description provides comprehensive context covering the rationale, implementation details, results, and issue tracker reference, exceeding the basic template structure. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tasks/mpi.yml`:
- Around line 295-330: The OpenMPI tasks ("Get stat of openmpi path", "Download
and build OpenMPI") are incorrectly indented inside the hpc_install_mpifileutils
block; move these tasks out of the hpc_install_mpifileutils block and place them
into the hpc_build_mpi_w_nvidia_gpu_support block (i.e., dedent them one level)
so they run as part of hpc_build_mpi_w_nvidia_gpu_support; keep the same task
names, register variable (__hpc_openmpi_path_stat), use the existing
include_tasks (tasks/download_extract_package.yml) and looped configure/make
steps unchanged while removing them from inside hpc_install_mpifileutils.
- Around line 14-17: The current when clause combines three conditions as a list
(AND) so the failure task only runs when both hpc_install_cuda_toolkit and
hpc_install_nvidia_nccl are false; change the logic so the task triggers if
hpc_build_mpi_w_nvidia_gpu_support is true AND either dependency is missing by
replacing the list with a single boolean expression: use
hpc_build_mpi_w_nvidia_gpu_support and (not hpc_install_cuda_toolkit or not
hpc_install_nvidia_nccl) referencing the same variable names to ensure the fail
runs when either CUDA toolkit or NCCL is not installed.

In `@templates/openmpi-no-gpu-defaults.lua.j2`:
- Around line 1-8: This template is missing the required top-of-file headers;
update templates/openmpi-no-gpu-defaults.lua.j2 to prepend the two mandatory
header lines so they are the first lines in the file: add {{ ansible_managed |
comment }} as the very first header and immediately below it add {{
"system_role:hpc" | comment(prefix="", postfix="") }}, keeping the rest of the
file (the shared GPU-detection Lua fragment) unchanged so it remains includable
by other OpenMPI-derived module files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1f47850b-259e-4f76-858a-e36afbb48cf9

📥 Commits

Reviewing files that changed from the base of the PR and between 3def6ea and c0df118.

📒 Files selected for processing (13)
  • defaults/main.yml
  • tasks/main.yml
  • tasks/mpi.yml
  • templates/hpcx-ver-pmix-ver.lua.j2
  • templates/hpcx-ver.lua.j2
  • templates/mvapich-ver.lua.j2
  • templates/openmpi-no-gpu-defaults.lua.j2
  • templates/openmpi-ver-cuda12-gpu.lua.j2
  • tests/tests_default.yml
  • tests/tests_include_vars_from_parent.yml
  • tests/tests_skip_toolkit.yml
  • vars/RedHat_9.yml
  • vars/main.yml

Comment thread tasks/mpi.yml
Comment on lines +14 to +17
  when:
    - hpc_build_mpi_w_nvidia_gpu_support
    - not hpc_install_cuda_toolkit
    - not hpc_install_nvidia_nccl

@coderabbitai coderabbitai Bot May 21, 2026


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Incorrect condition logic allows invalid configurations to pass.

The when conditions are ANDed, so the fail task only triggers when both hpc_install_cuda_toolkit AND hpc_install_nvidia_nccl are false. If only one is missing (e.g., CUDA toolkit disabled but NCCL enabled), this check passes but the MPI build will fail later.

The condition should fail if either dependency is missing.

🐛 Proposed fix
   when:
     - hpc_build_mpi_w_nvidia_gpu_support
-    - not hpc_install_cuda_toolkit
-    - not hpc_install_nvidia_nccl
+    - not hpc_install_cuda_toolkit or not hpc_install_nvidia_nccl
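
The difference between the two conditions can be checked with a small truth table. The shell sketch below is only an illustration of the boolean logic, not Ansible code; the function names `fails_original` and `fails_proposed` are made up, and each argument is the string `true` or `false` standing in for the three role variables (GPU support, CUDA toolkit, NCCL).

```shell
#!/bin/sh
# Sketch: compare the original list-style condition (all items ANDed)
# with the proposed one. Arguments are the strings "true" or "false".
# Function names are illustrative only.

# Original: gpu AND (not cuda) AND (not nccl)
fails_original() {
    [ "$1" = true ] && [ "$2" = false ] && [ "$3" = false ]
}

# Proposed: gpu AND ((not cuda) OR (not nccl))
fails_proposed() {
    [ "$1" = true ] && { [ "$2" = false ] || [ "$3" = false ]; }
}

# Print whether each form of the check would fail the play.
for cuda in true false; do
    for nccl in true false; do
        if fails_original true "$cuda" "$nccl"; then o=fail; else o=pass; fi
        if fails_proposed true "$cuda" "$nccl"; then p=fail; else p=pass; fi
        echo "gpu=true cuda=$cuda nccl=$nccl original=$o proposed=$p"
    done
done
```

The two forms disagree only in the rows where exactly one of the two dependencies is disabled, which is the case the review comment describes.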

Collaborator


this @dgchinner sounds valid


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment thread tasks/mpi.yml
Comment on lines +295 to +330
    - name: Get stat of openmpi path
      stat:
        path: "{{ __hpc_openmpi_path }}"
      register: __hpc_openmpi_path_stat

    - name: Download and build OpenMPI
      when: not __hpc_openmpi_path_stat.stat.exists
      block:
        - name: Download {{ __hpc_openmpi_info.name }}
          include_tasks: tasks/download_extract_package.yml
          vars:
            __hpc_pkg_info: "{{ __hpc_openmpi_info }}"

        - name: Build {{ __hpc_openmpi_info.name }}
          command:
            cmd: "{{ item }}"
            chdir: "{{ __hpc_pkg_extracted.path }}"
          changed_when: true
          loop:
            - >-
              ./configure --prefix={{ __hpc_openmpi_path }}
              --with-ucx={{ __hpc_ucx_path }}
              --with-ucc={{ __hpc_ucc_path }}
              --with-hcoll={{ __hpc_hcoll_path }}
              --with-pmix={{ __hpc_pmix_path }}
              --enable-prte-prefix-by-default
              --with-platform=contrib/platform/mellanox/optimized
              --with-cuda={{ __hpc_cuda_path }}
            - make -j {{ ansible_facts["processor_nproc"] }}
            - make install

        - name: Remove extracted tarball
          file:
            path: "{{ __hpc_pkg_extracted.path }}"
            state: absent
          changed_when: false

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

OpenMPI build tasks incorrectly nested inside mpifileutils block.

The "Get stat of openmpi path" task and subsequent OpenMPI build tasks (lines 295-330) are indented inside the hpc_install_mpifileutils block. This means OpenMPI will only be built when mpifileutils is also requested, which breaks the intended behavior.

These tasks should be moved outside the mpifileutils block to be part of the hpc_build_mpi_w_nvidia_gpu_support block instead.

🐛 Proposed fix - dedent OpenMPI tasks

Move lines 295-330 outside the mpifileutils block by reducing their indentation by one level, placing them after line 236 (end of HPC-X build block) and before line 238 (mpifileutils block):

         - name: Remove extracted tarball
           file:
             path: "{{ __hpc_pkg_extracted.path }}"
             state: absent
           changed_when: false

+- name: Get stat of openmpi path
+  stat:
+    path: "{{ __hpc_openmpi_path }}"
+  register: __hpc_openmpi_path_stat
+
+- name: Download and build OpenMPI
+  when: not __hpc_openmpi_path_stat.stat.exists
+  block:
+    - name: Download {{ __hpc_openmpi_info.name }}
+      include_tasks: tasks/download_extract_package.yml
+      vars:
+        __hpc_pkg_info: "{{ __hpc_openmpi_info }}"
+
+    - name: Build {{ __hpc_openmpi_info.name }}
+      command:
+        cmd: "{{ item }}"
+        chdir: "{{ __hpc_pkg_extracted.path }}"
+      changed_when: true
+      loop:
+        - >-
+          ./configure --prefix={{ __hpc_openmpi_path }}
+          --with-ucx={{ __hpc_ucx_path }}
+          --with-ucc={{ __hpc_ucc_path }}
+          --with-hcoll={{ __hpc_hcoll_path }}
+          --with-pmix={{ __hpc_pmix_path }}
+          --enable-prte-prefix-by-default
+          --with-platform=contrib/platform/mellanox/optimized
+          --with-cuda={{ __hpc_cuda_path }}
+        - make -j {{ ansible_facts["processor_nproc"] }}
+        - make install
+
+    - name: Remove extracted tarball
+      file:
+        path: "{{ __hpc_pkg_extracted.path }}"
+        state: absent
+      changed_when: false
+
 - name: Download, build, and install mpifileutils
   when: hpc_install_mpifileutils
   block:
     - name: Get stat of mpifileutils install path

And remove lines 295-330 from inside the mpifileutils block.


Comment on lines +1 to +8
--
-- Shared GPU detection for OpenMPI-based MPI modules.
--
-- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
-- that cause warnings or failures on GPU-less machines. This is sourced
-- by each OpenMPI-derived module file via dofile().
--


@coderabbitai coderabbitai Bot May 21, 2026


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required template headers.

This template file is missing the mandatory headers required for all templates. As per coding guidelines, templates must include these headers in order:

  1. {{ ansible_managed | comment }}
  2. {{ "system_role:hpc" | comment(prefix="", postfix="") }}

Even though this template is designed to be included as a fragment by other templates, it still falls under templates/**/*.j2 and must comply with the header requirements.

🔧 Proposed fix
+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --
 -- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
 -- that cause warnings or failures on GPU-less machines. This is sourced
 -- by each OpenMPI-derived module file via dofile().
 --

Collaborator


This is valid


@spetrosi, thanks for confirming! You can apply the proposed fix to add the required headers to templates/openmpi-no-gpu-defaults.lua.j2:

+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --
 -- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
 -- that cause warnings or failures on GPU-less machines. This is sourced
 -- by each OpenMPI-derived module file via dofile().
 --

Note that decoration="-- " is used here to match Lua's comment syntax, keeping the output consistent with the surrounding Lua code in the template.
