feat: Add new MPI utilities, libraries and functionality #134
Move all MPI-related tasks out of tasks/main.yml into a dedicated
tasks/mpi.yml file for easier navigation and maintenance. This includes
the precondition checks, OpenMPI/HPC-X/PMIx/GDRCopy build and install,
etc. This is done in preparation for adding more MPI functionality.
The main.yml file now includes mpi.yml via include_tasks at the point
where the MPI blocks previously appeared (after RDMA packages, before
Docker).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
mpifileutils provides MPI-based file utilities for parallel file
operations including tools like dcp, drm, dsync, dfind, dwalk, dcmp, and
dtar. The package is built from source using cmake with HPC-X MPI,
matching the upstream azhpc-images build process.
The build uses the same temporary directory pattern as the OpenMPI
build: download and extract to a tempdir, build in a separate tempdir,
install to the __hpc_azure_resource_dir/mpifileutils directory, then
clean up both temp directories.
A parameter check is added to ensure HPC-X MPI is available before
attempting to build mpifileutils, since HPC-X provides the MPI compilers
required for the cmake build.
The package is only installed in Azure test environments
(tests_azure.yml). All other test playbooks explicitly disable it to
avoid requiring HPC-X MPI.
Changes:
- Add __hpc_mpifileutils_info to vars/RedHat_9.yml (version 0.12)
- Add __hpc_mpifileutils_build_dependencies and
  __hpc_mpifileutils_install_dir to vars/main.yml
- Add hpc_install_mpifileutils default (true) to defaults/main.yml
- Add parameter validation check requiring
  hpc_build_openmpi_w_nvidia_gpu_support
- Add download, build, and install tasks using tempdir pattern
- Add mpifileutils build deps to the build dependency cleanup task
- Disable mpifileutils in tests_default, tests_skip_toolkit, and
  tests_include_vars_from_parent
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: add mpifileutils package to the HPC system role. You will find the version to install in the versions.json file in the azhpc-images repository, and the way it needs to be built in components/install_mpifileutils.sh. You will install it to the __hpc_azure_resource_dir directory and use the same temporary build area construct as used for building the openmpi code.
Refinements:
- Disable mpifileutils in all non-Azure test playbooks so only
  tests_azure.yml installs it
Signed-off-by: Dave Chinner <dchinner@redhat.com>
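The two-tempdir pattern described in this commit can be sketched in plain shell. This is only an illustration under stated assumptions: `with_build_tempdirs` and `example_build_step` are hypothetical names, the real role performs these steps as Ansible tasks, and cleaning up after a failed build is a choice made for this sketch rather than something the commit message states.

```shell
# Sketch only: the "extract in one tempdir, build in another, clean up
# both" pattern. with_build_tempdirs is a hypothetical helper, not part
# of the role.
with_build_tempdirs() {
    src_dir=$(mktemp -d) || return 1      # sources are extracted here
    build_dir=$(mktemp -d) || { rm -rf "$src_dir"; return 1; }

    # Run the caller's build steps, passing both directories as the
    # last two arguments, and remember whether they succeeded.
    if "$@" "$src_dir" "$build_dir"; then
        status=0
    else
        status=$?
    fi

    # Remove both temp directories regardless of the build result.
    rm -rf "$src_dir" "$build_dir"
    return "$status"
}

# Stand-in build step that only reports what a real one would do.
example_build_step() {
    echo "extract sources into: $1"
    echo "run cmake and make in: $2"
}

with_build_tempdirs example_build_step
```

A real build step would replace `example_build_step` with the download, cmake, and install commands; those are deliberately left out here because the exact commands live in the upstream install script.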
MVAPICH is a high-performance MPI implementation optimised for
InfiniBand and other high-speed networks. Version 4.0 is built from
source using the same temporary directory pattern as the OpenMPI build.
The build uses ./configure with --enable-g=none --enable-fast=yes flags
matching the upstream azhpc-images build process, and installs to
/opt/mvapich-<version>.
When hpc_build_mpi_w_nvidia_gpu_support is enabled, the build
additionally passes --with-ucx and --with-cuda to configure so that
MVAPICH is built with GPU-aware MPI support using the same UCX and CUDA
paths as OpenMPI.
An Lmod environment module is provided in lua format, consistent with
the existing openmpi and hpcx modulefiles, allowing users to load
MVAPICH via 'module load mpi/mvapich-4.0'. The module conflicts with
other MPI modules so only one can be loaded at a time. When GPU support
is enabled, the module also adds the UCX and CUDA library paths to
LD_LIBRARY_PATH and PATH, matching the openmpi-cuda module.
Changes:
- Add __hpc_mvapich_info to vars/RedHat_9.yml (version 4.0)
- Add __hpc_mvapich_install_dir to vars/main.yml
- Add hpc_install_mvapich default (true) to defaults/main.yml
- Add download, build, install, and modulefile tasks to tasks/mpi.yml
- Add mvapich-ver.lua.j2 Lmod modulefile template
- Disable hpc_install_mvapich in tests_default, tests_skip_toolkit, and
  tests_include_vars_from_parent
- Rename hpc_build_openmpi_w_nvidia_gpu_support to
  hpc_build_mpi_w_nvidia_gpu_support as the flag now guards GPU support
  for both OpenMPI and MVAPICH builds
- Conditionally pass --with-ucx and --with-cuda to MVAPICH configure
  when GPU support is enabled
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: add MVAPICH MPI library to the HPC system role. Use version 4.0 as per the reference versions.json, and the build instructions can be derived from components/install_mpis.sh. Ignore the other MPI libraries in that reference file. Add the lmod environment modules using the lua script format needed to use the MVAPICH libraries, similar to those installed by the system role for the openmpi library.
Refinements:
- configure with --with-device=ch4:ucx to use libucx as the network
  transport instead of the built-in libfabric code.
- Add --with-ucx and --with-cuda configure flags guarded by
  hpc_build_mpi_w_nvidia_gpu_support for GPU-aware MPI support.
- Rename hpc_build_openmpi_w_nvidia_gpu_support to
  hpc_build_mpi_w_nvidia_gpu_support since it now applies to multiple
  MPI library builds.
- Add UCX and CUDA library/bin paths to the MVAPICH Lmod module when GPU
  support is enabled, matching the openmpi-cuda module.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
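As a rough illustration of how the configure flags named in this commit fit together, the shell sketch below assembles the argument list. Only the flag names and the /opt/mvapich-<version> prefix come from the commit message. The function name `mvapich_configure_args` and the UCX and CUDA path arguments are hypothetical, and treating --with-device=ch4:ucx as unconditional is an assumption based on how the refinement is worded.

```shell
# Sketch only: build the MVAPICH ./configure argument string.
# mvapich_configure_args and its path arguments are illustrative
# assumptions; the role itself drives the build from tasks/mpi.yml.
mvapich_configure_args() {
    version="$1"       # e.g. 4.0
    gpu_support="$2"   # "true" mirrors hpc_build_mpi_w_nvidia_gpu_support
    ucx_path="$3"      # hypothetical UCX install prefix
    cuda_path="$4"     # hypothetical CUDA install prefix

    # Flags applied to every build (ch4:ucx assumed unconditional).
    args="--prefix=/opt/mvapich-${version} --enable-g=none --enable-fast=yes --with-device=ch4:ucx"

    # GPU-aware builds additionally point configure at UCX and CUDA.
    if [ "$gpu_support" = "true" ]; then
        args="${args} --with-ucx=${ucx_path} --with-cuda=${cuda_path}"
    fi
    printf '%s\n' "$args"
}

mvapich_configure_args 4.0 false "" ""
```

Calling the function with `true` and two prefixes appends the --with-ucx and --with-cuda options to the same base argument list.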
Separate the lmod environment module file installation from the MPI
library build tasks into standalone task blocks. This allows modulefile
changes to be deployed by re-running the playbook without triggering a
rebuild of the MPI libraries, which significantly speeds up the
iterative development and testing of lmod configuration changes.
The OpenMPI-based module files (PMIx, HPC-X, HPC-X+PMIx, OpenMPI, and
the no-GPU defaults helper) are grouped under a single block gated by
hpc_build_mpi_w_nvidia_gpu_support. The MVAPICH module file has its own
block gated by hpc_install_mvapich. Both blocks ensure the target
directories exist before installing files. The template and copy modules
are idempotent so these tasks are safe to run on every playbook
invocation.
Changes:
- Remove PMIx modulefile install from the PMIx build block
- Remove MPI module directory creation and HPC-X/OpenMPI/no-GPU helper
  installs from the GPU MPI build block
- Remove MVAPICH module directory creation and modulefile install from
  the MVAPICH build block
- Add new "Install OpenMPI-based lmod environment module files" block
- Add new "Install MVAPICH lmod environment module file" block
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: having to rebuild the mpi libraries to install and test changes to the lmod configuration takes a long time. extract the lmod configuration file installation from each of the MPI library installs, and implement a single task that installs all of the individual lmod files. trigger the installation of the files if any of the MPI libraries is rebuilt, or if the /usr/share/modulefiles/mpi is missing. install the individual files according to the installation parameters for each of the MPI libraries that already exist.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
MPI libraries built with CUDA/GPU acceleration use UCX-based transports
that cause warnings or failures on machines without GPUs. This adds
runtime GPU detection to the lmod environment modules so that when no
NVidia GPUs are present, the GPU transports are automatically disabled.
For OpenMPI-derived libraries (OpenMPI, HPC-X), a shared Jinja include
fragment (openmpi-no-gpu-defaults.lua.j2) checks for /dev/nvidia0 and
sets OMPI_MCA environment variables to exclude ucx, smcuda, ucc, cuda,
and hcoll transports. The fragment is inlined into each module file at
template rendering time via {% include %}.
For MVAPICH (when built with GPU support), the module refuses to load
on machines without GPUs. MVAPICH hard-codes HPC-X UCX library paths
into libmpi.so at build time so it cannot fall back to system UCX.
The module issues an LmodError directing users to an alternative MPI
module instead.
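The effect of the OpenMPI no-GPU defaults can be approximated outside Lmod with a small shell sketch. This is not the Lua fragment itself: the function name `ompi_no_gpu_defaults` and its optional device-path argument are assumptions made for illustration. The values correspond to the --mca options quoted in this commit's prompt, relying on Open MPI's convention that an MCA parameter given as `--mca <name> <value>` can also be supplied through the environment variable `OMPI_MCA_<name>`.

```shell
# Sketch only: shell equivalent of the no-GPU defaults for OpenMPI-derived
# modules. The function name and the overridable device path are
# hypothetical; the role implements this inside the Lmod modulefiles.
ompi_no_gpu_defaults() {
    gpu_dev="${1:-/dev/nvidia0}"   # device node used as the GPU presence test

    if [ ! -e "$gpu_dev" ]; then
        # No GPU device node: exclude the UCX/CUDA-based components.
        export OMPI_MCA_pml='^ucx'
        export OMPI_MCA_btl='^smcuda'
        export OMPI_MCA_osc='^ucx'
        export OMPI_MCA_coll='^ucc,cuda,hcoll'
    fi
}

ompi_no_gpu_defaults
env | grep '^OMPI_MCA_' || echo "GPU device present, no overrides set"
```

The function has to be called in the current shell (not in a subshell) for the exported variables to be visible to a later mpirun.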
Changes:
- Add templates/openmpi-no-gpu-defaults.lua.j2 shared GPU detection fragment
- Add {% include %} to openmpi-ver-cuda12-gpu.lua.j2
- Add {% include %} to hpcx-ver.lua.j2
- Add {% include %} to hpcx-ver-pmix-ver.lua.j2
- Add LmodError to mvapich-ver.lua.j2 to refuse loading on non-GPU machines
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: the MPI libraries that are optimised for CUDA and GPU acceleration need different option sets to run on machines without GPUs. All the OpenMPI derived libraries require mpirun/mpiexec to have "--mca pml ^ucx --mca btl ^smcuda --mca osc ^ucx --mca coll ^ucc,cuda,hcoll" to turn off all the underlying UCX-based GPU accelerations. MVAPICH will require a different set of parameters as it passes environment and config variables in a different manner. These need to be set up in the lmod environment modules for each MPI library. If the system does not have any GPUs in it, they should set up the default mpirun/exec environment to use these "avoid using cuda/GPU transports" mechanisms automatically.
Refinements:
- Use Jinja {% include %} to inline GPU detection at deploy time
- MVAPICH refuses to load on non-GPU machines via LmodError because it
hard-codes HPC-X UCX paths into libmpi.so at build time
Signed-off-by: Dave Chinner <dchinner@redhat.com>
📝 Walkthrough
This PR refactors the HPC Ansible role to support multiple MPI components (OpenMPI, MVAPICH, and the mpifileutils utilities) with GPU-aware builds. It extracts the monolithic inline MPI build logic into a dedicated task file (tasks/mpi.yml), renames the GPU flag variable for consistency, and adds runtime GPU detection to modulefiles to automatically disable GPU-specific components when no NVIDIA GPU is present.
Changes
MPI Multi-Distribution Support and GPU Detection
🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tasks/mpi.yml`:
- Around line 295-330: The OpenMPI tasks ("Get stat of openmpi path", "Download
and build OpenMPI") are incorrectly indented inside the hpc_install_mpifileutils
block; move these tasks out of the hpc_install_mpifileutils block and place them
into the hpc_build_mpi_w_nvidia_gpu_support block (i.e., dedent them one level)
so they run as part of hpc_build_mpi_w_nvidia_gpu_support; keep the same task
names, register variable (__hpc_openmpi_path_stat), use the existing
include_tasks (tasks/download_extract_package.yml) and looped configure/make
steps unchanged while removing them from inside hpc_install_mpifileutils.
- Around line 14-17: The current when clause combines three conditions as a list
(AND) so the failure task only runs when both hpc_install_cuda_toolkit and
hpc_install_nvidia_nccl are false; change the logic so the task triggers if
hpc_build_mpi_w_nvidia_gpu_support is true AND either dependency is missing by
replacing the list with a single boolean expression: use
hpc_build_mpi_w_nvidia_gpu_support and (not hpc_install_cuda_toolkit or not
hpc_install_nvidia_nccl) referencing the same variable names to ensure the fail
runs when either CUDA toolkit or NCCL is not installed.
In `@templates/openmpi-no-gpu-defaults.lua.j2`:
- Around line 1-8: This template is missing the required top-of-file headers;
update templates/openmpi-no-gpu-defaults.lua.j2 to prepend the two mandatory
header lines so they are the first lines in the file: add {{ ansible_managed |
comment }} as the very first header and immediately below it add {{
"system_role:hpc" | comment(prefix="", postfix="") }}, keeping the rest of the
file (the shared GPU-detection Lua fragment) unchanged so it remains includable
by other OpenMPI-derived module files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1f47850b-259e-4f76-858a-e36afbb48cf9
📒 Files selected for processing (13)
defaults/main.yml
tasks/main.yml
tasks/mpi.yml
templates/hpcx-ver-pmix-ver.lua.j2
templates/hpcx-ver.lua.j2
templates/mvapich-ver.lua.j2
templates/openmpi-no-gpu-defaults.lua.j2
templates/openmpi-ver-cuda12-gpu.lua.j2
tests/tests_default.yml
tests/tests_include_vars_from_parent.yml
tests/tests_skip_toolkit.yml
vars/RedHat_9.yml
vars/main.yml
when:
  - hpc_build_mpi_w_nvidia_gpu_support
  - not hpc_install_cuda_toolkit
  - not hpc_install_nvidia_nccl
Incorrect condition logic allows invalid configurations to pass.
The when conditions are ANDed, so the fail task only triggers when both hpc_install_cuda_toolkit AND hpc_install_nvidia_nccl are false. If only one is missing (e.g., CUDA toolkit disabled but NCCL enabled), this check passes but the MPI build will fail later.
The condition should fail if either dependency is missing.
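The difference between the two conditions is easiest to see by enumerating the inputs. The shell sketch below models the list form and the proposed form as predicates. The helper names `old_check_fails` and `new_check_fails` are hypothetical and not part of the role; each takes the three variables as "true" or "false" and succeeds when the fail task would fire.

```shell
# Sketch only: model the two `when` conditions as shell predicates.
# Arguments: gpu_support cuda_toolkit nccl, each "true" or "false".
old_check_fails() {
    # List form: every entry in the list must hold (logical AND).
    [ "$1" = "true" ] && [ "$2" = "false" ] && [ "$3" = "false" ]
}

new_check_fails() {
    # Proposed form: GPU support requested and either dependency missing.
    [ "$1" = "true" ] && { [ "$2" = "false" ] || [ "$3" = "false" ]; }
}

# One row per dependency combination, with GPU support requested.
for cuda in true false; do
    for nccl in true false; do
        old=no
        new=no
        if old_check_fails true "$cuda" "$nccl"; then old=yes; fi
        if new_check_fails true "$cuda" "$nccl"; then new=yes; fi
        echo "cuda=$cuda nccl=$nccl old_fails=$old new_fails=$new"
    done
done
```

Under the list form only the row with both dependencies disabled triggers the failure, while the proposed form also catches the two rows where exactly one dependency is disabled.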
🐛 Proposed fix
 when:
   - hpc_build_mpi_w_nvidia_gpu_support
-  - not hpc_install_cuda_toolkit
-  - not hpc_install_nvidia_nccl
+  - not hpc_install_cuda_toolkit or not hpc_install_nvidia_nccl
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tasks/mpi.yml` around lines 14 - 17, The current when clause combines three
conditions as a list (AND) so the failure task only runs when both
hpc_install_cuda_toolkit and hpc_install_nvidia_nccl are false; change the logic
so the task triggers if hpc_build_mpi_w_nvidia_gpu_support is true AND either
dependency is missing by replacing the list with a single boolean expression:
use hpc_build_mpi_w_nvidia_gpu_support and (not hpc_install_cuda_toolkit or not
hpc_install_nvidia_nccl) referencing the same variable names to ensure the fail
runs when either CUDA toolkit or NCCL is not installed.
- name: Get stat of openmpi path
  stat:
    path: "{{ __hpc_openmpi_path }}"
  register: __hpc_openmpi_path_stat

- name: Download and build OpenMPI
  when: not __hpc_openmpi_path_stat.stat.exists
  block:
    - name: Download {{ __hpc_openmpi_info.name }}
      include_tasks: tasks/download_extract_package.yml
      vars:
        __hpc_pkg_info: "{{ __hpc_openmpi_info }}"

    - name: Build {{ __hpc_openmpi_info.name }}
      command:
        cmd: "{{ item }}"
        chdir: "{{ __hpc_pkg_extracted.path }}"
      changed_when: true
      loop:
        - >-
          ./configure --prefix={{ __hpc_openmpi_path }}
          --with-ucx={{ __hpc_ucx_path }}
          --with-ucc={{ __hpc_ucc_path }}
          --with-hcoll={{ __hpc_hcoll_path }}
          --with-pmix={{ __hpc_pmix_path }}
          --enable-prte-prefix-by-default
          --with-platform=contrib/platform/mellanox/optimized
          --with-cuda={{ __hpc_cuda_path }}
        - make -j {{ ansible_facts["processor_nproc"] }}
        - make install

    - name: Remove extracted tarball
      file:
        path: "{{ __hpc_pkg_extracted.path }}"
        state: absent
      changed_when: false
OpenMPI build tasks incorrectly nested inside mpifileutils block.
The "Get stat of openmpi path" task and subsequent OpenMPI build tasks (lines 295-330) are indented inside the hpc_install_mpifileutils block. This means OpenMPI will only be built when mpifileutils is also requested, which breaks the intended behavior.
These tasks should be moved outside the mpifileutils block to be part of the hpc_build_mpi_w_nvidia_gpu_support block instead.
🐛 Proposed fix - dedent OpenMPI tasks
Move lines 295-330 outside the mpifileutils block by reducing their indentation by one level, placing them after line 236 (end of HPC-X build block) and before line 238 (mpifileutils block):
     - name: Remove extracted tarball
       file:
         path: "{{ __hpc_pkg_extracted.path }}"
         state: absent
       changed_when: false

+- name: Get stat of openmpi path
+  stat:
+    path: "{{ __hpc_openmpi_path }}"
+  register: __hpc_openmpi_path_stat
+
+- name: Download and build OpenMPI
+  when: not __hpc_openmpi_path_stat.stat.exists
+  block:
+    - name: Download {{ __hpc_openmpi_info.name }}
+      include_tasks: tasks/download_extract_package.yml
+      vars:
+        __hpc_pkg_info: "{{ __hpc_openmpi_info }}"
+
+    - name: Build {{ __hpc_openmpi_info.name }}
+      command:
+        cmd: "{{ item }}"
+        chdir: "{{ __hpc_pkg_extracted.path }}"
+      changed_when: true
+      loop:
+        - >-
+          ./configure --prefix={{ __hpc_openmpi_path }}
+          --with-ucx={{ __hpc_ucx_path }}
+          --with-ucc={{ __hpc_ucc_path }}
+          --with-hcoll={{ __hpc_hcoll_path }}
+          --with-pmix={{ __hpc_pmix_path }}
+          --enable-prte-prefix-by-default
+          --with-platform=contrib/platform/mellanox/optimized
+          --with-cuda={{ __hpc_cuda_path }}
+        - make -j {{ ansible_facts["processor_nproc"] }}
+        - make install
+
+    - name: Remove extracted tarball
+      file:
+        path: "{{ __hpc_pkg_extracted.path }}"
+        state: absent
+      changed_when: false
+
 - name: Download, build, and install mpifileutils
   when: hpc_install_mpifileutils
   block:
     - name: Get stat of mpifileutils install path
And remove lines 295-330 from inside the mpifileutils block.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tasks/mpi.yml` around lines 295 - 330, The OpenMPI tasks ("Get stat of
openmpi path", "Download and build OpenMPI") are incorrectly indented inside the
hpc_install_mpifileutils block; move these tasks out of the
hpc_install_mpifileutils block and place them into the
hpc_build_mpi_w_nvidia_gpu_support block (i.e., dedent them one level) so they
run as part of hpc_build_mpi_w_nvidia_gpu_support; keep the same task names,
register variable (__hpc_openmpi_path_stat), use the existing include_tasks
(tasks/download_extract_package.yml) and looped configure/make steps unchanged
while removing them from inside hpc_install_mpifileutils.
--
-- Shared GPU detection for OpenMPI-based MPI modules.
--
-- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
-- that cause warnings or failures on GPU-less machines. This is sourced
-- by each OpenMPI-derived module file via dofile().
--
Add required template headers.
This template file is missing the mandatory headers required for all templates. As per coding guidelines, templates must include these headers in order:
{{ ansible_managed | comment }}
{{ "system_role:hpc" | comment(prefix="", postfix="") }}
Even though this template is designed to be included as a fragment by other templates, it still falls under templates/**/*.j2 and must comply with the header requirements.
🔧 Proposed fix
+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
---
--- Shared GPU detection for OpenMPI-based MPI modules.
---
--- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
--- that cause warnings or failures on GPU-less machines. This is sourced
--- by each OpenMPI-derived module file via dofile().
---
+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
+--
+-- Shared GPU detection for OpenMPI-based MPI modules.
+--
+-- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
+-- that cause warnings or failures on GPU-less machines. This is sourced
+-- by each OpenMPI-derived module file via dofile().
+--
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@templates/openmpi-no-gpu-defaults.lua.j2` around lines 1 - 8, This template
is missing the required top-of-file headers; update
templates/openmpi-no-gpu-defaults.lua.j2 to prepend the two mandatory header
lines so they are the first lines in the file: add {{ ansible_managed | comment
}} as the very first header and immediately below it add {{ "system_role:hpc" |
comment(prefix="", postfix="") }}, keeping the rest of the file (the shared
GPU-detection Lua fragment) unchanged so it remains includable by other
OpenMPI-derived module files.
@spetrosi, thanks for confirming! You can apply the proposed fix to add the required headers to templates/openmpi-no-gpu-defaults.lua.j2:
+{{ ansible_managed | comment(decoration="-- ") }}
+{{ "system_role:hpc" | comment(decoration="-- ", prefix="", postfix="") }}
 --
 -- Shared GPU detection for OpenMPI-based MPI modules.
 --
 -- When no NVidia GPUs are present, disable the UCX/CUDA-based transports
 -- that cause warnings or failures on GPU-less machines. This is sourced
 -- by each OpenMPI-derived module file via dofile().
 --
Note that decoration="-- " is used here to match Lua's comment syntax, keeping the output consistent with the surrounding Lua code in the template.
This series:
The currently supported MPI libraries are:
$ ml -t spider mpi/
mpi/hpcx-2.24.1-pmix-4.2.9
mpi/hpcx-2.24.1
mpi/mvapich-4.0
mpi/openmpi-x86_64
mpi/openmpi-5.0.8-cuda12-gpu
$
MVAPICH cannot run on non-GPU machines due to its built-in UCX library. OpenMPI uses a modular transport infrastructure, so the CUDA/GPU modules can be turned off and not loaded. Hence on a non-GPU machine:
$ ml mpi/mvapich-4.0
Error: MVAPICH 4.0 was built with CUDA/UCX support and requires NVidia GPUs. This machine has no GPUs. Use a different MPI module (e.g. hpcx or openmpi).
$ ml mpi/openmpi-5.0.8-cuda12-gpu
$ env |grep OMPI_MCA
OMPI_MCA_osc=^ucx
OMPI_MCA_btl=^smcuda
OMPI_MCA_pml=^ucx
OMPI_MCA_coll=^ucc,cuda,hcoll
$
The MVAPICH environment refuses to load, whilst the OpenMPI modules turn off all the CUDA modules and the UCX transport functionality that requires CUDA and/or GPU support. Hence the OpenMPI modules now work on machines with and without GPUs without the user having to do anything special.
Issue Tracker Tickets (Jira or BZ if any): https://redhat.atlassian.net/browse/RHELHPC-105