test: Add MPI library test script #135
Move all MPI-related tasks out of tasks/main.yml into a dedicated tasks/mpi.yml file for easier navigation and maintenance. This includes the precondition checks, OpenMPI/HPC-X/PMIx/GDRCopy build and install, etc. This is done in preparation for adding more MPI functionality.
The main.yml file now includes mpi.yml via include_tasks at the point where the MPI blocks previously appeared (after RDMA packages, before Docker).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
mpifileutils provides MPI-based file utilities for parallel file operations including tools like dcp, drm, dsync, dfind, dwalk, dcmp, and dtar. The package is built from source using cmake with HPC-X MPI, matching the upstream azhpc-images build process.
The build uses the same temporary directory pattern as the OpenMPI build: download and extract to a tempdir, build in a separate tempdir, install to the __hpc_azure_resource_dir/mpifileutils directory, then clean up both temp directories.
A parameter check is added to ensure HPC-X MPI is available before attempting to build mpifileutils, since HPC-X provides the MPI compilers required for the cmake build.
The package is only installed in Azure test environments (tests_azure.yml). All other test playbooks explicitly disable it to avoid requiring HPC-X MPI.
Changes:
- Add __hpc_mpifileutils_info to vars/RedHat_9.yml (version 0.12)
- Add __hpc_mpifileutils_build_dependencies and __hpc_mpifileutils_install_dir to vars/main.yml
- Add hpc_install_mpifileutils default (true) to defaults/main.yml
- Add parameter validation check requiring hpc_build_openmpi_w_nvidia_gpu_support
- Add download, build, and install tasks using tempdir pattern
- Add mpifileutils build deps to the build dependency cleanup task
- Disable mpifileutils in tests_default, tests_skip_toolkit, and tests_include_vars_from_parent
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: add mpifileutils package to the HPC system role. You will find the version to install in the versions.json file in the azhpc-images repository, and the way it needs to be built in components/install_mpifileutils.sh. You will install it to the __hpc_azure_resource_dir directory and use the same temporary build area construct as used for building the openmpi code.
Refinements:
- Disable mpifileutils in all non-Azure test playbooks so only tests_azure.yml installs it
Signed-off-by: Dave Chinner <dchinner@redhat.com>
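The two-tempdir pattern described in this commit can be sketched in plain shell. This is only an illustration under stated assumptions: the role implements the pattern as Ansible tasks, the function name build_with_tempdirs is hypothetical, and the download and cmake steps are replaced by placeholder file copies.

```shell
#!/bin/sh
# Sketch of the "extract in one tempdir, build in another, install, clean up"
# pattern. build_with_tempdirs is a hypothetical name, not part of the role.
build_with_tempdirs() {
    install_dir=$1   # stands in for __hpc_azure_resource_dir/mpifileutils

    # Separate scratch areas for the extracted source and the build tree.
    src_dir=$(mktemp -d) || return 1
    build_dir=$(mktemp -d) || { rm -rf "$src_dir"; return 1; }

    # Placeholder for "download and extract the source tarball".
    echo "source" > "$src_dir/README"

    # Placeholder for the out-of-tree cmake configure and build step.
    cp "$src_dir/README" "$build_dir/artifact"

    # Install into the final location, remembering whether it worked.
    mkdir -p "$install_dir" && cp "$build_dir/artifact" "$install_dir/"
    rc=$?

    # Both temporary directories are removed whether or not install succeeded.
    rm -rf "$src_dir" "$build_dir"
    return $rc
}
```

Keeping the source and build trees apart means a failed build never leaves partial output in the install directory, and cleanup is a single rm of two known paths.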
MVAPICH is a high-performance MPI implementation optimised for InfiniBand and other high-speed networks. Version 4.0 is built from source using the same temporary directory pattern as the OpenMPI build.
The build uses ./configure with --enable-g=none --enable-fast=yes flags matching the upstream azhpc-images build process, and installs to /opt/mvapich-<version>. When hpc_build_mpi_w_nvidia_gpu_support is enabled, the build additionally passes --with-ucx and --with-cuda to configure so that MVAPICH is built with GPU-aware MPI support using the same UCX and CUDA paths as OpenMPI.
An Lmod environment module is provided in lua format, consistent with the existing openmpi and hpcx modulefiles, allowing users to load MVAPICH via 'module load mpi/mvapich-4.0'. The module conflicts with other MPI modules so only one can be loaded at a time. When GPU support is enabled, the module also adds the UCX and CUDA library paths to LD_LIBRARY_PATH and PATH, matching the openmpi-cuda module.
Changes:
- Add __hpc_mvapich_info to vars/RedHat_9.yml (version 4.0)
- Add __hpc_mvapich_install_dir to vars/main.yml
- Add hpc_install_mvapich default (true) to defaults/main.yml
- Add download, build, install, and modulefile tasks to tasks/mpi.yml
- Add mvapich-ver.lua.j2 Lmod modulefile template
- Disable hpc_install_mvapich in tests_default, tests_skip_toolkit, and tests_include_vars_from_parent
- Rename hpc_build_openmpi_w_nvidia_gpu_support to hpc_build_mpi_w_nvidia_gpu_support as the flag now guards GPU support for both OpenMPI and MVAPICH builds
- Conditionally pass --with-ucx and --with-cuda to MVAPICH configure when GPU support is enabled
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: add MVAPICH MPI library to the HPC system role. Use version 4.0 as per the reference versions.json, and the build instructions can be derived from components/install_mpis.sh. Ignore the other MPI libraries in that reference file. Add the lmod environment modules using the lua script format needed to use the MVAPICH libraries similar to those installed by the system role for the openmpi library.
Refinements:
- configure with --with-device=ch4:ucx to use libucx as the network transport instead of the built-in libfabric code.
- Add --with-ucx and --with-cuda configure flags guarded by hpc_build_mpi_w_nvidia_gpu_support for GPU-aware MPI support.
- Rename hpc_build_openmpi_w_nvidia_gpu_support to hpc_build_mpi_w_nvidia_gpu_support since it now applies to multiple MPI library builds.
- Add UCX and CUDA library/bin paths to the MVAPICH Lmod module when GPU support is enabled, matching the openmpi-cuda module.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Separate the lmod environment module file installation from the MPI library build tasks into standalone task blocks. This allows modulefile changes to be deployed by re-running the playbook without triggering a rebuild of the MPI libraries, which significantly speeds up the iterative development and testing of lmod configuration changes.
The OpenMPI-based module files (PMIx, HPC-X, HPC-X+PMIx, OpenMPI, and the no-GPU defaults helper) are grouped under a single block gated by hpc_build_mpi_w_nvidia_gpu_support. The MVAPICH module file has its own block gated by hpc_install_mvapich. Both blocks ensure the target directories exist before installing files.
The template and copy modules are idempotent so these tasks are safe to run on every playbook invocation.
Changes:
- Remove PMIx modulefile install from the PMIx build block
- Remove MPI module directory creation and HPC-X/OpenMPI/no-GPU helper installs from the GPU MPI build block
- Remove MVAPICH module directory creation and modulefile install from the MVAPICH build block
- Add new "Install OpenMPI-based lmod environment module files" block
- Add new "Install MVAPICH lmod environment module file" block
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: having to rebuild the mpi libraries to install and test changes to the lmod configuration takes a long time. extract the lmod configuration file installation from each of the MPI library installs, and implement a single task that installs all of the individual lmod files. trigger the installation of the files if any of the MPI libraries is rebuilt, or if the /usr/share/modulefiles/mpi is missing. install the individual files according to the installation parameters for each of the MPI libraries that already exist.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
MPI libraries built with CUDA/GPU acceleration use UCX-based transports
that cause warnings or failures on machines without GPUs. This adds
runtime GPU detection to the lmod environment modules so that when no
NVidia GPUs are present, the GPU transports are automatically disabled.
For OpenMPI-derived libraries (OpenMPI, HPC-X), a shared Jinja include
fragment (openmpi-no-gpu-defaults.lua.j2) checks for /dev/nvidia0 and
sets OMPI_MCA environment variables to exclude ucx, smcuda, ucc, cuda,
and hcoll transports. The fragment is inlined into each module file at
template rendering time via {% include %}.
For MVAPICH (when built with GPU support), the module refuses to load
on machines without GPUs. MVAPICH hard-codes HPC-X UCX library paths
into libmpi.so at build time so it cannot fall back to system UCX.
The module issues an LmodError directing users to an alternative MPI
module instead.
Changes:
- Add templates/openmpi-no-gpu-defaults.lua.j2 shared GPU detection fragment
- Add {% include %} to openmpi-ver-cuda12-gpu.lua.j2
- Add {% include %} to hpcx-ver.lua.j2
- Add {% include %} to hpcx-ver-pmix-ver.lua.j2
- Add LmodError to mvapich-ver.lua.j2 to refuse loading on non-GPU machines
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: the MPI libraries that are optimised for CUDA and GPU acceleration need different option sets to run on machines without GPUs. All the OpenMPI derived libraries require mpirun/mpiexec to have "--mca pml ^ucx --mca btl ^smcuda --mca osc ^ucx --mca coll ^ucc,cuda,hcoll" to turn off all the underlying UCX-based GPU accelerations. MVAPICH will require a different set of parameters as it passes environment and config variables in a different manner. These need to be set up in the lmod environment modules for each MPI library. If the system does not have any GPUs in it, they should set up the default mpirun/exec environment to use these "avoid using cuda/GPU transports" mechanisms automatically.
Refinements:
- Use Jinja {% include %} to inline GPU detection at deploy time
- MVAPICH refuses to load on non-GPU machines via LmodError because it
hard-codes HPC-X UCX paths into libmpi.so at build time
Signed-off-by: Dave Chinner <dchinner@redhat.com>
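As an illustration of the no-GPU defaults described in this commit, here is a minimal shell sketch of the same decision. It is not the modulefile itself (the role does this in Lua via openmpi-no-gpu-defaults.lua.j2); the function name and the optional device-path argument are hypothetical. It relies on Open MPI's convention that an OMPI_MCA_<name> environment variable is equivalent to passing --mca <name> on the mpirun command line, and the values are the ones listed in the prompt above.

```shell
#!/bin/sh
# Sketch only: set_ompi_no_gpu_defaults and its argument are hypothetical.
# The real logic lives in the Lmod modulefile fragment, written in Lua.
set_ompi_no_gpu_defaults() {
    gpu_dev=${1:-/dev/nvidia0}   # device node checked by the commit

    # A GPU device node exists: leave the GPU transports enabled.
    if [ -e "$gpu_dev" ]; then
        return 0
    fi

    # No GPU: exclude the UCX/CUDA-based components, equivalent to
    #   --mca pml ^ucx --mca btl ^smcuda --mca osc ^ucx --mca coll ^ucc,cuda,hcoll
    export OMPI_MCA_pml='^ucx'
    export OMPI_MCA_btl='^smcuda'
    export OMPI_MCA_osc='^ucx'
    export OMPI_MCA_coll='^ucc,cuda,hcoll'
}
```

Setting the exclusions through the environment rather than through mpirun arguments means users do not have to change their launch commands when moving between GPU and non-GPU machines.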
Add the OSU Micro-Benchmarks (OMB) package to the system role as an MPI implementation validation test suite. The role downloads and extracts the OMB source into the azure tests directory, and installs a test script that discovers all installed MPI modules via Lmod, builds OMB against each one, and runs a set of single-host MPI tests covering startup, point-to-point, and collective operations.
If a module fails to load (e.g. mvapich on a non-GPU machine), the test script skips that module and continues testing the remaining modules rather than failing the entire test suite.
The test script is designed to fail fast on the first error, leaving the build artifacts in /tmp/omb-builds/ for debugging. Startup tests run with 1 process, point-to-point tests with 2, and collective tests with nproc.
Changes:
- Add __hpc_omb_info to vars/RedHat_9.yml with OMB 8.0b2 URL and checksum
- Add __hpc_azure_omb_dir to vars/main.yml for the OMB source location
- Add hpc_install_mpi_tests default variable (true)
- Add tasks to download, extract OMB and install the test script
- Add test-mpi-omb.sh.j2 template for MPI validation
- Disable mpi_tests in CI test configurations
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: start building a MPI implementation test suite. We will start with the OSU microbenchmark package, downloading it from https://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-8.0b2.tar.gz, calculating the sha256sum and then adding it to the system role. The system role will unpack it into the azure tests directory, and from there we will write a test script that iterates all the installed mpi libraries (via module loading) to build and run a set of tests from the OMB suite. Initially the test script will focus on running the tests on a single host, running tests on a cluster via a scheduler is a future modification. The test script will also begin by focussing on the MPI tests in the suite, more expansive functional testing is a future modification.
Refinements:
- Use ml -t spider mpi/ for module discovery instead of filesystem scanning
- Remove Lmod init - rely on user shell environment already having modules loaded
- fail() exits immediately to leave a debuggable corpse
- Set np per test category: 1 for startup, 2 for pt2pt, nproc for collective
- Do not use --allow-run-as-root as tests should run as a regular user
- Skip modules that fail to load instead of aborting the test suite
Signed-off-by: Dave Chinner <dchinner@redhat.com>
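The per-category process counts and the skip-on-load-failure behaviour described above could look roughly like the following. This is a sketch, not the contents of test-mpi-omb.sh.j2: both function names are hypothetical, and the module loader is passed in as a command so the loop can be exercised without Lmod.

```shell
#!/bin/sh
# Sketch only: np_for_category and run_for_each_module are hypothetical names.

# Process count per OMB test category, as stated in the commit message.
np_for_category() {
    case $1 in
        startup)    echo 1 ;;
        pt2pt)      echo 2 ;;
        collective) nproc ;;    # one rank per available CPU
        *) echo "unknown test category: $1" >&2; return 1 ;;
    esac
}

# Iterate over modules, skipping any that fail to load instead of aborting.
# $1 is the command used to load a module; remaining args are module names.
run_for_each_module() {
    loader=$1
    shift
    for mod in "$@"; do
        if ! "$loader" "$mod" >/dev/null 2>&1; then
            echo "SKIP $mod"
            continue
        fi
        # Placeholder for "build OMB against this module and run the tests".
        echo "TEST $mod"
    done
}
```

In the real script the loader would be the Lmod module command, and a test failure after a successful load would still exit immediately, as the fail() refinement describes.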
Add a -g CLI flag to test-mpi-omb.sh that builds the OSU Micro-Benchmarks with CUDA GPU support enabled. The CUDA configure flags are only available when hpc_build_mpi_w_nvidia_gpu_support was set during deployment; if -g is passed but GPU support was not built, the script exits with an error.
Changes:
- Add ENABLE_CUDA variable and -g option to getopts parsing
- Conditionally pass --enable-cuda and --with-cuda to OMB configure
- Use Jinja2 conditional to gate CUDA paths on hpc_build_mpi_w_nvidia_gpu_support
- Error out if -g is used but MPI was not built with GPU support
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: add a CLI parameter to the MPI test script that builds the test code with CUDA and GPU functionality enabled.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
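A minimal sketch of the -g handling, assuming a POSIX getopts loop. ENABLE_CUDA is the variable named in the commit; parse_args and GPU_SUPPORT_BUILT are hypothetical, with GPU_SUPPORT_BUILT standing in for whatever the Jinja2 conditional on hpc_build_mpi_w_nvidia_gpu_support renders into the deployed script.

```shell
#!/bin/sh
# Sketch only: parse_args and GPU_SUPPORT_BUILT are hypothetical stand-ins.
parse_args() {
    ENABLE_CUDA=0
    OPTIND=1                      # allow the function to be called repeatedly
    while getopts "g" opt; do
        case $opt in
            g) ENABLE_CUDA=1 ;;
            *) return 2 ;;        # unknown option
        esac
    done

    # -g is only valid when the MPI libraries were deployed with GPU support.
    if [ "$ENABLE_CUDA" -eq 1 ] && [ "${GPU_SUPPORT_BUILT:-0}" -ne 1 ]; then
        echo "error: -g given but MPI was not built with GPU support" >&2
        return 1
    fi
    return 0
}
```

A caller would check ENABLE_CUDA afterwards and, when it is 1, append the CUDA options to the OMB configure command line.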
When the -g flag is passed, extend the test suite to exercise NCCL functionality via the OMB xccl benchmarks. The NCCL tests run standalone pt2pt benchmarks (latency, bandwidth, bidirectional bandwidth) and collective benchmarks (allreduce, allgather, bcast, reduce, reduce_scatter, alltoall) which exercise the NCCL communication library directly.
The OMB configure is extended with --enable-ncclomb to build the NCCL benchmark binaries when CUDA support is enabled.
Includes a workaround for an upstream OMB 8.0b2 bug where the xccl Makefile.am files are missing omb_color.c from the UTILITIES list, causing link failures. The fix patches the Makefile.am files and runs autoreconf before configure.
The autotools packages (autoconf, automake, libtool) are moved from __hpc_openmpi_build_dependencies to a new __hpc_mpi_packages list so they persist after the build phase and are available for the autoreconf workaround at test time.
Changes:
- Add --enable-ncclomb to CUDA configure flags
- Add NCCL xccl pt2pt tests (latency, bw, bibw)
- Add NCCL xccl collective tests (allreduce, allgather, bcast, reduce, reduce_scatter, alltoall)
- Workaround OMB 8.0b2 xccl link failure by adding omb_color.c to UTILITIES in Makefile.am and running autoreconf before configure
- Move autoconf/automake/libtool from __hpc_openmpi_build_dependencies to __hpc_mpi_packages so they are not removed after building
Created-by-AI: Claude Opus 4.6 (1M context)
Prompt: new modification: extend the MPI test script to cover CUDA, GPU and NCCL related functionality provided by the OMB suite.
Refinements:
- Workaround upstream OMB 8.0b2 bug where xccl Makefile.am files are missing omb_color.c from the UTILITIES list.
- Move autotools packages to persistent __hpc_mpi_packages list.
- Remove MPI launcher GPU memory tests (-d cuda D D) as the launcher does not support per-benchmark GPU memory placement options.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
📝 Walkthrough
This PR refactors the HPC role's MPI installation system from a monolithic OpenMPI workflow into a modular multi-MPI architecture supporting OpenMPI, MVAPICH, and mpifileutils with optional GPU/HPC-X builds, complemented by Lmod environment modules and a comprehensive OSU Micro-Benchmarks testing harness.
Changes
MPI Installation and Testing System Refactor
🚥 Pre-merge checks | ✅ 4 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tasks/mpi.yml`:
- Around line 478-485: The copy task "Copy OMB source to tests directory" uses
the Ansible copy module without an explicit mode causing risky-file-permissions
lint failures; update that task to set a safe explicit mode (e.g., mode: '0755'
or another appropriate octal) on the copy invocation so ownership and file
permissions are deterministic; modify the task that currently contains copy:
src/dest/owner/group to include mode: '0XXX' and ensure the chosen mode fits the
files' needs (executable vs data) to resolve the lint warning.
- Around line 6-17: The fail task that prevents building MPI with GPU support
should check the correct NCCL variable and use OR logic: change the when
condition on the fail task (the "Fail if role builds MPI with GPU support
without CUDA toolkit" task) to require hpc_build_mpi_w_nvidia_gpu_support and
(not hpc_install_cuda_toolkit or not hpc_install_hpc_nvidia_nccl) instead of
using hpc_install_nvidia_nccl and an AND; also update the fail message lines to
mention "hpc_install_hpc_nvidia_nccl: true" so the guidance shows the correct
variable name.
📒 Files selected for processing (14)
- defaults/main.yml
- tasks/main.yml
- tasks/mpi.yml
- templates/hpcx-ver-pmix-ver.lua.j2
- templates/hpcx-ver.lua.j2
- templates/mvapich-ver.lua.j2
- templates/openmpi-no-gpu-defaults.lua.j2
- templates/openmpi-ver-cuda12-gpu.lua.j2
- templates/test-mpi-omb.sh.j2
- tests/tests_default.yml
- tests/tests_include_vars_from_parent.yml
- tests/tests_skip_toolkit.yml
- vars/RedHat_9.yml
- vars/main.yml
    - name: Fail if role builds MPI with GPU support without CUDA toolkit
      fail:
        msg:
          - Building MPI libraries with Nvidia GPU support requires multiple
          - packages to be installed.
          - You must set the following variables true
          - "hpc_install_cuda_toolkit: true"
          - "hpc_install_nvidia_nccl: true"
      when:
        - hpc_build_mpi_w_nvidia_gpu_support
        - not hpc_install_cuda_toolkit
        - not hpc_install_nvidia_nccl
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify variable usage consistency in MPI tasks/defaults.
rg -n "hpc_install_(hpc_)?nvidia_nccl|hpc_build_mpi_w_nvidia_gpu_support" \
  defaults/main.yml tasks/main.yml tasks/mpi.yml tests/tests_*.yml
Repository: linux-system-roles/hpc
Length of output: 1235
Fix MPI GPU prerequisite guard: wrong NCCL variable and incorrect boolean logic.
The GPU-MPI precheck in tasks/mpi.yml uses hpc_install_nvidia_nccl but the role’s variable is hpc_install_hpc_nvidia_nccl (defined in defaults/main.yml and referenced in tests). As written, the guard’s when conditions are ANDed, so it only fails when both CUDA toolkit and NCCL are missing—rather than when either prerequisite is disabled.
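The difference between the ANDed guard and the intended OR logic can be checked with a small truth-table sketch. The function names are hypothetical; each takes three "true"/"false" arguments standing for hpc_build_mpi_w_nvidia_gpu_support, hpc_install_cuda_toolkit, and hpc_install_hpc_nvidia_nccl, and returns success when the fail task would trigger.

```shell
#!/bin/sh
# Sketch only: guard_and and guard_or are hypothetical names.
# Args: $1 = build MPI with GPU support, $2 = CUDA toolkit, $3 = NCCL.

# Guard as written: a list under "when" is ANDed, so it fires only when
# GPU support is requested and BOTH prerequisites are disabled.
guard_and() {
    [ "$1" = true ] && [ "$2" != true ] && [ "$3" != true ]
}

# Guard as proposed: fires when GPU support is requested and EITHER
# prerequisite is disabled.
guard_or() {
    [ "$1" = true ] && { [ "$2" != true ] || [ "$3" != true ]; }
}
```

For example, with GPU support requested, the CUDA toolkit enabled and NCCL disabled, guard_and true true false does not fire while guard_or true true false does, which is the case the review comment is concerned about.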
    - name: Copy OMB source to tests directory
      copy:
        src: "{{ __hpc_pkg_extracted.path }}/"
        remote_src: true
        dest: "{{ __hpc_azure_omb_dir }}"
        owner: root
        group: root
Set explicit permissions when copying OMB sources.
The copy task leaves permissions implicit, which matches the lint failure (risky-file-permissions). Set mode explicitly.
💡 Proposed fix
     - name: Copy OMB source to tests directory
       copy:
         src: "{{ __hpc_pkg_extracted.path }}/"
         remote_src: true
         dest: "{{ __hpc_azure_omb_dir }}"
         owner: root
         group: root
    +    mode: "u+rwX,go+rX"
      hpc_azure_disable_predictable_net_names: true
      hpc_install_system_openmpi: true
    - hpc_build_openmpi_w_nvidia_gpu_support: true
    + hpc_build_mpi_w_nvidia_gpu_support: true
note that this changes the public API and is considered a breaking change
If this is really necessary, then
- the README.md should mark the old variable as deprecated, and should say to use the new one
- we should have logic to use the old variable if set
    - hpc_build_mpi_w_nvidia_gpu_support: true
    + hpc_build_mpi_w_nvidia_gpu_support: "{{ hpc_build_openmpi_w_nvidia_gpu_support | d(true) }}"
optional: add a task to tasks/main.yml to tell the user that hpc_build_openmpi_w_nvidia_gpu_support is deprecated if it is defined, and to use hpc_build_mpi_w_nvidia_gpu_support instead
          + __hpc_mpifileutils_build_dependencies }}
        state: present
        use: "{{ (__hpc_server_is_ostree | d(false)) |
          ternary('ansible.posix.rhel_rpm_ostree', omit) }}"
Does this need a register/until? Which package installation tasks need register/until?
This series:
This series is built on top of the mpi-updates branch posted in PR #134 and so will require that PR to be merged first.
The tests will not run if GPU testing is requested and there are no GPUs in the system:
If the MPI module doesn't load (e.g. because there are no GPUs in the system) then it will gracefully handle the load failure:
Issue Tracker Tickets (Jira or BZ if any): https://redhat.atlassian.net/browse/RHELHPC-118