ML-IAP LAMMPS integration fails to parallelize #566
Unanswered
apoletayev asked this question in Q&A
Replies: 1 comment, 1 reply
Hi, we need more information to be able to debug this.
Sorry to bother again. I'm likely missing something very silly here, but the LAMMPS integration, built as in (#556), fails to use multiple GPUs with the ML-IAP pair_style with OEQ. System: L40S GPUs, CUDA 12.6, cuDNN 9.5, LAMMPS built with Kokkos for this hardware. The models were trained with nequip 0.15.0 and deployed either as pair_nequip_allegro or for ML-IAP with OEQ. The models use TF32, but that should not matter?
Call to LAMMPS (one GPU node, multiple GPUs on it):

mpirun -np $SLURM_NTASKS $LAMMPS -sf kk -k on g $SLURM_NTASKS -pk kokkos newton on neigh half -in input.in

Also tried: pair_nequip, and running on one GPU (works as expected, no OOM); minimization / MD (no change to errors).

Potentially related unanswered issue from MACE: ACEsuit/mace#1171
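Not from the thread, but for context: when LAMMPS is built with Kokkos and launched with one MPI rank per GPU, Kokkos round-robins ranks onto the devices passed via `g N`, using the local rank within the node. A minimal sketch of that mapping is below; `NUM_GPUS` and the commented `mpirun` line are illustrative assumptions, not taken from the report.

```shell
#!/bin/sh
# Assumption for illustration: 2 GPUs per node, one MPI rank per GPU.
NUM_GPUS=2

# Kokkos-style device selection: local rank modulo number of visible GPUs.
# (Kokkos reads the local rank from MPI/Slurm environment variables.)
gpu_for_rank() {
  rank=$1
  echo $(( rank % NUM_GPUS ))
}

# Hypothetical launch mirroring the command above, with ranks == GPUs:
# mpirun -np "$NUM_GPUS" $LAMMPS -sf kk -k on g "$NUM_GPUS" \
#        -pk kokkos newton on neigh half -in input.in

gpu_for_rank 0   # rank 0 -> device 0
gpu_for_rank 1   # rank 1 -> device 1
```

If both ranks end up on the same device (e.g. the scheduler exposes only one GPU per rank via CUDA_VISIBLE_DEVICES), the run silently uses a single GPU, which is one common cause of this symptom.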
Error message (two GPUs):