ML-IAP model not loading to LAMMPS #556
A follow-up error to an earlier question (#555). After updating to torch=2.8.0 and nequip=0.15.0, with OEQ on an H100, the models train but do not load into LAMMPS. The same versions of everything were used for training, preparing, and LAMMPS. I would appreciate any help or pointers on what could be the origins of these errors.

Error:

I then suppressed this with the environment variable

If a model is prepared for ML-IAP with the OEQ modifier, the error is again different:

The two errors are the same irrespective of whether the model was trained with OEQ; they differ only by whether OEQ was included in the ML-IAP preparation step.

In case it makes any difference, LAMMPS was built with the GPU package rather than KOKKOS, because on a very heterogeneous cluster KOKKOS would have to be recompiled for every GPU generation (is KOKKOS necessary for ML-IAP?). I am not sure why the error trace mentions the CPU backend, because everything was compiled, built, and run on a GPU.

LAMMPS cmake call:
Replies: 1 comment 7 replies
Hi @apoletayev,

Our LAMMPS ML-IAP integration was implemented with Kokkos in mind, so I'm guessing that some of the errors are related to that. I'm not sure about the serialization error, but indicates that the

I think you should compile LAMMPS ML-IAP with Kokkos. I'm not entirely sure about this, but my impression is that the GPU generation constraint for Kokkos you have in mind might not actually be that strict. @wcwitt and @anjohan would know better.

Side note: training with or without OEQ should be independent of whatever happens at inference, since the OEQ model modification is not persistent, i.e. we modify the model for training, but when it's reconstructed during
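For anyone wanting to confirm which packages a given LAMMPS binary was actually built with before debugging further, here is a minimal sketch. It assumes the executable is called `lmp` (adjust to your build) and that `lmp -h` prints an "Installed packages:" header followed by whitespace-separated package names and then a blank line. The helper names `parse_installed_packages`, `missing_packages`, and `lammps_help_text` are hypothetical and exist only for this illustration.

```python
import subprocess


def parse_installed_packages(help_text: str) -> set[str]:
    """Collect package names listed after the 'Installed packages:' header.

    Assumption: names are whitespace-separated and the list ends at the
    first blank line that follows at least one package name.
    """
    packages: set[str] = set()
    in_section = False
    for line in help_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Installed packages:"):
            in_section = True
            # Pick up any names that share a line with the header.
            packages.update(stripped.split(":", 1)[1].split())
            continue
        if in_section:
            if not stripped:
                if packages:
                    break  # blank line after the list: section is over
                continue  # blank line between header and list
            packages.update(stripped.split())
    return packages


def missing_packages(help_text: str, required=("KOKKOS", "ML-IAP")) -> list[str]:
    """Return the required packages that the help text does not list."""
    installed = parse_installed_packages(help_text)
    return [name for name in required if name not in installed]


def lammps_help_text(binary: str = "lmp") -> str:
    """Run `<binary> -h` and return its stdout (binary name is an assumption)."""
    result = subprocess.run([binary, "-h"], capture_output=True, text=True)
    return result.stdout
```

Calling `missing_packages(lammps_help_text())` on the build described in the question would be expected to report `KOKKOS` as missing, since that build used the GPU package instead.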
I believe one needs the update specified in this blog post: https://developer.nvidia.com/blog/enabling-scalable-ai-driven-molecular-dynamics-simulations/, specifically