
SURF-ML/Megatron-LM-Snellius


Megatron-LM on Snellius

This codebase helps Snellius users to quickly set up their LLM pretraining tasks.

A few Snellius-specific pointers:

  • GPU: NVIDIA H100 SXM5 94GB (4 GPUs per node)
  • Interconnect: InfiniBand HDR200
  • Persistent storage is provided upon grant agreement under /projects/0/prjsXXXX
  • Temporary storage under /scratch-shared/$USER/ for model checkpoints and logs
  • Operating system: Red Hat Enterprise Linux 9.4 (Plow)
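Because checkpoints and logs go to scratch while persistent data lives in the project space, it can help to create the per-run directories before a job starts. The following is a minimal sketch, not part of this repository: the function name make_run_dirs, the SCRATCH_ROOT override and the megatron/<run>/ layout are hypothetical conventions chosen for illustration.

```shell
# Hypothetical helper: create checkpoint and log directories for one run on scratch.
# Layout assumption: <root>/megatron/<run_name>/{checkpoints,logs}
make_run_dirs() {
  local run_name="$1"
  # Refuse to run without a run name, to avoid writing directly into megatron/.
  [ -n "$run_name" ] || { echo "usage: make_run_dirs <run_name>" >&2; return 1; }
  # Default to the Snellius scratch space; SCRATCH_ROOT (hypothetical) overrides it.
  local root="${SCRATCH_ROOT:-/scratch-shared/$USER}"
  local run_dir="$root/megatron/$run_name"
  mkdir -p "$run_dir/checkpoints" "$run_dir/logs" || return 1
  # Print the run directory so the caller can capture it.
  echo "$run_dir"
}
```

For example, RUN_DIR=$(make_run_dirs gpt-small) creates both subdirectories and keeps the path in RUN_DIR for use in a training script.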

Code setup

To run this tutorial, you must clone this repository into your home directory:

git clone https://github.com/SURF-ML/Megatron-LM-Snellius

Please ensure that you obtain the following file hierarchy:

root (you are here)/
├── 0_build_container/ --- build the container
├── 1_download_data/ ----- download the dataset
├── 2_tokenization/ ------ prepare the tokens
├── 3_train/ ------------- train the model
├── Megatron-LM/ --------- Megatron-LM codebase submodule

If the Megatron-LM directory is empty, download the submodule code with:

git submodule update --init
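Git can also fetch the submodule at clone time with git clone --recurse-submodules. If you prefer a check that only initialises the submodule when it is actually missing, the sketch below does that; the function name ensure_megatron_submodule is hypothetical and not a script shipped with this repository.

```shell
# Hypothetical helper: initialise the Megatron-LM submodule only if its directory is empty.
ensure_megatron_submodule() {
  local dir="${1:-Megatron-LM}"
  # `ls -A` also lists hidden entries, so a populated submodule (with its .git file)
  # counts as non-empty; a missing directory also produces empty output.
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    git submodule update --init -- "$dir"
  else
    echo "$dir already populated"
  fi
}
```

Run it from the repository root, e.g. ensure_megatron_submodule Megatron-LM.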

Megatron-LM is actively developed, and new versions may introduce breaking changes. To avoid them, you can check out a specific commit of the library. If you encounter any issues with the Megatron-LM library, try the following steps:

cd Megatron-LM
git checkout 1f6cde85d23ff0c307a47bbdd8bfd778b95a161f
cd ../
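To confirm that the checkout above actually landed on the pinned commit, you can compare it against git rev-parse HEAD. This is a minimal sketch; pin_megatron and MEGATRON_COMMIT are illustrative names, and the comparison assumes the full 40-character hash (an abbreviated hash would not match).

```shell
# Commit pinned in this README.
MEGATRON_COMMIT=1f6cde85d23ff0c307a47bbdd8bfd778b95a161f

# Hypothetical helper: check out a commit and verify HEAD now points at it.
pin_megatron() {
  local repo="$1" commit="$2"
  # --quiet suppresses the detached-HEAD advice message.
  git -C "$repo" checkout --quiet "$commit" || return 1
  # rev-parse HEAD prints the full hash of the commit currently checked out.
  [ "$(git -C "$repo" rev-parse HEAD)" = "$commit" ]
}
```

From the repository root, pin_megatron Megatron-LM "$MEGATRON_COMMIT" && echo "Megatron-LM pinned" performs the checkout and the check in one step, without needing the cd commands.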

Environment variable

Most of the scripts in this tutorial ask you to specify the path to your project space. You can set it in your shell or bash script with:

export PROJECT_SPACE=/projects/0/prjsXXXX

You can also add this export to your .bashrc so that the variable persists across sessions and you can always change directory easily with cd $PROJECT_SPACE:

echo 'export PROJECT_SPACE=/projects/0/prjsXXXX' >> ~/.bashrc
source ~/.bashrc
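Running the echo command above more than once appends duplicate lines to ~/.bashrc. A minimal sketch of an idempotent variant, plus a guard that job scripts could call before using the variable, is shown below; both function names are hypothetical and not part of the tutorial scripts.

```shell
# Hypothetical helper: append the PROJECT_SPACE export to an rc file only once.
add_project_space() {
  local project_dir="$1" rc_file="${2:-$HOME/.bashrc}"
  local line="export PROJECT_SPACE=$project_dir"
  # -x matches the whole line, -F treats the pattern as a fixed string (no regex).
  if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
    echo "$line" >> "$rc_file"
  fi
}

# Hypothetical guard for job scripts: fail with a clear message if the variable is unset or empty.
require_project_space() {
  if [ -z "${PROJECT_SPACE:-}" ]; then
    echo "PROJECT_SPACE is not set; export it first" >&2
    return 1
  fi
}
```

For example, add_project_space /projects/0/prjsXXXX followed by source ~/.bashrc; a job script can then start with require_project_space || exit 1.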

Structure of the tutorial

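The tutorial consists of 4 parts, one per numbered directory:

  • 0_build_container/: build the container
  • 1_download_data/: download the dataset
  • 2_tokenization/: prepare the tokens
  • 3_train/: train the model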

Acknowledgments

Thanks to @spyysalo for the original LUMI Megatron-LM guide, which tremendously helped this guide.
