Calculon-MoE - An extension of Calculon to support the modeling of Mixture of Experts (MoE) Architectures
If you don't have conda available:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
$HOME/miniconda3/bin/conda init bash

Then create and activate the environment:
conda env create -f environment.yml --name calculon-moe
conda activate calculon-moe
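The conda availability check that opens these setup steps can be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; the helper name `have_cmd` is hypothetical and not part of Calculon-MoE. It only reports whether `conda` is on the PATH and installs nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Calculon-MoE): succeeds if the named
# command can be found on the current PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd conda; then
  echo "conda found: skip the Miniconda install and create the environment"
else
  echo "conda not found: run the Miniconda installer commands first"
fi
```

Reporting instead of installing keeps the check safe to run repeatedly; the Miniconda commands stay a deliberate manual step.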
# Inside calculon-MoE folder, do:
make

PYTHONPATH=. ./bin/calculon llm models/megatron-1T.json examples/megatron_1T_training_4096_original.json systems/a100_80g.json -
PYTHONPATH=. ./bin/calculon llm-optimal-execution models/megatron-1T.json 5128 2520 float16 systems/a100_80g.json output.json -m

Run a single Calculon training model of a GPT-like 1.8T MoE Transformer (models/gpt-1.8T.json) on 4096 H100 80 GB GPUs (systems/H100_80g_sxm.json). The execution file (examples/gpt_1.8_training_4096.json) specifies the parallelization parameters used for the run (TP, DP, PP, EP, ES, etc.):
PYTHONPATH=. ./bin/calculon llm models/gpt-1.8T.json examples/gpt_1.8_training_4096.json systems/H100_80g_sxm.json -

Run the system execution optimizer to search the parallelization space for the GPT-like 1.8T Transformer. The following example searches for a parallelization strategy for 4096 H100 GPUs with a batch size of 2048, which is specified internally in the calculon/llm/optimal_execution_MoE file:
PYTHONPATH=. ./bin/calculon llm-optimal-execution-moe models/gpt-1.8T.json 4096 2048 float16 systems/H100_80g_sxm.json output_gpt-1.8T_4096_2048.json -moe 16

Run the system execution optimizer in its flexible mode, which searches all combinations of EP, ES and TP without constraints. The following example searches for a parallelization strategy for 4096 H100 GPUs with a batch size of 2048, which is specified internally in the calculon/llm/optimal_execution_MoE_flexible file:
PYTHONPATH=. ./bin/calculon llm-optimal-execution-moe-flexible models/gpt-1.8T.json 4096 2048 float16 systems/H100_80g_sxm.json output_gpt-1.8T_4096_2048_flex.json -moe 16
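
To compare several cluster sizes, the flexible optimizer command above can be wrapped in a loop. The sketch below is an illustration under stated assumptions, not part of Calculon-MoE: the function name `sweep_gpus`, the `DRY_RUN` variable, and the GPU counts 1024 and 2048 are hypothetical, and only the 4096-GPU invocation appears in this README. Whether the optimizer finds a valid configuration for other GPU counts is not confirmed here. The output file name follows the pattern of the example above. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical sweep helper (not part of Calculon-MoE).
# Prints, or with DRY_RUN=0 runs, the flexible MoE optimizer once per GPU count.
# Arguments: batch size, then one or more GPU counts.
sweep_gpus() {
  local batch="$1"; shift
  local gpus out
  for gpus in "$@"; do
    out="output_gpt-1.8T_${gpus}_${batch}_flex.json"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the command without executing it.
      echo "PYTHONPATH=. ./bin/calculon llm-optimal-execution-moe-flexible models/gpt-1.8T.json ${gpus} ${batch} float16 systems/H100_80g_sxm.json ${out} -moe 16"
    else
      PYTHONPATH=. ./bin/calculon llm-optimal-execution-moe-flexible \
        models/gpt-1.8T.json "${gpus}" "${batch}" float16 \
        systems/H100_80g_sxm.json "${out}" -moe 16
    fi
  done
}

# GPU counts other than 4096 are illustrative assumptions.
sweep_gpus 2048 1024 2048 4096
```

Set `DRY_RUN=0` from inside the calculon-MoE folder to execute the runs; each GPU count writes its own output file, so results from different cluster sizes do not overwrite each other.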
- Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
  Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini
  Paper
- Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models
  Mikhail Isaev, Nic McDonald, Larry Dennison, Richard Vuduc
  Paper
- Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training
  Mikhail Isaev, Nic McDonald, Richard Vuduc
  Paper