Analytical simulator for distributed LLM training and inference. Estimates MFU, memory, throughput, and cost for any model–cluster–parallelism combination without provisioning a single GPU. All computation happens client-side; no backend, no data leaves your browser.
Planning distributed training and inference sizing today means either booking cluster time to run experiments or relying on back-of-envelope math that breaks down the moment you add pipeline parallelism, expert parallelism, or mixed-precision communication. This simulator replaces both with a first-principles physics model: FLOPs, bytes transferred, and pipeline bubbles, computed analytically from hardware specs, model architecture, and parallelism layout.
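In spirit, the model reduces each step to roofline arithmetic: how long the FLOPs take at sustained throughput versus how long the bytes take on the interconnect. A minimal sketch of that idea in TypeScript (illustrative only; `stepTimeSeconds` and its constants are not the simulator's actual API):

```typescript
// Illustrative roofline-style step-time estimate, not the simulator's API.
// Per-step compute for a dense decoder: ~6 * params * tokens FLOPs.
function stepTimeSeconds(
  params: number,          // model parameter count
  tokensPerStep: number,   // global batch size * sequence length
  gpus: number,
  peakFlopsPerGpu: number, // e.g. ~989e12 for H100 BF16 dense
  gradBytesPerGpu: number, // gradient bytes all-reduced per step
  busBandwidth: number     // effective all-reduce bandwidth, bytes/s
): number {
  const computeS = (6 * params * tokensPerStep) / (gpus * peakFlopsPerGpu);
  const commS = gradBytesPerGpu / busBandwidth;
  // Assume perfect compute/communication overlap: the slower path dominates.
  return Math.max(computeS, commS);
}
```

The real model adds the terms this sketch ignores: pipeline bubbles, memory-bandwidth-bound phases, and per-collective latency.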
Best for sweeping strategies, sanity-checking cluster budgets, and building intuition for parallelism tradeoffs — not a substitute for profiling production workloads.
### Training
- How many H100s do I need to train a 70B model in 30 days, and what's the cost?
- What MFU should I expect for LLaMA 405B at 8K vs 131K context with context parallelism?
- What's the optimal parallelism layout for DeepSeek V3 on 256× H800 with FP8 and expert parallelism?
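The first question above has a back-of-envelope answer via the standard C ≈ 6·N·D compute approximation. The token budget and MFU below are assumed values for illustration, not the simulator's output:

```typescript
// Rough H100 count to train a 70B model in 30 days.
// Assumptions: 15T training tokens, 40% sustained MFU,
// H100 BF16 dense peak ~989 TFLOP/s.
const paramsN = 70e9;
const tokensD = 15e12;
const totalFlops = 6 * paramsN * tokensD;        // ≈ 6.3e24 FLOPs
const flopsPerGpu = 0.40 * 989e12;               // sustained FLOP/s per GPU
const seconds = 30 * 86400;
const gpus = Math.ceil(totalFlops / (flopsPerGpu * seconds));
// ≈ 6144 H100s under these assumptions
```

The simulator refines this by replacing the assumed 40% MFU with a value derived from the chosen parallelism layout and interconnect.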
### Inference
- What's the TTFT/TPOT for LLaMA 70B on 8×H100 with speculative decoding?
- Can LLaMA 70B run on 2× RTX 4090 in INT4 with paged attention?
- How does continuous batching throughput scale with batch size and TP degree?
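The second question above is largely a KV-cache arithmetic problem. A hedged sketch, using LLaMA 70B's public GQA configuration (80 layers, 8 KV heads, head dim 128):

```typescript
// KV-cache footprint: 2 tensors (K and V) per layer, per KV head,
// per position, per sequence in the batch.
function kvCacheBytes(
  layers: number, kvHeads: number, headDim: number,
  seqLen: number, batch: number, bytesPerElem: number
): number {
  return 2 * layers * kvHeads * headDim * seqLen * batch * bytesPerElem;
}

// LLaMA 70B, 8K context, batch 1, FP16 KV cache:
// kvCacheBytes(80, 8, 128, 8192, 1, 2) = 2,684,354,560 bytes = 2.5 GiB
```

The simulator layers quantized weights, paged-attention overhead, and activation scratch on top of this to decide whether a configuration fits.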
Try it yourself: DeepSeek V3 on 2048× H800 · LLaMA 3.1 405B on 16K× H100
The physics model is calibrated against published training runs.
| Model | GPUs | Strategy | Sim MFU | Published MFU | Source |
|---|---|---|---|---|---|
| LLaMA 3.1 405B | 16384× H100 | 3D (TP8 PP16) | 41.1% | ~40% | Meta Table 4 |
| LLaMA 3.1 405B 131K | 16384× H100 | 3D + CP16 | 37.2%* | 38%* | Meta Table 4 |
| DeepSeek V3 671B FP8 | 2048× H800 | 3D + EP32 | 44.7% | 43.7% | DeepSeek §3.1 |
| Nemotron-4 340B | 6144× H100 | 3D (TP8 PP12) | 41.2% | 41-42% | NVIDIA Table 2 |
| OLMo 3 32B | 1024× H100 | FSDP (DP=1024) | 43.4% | ~41% | OLMo 3 (selective AC) |
* Model FLOPs MFU — quadratic attention FLOPs at long sequences (Benchmarks)
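The footnote matters because one common MFU convention (the PaLM-style accounting) counts per-token model FLOPs as 6N plus an attention term that grows linearly with sequence length, making total attention FLOPs quadratic. A sketch, using LLaMA 3.1 405B's public shape (126 layers, 128 heads, head dim 128) as the example:

```typescript
// PaLM-style per-token model FLOPs: 6N + 12 * layers * heads * headDim * seqLen.
// The second term is the causal-attention contribution.
function modelFlopsPerToken(
  params: number, layers: number,
  heads: number, headDim: number, seqLen: number
): number {
  return 6 * params + 12 * layers * heads * headDim * seqLen;
}

// For 405B at 131K context, the attention term (~3.2e12 FLOPs/token)
// exceeds the 6N term (~2.4e12), so MFU at long context depends heavily
// on which FLOP count the denominator uses.
```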
- 70+ models — LLaMA, DeepSeek, Qwen, Mistral, Gemma, Phi, Grok, GLM, OLMo, Kimi, and more. Dense, MoE, MLA, GQA.
- 25 GPU types — A100 through B200, MI300X, RTX 4090, A800/H800. Consumer to datacenter.
- Full parallelism stack — DDP, ZeRO, FSDP, TP, PP (1F1B / Interleaved / DualPipeV), CP, SP, EP.
- Auto-optimizer — Finds the fastest parallelism layout automatically.
- Inference — TTFT, TPOT, speculative decoding, continuous batching, GGUF/GPTQ/AWQ, KV cache sizing.
- Training — LoRA/QLoRA, FP8/FP4 mixed precision, selective activation checkpointing, cost projection.
- Fused and custom kernels (FlashAttention-3), NVMe/CPU offloading, runtime optimizations.
- Serving frameworks (vLLM/TensorRT), disaggregated prefill/decode, dynamic batching, prefix caching.
- Post-training: RLHF, RLVR, PPO, GRPO.
- Non-training overhead: checkpointing, data loading, failure recovery.
- TPUs, Trainium/Inferentia, non-IB clusters.
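As a taste of the latency math behind the speculative-decoding feature above: with a draft length of k tokens and a per-token acceptance rate alpha, the expected number of tokens emitted per target-model pass follows the standard speculative-sampling formula (Leviathan et al.). A small sketch with assumed values:

```typescript
// Expected tokens emitted per target-model forward pass with speculative
// decoding: a geometric series over acceptance probability alpha,
// truncated at k draft tokens (plus the target model's own token).
function expectedTokensPerPass(alpha: number, k: number): number {
  return (1 - Math.pow(alpha, k + 1)) / (1 - alpha);
}

// e.g. alpha = 0.8, k = 4 drafts → ~3.36 tokens per pass,
// i.e. roughly a 3.4x reduction in target-model decode steps
// (before accounting for the draft model's own cost).
```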
- Learn Mode — 60 structured tasks across 6 tracks (training and inference, beginner to advanced). Each task sets up a scenario, defines success criteria, and provides progressive hints. Enter via the Learn button in the navigation panel.
- Space RPG — a narrative campaign that teaches the full parallelism stack through a branching story with hardware unlocks, skill progression, and multi-objective challenges. Enter via the Play button in the navigation panel.
- Overview — Architecture, definitions, reading guide.
- Physics — Simulation formulas, constants, rationale.
- Strategies — Parallelism strategy implementations.
- Hardware — GPU specs, interconnects, topology.
- Models — Model registry, architecture types, FLOPs.
- Inference — Inference latency, KV cache, speculative decoding.
- Optimizer — Recommendation engine, auto-optimizer.
- Learning — Learn Mode tasks, Space RPG missions.
- Benchmarks — Calibration data and known gaps.
```shell
npm install
npm run dev
npm run test
npm run build
```

Stack: React 19 · TypeScript · Vite 7 · Tailwind CSS 4 · Zustand · Vitest
If you use this simulator in your work, please cite:
```bibtex
@misc{zhebrak2026llmclustersim,
  author = {Zhebrak, Alex},
  title  = {{LLM Cluster Simulator}: Interactive Distributed Training and Inference Planning},
  year   = {2026},
  url    = {https://github.com/zhebrak/llm-cluster-simulator},
  doi    = {10.5281/zenodo.19365122},
  note   = {Browser-based simulator for GPU cluster parallelism strategies, calibrated against published benchmarks from Meta, DeepSeek, and NVIDIA}
}
```

MIT

