primoco/dllm-runner
🌀 dLLM Runner

Run Diffusion Language Models locally — like Ollama, but for dLLMs.

dLLM Runner is a lightweight CLI tool that lets you run Diffusion Language Models (Dream, LLaDA) on your own hardware with a simple dllm run command. No cloud, no API keys, full privacy.



What are Diffusion Language Models?

Traditional LLMs (GPT, LLaMA, Qwen) generate text one token at a time, left to right. Diffusion LLMs work differently — they start with a masked sequence and refine all tokens in parallel through iterative denoising, similar to how Stable Diffusion generates images.

This gives them unique advantages:

  • Parallel generation — potential for significantly faster inference
  • Bidirectional context — every token sees the full sequence, not just what came before
  • No "reversal curse" — they handle "A is B" ↔ "B is A" naturally
  • Superior planning — they outperform much larger autoregressive (AR) models on constraint tasks (Sudoku, Countdown)
  • Native infilling — fill in blanks anywhere in the text, not just at the end
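The denoising idea can be illustrated with a toy sketch in plain Python. This is not the project's inference code: `toy_model` is a random stand-in for a real network, which would instead score every masked position using the full bidirectional context.

```python
import random

MASK = "<mask>"

def toy_model(tokens):
    """Stand-in for a real dLLM: returns a (guess, confidence) pair for
    every masked position. A real model scores all positions in parallel."""
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def denoise(length=6, steps=3):
    tokens = [MASK] * length          # start from a fully masked sequence
    per_step = length // steps        # how many positions to commit per step
    for _ in range(steps):
        preds = toy_model(tokens)
        # keep only the highest-confidence predictions this step,
        # leave the rest masked for the next refinement pass
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens

print(denoise())
```

Each pass fills in the positions the model is most confident about and re-predicts the rest, which is why fewer steps than tokens can suffice; the --steps option below controls exactly this trade-off.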

Supported Models

| Model           | Params   | VRAM (bf16) | VRAM (4bit) | Notes                          |
|-----------------|----------|-------------|-------------|--------------------------------|
| dream-7b        | 7B       | ~14 GB      | ~5 GB       | Best open dLLM for general use |
| dream-7b-base   | 7B       | ~14 GB      | ~5 GB       | Untuned foundation model       |
| llada-8b        | 8B       | ~16 GB      | ~5 GB       | NeurIPS 2025 Oral paper        |
| llada-8b-base   | 8B       | ~16 GB      | ~5 GB       | Untuned foundation model       |
| llada2-mini     | 16B MoE  | ~32 GB      | ~10 GB      | 1.4B active params, efficient  |
| llada2.1-mini   | 16B MoE  | ~32 GB      | ~10 GB      | Latest, with token editing + RL |
| llada2-flash    | 100B MoE | ~200 GB     | ~60 GB      | Flagship, multi-GPU required   |
| llada2.1-flash  | 100B MoE | ~200 GB     | ~60 GB      | Latest 100B, multi-GPU         |
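The bf16 column follows roughly from two bytes per parameter, and 4-bit from half a byte, for the weights alone. A quick back-of-the-envelope check (this is an estimate, not the tool's actual accounting; real usage adds activations, caches, and framework overhead):

```python
def weights_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only VRAM: parameter count times storage width per parameter.
    Treat the result as a floor; runtime overhead sits on top of it."""
    return params_billions * bytes_per_param

print(weights_vram_gb(7, 2.0))   # bf16: 14.0 GB weights, matching ~14 GB above
print(weights_vram_gb(7, 0.5))   # 4-bit: 3.5 GB weights; overhead brings it to ~5 GB
```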

Quick Start

Install from Release (recommended)

Linux / macOS:

# Download the latest release for your platform from Releases page, then:
tar xzf dllm-*.tar.gz
sudo mkdir -p /opt/dllm
sudo cp -r dllm python /opt/dllm/
sudo ln -s /opt/dllm/dllm /usr/local/bin/dllm

Windows:

Extract the .zip and add the extracted folder to your PATH.

Build from source

git clone https://github.com/YOUR_USER/dllm-runner.git
cd dllm-runner
cargo build --release
sudo mkdir -p /opt/dllm/python
sudo cp target/release/dllm /opt/dllm/
sudo cp python/*.py /opt/dllm/python/
sudo ln -s /opt/dllm/dllm /usr/local/bin/dllm

Setup & Run

# 1. Install Python dependencies (PyTorch + CUDA, ~3 GB download)
dllm setup

# 2. Start chatting!
dllm run dream-7b --quantize 4bit

Usage

dllm <command> [options]

Commands:
  list                    Show all available models
  setup [--force]         Install Python environment + dependencies
  status                  Check GPU, CUDA, and dependency status
  run <model>             Interactive chat with a model
  pull <model>            Pre-download model weights from HuggingFace

Options for 'run':
  --quantize 4bit|8bit    Reduce VRAM usage (requires BitsAndBytes)
  --steps <N>             Number of diffusion steps (default: model-specific)

Examples

# Check your system is ready
dllm status

# Run Dream 7B in 4-bit quantization (~5 GB VRAM)
dllm run dream-7b --quantize 4bit

# Run LLaDA 8B at full precision
dllm run llada-8b

# Pre-download a model for offline use
dllm pull llada2.1-mini

Architecture

┌─────────────────────────┐
│  dllm (Rust CLI, ~1 MB) │   Orchestration, model management,
│  Fast startup, TUI      │   interactive chat loop
└────────┬────────────────┘
         │ JSON over stdin/stdout
         ▼
┌─────────────────────────┐
│  engine.py (Python)     │   PyTorch inference engine,
│  Runs in isolated venv  │   diffusion generation loop
└────────┬────────────────┘
         │
         ▼
┌─────────────────────────┐
│  PyTorch + CUDA         │   GPU-accelerated matrix ops,
│  HuggingFace models     │   model weights from HF Hub
└─────────────────────────┘

The Rust binary (~1 MB) handles CLI, process management, and model registry with instant startup. The Python engine runs inside an isolated venv (~/.dllm/venv), loading models via HuggingFace Transformers and executing the diffusion inference loop on GPU. They communicate through JSON messages over stdin/stdout.
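A minimal sketch of what such a stdin/stdout engine loop can look like. The field names here (cmd, prompt, text, ok) are illustrative assumptions, not the project's actual wire schema:

```python
import json
import sys

def send(msg: dict, out=sys.stdout):
    # One JSON object per line, flushed so the Rust parent sees it immediately
    out.write(json.dumps(msg) + "\n")
    out.flush()

def engine_loop(inp=sys.stdin, out=sys.stdout):
    """Toy engine skeleton: read one JSON request per line, reply in kind.
    Field names are hypothetical; a real engine would run the diffusion
    generation loop where the echo response is produced below."""
    for line in inp:
        req = json.loads(line)
        if req.get("cmd") == "generate":
            send({"ok": True, "text": "echo: " + req["prompt"]}, out)
        elif req.get("cmd") == "shutdown":
            break

if __name__ == "__main__":
    engine_loop()
```

Newline-delimited JSON keeps the protocol trivially parseable from Rust (one serde_json call per line) and avoids any shared-memory or FFI coupling between the two runtimes.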

Requirements

  • Python 3.10+
  • GPU (recommended): NVIDIA with CUDA 11.8+ and ≥8 GB VRAM
  • macOS: Apple Silicon with MPS acceleration (M1/M2/M3/M4)
  • CPU: Supported but very slow for 7B+ models
  • Disk: ~3 GB for Python deps + 5–14 GB per model

Platform Support

| Platform            | GPU Acceleration | Quantization (4/8-bit) | Status                  |
|---------------------|------------------|------------------------|-------------------------|
| Linux x86_64        | CUDA             | Yes                    | Full support            |
| Linux ARM64         | CUDA             | Yes                    | Full support            |
| macOS Apple Silicon | MPS              | No                     | Works, no quantization  |
| macOS Intel         | CPU only         | No                     | Slow                    |
| Windows x86_64      | CUDA             | Yes                    | Full support            |

Troubleshooting

dllm setup fails with "python3-venv not installed"

sudo apt install python3-venv python3-pip

PyTorch CUDA not detected after setup

dllm status          # Check what's detected
dllm setup --force   # Reinstall with auto-detection

Out of VRAM

dllm run dream-7b --quantize 4bit   # ~14 GB → ~5 GB

Model download slow / interrupted

dllm pull dream-7b   # Pre-download, resumes automatically

License

MIT
