A minimal playground for language modeling research. The goal is to provide a clean, hackable base for training and experimenting with GPT-style models — without the overhead of a large framework. Ships with a standard pre-norm transformer (causal attention + SwiGLU MLP) and a training pipeline built on Hydra, W&B, and DDP. This pipeline is adapted from PlainLM.
Clone the repo with submodules (required for flash-linear-attention):
git clone --recurse-submodules https://github.com/philippnazari/quark
If you already cloned without --recurse-submodules, fetch the submodules with:
git submodule update --init --recursive
Then install dependencies:
uv sync
uv run pre-commit install
For development (adds ruff and pre-commit):
uv sync --extra dev
Download and tokenize FineWeb-Edu 10B into chunked Arrow files. This only needs to be run once; the result is reused across training runs.
.venv/bin/python -m data.datasets.prepare \
    --dataset_path HuggingFaceFW/fineweb-edu \
    --dataset_name sample-10BT \
    --tokenizer gpt2 \
    --seq_length 256 \
    --out_path data/fineweb10B \
    --n_tokens_valid 10000000
Config is managed by Hydra (configs/). All keys can be overridden from the CLI.
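As a rough illustration of what a dotted override such as training.lr=1e-4 does to a nested config, here is a minimal pure-Python sketch. It is not Hydra's implementation: `apply_overrides` and `parse_value` are hypothetical helpers, the base values are made up, and value parsing is simplified (Hydra has its own override grammar, and an argument like model=delta_net typically selects a config group rather than setting a plain value).

```python
import ast
import copy


def parse_value(text):
    """Parse a CLI value into a Python literal, falling back to a plain string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with `a.b.c=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nesting, creating intermediate levels if missing.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Made-up defaults for illustration; the real ones live in configs/.
base = {"training": {"lr": 3e-4, "steps_budget": 5000}}
resolved = apply_overrides(base, ["training.lr=1e-4", "training.steps_budget=10000"])
print(resolved)
```

The base config is deep-copied so that the defaults stay untouched, which mirrors the idea that overrides apply to a single run only.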
Train the default transformer:
.venv/bin/python train.py
Train DeltaNet:
.venv/bin/python train.py model=delta_net
Train GLA (Gated Linear Attention):
.venv/bin/python train.py model=gla
Scale to multiple GPUs with DDP:
torchrun --standalone --nproc_per_node=4 train.py model=delta_net
Any config value can be overridden from the CLI:
.venv/bin/python train.py model=delta_net training.lr=1e-4 training.steps_budget=10000
Print the fully resolved config without running:
.venv/bin/python train.py --cfg job
Training logs to W&B and optionally saves checkpoints to out_dir/exp_name (configured in configs/config.yaml).
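When a script is launched with torchrun, each process receives RANK, LOCAL_RANK, and WORLD_SIZE as environment variables. A common pattern, sketched below, is to log and write checkpoints only from rank 0 so that a multi-GPU run produces a single W&B run. Whether train.py does exactly this is an assumption; `read_dist_env` and `is_main_process` are hypothetical helpers, not names from the repository.

```python
import os


def read_dist_env(environ=None):
    """Read the rank variables torchrun exports, defaulting to a single process."""
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def is_main_process(dist_env):
    """True only for the global rank-0 process."""
    return dist_env["rank"] == 0


# Plain `python train.py`: no variables set, so this is the main process.
single = read_dist_env({})
# One worker of a 4-process torchrun launch.
worker = read_dist_env({"RANK": "3", "LOCAL_RANK": "3", "WORLD_SIZE": "4"})
print(is_main_process(single), is_main_process(worker))
```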
Before the first run, authenticate:
uv run wandb login
The W&B project name and run name are set in configs/config.yaml:
logging:
  wandb_project: quark  # project name on wandb.ai
  wandb_log: true
checkpoint:
  exp_name: my_experiment  # also used as the run name in W&B
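To show how these keys might be consumed, the sketch below maps them onto keyword arguments for wandb.init and onto the checkpoint directory. It assumes that wandb_project and wandb_log are nested under logging and exp_name under checkpoint, and it uses a placeholder out_dir of "exp"; the helper names are illustrative and not taken from the repository.

```python
from pathlib import Path


def wandb_init_kwargs(cfg):
    """Build keyword arguments for wandb.init from the config values."""
    return {
        "project": cfg["logging"]["wandb_project"],
        "name": cfg["checkpoint"]["exp_name"],
        # mode="disabled" turns wandb calls into no-ops when logging is off.
        "mode": "online" if cfg["logging"]["wandb_log"] else "disabled",
    }


def checkpoint_dir(out_dir, exp_name):
    """Checkpoints go under out_dir/exp_name."""
    return Path(out_dir) / exp_name


cfg = {
    "logging": {"wandb_project": "quark", "wandb_log": True},
    "checkpoint": {"exp_name": "my_experiment"},
}
kwargs = wandb_init_kwargs(cfg)
# "exp" is a placeholder; the real out_dir comes from configs/config.yaml.
ckpt_path = checkpoint_dir("exp", cfg["checkpoint"]["exp_name"])
print(kwargs, ckpt_path)
```

With wandb installed and authenticated, the dictionary could be passed on as wandb.init(**kwargs).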