1.58-bit ternary Mamba language model for Indian languages.
Weights are {-1, 0, +1} -- inference uses only addition and subtraction, no floating-point multiplies. A 3B model fits in ~750MB and runs at 20+ tokens/sec on mobile.
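Because every weight is -1, 0, or +1, a matrix-vector product reduces to signed accumulation: add the inputs where the weight is +1, subtract where it is -1, skip zeros. A minimal NumPy sketch of the idea (illustrative only, not the project's kernel):

```python
import numpy as np

def ternary_matvec(W, x):
    """y = W @ x for W in {-1, 0, +1}: per row, add where w=+1, subtract where w=-1."""
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

W = np.array([[1, 0, -1],
              [0, 1, 1]], dtype=np.int8)
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(W, x))  # [-3.  8.] -- identical to W @ x, with no multiplies
```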
| Component | Detail |
|---|---|
| Base | Mamba SSM (selective state space, no attention) |
| Quantization | Ternary {-1, 0, +1} via STE, trained from scratch |
| Languages | 22+ Indian languages + English + code |
| Sizes | 54M (dev) / 541M / 1B / 3B (target) |
| Inference | Zero FP multiply -- add/sub only |
| Mobile target | 20+ tok/s on Snapdragon 8 Gen 2 |
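The "Ternary via STE" row can be made concrete: quantize to {-1, 0, +1} times a scale in the forward pass, and let gradients flow through unchanged in the backward pass. A simplified sketch with a per-tensor scale (the project's `bitlinear.py` may choose scale and granularity differently):

```python
import torch

def ternary_quantize_ste(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} * scale, with straight-through gradients."""
    scale = w.abs().mean().clamp(min=1e-8)          # per-tensor scale (assumption)
    w_q = (w / scale).round().clamp(-1, 1) * scale  # snap to ternary levels
    # STE: forward value is w_q, but the backward pass sees the identity w -> w
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
ternary_quantize_ste(w).sum().backward()
print(torch.allclose(w.grad, torch.ones_like(w)))  # True: STE gradient is identity
```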
- Pure PyTorch Mamba -- no CUDA dependency, runs on MPS/CPU/CUDA
- Hybrid generation -- Jacobi decoding, block diffusion, speculative refinement (zero model changes)
- Agentic code pipeline -- think (caveman) -> validate -> code -> review -> edit -> test -> final
- Hierarchical state cache -- encode entire codebases into SSM states for navigation
- Hallucination dampening -- per-channel confidence gates that lower (not remove) noisy nodes
- Distillation from Gemma 4 via LM Studio OpenAI-compatible API
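The hallucination-dampening idea (scale noisy channels down rather than zeroing them) can be sketched as a learned per-channel gate; the sigmoid parameterization and the `floor` value here are illustrative assumptions, not the exact scheme in `hallucination_dampener.py`:

```python
import torch
import torch.nn as nn

class ConfidenceGate(nn.Module):
    """Scales each channel by a learned confidence in [floor, 1] -- dampens, never removes."""
    def __init__(self, channels: int, floor: float = 0.1):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(channels))
        self.floor = floor  # lower bound: no channel is ever fully silenced

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        conf = torch.sigmoid(self.logit)             # per-channel confidence in (0, 1)
        gate = self.floor + (1 - self.floor) * conf  # remapped to [floor, 1)
        return x * gate

gate = ConfidenceGate(8)
x = torch.randn(2, 8)
y = gate(x)
print(y.shape)  # torch.Size([2, 8]); at init every channel is scaled by 0.55
```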
```bash
# Install
pip install -e ".[dev]"

# Run tests
make test

# Train 54M dev model (CPU/MPS)
python scripts/train_small.py --device mps --epochs 10

# Train 541M model
python scripts/step4b_train.py --layers 8 --device mps --batch-size 2 --grad-accum 4

# Evaluate
python scripts/step4c_evaluate.py --checkpoint checkpoints/finetune-indic-small

# Download training data (one dataset per day)
python scripts/download_datasets.py --list
python scripts/download_datasets.py --day 1
```

```
src/indicbitmamba/
  model/                       # Ternary Mamba architecture
    mamba_lm.py                # Full language model
    mamba_block.py             # SSM block (pure PyTorch)
    bitlinear.py               # Ternary linear layer with STE
    quantization.py            # Ternary/binary arithmetic primitives
    hallucination_dampener.py  # Per-channel confidence gates
    ssm_ops.py                 # Selective scan (associative scan)
  agent/                       # Agentic code generation
    code_agent.py              # 7-phase pipeline (think -> validate -> code -> review -> edit -> test -> final)
    hybrid_generate.py         # Jacobi / block-diffusion / speculative generation
    state_cache.py             # Hierarchical SSM state cache for codebases
    tokens.py                  # Phase control tokens
  data/                        # Dataset and tokenizer
  training/                    # Training loop, optimizer
  distillation/                # Teacher model interface (Gemma 4)
  eval/                        # Evaluation and benchmarks
  export/                      # Ternary packing, ONNX, GGUF
scripts/                       # CLI entrypoints
configs/                       # Hydra YAML configs
tests/                         # Unit tests
```
We need help training on more datasets across Indian languages. Each contributor can run one dataset at a time -- the tokenized shards are independent and can be merged.
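Because shards are independent files, merging contributions can be as simple as collecting them into one directory with sequential numbering. A sketch under the assumption that shards follow the `shard_NNNN.pt` naming shown below (this helper is hypothetical, not a project script):

```python
import shutil
import tempfile
from pathlib import Path

def merge_shards(src_dirs, dest_dir):
    """Copy shard_*.pt files from several contributors into one renumbered sequence."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in src_dirs:
        for shard in sorted(Path(src).glob("shard_*.pt")):
            shutil.copy(shard, dest / f"shard_{count:04d}.pt")
            count += 1
    return count

# Tiny demo with placeholder files
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "hindi"), Path(tmp, "tamil")
    for d in (a, b):
        d.mkdir()
        (d / "shard_0000.pt").write_bytes(b"...")
    print(merge_shards([a, b], Path(tmp, "merged")))  # 2
```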
```bash
python scripts/download_datasets.py --list    # see all 21 datasets
python scripts/download_datasets.py --status  # see what's done
python scripts/download_datasets.py --dataset sangraha-hindi
# Outputs: data/pretrain/sangraha-hindi/shard_0000.pt, shard_0001.pt, ...

# Small model (any GPU or M-series Mac)
python scripts/train_small.py --data data/pretrain/sangraha-hindi --device mps

# Full model (24GB+ GPU)
python scripts/step4b_train.py --data data/pretrain/sangraha-hindi --device cuda
```

Open a PR with:

- Your `meta.json` (dataset stats, no raw data)
- Training logs and final checkpoint metrics
- Any issues or improvements found
See CONTRIBUTING.md for full details.
| Dataset | Languages | Est. tokens | License |
|---|---|---|---|
| Sangraha | 12 Indian | ~20B total | CC-BY-4.0 |
| Aya Collection | Multilingual | ~5B | Apache-2.0 |
| Wikipedia | Hindi/Tamil/Bengali | ~550M | CC-BY-SA-3.0 |
| CC-100 | Hindi/Tamil | ~3B | MIT-like |
| StarCoder | Python | ~20B | Terms of Use |
| Model | Params | d_model | Layers | FP32 size | Ternary packed |
|---|---|---|---|---|---|
| Dev | 54M | 768 | 4 | 216MB | ~13MB |
| Medium | 541M | 2816 | 8 | 2.1GB | ~135MB |
| Large | 1B | 2048 | 48 | 4GB | ~250MB |
| Target | 3B | 2560 | 64 | 12GB | ~750MB |
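The packed sizes follow from storing four {-1, 0, +1} weights per byte (2 bits each), a ~16x reduction over FP32. A sketch of such packing (illustrative; the actual format in `export/` may differ, and the weight count is assumed divisible by 4):

```python
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, +1} into 2 bits each, 4 weights per byte."""
    codes = (w + 1).astype(np.uint8).reshape(-1, 4)  # map {-1,0,+1} -> {0,1,2}
    return (codes[:, 0] | (codes[:, 1] << 2) |
            (codes[:, 2] << 4) | (codes[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary: recover int8 weights in {-1, 0, +1}."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return codes.reshape(-1).astype(np.int8) - 1

w = np.random.randint(-1, 2, size=64).astype(np.int8)
assert np.array_equal(unpack_ternary(pack_ternary(w)), w)  # lossless round trip
print(pack_ternary(w).nbytes, "bytes for", w.size, "weights")  # 16 bytes for 64 weights
```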
- Python >= 3.9
- PyTorch >= 2.2
- For Mac: MPS backend (M1/M2/M3/M4)
- For Linux: CUDA or CPU
- No `mamba-ssm` dependency (pure PyTorch implementation)
Apache-2.0