1.58-bit ternary Mamba language model for Indian languages.
Weights are {-1, 0, +1} -- inference uses only addition and subtraction, no floating-point multiplies. A 3B model fits in ~750MB and runs at 20+ tokens/sec on mobile.
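Because every weight is -1, 0, or +1, a matrix-vector product reduces to signed accumulation: add the inputs where the weight is +1, subtract where it is -1, skip zeros. A minimal NumPy sketch of the idea (illustrative only, not the project's kernel):

```python
import numpy as np

def ternary_matvec(W, x):
    """y = W @ x for W in {-1, 0, +1}: per row, add where w=+1, subtract where w=-1."""
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

W = np.array([[1, 0, -1],
              [0, 1, 1]], dtype=np.int8)
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(W, x))  # [-3.  8.] -- identical to W @ x, with no multiplies
```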
| Component | Detail |
|---|---|
| Base | Mamba SSM (selective state space, no attention) |
| Quantization | Ternary {-1, 0, +1} via STE, trained from scratch |
| Languages | 22+ Indian languages + English + code |
| Sizes | 54M (dev) / 541M / 1B / 3B (target) |
| Inference | Zero FP multiply -- add/sub only |
| Mobile target | 20+ tok/s on Snapdragon 8 Gen 2 |
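The "Ternary via STE" row can be made concrete: quantize to {-1, 0, +1} times a scale in the forward pass, and let gradients flow through unchanged in the backward pass. A simplified sketch with a per-tensor scale (the project's `bitlinear.py` may choose scale and granularity differently):

```python
import torch

def ternary_quantize_ste(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} * scale, with straight-through gradients."""
    scale = w.abs().mean().clamp(min=1e-8)          # per-tensor scale (assumption)
    w_q = (w / scale).round().clamp(-1, 1) * scale  # snap to ternary levels
    # STE: forward value is w_q, but the backward pass sees the identity w -> w
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
ternary_quantize_ste(w).sum().backward()
print(torch.allclose(w.grad, torch.ones_like(w)))  # True: STE gradient is identity
```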
- Pure PyTorch Mamba -- no CUDA dependency, runs on MPS/CPU/CUDA
- Hybrid generation -- Jacobi decoding, block diffusion, speculative refinement (zero model changes)
- Agentic code pipeline -- think (caveman) -> validate -> code -> review -> edit -> test -> final
- Hierarchical state cache -- encode entire codebases into SSM states for navigation
- Hallucination dampening -- per-channel confidence gates that lower (not remove) noisy nodes
- Distillation from Gemma 4 via LM Studio OpenAI-compatible API
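The hallucination-dampening idea (scale noisy channels down rather than zeroing them) can be sketched as a learned per-channel gate; the sigmoid parameterization and the `floor` value here are illustrative assumptions, not the exact scheme in `hallucination_dampener.py`:

```python
import torch
import torch.nn as nn

class ConfidenceGate(nn.Module):
    """Scales each channel by a learned confidence in [floor, 1] -- dampens, never removes."""
    def __init__(self, channels: int, floor: float = 0.1):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(channels))
        self.floor = floor  # lower bound: no channel is ever fully silenced

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        conf = torch.sigmoid(self.logit)             # per-channel confidence in (0, 1)
        gate = self.floor + (1 - self.floor) * conf  # remapped to [floor, 1)
        return x * gate

gate = ConfidenceGate(8)
x = torch.randn(2, 8)
y = gate(x)
print(y.shape)  # torch.Size([2, 8]); at init every channel is scaled by 0.55
```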
```bash
# Install
pip install -e ".[dev]"

# Run tests
make test

# Train 54M dev model (CPU/MPS)
python scripts/train_small.py --device mps --epochs 10

# Train 541M model
python scripts/step4b_train.py --layers 8 --device mps --batch-size 2 --grad-accum 4

# Evaluate
python scripts/step4c_evaluate.py --checkpoint checkpoints/finetune-indic-small

# Download training data (one dataset per day)
python scripts/download_datasets.py --list
python scripts/download_datasets.py --day 1
```

```
src/indicbitmamba/
  model/                       # Ternary Mamba architecture
    mamba_lm.py                # Full language model
    mamba_block.py             # SSM block (pure PyTorch)
    bitlinear.py               # Ternary linear layer with STE
    quantization.py            # Ternary/binary arithmetic primitives
    hallucination_dampener.py  # Per-channel confidence gates
    ssm_ops.py                 # Selective scan (associative scan)
  agent/                       # Agentic code generation
    code_agent.py              # 7-phase pipeline (think -> validate -> code -> review -> edit -> test -> final)
    hybrid_generate.py         # Jacobi / block-diffusion / speculative generation
    state_cache.py             # Hierarchical SSM state cache for codebases
    tokens.py                  # Phase control tokens
  data/                        # Dataset and tokenizer
  training/                    # Training loop, optimizer
  distillation/                # Teacher model interface (Gemma 4)
  eval/                        # Evaluation and benchmarks
  export/                      # Ternary packing, ONNX, GGUF
scripts/                       # CLI entrypoints
configs/                       # Hydra YAML configs
tests/                         # Unit tests
```
We need help training on more datasets across Indian languages. Each contributor can run one dataset at a time -- the tokenized shards are independent and can be merged.
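Because shards are independent files, merging contributions can be as simple as collecting them into one directory with sequential numbering. A sketch under the assumption that shards follow the `shard_NNNN.pt` naming shown below (this helper is hypothetical, not a project script):

```python
import shutil
import tempfile
from pathlib import Path

def merge_shards(src_dirs, dest_dir):
    """Copy shard_*.pt files from several contributors into one renumbered sequence."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in src_dirs:
        for shard in sorted(Path(src).glob("shard_*.pt")):
            shutil.copy(shard, dest / f"shard_{count:04d}.pt")
            count += 1
    return count

# Tiny demo with placeholder files
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "hindi"), Path(tmp, "tamil")
    for d in (a, b):
        d.mkdir()
        (d / "shard_0000.pt").write_bytes(b"...")
    print(merge_shards([a, b], Path(tmp, "merged")))  # 2
```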
```bash
python scripts/download_datasets.py --list    # see all 21 datasets
python scripts/download_datasets.py --status  # see what's done
python scripts/download_datasets.py --dataset sangraha-hindi
# Outputs: data/pretrain/sangraha-hindi/shard_0000.pt, shard_0001.pt, ...

# Small model (any GPU or M-series Mac)
python scripts/train_small.py --data data/pretrain/sangraha-hindi --device mps

# Full model (24GB+ GPU)
python scripts/step4b_train.py --data data/pretrain/sangraha-hindi --device cuda
```

Open a PR with:

- Your `meta.json` (dataset stats, no raw data)
- Training logs and final checkpoint metrics
- Any issues or improvements found
See CONTRIBUTING.md for full details.
| Dataset | Languages | Est. tokens | License |
|---|---|---|---|
| Sangraha | 12 Indian | ~20B total | CC-BY-4.0 |
| Aya Collection | Multilingual | ~5B | Apache-2.0 |
| Wikipedia | Hindi/Tamil/Bengali | ~550M | CC-BY-SA-3.0 |
| CC-100 | Hindi/Tamil | ~3B | MIT-like |
| StarCoder | Python | ~20B | Terms of Use |
| Model | Params | d_model | Layers | FP32 size | Ternary packed |
|---|---|---|---|---|---|
| Dev | 54M | 768 | 4 | 216MB | ~13MB |
| Medium | 541M | 2816 | 8 | 2.1GB | ~135MB |
| Large | 1B | 2048 | 48 | 4GB | ~250MB |
| Target | 3B | 2560 | 64 | 12GB | ~750MB |
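The packed sizes follow from storing four {-1, 0, +1} weights per byte (2 bits each), a ~16x reduction over FP32. A sketch of such packing (illustrative; the actual format in `export/` may differ, and the weight count is assumed divisible by 4):

```python
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, +1} into 2 bits each, 4 weights per byte."""
    codes = (w + 1).astype(np.uint8).reshape(-1, 4)  # map {-1,0,+1} -> {0,1,2}
    return (codes[:, 0] | (codes[:, 1] << 2) |
            (codes[:, 2] << 4) | (codes[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary: recover int8 weights in {-1, 0, +1}."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return codes.reshape(-1).astype(np.int8) - 1

w = np.random.randint(-1, 2, size=64).astype(np.int8)
assert np.array_equal(unpack_ternary(pack_ternary(w)), w)  # lossless round trip
print(pack_ternary(w).nbytes, "bytes for", w.size, "weights")  # 16 bytes for 64 weights
```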
- Python >= 3.9
- PyTorch >= 2.2
- For Mac: MPS backend (M1/M2/M3/M4)
- For Linux: CUDA or CPU
- No `mamba-ssm` dependency (pure PyTorch implementation)
Apache-2.0