TriShakti / IndicBitMamba

1.58-bit ternary Mamba language model for Indian languages.

Weights are {-1, 0, +1} -- inference uses only addition and subtraction, no floating-point multiplies. A 3B model fits in ~750MB and runs at 20+ tokens/sec on mobile.
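The add/sub-only claim can be illustrated with a tiny NumPy sketch: with ternary weights, a matrix-vector product reduces to summing the inputs where the weight is +1 and subtracting those where it is -1. This is illustrative only, not the repo's actual kernel.

```python
import numpy as np

def ternary_matvec(W, x):
    """Multiply a ternary matrix W (entries in {-1, 0, +1}) by vector x
    using only additions and subtractions -- no multiplies."""
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        row = W[i]
        # add inputs where weight is +1, subtract where weight is -1,
        # skip where weight is 0
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

W = np.array([[1, 0, -1], [0, 1, 1]])
x = np.array([2.0, 3.0, 4.0])
print(ternary_matvec(W, x))  # matches W @ x
```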

Architecture

Component       Detail
Base            Mamba SSM (selective state space, no attention)
Quantization    Ternary {-1, 0, +1} via STE, trained from scratch
Languages       22+ Indian languages + English + code
Sizes           54M (dev) / 541M / 1B / 3B (target)
Inference       Zero FP multiply -- add/sub only
Mobile target   20+ tok/s on Snapdragon 8 Gen 2
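"Ternary via STE" means the forward pass uses quantized {-1, 0, +1} weights while gradients flow through to the latent full-precision weights as if no quantization happened. A minimal PyTorch sketch, assuming a per-tensor scale and a sparsity threshold (both illustrative; the repo's exact recipe lives in bitlinear.py):

```python
import torch

def ternary_quantize_ste(w, threshold=0.05):
    """Quantize w to {-1, 0, +1} with a straight-through estimator.
    `threshold` (fraction of the mean magnitude below which a weight
    snaps to 0) is an illustrative choice, not the repo's value."""
    scale = w.abs().mean().clamp(min=1e-8)            # per-tensor scale
    w_t = torch.where(w.abs() < threshold * scale,    # small weights -> 0
                      torch.zeros_like(w),
                      torch.sign(w))                  # rest -> +/-1
    # STE trick: forward value equals w_t, but the detached delta makes
    # gradients bypass the non-differentiable quantization step
    return w + (w_t - w).detach()
```

Calling `ternary_quantize_ste(w).sum().backward()` leaves `w.grad` identical to the unquantized case, which is what lets the ternary model train from scratch.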

Key features

  • Pure PyTorch Mamba -- no CUDA dependency, runs on MPS/CPU/CUDA
  • Hybrid generation -- Jacobi decoding, block diffusion, speculative refinement (zero model changes)
  • Agentic code pipeline -- think (caveman) -> validate -> code -> review -> edit -> test -> final
  • Hierarchical state cache -- encode entire codebases into SSM states for navigation
  • Hallucination dampening -- per-channel confidence gates that attenuate (rather than zero out) noisy channels
  • Distillation from Gemma 4 via LM Studio OpenAI-compatible API
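The "lower, not remove" behavior of the confidence gates can be sketched as a per-channel multiplier bounded away from zero. The `floor` parameter and sigmoid parameterization below are assumptions for illustration, not the design in hallucination_dampener.py:

```python
import torch
import torch.nn as nn

class ConfidenceGate(nn.Module):
    """Illustrative per-channel confidence gate: each channel is scaled
    by a learned value in (floor, 1), so noisy channels are attenuated
    but never fully removed. `floor` is a hypothetical hyperparameter."""
    def __init__(self, d_model, floor=0.1):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(d_model))  # one gate per channel
        self.floor = floor

    def forward(self, x):
        # sigmoid keeps the gate in (0, 1); the floor keeps it above zero
        gate = self.floor + (1 - self.floor) * torch.sigmoid(self.logit)
        return x * gate
```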

Quick start

# Install
pip install -e ".[dev]"

# Run tests
make test

# Train 54M dev model (CPU/MPS)
python scripts/train_small.py --device mps --epochs 10

# Train 541M model
python scripts/step4b_train.py --layers 8 --device mps --batch-size 2 --grad-accum 4

# Evaluate
python scripts/step4c_evaluate.py --checkpoint checkpoints/finetune-indic-small

# Download training data (one dataset per day)
python scripts/download_datasets.py --list
python scripts/download_datasets.py --day 1

Project structure

src/indicbitmamba/
    model/              # Ternary Mamba architecture
        mamba_lm.py         # Full language model
        mamba_block.py       # SSM block (pure PyTorch)
        bitlinear.py         # Ternary linear layer with STE
        quantization.py      # Ternary/binary arithmetic primitives
        hallucination_dampener.py  # Per-channel confidence gates
        ssm_ops.py           # Selective scan (associative scan)
    agent/              # Agentic code generation
        code_agent.py        # 7-phase pipeline (think->validate->code->review->edit->test->final)
        hybrid_generate.py   # Jacobi / block-diffusion / speculative generation
        state_cache.py       # Hierarchical SSM state cache for codebases
        tokens.py            # Phase control tokens
    data/               # Dataset and tokenizer
    training/           # Training loop, optimizer
    distillation/       # Teacher model interface (Gemma 4)
    eval/               # Evaluation and benchmarks
    export/             # Ternary packing, ONNX, GGUF
scripts/                # CLI entrypoints
configs/                # Hydra YAML configs
tests/                  # Unit tests

How to contribute training

We need help training on more datasets across Indian languages. Each contributor can run one dataset at a time -- the tokenized shards are independent and can be merged.

1. Pick a dataset

python scripts/download_datasets.py --list     # see all 21 datasets
python scripts/download_datasets.py --status    # see what's done

2. Download and tokenize

python scripts/download_datasets.py --dataset sangraha-hindi
# Outputs: data/pretrain/sangraha-hindi/shard_0000.pt, shard_0001.pt, ...
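Because shards are independent, merging contributions is just concatenation. A sketch, assuming each `shard_*.pt` file holds a 1-D tensor of token ids (the actual shard schema may differ):

```python
from pathlib import Path

import torch

def load_shards(shard_dir):
    """Load and concatenate tokenized shards from one dataset directory.
    Assumes each shard_*.pt is a 1-D tensor of token ids."""
    shards = sorted(Path(shard_dir).glob("shard_*.pt"))
    return torch.cat([torch.load(p) for p in shards])
```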

3. Train on your hardware

# Small model (any GPU or M-series Mac)
python scripts/train_small.py --data data/pretrain/sangraha-hindi --device mps

# Full model (24GB+ GPU)
python scripts/step4b_train.py --data data/pretrain/sangraha-hindi --device cuda

4. Share results

Open a PR with:

  • Your meta.json (dataset stats, no raw data)
  • Training logs and final checkpoint metrics
  • Any issues or improvements found

See CONTRIBUTING.md for full details.

Datasets

Dataset          Languages             Est. tokens   License
Sangraha         12 Indian             ~20B total    CC-BY-4.0
Aya Collection   Multilingual          ~5B           Apache-2.0
Wikipedia        Hindi/Tamil/Bengali   ~550M         CC-BY-SA-3.0
CC-100           Hindi/Tamil           ~3B           MIT-like
StarCoder        Python                ~20B          Terms of Use

Model sizes

Model    Params   d_model   Layers   FP32 size   Ternary packed
Dev      54M      768       4        216MB       ~13MB
Medium   541M     2816      8        2.1GB       ~135MB
Large    1B       2048      48       4GB         ~250MB
Target   3B       2560      64       12GB        ~750MB
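The packed sizes follow from 2 bits per ternary weight, i.e. four weights per byte (3B x 0.25 bytes ≈ 750MB). A sketch of such packing; the 2-bit code assignment here is illustrative, not necessarily the scheme in export/:

```python
import numpy as np

def pack_ternary(w):
    """Pack ternary values {-1, 0, +1} at 2 bits each, 4 per byte.
    Code assignment (0 -> 0b00, +1 -> 0b01, -1 -> 0b10) is illustrative."""
    codes = np.where(w == -1, 2, np.where(w == 1, 1, 0)).astype(np.uint8)
    pad = (-len(codes)) % 4                      # pad to a multiple of 4
    codes = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)])
    return (codes[0::4] | (codes[1::4] << 2) |
            (codes[2::4] << 4) | (codes[3::4] << 6))

def unpack_ternary(packed, n):
    """Recover the first n ternary values from a packed byte array."""
    codes = np.empty(len(packed) * 4, dtype=np.uint8)
    for i in range(4):
        codes[i::4] = (packed >> (2 * i)) & 0b11
    table = np.array([0, 1, -1, 0], dtype=np.int8)  # inverse code map
    return table[codes[:n]]
```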

Requirements

  • Python >= 3.9
  • PyTorch >= 2.2
  • For Mac: MPS backend (M1/M2/M3/M4)
  • For Linux: CUDA or CPU
  • No mamba-ssm dependency (pure PyTorch implementation)
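Since the implementation is pure PyTorch, backend selection is just standard device probing. A small sketch of the fallback order implied above (function name is ours, not a repo API):

```python
import torch

def pick_device():
    """Prefer CUDA, then Apple MPS, falling back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```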

License

Apache-2.0
