bitnet-rs

High-performance Rust inference engine for 1-bit BitNet LLMs.

Warning

Pre-alpha (v0.2.1-dev). QK256 uses scalar kernels (~0.1 tok/s on 2B models). GPU backends are scaffolded but not validated. Significant correctness, performance, and validation work remains. Do not use in production.

Quick Start

# Build (always specify features — defaults are empty)
cargo build --locked --no-default-features --features cpu

# Download a model
cargo run --locked -p xtask -- download-model --id microsoft/bitnet-b1.58-2B-4T-gguf

# Run inference
RUST_LOG=warn cargo run --locked -p bitnet-cli --no-default-features --features cpu,full-cli -- run \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
  --prompt "What is 2+2?" --max-tokens 8

# Interactive chat (auto-detects prompt template)
RUST_LOG=warn cargo run --locked -p bitnet-cli --no-default-features --features cpu,full-cli -- chat \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json

See docs/quickstart.md for the full getting-started guide.

Status

| Feature | State | Notes |
|---|---|---|
| CPU inference (I2_S BitNet32) | Working | Production path; SIMD-optimised |
| CPU inference (I2_S QK256) | Working | Scalar only (~0.1 tok/s on 2B); AVX2 foundation merged |
| Cross-validation vs C++ | Working | Per-token logits comparison framework |
| Honest-compute receipts | Working | Schema v1.0.0, 8 validation gates |
| Interactive chat (REPL) | Working | Auto-template detection, `/help`, `/clear`, `/metrics` |
| SafeTensors to GGUF export | Working | `bitnet-st2gguf` with F16 LayerNorm preservation |
| 59+ prompt templates | Working | LLaMA-3, Phi-4, Qwen, Gemma, Mistral, DeepSeek, etc. |
| GPU inference (CUDA) | Scaffold | Feature-gated; receipt validation pending |
| GPU backends (Metal/Vulkan/OpenCL/ROCm) | Scaffold | Stubs only; not validated |
| Server / HTTP API | Incomplete | Health endpoints wired; inference endpoints have TODOs |
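
To make the "scalar only" note above concrete, here is a minimal Rust sketch of a scalar dot product over 2-bit packed ternary weights. It is illustrative only. The code-to-weight mapping (0 → -1, 1 → 0, 2 → +1), the packing of four codes per byte with the lowest bits first, and the function names `unpack4` and `ternary_dot_scalar` are assumptions made for this example, not the I2_S layout or any API in bitnet-rs. Per-block scale factors are omitted.

```rust
/// Decode four 2-bit codes from one byte into ternary weights.
/// Assumed mapping (illustrative, not the bitnet-rs layout):
/// 0 -> -1, 1 -> 0, 2 -> +1; the unused code 3 is treated as 0.
fn unpack4(byte: u8) -> [i8; 4] {
    let mut out = [0i8; 4];
    for (i, w) in out.iter_mut().enumerate() {
        // Codes are read from the lowest two bits upwards.
        let code = (byte >> (2 * i)) & 0b11;
        *w = match code {
            0 => -1,
            2 => 1,
            _ => 0,
        };
    }
    out
}

/// Scalar dot product of packed ternary weights with i8 activations.
/// No multiplications are needed: each weight adds, subtracts, or skips.
fn ternary_dot_scalar(packed: &[u8], activations: &[i8]) -> i32 {
    assert_eq!(packed.len() * 4, activations.len());
    let mut acc = 0i32;
    for (byte, chunk) in packed.iter().zip(activations.chunks_exact(4)) {
        for (w, a) in unpack4(*byte).iter().zip(chunk) {
            match *w {
                1 => acc += i32::from(*a),
                -1 => acc -= i32::from(*a),
                _ => {}
            }
        }
    }
    acc
}
```

Under these assumptions the byte `0b10_01_00_10` decodes to the weights `[1, -1, 0, 1]`, so against the activations `[3, 5, 7, -2]` the result is 3 - 5 - 2 = -4. A SIMD kernel would do the same work on many bytes per instruction, and that is the part the QK256 path currently lacks.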

Architecture

bitnet-tokenizers ──────────────────────────────────────┐
                                                         │
bitnet-models  (GGUF loader, dual I2_S flavor detection) │
  └── bitnet-quantization  (I2_S / TL1 / TL2 / IQ2_S)  │
        └── bitnet-kernels (AVX2 / AVX-512 / NEON / CUDA)│
                                                         ▼
                        bitnet-inference  (autoregressive engine)
                          ├── bitnet-logits / bitnet-sampling / bitnet-generation
                          ├── bitnet-prompt-templates  (59+ variants)
                          └── bitnet-receipts     (honest-compute schema)
                                                         │
                                          ┌──────────────┴──────────────┐
                                     bitnet-cli                  bitnet-server

The workspace contains ~200 crates. See docs/architecture-overview.md for details.

Building

cargo build --locked --no-default-features --features cpu           # CPU (development)
cargo build --locked --no-default-features --features gpu           # GPU (requires CUDA 12.x)
RUSTFLAGS="-C target-cpu=native -C opt-level=3" CARGO_PROFILE_RELEASE_LTO=thin \
  cargo build --locked --release --no-default-features --features cpu,full-cli  # optimised release

Feature flags

| Flag | Purpose |
|---|---|
| `cpu` | SIMD-optimised CPU inference (AVX2/AVX-512/NEON) |
| `gpu` | GPU umbrella (CUDA backend) |
| `full-cli` | Enable all CLI subcommands |
| `ffi` | C++ FFI bridge for cross-validation |
| `fixtures` | GGUF fixture-based integration tests (test-only) |
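
Because the default feature set is empty, code behind `cpu` or `gpu` is not compiled unless the feature is requested. The sketch below shows one common Rust pattern for this kind of gating, combined with runtime SIMD detection from the standard library. The function names `compiled_backend` and `best_simd_level` are hypothetical and not taken from the bitnet-rs source. Only `cfg!` and `std::arch::is_x86_feature_detected!` are real standard-library facilities.

```rust
/// Report which backend feature this crate was compiled with.
/// With `--no-default-features` and no `--features`, neither branch is taken.
#[allow(unexpected_cfgs)] // feature names are illustrative outside a real Cargo manifest
fn compiled_backend() -> &'static str {
    if cfg!(feature = "gpu") {
        "gpu"
    } else if cfg!(feature = "cpu") {
        "cpu"
    } else {
        "none"
    }
}

/// Pick the widest SIMD level the running CPU supports, falling back to scalar.
fn best_simd_level() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Runtime checks: the binary may run on a CPU older than the build host.
        if std::arch::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    if cfg!(target_arch = "aarch64") {
        // NEON is part of the baseline on AArch64 targets.
        return "neon";
    }
    "scalar"
}
```

Compile-time gating keeps unused backends out of the binary, while the runtime check lets a single `cpu` build choose a kernel for the machine it runs on. Building with `-C target-cpu=native` instead ties the binary to the build host's instruction set.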

Nix: nix develop && nix build .#bitnet-cli && nix flake check — see Nix guide.

Testing

# Run all enabled tests (recommended)
cargo nextest run --locked --workspace --no-default-features --features cpu

# CI profile (4 threads, no retries)
cargo nextest run --locked --workspace --no-default-features --features cpu --profile ci

# Lint
cargo fmt --all && cargo clippy --locked --all-targets --no-default-features --features cpu -- -D warnings

The suite has ~58,700 test annotations across unit, property-based (proptest), snapshot (insta), fixture, fuzz (109 targets), and BDD categories. About 2,800 of these tests are intentionally marked #[ignore], each with a justification string. See docs/development/test-suite.md.
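
As a small illustration of the ignore-with-justification convention, the sketch below uses Rust's built-in `#[ignore = "..."]` attribute. The function `saturating_sum`, the test names, and the justification text are all hypothetical and do not come from the bitnet-rs test suite.

```rust
/// Toy function under test; stands in for real kernel code.
fn saturating_sum(values: &[i8]) -> i8 {
    values.iter().fold(0i8, |acc, v| acc.saturating_add(*v))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_small_inputs() {
        assert_eq!(saturating_sum(&[1, 2, 3]), 6);
    }

    // Skipped by default; the string records why the test is ignored.
    #[test]
    #[ignore = "slow: needs a downloaded GGUF model (hypothetical reason)"]
    fn end_to_end_with_real_model() {
        // A real test would load a model here; this body is a placeholder.
        assert_eq!(saturating_sum(&[100, 100]), i8::MAX);
    }
}
```

Ignored tests stay compiled and can still be run on demand, for example with `cargo nextest run --run-ignored all` or `cargo test -- --ignored`.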

Documentation

Organised by Diataxis:

| Section | Contents |
|---|---|
| Tutorials | Getting started, first inference, tokenizer discovery |
| How-to | Install, run inference, export GGUF, cross-validate, validate models |
| Explanation | Architecture, quantization formats, feature flags |
| Reference | Inference CLI, Cross-validation CLI, environment variables, quantization |

Contributing

See CONTRIBUTING.md. Before opening a PR:

./ci/local.sh   # or: cargo fmt --all && cargo clippy --locked ... && cargo nextest run --locked ...

Developer workflow boundary: new internal maintenance commands go in xtask; bitnet-task exists only to preserve legacy scripts/*.sh entrypoints while the migration is in flight.

See ROADMAP.md for project direction.

License

Dual-licensed under MIT and Apache 2.0.
