Peer-to-peer distributed AI inference using 1-bit quantized models. CPU-only, 70-82% energy savings, 103+ tokens/sec. Validated on Zen 4 & Zen 5 (+35% cross-gen improvement).
Updated Jun 4, 2026 - Python
Navigable Degeneracy in the Roots of 1-Bit Language Models
Windows-native BitNet and ternary LLM inference with CPU GGUF, GPU runtime, terminal and browser chat, and release zips.
Two-piece Rust runtime + preprocessor for running medium-sized (1-7B) public LLMs on modest CPU-only hardware (older x86, ARM, retro). Currently supports BitNet b1.58 2B with a NEON kernel.
Desktop chat app for Microsoft's 1-bit BitNet LLMs. Windows-native, CPU-only, zero dependencies.
Run 8B-parameter 1-bit LLMs on NVIDIA Jetson Nano (CUDA 10.2, SM 5.3). Patched PrismML/llama.cpp fork for Q1_0_g128 Bonsai models.
High-performance hybrid architecture for Agent Zero & BitNet b1.58. Natively optimized for Windows ARM64 (Snapdragon X Elite / Copilot+ PCs) using raw C++ inference and Docker-based agent orchestration.
First 1-bit (BitNet b1.58) recursive reasoner for Sudoku-Extreme, distilled from a 7M-param FP TRM teacher into a 1.4 MB ternary student.