
Releases: EricLBuehler/mistral.rs

v0.8.0

02 Apr 18:20
962112d


What's Changed

New Contributors

Full Changelog: v0.7.0...v0.8.0

v0.7.0

28 Jan 06:10
b5af260


Highlights

  • New CLI: mistralrs-cli
  • Prefix Caching: We have implemented Prefix Caching for PagedAttention (#1750). This significantly accelerates multi-turn conversations and RAG workflows by reusing KV cache for shared prompt prefixes.
  • Major model expansion: Support for Embedding Gemma, Qwen 3 Embedding, Gemma 3n, GLM-4, Granite Hybrid MoE, GLM-4 MoE, and GLM-4 MoE Lite
  • Dynamic model loading: The server now supports loading and unloading models at runtime (#1828)
  • Performance: Added support for CUDA 13.0/13.1 (#1767) and introduced highly optimized fused kernels (GEMV, GLU) and blockwise FP8 kernels for significant speedups on NVIDIA GPUs.
  • candle 0.9.2: We have migrated to the official crates.io release of candle 0.9.2, stabilizing our backend dependencies!
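To illustrate the prefix-caching idea from the highlights above, here is a minimal, generic sketch. It assumes a block-based KV cache as in PagedAttention: the prompt is split into fixed-size token blocks, each block is keyed by a hash chained with the previous block's hash, and any leading run of blocks already in the cache can reuse its KV entries instead of recomputing them. This is not the mistral.rs implementation. `PrefixCache`, `BLOCK_SIZE`, `BlockId`, and `lookup_or_insert` are hypothetical names, and each "KV block" is represented only by an integer id.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical block size (tokens per KV-cache block); kept tiny for the demo.
const BLOCK_SIZE: usize = 4;

/// Hypothetical stand-in for a physical KV-cache block.
type BlockId = usize;

/// Hash a block of tokens together with its parent block's hash, so a block
/// only matches when the entire prefix before it also matches.
fn block_hash(parent: u64, tokens: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    parent.hash(&mut h);
    tokens.hash(&mut h);
    h.finish()
}

struct PrefixCache {
    blocks: HashMap<u64, BlockId>,
    next_id: BlockId,
}

impl PrefixCache {
    fn new() -> Self {
        Self { blocks: HashMap::new(), next_id: 0 }
    }

    /// Walks the prompt's full blocks, reusing cached blocks for the shared
    /// prefix and allocating new ones for the rest. Returns the number of
    /// tokens whose KV entries can be reused, plus the block ids in order.
    /// Trailing tokens that do not fill a whole block are not cached.
    fn lookup_or_insert(&mut self, prompt: &[u32]) -> (usize, Vec<BlockId>) {
        let mut parent = 0u64;
        let mut reused_tokens = 0;
        let mut ids = Vec::new();
        for chunk in prompt.chunks_exact(BLOCK_SIZE) {
            let h = block_hash(parent, chunk);
            let id = match self.blocks.get(&h).copied() {
                Some(id) => {
                    // Cache hit: this block's KV entries can be reused.
                    reused_tokens += BLOCK_SIZE;
                    id
                }
                None => {
                    // Cache miss: allocate a new block (its KV would be computed here).
                    let id = self.next_id;
                    self.next_id += 1;
                    self.blocks.insert(h, id);
                    id
                }
            };
            ids.push(id);
            parent = h;
        }
        (reused_tokens, ids)
    }
}

fn main() {
    let mut cache = PrefixCache::new();
    // First request: nothing cached yet (9 tokens = 2 full blocks + 1 leftover).
    let (reused, ids) = cache.lookup_or_insert(&[1, 2, 3, 4, 5, 6, 7, 8, 9]);
    println!("reused {reused} tokens, blocks {ids:?}"); // prints: reused 0 tokens, blocks [0, 1]
    // Second request shares its first 8 tokens, so both cached blocks are reused.
    let (reused, ids) = cache.lookup_or_insert(&[1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]);
    println!("reused {reused} tokens, blocks {ids:?}"); // prints: reused 8 tokens, blocks [0, 1, 2]
}
```

Chaining each block's hash to its parent means a match on a later block implies the whole preceding prefix matched, which is what makes the cached KV entries safe to reuse across multi-turn or RAG requests that share a system prompt or context.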

New Models & Architectures

  • Embedding models: Qwen 3 Embedding, Embedding Gemma
  • Text models: GLM-4, GLM-4.7 Flash, Granite Hybrid, GPT-OSS
  • Vision models: Gemma 3n, Qwen 3 VL & Qwen 3 VL MoE

What's Changed


v0.6.0

10 Jun 23:28


🔥 Highlights from v0.6.0

🚀 Major Features

  • Llama 4 support and Qwen 3 / MoE / VL models, including DeepSeek and DeepCoder integrations
  • Multimodal prefix caching, paged attention scheduler improvements, and faster Metal/CUDA backends
  • Web chat app with chat history, file uploads, speech generation, and revamped tool-calling/search
  • Fast sampler and CPU FlashAttention with improved performance and accuracy
  • Metal and CUDA: major improvements in quantization (AFQ, ISQ), UQFF handling, and memory optimizations
  • MCP (Model Context Protocol): new server endpoints, docs, and integrated client
  • Vision and audio expansion: support for SigLIP, Dia 1.6B TTS, the conformer backbone (Phi-4MM), auto loaders, and vision tool prefixes

🧠 Inference Optimizations

  • Lightning-fast AFQ on CPU, optimized Qwen 3 MoE on Metal, and paged attention fixes
  • Unified FlashAttention backend and automatic method selection for ISQ
  • Metal precompilation support and reduced autorelease thrashing

🧰 Dev Improvements

  • Refactored engine architecture, KV cache, attention backends, and device mapping logic
  • Centralized dependency management and cleaner internal abstractions
  • Streamlined and faster LoRA support

🎉 Other

  • Revamped README, AGENTS.md, and new benchmarking scripts
  • Interactive mode now shows throughput, supports Gumbel sampling, and offers better runtime sampling controls
  • Expanded quant and GGUF support: AWQ, Qwen3 GGUF, and prequantized MLX compatibility

What's Changed


v0.5.0

24 Mar 04:16
7c086a9


Highlights

Blog post: https://huggingface.co/blog/EricB/mistralrs-v0-5-0

Thank you to all contributors for this release! Alongside the highlights below, it includes countless other improvements, fixes, and optimizations.

  • Support for many more models:
    • Gemma 3
    • Qwen 2.5 VL
    • Mistral Small 3.1
    • Phi 4 Multimodal (image only)
  • Native tool calling support for:
    • Llama 3.1/3.2/3.3
    • Mistral Small 3
    • Mistral Nemo
    • Hermes 2 Pro
    • Hermes 3
  • Tensor Parallelism support (NCCL)!
  • FlashAttention V3 support and integration in PagedAttention
  • 30x reduction in ISQ times on Metal!
  • Revamped prefix cacher system

What's Changed


v0.4.0

22 Jan 19:39


New features

  • 🔥 New models!
    • DeepSeek V2
    • DeepSeek V3 and R1
    • MiniCPM-o 2.6
  • 🧮 Imatrix quantization
  • ⚙️ Automatic device mapping
  • BNB (bitsandbytes) quantization
  • Support blockwise FP8 dequantization and FP8 on Metal
  • Integrate the llguidance library (@mmoskal)
  • Metal PagedAttention
  • Many fixes and improvements from contributors!

Breaking changes

  • The Rust device mapping API has changed.

MSRV

The MSRV of this release is 1.83.0.

What's Changed


v0.3.4

28 Nov 19:27
68c078f


New features

  • Qwen2-VL support
  • Idefics 3/SmolVLM support
  • 🔥 6x prompt performance boost (all benchmarks faster than or comparable to MLX and llama.cpp)!
  • 🗂️ More efficient non-PagedAttention KV cache implementation!
  • Public tokenization API

Python wheels

The wheels now include support for Windows, Linux, and Mac with x86_64 and aarch64.

MSRV

1.79.0

What's Changed

New Contributors

Full Changelog: v0.3.2...v0.3.4

v0.3.2

28 Oct 15:44
57a8b03


Key changes

  • General improvements and fixes
  • ISQ FP8
  • GPTQ Marlin
  • 26% performance boost on Metal
  • Python package wheels are available. See below and the various PyPI packages.

What's Changed

New Contributors

Full Changelog: v0.3.1...v0.3.2

v0.3.1

29 Sep 15:39
1caf83a


Highlights

  • UQFF
  • FLUX model
  • Llama 3.2 Vision model

MSRV

The MSRV of this release is 1.79.0.

What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0

02 Sep 17:27
ae71578


Highlights

  • New model topology feature: ISQ and device mapping
  • 🔥Faster FlashAttention support when batching
  • Removed plotly and associated JS dependencies
  • φ³ Support Phi 3.5, Phi 3.5 vision, Phi 3.5 MoE
  • Improved Rust API ergonomics
  • Support multiple (sharded) GGUF files

MSRV

The Rust MSRV of this version is 1.79.0

What's Changed

New Contributors

Full Changelog: v0.2.5...v0.3.0

v0.2.5

16 Aug 01:10
e64a71a


What's Changed

New Contributors

Full Changelog: v0.2.4...v0.2.5

Install mistralrs-server 0.2.5

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.5/mistralrs-server-installer.sh | sh

Download mistralrs-server 0.2.5

File                                               Platform              Checksum
mistralrs-server-aarch64-apple-darwin.tar.xz       Apple Silicon macOS   checksum
mistralrs-server-x86_64-apple-darwin.tar.xz        Intel macOS           checksum
mistralrs-server-x86_64-unknown-linux-gnu.tar.xz   x64 Linux             checksum