Update README.md to include RabbitLLM logo, clarify compatibility with Qwen2 and Qwen3 models, and provide detailed architecture support status. Remove outdated macOS installation instructions and enhance model loading examples for better user guidance.
RabbitLLM is a **fork of [AirLLM](https://github.com/airllm/airllm)**. It enables inference on large language models (70B+ parameters) on consumer GPUs with as little as 4GB VRAM by streaming model layers one at a time through GPU memory. No quantization, distillation, or pruning needed — full model quality.
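The streaming idea can be sketched in plain NumPy (a toy illustration of the technique, not RabbitLLM's actual loader): shard the model so each layer's weights live in their own file, then load, apply, and free one layer at a time, so peak weight memory is a single layer rather than the whole model.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
hidden, n_layers = 64, 8
shard_dir = tempfile.mkdtemp()

# "Shard" the model: persist one weight matrix per layer to disk.
for i in range(n_layers):
    w = rng.standard_normal((hidden, hidden)) / np.sqrt(hidden)
    np.save(os.path.join(shard_dir, f"layer_{i}.npy"), w)

# Inference: only one layer's weights are resident at any moment.
x = rng.standard_normal((1, hidden))
for i in range(n_layers):
    w = np.load(os.path.join(shard_dir, f"layer_{i}.npy"))  # load one layer
    x = np.tanh(x @ w)                                      # run it
    del w                                                   # free it before the next

print(x.shape)
```

Only the small activation tensor `x` persists across layers; the trade-off is extra disk I/O per layer, which is why throughput is lower than keeping the full model in VRAM.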
### Compatibility (current status)
- **Tested and supported:** only **Qwen2** and **Qwen3** are currently tested and compatible. Use these families for reliable results.
- **Other architectures** (Llama, Mistral, Mixtral, etc.) are present in the codebase but **not yet compatible** — use at your own risk.
- **Apple (macOS / Apple Silicon)** is **not supported**; run on Linux or Windows with a CUDA-capable GPU (or CPU fallback on x86/ARM Linux).
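You can guard against untested architectures before loading a checkpoint by reading its `config.json`. This is a hypothetical helper, not part of RabbitLLM's API; the `model_type` values follow Hugging Face config conventions.

```python
# Architectures currently tested with RabbitLLM.
TESTED_ARCHITECTURES = {"qwen2", "qwen3"}

def is_tested(config: dict) -> bool:
    """Return True if a Hugging Face config.json dict names a tested architecture."""
    return config.get("model_type", "").lower() in TESTED_ARCHITECTURES

print(is_tested({"model_type": "qwen2"}))  # True
print(is_tested({"model_type": "llama"}))  # False
```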
## How it works
If the prebuilt wheel is unavailable for your setup, install from source.
```python
from rabbitllm import AutoModel
model = AutoModel.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct") # or any Qwen2 / Qwen3
input_tokens = model.tokenizer(
["What is the capital of France?"],
    return_tensors="pt",
)
```

`AutoModel` inspects the checkpoint and selects the matching model class, so there is no need to pick the right class manually.