1 parent b1786b9 commit d687eca
README.md
@@ -49,6 +49,13 @@ for tok in m.generate("Once upon a time"):
    print(tok, end="", flush=True)
```

+**Longer context with KV compression:**
+```python
+# KV compression is ON by default (kv_compress=1), using ~4x less cache memory.
+# This means you can safely extend context on the same hardware:
+m = Model("llama-3b.gguf", context_length=16384) # 16K context where FP32 only fits 4K
+```
+
Pre-built wheels for Linux x86_64/aarch64, macOS arm64 (Python 3.9-3.13). Other platforms compile from source automatically.

---
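The "16K where FP32 only fits 4K" claim in the added README text follows from the "~4x less cache memory" figure, and a rough back-of-the-envelope check may make that concrete. The sketch below is not part of this commit or of the library: `kv_cache_bytes` is a hypothetical helper, the layer/head dimensions are illustrative values for a 3B-class model rather than values read from `llama-3b.gguf`, and "1 byte per value" is an assumption standing in for "~4x smaller than FP32" (real compressed formats usually carry some extra per-block metadata).

```python
def kv_cache_bytes(context_length, n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per cached position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_length * bytes_per_value

# Hypothetical model shape (assumption, not taken from the GGUF file).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 28, 8, 128

# FP32 cache at 4K context: 4 bytes per stored value.
fp32_4k = kv_cache_bytes(4096, N_LAYERS, N_KV_HEADS, HEAD_DIM, bytes_per_value=4)

# Compressed cache at 16K context, assuming exactly 4x compression (1 byte per value).
compressed_16k = kv_cache_bytes(16384, N_LAYERS, N_KV_HEADS, HEAD_DIM, bytes_per_value=1)

MIB = 1024 ** 2
print(f"FP32 @ 4K:        {fp32_4k / MIB:.1f} MiB")
print(f"compressed @ 16K: {compressed_16k / MIB:.1f} MiB")
```

Under these assumptions the two footprints come out equal, which is the sense in which a 4x smaller cache allows 4x the context on the same hardware. Model weights and activation buffers are not counted here, so actual headroom depends on the full memory budget.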