Commit d687eca

docs: README context_length example
1 parent b1786b9 commit d687eca

1 file changed: +7 -0 lines changed

README.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -49,6 +49,13 @@ for tok in m.generate("Once upon a time"):
     print(tok, end="", flush=True)
 ```
 
+**Longer context with KV compression:**
+```python
+# KV compression is ON by default (kv_compress=1), using ~4x less cache memory.
+# This means you can safely extend context on the same hardware:
+m = Model("llama-3b.gguf", context_length=16384)  # 16K context where FP32 only fits 4K
+```
+
 Pre-built wheels for Linux x86_64/aarch64, macOS arm64 (Python 3.9-3.13). Other platforms compile from source automatically.
 
 ---
````
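
The "16K context where FP32 only fits 4K" comment in the added README lines follows from the fact that KV cache size grows linearly with context length, so a roughly 4x smaller cache per token leaves room for roughly 4x more tokens. The sketch below checks that arithmetic. It is not part of the commit: `kv_cache_bytes` is a hypothetical helper, the layer and head counts are illustrative assumptions rather than values read from `llama-3b.gguf`, and the compression ratio is taken to be exactly 4x where the README only says "~4x".

```python
def kv_cache_bytes(context_length, n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Estimate KV cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_length * bytes_per_value


# Illustrative dimensions only (assumed, not taken from the actual model file).
DIMS = dict(n_layers=26, n_kv_heads=32, head_dim=100)

# Uncompressed FP32 cache (4 bytes per value) at a 4K context.
fp32_4k = kv_cache_bytes(4096, bytes_per_value=4.0, **DIMS)

# Compressed cache at a 16K context, assuming exactly 4x fewer bytes per value.
compressed_16k = kv_cache_bytes(16384, bytes_per_value=4.0 / 4, **DIMS)

print(f"FP32, 4K context:        {fp32_4k / 2**20:.0f} MiB")
print(f"Compressed, 16K context: {compressed_16k / 2**20:.0f} MiB")
```

Under these assumptions the two footprints are equal regardless of the model dimensions chosen, since only the product of context length and bytes per value changes. The estimate covers the KV cache alone and ignores weights, activations, and any per-block overhead the compression scheme may add.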
