
Commit 0e1c5e3

Engin Mahmut committed: Update README.md

1 parent 3f261f6 commit 0e1c5e3
1 file changed: 15 additions & 24 deletions

README.md

@@ -5,7 +5,7 @@
 [![python](https://img.shields.io/badge/python-3.10%2B-blue)](https://github.com/mindtro/semafold)
 [![license](https://img.shields.io/badge/license-Apache--2.0-green)](LICENSE)

-**Vector-first compression for embeddings, retrieval, and KV-cache workloads.**
+**Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy, no GPU required.**

 Semafold is a vector-first compression toolkit for AI workloads that compresses embeddings, retrieval representations, and cache-shaped KV tensors with explicit byte accounting, typed encode/decode contracts, and validation evidence. It is designed for teams building AI infrastructure that need measurable storage reduction without losing visibility into distortion, artifact size, or integration boundaries.
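
Editorial note: to make "typed encode/decode contracts" and "explicit byte accounting" from the paragraph above concrete, here is a minimal hypothetical sketch of what such a contract can look like in pure NumPy. None of the names (`Artifact`, `encode_fp32`, `decode_fp32`) are Semafold's actual API, and the "codec" is a trivial float16 round-trip used only so the example runs.

```python
# Hypothetical sketch only: Artifact, encode_fp32, and decode_fp32 are NOT
# Semafold's API, and the "codec" here is a trivial float16 round-trip.
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Artifact:
    payload: bytes   # compressed codes
    sidecar: bytes   # whatever is needed to decode (scales, offsets, ...)
    shape: tuple     # original tensor shape
    dtype: str       # original dtype name

    @property
    def nbytes(self) -> int:
        # Explicit byte accounting: payload plus sidecar, nothing hidden.
        return len(self.payload) + len(self.sidecar)


def encode_fp32(x: np.ndarray) -> Artifact:
    # Placeholder codec so the sketch is runnable: store the tensor as float16.
    return Artifact(x.astype(np.float16).tobytes(), b"", x.shape, str(x.dtype))


def decode_fp32(a: Artifact) -> np.ndarray:
    return np.frombuffer(a.payload, dtype=np.float16).astype(a.dtype).reshape(a.shape)


x = np.random.default_rng(0).standard_normal((128, 1536)).astype(np.float32)
art = encode_fp32(x)
x_hat = decode_fp32(art)
print(art.nbytes, "artifact bytes vs", x.nbytes, "baseline bytes")  # 393216 vs 786432
print(x_hat.shape, x_hat.dtype)                                     # (128, 1536) float32
```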

@@ -20,6 +20,17 @@ It gives you:
 - deterministic synthetic validation and benchmarks
 - pure NumPy — no GPU, no CUDA, runs anywhere

+## Compression Results
+
+| Workload | Baseline | Setting | Artifact Size | Smaller | Ratio |
+|---|---:|---|---:|---:|---:|
+| Embedding `128 x 1536` | `float32` `786,432 B` | `TurboQuantMSE 3-bit` | `74,738 B` | `90.50%` | `10.52x` |
+| Embedding `128 x 1536` | `fp16/bf16` `393,216 B` | `TurboQuantMSE 3-bit` | `74,738 B` | `80.99%` | `5.26x` |
+| KV tensor `(4,8,256,128)` | `float32` `8,388,608 B` | `K=Prod 3b, V=MSE 3b` | `885,734 B` | `89.44%` | `9.47x` |
+| KV tensor `(4,8,256,128)` | `fp16/bf16` `4,194,304 B` | `K=Prod 3b, V=MSE 3b` | `885,734 B` | `78.88%` | `4.74x` |
+
+Full benchmark details: [turboquant_benchmark_report.md](benchmarks/turboquant_benchmark_report.md)
+
 Distribution / import names today:
 - distribution: `semafold`
 - import: `semafold`
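
Editorial note: the `Smaller` and `Ratio` columns in the table added above follow directly from the byte counts. A quick sanity check of that arithmetic, assuming the baselines are raw dense byte counts and that the KV rows cover both a K and a V tensor of shape `(4, 8, 256, 128)`; the artifact sizes are the measured values quoted in the table, not recomputed here.

```python
# Sanity check of the Compression Results arithmetic. Assumptions: baselines
# are raw dense byte counts, and the KV rows count both K and V tensors of
# shape (4, 8, 256, 128). Artifact sizes are the measured values from the table.

def summarize(name: str, baseline_bytes: int, artifact_bytes: int) -> None:
    ratio = baseline_bytes / artifact_bytes
    smaller = 100.0 * (1.0 - artifact_bytes / baseline_bytes)
    print(f"{name}: {smaller:.2f}% smaller, {ratio:.2f}x")

emb_elems = 128 * 1536              # embedding batch, 128 x 1536
kv_elems = 2 * 4 * 8 * 256 * 128    # K and V, each (4, 8, 256, 128)

summarize("embedding vs float32", emb_elems * 4, 74_738)  # 90.50% smaller, 10.52x
summarize("embedding vs fp16",    emb_elems * 2, 74_738)  # 80.99% smaller, 5.26x
summarize("KV block vs float32",  kv_elems * 4, 885_734)  # 89.44% smaller, 9.47x
summarize("KV block vs fp16",     kv_elems * 2, 885_734)  # 78.88% smaller, 4.74x
```
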
@@ -202,33 +213,13 @@ Runnable versions of these examples live here:
 - [examples/turboquant_embedding.py](examples/turboquant_embedding.py)
 - [examples/turboquant_kv_block.py](examples/turboquant_kv_block.py)

-## Benchmark Snapshot
+## Benchmark Details

-The current benchmark story is strongest on synthetic, deterministic workloads. The table below summarizes measured outputs from:
+Benchmark runners and detailed report:

 - [turboquant_paper_validation.py](benchmarks/turboquant_paper_validation.py)
 - [turboquant_synthetic_kv_benchmark.py](benchmarks/turboquant_synthetic_kv_benchmark.py)
-- benchmark summary:
-[turboquant_benchmark_report.md](benchmarks/turboquant_benchmark_report.md)
-
-### Representative Results
-
-| Workload | Baseline | Semafold Setting | Artifact Size | Smaller | Ratio |
-|---|---:|---|---:|---:|---:|
-| Embedding batch `128 x 1536` | dense `float32` `786,432 B` | `TurboQuantMSE 3-bit` | `74,738 B` | `90.50%` | `10.52x` |
-| Embedding batch `128 x 1536` | dense `fp16/bf16` `393,216 B` | `TurboQuantMSE 3-bit` | `74,738 B` | `80.99%` | `5.26x` |
-| Embedding batch `128 x 1536` | dense `float32` `786,432 B` | `TurboQuantProd 3 total bits` | `75,255 B` | `90.43%` | `10.45x` |
-| KV tensor `(4, 8, 256, 128)` | dense `float32` `8,388,608 B` | `K=Prod 3 bits, V=MSE 3 bits` | `885,734 B` | `89.44%` | `9.47x` |
-| KV tensor `(4, 8, 256, 128)` | dense `fp16/bf16` `4,194,304 B` | `K=Prod 3 bits, V=MSE 3 bits` | `885,734 B` | `78.88%` | `4.74x` |
-
-### How To Read These Numbers
-
-- `ratio` is baseline bytes divided by measured artifact bytes
-- `smaller` is percentage reduction in total stored bytes
-- artifact size includes payload, sidecars, and metadata
-- larger tensors amortize sidecar overhead better than very small blocks
-
-One honest caveat: very small K/V blocks can still beat dense `float32`, while being only roughly equal to or slightly worse than dense `fp16`. The benchmark report shows that tradeoff explicitly.
+- [turboquant_benchmark_report.md](benchmarks/turboquant_benchmark_report.md)

 ## Benchmarks

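Editorial note: the removed "How To Read These Numbers" bullets above state that artifact size includes payload, sidecars, and metadata. The sketch below is a generic per-row 3-bit uniform quantization round-trip in plain NumPy; it is not Semafold's TurboQuant codec or its artifact format, and only illustrates why a 3-bit payload plus small per-row sidecars lands near the ~74.7 KB artifact reported for the `128 x 1536` float32 embedding batch.

```python
# Generic per-row 3-bit uniform quantization round-trip in pure NumPy.
# This is NOT Semafold's TurboQuant codec or its artifact format; it only
# shows how a 3-bit payload plus per-row sidecars approaches the ~74.7 KB
# artifact reported for a 128 x 1536 float32 embedding batch.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 1536)).astype(np.float32)

bits = 3
levels = 2 ** bits                                          # 8 quantization levels
lo = x.min(axis=1, keepdims=True)                           # per-row offset (sidecar)
scale = (x.max(axis=1, keepdims=True) - lo) / (levels - 1)  # per-row scale (sidecar)

codes = np.clip(np.rint((x - lo) / scale), 0, levels - 1).astype(np.uint8)
x_hat = codes * scale + lo                                  # dequantized reconstruction

payload_bytes = x.size * bits // 8                          # bit-packed 3-bit codes
sidecar_bytes = (lo.size + scale.size) * 4                  # float32 offset + scale per row
print("payload bytes:", payload_bytes)                      # 73728
print("sidecar bytes:", sidecar_bytes)                      # 1024
print("total bytes:", payload_bytes + sidecar_bytes)        # 74752, near the table's 74,738 B
print("rms error:", float(np.sqrt(np.mean((x - x_hat) ** 2))))
```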