
Commit e66ea13

unamedkr and claude committed
docs/pr: update Reddit draft to v0.9.2 + awesome list PR templates
Reddit draft updated: title uses "The SQLite of LLMs", body uses Model.from_pretrained("Llama-3.2-1B") (better quality demo), v0.9.2 version, KV compression on by default messaging. New: awesome-list-prs.md with ready-to-submit entries for 4 curated lists (awesome-cpp 42K, awesome-production-ml 17K, awesome-llm 5K, awesome-quantization 1K). Each entry formatted per list conventions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6b781eb commit e66ea13

File tree

2 files changed (+71, -6 lines)

awesome-list-prs.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# Awesome List PR Submissions

Submit to 3-4 curated lists for sustained organic discovery.

---

## 1. awesome-cpp (42K stars)

https://github.com/fffaraz/awesome-cpp

**Section:** Artificial Intelligence

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - Single-header (16K LOC) LLM inference engine with KV cache compression. Zero dependencies. `pip install quantcpp`. [Apache-2.0]
```

**PR title:** `Add quant.cpp — single-header LLM inference with KV compression`

---

## 2. awesome-production-machine-learning (17K stars)

https://github.com/EthicalML/awesome-production-machine-learning

**Section:** Model Serving and Monitoring → Optimization Tools

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - Single-header C engine for LLM inference with KV cache compression (4-7x memory reduction). Zero deps, runs on iOS/Android/WASM/microcontrollers. PyPI: `pip install quantcpp`.
```

---

## 3. awesome-llm (5K+ stars)

https://github.com/Hannibal046/Awesome-LLM

**Section:** Tools for LLM Inference

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - "The SQLite of LLMs" — single-header (16K LOC, 646KB) C inference engine with built-in KV cache compression. 7 quantization types from TurboQuant/PolarQuant/QJL papers. `pip install quantcpp`.
```

---

## 4. awesome-quantization (1K+ stars)

https://github.com/htqin/awesome-model-quantization

**Section:** Inference Engines / Frameworks

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - Pure C reference implementation for KV cache quantization research. Implements TurboQuant (ICLR 2026), PolarQuant, QJL in a single-header library. 7 KV quant types with reproducible benchmarks. `pip install quantcpp`.
```

---

## Submission checklist

- [ ] Fork each repo
- [ ] Add entry in alphabetical order within the section
- [ ] PR title: concise, starts with "Add"
- [ ] PR body: 2-3 sentence description + link to PyPI + note "zero dependencies"
- [ ] Check each list's CONTRIBUTING.md for formatting requirements
- [ ] Submit all 4 PRs on the same day (cross-visibility)

docs/pr/2026-04-09-reddit-v081-pip-install.md

Lines changed: 7 additions & 6 deletions
@@ -1,26 +1,27 @@
-# Reddit r/LocalLLaMA — quantcpp v0.8.1 + `pip install` (EN)
+# Reddit r/LocalLLaMA — quantcpp v0.9.2 + `pip install` (EN)
 
-**Suggested title:** `[Project] quantcpp 0.8.1 — single-header KV-compressed LLM engine, now on PyPI`
+**Suggested title:** `[Project] quantcpp — "The SQLite of LLMs". Add AI to any C project with one 16K-line file. Now on PyPI.`
 
 **Suggested flair:** `Resources` or `Other`
 
 ---
 
 ## Body
 
-We just shipped **quantcpp 0.8.1** — a single-header C inference engine focused on **KV cache compression research**, now installable from PyPI:
+We just shipped **quantcpp 0.9.2** — a single-header C inference engine that you can `pip install` and use in 3 lines:
 
 ```bash
 pip install quantcpp
 ```
 
 ```python
 from quantcpp import Model
-m = Model("model.gguf")
-print(m.ask("What is 2+2?"))
+
+m = Model.from_pretrained("Llama-3.2-1B")  # auto-downloads ~750MB GGUF
+print(m.ask("What is gravity?"))
 ```
 
-Pre-built wheels for Linux x86_64, Linux aarch64, macOS arm64 (CPython 3.9–3.13). Other platforms fall back to source distribution and compile `quant.h` automatically — zero runtime dependencies.
+No API key, no GPU, no configuration. Model downloads once, cached locally. KV cache compression is on by default (4-bit, ~4x memory reduction). Pre-built wheels for Linux x86_64/aarch64, macOS arm64 (Python 3.9–3.13).
 
 ### What it is
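The "~4x memory reduction" claim in the new body text is just the 16-bit to 4-bit ratio. A back-of-envelope check, using the published Llama-3.2-1B shape (16 layers, 8 KV heads, head dim 64) as illustrative inputs; real 4-bit schemes also store per-group scales, so the achieved ratio lands slightly under 4x:

```python
# Back-of-envelope KV cache sizing for the ~4x compression claim.
# Model dimensions are the published Llama-3.2-1B config (illustrative).
layers, kv_heads, head_dim = 16, 8, 64
seq_len = 8192  # tokens held in the cache

def kv_cache_bytes(bits_per_value: float) -> float:
    # 2x accounts for storing both keys and values per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bits_per_value / 8

fp16 = kv_cache_bytes(16)  # uncompressed baseline
q4 = kv_cache_bytes(4)     # 4-bit quantized cache
print(f"fp16: {fp16 / 2**20:.0f} MiB, 4-bit: {q4 / 2**20:.0f} MiB, "
      f"ratio {fp16 / q4:.0f}x")
# prints: fp16: 256 MiB, 4-bit: 64 MiB, ratio 4x
```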