
Commit e66ea13

unamedkr and claude committed
docs/pr: update Reddit draft to v0.9.2 + awesome list PR templates
Reddit draft updated: title uses "The SQLite of LLMs", body uses Model.from_pretrained("Llama-3.2-1B") (better quality demo), v0.9.2 version, KV compression on by default messaging. New: awesome-list-prs.md with ready-to-submit entries for 4 curated lists (awesome-cpp 42K, awesome-production-ml 17K, awesome-llm 5K, awesome-quantization 1K). Each entry formatted per list conventions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6b781eb commit e66ea13

File tree

2 files changed (+71, -6 lines)

awesome-list-prs.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# Awesome List PR Submissions

Submit to 3-4 curated lists for sustained organic discovery.

---

## 1. awesome-cpp (42K stars)

https://github.com/fffaraz/awesome-cpp

**Section:** Artificial Intelligence

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - Single-header (16K LOC) LLM inference engine with KV cache compression. Zero dependencies. `pip install quantcpp`. [Apache-2.0]
```

**PR title:** `Add quant.cpp — single-header LLM inference with KV compression`

---

## 2. awesome-production-machine-learning (17K stars)

https://github.com/EthicalML/awesome-production-machine-learning

**Section:** Model Serving and Monitoring → Optimization Tools

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - Single-header C engine for LLM inference with KV cache compression (4-7x memory reduction). Zero deps, runs on iOS/Android/WASM/microcontrollers. PyPI: `pip install quantcpp`.
```

---

## 3. awesome-llm (5K+ stars)

https://github.com/Hannibal046/Awesome-LLM

**Section:** Tools for LLM Inference

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - "The SQLite of LLMs" — single-header (16K LOC, 646KB) C inference engine with built-in KV cache compression. 7 quantization types from TurboQuant/PolarQuant/QJL papers. `pip install quantcpp`.
```

---

## 4. awesome-quantization (1K+ stars)

https://github.com/htqin/awesome-model-quantization

**Section:** Inference Engines / Frameworks

**Entry:**

```markdown
* [quant.cpp](https://github.com/quantumaikr/quant.cpp) - Pure C reference implementation for KV cache quantization research. Implements TurboQuant (ICLR 2026), PolarQuant, QJL in a single-header library. 7 KV quant types with reproducible benchmarks. `pip install quantcpp`.
```

---

## Submission checklist

- [ ] Fork each repo
- [ ] Add entry in alphabetical order within the section
- [ ] PR title: concise, starts with "Add"
- [ ] PR body: 2-3 sentence description + link to PyPI + note "zero dependencies"
- [ ] Check each list's CONTRIBUTING.md for formatting requirements
- [ ] Submit all 4 PRs on the same day (cross-visibility)

docs/pr/2026-04-09-reddit-v081-pip-install.md

Lines changed: 7 additions & 6 deletions
@@ -1,26 +1,27 @@
-# Reddit r/LocalLLaMA — quantcpp v0.8.1 + `pip install` (EN)
+# Reddit r/LocalLLaMA — quantcpp v0.9.2 + `pip install` (EN)
 
-**Suggested title:** `[Project] quantcpp 0.8.1 — single-header KV-compressed LLM engine, now on PyPI`
+**Suggested title:** `[Project] quantcpp — "The SQLite of LLMs". Add AI to any C project with one 16K-line file. Now on PyPI.`
 
 **Suggested flair:** `Resources` or `Other`
 
 ---
 
 ## Body
 
-We just shipped **quantcpp 0.8.1** — a single-header C inference engine focused on **KV cache compression research**, now installable from PyPI:
+We just shipped **quantcpp 0.9.2** — a single-header C inference engine that you can `pip install` and use in 3 lines:
 
 ```bash
 pip install quantcpp
 ```
 
 ```python
 from quantcpp import Model
-m = Model("model.gguf")
-print(m.ask("What is 2+2?"))
+
+m = Model.from_pretrained("Llama-3.2-1B")  # auto-downloads ~750MB GGUF
+print(m.ask("What is gravity?"))
 ```
 
-Pre-built wheels for Linux x86_64, Linux aarch64, macOS arm64 (CPython 3.9–3.13). Other platforms fall back to source distribution and compile `quant.h` automatically — zero runtime dependencies.
+No API key, no GPU, no configuration. Model downloads once, cached locally. KV cache compression is on by default (4-bit, ~4x memory reduction). Pre-built wheels for Linux x86_64/aarch64, macOS arm64 (Python 3.9–3.13).
 
 ### What it is
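The "~4x memory reduction" claim in the new body text is just the 16-bit to 4-bit ratio. A back-of-envelope check, using the published Llama-3.2-1B shape (16 layers, 8 KV heads, head dim 64) as illustrative inputs; real 4-bit schemes also store per-group scales, so the achieved ratio lands slightly under 4x:

```python
# Back-of-envelope KV cache sizing for the ~4x compression claim.
# Model dimensions are the published Llama-3.2-1B config (illustrative).
layers, kv_heads, head_dim = 16, 8, 64
seq_len = 8192  # tokens held in the cache

def kv_cache_bytes(bits_per_value: float) -> float:
    # 2x accounts for storing both keys and values per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bits_per_value / 8

fp16 = kv_cache_bytes(16)  # uncompressed baseline
q4 = kv_cache_bytes(4)     # 4-bit quantized cache
print(f"fp16: {fp16 / 2**20:.0f} MiB, 4-bit: {q4 / 2**20:.0f} MiB, "
      f"ratio {fp16 / q4:.0f}x")
# prints: fp16: 256 MiB, 4-bit: 64 MiB, ratio 4x
```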