# r/LocalLLaMA Technical Insights

Technical blog articles distilled from high-scoring r/LocalLLaMA posts (2025). Generated by Nemotron 9B running locally on vLLM.

**Disclaimer:** These articles are based on unverified community information from Reddit. Numbers, benchmarks, and claims are self-reported by the original posters. Always verify before relying on any data.

## Articles

### vLLM & Inference

| # | Article | Reddit Score |
|---|---------|--------------|
| 00 | Don't Waste Electricity Running vLLM — Use This Patch | 303 |
| 05 | Benchmarking LLM Inference Backends: vLLM, LMDeploy, MLC-LLM, TensorRT-LLM | 50 |
| 06 | DeepSeek Open-Sources nano-vLLM | 621 |
| 07 | GH200 Desktop: vLLM Tuning Notes (TP vs PP, max-num-seqs) | 648 |
| 08 | Megakernel Doubles Batch-1 Inference Speed | 73 |

### Quantization & FP8/NVFP4

| # | Article | Reddit Score |
|---|---------|--------------|
| 03 | Software FP8: 3x Speedup Without Hardware Support | 266 |
| 04 | 8+ Hours Benchmarking Every MoE Backend for Qwen3.5-397B NVFP4 | 223 |
| 11 | NVIDIA NVFP4: 4-bit Pretraining Matches FP8 Accuracy | 808 |

### Hardware & Multi-GPU

| # | Article | Reddit Score |
|---|---------|--------------|
| 02 | Dual-GPU Boosts Speed Despite Common Wisdom (5090 vs H100) | 161 |
| 09 | Patched P2P Driver Enables Multi-5090 Systems | 86 |
| 12 | Qwen3-30B FP8 on RTX Pro 6000 Blackwell: 88.4 tok/s | 96 |
| 13 | RTX Pro 6000 vLLM Benchmark: 120B Model Analysis | 173 |

### KV Cache & Optimization

| # | Article | Reddit Score |
|---|---------|--------------|
| 01 | KV Cache RAM Swap is ~10x Faster Than Recomputation | 220 |
| 14 | LMCache: Reuse Non-Prefix KV Cache, 3x RAG Speedup | 127 |

### Setup Guides

| # | Article | Reddit Score |
|---|---------|--------------|
| 10 | Qwen3-Next 80B FP8 on WSL2 + vLLM + Docker (Blackwell) | 86 |

## How This Was Made

1. Downloaded 90K r/LocalLLaMA posts via Arctic Shift
2. Filtered to 485 high-quality technical posts (score >= 20, posted in 2025 or later, tech-signal detection)
3. Selected the 15 posts most relevant to the vLLM / Blackwell / FP8 stack
4. Generated articles with Nemotron 9B Japanese on local vLLM (RTX 5090)
5. Pipeline orchestrated by Claude Code (Opus 4.6)
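The filtering step above could be sketched roughly as follows. This is a minimal illustration, not the repo's actual pipeline: the field names (`score`, `created_utc`, `title`, `selftext`) follow Reddit's standard post JSON, and the keyword check is a simplified stand-in for whatever tech-signal detection was really used.

```python
from datetime import datetime, timezone

# Hypothetical keyword list standing in for the real tech-signal detector.
TECH_KEYWORDS = {"vllm", "fp8", "nvfp4", "quantization", "benchmark", "kv cache"}

def is_high_quality(post: dict, min_score: int = 20, min_year: int = 2025) -> bool:
    """Apply the score / recency / tech-signal filters described above."""
    if post.get("score", 0) < min_score:
        return False
    created = datetime.fromtimestamp(post.get("created_utc", 0), tz=timezone.utc)
    if created.year < min_year:
        return False
    text = (post.get("title", "") + " " + post.get("selftext", "")).lower()
    return any(kw in text for kw in TECH_KEYWORDS)

# Toy data: one post passes, one fails the score cutoff, one is too old.
posts = [
    {"score": 303, "created_utc": 1757000000, "title": "vLLM patch", "selftext": ""},
    {"score": 5, "created_utc": 1757000000, "title": "vLLM question", "selftext": ""},
    {"score": 100, "created_utc": 1500000000, "title": "FP8 tricks", "selftext": ""},
]
filtered = [p for p in posts if is_high_quality(p)]
```

Running this on the toy data keeps only the first post, mirroring how 90K raw posts were cut down to a small high-signal set.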

## Contributing

Found an error? Have additional context? Issues and corrections are welcome!

## License

Articles are derivative of Reddit posts (user-generated content) and are shared for educational purposes.
