Independent AI Researcher | Energy Efficiency & Sustainable Computing
Paper (Draft) | Dashboard | Batch Size Analysis | Metadata
Breakthrough Finding: Discovered that bitsandbytes INT8 increases energy by 17-147% due to mixed-precision decomposition. Causal diagnosis via ablation recovered +79-98% throughput and cut energy by 35-41% across consumer (RTX 4090D) and datacenter (A800) GPUs.
NEW: Batch Size Optimization: A800 sweep (BS 1→64) shows a 95.7% energy reduction and 55.5× throughput scaling. BS=1 wastes 55% of GPU capacity. Interactive results: View Dashboard →
Research Scope: This work focuses on energy-efficiency diagnosis. Accuracy assessment (perplexity, downstream tasks) is not yet complete. Pure INT8 (threshold=0.0) shows major performance gains, but its accuracy impact requires validation. Next steps: PPL and MMLU evaluation; contributions welcome!
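For concreteness, here is a minimal sketch of the two configurations the ablation compares, using the transformers/bitsandbytes integration. `llm_int8_threshold=6.0` is the library default (mixed-precision decomposition with an FP16 outlier path); setting it to 0.0 disables the outlier path, i.e. pure INT8. The model ID and load call in the comment are illustrative.

```python
# Sketch: default mixed-precision INT8 vs. the pure-INT8 ablation.
from transformers import BitsAndBytesConfig

# Library default: outliers above threshold 6.0 run in FP16.
mixed_precision = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

# Ablation: threshold 0.0 disables the FP16 outlier path ("pure INT8").
pure_int8 = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=0.0)

# Use via (model ID illustrative):
# AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
#                                      quantization_config=pure_int8)
```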
Key Contributions:
- Root cause identified: Mixed-precision decomposition, not INT8 itself
- 93+ measurements (CV < 1-2%) across 3 GPU architectures: RTX 5090 (Blackwell), RTX 4090D (Ada Lovelace), A800 (Ampere)
- Batch size scaling law: 95.7% energy reduction (BS=1→64), validated on A800 with 70 measurements
- Cross-platform validation: Consistent results across consumer & datacenter GPUs, multiple models
- Full reproducibility: Complete metadata with software versions, configs, and protocols
- Open data: All raw data, scripts, interactive dashboard, and provenance publicly available
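The batch-size numbers above combine into a simple per-token energy model: energy per token is average power divided by throughput. A short sketch, using only the two headline ratios from the A800 sweep (the implied power ratio is derived from them, not separately measured):

```python
# Energy per token falls when batching amortizes near-fixed GPU power draw.

def energy_per_token(avg_power_w: float, throughput_tok_s: float) -> float:
    """Joules per generated token: E = P / T."""
    return avg_power_w / throughput_tok_s

throughput_scale = 55.5   # measured: BS=64 vs BS=1 throughput
energy_reduction = 0.957  # measured: per-token energy reduction, BS=1 -> 64

# Since E = P / T, the power ratio implied by the two measured ratios is:
implied_power_scale = throughput_scale * (1 - energy_reduction)
print(f"BS=64 draws ~{implied_power_scale:.2f}x the power of BS=1")
```

In other words, a ~2.4× rise in power buys a 55.5× rise in throughput, which is where the 95.7% per-token saving comes from.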
Impact: Prevents industry from drawing wrong conclusions about INT8 quantization. Provides actionable guidance for practitioners deploying quantized LLMs in production.
Status: Preparing for arXiv submission. bitsandbytes Issues filed: #1851 (NF4) | #1867 (INT8 Energy)
All my benchmarks follow rigorous reproducibility standards:
- Complete metadata with hardware specs, software versions, and model commits
- Statistical rigor (n=10, CV < 2%, significance tests)
- Open data with full provenance and reproducible scripts
- Causal analysis via controlled experiments and ablations
- Cross-architecture validation (Blackwell + Ada Lovelace + Ampere)
View Metadata Standards →
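The CV < 2% acceptance gate above can be checked with the standard coefficient of variation (sample standard deviation over mean) across the n=10 repeats. A minimal sketch; the energy readings below are hypothetical, not measured values:

```python
# Stability gate for repeated benchmark runs: CV = stdev / mean < 2%.
import statistics

def coefficient_of_variation(samples: list[float]) -> float:
    """Sample std dev divided by mean; dimensionless run-to-run spread."""
    return statistics.stdev(samples) / statistics.mean(samples)

# n=10 per-run energy readings in joules (hypothetical illustration).
runs_joules = [101.2, 99.8, 100.5, 100.9, 99.6,
               100.1, 100.7, 99.9, 100.3, 100.4]

cv = coefficient_of_variation(runs_joules)
assert cv < 0.02, "run-to-run variation exceeds the 2% acceptance threshold"
```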
Languages: Python, TypeScript, Bash
ML/AI: PyTorch, Transformers, bitsandbytes, CUDA
Data: Pandas, NumPy, SciPy, Matplotlib
Tools: Git, Docker, Jupyter, VS Code
Cloud: AutoDL, AWS (occasional)
Interactive dashboard for comparing AI models by accuracy, cost, and carbon footprint. Features a Batch Size Analysis page with interactive charts and cost calculator.
Tech: TypeScript, React, Recharts, TailwindCSS, GitHub Pages
Data: 93+ measurements, 8 models, 3 GPU architectures (RTX 5090, RTX 4090D, A800)
Systematic study of quantization energy efficiency on modern GPUs. Discovered two paradoxes (NF4 and bitsandbytes INT8) and provided causal diagnosis via ablation. Latest: batch size sweep reveals 95.7% energy reduction potential.
Tech: Python, PyTorch, NVML, bitsandbytes
Impact: Prevents ~30-96% energy waste in production LLM deployments
- Email: zhanghongping1982@gmail.com
- GitHub: @hongping-zh
- Dashboard: ecocompute-dynamic-eval
I believe in open science and reproducible research. Every benchmark I publish includes:
- Complete metadata (hardware, software, models)
- Raw data and analysis scripts
- Reproduction commands
- Known issues and resolutions
Goal: Make AI research more transparent, reproducible, and energy-efficient.
"Measure, don't assume. Reproduce, don't trust. Share, don't hoard."


