
👋 Hi, I'm Hongping Zhang

Independent AI Researcher | Energy Efficiency & Sustainable Computing


🔬 Research Highlights

πŸ† Energy Efficiency of Quantized LLM Inference

Paper (Draft) | Dashboard | Batch Size Analysis | Metadata

Breakthrough Finding: Discovered that bitsandbytes INT8 increases energy by 17-147% due to mixed-precision decomposition. Causal diagnosis via ablation recovered +79-98% throughput and −35-41% energy across consumer (RTX 4090D) and datacenter (A800) GPUs.
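The decomposition at the root of the finding can be illustrated with a small NumPy sketch. This is illustrative only: the function name is mine, the per-tensor absmax quantization is a simplification of bitsandbytes' row/column-wise scheme, and the real kernels run on GPU.

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Illustrative LLM.int8()-style decomposition (simplified, CPU-only).

    Feature columns of X whose max |value| exceeds `threshold` take a
    higher-precision path; the rest are absmax-quantized to INT8,
    multiplied as integers, then dequantized. With threshold=0.0 the
    decomposition is disabled and everything runs through INT8 (the
    pure-INT8 ablation).
    """
    if threshold > 0.0:
        outliers = np.abs(X).max(axis=0) > threshold
    else:
        outliers = np.zeros(X.shape[1], dtype=bool)  # pure INT8 path

    # Higher-precision path for outlier features
    y_hp = X[:, outliers] @ W[outliers, :]

    # INT8 path: per-tensor absmax quantization (a simplification;
    # the real kernels use row/column-wise scales)
    Xq_in, Wq_in = X[:, ~outliers], W[~outliers, :]
    sx = max(np.abs(Xq_in).max(), 1e-8) / 127 if Xq_in.size else 1.0
    sw = max(np.abs(Wq_in).max(), 1e-8) / 127 if Wq_in.size else 1.0
    xq = np.round(Xq_in / sx).astype(np.int8)
    wq = np.round(Wq_in / sw).astype(np.int8)
    y_int8 = (xq.astype(np.int32) @ wq.astype(np.int32)) * (sx * sw)

    return y_hp + y_int8
```

The extra gather/scatter and the separate higher-precision matmul are exactly the overhead the ablation removes when the threshold is set to 0.0.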

NEW: Batch Size Optimization: A800 sweep (BS 1→64) shows 95.7% energy reduction and 55.5× throughput scaling. BS=1 wastes 55% of GPU capacity. Interactive results: View Dashboard →
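Why batching helps can be seen with back-of-the-envelope arithmetic: per-token energy is average board power divided by token throughput, so a large throughput gain at a modest power increase collapses the joules-per-token figure. The numbers below are illustrative placeholders, not the measured A800 values.

```python
def energy_per_token_j(avg_power_w: float, throughput_tok_s: float) -> float:
    """Joules per generated token = average board power / token throughput."""
    return avg_power_w / throughput_tok_s

# Hypothetical numbers for illustration (not the measured A800 data):
# batching lifts throughput 55.5x while average power merely doubles.
e_bs1 = energy_per_token_j(avg_power_w=150.0, throughput_tok_s=20.0)
e_bs64 = energy_per_token_j(avg_power_w=300.0, throughput_tok_s=20.0 * 55.5)
reduction = 1.0 - e_bs64 / e_bs1  # ~0.96, i.e. ~96% less energy per token
```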

Research Scope: This work focuses on energy efficiency diagnosis. Accuracy assessment (perplexity, downstream tasks) is not yet complete. Pure INT8 (threshold=0.0) shows major performance gains, but accuracy impact requires validation. Next steps: PPL and MMLU evaluation; contributions welcome!

Key Contributions:

  • 🎯 Root cause identified: Mixed-precision decomposition, not INT8 itself
  • 📊 93+ measurements (CV < 1-2%) across 3 GPU architectures: RTX 5090 (Blackwell), RTX 4090D (Ada Lovelace), A800 (Ampere)
  • 📈 Batch size scaling law: 95.7% energy reduction (BS=1→64), validated on A800 with 70 measurements
  • ✅ Cross-platform validation: Consistent results across consumer & datacenter GPUs, multiple models
  • 🔓 Full reproducibility: Complete metadata with software versions, configs, and protocols
  • 🌐 Open data: All raw data, scripts, interactive dashboard, and provenance publicly available

Impact: Prevents the industry from concluding that INT8 quantization is inherently inefficient, and gives practitioners actionable guidance for deploying quantized LLMs in production.

Status: Preparing for arXiv submission. bitsandbytes Issues filed: #1851 (NF4) | #1867 (INT8 Energy)


📊 Research Standards

All my benchmarks follow rigorous reproducibility standards:

  • ✅ Complete metadata with hardware specs, software versions, and model commits
  • ✅ Statistical rigor (n=10, CV < 2%, significance tests)
  • ✅ Open data with full provenance and reproducible scripts
  • ✅ Causal analysis via controlled experiments and ablations
  • ✅ Cross-architecture validation (Blackwell + Ada Lovelace + Ampere)
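The CV gate from the list above takes only a few lines of stdlib Python. The readings here are made-up placeholders standing in for repeated energy measurements.

```python
from statistics import mean, stdev

def coefficient_of_variation(samples):
    """CV = sample standard deviation / mean (dimensionless)."""
    return stdev(samples) / mean(samples)

# Ten hypothetical energy readings in joules from repeated runs (n=10)
readings = [101.2, 100.8, 101.5, 100.9, 101.1,
            101.3, 100.7, 101.0, 101.4, 100.6]
cv = coefficient_of_variation(readings)
assert cv < 0.02  # accept the run only when variation stays under 2%
```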

πŸ“ View Metadata Standards β†’


πŸ› οΈ Tech Stack

Languages: Python, TypeScript, Bash
ML/AI: PyTorch, Transformers, bitsandbytes, CUDA
Data: Pandas, NumPy, SciPy, Matplotlib
Tools: Git, Docker, Jupyter, VS Code
Cloud: AutoDL, AWS (occasional)


📈 Current Projects

🌱 EcoCompute Dynamic Eval

Interactive dashboard for comparing AI models by accuracy, cost, and carbon footprint. Features a Batch Size Analysis page with interactive charts and cost calculator.

Tech: TypeScript, React, Recharts, TailwindCSS, GitHub Pages
Data: 93+ measurements, 8 models, 3 GPU architectures (RTX 5090, RTX 4090D, A800)

🔋 Quantization Energy Research

Systematic study of quantization energy efficiency on modern GPUs. Discovered two paradoxes (NF4 and bitsandbytes INT8) and provided causal diagnosis via ablation. Latest: batch size sweep reveals 95.7% energy reduction potential.

Tech: Python, PyTorch, NVML, bitsandbytes
Impact: Prevents ~30-96% energy waste in production LLM deployments
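Energy in this study is board power integrated over time. A minimal sketch of the integration step follows; the sampling itself would poll NVML (pynvml's `nvmlDeviceGetPowerUsage`, which reports milliwatts), and the function name and fixed interval are assumptions for illustration.

```python
def integrate_energy_j(power_samples_mw, interval_s):
    """Trapezoidal integration of power samples (milliwatts) into joules.

    In a real run the samples come from NVML, e.g. polling
    pynvml.nvmlDeviceGetPowerUsage(handle) at a fixed interval;
    here we only integrate an already-collected list.
    """
    watts = [mw / 1000.0 for mw in power_samples_mw]
    return sum((a + b) / 2.0 * interval_s for a, b in zip(watts, watts[1:]))

# Five samples 500 ms apart, constant 250 W over 2 s -> 500 J
assert abs(integrate_energy_j([250_000] * 5, 0.5) - 500.0) < 1e-9
```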


📫 Contact


💡 Philosophy

I believe in open science and reproducible research. Every benchmark I publish includes:

  • Complete metadata (hardware, software, models)
  • Raw data and analysis scripts
  • Reproduction commands
  • Known issues and resolutions

Goal: Make AI research more transparent, reproducible, and energy-efficient.


📊 GitHub Stats

Hongping's GitHub stats

Top Langs

GitHub Streak


"Measure, don't assume. Reproduce, don't trust. Share, don't hoard."
