Measure the energy efficiency of LLM inference across different implementation configurations.
LLenergyMeasure is a Python framework for measuring the energy consumption, throughput, and computational cost (FLOPs) of LLM inference across different deployment configurations. It helps researchers compare the energy efficiency of models, inference engines, and a wide range of implementation decisions — reproducibly and at publication quality.
- Multi-engine inference — Transformers, vLLM, TensorRT-LLM, SGLang (planned)
- GPU energy measurement — NVML, Zeus, CodeCarbon, and other backends
- Smart sweep system — define parameter grids and run Cartesian-product experiments automatically; a managed sweep hierarchy scopes each config field to the appropriate engine or component and prunes invalid combinations before the study runs
- Docker isolation — launches per-experiment containers with full GPU passthrough; up-to-date Docker images for each engine are available in the registry, with full runner configurability, and a local (non-Docker) mode is also supported. Every study pre-flight verifies that each image's `ExperimentConfig` schema fingerprint matches the host's, aborting with an actionable rebuild hint on drift (`llem doctor` for a one-shot check)
- Reproducibility — fixed seeds, cycle ordering, thermal management, environment snapshots, and the effective config recorded for every run
- Built-in datasets — AI Energy Score benchmark prompts included; custom JSONL datasets also supported
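As a rough illustration of the sweep system, a study file might define a grid like the sketch below. The field names and layout here are assumptions for illustration only, not the framework's actual schema; see the Study & Experiment Configuration guide for the real YAML reference.

```yaml
# study.yaml — illustrative sketch only; field names are assumed,
# not taken from the actual LLenergyMeasure schema.
study:
  name: batch-size-sweep
  sweep:
    model: [gpt2, facebook/opt-1.3b]
    engine: [transformers, vllm]
    batch_size: [1, 8, 32]
# The Cartesian product yields 2 x 2 x 3 = 12 candidate experiments;
# combinations that are invalid for a given engine are pruned
# automatically before the study runs.
```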
```bash
pip install "llenergymeasure[transformers]"
```

Run your first measurement:

```bash
llem run --model gpt2 --engine transformers
```

See Installation for system requirements, Docker setup, and available extras. See Getting Started to run and interpret your first experiment.
| Guide | Description |
|---|---|
| Installation | System requirements, pip install, Docker setup path |
| Getting Started | First experiment, Transformers and Docker tracks |
| Docker Setup | NVIDIA Container Toolkit walkthrough for vLLM |
| Engine Configuration | Transformers vs vLLM, parameter support matrix |
| Study & Experiment Configuration | YAML reference, sweeps, config schema |
| CLI Reference | llem run, llem config, and llem doctor flags and options |
| Energy Measurement | NVML, Zeus, CodeCarbon backends, measurement mechanics |
| Measurement Methodology | Warmup, baseline, thermal management, reproducibility |
| Troubleshooting | Common issues, invalid combinations, getting help |
Guides for non-technical readers:

| Guide | Description |
|---|---|
| What We Measure | Plain-language explanation of energy, throughput, and FLOPs |
| Interpreting Results | How to read llenergymeasure output |
| Getting Started (Policy Maker) | Minimal path to running a measurement |
| Comparison with Other Benchmarks | MLPerf, AI Energy Score, CodeCarbon, Zeus context |
Contributions welcome. See the development install instructions to set up a local environment.