# p99-latency

Here is 1 public repository matching this topic...

Thermal-aware batch controller for vLLM/TensorRT-LLM. Prevents HBM thermal throttling from degrading p99 latency on H100/H200 GPUs. Monitors GPU temperature via nvidia-smi, automatically cuts batch size at 85 °C, and migrates cold KV-cache blocks to DRAM. Prometheus metrics and a Grafana dashboard included. Reported p99 latency at 128K context: 4.2 s -> 2.1 s.

  • Updated Apr 13, 2026
  • Python
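The description above outlines a poll-and-back-off control loop: watch GPU temperature and shrink the serving batch size before the hardware throttles. Below is a minimal sketch of that idea in Python, assuming `pynvml` for temperature readings; the 85 °C cut threshold comes from the description, while the restore threshold, poll interval, and the `set_max_batch` callback into the serving engine are illustrative assumptions, not the repository's actual code or a real vLLM API. Note that `NVML_TEMPERATURE_GPU` reports the core sensor; it is used here as a proxy for HBM temperature.

```python
import time
import pynvml

TEMP_CUT = 85       # deg C: cut batch size near the throttle point (from the repo description)
TEMP_RESTORE = 78   # deg C: assumed hysteresis floor before restoring the normal batch size
POLL_SECONDS = 2.0  # assumed polling interval


def gpu_temperature(handle) -> int:
    """Read the GPU core temperature in Celsius (proxy for HBM temperature)."""
    return pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)


def control_loop(set_max_batch, normal_batch: int, reduced_batch: int) -> None:
    """Poll temperature and switch between normal and reduced batch sizes.

    `set_max_batch` is a hypothetical callback into the serving engine
    (e.g. something that adjusts the scheduler's max batch size); it is
    not an actual vLLM or TensorRT-LLM API.
    """
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    throttled = False
    try:
        while True:
            temp = gpu_temperature(handle)
            if not throttled and temp >= TEMP_CUT:
                set_max_batch(reduced_batch)   # back off before the GPU throttles
                throttled = True
            elif throttled and temp <= TEMP_RESTORE:
                set_max_batch(normal_batch)    # cooled down; restore throughput
                throttled = False
            time.sleep(POLL_SECONDS)
    finally:
        pynvml.nvmlShutdown()
```

The hysteresis gap between the cut and restore thresholds keeps the controller from oscillating when the temperature hovers near 85 °C.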
