manishklach / thermal-ctrl-harness

Thermal-aware batch controller for vLLM/TensorRT-LLM. Prevents HBM thermal throttling from degrading p99 latency on H100/H200 GPUs: it monitors GPU temperature via nvidia-smi, automatically cuts the serving batch size at 85 °C, and migrates cold KV-cache blocks to DRAM. Prometheus metrics and Grafana dashboards are included. Reported result: p99 latency reduced from 4.2 s to 2.1 s at 128K context.

Topics: hpc, gpu, grafana, cuda, transformers, prometheus, pytorch, nvidia, sre, hbm, mlops, inference-optimization, h200, kv-cache, vllm, thermal-throttling, llm-inference, h100, tensorrt-llm, p99-latency

Updated Apr 13, 2026 · Python
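The control loop the description implies (poll temperature, cut batch size when hot) can be sketched roughly as below. This is a minimal illustration, not the repo's actual code: the class name, the halving/doubling policy, the 78 °C recovery threshold, and the injected temperature reader (standing in for a real `nvidia-smi --query-gpu=temperature.gpu` subprocess call) are all assumptions; only the 85 °C cut threshold comes from the description.

```python
class ThermalBatchController:
    """Hypothetical sketch: shrink batch size when the GPU runs hot,
    grow it back once it cools (hysteresis avoids oscillation)."""

    def __init__(self, read_temp_c, max_batch=64, min_batch=4,
                 hot_c=85.0, cool_c=78.0):
        # read_temp_c: callable returning current GPU temperature in °C.
        # In a real harness this would parse
        #   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits
        self.read_temp_c = read_temp_c
        self.max_batch = max_batch
        self.min_batch = min_batch
        self.hot_c = hot_c    # cut batch size at/above this temperature
        self.cool_c = cool_c  # restore batch size at/below this one
        self.batch = max_batch

    def step(self):
        """One control tick: halve the batch when hot, double when cool."""
        t = self.read_temp_c()
        if t >= self.hot_c:
            self.batch = max(self.min_batch, self.batch // 2)
        elif t <= self.cool_c and self.batch < self.max_batch:
            self.batch = min(self.max_batch, self.batch * 2)
        return self.batch
```

A quick dry run with a fake temperature trace:

```python
temps = iter([70, 86, 90, 80, 75, 70])
ctrl = ThermalBatchController(lambda: next(temps))
print([ctrl.step() for _ in range(6)])  # -> [64, 32, 16, 16, 32, 64]
```

The asymmetric thresholds are a common design choice for this kind of controller: a single cut point would flap the batch size as the die temperature hovers around 85 °C.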