A real-time monitoring tool for NVIDIA GPU throttling and performance issues. This tool provides instant visibility into power limits, thermal throttling, and other performance bottlenecks affecting your GPUs.
- Real-time Monitoring - Live updates of GPU status with configurable sampling interval
- Multiple GPU Support - Monitor all GPUs or select specific ones
- Throttle Detection - Identifies and explains all types of throttling:
  - Power brake (hardware power limits)
  - Thermal throttling (overheating protection)
  - Software power caps
  - Driver thermal limits
- Visual History Graph - 40-sample rolling graph showing throttle events
- Problem Descriptions - Clear, actionable explanations of detected issues
- CSV Logging - Optional data export for analysis
- Compact Mode - Automatic adjustment for small terminals
- Two Backends - Uses PyNVML if available, falls back to nvidia-smi
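The two-backend fallback can be sketched as follows. This is a minimal illustration, not the tool's actual source: the helper names (`pick_backend`, `smi_command`, `parse_smi_line`), the set of queried fields, and the returned key names are all assumptions. Only the `nvidia-smi --query-gpu` / `--format=csv,noheader,nounits` options and the `pynvml` module name are real.

```python
import importlib.util
import shutil
from typing import Dict, List, Optional

# Fields a fallback backend might request from nvidia-smi (assumed selection).
QUERY_FIELDS = "index,power.draw,clocks.sm,utilization.gpu,temperature.gpu"


def pick_backend() -> Optional[str]:
    """Prefer pynvml when it is importable, otherwise look for nvidia-smi."""
    # Note: an importable pynvml can still fail at nvmlInit() without a driver.
    if importlib.util.find_spec("pynvml") is not None:
        return "NVML"
    if shutil.which("nvidia-smi") is not None:
        return "nvidia-smi"
    return None


def smi_command() -> List[str]:
    """Command line for one CSV sample per GPU, without header or units."""
    return ["nvidia-smi", "--query-gpu=" + QUERY_FIELDS,
            "--format=csv,noheader,nounits"]


def _num(text: str) -> Optional[float]:
    """Convert a field to float; unsupported metrics (e.g. '[N/A]') become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_smi_line(line: str) -> Dict[str, Optional[float]]:
    """Parse one output line such as '0, 250.30, 1890, 98, 75'."""
    index, power, clock, util, temp = [part.strip() for part in line.split(",")]
    return {
        "index": int(index),
        "power_w": _num(power),
        "sm_clock_mhz": _num(clock),
        "util_pct": _num(util),
        "temp_c": _num(temp),
    }


print(pick_backend())
print(parse_smi_line("0, 250.30, 1890, 98, 75"))
```

Spawning `nvidia-smi` once per sample is slower than calling NVML in-process, which is presumably why the NVML path is preferred when available.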
- Linux system with NVIDIA driver installed
- Python 3.6 or later
- NVIDIA GPU with nvidia-smi support
- Optional: `pynvml` package for better performance
# Install directly from GitHub
pip install git+https://github.com/voipmonitor/nvmonitor.git
# Or install with NVML support for better performance
pip install git+https://github.com/voipmonitor/nvmonitor.git#egg=nvmonitor[nvml]

# Clone the repository
git clone https://github.com/voipmonitor/nvmonitor.git
cd nvmonitor
# Install the package
pip install .
# Or install with NVML support
pip install .[nvml]

# Clone the repository
git clone https://github.com/voipmonitor/nvmonitor.git
cd nvmonitor
# Make the script executable
chmod +x nvmonitor.py
# Run directly
./nvmonitor.py

After installation, you can run the tool from anywhere:
# Monitor all GPUs with default settings (1 second interval)
nvmonitor
# Monitor specific GPUs
nvmonitor --gpus 0,1
# Custom sampling interval (0.5 seconds)
nvmonitor --interval 0.5
# Run for specific duration (60 seconds)
nvmonitor --duration 60
# Save data to CSV file
nvmonitor --csv gpu_log.csv

The monitor shows for each GPU:
- Power Draw - Current power consumption in Watts
- SM Clock - Streaming Multiprocessor clock speed in MHz
- Utilization - GPU compute utilization percentage
- Temperature - Current GPU temperature in Celsius
- History Graph - Visual timeline of throttle events:
  - `·` = Normal operation
  - `█` (red) = Throttling detected
- Status Line - Detailed problem description when issues occur
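A rolling history graph like the one described above can be built from a fixed-length deque. The sketch below is illustrative only: `THROTTLE_BITS`, `render_history`, and the sample masks are assumptions, and the red coloring is omitted.

```python
from collections import deque

HISTORY_LEN = 40  # the README describes a 40-sample rolling graph

# Bits treated as throttling (assumed: the five masks documented in this README).
THROTTLE_BITS = 0x0080 | 0x0040 | 0x0020 | 0x0008 | 0x0004


def render_history(samples):
    """Render one character per sample: '█' if throttled, '·' otherwise."""
    return "".join("█" if throttled else "·" for throttled in samples)


# A deque with maxlen drops the oldest sample automatically when full.
history = deque(maxlen=HISTORY_LEN)

# Made-up masks: 38 normal samples followed by 4 SW-thermal samples.
for mask in [0x0000] * 38 + [0x0020] * 4:
    history.append(bool(mask & THROTTLE_BITS))

print(render_history(history))
```

Because 42 samples were pushed into a 40-slot deque, the two oldest are discarded and the rendered line shows 36 normal samples followed by 4 throttled ones.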
| Problem | Description | Solution |
|---|---|---|
| POWER LIMIT | GPU wants more power but is limited | Check PSU capacity, PCIe cables, increase power limit |
| OVERHEATING | Hardware thermal protection active | Improve cooling, check thermal paste |
| HOT | Driver thermal throttling | Improve airflow, reduce ambient temperature |
| CAPPED | Software power limit reached | Use `nvidia-smi -pl <watts>` to increase |
| THROTTLED | General hardware slowdown | Check for multiple concurrent issues |
--interval FLOAT Sampling interval in seconds (default: 1.0)
--duration FLOAT Run duration in seconds, 0 = infinite (default: 0)
--gpus STRING Comma-separated GPU indices or "all" (default: all)
--csv PATH Save monitoring data to CSV file
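For reference, the options above map naturally onto `argparse`. This is a sketch of an equivalent parser, not the tool's actual code; `build_parser` and `parse_gpus` are hypothetical names, and representing "all" as `None` is an assumption.

```python
import argparse


def parse_gpus(value):
    """Turn '0,1' into [0, 1]; 'all' becomes None, meaning every GPU."""
    if value.strip().lower() == "all":
        return None
    return [int(part) for part in value.split(",")]


def build_parser():
    parser = argparse.ArgumentParser(prog="nvmonitor")
    parser.add_argument("--interval", type=float, default=1.0,
                        help="Sampling interval in seconds")
    parser.add_argument("--duration", type=float, default=0.0,
                        help="Run duration in seconds, 0 = infinite")
    parser.add_argument("--gpus", type=parse_gpus, default=None,
                        help='Comma-separated GPU indices or "all"')
    parser.add_argument("--csv", metavar="PATH", default=None,
                        help="Save monitoring data to CSV file")
    return parser


args = build_parser().parse_args(["--gpus", "0,1", "--interval", "0.5"])
print(args.gpus, args.interval, args.duration, args.csv)
```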
When using --csv, the tool saves:
- Timestamp (ISO format with milliseconds)
- GPU index
- Power draw (W)
- SM clock (MHz)
- GPU utilization (%)
- Temperature (°C)
- Throttle mask (hexadecimal)
- Human-readable problem description
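A log in this shape can be summarized with the standard `csv` module. The column names below (`gpu`, `throttle_mask`, and the rest of the header) are hypothetical, since the exact header is not documented here; check the first line of your own file and adjust. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical header names; adjust to match the real CSV file.
COL_GPU = "gpu"
COL_MASK = "throttle_mask"


def count_throttled(csv_file):
    """Count samples per GPU whose hexadecimal throttle mask is non-zero."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if int(row[COL_MASK], 16) != 0:  # int() accepts a '0x' prefix in base 16
            counts[int(row[COL_GPU])] += 1
    return counts


# Made-up sample data standing in for a file opened with open(path, newline="").
sample = io.StringIO(
    "timestamp,gpu,power_w,sm_clock_mhz,util_pct,temp_c,throttle_mask,problem\n"
    "2024-01-01T12:00:00.000,0,250.3,1890,98,75,0x0020,HOT\n"
    "2024-01-01T12:00:00.000,1,180.5,2100,87,62,0x0000,OK\n"
    "2024-01-01T12:00:01.000,0,251.0,1875,99,76,0x0020,HOT\n"
)
result = count_throttled(sample)
print(dict(result))
```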
GPU Throttle Monitor │ Uptime: 45.2s │ NVML
────────────────────────────────────────────
GPU0: 250.3W │ 1890MHz │ 98% │ 75°C [HOT]
History: ····················██████··········
Status: POWER LIMIT: GPU wants more power but is limited by power delivery
GPU1: 180.5W │ 2100MHz │ 87% │ 62°C
History: ········································
Status: OK: No throttling
- Ensure the NVIDIA driver is installed: `nvidia-smi`
- Check that the driver is loaded: `lsmod | grep nvidia`
- Some older GPUs don't support all metrics
- Virtual GPUs (vGPU) may have limited telemetry
- Install `pynvml` for more accurate data: `pip install pynvml`
The tool monitors NVIDIA's clock event reason masks:
- `0x0080` - HW Power Brake Slowdown
- `0x0040` - HW Thermal Slowdown
- `0x0020` - SW Thermal Slowdown
- `0x0008` - HW Slowdown
- `0x0004` - SW Power Cap
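Since the mask is a bit field, several reasons can be active at once. A minimal sketch of decoding it, using only the bit values listed above (`decode_throttle_mask` is a hypothetical helper name, not part of the tool's documented interface):

```python
# Bit values and names as listed in this README, highest bit first.
REASONS = [
    (0x0080, "HW Power Brake Slowdown"),
    (0x0040, "HW Thermal Slowdown"),
    (0x0020, "SW Thermal Slowdown"),
    (0x0008, "HW Slowdown"),
    (0x0004, "SW Power Cap"),
]


def decode_throttle_mask(mask):
    """Return the names of all documented reasons whose bit is set in mask."""
    return [name for bit, name in REASONS if mask & bit]


print(decode_throttle_mask(0x0044))  # ['HW Thermal Slowdown', 'SW Power Cap']
print(decode_throttle_mask(0x0000))  # []
```

Bits outside this table (for example, idle or application-clock reasons reported by the driver) are ignored by this sketch.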
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit issues or pull requests.
Created for monitoring GPU performance in high-load environments where throttling can impact workload performance.