NVIDIA GPU Throttle Monitor


A real-time monitoring tool for NVIDIA GPU throttling and performance issues. This tool provides instant visibility into power limits, thermal throttling, and other performance bottlenecks affecting your GPUs.

Features

  • Real-time Monitoring - Live updates of GPU status with configurable sampling interval
  • Multiple GPU Support - Monitor all GPUs or select specific ones
  • Throttle Detection - Identifies and explains all types of throttling:
    • Power brake (hardware power limits)
    • Thermal throttling (overheating protection)
    • Software power caps
    • Driver thermal limits
  • Visual History Graph - 40-sample rolling graph showing throttle events
  • Problem Descriptions - Clear, actionable explanations of detected issues
  • CSV Logging - Optional data export for analysis
  • Compact Mode - Automatic adjustment for small terminals
  • Two Backends - Uses PyNVML if available, falls back to nvidia-smi (a backend-selection sketch follows this list)
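
The sketch below shows one way such a PyNVML-first, nvidia-smi-fallback selection can be structured. It is not nvmonitor's actual code; the helper names are illustrative, and only standard pynvml and nvidia-smi query fields are used.

# Sketch: PyNVML-first backend selection with an nvidia-smi fallback.
# Function names are illustrative, not nvmonitor's internal API.
import shutil
import subprocess

def pick_backend():
    """Prefer PyNVML when it imports and initialises; otherwise use the nvidia-smi CLI."""
    try:
        import pynvml
        pynvml.nvmlInit()
        return "nvml"
    except Exception:
        pass
    if shutil.which("nvidia-smi"):
        return "nvidia-smi"
    raise RuntimeError("No NVIDIA backend available (is the driver installed?)")

def query_via_smi():
    """One-shot query of the displayed fields via nvidia-smi (the fallback path)."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,power.draw,clocks.sm,utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    # One GPU per line, values comma-separated in the order requested above.
    return [line.split(", ") for line in out.strip().splitlines()]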

Requirements

  • Linux system with NVIDIA driver installed
  • Python 3.6 or later
  • NVIDIA GPU with nvidia-smi support
  • Optional: pynvml package for better performance

Installation

Method 1: Install from GitHub using pip (Recommended)

# Install directly from GitHub
pip install git+https://github.com/voipmonitor/nvmonitor.git

# Or install with NVML support for better performance
pip install git+https://github.com/voipmonitor/nvmonitor.git#egg=nvmonitor[nvml]

Method 2: Clone and Install Locally

# Clone the repository
git clone https://github.com/voipmonitor/nvmonitor.git
cd nvmonitor

# Install the package
pip install .

# Or install with NVML support
pip install .[nvml]

Method 3: Run Without Installation

# Clone the repository
git clone https://github.com/voipmonitor/nvmonitor.git
cd nvmonitor

# Make the script executable
chmod +x nvmonitor.py

# Run directly
./nvmonitor.py

Usage

Basic Monitoring

After installation, you can run the tool from anywhere:

# Monitor all GPUs with default settings (1 second interval)
nvmonitor

# Monitor specific GPUs
nvmonitor --gpus 0,1

# Custom sampling interval (0.5 seconds)
nvmonitor --interval 0.5

# Run for specific duration (60 seconds)
nvmonitor --duration 60

# Save data to CSV file
nvmonitor --csv gpu_log.csv

Understanding the Display

For each GPU, the monitor shows (a PyNVML query sketch follows the list):

  1. Power Draw - Current power consumption in Watts
  2. SM Clock - Streaming Multiprocessor clock speed in MHz
  3. Utilization - GPU compute utilization percentage
  4. Temperature - Current GPU temperature in Celsius
  5. History Graph - Visual timeline of throttle events:
    • · = Normal operation
    • █ (red) = Throttling detected
  6. Status Line - Detailed problem description when issues occur
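
The four numeric fields above correspond to standard NVML queries. The sketch below shows one way to read them with PyNVML; it mirrors what the display reports but is not a copy of nvmonitor's internal code.

# Sketch: reading the displayed metrics for every GPU with PyNVML.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        power_w = pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0              # NVML reports milliwatts
        sm_mhz = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM)   # SM clock in MHz
        util = pynvml.nvmlDeviceGetUtilizationRates(h).gpu                # compute utilization, %
        temp_c = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU{i}: {power_w:.1f}W | {sm_mhz}MHz | {util:3d}% | {temp_c}°C")
finally:
    pynvml.nvmlShutdown()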

Problem Types and Solutions

Problem     | Description                         | Solution
POWER LIMIT | GPU wants more power but is limited | Check PSU capacity, PCIe cables, increase power limit
OVERHEATING | Hardware thermal protection active  | Improve cooling, check thermal paste
HOT         | Driver thermal throttling           | Improve airflow, reduce ambient temperature
CAPPED      | Software power limit reached        | Use nvidia-smi -pl <watts> to increase
THROTTLED   | General hardware slowdown           | Check for multiple concurrent issues

Command Line Options

--interval FLOAT    Sampling interval in seconds (default: 1.0)
--duration FLOAT    Run duration in seconds, 0 = infinite (default: 0)
--gpus STRING       Comma-separated GPU indices or "all" (default: all)
--csv PATH          Save monitoring data to CSV file

CSV Output Format

When using --csv, the tool saves the following fields (a parsing sketch follows the list):

  • Timestamp (ISO format with milliseconds)
  • GPU index
  • Power draw (W)
  • SM clock (MHz)
  • GPU utilization (%)
  • Temperature (°C)
  • Throttle mask (hexadecimal)
  • Human-readable problem description
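
The log is easy to post-process with the standard library. The sketch below summarises throttle events per GPU; it assumes the columns appear in the order listed above and skips a header row if one is present, so adjust it if the actual file differs.

# Sketch: summarising throttle events from a nvmonitor CSV log.
# Assumes the column order listed above; adjust if the real file differs.
import csv
from collections import Counter

problems = Counter()
with open("gpu_log.csv", newline="") as f:
    rows = list(csv.reader(f))

# Skip a header row if one is present (an ISO timestamp starts with digits).
if rows and not rows[0][0][:4].isdigit():
    rows = rows[1:]

for ts, gpu, power, clock, util, temp, mask, problem in rows:
    if int(mask, 16):   # non-zero mask means at least one throttle reason was active
        problems[(gpu, problem)] += 1

for (gpu, problem), count in problems.most_common():
    print(f"GPU{gpu}: {count} samples with: {problem}")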

Example Output

GPU Throttle Monitor │ Uptime: 45.2s │ NVML
────────────────────────────────────────────
GPU0: 250.3W │ 1890MHz │  98% │  75°C [HOT]
  History: ····················██████··············
  Status: POWER LIMIT: GPU wants more power but is limited by power delivery

GPU1: 180.5W │ 2100MHz │  87% │  62°C
  History: ········································
  Status: OK: No throttling

Troubleshooting

"Unable to query GPUs"

  • Ensure the NVIDIA driver is installed: nvidia-smi
  • Check that the driver is loaded: lsmod | grep nvidia

Missing temperature or power readings

  • Some older GPUs don't support all metrics
  • Virtual GPUs (vGPU) may have limited telemetry

Inaccurate readings with nvidia-smi backend

  • Install pynvml for more accurate data: pip install pynvml

Technical Details

The tool monitors NVIDIA's clock event reason masks (decoded in the sketch after this list):

  • 0x0080 - HW Power Brake Slowdown
  • 0x0040 - HW Thermal Slowdown
  • 0x0020 - SW Thermal Slowdown
  • 0x0008 - HW Slowdown
  • 0x0004 - SW Power Cap
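
These are bit flags, so several reasons can be active in the same sample; the hexadecimal throttle mask in the CSV combines them as a bitwise OR. A small decoder sketch using the values above (the dictionary name is illustrative):

# Sketch: decode a throttle mask into the reasons listed above.
REASONS = {
    0x0080: "HW Power Brake Slowdown",
    0x0040: "HW Thermal Slowdown",
    0x0020: "SW Thermal Slowdown",
    0x0008: "HW Slowdown",
    0x0004: "SW Power Cap",
}

def decode_mask(mask):
    """Return the names of every throttle reason set in the mask."""
    return [name for bit, name in REASONS.items() if mask & bit]

# Example: 0x0044 = 0x0040 | 0x0004, i.e. HW Thermal Slowdown plus SW Power Cap.
print(decode_mask(0x0044))   # ['HW Thermal Slowdown', 'SW Power Cap']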

License

This project is open source and available under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Author

Created for monitoring GPU performance in high-load environments where throttling can impact workloads.
