Fix NVML-dependent applications on Grace Blackwell GB10 unified memory architectures
๐ข Discussion: Join the conversation on NVIDIA Developer Forums
๐ค Collaboration: Looking to work with NVIDIA on official unified memory support - see NVIDIA_COLLABORATION.md
NVIDIA Grace Blackwell GB10 uses unified memory (128GB LPDDR5x shared between CPU and GPU). Standard NVML queries fail with NVML_ERROR_NOT_SUPPORTED because there's no dedicated GPU framebuffer.
Applications that break:
- Modular MAX Engine (
max generate --devices gpu) - nvtop (some metrics)
- TensorRT tools
- Any tool using NVML for device detection
Error:
failed to create device: No supported "gpu" device available.
CUDA information: NVML: Unable to get memory info
This shim intercepts NVML calls and provides CUDA runtime fallback when NVML fails. It's a drop-in replacement using LD_PRELOAD - no application modifications needed!
# Without shim - FAILS
$ max generate --devices gpu
Error: No GPU available
# With shim - WORKS!
$ LD_PRELOAD=./libnvml-unified.so max generate --devices gpu
โ Using NVIDIA GB10
โ Generating...makeRequirements:
- GCC
- CUDA toolkit (for headers and libcudart)
- NVIDIA driver installed
# Build test program
cd tests
gcc -o test_basic test_basic.c -lnvidia-ml
# Run without shim (will likely show errors on GB10)
./test_basic
# Run with shim (should work!)
NVML_SHIM_DEBUG=1 LD_PRELOAD=../libnvml-unified.so ./test_basic# MAX Engine
LD_PRELOAD=./libnvml-unified.so max generate --devices gpu --prompt "Hello"
# nvtop
LD_PRELOAD=./libnvml-unified.so nvtop
# Any other NVML app
LD_PRELOAD=./libnvml-unified.so your-appsudo make install
# Now available globally via /usr/local/lib/libnvml-unified.soApplication
โ calls nvmlDeviceGetMemoryInfo()
libnvml-unified.so (our shim)
โ intercepts call (LD_PRELOAD)
โ tries real NVML first
โ NVML returns ERROR_NOT_SUPPORTED
โ fallback to CUDA runtime + /proc/meminfo
โ returns unified memory stats
Application
โ receives success!
Uses LD_PRELOAD + dlsym(RTLD_NEXT, ...) to wrap NVML functions:
- nvmlDeviceGetCount: Tries NVML, falls back to
cudaGetDeviceCount() - nvmlDeviceGetHandleByIndex: Creates fake handles for fallback mode
- nvmlDeviceGetMemoryInfo: Reads
/proc/meminfo+ CUDA memory usage - nvmlDeviceGetName: Falls back to
cudaGetDeviceProperties()
nvmlReturn_t nvmlDeviceGetMemoryInfo(device, memory) {
ret = real_nvmlDeviceGetMemoryInfo(device, memory);
if (ret == NVML_ERROR_NOT_SUPPORTED) {
// Unified memory fallback
memory->total = get_system_memory_total(); // /proc/meminfo
memory->used = get_cuda_memory_used(device); // CUDA runtime
memory->free = memory->total - memory->used;
return NVML_SUCCESS;
}
return ret;
}- Device count detection (CUDA fallback)
- Device handle creation
- Memory info queries (v1 and v2)
- Device name queries
- Debug logging
- PCIe link queries โ NVLink-C2C translation
- Temperature/fan queries โ chassis controller
- Clock frequency queries โ LPDDR5x translation
- Process memory tracking
- Comprehensive NVML coverage
- Error handling edge cases
- Compiles cleanly
- Basic test passes
- Works with MAX Engine
- Works with nvtop
- Memory stats accurate
export NVML_SHIM_DEBUG=1
LD_PRELOAD=./libnvml-unified.so your-appOutput:
[NVML-SHIM] โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
[NVML-SHIM] NVML Unified Memory Shim - PoC v0.1
[NVML-SHIM] Grace Blackwell GB10 Support
[NVML-SHIM] โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
[NVML-SHIM] Initializing NVML shim...
[NVML-SHIM] nvmlDeviceGetCount() -> 1 devices (via CUDA fallback)
[NVML-SHIM] nvmlDeviceGetMemoryInfo() -> total=122880 MB (CUDA fallback)
- โ
NVIDIA DGX Spark (Grace Blackwell GB10)
- OS: Ubuntu 24.04.3 LTS
- Driver: 580.126.09
- CUDA: 12.8
- Memory: 128GB unified LPDDR5x
Note: Should work on other Grace Blackwell systems (GB200, GH200).
- Fake device handles: Uses index-as-pointer trick (works for most apps)
- Limited NVML coverage: Only essential functions for device detection
- No PCIe translation: Still reports PCIe metrics (incorrect for NVLink)
- No fan/thermal: Returns errors for chassis-managed metrics
This is a proof-of-concept to validate the approach. If it works with MAX Engine, we'll expand to full implementation.
Roadmap:
- โ Basic device detection (PoC)
- Test with MAX Engine
- Test with nvtop
- Full NVML API coverage
- Package for Ubuntu (deb)
- Submit to distributions
- Coordinate with NVIDIA on official support
MIT License - Use freely, contribute back!
TheTiz Homelab - Democratizing AI research through open methodology
Built with โค๏ธ for the Grace Blackwell developer community
Status: ๐ฎ SIDE QUEST ACTIVE Difficulty: โญโญโญโญ Started: 2026-01-27 Goal: Make MAX Engine work on GB10 Achievement: System Architect ๐
"It's only January and this is already the side quest of the year!" - TheTiz, 2026