fix: add native RTX 50xx (Blackwell / sm_120) GPU support #195
Switch base image to pytorch:2.8.0-cuda12.8-cudnn9-runtime, which ships compiled CUDA kernels for sm_120. The previous cuda12.6 image only covered up to sm_90, making RTX 50xx GPUs unusable for diarization despite the forward-compatibility claim.

Also pre-load audio via torchaudio.load and pass the waveform tensor dict to pyannote instead of a file path, bypassing torchcodec, which fails under CUDA 12.8. libcublas.so.12 is preserved (CUDA 12.x), keeping faster-whisper unaffected.

Closes #194
Relates to #179
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
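The pre-loading change described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `build_pyannote_input` and `load_for_pyannote` are hypothetical, and the example file name is made up. It relies on `torchaudio.load` returning a `(waveform, sample_rate)` pair and on pyannote pipelines accepting a dict with `waveform` and `sample_rate` keys.

```python
from typing import Any, Dict


def build_pyannote_input(waveform: Any, sample_rate: int) -> Dict[str, Any]:
    """Wrap an already-decoded waveform in the in-memory dict form."""
    # Hypothetical helper: the pipeline receives audio samples directly,
    # so it does not need to decode the file itself.
    return {"waveform": waveform, "sample_rate": sample_rate}


def load_for_pyannote(path: str) -> Dict[str, Any]:
    """Decode the file with torchaudio and return the dict for the pipeline."""
    import torchaudio  # imported lazily so this module loads without torch

    # torchaudio.load returns a (channels, frames) tensor and the sample rate.
    waveform, sample_rate = torchaudio.load(path)
    return build_pyannote_input(waveform, sample_rate)


# Assumed usage, where `pipeline` is an already-loaded pyannote pipeline:
#   diarization = pipeline(load_for_pyannote("meeting.wav"))
```

Keeping the dict construction separate from the decoding step makes the input shape easy to check without a GPU or an audio file.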
Code Review
This pull request updates the backend Dockerfile to use PyTorch 2.8.0 with CUDA 12.8 and modifies the diarization service to pre-load audio via torchaudio to bypass torchcodec compatibility issues. However, the synchronous torchaudio.load call is executed on the main event loop, which will block concurrent requests. It is recommended to offload both the audio loading and pipeline execution to the executor.
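One way to follow this recommendation is to move both blocking steps into a single function and run it through the event loop's executor. The sketch below is an assumption about how that could look, not the service's actual code: `diarize` and `_load_and_diarize` are hypothetical names, and `load_audio` and `pipeline` stand in for the torchaudio-based loader and the pyannote pipeline.

```python
import asyncio
from typing import Any, Callable


def _load_and_diarize(
    path: str,
    load_audio: Callable[[str], Any],
    pipeline: Callable[[Any], Any],
) -> Any:
    """Blocking work: decode the audio, then run the pipeline on it."""
    audio = load_audio(path)
    return pipeline(audio)


async def diarize(
    path: str,
    load_audio: Callable[[str], Any],
    pipeline: Callable[[Any], Any],
) -> Any:
    """Run loading and inference off the event loop thread."""
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(
        None, _load_and_diarize, path, load_audio, pipeline
    )
```

Because loading and inference share one executor call, the event loop stays free to serve other requests for the whole duration of the job rather than only during inference.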
Summary
- Switch base image from `pytorch:2.12.0-cuda12.6` to `pytorch:2.8.0-cuda12.8-cudnn9-runtime`: first PyTorch release with native sm_120 (Blackwell) compiled kernels; the previous image only covered up to sm_90
- Pre-load audio via `torchaudio.load` and pass `{"waveform": ..., "sample_rate": ...}` to the pyannote pipeline instead of a file path, bypassing `torchcodec`, which fails under CUDA 12.8
- The image still ships `libcublas.so.12`, so `ctranslate2`/faster-whisper is unaffected

Test plan
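A possible manual check, which is an assumption and not an item taken from this PR, is to compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. The helper names below are hypothetical; `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are existing PyTorch calls. The match is deliberately exact, so it is stricter than CUDA's real binary-compatibility rules.

```python
from typing import Iterable, Tuple


def arch_supported(arch_list: Iterable[str], capability: Tuple[int, int]) -> bool:
    """Return True if the build lists native kernels for this capability."""
    major, minor = capability
    # A capability of (12, 0) corresponds to the architecture name "sm_120".
    return f"sm_{major}{minor}" in set(arch_list)


def check_current_gpu() -> bool:
    """Check device 0 against the installed PyTorch build."""
    import torch  # imported lazily so this module loads without torch

    if not torch.cuda.is_available():
        return False
    return arch_supported(
        torch.cuda.get_arch_list(),
        torch.cuda.get_device_capability(0),
    )
```

Running `check_current_gpu()` inside the container on an RTX 50xx card would be expected to return `True` only when the image includes sm_120 kernels.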
🤖 Generated with Claude Code