
fix: add native RTX 50xx (Blackwell / sm_120) GPU support #195

Merged: NotYuSheng merged 2 commits into main from fix/rtx50xx-blackwell-sm120-support on Jun 6, 2026


@NotYuSheng

Owner

Closes #194
Relates to #179

Summary

  • Switch backend base image from the previous cuda12.6 PyTorch image to pytorch:2.8.0-cuda12.8-cudnn9-runtime, which ships natively compiled sm_120 (Blackwell) kernels; the previous image only covered up to sm_90
  • Pre-load audio via torchaudio.load and pass {"waveform": ..., "sample_rate": ...} to the pyannote pipeline instead of a file path, bypassing torchcodec, which fails under CUDA 12.8
  • CUDA 12.x preserves libcublas.so.12, so ctranslate2/faster-whisper is unaffected
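
The torchcodec bypass in the second bullet can be sketched roughly as follows. This is an illustration only, not the code in backend/services/diarization_service.py: the helper name `load_audio_for_pyannote` and its `loader` parameter are hypothetical, added so the sketch can be exercised without a GPU or torchaudio installed. It assumes `torchaudio.load` returns a `(waveform, sample_rate)` pair and that the pyannote pipeline accepts a mapping with `waveform` and `sample_rate` keys, as described above.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def load_audio_for_pyannote(
    path: str,
    loader: Optional[Callable[[str], Tuple[Any, int]]] = None,
) -> Dict[str, Any]:
    """Decode the audio up front and build the in-memory input mapping.

    Handing the pipeline a waveform instead of a file path means the
    pipeline no longer has to decode the file itself.
    """
    if loader is None:
        # Imported lazily so the helper can be tested without torchaudio.
        import torchaudio

        loader = torchaudio.load
    waveform, sample_rate = loader(path)
    return {"waveform": waveform, "sample_rate": sample_rate}
```

With an already constructed pipeline object, the call site would then look something like `pipeline(load_audio_for_pyannote(audio_path))` rather than `pipeline(audio_path)`.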

Test plan

  • Build the Docker image and confirm no dependency errors
  • Run diarization on a machine with an RTX 50xx GPU (sm_120) to confirm the CUDA kernel error is resolved
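
Before running the full diarization test, a quick sanity check inside the container can show whether the installed PyTorch build lists compiled kernels for the GPU's compute capability. The sketch below is a possible pre-check, not part of this PR; `has_native_kernels` is a hypothetical helper name. It only looks for an exact `sm_XY` match and ignores any PTX (`compute_XY`) entries, so it complements the hardware test rather than replacing it.

```python
from typing import Iterable, Tuple


def has_native_kernels(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if the build lists an sm_XY entry matching the device."""
    major, minor = capability
    return f"sm_{major}{minor}" in set(arch_list)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed; nothing to check")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)  # (12, 0) on sm_120
            archs = torch.cuda.get_arch_list()  # architectures this build targets
            print(cap, archs, has_native_kernels(cap, archs))
        else:
            print("no CUDA device visible")
```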

🤖 Generated with Claude Code

Switch base image to pytorch:2.8.0-cuda12.8-cudnn9-runtime which ships
compiled CUDA kernels for sm_120. The previous cuda12.6 image only covered
up to sm_90, making RTX 50xx GPUs unusable for diarization despite the
forward-compatibility claim.

Also pre-load audio via torchaudio.load and pass the waveform tensor dict
to pyannote instead of a file path, bypassing torchcodec which fails under
CUDA 12.8. libcublas.so.12 is preserved (CUDA 12.x), keeping faster-whisper
unaffected.

Closes #194
Relates to #179

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request updates the backend Dockerfile to use PyTorch 2.8.0 with CUDA 12.8 and modifies the diarization service to pre-load audio via torchaudio to bypass torchcodec compatibility issues. However, the synchronous torchaudio.load call is executed on the main event loop, which will block concurrent requests. It is recommended to offload both the audio loading and pipeline execution to the executor.
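
One way to follow this recommendation is to run the load and the pipeline call together in a single executor job. The sketch below assumes an asyncio-based service; the function names and the injected `load_audio` and `pipeline` callables are hypothetical stand-ins for `torchaudio.load` and the pyannote pipeline, so the example runs without either library installed.

```python
import asyncio
from typing import Any, Callable, Tuple


def _diarize_blocking(
    path: str,
    load_audio: Callable[[str], Tuple[Any, int]],
    pipeline: Callable[[dict], Any],
) -> Any:
    # Both steps block, so they run together inside one worker thread.
    waveform, sample_rate = load_audio(path)
    return pipeline({"waveform": waveform, "sample_rate": sample_rate})


async def diarize(
    path: str,
    load_audio: Callable[[str], Tuple[Any, int]],
    pipeline: Callable[[dict], Any],
) -> Any:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the event loop
    # stays free to serve other requests while the job runs.
    return await loop.run_in_executor(
        None, _diarize_blocking, path, load_audio, pipeline
    )
```

Keeping both steps in one job avoids a second hop through the event loop between decoding and inference.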


Comment thread backend/services/diarization_service.py Outdated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NotYuSheng NotYuSheng merged commit 729b714 into main Jun 6, 2026
2 checks passed
@NotYuSheng NotYuSheng deleted the fix/rtx50xx-blackwell-sm120-support branch June 6, 2026 09:40

Development

Successfully merging this pull request may close these issues.

fix: RTX 50xx (Blackwell) diarization failure — sm_120 kernel support + torchcodec bypass
