Bug Description
The Stable Diffusion WebUI app from the Olares Market does not work on the Olares One hardware (RTX 5090 Mobile GPU). Every image generation attempt fails with:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Root Cause (confirmed via diagnosis)
The SD WebUI container image ships with torch 2.3.0 + CUDA 12.1, which only supports up to sm_90. The RTX 5090 uses Blackwell architecture (sm_120) and requires torch built against CUDA 12.8+ with sm_120 kernels.
Confirmed by running inside the container shell:
$ pip show torch | grep -i version
Version: 2.3.0
$ python -c "import torch; print(torch.version.cuda); print(torch.cuda.get_arch_list())"
12.1
['sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
No sm_120 in the arch list means this torch build has no compiled kernels for Blackwell GPUs, which is exactly the "no kernel image" error above.
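The diagnosis above can be reproduced programmatically. A minimal sketch (the helper name `device_is_supported` is ours, not a torch API; the two torch calls whose outputs it compares are `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability(0)`):

```python
def device_is_supported(arch_list, capability):
    """Return True if a torch build's compiled kernel list covers a GPU.

    arch_list:  output of torch.cuda.get_arch_list(), e.g. ['sm_86', 'sm_90']
    capability: output of torch.cuda.get_device_capability(0), e.g. (12, 0)
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list

# The container's torch 2.3.0+cu121 arch list vs. the RTX 5090 (Blackwell, sm_120):
ships_with = ['sm_50', 'sm_60', 'sm_61', 'sm_70',
              'sm_75', 'sm_80', 'sm_86', 'sm_90']
print(device_is_supported(ships_with, (12, 0)))  # False -> "no kernel image" error
print(device_is_supported(ships_with, (8, 6)))   # True  -> an Ampere card would work
```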
Additional Blockers Found
- Read-only filesystem: The container's /opt/conda/ is read-only. pip uninstall torch fails with PermissionError. This makes it impossible to replace torch inside the running container.
- YAML command overrides don't work: Even when modifying the deployment YAML to run pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128, pip sees "Requirement already satisfied" from the read-only system torch and skips the install.
- Workaround partially works but breaks other deps: Installing torch nightly cu128 to /tmp/torchfix and using PYTHONPATH=/tmp/torchfix successfully loads torch 2.12+cu128 with sm_120 support, and CUDA tensor operations pass. However, the new torch pulls in numpy 2.x, which breaks the container's tensorflow, gradio, kornia, and other packages compiled against numpy 1.x, causing the WebUI to crash on startup.
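The numpy breakage is the usual numpy 1.x/2.x ABI split: extensions compiled against 1.x headers crash under 2.x. A minimal sketch of a guard the workaround script could run before launching the WebUI (the function name `numpy_abi_compatible` is ours, purely illustrative):

```python
def numpy_abi_compatible(version: str) -> bool:
    """Packages compiled against numpy 1.x (tensorflow, kornia, ...) break
    under numpy 2.x, so the workaround must keep numpy below 2.0.
    `version` is the string from numpy.__version__."""
    major = int(version.split(".")[0])
    return major < 2

print(numpy_abi_compatible("1.26.4"))  # True  -> safe with the image's packages
print(numpy_abi_compatible("2.1.0"))   # False -> expect WebUI startup crashes
```

In practice this means pinning `"numpy<2"` when installing the torch nightly into the /tmp/torchfix overlay, so the overlay's numpy does not shadow the ABI the image's packages were built against.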
Environment
- Hardware: Olares One (Intel Core Ultra 9 275HX, NVIDIA RTX 5090 Mobile 24GB)
- Olares OS: 1.12.4
- App: Stable Diffusion WebUI (from Olares Market)
- Container torch: 2.3.0+cu121
- Container Python: 3.10 (Conda-based)
- Host CUDA: 12.8+ (Olares 1.12.0 release notes confirm "CUDA support extended to 12.9")
What Works
The RTX 5090 GPU works correctly at the host level. nvidia-smi detects it, HAMI GPU scheduler assigns it to the container, and when torch nightly cu128 is manually installed to a separate path, CUDA operations succeed:
$ PYTHONPATH=/tmp/torchfix python -c "import torch; t=torch.randn(4,4,device='cuda'); print(torch.cuda.get_device_name(0)); print(t@t.T); print('PASS')"
NVIDIA GeForce RTX 5090 Laptop GPU
tensor([...], device='cuda:0')
PASS
Expected Behavior
The SD WebUI app should work out of the box on Olares One, which ships with an RTX 5090.
Suggested Fix
Rebuild the SD WebUI container image with:
- Base image: nvidia/cuda:12.8.0-devel-ubuntu22.04 (or newer)
- PyTorch: stable release with cu128 (or nightly cu128)
- numpy < 2 (to maintain compatibility with existing packages)
- No xformers (use --opt-sdp-attention instead, as xformers crashes on Blackwell)
Alternatively, provide a separate SD WebUI image tagged for Blackwell/RTX 50-series GPUs.
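For reference, the rebuild could start from a Dockerfile along these lines. This is a hedged sketch only: the apt package set, the WebUI install step, and the launch command are placeholders, not the Market image's actual build.

```dockerfile
# Sketch of a Blackwell-capable base for the SD WebUI image (details are assumptions).
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip git && \
    rm -rf /var/lib/apt/lists/*

# cu128 wheels include sm_120 kernels for RTX 50-series (Blackwell).
RUN pip3 install --pre torch torchvision \
    --index-url https://download.pytorch.org/whl/nightly/cu128

# Keep numpy 1.x: tensorflow/gradio/kornia in the image are built against it.
RUN pip3 install "numpy<2"

# SD WebUI install step omitted; launch without xformers on Blackwell:
# CMD ["python3", "launch.py", "--opt-sdp-attention"]
```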