This project sets up a GPU-accelerated development environment for the NVIDIA DGX Spark, using:
- NVIDIA GB10 GPU (Blackwell architecture, sm_121)
- CUDA 13.0
- PyTorch 2.9 nightly (NGC build, GB10-compatible)
- VS Code Web (code-server) accessible from any browser
- Go 1.25 & Node 22 available inside the same container
- HuggingFace model caching
- Large shared memory (32 GB) for model runtime stability
This environment is required because standard PyTorch wheels do NOT support GB10, and many Python packages break under Python 3.12 unless patched.
What you now have is a working base environment with:
- ✔ GPU recognized
- ✔ CUDA recognized
- ✔ PyTorch + Torchvision functional
- ✔ Code-Server accessible via browser
- ✔ RealESRGAN partially working (missing model file fixed manually)
- ✔ Diffusers and Transformers installed
- ✔ Go + Node installed
- ✔ Stable development workflow preserved outside the container
The GB10 uses Blackwell (sm_121) — newer than any public PyTorch wheels.
Public builds only support up to sm_90 / sm_120.
This mismatch surfaces as errors like:

module 'torch' has no attribute 'float8_e4m3fn'
CUDA available: False

and it prevents torchvision kernels from loading.
The official NGC builds contain:
- CUDA 13
- Custom PyTorch build with sm121 kernels
- Torchvision built against the same CUDA
- Proper cuDNN, NCCL, and Blackwell support
These containers are the ONLY way to get PyTorch running correctly on DGX Spark today.
We use:
nvcr.io/nvidia/pytorch:25.09-py3
Later versions (25.11) also work.
Ubuntu 24.04 ships a PEP 668 externally-managed system Python, so on the host:
pip install --upgrade pip
throws:
error: externally-managed-environment
Solution: use the Python inside the NGC container, which is not distro-managed.
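You can confirm which situation you are in from Python itself: PEP 668 defines an `EXTERNALLY-MANAGED` marker file in the interpreter's stdlib directory, and its presence is what makes pip refuse to install. A stdlib-only probe:

```python
import pathlib
import sysconfig

# PEP 668: distros mark a managed interpreter by dropping an
# EXTERNALLY-MANAGED file into the stdlib sysconfig directory.
marker = pathlib.Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
print("externally managed:", marker.exists())
```

On the Ubuntu 24.04 host Python this should print `True`; inside the NGC container it should print `False`, which is why pip works normally there.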
basicsr imports a module that newer torchvision removed:
torchvision.transforms.functional_tensor
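The fix relies on a general Python mechanism: the import system consults `sys.modules` before searching the filesystem, so a synthetic module registered there satisfies any later import of that name. A self-contained sketch of the mechanism, using placeholder module names (no torchvision required):

```python
import sys
import types

# Build a synthetic module and give it the attribute callers expect.
shim = types.ModuleType("legacy_pkg.functional_tensor")
shim.rgb_to_grayscale = lambda img: f"gray({img})"

# Register the parent package and the submodule so both plain imports
# and from-imports resolve through sys.modules.
sys.modules["legacy_pkg"] = types.ModuleType("legacy_pkg")
sys.modules["legacy_pkg.functional_tensor"] = shim

from legacy_pkg.functional_tensor import rgb_to_grayscale
print(rgb_to_grayscale("cat"))  # -> gray(cat)
```

The actual shim below does exactly this, borrowing `rgb_to_grayscale` from `torchvision.transforms.functional`, where the function still lives.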
We created a patch shim:
import sys, types
import torchvision.transforms.functional as F
shim = types.ModuleType("torchvision.transforms.functional_tensor")
shim.rgb_to_grayscale = F.rgb_to_grayscale
sys.modules["torchvision.transforms.functional_tensor"] = shim

DGX Spark Hardware
│
├── Docker (rootless / NVIDIA runtime)
│   ├── NVIDIA NGC PyTorch 25.xx (CUDA 13, sm_121)
│   ├── Code-Server (VS Code Web)
│   ├── Go 1.25
│   ├── NodeJS 22
│   ├── Python packages (diffusers, transformers, realesrgan, opencv…)
│   └── Shared memory 32 GB
│
└── Host volumes
    ├── ./workspace → project files
    ├── ./data/code-server-config → VS Code settings
    ├── ./data/code-server-data → VS Code extensions
    └── ~/.cache/huggingface → model cache
services:
  server:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: dgx-codeserver
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - PASSWORD=${PASSWORD:-changeme123}
    working_dir: /workspace
    ports:
      - "8080:8080"
    volumes:
      - ./workspace:/workspace
      - $HOME/.cache/huggingface:/root/.cache/huggingface
      - ./data/code-server-data:/root/.local/share/code-server
      - ./data/code-server-config:/root/.config/code-server
    shm_size: "32g"
    ipc: host
    ulimits:
      memlock:
        soft: -1
        hard: -1
      stack:
        soft: 67108864
        hard: 67108864

FROM golang:1.25-bookworm AS go
FROM node:22-bookworm AS node
FROM nvcr.io/nvidia/pytorch:25.09-py3
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
        curl git nano ca-certificates \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
COPY --from=go /usr/local/go /usr/local/go
ENV PATH="/usr/local/go/bin:${PATH}"
COPY --from=node /usr/local/bin/node /usr/local/bin/node
COPY --from=node /usr/local/lib/node_modules /usr/local/lib/node_modules
RUN ln -s /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm && \
    ln -s /usr/local/lib/node_modules/npm/bin/npx-cli.js /usr/local/bin/npx
RUN curl -fsSL https://code-server.dev/install.sh | sh
RUN mkdir -p /root/.config/code-server && \
    printf "bind-addr: 0.0.0.0:8080\nauth: password\ncert: false\n" \
    > /root/.config/code-server/config.yaml
EXPOSE 8080
CMD ["bash", "-lc", "code-server /workspace"]

Open:
http://<DGX-SPARK-IP>:8080
Password is stored in:
data/code-server-config/config.yaml
or via .env:
PASSWORD=mysecret
Inside the container run:
python - << 'EOF'
import torch
print("Torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")
EOF

Expected correct output:
Torch: 2.9.0a0+...nv25.xx
CUDA: 13.0
CUDA available: True
Device: NVIDIA GB10
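A further sanity check is the device's compute capability, which on GB10 should report as (12, 1), matching the sm_121 kernels in the NGC build. The probe below is guarded so it also runs (harmlessly) outside the container:

```python
import importlib.util

# Guard: torch is only present inside the NGC container.
if importlib.util.find_spec("torch") is None:
    print("torch not installed; run this inside the container")
else:
    import torch
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f"Compute capability: sm_{major}{minor}")  # expect sm_121 on GB10
    else:
        print("CUDA not available")
```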
Your VS Code Web environment is preserved across container rebuilds because you mounted:
./data/code-server-data:/root/.local/share/code-server
./data/code-server-config:/root/.config/code-server
./workspace:/workspace
~/.cache/huggingface:/root/.cache/huggingface
Your RealESRGAN workflow is partially working. Already resolved:
- torchvision incompatibility (fixed via the shim)
- cv2 import (libGL installed)
- environment install
- GPU detection
Download ESRGAN model weights:
mkdir -p models
wget -O models/RealESRGAN_x4plus.pth \
https://github.com/xinntao/Real-ESRGAN/releases/download/v0.3.0/RealESRGAN_x4plus.pth
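A missing or truncated checkpoint fails much later with a confusing deserialization error, so it is worth checking right after the download. A minimal stdlib check (the path matches the wget command above; the 10 MB floor is an assumption, since the real x4plus file is tens of MB):

```python
import pathlib

# Path matching the wget command above.
ckpt = pathlib.Path("models/RealESRGAN_x4plus.pth")

if not ckpt.is_file():
    print(f"missing: {ckpt} - rerun the wget step")
else:
    size_mb = ckpt.stat().st_size / 2**20
    # A truncated download is the usual failure mode for flaky networks.
    status = "OK" if size_mb > 10 else "looks truncated"
    print(f"{ckpt}: {status} ({size_mb:.1f} MB)")
```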
Then update your Python script path:
ESRGAN_MODEL_PATH = "models/RealESRGAN_x4plus.pth"

| Feature | Status | Notes |
|---|---|---|
| PyTorch | ✔ Working | NGC build required |
| Torchvision | ✔ Working | shim needed for realesrgan |
| Code-server | ✔ Working | password via config.yaml |
| GPU compute | ✔ Working | sm_121 kernels supported |
| RealESRGAN | ⚠ Partial | missing model file |
| Diffusers | ✔ Working | CUDA 13 fully supported |
| OpenCV | ✔ Working | required libGL installed |
| Go 1.25 | ✔ Working | from donor image |
| Node 22 | ✔ Working | from donor image |
You now have a fully functional GPU dev environment on DGX Spark with:
- Correct CUDA + PyTorch support for GB10
- Code-Server for browser-based IDE access
- Reproducible Docker setup
- Persistent configuration + extensions
- Multi-language runtime (Python, Go, Node)
- Ability to run heavy ML workloads in a controlled container
The remaining work is just model paths + pipeline polishing.