TypeTerrors/nated.io
🚀 DGX Spark GPU Development Environment

CUDA 13 · NVIDIA GB10 · PyTorch 2.9 NGC · Code-Server Web IDE

“Full GPU dev environment inside Docker with VS Code in a browser”


📌 Overview

This project sets up a GPU-accelerated development environment for the NVIDIA DGX Spark, using:

  • NVIDIA GB10 GPU (Blackwell architecture, sm_121)
  • CUDA 13.0
  • PyTorch 2.9 nightly (NGC build, GB10-compatible)
  • VS Code Web (code-server) accessible from any browser
  • Go 1.25 & Node 22 available inside the same container
  • HuggingFace model caching
  • Large shared memory (32 GB) for model runtime stability

This environment is required because standard PyTorch wheels do NOT support GB10, and many Python packages break under Python 3.12 unless patched.

What you now have is a working base environment with:

✔ GPU recognized
✔ CUDA recognized
✔ PyTorch + Torchvision functional
✔ Code-Server accessible via browser
✔ RealESRGAN partially working (missing model file fixed manually)
✔ Diffusers, Transformers installed
✔ Go + Node installed
✔ Stable development workflow preserved outside the container


🧠 Why All This Was Necessary

1. ❌ The DGX Spark GPU (GB10) is not supported by normal PyTorch

The GB10 uses the Blackwell architecture (sm_121), which is newer than any architecture the public PyTorch wheels are built for.

Public builds only support up to sm_90 / sm_120.

This creates errors like:

AttributeError: module 'torch' has no attribute 'float8_e4m3fn'
CUDA available: False

And prevents torchvision kernels from loading.
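You can compare a device's compute capability against the architectures a wheel was compiled for. The sketch below uses a hypothetical helper, `wheel_supports()`, with the arch-list format that `torch.cuda.get_arch_list()` returns; the commented lines show how you might wire it to a real device inside the container.

```python
def wheel_supports(capability, arch_list):
    """capability: (major, minor) tuple, e.g. (12, 1) for sm_121.
    arch_list: strings like 'sm_90', as torch.cuda.get_arch_list() returns them.
    Hypothetical helper for illustration only."""
    target = capability[0] * 10 + capability[1]
    supported = set()
    for arch in arch_list:
        if arch.startswith("sm_"):
            supported.add(int(arch.split("_")[1]))
    return target in supported

# Public wheels ship kernels up to sm_90 / sm_120, so the GB10's sm_121
# is not covered:
print(wheel_supports((12, 1), ["sm_80", "sm_90", "sm_120"]))  # False

# Inside the NGC container you could check the real device:
#   import torch
#   cap = torch.cuda.get_device_capability(0)
#   print(wheel_supports(cap, torch.cuda.get_arch_list()))
```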


2. ✔ NVIDIA NGC PyTorch 25.xx containers are compatible

The official NGC builds contain:

  • CUDA 13
  • Custom PyTorch build with sm_121 kernels
  • Torchvision built against the same CUDA
  • Proper cuDNN, NCCL, and Blackwell support

These containers are the ONLY way to get PyTorch running correctly on DGX Spark today.

We use:

nvcr.io/nvidia/pytorch:25.09-py3

Later versions (25.11) also work.


3. ❌ Python environment conflicts prevented pip installs

Ubuntu 24.04 uses PEP 668 externally-managed Python, so:

pip install --upgrade pip

throws:

error: externally-managed-environment

Solution: use the Python inside the NGC container, which is not distro-managed.
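PEP 668 marks a distro-managed interpreter with an `EXTERNALLY-MANAGED` file in the standard-library directory, so you can tell the two Pythons apart programmatically. A minimal sketch (the helper name is made up):

```python
import sysconfig
from pathlib import Path

def is_externally_managed():
    """True if this interpreter carries the PEP 668 marker file,
    as Ubuntu 24.04's system Python does."""
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.exists()

# On the host's Ubuntu 24.04 python3 this should report True;
# the NGC container's Python lacks the marker, so pip installs
# work there without --break-system-packages.
print(is_externally_managed())
```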


4. ✔ RealESRGAN + basicsr needed patching

basicsr imports a module that newer torchvision removed:

torchvision.transforms.functional_tensor

We created a patch shim:

import sys, types
import torchvision.transforms.functional as F

# Recreate the removed module under its old name and point the one
# symbol basicsr uses at the modern torchvision API.
shim = types.ModuleType("torchvision.transforms.functional_tensor")
shim.rgb_to_grayscale = F.rgb_to_grayscale

# Register the shim so later `import torchvision.transforms.functional_tensor`
# resolves from sys.modules instead of searching the filesystem.
sys.modules["torchvision.transforms.functional_tensor"] = shim
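The shim works because Python's import machinery consults `sys.modules` before searching the filesystem. A self-contained illustration of the same pattern, using fabricated module names (`fakepkg`, `fakepkg.legacy`) purely for demonstration:

```python
import sys, types

# Fabricated names for illustration; the real shim targets
# torchvision.transforms.functional_tensor.
parent = types.ModuleType("fakepkg")
legacy = types.ModuleType("fakepkg.legacy")
legacy.greet = lambda: "hello from the shim"

# Register both modules, and link the child onto the parent so
# attribute access (fakepkg.legacy) also resolves.
parent.legacy = legacy
sys.modules["fakepkg"] = parent
sys.modules["fakepkg.legacy"] = legacy

import fakepkg.legacy  # satisfied from sys.modules; no file on disk needed
print(fakepkg.legacy.greet())  # hello from the shim
```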

🏗️ Environment Architecture

DGX Spark Hardware
│
├── Docker (rootless / NVIDIA runtime)
│   ├── NVIDIA NGC PyTorch 25.xx (CUDA 13, sm_121)
│   ├── Code-Server (VSCode Web)
│   ├── Go 1.25
│   ├── NodeJS 22
│   ├── Python packages (diffusers, transformers, realesrgan, opencv…)
│   └── Shared memory 32GB
│
└── Host volumes
    ├── ./workspace                 → project files
    ├── ./data/code-server-config   → VS Code settings
    ├── ./data/code-server-data     → VS Code extensions
    └── ~/.cache/huggingface        → model cache

📦 docker-compose.yml (current working version)

services:
  server:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: dgx-codeserver
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - PASSWORD=${PASSWORD:-changeme123}
    working_dir: /workspace
    ports:
      - "8080:8080"
    volumes:
      - ./workspace:/workspace
      - $HOME/.cache/huggingface:/root/.cache/huggingface
      - ./data/code-server-data:/root/.local/share/code-server
      - ./data/code-server-config:/root/.config/code-server
    shm_size: "32g"
    ipc: host
    ulimits:
      memlock:
        soft: -1
        hard: -1
      stack:
        soft: 67108864
        hard: 67108864
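After bringing the stack up, you can confirm from inside the container that the 32 GB `shm_size` actually took effect, since PyTorch DataLoader workers fail in opaque ways when `/dev/shm` is too small. A small stdlib-only sketch (the helper name is made up):

```python
import shutil
from pathlib import Path

def shm_gib(path="/dev/shm"):
    """Total size of the shared-memory mount in GiB, or None if the
    path does not exist. Hypothetical helper for illustration."""
    p = Path(path)
    if not p.exists():
        return None
    return shutil.disk_usage(p).total / 2**30

# Inside this container the result should be about 32.0,
# matching shm_size in docker-compose.yml.
print(shm_gib())
```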

🐳 Dockerfile (current working version)

FROM golang:1.25-bookworm AS go
FROM node:22-bookworm AS node

FROM nvcr.io/nvidia/pytorch:25.09-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    curl git nano ca-certificates \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

COPY --from=go /usr/local/go /usr/local/go
ENV PATH="/usr/local/go/bin:${PATH}"

COPY --from=node /usr/local/bin/node /usr/local/bin/node
COPY --from=node /usr/local/lib/node_modules /usr/local/lib/node_modules

RUN ln -s /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm && \
    ln -s /usr/local/lib/node_modules/npm/bin/npx-cli.js /usr/local/bin/npx

RUN curl -fsSL https://code-server.dev/install.sh | sh

RUN mkdir -p /root/.config/code-server && \
    printf "bind-addr: 0.0.0.0:8080\nauth: password\ncert: false\n" \
      > /root/.config/code-server/config.yaml

EXPOSE 8080

CMD ["bash", "-lc", "code-server /workspace"]

🖥️ Accessing VS Code Web

Open:

http://<DGX-SPARK-IP>:8080

Password is stored in:

data/code-server-config/config.yaml

or via .env:

PASSWORD=mysecret

🔥 GPU Verification

Inside the container run:

python - << 'EOF'
import torch
print("Torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")
EOF

Expected correct output:

Torch: 2.9.0a0+...nv25.xx
CUDA: 13.0
CUDA available: True
Device: NVIDIA GB10

📂 Persistent VS Code Data

Your VS Code Web environment is preserved across container rebuilds because you mounted:

✔ Extensions

./data/code-server-data:/root/.local/share/code-server

✔ Settings

./data/code-server-config:/root/.config/code-server

✔ Workspace

./workspace:/workspace

✔ HuggingFace models

~/.cache/huggingface:/root/.cache/huggingface


🛠️ RealESRGAN: Remaining Issues

Your RealESRGAN workflow is partially working.

✔ Fixed:

  • torchvision incompatibility (shim)
  • cv2 import
  • environment install
  • GPU detection

❌ Still needed:

Download ESRGAN model weights:

mkdir -p models
wget -O models/RealESRGAN_x4plus.pth \
  https://github.com/xinntao/Real-ESRGAN/releases/download/v0.3.0/RealESRGAN_x4plus.pth

Then update your Python script path:

ESRGAN_MODEL_PATH = "models/RealESRGAN_x4plus.pth"
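Before wiring the path into the pipeline, a quick sanity check avoids the confusing downstream errors a missing or truncated download causes. A minimal sketch; `checkpoint_ok()` is a made-up helper, and the size threshold is a rough guess (the x4plus checkpoint is on the order of tens of megabytes):

```python
from pathlib import Path

ESRGAN_MODEL_PATH = "models/RealESRGAN_x4plus.pth"

def checkpoint_ok(path, min_bytes=1_000_000):
    """Rough sanity check (hypothetical helper): the file exists and
    is large enough not to be an interrupted download."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes

if checkpoint_ok(ESRGAN_MODEL_PATH):
    print("weights look ok")
else:
    print(f"missing or incomplete weights at {ESRGAN_MODEL_PATH}")
```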

⚠️ Known Limitations (Partial Working State)

| Feature | Status | Notes |
| --- | --- | --- |
| PyTorch | ✔ Working | NGC build required |
| Torchvision | ✔ Working | shim needed for realesrgan |
| Code-server | ✔ Working | password via config.yaml |
| GPU compute | ✔ Working | sm_121 kernels supported |
| RealESRGAN | ⚠ Partial | missing model file |
| Diffusers | ✔ Works | CUDA 13 fully supported |
| OpenCV | ✔ Works | required libGL installed |
| Go 1.25 | ✔ Works | from donor image |
| Node 22 | ✔ Works | from donor image |

📘 Summary

You now have a fully functional GPU dev environment on DGX Spark with:

  • Correct CUDA + PyTorch support for GB10
  • Code-Server for browser-based IDE access
  • Reproducible Docker setup
  • Persistent configuration + extensions
  • Multi-language runtime (Python, Go, Node)
  • Ability to run heavy ML workloads in a controlled container

The remaining work is just model paths + pipeline polishing.