
Commit 59674e0

Kion and claude committed
Install flash-linear-attention and causal-conv1d for GDN training
Qwen3.5's Gated Delta Network layers require these CUDA kernels for correct forward-pass computation. Without them, transformers falls back to a buggy torch implementation that causes illegal memory access errors during SDPO distillation training.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 15ef545 commit 59674e0

1 file changed

Lines changed: 4 additions & 0 deletions

File tree

docker/Dockerfile.claas-api

@@ -16,6 +16,10 @@ RUN mkdir -p claas && touch claas/__init__.py
 
 RUN pip install --no-cache-dir ".[local]" modal
 
+# Qwen3.5 GDN (Gated Delta Networks) layers require these CUDA kernels.
+# Without them, transformers falls back to a buggy torch implementation.
+RUN pip install --no-cache-dir causal-conv1d flash-linear-attention
+
 # Now copy the full source and reinstall the package (no-deps: deps cached above)
 COPY . /app
 RUN pip install --no-cache-dir --no-deps --force-reinstall "."
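Because transformers silently falls back to its torch implementation when these kernels are absent, a startup check can surface the problem before training begins. The sketch below is not part of the commit; the import names (`causal_conv1d` for causal-conv1d, `fla` for flash-linear-attention) are assumptions based on each package's conventional module name.

```python
import importlib.util

# Assumed pip-name -> import-name mapping for the two kernel packages
# installed in the Dockerfile (hypothetical check, not from the commit).
KERNEL_PACKAGES = {
    "causal-conv1d": "causal_conv1d",
    "flash-linear-attention": "fla",
}

def missing_kernels(packages=KERNEL_PACKAGES):
    """Return the pip names of any packages whose module cannot be imported."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]

# Run early (e.g. at the top of the training entrypoint) so a missing kernel
# fails fast instead of triggering an illegal-memory-access error mid-training.
print("missing kernel packages:", missing_kernels())
```

Failing fast here is cheap insurance: the fallback path does not error at import time, only during the forward pass on GPU.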
