
Commit 59674e0

Kion and claude committed
Install flash-linear-attention and causal-conv1d for GDN training
Qwen3.5's Gated Delta Network layers require these CUDA kernels for correct forward-pass computation. Without them, transformers falls back to a buggy torch implementation that causes illegal memory access errors during SDPO distillation training.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 15ef545 commit 59674e0

1 file changed

Lines changed: 4 additions & 0 deletions

File tree

docker/Dockerfile.claas-api

@@ -16,6 +16,10 @@ RUN mkdir -p claas && touch claas/__init__.py
 
 RUN pip install --no-cache-dir ".[local]" modal
 
+# Qwen3.5 GDN (Gated Delta Networks) layers require these CUDA kernels.
+# Without them, transformers falls back to a buggy torch implementation.
+RUN pip install --no-cache-dir causal-conv1d flash-linear-attention
+
 # Now copy the full source and reinstall the package (no-deps: deps cached above)
 COPY . /app
 RUN pip install --no-cache-dir --no-deps --force-reinstall "."
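Because transformers silently falls back to its torch implementation when these kernels are absent, a startup check can surface the problem before training begins. The sketch below is not part of the commit; the import names (`causal_conv1d` for causal-conv1d, `fla` for flash-linear-attention) are assumptions based on each package's conventional module name.

```python
import importlib.util

# Assumed pip-name -> import-name mapping for the two kernel packages
# installed in the Dockerfile (hypothetical check, not from the commit).
KERNEL_PACKAGES = {
    "causal-conv1d": "causal_conv1d",
    "flash-linear-attention": "fla",
}

def missing_kernels(packages=KERNEL_PACKAGES):
    """Return the pip names of any packages whose module cannot be imported."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]

# Run early (e.g. at the top of the training entrypoint) so a missing kernel
# fails fast instead of triggering an illegal-memory-access error mid-training.
print("missing kernel packages:", missing_kernels())
```

Failing fast here is cheap insurance: the fallback path does not error at import time, only during the forward pass on GPU.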
