Commit 59674e0
Install flash-linear-attention and causal-conv1d for GDN training
Qwen3.5's Gated Delta Network layers require these CUDA kernels for
correct forward pass computation. Without them, transformers falls back
to a buggy torch implementation that causes illegal memory access errors
during SDPO distillation training.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 15ef545 · commit 59674e0
1 file changed: 4 additions & 0 deletions
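The commit message says training should fail loudly rather than silently hit the buggy torch fallback. A minimal sketch of a pre-flight check along those lines, assuming the standard import names for these packages (flash-linear-attention installs as `fla`, causal-conv1d as `causal_conv1d`) — the `kernels_available` helper is hypothetical, not part of the commit:

```python
import importlib.util

def kernels_available():
    """Report whether the optional CUDA-kernel packages are importable.

    flash-linear-attention ships as the `fla` module; causal-conv1d as
    `causal_conv1d`. Checking specs avoids importing (and thus compiling
    or loading CUDA extensions) just to probe availability.
    """
    required = ["fla", "causal_conv1d"]
    return {name: importlib.util.find_spec(name) is not None for name in required}

if __name__ == "__main__":
    missing = [name for name, ok in kernels_available().items() if not ok]
    if missing:
        # Fail fast before training starts, instead of hitting illegal
        # memory access errors deep inside the GDN forward pass.
        raise SystemExit(f"Missing CUDA kernel packages: {missing}")
```

Running such a check at the top of the training entrypoint would surface the misconfiguration this commit fixes before any GPU work begins.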
(diff body not captured in this export: 4 lines added after line 18 of the changed file)