Commit 2972def
Add 288M config (1.2B dense equiv) — max for L4 24GB
Config: d_model=1536, n_heads=24, n_layers=30, d_ff=6144, batch=1.
Estimated ~98% VRAM utilization on L4 24GB. Previous configs (432M, 268M)
OOMed due to sparse attention intermediate tensors at d_model>=1536 with
more layers or larger batches. Removed superseded configs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent b1c2b45 · commit 2972def
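For orientation, here is the stated configuration as a minimal Python dataclass. This is a sketch: the class and field names are hypothetical (the repo's actual config format isn't visible in this diff), and only the numeric values come from the commit message.

```python
from dataclasses import dataclass

@dataclass
class L4Config:
    """Hyperparameters from the commit message; names here are illustrative."""
    d_model: int = 1536   # per the commit, the width at which prior configs began to OOM
    n_heads: int = 24     # implies head_dim = d_model // n_heads = 64
    n_layers: int = 30
    d_ff: int = 6144      # exactly 4 * d_model, the conventional FFN expansion
    batch_size: int = 1   # largest batch reported to fit on a 24 GB L4

cfg = L4Config()
assert cfg.d_model % cfg.n_heads == 0  # head_dim must divide evenly (1536 / 24 = 64)
```

Note that d_ff = 6144 is exactly 4 x d_model and 24 heads give a head dimension of 64, both conventional transformer choices, so the config departs from the prior ones mainly in depth and width rather than in shape ratios.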
2 files changed
Lines changed: 5 additions & 40 deletions (commit total)
Lines changed: 5 additions & 5 deletions (modified config file)
[Diff table not captured in this extraction. The modified file shows one-for-one line replacements at lines 5, 11, 25-26, and 35; the second file was deleted in its entirety.]