
Commit 2972def

brian-c-moore and claude committed
Add 288M config (1.2B dense equiv) — max for L4 24GB
Config: d_model=1536, n_heads=24, n_layers=30, d_ff=6144, batch=1. Estimated ~98% VRAM utilization on L4 24GB.

Previous configs (432M, 268M) OOMed due to sparse attention intermediate tensors at d_model >= 1536 with more layers or larger batches. Removed superseded configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent: b1c2b45
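For context, the "1.2B dense equiv" figure in the title is consistent with a standard dense transformer at these dimensions, and the low-rank ranks in the config explain how that collapses to roughly 288M trainable parameters. The sketch below is back-of-the-envelope only: it assumes a gated three-matrix feed-forward block, untied input/output embeddings, and plain two-factor (d x r)(r x d) decompositions, none of which is confirmed by this diff.

# Rough parameter estimates for the new config (hypothetical accounting;
# the actual LeanFormer parameterization may differ).
vocab, d, d_ff, n_layers = 32000, 1536, 6144, 30
r_attn, r_ff = 192, 192  # attention_rank and ff_rank from the config

# Dense equivalent: 4 attention projections (d x d) + 3 gated-FFN matrices
# (d x d_ff each), plus untied embedding and output matrices.
dense_layer = 4 * d * d + 3 * d * d_ff
dense_total = n_layers * dense_layer + 2 * vocab * d
print(f"dense equivalent: {dense_total / 1e9:.2f}B")  # 1.23B

# Low-rank: every d x d projection becomes (d x r)(r x d); every FFN matrix
# becomes (d x r)(r x d_ff) or its transpose.
lowrank_layer = 4 * (2 * d * r_attn) + 3 * (d * r_ff + r_ff * d_ff)
lowrank_total = n_layers * lowrank_layer + 2 * vocab * d
print(f"low-rank estimate: {lowrank_total / 1e6:.0f}M")  # ~302M

The remaining gap to the quoted 288M would come from details this sketch ignores: the screening and gate projections (screening_rank=48, ff_gate_rank=48), norm parameters, and whether the embeddings are tied.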

2 files changed: 5 additions & 40 deletions

Modified config (5 additions & 5 deletions):
@@ -2,13 +2,13 @@ model:
   vocab_size: 32000
   d_model: 1536
   n_heads: 24
-  n_layers: 24
+  n_layers: 30
   d_ff: 6144
   max_seq_len: 512
   attention_rank: 192
   ff_rank: 192
   screening_rank: 48
-  attention_top_k: 128
+  attention_top_k: 96
   ff_gate_rank: 48
   ff_sparsity_target: 0.8
   min_depth: 6
@@ -22,8 +22,8 @@ data:
 training:
   output_dir: ./checkpoints/reasoning_core
   epochs: 3
-  batch_size: 2
-  gradient_accumulation: 32
+  batch_size: 1
+  gradient_accumulation: 64
   lr: 2.0e-4
   warmup_steps: 3000
   weight_decay: 0.01
@@ -32,4 +32,4 @@ training:
   logging_steps: 100
   save_steps: 5000
   eval_steps: 1000
-  run_name: leanformer-reasoning-core-268m
+  run_name: leanformer-reasoning-core-288m
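A side note on the training hunk: halving batch_size while doubling gradient_accumulation leaves the effective batch, and therefore the tokens consumed per optimizer step, unchanged; only peak activation memory drops. A quick sanity check in plain Python (no project code assumed):

# Effective batch = per-device batch x gradient-accumulation steps;
# tokens per optimizer step further multiplies by max_seq_len.
max_seq_len = 512
for name, bs, accum in (("old", 2, 32), ("new", 1, 64)):
    eff = bs * accum
    print(f"{name}: effective batch {eff}, tokens/step {eff * max_seq_len}")
# Both lines report an effective batch of 64 and 32768 tokens/step.

So the switch trades some throughput for lower peak memory without changing the optimization recipe (up to gradient-accumulation numerics).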

configs/reasoning_core_432m.yaml (0 additions & 35 deletions):

This file was deleted.
