
Memory Tokens + Mixed Quantization (val_bpb: 1.1659)#352

Open
sp00mm wants to merge 1 commit into openai:main from sp00mm:submission-memory-tokens

Conversation


@sp00mm sp00mm commented Mar 21, 2026

Summary

  • val_bpb: 1.1659 (sliding window stride=128, post int5/int6+zstd roundtrip)
  • Artifact: 15.0MB (under 16MB cap)
  • 8xH100 SXM, 600s, 9030 steps
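
The reported val_bpb comes from a stride-128 sliding-window evaluation. A minimal sketch of that pattern (the `token_nll` scorer and the function shape are hypothetical; the PR's batched, compiled `forward_logits` version is not shown):

```python
def sliding_window_nll(token_nll, tokens, window=1024, stride=128):
    """Score every token exactly once using overlapping windows.

    Each window of length `window` advances by `stride`; only tokens not
    covered by a previous window are counted, so later tokens are scored
    with up to `window - stride` tokens of left context.
    `token_nll(ctx)` is a hypothetical scorer returning per-token NLLs.
    """
    total, scored, prev_end = 0.0, 0, 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        nlls = token_nll(tokens[begin:end])  # per-token NLL over the window
        new = end - prev_end                 # tokens not yet scored
        total += sum(nlls[-new:])
        scored += new
        prev_end = end
        if end == len(tokens):
            break
    return total, scored
```

Bits-per-byte then follows as `total / (math.log(2) * n_bytes)` for the evaluated byte count.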

Novel contribution: Memory Tokens

64 learnable embedding vectors that overwrite the first K positions of every input sequence, acting as a global context scratchpad. All real tokens attend to them via the causal mask. A/B tested: -0.014 BPB improvement vs identical config without memory tokens.
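
A minimal sketch of the idea, assuming a standard PyTorch embedding pipeline (module name, shapes, and init scale are illustrative, not the PR's code):

```python
import torch
import torch.nn as nn

class MemoryTokens(nn.Module):
    """Learnable vectors that overwrite the first K positions of each
    sequence's embeddings, giving the model a global scratchpad that all
    later (real) tokens can attend to under the causal mask."""

    def __init__(self, num_mem: int = 64, d_model: int = 512):
        super().__init__()
        # Hypothetical init scale; the PR may initialize differently.
        self.mem = nn.Parameter(torch.randn(num_mem, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        k = min(self.mem.shape[0], x.shape[1])
        out = x.clone()
        out[:, :k] = self.mem[:k]  # broadcast across the batch dimension
        return out

mt = MemoryTokens(num_mem=64, d_model=32)
x = torch.randn(2, 128, 32)
y = mt(x)  # first 64 positions replaced, remaining 64 untouched
```

Because the memory tokens occupy real sequence positions, no attention-mask changes are needed: causal attention already lets every subsequent token read them.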

Other techniques

  • 10 layers, 3x MLP, BigramHashEmbedding(10240), SmearGate
  • Partial RoPE (16/64 dims), LN Scale, EMA (0.997), Late QAT
  • Mixed int5 MLP / int6 attention + zstd-22 compression
  • Multi-token prediction (MTP) auxiliary heads (k=2, stripped before export)
  • Batched sliding window eval with compiled forward_logits
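
A minimal sketch of the mixed-precision weight roundtrip, assuming symmetric per-tensor quantization (the PR may use per-channel scales or zero points, and the zstd-22 compression of the packed integers is not shown):

```python
import numpy as np

def quantize_roundtrip(w: np.ndarray, bits: int) -> np.ndarray:
    """Fake-quantize `w` to signed `bits`-bit integers and dequantize.

    bits=5 for MLP weights, bits=6 for attention weights, per the PR's
    mixed int5/int6 scheme. Scale choice here is max-abs, symmetric.
    """
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 31 for int6
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q.astype(np.float32) * scale
```

The export-time validation in the test plan presumably checks that the model still scores correctly after this roundtrip, since quantization error is bounded by half a quantization step per weight.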

Test plan

  • A/B test confirming memory tokens improve val_bpb (-0.014)
  • Verified artifact under 16MB (15,070,662 bytes)
  • Int6+zstd roundtrip validation passes
  • 8xH100 SXM run completes in 600s
  • Reproducibility: additional seed runs for p<0.01 (pending compute credits)
