
Memory Tokens + Mixed Quantization (val_bpb: 1.1659)#352

Open
sp00mm wants to merge 1 commit into openai:main from sp00mm:submission-memory-tokens

Conversation


@sp00mm sp00mm commented Mar 21, 2026

Summary

  • val_bpb: 1.1659 (sliding window stride=128, post int5/int6+zstd roundtrip)
  • Artifact: 15.0MB (under 16MB cap)
  • 8xH100 SXM, 600s, 9030 steps
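
The reported val_bpb comes from a stride-128 sliding-window evaluation. A minimal sketch of that pattern (the `token_nll` scorer and the function shape are hypothetical; the PR's batched, compiled `forward_logits` version is not shown):

```python
def sliding_window_nll(token_nll, tokens, window=1024, stride=128):
    """Score every token exactly once using overlapping windows.

    Each window of length `window` advances by `stride`; only tokens not
    covered by a previous window are counted, so later tokens are scored
    with up to `window - stride` tokens of left context.
    `token_nll(ctx)` is a hypothetical scorer returning per-token NLLs.
    """
    total, scored, prev_end = 0.0, 0, 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        nlls = token_nll(tokens[begin:end])  # per-token NLL over the window
        new = end - prev_end                 # tokens not yet scored
        total += sum(nlls[-new:])
        scored += new
        prev_end = end
        if end == len(tokens):
            break
    return total, scored
```

Bits-per-byte then follows as `total / (math.log(2) * n_bytes)` for the evaluated byte count.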

Novel contribution: Memory Tokens

64 learnable embedding vectors that overwrite the first K positions of every input sequence, acting as a global context scratchpad. All real tokens attend to them via the causal mask. A/B tested: -0.014 BPB improvement vs identical config without memory tokens.
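
A minimal sketch of the idea, assuming a standard PyTorch embedding pipeline (module name, shapes, and init scale are illustrative, not the PR's code):

```python
import torch
import torch.nn as nn

class MemoryTokens(nn.Module):
    """Learnable vectors that overwrite the first K positions of each
    sequence's embeddings, giving the model a global scratchpad that all
    later (real) tokens can attend to under the causal mask."""

    def __init__(self, num_mem: int = 64, d_model: int = 512):
        super().__init__()
        # Hypothetical init scale; the PR may initialize differently.
        self.mem = nn.Parameter(torch.randn(num_mem, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        k = min(self.mem.shape[0], x.shape[1])
        out = x.clone()
        out[:, :k] = self.mem[:k]  # broadcast across the batch dimension
        return out

mt = MemoryTokens(num_mem=64, d_model=32)
x = torch.randn(2, 128, 32)
y = mt(x)  # first 64 positions replaced, remaining 64 untouched
```

Because the memory tokens occupy real sequence positions, no attention-mask changes are needed: causal attention already lets every subsequent token read them.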

Other techniques

  • 10 layers, 3x MLP, BigramHashEmbedding(10240), SmearGate
  • Partial RoPE (16/64 dims), LN Scale, EMA (0.997), Late QAT
  • Mixed int5 MLP / int6 attention + zstd-22 compression
  • Multi-token prediction (MTP) auxiliary heads (k=2, stripped before export)
  • Batched sliding window eval with compiled forward_logits
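
A minimal sketch of the mixed-precision weight roundtrip, assuming symmetric per-tensor quantization (the PR may use per-channel scales or zero points, and the zstd-22 compression of the packed integers is not shown):

```python
import numpy as np

def quantize_roundtrip(w: np.ndarray, bits: int) -> np.ndarray:
    """Fake-quantize `w` to signed `bits`-bit integers and dequantize.

    bits=5 for MLP weights, bits=6 for attention weights, per the PR's
    mixed int5/int6 scheme. Scale choice here is max-abs, symmetric.
    """
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 31 for int6
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q.astype(np.float32) * scale
```

The export-time validation in the test plan presumably checks that the model still scores correctly after this roundtrip, since quantization error is bounded by half a quantization step per weight.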

Test plan

  • A/B test confirming memory tokens improve val_bpb (-0.014)
  • Verified artifact under 16MB (15,070,662 bytes)
  • Int6+zstd roundtrip validation passes
  • 8xH100 SXM run completes in 600s
  • Reproducibility: additional seed runs for p<0.01 (pending compute credits)
