Record: Dynamic Eval + TTT on SOTA Pipeline (val_bpb=1.1364) #397
Open
translatingthename wants to merge 1 commit into openai:main
3-seed mean: 1.1371 (seeds 42, 7, 2024) Dynamic evaluation (Krause et al., ICML 2018) applied during sliding window scoring. 2.0% consistent bpb improvement at zero artifact cost. Built on PR openai#315 (jfprincz) and PR openai#338 (alertcat).
Summary
Dynamic evaluation (Krause et al., ICML 2018) applied to the SOTA pipeline without modifying training. The model takes periodic SGD gradient steps during sliding window scoring, adapting to the local text distribution. This gives a consistent 2.0% bpb improvement at zero artifact cost.
3-seed mean: 1.1371 (seeds 42, 7, 2024). Best seed: 1.1364. Merged SOTA: 1.1428.
Results (3-seed, 8xH100 SXM, SDPA backend)
Novel Contribution: Dynamic Evaluation
After TTT adaptation, we score the validation stream using sliding windows (stride=64). Between batches of scored windows, we take an SGD gradient step (lr=0.001) on the model weights. The model adapts to the local distribution as it scores. TTT adapts weights before scoring; dynamic eval adapts during scoring. The two are complementary.
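The score-then-update ordering described above can be illustrated with a minimal sketch. This is not the PR's implementation: it swaps the transformer for a toy unigram-logit model in pure Python, and the names `score_stream`, `softmax_nll`, and `tokens_per_step` are hypothetical. The toy model has no context, so the stride-64 sliding windows are reduced here to plain batches of 64 tokens, and the learning rate is far larger than the PR's lr=0.001 so the effect is visible on a short stream. The point it shows is that each batch is scored with the current weights first, and only then used for an SGD step.

```python
import math


def softmax_nll(logits, target):
    """Negative log-likelihood (nats) of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def score_stream(tokens, vocab_size, lr=0.0, tokens_per_step=64):
    """Score a token stream with a toy unigram model.

    With lr > 0, take one SGD step after each scored batch (dynamic eval).
    With lr == 0, the weights stay fixed (static eval).
    Returns the mean NLL in nats per token.
    """
    logits = [0.0] * vocab_size  # toy stand-in for the model weights
    total_nll = 0.0
    for start in range(0, len(tokens), tokens_per_step):
        batch = tokens[start:start + tokens_per_step]

        # 1) Score first: the recorded loss never sees an update
        #    that was computed from these same tokens.
        total_nll += sum(softmax_nll(logits, t) for t in batch)

        if lr > 0.0:
            # 2) Then adapt: gradient of the batch-mean NLL with respect
            #    to logit k is softmax(logits)[k] - empirical_freq[k].
            m = max(logits)
            exps = [math.exp(x - m) for x in logits]
            z = sum(exps)
            counts = [0] * vocab_size
            for t in batch:
                counts[t] += 1
            for k in range(vocab_size):
                grad = exps[k] / z - counts[k] / len(batch)
                logits[k] -= lr * grad
    return total_nll / len(tokens)


if __name__ == "__main__":
    # A skewed stream: token 0 appears nine times as often as token 1.
    stream = ([0] * 9 + [1]) * 100
    print("static :", score_stream(stream, vocab_size=4, lr=0.0))
    print("dynamic:", score_stream(stream, vocab_size=4, lr=0.5))
```

On a stream whose local statistics differ from what the fixed weights expect, the adapted run should report a lower mean loss than the static run. In the real pipeline the update would go through an optimizer on the full model rather than a hand-written gradient.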
Attribution
Built on PR #315 (jfprincz): XSA, EMA, Partial RoPE, LN Scale, Late QAT.
PR #338 (alertcat): TTT integration.
SmearGate/BigramHash/OrthoInit originally by unnir.
Reference: Krause et al., "Dynamic Evaluation of Neural Sequence Models," ICML 2018.
See records/track_10min_16mb/2026-03-22_DynamicEval_TTT_11L/README.md for full details, ablation, what didn't work, and reproduction instructions.