Add non-record BigramHash4096 + MLP992 + LR0.08 + Slide64 submission #355

Open
josusanmartin wants to merge 1 commit into openai:main from josusanmartin:submission/bigramhash-overcap

@josusanmartin

Summary

This PR adds a non-record submission folder for an 8xH100 run that uses a CUDA variant of the baseline trainer.

The run combines:

  • MLP_HIDDEN=992
  • MATRIX_LR=0.08
  • BigramHash(4096,64)
  • sliding-window evaluation at stride 64
  • fp16 tied-embedding export
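
As a rough illustration of two items in the list above, the sketch below shows one way a hashed bigram lookup and a stride-64 sliding-window evaluation schedule could work. It is not taken from train_gpt.py. The function names, the hash multiplier, the 128-token context length in the usage lines, and the reading of BigramHash(4096,64) as "4096 buckets of 64-dimensional vectors" are all assumptions made for this example.

```python
# Illustrative sketch only; names and constants here are hypothetical,
# not the PR's actual implementation.


def bigram_bucket(prev_id: int, cur_id: int, num_buckets: int = 4096) -> int:
    """Map a (previous, current) token-id pair to a row index.

    Assumption: BigramHash(4096, 64) denotes a table with 4096 rows of
    64-dim vectors, indexed by a hash of the token pair. The multiplier
    below is an arbitrary choice for illustration.
    """
    return (prev_id * 1_000_003 + cur_id) % num_buckets


def sliding_windows(n_tokens: int, context_len: int, stride: int = 64):
    """Yield (begin, end, score_from) spans for sliding-window evaluation.

    The model sees tokens [begin, end) as context, but only tokens
    [score_from, end) contribute to the loss, so every token is scored
    exactly once, with up to context_len - stride tokens of history.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + context_len, n_tokens)
        yield begin, end, prev_end
        prev_end = end
        if end == n_tokens:
            break


# Usage: a 200-token stream with a hypothetical 128-token context.
spans = list(sliding_windows(200, context_len=128, stride=64))
scored = sum(end - score_from for _, end, score_from in spans)
print(spans)
print(scored)
print(bigram_bucket(17, 42))
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is the usual trade-off behind this kind of evaluation.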

Result

  • Pre-quant at stop: val_bpb = 1.1913
  • Post-quant roundtrip: val_bpb = 1.19286858
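
The gap between the two numbers above comes from quantizing the weights and loading them back. The following minimal sketch shows a per-tensor symmetric int8 roundtrip and the arithmetic on the reported scores. The `int8_roundtrip` helper and its scaling scheme are assumptions for illustration; the PR text does not describe the actual quantizer.

```python
# Illustrative sketch only; the quantization scheme is an assumption.


def int8_roundtrip(weights):
    """Quantize floats to int8 with one shared scale, then dequantize."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return [q * scale for q in quantized], scale


weights = [0.5, -1.27, 0.003, 1.0]
restored, scale = int8_roundtrip(weights)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Reported scores from the Result list.
pre_quant_bpb = 1.1913
post_quant_bpb = 1.19286858
quant_gap = post_quant_bpb - pre_quant_bpb

print(f"max roundtrip error: {max_error:.6f} (scale {scale:.6f})")
print(f"quantization gap: {quant_gap:.8f} bpb")
```

With the reported values, the roundtrip costs about 0.0016 bpb.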

Why Non-Record

The final int8+zlib artifact exceeds the strict 16,000,000-byte cap:

  • artifact: 16,120,324 bytes (120,324 over)
  • total with code: 16,179,102 bytes (179,102 over)

So this is submitted to track_non_record_16mb rather than the record track.
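
The size check behind this decision can be sketched as below, using the byte counts reported above. The helper names are hypothetical, and treating "total with code" as artifact bytes plus code bytes is an assumption drawn from the wording.

```python
# Illustrative sketch only; helper names are hypothetical.
import zlib

CAP_BYTES = 16_000_000
ARTIFACT_BYTES = 16_120_324  # reported int8+zlib artifact size
TOTAL_BYTES = 16_179_102     # reported total with code


def compressed_size(payload: bytes, level: int = 9) -> int:
    """Size in bytes of a zlib-compressed payload."""
    return len(zlib.compress(payload, level))


def overage(size_bytes: int, cap_bytes: int = CAP_BYTES) -> int:
    """Bytes over the cap (negative means the size fits)."""
    return size_bytes - cap_bytes


# Assumption: total = artifact + code.
code_bytes = TOTAL_BYTES - ARTIFACT_BYTES

print(f"artifact overage: {overage(ARTIFACT_BYTES):,} bytes")
print(f"total overage:    {overage(TOTAL_BYTES):,} bytes")
print(f"implied code size: {code_bytes:,} bytes")
print(f"fits cap: {overage(TOTAL_BYTES) <= 0}")
```

Because the artifact alone is already over the cap, the result is the same whether the limit is applied to the artifact or to the total.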

Files

  • README.md
  • submission.json
  • train.log
  • train_gpt.py
