feat: Switch from LayerNorm to RMSNorm #213
Let's also swap it out for the
As a sanity check, can you run this and compare with the Muon/Rope-scale baseline?
0120214 to 4858f02
I have now switched out LayerNorm for all of the components/models you mentioned.
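For reference, a minimal sketch of what the swap changes numerically. This is plain Python, not the code in this PR: the function names `layer_norm` and `rms_norm`, the `eps` value, and the omission of LayerNorm's learned scale and bias are all assumptions made for illustration.

```python
import math


def layer_norm(x, eps=1e-6):
    """LayerNorm without the learned affine: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, weight=None, eps=1e-6):
    """RMSNorm: divide by the root-mean-square; no mean subtraction and no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if weight is None:
        # Hypothetical default: a unit gain per feature.
        weight = [1.0] * len(x)
    return [w * v / rms for v, w in zip(x, weight)]


x = [1.0, 2.0, 3.0, 4.0]
ln = layer_norm(x)  # zero mean, unit variance
rn = rms_norm(x)    # unit mean square, mean is not removed

# On zero-mean input the two normalizations coincide (with unit gain).
centered = [v - sum(x) / len(x) for v in x]
ln_centered = layer_norm(centered)
rn_centered = rms_norm(centered)
```

The two only differ when the input has a non-zero mean, which is one reason comparing against the existing baseline is a reasonable sanity check.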
HennerM left a comment
nice, thanks for doing this. LGTM
TransformerBlock
test_perceiver.py