feat: Switch from LayerNorm to RMSNorm #213

Open
kinggongzilla wants to merge 2 commits into main from feat-rms-norm

kinggongzilla (Contributor) commented May 12, 2026

  • Switch out LayerNorm for RMSNorm
  • Update the magic numbers in the RoPE tests, which now use the updated TransformerBlock
  • Fix typo in test_perceiver.py
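
For readers unfamiliar with the difference, here is a minimal sketch of the two normalizations in plain Python. It is not the code from this PR: the function names, the `eps` value, and the omission of learnable gain/bias parameters are assumptions made for illustration. It shows why tests with hard-coded expected values need new numbers after the switch: RMSNorm skips the mean-centering step, so the outputs differ whenever the input has a non-zero mean.

```python
import math


def layer_norm(x, eps=1e-6):
    """LayerNorm without learnable affine parameters: center, then scale by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learnable gain: scale by the root mean square, no centering."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


# Hypothetical feature vector with a non-zero mean.
features = [1.0, 2.0, 3.0, 4.0]
ln_out = layer_norm(features)    # centered around zero
rms_out = rms_norm(features)     # keeps the sign and offset of the input

# For a zero-mean input the two normalizations coincide.
centered = [-1.0, 1.0, -2.0, 2.0]
ln_centered = layer_norm(centered)
rms_centered = rms_norm(centered)
```

In a PyTorch codebase the equivalent swap would typically be between `torch.nn.LayerNorm` and `torch.nn.RMSNorm`; whether this repository uses those modules or its own implementation is not stated in the conversation.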


github-actions Bot (Contributor) commented May 12, 2026

Coverage

| Tests | Skipped | Failures | Errors | Time |
|-------|---------|----------|--------|------|
| 1388 | 24 💤 | 0 ❌ | 0 🔥 | 29.789s ⏱️ |

kinggongzilla changed the title from "Feat rms norm" to "feat: Switch from LayerNorm to RMSNorm in TransformerBlock" May 12, 2026
HennerM (Collaborator) commented May 12, 2026

Let's also swap it out for the PerceiverBlock and for consistency reasons I am tempted to also change the final norm for UPT, AB-UPT, and the AeroTransformer as well as AeroTransolver. What do you think?

HennerM (Collaborator) commented May 12, 2026

as a sanity check, can you run this and compare with the Muon/Rope-scale baseline?

kinggongzilla changed the title from "feat: Switch from LayerNorm to RMSNorm in TransformerBlock" to "feat: Switch from LayerNorm to RMSNorm" May 13, 2026
kinggongzilla (Contributor Author) commented

> Let's also swap it out for the PerceiverBlock and for consistency reasons I am tempted to also change the final norm for UPT, AB-UPT, and the AeroTransformer as well as AeroTransolver. What do you think?

I have now switched out LayerNorm for all of the components/models you mentioned.

HennerM (Collaborator) left a comment

nice, thanks for doing this. LGTM
