
feat: Add RMS QK Norm #214

Closed
kinggongzilla wants to merge 1 commit into main from feat-add-qk-norm

Conversation

@kinggongzilla (Contributor) commented May 12, 2026

Adds RMS QK norm in DotProductAttention

Magic numbers in the tests will need to be updated after merging #213, since the combined change of RMS norm plus WK norm will affect them again.
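For context, here is a minimal, self-contained PyTorch sketch of what per-head RMS QK norm inside a DotProductAttention module can look like. The layer layout mirrors the snippet quoted in the review below, but the class signature, config fields, and forward logic are illustrative assumptions, not the PR's exact code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DotProductAttention(nn.Module):
    # Illustrative sketch; hidden_dim, n_heads, bias stand in for the config.
    def __init__(self, hidden_dim: int, n_heads: int, bias: bool = False):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = hidden_dim // n_heads
        self.q = nn.Linear(hidden_dim, hidden_dim, bias=bias)
        self.k = nn.Linear(hidden_dim, hidden_dim, bias=bias)
        self.v = nn.Linear(hidden_dim, hidden_dim, bias=bias)
        self.proj = nn.Linear(hidden_dim, hidden_dim, bias=bias)
        # RMS QK norm: normalize each head's query/key vectors before attention.
        self.qk_norm = nn.RMSNorm(self.head_dim, elementwise_affine=True, eps=1e-6)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        # Project and split into heads: (B, n_heads, T, head_dim).
        q = self.q(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        # Apply the (here shared) RMS norm over the head_dim axis of Q and K.
        q = self.qk_norm(q)
        k = self.qk_norm(k)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(B, T, C))

Normalizing Q and K to unit RMS per head before the dot product bounds the attention logits, which tends to stabilize training.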


@claude (Bot) commented:

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

self.k = nn.Linear(config.hidden_dim, config.hidden_dim, bias=config.bias)
self.v = nn.Linear(config.hidden_dim, config.hidden_dim, bias=config.bias)
self.proj = nn.Linear(config.hidden_dim, config.hidden_dim, bias=config.bias)
self.qk_norm = nn.RMSNorm(self.head_dim, elementwise_affine=True, eps=1e-6)
A collaborator commented:


I think we want an individual norm for each of Q and K, so that each can learn its own per-dimension scale.
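A minimal sketch of this suggestion, rewriting the quoted __init__ lines so Q and K get independent norms (the names q_norm and k_norm are hypothetical, not from the PR):

self.k = nn.Linear(config.hidden_dim, config.hidden_dim, bias=config.bias)
self.v = nn.Linear(config.hidden_dim, config.hidden_dim, bias=config.bias)
self.proj = nn.Linear(config.hidden_dim, config.hidden_dim, bias=config.bias)
# Two separate RMSNorms so Q and K each learn their own per-dimension scale.
self.q_norm = nn.RMSNorm(self.head_dim, elementwise_affine=True, eps=1e-6)
self.k_norm = nn.RMSNorm(self.head_dim, elementwise_affine=True, eps=1e-6)

In forward, q = self.q_norm(q) and k = self.k_norm(k) would then replace the two calls to the shared self.qk_norm. The extra cost is one additional head_dim-sized weight vector.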

@github-actions (Bot) locked and limited conversation to collaborators May 12, 2026