Conversation

Have you tried this on the modded_nanogpt repo? What were the results?

I haven't tried it there. I've been working on nanoPLM, which is a framework for Protein Language Model experimentation in both distillation and pretraining (my focus has been on the latter). I don't think the intuition behind the paper (generalization) aligns with the goal of modded-nanogpt (eval-loss wall-clock speedrunning), so I didn't bother testing under those constraints. In our use case, though, SkipUpdate did improve the loss and downstream tasks slightly, albeit at limited model size. We're still waiting for access to clusters for the larger-scale ablations, but testing SkipUpdate and Magma on larger-scale models/datasets is on our list of experiments.

I tried it in modded_nanogpt; it's somewhat worse than NorMuon there. I also tried a 1B-parameter/100B-token run. It started out superior but gradually fell behind as the system became better optimized. Overall, my impression is that there's something useful here, but we perhaps don't have the right expression of it.

Was this using Magma or SkipUpdate? And what were the parameters?

skip_update_prob = 0.9, magma_tau = 0.5. I'll try some more variations.

SkipUpdate at 0.9 is way too aggressive, no? IIRC I tested at 0.5 and it was performing better than stock.

In the modded_nanogpt codebase hack, the meaning of p was inverted, so that was a 0.1 chance of skipping. After a parameter scan there I haven't been able to find anything helpful: p(keep) = {0.9, 0.7, 0.5} × tau = {--, 0.5, 1.0}.

Implements two stochastic update masking techniques from arxiv.org/abs/2602.15322 for both Muon and NorMuon.

SkipUpdate (skip_update_prob): at each step, each parameter matrix's update is independently kept with probability p or zeroed out with probability 1 - p. Surviving updates are rescaled by 1/p to stay unbiased in expectation. The moment buffers always update densely regardless of the skip.

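A minimal sketch of the masking path, assuming the update direction has already been computed and the moment buffers already updated densely; the function name and signature are illustrative, not the PR's actual code:

```python
import torch

def apply_skip_update(param, update, lr, skip_update_prob):
    # Per the description above, skip_update_prob is treated as the *keep*
    # probability (the thread notes the name reads as the opposite, which
    # caused the inverted-p confusion in the modded_nanogpt hack).
    p_keep = skip_update_prob
    # One Bernoulli draw per parameter matrix, not per element.
    if torch.rand(()) < p_keep:
        # Rescale the surviving update by 1/p so the step stays unbiased in expectation.
        param.add_(update, alpha=-lr / p_keep)
    # Otherwise the whole matrix's update is zeroed out for this step;
    # the momentum buffer was already updated densely before this point.
```
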
Magma (magma_tau): replaces the fixed 1/p rescaling with an adaptive EMA scale s_t driven by momentum-gradient cosine similarity. The scale is intentionally biased (there is no 1/s_t correction); the paper found unbiased variants to be unstable. Bernoulli masking is still applied on top.

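The EMA recurrence itself isn't spelled out above, so the following is only one plausible reading, with magma_tau assumed to act as the EMA decay; treat the exact update rule as a guess rather than the paper's formula:

```python
import torch
import torch.nn.functional as F

def magma_masked_step(param, update, grad, momentum, state, lr, magma_tau, p_keep):
    # Momentum-gradient alignment for this parameter matrix.
    cos = F.cosine_similarity(momentum.flatten(), grad.flatten(), dim=0).item()
    # Assumed recurrence: EMA of the alignment signal, with magma_tau as decay.
    state["s"] = magma_tau * state["s"] + (1.0 - magma_tau) * cos
    # Bernoulli masking is still applied on top of the adaptive scale.
    if torch.rand(()) < p_keep:
        # Intentionally biased: scale by s_t with no 1/s_t debiasing correction.
        param.add_(update, alpha=-lr * state["s"])
```
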
Both features are opt-in and off by default (None). For NorMuon, the mask is applied after the neuron-normalization step so both moment buffers (momentum and variance_neuron) always update densely, consistent with the paper's intent.
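
For reference, a hypothetical opt-in call; the import path, constructor signature, and lr value are invented for illustration, and only the two kwargs come from the text above:

```python
from nanoplm.optim import NorMuon  # hypothetical import path

# Both features default to None (off); pass values to opt in.
opt = NorMuon(model.parameters(), lr=0.02,
              skip_update_prob=0.5,  # keep probability for SkipUpdate
              magma_tau=0.5)         # EMA decay for Magma's adaptive scale
```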