Fix Gemma softcap F16 overflow NaN and scheduler hang (#2058) #2076
Closed
glaziermag wants to merge 1 commit into EricLBuehler:master
Conversation
Contributor
Author
Update (2026-04-15): Rebased onto
This was referenced Apr 16, 2026
Contributor
Author
Closing in favor of atomic split PRs for single-responsibility review:
The two fixes are independent and should be reviewable/mergeable separately.
Fixes #2058.
Two independent fixes
1. Softcap NaN in F16/BF16 (naive attention backend)
Gemma models use an attention softcap (logits / softcap → tanh → × softcap). The naive SDPA backend was performing this tanh in the input dtype (F16 or BF16). For F16, values outside approximately ±65504 become ±Inf before the tanh, and tanh(Inf) comes out as NaN in the reduced-precision computation, producing silent NaN logits.
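For intuition, a tiny plain-Rust illustration (not the actual kernel): f32 stands in for the working precision, and an exp-based tanh stands in for a reduced-precision tanh implementation that does not special-case infinite inputs.

```rust
// Reduced-precision kernels often compute tanh via exponentials; once a score
// has overflowed to Inf, this yields Inf / Inf = NaN.
fn exp_based_tanh(x: f32) -> f32 {
    let (e, ne) = (x.exp(), (-x).exp());
    (e - ne) / (e + ne)
}

fn main() {
    let overflowed = f32::INFINITY; // an F16 score past ~65504 has already become Inf
    println!("{}", exp_based_tanh(overflowed)); // NaN
    println!("{}", exp_based_tanh(1000.0));     // also NaN: exp(1000) overflows
    println!("{}", 1000.0_f32.tanh());          // 1.0 with a robust tanh in full precision
}
```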
Fix: Promote attention scores to F32 before computing the softcap tanh, then cast back to the original dtype.
Scope note: This fix applies to the CPU/naive fallback path in attention/backends/naive.rs. The primary CUDA (FlashAttention) and Metal SDPA backends have their own softcap handling and are not changed here.
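A minimal sketch of the promotion, assuming a candle-style Tensor API; the helper name and call shape are illustrative, not the PR's actual code:

```rust
use candle_core::{DType, Result, Tensor};

// Hypothetical helper: run the softcap tanh in F32 to avoid F16/BF16 overflow,
// then cast back to the caller's dtype.
fn softcap_in_f32(attn_scores: &Tensor, softcap: f64) -> Result<Tensor> {
    let original_dtype = attn_scores.dtype();
    let scores = attn_scores.to_dtype(DType::F32)?; // promote before the overflow-prone ops
    let capped = ((scores / softcap)?.tanh()? * softcap)?;
    capped.to_dtype(original_dtype) // restore the model's working dtype for downstream kernels
}
```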
2. SequenceState::Error missing from is_finished_paged_attn
When a sequence enters SequenceState::Error, it was not recognized as "finished" by the paged attention scheduler. This caused KV blocks to remain allocated and the scheduler to stall waiting for capacity that would never be returned.
Fix: Add SequenceState::Error to the match arms in is_finished_paged_attn().
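The shape of the change, as a sketch; the SequenceState variants other than Error and the surrounding struct are placeholders, not the crate's actual definitions:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SequenceState {
    Running,  // placeholder variant
    Waiting,  // placeholder variant
    Finished, // placeholder variant
    Error,
}

struct Sequence {
    state: SequenceState,
}

impl Sequence {
    // Before the fix, Error was missing from this match, so errored sequences
    // were never treated as finished and their KV blocks were never reclaimed.
    fn is_finished_paged_attn(&self) -> bool {
        matches!(self.state, SequenceState::Finished | SequenceState::Error)
    }
}
```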
Files changed
mistralrs-core/src/attention/backends/naive.rs
mistralrs-core/src/sequence.rs