Relax tolerance in Qwen Scope loader test to fix flaky CI by robbiebusinessacc · Pull Request #706 · decoderesearch/SAELens

robbiebusinessacc · 2026-06-16T15:34:27Z

Description

test_qwen_scope_sae_huggingface_loader_with_mocked_download intermittently
fails in CI (e.g. the two runs linked in #688). It compares two mathematically
equal matmuls: the loader stores W_enc/W_dec as .T.contiguous(), while the
reference in the test multiplies a transposed view (raw_W_enc.T). Over the
2048- and 32768-dim reductions, a contiguous tensor and a transposed view can
accumulate float32 in a different order depending on the BLAS backend, so the
two results agree only up to rounding. The default assert_close tolerance
(rtol=1e-5, atol=1e-8) is too tight to absorb that, which is what danra
diagnosed on the issue.

This compares with rtol=1e-3, atol=1e-2, matching the existing precedent for a
matmul comparison elsewhere in this file. The tolerance is generous enough to
absorb the accumulation noise but still catches a real loader regression, which
would be orders of magnitude off (verified locally: a 1% weight-scale error and
a wrong-bias error are both still rejected).

Note: the failure is BLAS/threading-dependent and reproduces on CI but not on
local arm64 macOS (where the two layouts produce bit-identical results), so this
is verified by the linked CI logs plus the regression check above rather than a
local repro.

Fixes #688

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
New and existing tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and tests

I have run the formatter and linter (ruff format + ruff check) on the change.

test_qwen_scope_sae_huggingface_loader_with_mocked_download compares two mathematically equal matmuls: the loader stores W_enc/W_dec as `.T.contiguous()`, while the reference multiplies a transposed view. Over the 2048- and 32768-dim reductions these layouts can accumulate float32 in a different order depending on the BLAS backend, so the results differ by small rounding noise that intermittently exceeds the default rtol=1e-5/atol=1e-8 and fails CI. Compare with rtol=1e-3/atol=1e-2, which absorbs the rounding while still catching any real loader regression. Fixes decoderesearch#688

chanind

Looks good, thanks for this!

chanind approved these changes Jun 16, 2026

View reviewed changes

chanind merged commit 10c26c8 into decoderesearch:main Jun 16, 2026
3 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Relax tolerance in Qwen Scope loader test to fix flaky CI#706

Relax tolerance in Qwen Scope loader test to fix flaky CI#706
chanind merged 1 commit into
decoderesearch:mainfrom
robbiebusinessacc:contrib/saelens-qwen-scope-loader-test-tol

robbiebusinessacc commented Jun 16, 2026

Uh oh!

chanind left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robbiebusinessacc commented Jun 16, 2026

Description

Type of change

Checklist:

You have tested formatting, typing and tests

Uh oh!

chanind left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants