fix: retune AA index bounds and refresh fallback snapshot for the reworked AA scale by devangpratap · Pull Request #139 · Andyyyy64/whichllm

devangpratap · 2026-06-29T15:15:29Z

Part of #101 (partial: this is the first slice you scoped, not the full issue, so please keep the issue open).

What this does

AA reworked their Intelligence Index and the open-weights raw scores compressed a lot, so the old 12.5/56.2 bounds and the 2026-05-14 snapshot no longer line up with the live data. Under the old bounds, the live raw value for Qwen3-8B (around 7.4 now) sits below the 12.5 floor and normalizes to 0, while the curated snapshot still reported a raw value that normalized to 40 for the same model.
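
To make the mismatch concrete, here is a minimal sketch of the normalization, assuming it is a linear min-max map clamped to 0 to 100 (consistent with the clamping described in this PR). `normalize_aa` is a hypothetical name, not necessarily the function whichllm uses.

```python
def normalize_aa(raw: float, lo: float, hi: float) -> float:
    """Hypothetical helper: linearly map a raw AA index value onto 0..100, then clamp."""
    scaled = (raw - lo) / (hi - lo) * 100.0
    return max(0.0, min(100.0, scaled))


OLD_BOUNDS = (12.5, 56.2)   # bounds before this PR
NEW_BOUNDS = (-19.4, 47.6)  # bounds introduced by this PR

QWEN3_8B_RAW = 7.4  # approximate live raw value quoted above

# 7.4 is below the old 12.5 floor, so the old mapping clamps it to 0.
old_score = normalize_aa(QWEN3_8B_RAW, *OLD_BOUNDS)
new_score = normalize_aa(QWEN3_8B_RAW, *NEW_BOUNDS)
print(f"old bounds: {old_score:.1f}  new bounds: {new_score:.1f}")
# prints "old bounds: 0.0  new bounds: 40.0"
```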

Re-derived the normalization bounds from the current distribution: _AA_INDEX_MIN = -19.4, _AA_INDEX_MAX = 47.6. I kept the same two-point calibration the code already used: the top open model (DeepSeek-V4-Pro, raw 44.3) maps to 95, and the 8B class (Qwen3-8B, raw 7.4) maps to 40. On the compressed scale that fit puts the floor below zero. Live AA raw values are always positive and the output is clamped to 0 to 100, so the negative floor only sets where the curve sits.
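
The two-point calibration pins down both bounds. A sketch of that solve, assuming the same linear map as before (the helper names are hypothetical):

```python
def bounds_from_calibration(
    raw_a: float, target_a: float, raw_b: float, target_b: float
) -> tuple[float, float]:
    """Solve for (min, max) so a linear 0..100 map sends raw_a -> target_a and raw_b -> target_b."""
    # (max - min) follows from how far apart the two points are in raw vs. normalized space.
    span = (raw_a - raw_b) * 100.0 / (target_a - target_b)
    # Shift the floor so raw_b lands exactly on target_b.
    lo = raw_b - span * target_b / 100.0
    return lo, lo + span


def normalize_aa(raw: float, lo: float, hi: float) -> float:
    """Assumed normalizer: linear min-max onto 0..100, clamped."""
    return max(0.0, min(100.0, (raw - lo) / (hi - lo) * 100.0))


# Calibration points from this PR: DeepSeek-V4-Pro 44.3 -> 95, Qwen3-8B 7.4 -> 40.
lo, hi = bounds_from_calibration(44.3, 95.0, 7.4, 40.0)
print(f"exact fit: min={lo:.2f}, max={hi:.2f}")  # exact fit: min=-19.44, max=47.65

# Check the rounded bounds used in this PR against both calibration points.
for name, raw in [("DeepSeek-V4-Pro", 44.3), ("Qwen3-8B", 7.4)]:
    print(name, round(normalize_aa(raw, -19.4, 47.6), 2))
```

The exact solve puts the ceiling at about 47.65. With the floor rounded to -19.4, a ceiling of 47.6 is what keeps Qwen3-8B at exactly 40, and DeepSeek-V4-Pro then lands at about 95.07.
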
Refreshed the curated fallback from a fresh 2026-06-29 scrape. Models AA currently tracks carry their real new-scale raw values. Peer entries that AA does not track keep their previous normalized score: each raw value is back-solved to reproduce that score under the new bounds, so the retune does not move them.
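
Back-solving a peer's raw value is just the inverse of the linear map. A sketch under the same assumption; the peer names and scores are placeholders, not entries from the real snapshot:

```python
NEW_MIN, NEW_MAX = -19.4, 47.6  # bounds from this PR


def raw_for_score(score: float, lo: float = NEW_MIN, hi: float = NEW_MAX) -> float:
    """Inverse of the assumed linear map: the raw value that normalizes back to `score`."""
    return lo + score / 100.0 * (hi - lo)


def normalize_aa(raw: float, lo: float = NEW_MIN, hi: float = NEW_MAX) -> float:
    """Assumed forward map: linear onto 0..100, clamped."""
    return max(0.0, min(100.0, (raw - lo) / (hi - lo) * 100.0))


# Hypothetical previous normalized scores for peers AA does not track.
previous_scores = {"peer-a": 20.0, "peer-b": 55.0}

for model, score in previous_scores.items():
    raw = raw_for_score(score)
    # Round-tripping through the forward map returns the original score.
    print(model, round(raw, 2), round(normalize_aa(raw), 2))
```

A peer with a score of 20 needs a raw value of -6.0 under the new bounds, which is how negative raw values end up in the table.
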
Added tests asserting the new normalized values for known models (top open model around 95, Qwen3-8B at 40, clamp behavior, and a few fallback values).
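
The new tests might look roughly like this pytest-style sketch. `normalize_aa` here is a hypothetical stand-in so the block runs on its own; the real tests would import the project's normalizer and snapshot instead, and the fallback-value checks are omitted because they depend on the snapshot contents.

```python
import math

AA_INDEX_MIN, AA_INDEX_MAX = -19.4, 47.6  # mirrors _AA_INDEX_MIN / _AA_INDEX_MAX in this PR


def normalize_aa(raw: float) -> float:
    """Stand-in normalizer (assumed linear min-max, clamped to 0..100)."""
    scaled = (raw - AA_INDEX_MIN) / (AA_INDEX_MAX - AA_INDEX_MIN) * 100.0
    return max(0.0, min(100.0, scaled))


def test_top_open_model_near_95():
    # DeepSeek-V4-Pro, raw 44.3
    assert math.isclose(normalize_aa(44.3), 95.0, abs_tol=0.5)


def test_qwen3_8b_at_40():
    assert math.isclose(normalize_aa(7.4), 40.0, abs_tol=1e-6)


def test_clamps_to_range():
    # Values outside the bounds are pinned to the ends of the 0..100 scale.
    assert normalize_aa(AA_INDEX_MIN - 10.0) == 0.0
    assert normalize_aa(AA_INDEX_MAX + 10.0) == 100.0


if __name__ == "__main__":
    test_top_open_model_near_95()
    test_qwen3_8b_at_40()
    test_clamps_to_range()
    print("all checks passed")
```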

What this deliberately does not do

I did not touch the overlay merge policy in fetch_aa_index_scores. It is still max-merge: a live score only replaces a snapshot score when the live value is higher. The switch to live-wins is left for a separate PR, as you asked. The snapshot is still stored as raw AA values normalized on read, same as before.
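
For reference, the difference between the two overlay policies can be sketched as follows. These functions are hypothetical and do not reflect the actual signature or internals of fetch_aa_index_scores; adding models that appear only in the live data is an assumption.

```python
def merge_max(snapshot: dict[str, float], live: dict[str, float]) -> dict[str, float]:
    """Max-merge (current policy): a live score replaces a snapshot score only when higher."""
    merged = dict(snapshot)
    for model, score in live.items():
        # Assumption: models present only in the live data are simply added.
        if model not in merged or score > merged[model]:
            merged[model] = score
    return merged


def merge_live_wins(snapshot: dict[str, float], live: dict[str, float]) -> dict[str, float]:
    """Live-wins (deferred to a separate PR): any live score overrides the snapshot."""
    return {**snapshot, **live}


snapshot = {"model-a": 40.0, "model-b": 70.0}  # hypothetical scores
live = {"model-a": 35.0, "model-b": 75.0, "model-c": 50.0}

print(merge_max(snapshot, live))        # {'model-a': 40.0, 'model-b': 75.0, 'model-c': 50.0}
print(merge_live_wins(snapshot, live))  # {'model-a': 35.0, 'model-b': 75.0, 'model-c': 50.0}
```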

One thing worth a look

On the reworked scale, several peer entries (models AA does not track) fall below zero and now read as negative raw values. They still normalize correctly, since they sit at or above the -19.4 floor, but storing negative numbers in a raw-index table is a bit odd. If you would rather, the snapshot could instead store already-normalized 0 to 100 values, which drops the negatives and means a future bounds retune cannot silently shift the fallback. I left that out of this PR since it goes beyond what the issue asks for, but I'm happy to do it in a follow-up if you want it.
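
A small sketch of why storing normalized values would make the fallback immune to a retune. The normalizer is assumed linear and clamped, and FUTURE_BOUNDS is a made-up example, not a planned change:

```python
def normalize_aa(raw: float, lo: float, hi: float) -> float:
    """Assumed normalizer: linear min-max onto 0..100, clamped."""
    return max(0.0, min(100.0, (raw - lo) / (hi - lo) * 100.0))


CURRENT_BOUNDS = (-19.4, 47.6)  # bounds from this PR
FUTURE_BOUNDS = (-15.0, 50.0)   # hypothetical later retune

# One peer entry meant to read as 20, stored both ways.
stored_raw = -6.0          # raw value that normalizes to 20 under CURRENT_BOUNDS
stored_normalized = 20.0   # the alternative: keep the 0..100 score itself

print(round(normalize_aa(stored_raw, *CURRENT_BOUNDS), 2))  # 20.0
print(round(normalize_aa(stored_raw, *FUTURE_BOUNDS), 2))   # 13.85, silently shifted
print(stored_normalized)                                    # 20.0 under either set of bounds
```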

Note on the README

The "What can I run?" example table is marked as a 2026-05 snapshot of illustrative scores. This retune shifts those numbers slightly, but I left the README untouched since refreshing it is outside this issue and needs a live run. Flagging it so you can update it whenever you next regenerate that example.

Testing

uv run pytest: 456 passed
ruff check . and ruff format --check .: clean

…d scale

AA reworked their Intelligence Index and the open-weights raw scores compressed, so the old 12.5/56.2 bounds and the 2026-05-14 snapshot no longer match the live data. Live raw for Qwen3-8B (around 7.4) normalized to roughly 0 under the old floor while the snapshot still mapped it to 40.

- Re-derive _AA_INDEX_MIN/_AA_INDEX_MAX (-19.4/47.6) from the current distribution, keeping the two-point calibration (DeepSeek-V4-Pro 44.3 -> 95, Qwen3-8B 7.4 -> 40). The floor goes below zero on the compressed scale; live values are always positive and clamp at 0.
- Refresh the curated fallback from a 2026-06-29 scrape: AA-tracked models get real new-scale raw values, untracked peers keep their prior normalized score.
- Add tests for the new normalized values of known models.

Scope is the first slice of Andyyyy64#101 only; the live-wins merge change is left for a separate PR and the max-merge overlay is unchanged.

Refs Andyyyy64#101

devangpratap wants to merge 1 commit into Andyyyy64:main from devangpratap:fix/aa-index-retune-reworked-scale