@KartikP KartikP commented Jan 12, 2026

Follows #2169.

Summary of issue: Many benchmarks (e.g., all that use the NeuralBenchmark() class) use explained_variance() to report a ceiled score. As pointed out in #2169, the ceiling is incorrectly squared: after the Spearman-Brown correction it is already a variance (reliability) ceiling. If the ceiling is high, squaring reduces it only slightly, but if the ceiling is low, squaring lowers it dramatically. The result is an artificially low ceiling that inflates model scores, biasing them towards noise.
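To make the effect concrete, here is a minimal sketch (not the actual explained_variance() implementation) with a hypothetical raw score of 0.10, comparing division by the ceiling to division by the squared ceiling:

```python
# Minimal sketch, not the brainscore explained_variance() implementation:
# shows how squaring the reliability ceiling inflates the ceiled score.
raw_score = 0.10  # hypothetical raw (unceiled) model score

for ceiling in (0.85, 0.24):  # a high and a low reliability ceiling
    correct = raw_score / ceiling         # ceiling used once, as intended
    incorrect = raw_score / ceiling ** 2  # bug: ceiling squared before dividing
    print(f"ceiling={ceiling:.2f}  correct={correct:.3f}  incorrect={incorrect:.3f}")

# ceiling=0.85  correct=0.118  incorrect=0.138  (small inflation)
# ceiling=0.24  correct=0.417  incorrect=1.736  (large inflation)
```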

Order of operations

  1. Ensure a backup of the database is made and can be restored if necessary.
  2. Merge Corrected model-to-ceiling mapping under explained variance #2169
  3. Merge Increment public benchmarks that are affected by #2169 #2244
  4. Terminate vision scoring as a result of the "benchmark change".
  5. Manually trigger alexnet on all affected benchmarks to generate new brainscore_benchmarkinstance entries.
  6. Add ceiling values for the Coggan family benchmarks.
  7. Write new model scores to the database based on existing score_raw and ceiling values (script/notebook to follow; see the sketch after this list).
  8. Manually trigger the non-standard-ceiling benchmarks.
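A minimal sketch of step 7, assuming the existing scores are exported to a table with hypothetical columns score_raw and ceiling; the actual script/notebook will follow separately:

```python
import pandas as pd

# Minimal sketch for step 7, not the forthcoming script/notebook.
# Column names `score_raw` and `ceiling` are hypothetical stand-ins
# for the corresponding database fields.
scores = pd.read_csv("existing_scores_export.csv")  # hypothetical export of existing score rows

# Corrected mapping: divide the raw score by the reliability ceiling once,
# instead of by the squared ceiling as before.
scores["score_ceiled_new"] = scores["score_raw"] / scores["ceiling"]

scores.to_csv("recomputed_scores.csv", index=False)
```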

Changes

Affected benchmarks are incremented here. This will create a new entry for each affected benchmark in the brainscore_benchmarkinstance table with an incremented version.

Newly recalculated scores will use the new id from the brainscore_benchmarkinstance table to reflect the update.

Public benchmarks can for the most part be recomputed directly from the existing score_raw and ceiling values in the database. Private (visibility) benchmarks will not be recomputed. Public benchmarks with non-standard ceiling entries in the database will be recomputed by manually re-triggering them (step 8). This last category covers the Papale, Hebart, and Gifford benchmarks, which will soon be set to visible. These benchmarks have multiple ceilings across splits (which can also be temporal bins) that are then summarized into a single ceiling value, as sketched below.
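For the non-standard-ceiling benchmarks, a minimal sketch of summarizing per-split ceilings into one value, assuming a median aggregation (the actual aggregation these benchmarks use may differ):

```python
import numpy as np

# Minimal sketch: collapse per-split (or per-temporal-bin) ceilings into a
# single summary ceiling. The median is an assumption for illustration;
# the actual benchmarks may aggregate differently.
per_split_ceilings = np.array([0.41, 0.38, 0.45, 0.40])  # hypothetical values
single_ceiling = np.median(per_split_ceilings)
print(f"summary ceiling: {single_ceiling:.3f}")
```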

Full list of affected benchmark families

  • majajhong2015: 6 benchmarks (V4/IT, PLS, temporal variants)
  • freemanziemba2013: 4 benchmarks (V1/V2, public/private)
  • rajalingham2020: 1 benchmark (IT, PLS)
  • sanghavi2020: 6 benchmarks (3 datasets × V4/IT)
  • coggan2024: All variants (multiple regions and behavior)
  • cadena2017: 2 benchmarks (PLS, mask)
  • igustibagus2024: 1 benchmark (IT, ridge)
  • papale2025: All variants (multiple regions/metrics)
  • hebart2023_fmri: All variants (multiple regions/metrics)
  • gifford2022: All variants (multiple regions/metrics)

Change in score

[figure: new_ceiled_score_values]

The Coggan family of benchmarks, with the following ceilings, is most affected:

tong.Coggan2024_fMRI.V1-rdm: 0.4477
tong.Coggan2024_fMRI.V2-rdm: 0.4493
tong.Coggan2024_fMRI.V4-rdm: 0.3348
tong.Coggan2024_fMRI.IT-rdm: 0.2397
tong.Coggan2024_behavior-ConditionWiseAccuracySimilarity: 0.6934
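For intuition, assuming the standard score_raw / ceiling mapping, removing the extra squaring shrinks each ceiled score by a factor equal to the ceiling itself, so the lowest Coggan ceilings see the largest change:

```python
# Assuming the standard score_raw / ceiling mapping, the corrected ceiled score
# equals the old (squared-ceiling) score multiplied by the ceiling, so lower
# ceilings mean larger score reductions.
ceilings = {
    "tong.Coggan2024_fMRI.V1-rdm": 0.4477,
    "tong.Coggan2024_fMRI.V2-rdm": 0.4493,
    "tong.Coggan2024_fMRI.V4-rdm": 0.3348,
    "tong.Coggan2024_fMRI.IT-rdm": 0.2397,
    "tong.Coggan2024_behavior-ConditionWiseAccuracySimilarity": 0.6934,
}
for name, ceiling in ceilings.items():
    print(f"{name}: scores shrink to {ceiling:.0%} of their previous value")
```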
