Increment public benchmarks that are affected by #2169 #2244
Follows #2169.
Summary of issue: Many benchmarks (e.g., all that use the `NeuralBenchmark()` class) use `explained_variance()` to report the ceiled score. As pointed out in #2169, the ceiling is incorrectly squared. After the Spearman-Brown correction, it is already a variance (reliability) ceiling. When the ceiling is high, squaring reduces it only slightly, but when it is low, squaring lowers it dramatically. The result is an artificially lowered ceiling that inflates ceiled model scores. Because the inflation grows as the data become noisier, it biases results toward noise.

### Order of operations
1. Run `alexnet` on all affected benchmarks to generate new `brainscore_benchmarkinstance` entries.
2. … `ceiling` value for Coggan family benchmarks.
3. Recompute scores from `score_raw` and `ceiling` (script/notebook to follow).

### Changes
The affected benchmarks are incremented here. This creates a new entry for each affected benchmark in the `brainscore_benchmarkinstance` table with an incremented version. Recalculated scores will use the new `id` from the `brainscore_benchmarkinstance` table to reflect the update.

Public benchmarks can, for the most part, be recomputed directly from the existing `score_raw` and `ceiling` values in the database. Benchmarks with private visibility will not be recomputed. Public benchmarks with non-standard ceiling entries in the database will also be recomputed. This last category covers the Papale, Herbert, and Gifford benchmarks, which will soon be set to visible. These benchmarks have multiple ceilings across splits (which can also be temporal bins) that are then summarized into a single ceiling value.

### Full list of affected benchmark families
### Change in score
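The size of the change depends only on the ceiling. The Python sketch below illustrates this under stated assumptions: the raw score is a Pearson correlation `r`, the ceiling `c` is a Spearman-Brown-corrected split-half reliability, and the affected code computed the ceiled score as r²/c² rather than r²/c. All function names are hypothetical and are not the Brain-Score API. Under these assumptions the ratio between the old and corrected ceiled score is 1/c, independent of `r`, so an existing affected score could in principle be corrected by multiplying it by `c`.

```python
# Hypothetical sketch (not the Brain-Score API): how squaring a ceiling that is
# already on the variance scale changes ceiled explained-variance scores.
# Assumptions: the raw score is a Pearson correlation r, the ceiling c is a
# Spearman-Brown-corrected split-half reliability, and the affected code
# computed the ceiled score as r**2 / c**2 instead of r**2 / c.


def spearman_brown(r_half: float) -> float:
    """Step a split-half correlation up to full-length reliability (Spearman-Brown)."""
    return 2 * r_half / (1 + r_half)


def ceiled_ev_squared_ceiling(r: float, ceiling: float) -> float:
    """Affected form (assumed): treats the reliability ceiling as a correlation and squares it."""
    return r ** 2 / ceiling ** 2


def ceiled_ev(r: float, ceiling: float) -> float:
    """Corrected form: the ceiling is already a variance, so it is not squared."""
    return r ** 2 / ceiling


def inflation_factor(ceiling: float) -> float:
    """Ratio of affected to corrected ceiled score: (r**2 / c**2) / (r**2 / c) = 1 / c."""
    return 1 / ceiling


def corrected_from_affected(affected_score: float, ceiling: float) -> float:
    """Undo the extra division by c: (r**2 / c**2) * c = r**2 / c."""
    return affected_score * ceiling


if __name__ == "__main__":
    # A split-half correlation of 0.5 steps up to a reliability of about 0.667.
    print(f"split-half r=0.5 -> reliability {spearman_brown(0.5):.3f}")
    r = 0.25  # illustrative raw correlation, not a real benchmark result
    for c in (0.9, 0.5, 0.3):  # illustrative ceilings, high to low
        old = ceiled_ev_squared_ceiling(r, c)
        new = ceiled_ev(r, c)
        print(f"ceiling={c:.2f} squared={c ** 2:.2f} "
              f"old={old:.3f} new={new:.3f} inflation={inflation_factor(c):.2f}x")
```

With these illustrative numbers, a ceiling of 0.9 inflates the ceiled score by about 1.11x, while a ceiling of 0.3 inflates it by about 3.33x. This matches the observation that benchmarks with low ceilings are affected most.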
The Coggan family of benchmarks, with the following ceilings, is the most affected: