Revamp benchmark page for new matbench implementation #476

Merged: hholb merged 8 commits into staging from hholb/benchmarks on Dec 19, 2025
@hholb hholb commented Dec 17, 2025

This PR revamps the benchmarks page to handle the backend route setup and metrics for Matbench Discovery.

I cleaned up the presentation to align with what we now know about Matbench, namely that the different tasks (IS2RE, S2EF, RS2RE, etc.) are all different ways models get to the same end result of the "Discovery" task. That means we don't need separate tables and charts for the different tasks, and we can pull all of them into one section. Here is an overview of what it looks like now (with some randomly generated metrics):

Screen.Recording.2025-12-17.at.9.52.54.AM.mov

I kept the table, scatter plot, and radar charts from before, and added two new charts: a bar chart for comparing models on a single metric, and a parallel coordinates chart. The parallel coordinates chart is like taking the radar chart and 'unrolling' it so that each axis is vertical, which makes it easier to read than the radar chart when a lot of metrics are selected.

Bar Chart

Screen.Recording.2025-12-17.at.9.58.39.AM.mov

Parallel Coordinates Chart

Screen.Recording.2025-12-17.at.10.01.45.AM.mov
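
To make the 'unrolling' idea concrete, here is a minimal sketch of how per-model metrics could be mapped onto vertical axes. It is not the code from this PR: the `ModelMetrics` shape, the `toParallelCoordinates` name, and the min-max rescaling are all assumptions chosen for illustration, and it assumes every model reports every selected metric.

```typescript
// Hypothetical data shape (not from the PR): one entry per model,
// with a map from metric name to value.
type ModelMetrics = { model: string; metrics: Record<string, number> };

type Polyline = { model: string; points: number[] };

// Rescale each selected metric to [0, 1] across models so that all vertical
// axes share the same height, then emit one polyline per model.
// Assumes every model has a value for every metric listed in `axes`.
function toParallelCoordinates(rows: ModelMetrics[], axes: string[]): Polyline[] {
  // Build one scaling function per axis from that axis's min and max.
  const scales = axes.map((axis) => {
    const values = rows.map((r) => r.metrics[axis]);
    const min = Math.min(...values);
    const max = Math.max(...values);
    return (row: ModelMetrics): number =>
      // A metric with no spread sits mid-axis instead of dividing by zero.
      max === min ? 0.5 : (row.metrics[axis] - min) / (max - min);
  });

  return rows.map((row) => ({
    model: row.model,
    points: scales.map((scale) => scale(row)),
  }));
}
```

Each polyline has one point per selected metric, so adding a metric adds one more vertical axis rather than crowding another spoke into a radar chart. This sketch does not flip axes for metrics where lower is better; a real chart would need to decide how to present those.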

From the table view, you can select rows to compare and the other charts will update, making it easier to compare a subset of models:

Screen.Recording.2025-12-17.at.10.03.01.AM.mov
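
As a rough illustration of the row-selection behavior, the charts can be fed a filtered view of the table data. This is a sketch under assumptions, not the PR's implementation: `rowsForCharts` and the `model` key are hypothetical names, and showing every model when nothing is selected is an assumed default.

```typescript
// Return the rows the charts should draw, given the set of model names
// currently selected in the table. Names and behavior are illustrative only.
function rowsForCharts<T extends { model: string }>(
  rows: T[],
  selected: Set<string>,
): T[] {
  // Assumed default: with no selection, the charts show every model.
  if (selected.size === 0) return rows;
  return rows.filter((row) => selected.has(row.model));
}
```

Keeping the selection as a set of model names means every chart can derive its data from the same source rows without holding its own copy.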

Below the charts are expandable sections to read more about what the metrics mean:

Screen.Recording.2025-12-17.at.10.03.48.AM.mov

In the SDK, when pushing benchmark results to the backend, we can include an optional garden_doi field. If it is present, the model name becomes clickable in the different charts and takes the user to the garden with the corresponding DOI.
(I don't have real garden DOIs in my fake data, so it's just a 404 here):

Screen.Recording.2025-12-17.at.10.07.55.AM.mov
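
The optional-link logic might look roughly like the sketch below. Only the `garden_doi` field name comes from this PR; the `BenchmarkEntry` type, the `model_name` field, the `modelLink` helper, and the `/garden/` base path are hypothetical placeholders, and the real route may differ.

```typescript
// Hypothetical entry shape; only `garden_doi` is named in the PR description.
type BenchmarkEntry = { model_name: string; garden_doi?: string };

// Return a link target for the model name, or null when the entry has no
// garden_doi and the name should render as plain text.
// The default base path is an assumption, not the site's actual route.
function modelLink(entry: BenchmarkEntry, gardenBasePath = "/garden/"): string | null {
  if (!entry.garden_doi) return null;
  // DOIs contain a slash, so encode the DOI before placing it in a path segment.
  return gardenBasePath + encodeURIComponent(entry.garden_doi);
}
```

A chart component could then render a link when `modelLink` returns a string and fall back to plain text when it returns null, so entries pushed without a DOI are unaffected.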

@hholb hholb marked this pull request as ready for review December 17, 2025 17:12

@OwenPriceSkelly OwenPriceSkelly left a comment

Looks good to me! (with a grain of salt as I'm looking only at the screen recordings lol)

@hholb hholb merged commit 2cccccd into staging Dec 19, 2025
1 check passed
WillEngler pushed a commit to Garden-AI/garden-frontend-staging that referenced this pull request Dec 19, 2025
* remove unneeded components and api calls

* update benchmarks data for new metrics, clean up presentation

* add bar and parallel coordinates charts

* add benchmark link to navbar

Garden-AI/garden-frontend@2cccccd