Revamp benchmark page for new matbench implementation #476
Merged
OwenPriceSkelly
approved these changes
Dec 17, 2025
OwenPriceSkelly
left a comment
Member
Looks good to me! (with a grain of salt as I'm looking only at the screen recordings lol)
WillEngler
pushed a commit
to Garden-AI/garden-frontend-staging
that referenced
this pull request
Dec 19, 2025
* remove unneeded components and api calls
* update benchmarks data for new metrics, clean up presentation
* add bar and parallel coordinates charts
* add benchmark link to navbar
Garden-AI/garden-frontend@2cccccd
This PR revamps the benchmarks page to handle the backend route setup and metrics for Matbench Discovery.
I cleaned up the presentation to align with what we now know about Matbench, namely that the different tasks (IS2RE, S2EF, RS2RE, etc.) are all different ways for models to reach the same end result of the "Discovery" task. That means we don't need separate tables and charts for each task, and we can pull all of them into one section. Here is an overview of what it looks like now (with some randomly generated metrics):
Screen.Recording.2025-12-17.at.9.52.54.AM.mov
I kept the table, scatter plot, and radar charts from before, and added two new charts: a bar chart to compare models on a single metric, and a parallel coordinates chart. The latter is like taking the radar chart and 'unrolling' it so that each axis is vertical, which makes it easier to read than the radar chart when a lot of metrics are selected.
Bar Chart
Screen.Recording.2025-12-17.at.9.58.39.AM.mov
Parallel Coordinates Chart
Screen.Recording.2025-12-17.at.10.01.45.AM.mov
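To make the 'unrolling' idea concrete, here is a minimal sketch of how rows of metrics could be turned into one polyline per model, with one vertical axis per selected metric. The `ModelMetrics` shape, the `toParallelCoordinates` name, and the per-axis min/max rescaling are assumptions for illustration, not necessarily what the page does:

```typescript
// Hypothetical row shape; the real benchmark payload may differ.
interface ModelMetrics {
  modelName: string;
  metrics: Record<string, number>;
}

interface AxisPoint {
  metric: string;
  x: number; // horizontal position of this metric's vertical axis, in [0, 1]
  y: number; // value rescaled to [0, 1] along that axis
}

// Build one polyline per model. Each metric is rescaled independently
// (min -> 0, max -> 1) so axes with different units can sit side by side.
// Assumes every row has a value for every selected metric.
function toParallelCoordinates(
  rows: ModelMetrics[],
  selectedMetrics: string[],
): Record<string, AxisPoint[]> {
  const lines: Record<string, AxisPoint[]> = {};
  selectedMetrics.forEach((metric, i) => {
    const values = rows.map((r) => r.metrics[metric] ?? NaN);
    const min = Math.min(...values);
    const span = Math.max(...values) - min;
    // Spread the axes evenly; a single axis is centered.
    const x = selectedMetrics.length > 1 ? i / (selectedMetrics.length - 1) : 0.5;
    rows.forEach((row, j) => {
      // If all models tie on a metric, place them mid-axis.
      const y = span === 0 ? 0.5 : (values[j] - min) / span;
      (lines[row.modelName] ??= []).push({ metric, x, y });
    });
  });
  return lines;
}
```

Rescaling per axis is one possible design choice; it keeps a metric with a large numeric range from flattening the others, at the cost of hiding absolute magnitudes unless the axes carry tick labels.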
From the table view, you can select rows to compare and the other charts will update, making it easier to compare a subset of models:
Screen.Recording.2025-12-17.at.10.03.01.AM.mov
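The selection behavior can be sketched as a single shared filter that every chart reads from. The `rowsForCharts` name and the fallback of showing all models when nothing is selected are assumptions for illustration:

```typescript
// Return the rows the charts should plot, given the names selected in the table.
// Assumption: an empty selection means "no filter", so all models are shown.
function rowsForCharts<T extends { modelName: string }>(
  allRows: T[],
  selectedNames: ReadonlySet<string>,
): T[] {
  if (selectedNames.size === 0) {
    return allRows;
  }
  return allRows.filter((row) => selectedNames.has(row.modelName));
}
```

Feeding each chart from the same filtered array keeps the bar, scatter, radar, and parallel coordinates views consistent with the table selection.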
Below the charts are expandable sections to read more about what the metrics mean:
Screen.Recording.2025-12-17.at.10.03.48.AM.mov
On the SDK side, when pushing benchmark results to the backend, we can include an optional garden_doi field. If it is present, the model name becomes clickable in the different charts and takes the user to the garden with the corresponding DOI. (I don't have real garden DOIs in my fake data, so it's just a 404 here):
Screen.Recording.2025-12-17.at.10.07.55.AM.mov
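A minimal sketch of the optional-link logic, assuming the result object carries `garden_doi` as an optional string. The `BenchmarkResult` shape, the `gardenLink` helper, and the placeholder base URL are hypothetical; the actual garden route is not shown in this PR description:

```typescript
// Hypothetical result shape; only garden_doi is named in the PR description.
interface BenchmarkResult {
  modelName: string;
  garden_doi?: string;
}

// Return a URL for the model's garden, or null when no DOI was pushed,
// in which case the chart would render the model name as plain text.
// The base URL is a placeholder, not the real frontend route.
function gardenLink(
  result: BenchmarkResult,
  baseUrl: string = "https://example.org/garden",
): string | null {
  const doi = result.garden_doi?.trim();
  if (!doi) {
    return null;
  }
  // DOIs contain "/", so encode the DOI to keep it in a single path segment.
  return `${baseUrl}/${encodeURIComponent(doi)}`;
}
```

Returning `null` rather than an empty string makes the "no link" case explicit for the chart components that consume it.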