
Pull requests: 514-labs/agent-evals

Internal preview: moose harness matrix + CSV ingest pages
#59 opened Apr 18, 2026 by oatsandsugar (Contributor)
feat: add local tool override to decbench
#57 opened Apr 17, 2026 by callicles (Contributor)
Add 16 new taxi data scenario benchmarks
#49 opened Apr 15, 2026 by 03cranec
feat: initial implementation for run page
#37 opened Apr 9, 2026 by callicles (Contributor)
feat: redesign leaderboard implementation
#36 opened Apr 9, 2026 by callicles (Contributor)
Eval fixes
#29 opened Mar 31, 2026 by georgevanderson