Pull requests: openai/evals
Pull requests list
fix: sanitize user-supplied command before logging to prevent log injection
#1671
opened May 25, 2026 by
rajkumar-prog
fix: prevent stack trace exposure in flask-playwright API responses
#1669
opened May 21, 2026 by
rajkumar-prog
fix: guard against empty/filtered LLM responses in make_me_say utils
#1666
opened May 18, 2026 by
qizwiz
fix: omit non-numeric usage fields from token usage report
#1664
opened May 16, 2026 by
rajkumar-prog
Add atr_prompt_injection eval (modelgraded safety, 16 multilingual samples)
#1657
opened May 10, 2026 by
eeee2345
13 tasks done
Add agent-tool-abstention eval (13 samples, Match template)
#1656
opened May 8, 2026 by
MukundaKatta
5 tasks done
Add agent-tool-routing eval (12 samples, Match template)
#1655
opened May 8, 2026 by
MukundaKatta
5 tasks done
eval: add Oracle ERP Cloud workflow and terminology eval
#1654
opened May 4, 2026 by
karthikchundi-commits
Route modern OpenAI models through chat completions
#1651
opened Apr 23, 2026 by
kayametehan
Update Python version to 3.12 and refresh PR template
#1648
opened Apr 23, 2026 by
kayametehan
Add Turkish language evals: logical reasoning and grammar
#1647
opened Apr 23, 2026 by
kayametehan
eval: add RAIL Score responsible AI evaluation across 8 dimensions
#1640
opened Apr 2, 2026 by
SumitVermakgp
12 tasks done
fix: replace 11 bare except clauses with except Exception
#1626
opened Feb 25, 2026 by
haosenwang1018
Add finance-agent routing eval dataset and builder guidance
#1625
opened Feb 24, 2026 by
maxpetrusenko
Add Logic Stress Stress-test Suite (v2, v3)
#1622
opened Feb 16, 2026 by
14H034160212
Contributor