From 17bc53f752f2be1b03e4feec88347ca895d2f785 Mon Sep 17 00:00:00 2001
From: Sanket Sarkar
Date: Sun, 3 May 2026 13:23:56 +0530
Subject: [PATCH] Add Zeron ADK + GPT-5/o3 result (9/11 = 81.8% best-mode on strict-offline subset)

First published GPT-5 result on cybench.
Subset: 11/40 (strict-offline filter on task_list.txt).
Unguided pass@2: 8/11 (72.7%).
Subtask-guided + o3 escalation pass@2: 9/11 (81.8%).
Open data: https://github.com/securezeron/zeron-agent-development-kit/tree/main/benchmarks/cybench
---
 data/leaderboard.csv | 1 +
 1 file changed, 1 insertion(+)

diff --git a/data/leaderboard.csv b/data/leaderboard.csv
index 6c72615..0a2defa 100644
--- a/data/leaderboard.csv
+++ b/data/leaderboard.csv
@@ -23,3 +23,4 @@ Mixtral 8x22b Instruct,40,7.5,3,40,2,5,6.1,15.2,0:09,0:07
 Gemini 1.5 Pro,40,7.5,3,40,2,5,4.5,11.7,0:09,0:06
 Llama 3 70b Chat,40,5,2,40,3,7.5,3.2,8.2,0:09,0:11
 Llama 3.1 405B Instruct,40,7.5,3,40,6,15,8.2,20.5,0:09,0:11
+Zeron ADK + GPT-5/o3,11,72.7,8,11,9,81.8,,,,