Fix/judge resolution fallback #381

Merged
Nicola Franco (franconicola) merged 6 commits into main from fix/judge-resolution-fallback on May 22, 2026

@franconicola
Member

No description provided.

… default

When the orchestrator runs the post-attack evaluation pipeline, it calls
_resolve_judges_from_config() with no arguments on the attack_config.
The method correctly checked for a 'judges' list but skipped the common
'judge' dict format used by Ollama/local examples, falling through to
the hardcoded 'gpt-4-0613' default and crashing with a missing-credentials
error for users without an OpenAI API key.

Resolution order is now:
  1. 'judges' list in raw config
  2. 'judge' dict in raw config (wrapped in a list)
  3. technique_params fallback
  4. gpt-4-0613 / jailbreakbench hardcoded defaults

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

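
The resolution order listed in this commit message can be sketched roughly as follows. This is only an illustration, not the project's actual `_resolve_judges_from_config()`: the function name `resolve_judges`, the `identifier`/`type` keys, and the assumption that `technique_params` carries its own `judges` list are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Step 4 values as named in the commit message; the dict keys are assumed.
HARDCODED_DEFAULT_JUDGE: Dict[str, Any] = {
    "identifier": "gpt-4-0613",
    "type": "jailbreakbench",
}


def resolve_judges(
    raw_config: Dict[str, Any],
    technique_params: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return the judge configs to use, following the four-step order."""
    # 1. An explicit 'judges' list wins.
    judges = raw_config.get("judges")
    if isinstance(judges, list) and judges:
        return list(judges)

    # 2. A single 'judge' dict (the Ollama/local example format) is wrapped in a list.
    judge = raw_config.get("judge")
    if isinstance(judge, dict) and judge:
        return [judge]

    # 3. Fall back to technique_params (assumed here to hold a 'judges' list).
    fallback = (technique_params or {}).get("judges")
    if isinstance(fallback, list) and fallback:
        return list(fallback)

    # 4. Nothing configured: use a copy of the hardcoded default.
    return [dict(HARDCODED_DEFAULT_JUDGE)]
```

With this ordering, a config containing only `{"judge": {...}}` no longer reaches step 4, which is the failure the commit describes for users without an OpenAI API key.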
Remove all gpt-4/gpt-4o-mini hardcoded defaults from attacks, techniques,
and judge resolution so the tool works without any external API key.

Changes:
- evaluation_step._resolve_judges_from_config: default_judge now uses
  DEFAULT_JUDGE_IDENTIFIER (gemma3:4b via Ollama) with default_type
  'harmbench' instead of 'gpt-4-0613'/'jailbreakbench'. Also injects
  the Ollama endpoint/agent_type when the built-in default is used.
- flipattack/attack.py: goal metadata judge default changed from
  'gpt-4-0613' to DEFAULT_JUDGE_IDENTIFIER.
- cli/tui/attack_specs.py: PAIR attacker default changed from 'gpt-4'
  and PAP attacker default changed from 'gpt-4o-mini' to
  DEFAULT_ATTACKER_IDENTIFIER (gemma3:4b).
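
A minimal sketch of the default-judge change in the first bullet, assuming a dict-based judge config. Only `DEFAULT_JUDGE_IDENTIFIER`, its `gemma3:4b` value, and the `harmbench` type come from the commit message; the helper name `build_default_judge`, the key names, the `"ollama"` agent type string, and the endpoint (Ollama's usual local address) are assumptions for illustration.

```python
from typing import Any, Dict, Optional

DEFAULT_JUDGE_IDENTIFIER = "gemma3:4b"  # value stated in the commit message
DEFAULT_JUDGE_TYPE = "harmbench"        # value stated in the commit message

# Assumed values: Ollama's usual local address and a hypothetical agent_type string.
OLLAMA_ENDPOINT = "http://localhost:11434"
OLLAMA_AGENT_TYPE = "ollama"


def build_default_judge(
    identifier: Optional[str] = None,
    judge_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a judge config, injecting Ollama settings only for the built-in default."""
    using_builtin_default = identifier is None
    judge: Dict[str, Any] = {
        "identifier": identifier or DEFAULT_JUDGE_IDENTIFIER,
        "type": judge_type or DEFAULT_JUDGE_TYPE,
    }
    if using_builtin_default:
        # A user-supplied identifier may target another backend, so the local
        # endpoint and agent type are added only when the default model is used.
        judge["endpoint"] = OLLAMA_ENDPOINT
        judge["agent_type"] = OLLAMA_AGENT_TYPE
    return judge
```

Restricting the injection to the built-in default keeps an explicitly configured judge from being silently redirected to a local Ollama server.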
Comment thread hackagent/cli/tui/attack_specs.py Fixed
Enables auto-merge (squash) for all dependabot PRs targeting main.
GitHub will merge automatically once all required CI checks pass.
@codecov
codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| hackagent/attacks/evaluator/evaluation_step.py | 85.71% | 1 Missing ⚠️ |

Contributor

Copilot AI commented May 22, 2026

Copilot Can we please merge master and fix this branch again (CI checks)?

I merged latest main locally and fixed the commit message format issue that is causing CI Checks to fail (commit-check). The updated local commits are cf1960f (merge) and 28552c6 (commit-message-compliant fix), and local validation passes (cz check + targeted integration tests). I’m blocked from pushing because this requires a non-fast-forward (force) update of the branch history. Please force-push these changes (or allow me to force-update this PR branch), and CI should pass.

@franconicola Nicola Franco (franconicola) merged commit 451e2e2 into main May 22, 2026
22 of 23 checks passed
@franconicola Nicola Franco (franconicola) deleted the fix/judge-resolution-fallback branch May 22, 2026 19:21

2 participants