UX improvements by RPaolino · Pull Request #408 · AISecurityLab/hackagent

Raffaele Paolino (RPaolino) · 2026-05-28T09:50:32Z

Summary

Major overhaul of the local dashboard and introduction of guardrails infrastructure for the attack pipeline.

Dashboard Improvements

Modular attack cards — Extracted all attack-specific goal card rendering (parse + render methods) from the monolithic _page.py.
SVG plot downloads — All ECharts (risk distribution, vulnerability radar, robustness scores) can now be exported as SVG.
Run & goal filters — Added status/category/search filtering in the History tab goal list.
Copy to clipboard — Added copy buttons for prompts and responses across all attack cards.
Guardrail visualization — Before/after guardrail blocks are rendered as distinct visual banners with category and explanation.
Consistent tables — Unified table styling between Dashboard and History tabs.
Comparison panel — Improved multi-run comparison visualization.
Layout fixes — History tab uses full panel width; removed unused Report tab.

Fixes #354

- Add GuardrailExtractor for parsing guardrail events from agent responses
- Integrate before/after guardrail detection in router
- Track guardrail events in coordinator and tracker
- Update all attack techniques to handle guardrail-blocked responses: baseline, advprefix, bon, cipherchat, flipattack, h4rm3l, pap
- Export guardrail utilities from attacks.shared

- Replace guardrail_blocked/guardrail_event with adapter_type: guardrail
- Add is_guardrail_response() and get_guardrail_info() to response_utils
- Update router to emit structured agent_specific_data (side, categories, reasoning)
- Migrate all 10 attack techniques to use canonical detection helper
- Update tracker to detect guardrail responses via adapter_type
- Switch guardrail.py to JSON-structured output parsing with keyword fallback
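The canonical detection described above can be pictured roughly as follows. This is a minimal sketch, not the code in response_utils: the function names, the `adapter_type: guardrail` tag, and the `agent_specific_data` fields (side, categories, reasoning) come from the commit message, while the signatures, return shapes, and the sample dicts (including the `"openai"` adapter type and `generated_text` key) are assumptions made for illustration.

```python
from typing import Any, Optional

# Tag value named in the commit message.
GUARDRAIL_ADAPTER_TYPE = "guardrail"


def is_guardrail_response(response: Any) -> bool:
    """Return True when a response dict is tagged as a guardrail block."""
    return (
        isinstance(response, dict)
        and response.get("adapter_type") == GUARDRAIL_ADAPTER_TYPE
    )


def get_guardrail_info(response: Any) -> Optional[dict]:
    """Extract side, categories and reasoning, or None if not a guardrail block."""
    if not is_guardrail_response(response):
        return None
    data = response.get("agent_specific_data") or {}
    return {
        "side": data.get("side"),
        "categories": list(data.get("categories") or []),
        "reasoning": data.get("reasoning", ""),
    }


# Hypothetical sample payloads, for illustration only.
blocked = {
    "adapter_type": "guardrail",
    "agent_specific_data": {
        "side": "before",
        "categories": ["violence"],
        "reasoning": "unsafe request",
    },
}
normal = {"adapter_type": "openai", "generated_text": "Hello"}

blocked_info = get_guardrail_info(blocked)
normal_info = get_guardrail_info(normal)
```

Keying detection on a single field means attack techniques and the tracker share one check instead of each inspecting separate guardrail_blocked/guardrail_event flags.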

- PAIR: pass full guardrail response dict to add_interaction_trace so the dashboard can detect and render guardrail blocks per iteration
- TAP: return descriptive guardrail marker string from _query_target instead of None so blocked iterations show guardrail info in traces

- Add guardrail event rendering in trace views (before/after blocks)
- Add two-panel History run dialog with config chips and metrics
- Add attack-specific trace parsing and rendering for all attack types
- Add category/subcategory grouping in goal lists
- Add compact goal cards with color-coded borders

When goal_batch_workers > 1, each goal gets its own attack instance with _goal_index_offset. The tracker creates goal contexts at that offset, but generation.execute() and evaluation.execute() used enumerate(goals) starting at 0 to look up contexts. For any goal with offset != 0 this returned None, silently skipping Candidate/Summary traces and tap_judge evaluations.
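The indexing mismatch can be reproduced with a toy model. This is a sketch under stated assumptions: `ToyTracker` and the two lookup functions are hypothetical stand-ins, not the project's tracker or its generation/evaluation steps; only the idea of a per-goal index offset comes from the description above.

```python
from typing import Dict, List, Optional


class ToyTracker:
    """Hypothetical stand-in: maps a global goal index to a context."""

    def __init__(self) -> None:
        self._contexts: Dict[int, str] = {}

    def create_goal_context(self, index: int, goal: str) -> None:
        self._contexts[index] = f"context for {goal!r}"

    def get_goal_context(self, index: int) -> Optional[str]:
        return self._contexts.get(index)


def lookup_without_offset(tracker: ToyTracker, goals: List[str]) -> List[Optional[str]]:
    # Buggy pattern: local indices always start at 0.
    return [tracker.get_goal_context(i) for i, _ in enumerate(goals)]


def lookup_with_offset(
    tracker: ToyTracker, goals: List[str], goal_index_offset: int = 0
) -> List[Optional[str]]:
    # Fixed pattern: local indices are shifted by the instance's offset.
    return [
        tracker.get_goal_context(i)
        for i, _ in enumerate(goals, start=goal_index_offset)
    ]


tracker = ToyTracker()
goals = ["goal D"]  # this attack instance handles a single goal
offset = 3          # ...which is the fourth goal of the whole run
tracker.create_goal_context(offset, goals[0])

missing = lookup_without_offset(tracker, goals)
found = lookup_with_offset(tracker, goals, offset)
```

With an offset of 3, the unshifted lookup asks for context 0 and gets `None`, which is the silent skip described above; the shifted lookup finds the context.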

- Return the structured guardrail response dict instead of string-encoding it as [GUARDRAIL:xxx], so tracker and dashboard handle it properly
- Pass empty string to judges for guardrail-blocked responses (score 0)
- Remove [:500] slice on response in trace recording (tracker handles dicts)

Ensures the goal index offset is passed through to both TAP pipeline steps so multi-batch goal evaluation uses the correct tracker context.

- Call _update_tracker() after _sync_to_server() so each prefix gets an evaluation trace with its score in the DB
- Embed prefix text in evaluation trace metadata so the dashboard can attribute jailbreaks to specific prefixes

AutoDAN-Turbo:
- Read phase/subphase from content (not step_name) for DB-loaded traces
- Skip bookend traces (PHASE_START/END, SKIP_FINALIZED)
- Detect WARMUP_SUMMARY via phase+subphase instead of step_name
- Group epochs under iteration sub-headers in the renderer

Guardrail display:
- Add legacy [GUARDRAIL:xxx] string-pattern fallback in extractor
- Add guardrail categories to trace data and rendering templates
- Improve guardrail event rendering with structured pre blocks
- Propagate _guardrail_categories through all parsing paths
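A string-pattern fallback for legacy markers could look like the sketch below. It is an illustration only, not the extractor's code: the PR does not say what `xxx` encodes or where the marker sits in the string, so the regex, the `label` key, and the function name `parse_legacy_guardrail` are assumptions.

```python
import re
from typing import Any, Optional

# Assumption: the legacy marker appears at the start of the response text.
_LEGACY_GUARDRAIL_RE = re.compile(r"^\[GUARDRAIL:(?P<label>[^\]]*)\]")


def parse_legacy_guardrail(text: Any) -> Optional[dict]:
    """Convert a legacy '[GUARDRAIL:xxx]' string into a structured dict.

    Returns None for non-strings and for strings without the marker.
    """
    if not isinstance(text, str):
        return None
    match = _LEGACY_GUARDRAIL_RE.match(text.strip())
    if match is None:
        return None
    return {
        "adapter_type": "guardrail",
        # 'label' is a hypothetical key holding whatever followed the colon.
        "agent_specific_data": {"label": match.group("label")},
    }


legacy = parse_legacy_guardrail("[GUARDRAIL:before] request blocked")
plain = parse_legacy_guardrail("An ordinary model answer")
```

Normalizing old traces into the same dict shape as new ones lets the rendering code handle a single format.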

History tab, run list:
- Replace pagination with infinite scroll ("Load more" button)
- Add filter bar: search, agent, attack type, and status dropdowns
- Load all runs upfront and filter client-side for instant feedback

History tab, run detail dialog:
- Add goal filter bar with search, status, and category dropdowns
- Preserve original goal numbering when filters are applied
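Preserving the original goal numbering under filtering amounts to numbering before filtering rather than after. The sketch below shows that idea; the function name, the dict keys (`goal`, `status`, `category`), and the sample data are hypothetical and not taken from the dashboard code.

```python
from typing import List, Optional, Tuple


def filter_goals(
    goals: List[dict],
    search: str = "",
    status: Optional[str] = None,
    category: Optional[str] = None,
) -> List[Tuple[int, dict]]:
    """Return (original_number, goal) pairs that pass every active filter."""
    needle = search.strip().lower()
    # Number first (1-based), then filter, so numbers stay stable.
    return [
        (number, goal)
        for number, goal in enumerate(goals, start=1)
        if (not needle or needle in goal["goal"].lower())
        and (status is None or goal["status"] == status)
        and (category is None or goal["category"] == category)
    ]


# Hypothetical sample data.
goals = [
    {"goal": "Goal alpha", "status": "mitigated", "category": "A"},
    {"goal": "Goal beta", "status": "jailbroken", "category": "B"},
    {"goal": "Goal gamma", "status": "jailbroken", "category": "B"},
]

by_status = [number for number, _ in filter_goals(goals, status="jailbroken")]
by_search = [number for number, _ in filter_goals(goals, search="gamma")]
```

Here the third goal is still shown as goal 3 when it is the only match, so a number seen in a filtered view refers to the same goal as in the unfiltered list.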

Two bugs caused per-prefix/per-template detail rows to always display 'Mitigated' even when the goal was successfully jailbroken:
1. AdvPrefix: The Evaluation step's config_keys was missing '_tracker', so no evaluation traces were created. The dashboard matches completion traces to evaluation traces by prefix string to determine which rows are jailbreaks; without traces, all rows defaulted to 'mitigated'.
2. Baseline: The dashboard's _parse_baseline_traces hardcoded the evaluator name 'baseline_pattern_evaluator', but when using LLM judges (the default), the evaluator name is 'baseline_llm_judge'. The eval trace was never matched, so all rows defaulted to 'mitigated'.
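The AdvPrefix failure mode follows from how rows are labeled: a row is a jailbreak only if a matching evaluation trace says so. The sketch below models that matching with hypothetical trace dicts (`prefix`, `success`); the real trace schema and the dashboard's parsing functions are not shown in this PR and may differ.

```python
from typing import List


def label_rows(completion_traces: List[dict], evaluation_traces: List[dict]) -> List[dict]:
    """Label each completion row by matching its prefix to evaluation traces."""
    # Prefixes that an evaluation trace marked as successful.
    jailbroken = {t["prefix"] for t in evaluation_traces if t.get("success")}
    return [
        {
            "prefix": c["prefix"],
            # Rows with no matching evaluation trace fall back to 'Mitigated'.
            "status": "Jailbreak" if c["prefix"] in jailbroken else "Mitigated",
        }
        for c in completion_traces
    ]


# Hypothetical traces for illustration.
completions = [{"prefix": "Sure, here is"}, {"prefix": "Step 1:"}]
evaluations = [{"prefix": "Sure, here is", "success": True}]

without_eval = [row["status"] for row in label_rows(completions, [])]
with_eval = [row["status"] for row in label_rows(completions, evaluations)]
```

When no evaluation traces exist, every row takes the fallback label, which is the behavior the first bug produced.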


+
+                    for depth_level in sorted(by_depth.keys()):
+                        depth_nodes = by_depth[depth_level]
+                        _ds = (depth_stats or {}).get(depth_level, {})


+            if score_raw is not None:
+                try:
+                    step["score"] = float(score_raw)
+                except (TypeError, ValueError):


+            if score_delta_raw is not None:
+                try:
+                    step["score_delta"] = float(score_delta_raw)
+                except (TypeError, ValueError):


codecov · 2026-05-28T10:03:06Z

Codecov Report

❌ Patch coverage is 10.25855% with 1076 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...ackagent/server/dashboard/attack_cards/_autodan.py	4.50%	191 Missing ⚠️
...kagent/server/dashboard/attack_cards/_advprefix.py	4.30%	178 Missing ⚠️
hackagent/server/dashboard/attack_cards/_shared.py	13.10%	126 Missing ⚠️
hackagent/server/dashboard/attack_cards/_tap.py	6.71%	125 Missing ⚠️
...ckagent/server/dashboard/attack_cards/_baseline.py	8.08%	91 Missing ⚠️
hackagent/server/dashboard/attack_cards/_bon.py	8.42%	87 Missing ⚠️
hackagent/server/dashboard/attack_cards/_pair.py	9.89%	82 Missing ⚠️
hackagent/server/dashboard/attack_cards/_mml.py	11.36%	78 Missing ⚠️
hackagent/server/dashboard/attack_cards/_pap.py	11.76%	60 Missing ⚠️
...ackagent/server/dashboard/attack_cards/_generic.py	15.62%	54 Missing ⚠️
... and 2 more


Raffaele Paolino (RPaolino) and others added 23 commits May 25, 2026 09:03

fix: add _goal_index_offset to TAP config_keys

1526432

fix: record per-prefix evaluation traces in AdvPrefix

b8490a8

fix: guardrail blocked

3879ee3

fix: consistent tables between dashboard and history tabs

e465b01

feat: added copy to clipboard button

91cf581

fix: removed Report tab

d9b849f

fix: History tab using the whole panel width

e6c44a2

fix: consistent Run Details and Comparison panels

47a08af

feat: improved comparison visualization

21f92b1

feat: download plots in svg format

86af06f

feat: added documentation, cli and tui support of guardrails

bd88d76

fix: prevent TAP attacker from seeing guardrail internals on block

10f7605

fix: remove guardrails keys from attack config dict

0985b1d

refactor(dashboard): extract attack card renderers into per-attack mi…

a8e4b1d

…xin modules

Nicola Franco (franconicola) deployed to feat/dashboard-improvements - Docs PR #408 May 28, 2026 09:50 with Render

github-code-quality Bot found potential problems May 28, 2026


style: format router.py with ruff

039f747

github-code-quality Bot found potential problems May 28, 2026


Comment thread hackagent/router/tracking/coordinator.py Fixed

✅ test(dashboard-tests): additional tests for the dashboard

18c4401

Nicola Franco (franconicola) temporarily deployed to feat/dashboard-improvements - Docs PR #408 May 30, 2026 19:27 with Render (destroyed)

Nicola Franco (franconicola) merged commit 91a7e0c into main Jun 1, 2026
23 of 24 checks passed

Nicola Franco (franconicola) deleted the feat/dashboard-improvements branch June 1, 2026 12:27

Nicola Franco (franconicola) merged 25 commits into main from feat/dashboard-improvements

2 participants
