@otavionvidia commented Jan 26, 2026

What this change does

This PR introduces a new React-based report UI for garak, replacing the previous Jinja template system with a modern, interactive frontend built on the Kaizen (KUI) design system.

Key Features:

  • Interactive probe & detector visualization: Bar charts for probe scores, lollipop charts for detector Z-score comparisons
  • DEFCON severity badges: Color-coded severity indicators (DC-1 through DC-5) for quick risk assessment
  • Z-Score visualization: Relative performance metrics with explanatory tooltips
  • Filtering & sorting: Hide N/A entries, filter by DEFCON level, alphabetical ordering
  • Dark/light theme support: Respects system preferences with manual toggle
  • Responsive design: Works across different screen sizes

Backend Changes:

  • Updated report_digest.py to output JSON data consumed by the new UI
  • Added total_evaluated and passed counts per detector
  • Removed Jinja template files (replaced by compiled React app in garak/analyze/ui/index.html)
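To make the backend change concrete, here is a rough sketch of a per-detector digest entry and how the frontend derives failures from it. Only the `total_evaluated` and `passed` field names come from this PR; the surrounding structure and the detector name are illustrative.

```python
import json

# Hypothetical per-detector entry in the JSON digest emitted by
# report_digest.py (detector name and overall shape are illustrative)
entry = {
    "detector": "always.Fail",   # hypothetical detector name
    "total_evaluated": 255,      # attempts scored by this detector
    "passed": 240,               # attempts that passed
}

# The UI derives the failure count instead of trusting a separate field
failures = entry["total_evaluated"] - entry["passed"]
print(json.dumps(entry), failures)
```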

Verification

Quick Test:

# Run garak with any target
garak -m test.Blank -p test.Test

# View the generated HTML report
open garak_runs/<run_id>/*.report.html

Full Test Suite:

# Run all tests
python -m pytest tests/

# Run specific analyze tests
python -m pytest tests/analyze/

Manual Verification:

  1. Report loads and displays probe data correctly
  2. Clicking a probe shows its detector comparison view
  3. Z-scores display correctly (including N/A for missing data)
  4. DEFCON badges show appropriate colors (DC-1=red, DC-2=yellow, DC-3=blue, DC-4=green, DC-5=teal)
  5. "Hide N/A" checkbox filters unavailable entries
  6. Theme toggle switches between light/dark modes
  7. Charts are alphabetically ordered by class name
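The DEFCON badge colors listed in step 4 amount to a small lookup table; a minimal sketch (the actual mapping lives in the React app, and the `gray` fallback is an assumption):

```python
# DEFCON severity badge colors as described in the verification steps.
# DC-1 is the most severe level; DC-5 the least severe.
DEFCON_COLORS = {
    1: "red",
    2: "yellow",
    3: "blue",    # changed from green to distinguish from DC-4
    4: "green",
    5: "teal",
}

def badge_color(defcon_level: int) -> str:
    """Return the badge color for a DEFCON level (gray fallback assumed)."""
    return DEFCON_COLORS.get(defcon_level, "gray")
```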

Frontend Development (optional):

cd garak-report
yarn install
yarn dev  # Development server at localhost:5173
yarn test # Run frontend unit tests
yarn build # Build production bundle

Notes

  • No specific hardware requirements
  • No external API dependencies for the UI itself
  • The compiled UI (garak/analyze/ui/index.html) is included in this PR

otavionvidia and others added 13 commits January 13, 2026 14:12
Remove ?? 0 fallback that incorrectly converted null zscore to 0,
causing tooltip to show '0.00' instead of 'N/A' for unavailable data.

Instead of rendering chart with only N/A markers, show StatusMessage
when all visible entries have null zscore.

- Remove Z-Score tooltip from DetectorChartHeader
- Add Z-Score label with info tooltip below chart (near X-axis)
- Remove ECharts xAxis name since we have custom label

Change DC-3 badge color from green to blue to visually
distinguish from DC-4 (green). CSS chart colors already use
different shades (green-200 vs green-600).

Collect detector types from ALL probes instead of just the selected
probe. This prevents detector sections from disappearing when
selecting a probe that doesn't have certain detectors.

Probes without a detector now appear as N/A in that section.

Only iterate over detectors from the selected probe, not all detector
types from all probes. This ensures the detector count matches what
the probe actually tested.

Show probe name, detector type, and failure/prompt counts even when
the chart displays 'No Data Available' empty state.

Backend (report_digest.py):
- Remove inaccurate probe-level prompt_count/fail_count (were incorrectly summed)
- Rename attempt_count → total_evaluated (matches source)
- Rename hit_count → passed (matches source, frontend computes failures)

Frontend:
- Remove legacy zscore_*/zscore field fallbacks (backend uses relative_*)
- Update types to use total_evaluated/passed from backend
- Add backward compatibility for old field names (attempt_count/hit_count)
- Frontend computes hit_count (failures) from total_evaluated - passed
- Remove probe-level prompt/failure count display (data was inaccurate)
- Detector-level counts still shown in y-axis labels
- Update all related tests

- Show prompt count in tooltip: 'Prompts: 255'
- Gives context for score percentages on hover

- Fixed 45° label rotation

- Probe bar chart: sort by class name alphabetically (A→Z left to right)
- Detector lollipop chart: reverse alphabetical (A at bottom, Z at top)
- Consistent ordering across views for better UX
- Add tests for sorting behavior
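The failure-count logic described in the commits above can be sketched as follows. This is a sketch only: the field names `total_evaluated`, `passed`, and the legacy `hit_count` come from the commit messages, and the function name is hypothetical.

```python
def failure_count(detector: dict) -> int:
    """Derive the failure count for a detector entry.

    Newer digests provide total_evaluated and passed; older ones
    carried a hit_count (failures) field directly, so fall back to it
    for backward compatibility instead of defaulting to 0.
    """
    if "hit_count" in detector:      # legacy field name
        return detector["hit_count"]
    return detector["total_evaluated"] - detector["passed"]
```

For example, `failure_count({"total_evaluated": 255, "passed": 240})` yields 15, while a legacy entry with only `hit_count` returns that value unchanged.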
@jmartin-tech mentioned this pull request Jan 26, 2026
@saichandrapandraju (Contributor) commented:
Hi @otavionvidia

Thanks for the refactor — this is a solid improvement to the report UI.

While trying out the new changes, I noticed a potential difference in how names/labels are handled compared to the pre-refactor behavior.

Previously, the report surfaced more informative, human-readable names as shown below -
[screenshot: previous report UI]

whereas with the new changes they seem to be grouped or labeled differently as shown below -
[screenshot: new report UI]

Below is the config used for these runs -

---
system:
  parallel_attempts: 16
  lite: true
run:
  generations: 1
  probe_tags: avid-effect:performance

plugins:
  # Target model configuration
  target_type: openai.OpenAICompatible
  target_name: Granite-3.3-8B-Instruct
  generators:
    openai:
      OpenAICompatible:
        uri: "<redacted>/v1"
        model: "Granite-3.3-8B-Instruct"
        api_key: "dummy"
        suppressed_params:
          - "n"
        max_tokens: 512

reporting:
  taxonomy: avid-effect # or 'owasp'

Similarly, here's the view for owasp taxonomy -

Old UI: [screenshot]

New UI: [screenshot]

I’m not sure if this change is intentional as part of the refactor, but wanted to flag it in case it’s an unintended regression. Happy to dig deeper!

cc @jmartin-tech

@jmartin-tech (Collaborator) commented:
@saichandrapandraju thank you for pointing this out. The taxonomy view has not been well tested here; in fact, I am not sure we accounted for it at all. The new flow is organized around the digest result, and groupings other than by probe package may need some thought.

@leondz (Collaborator) commented Jan 27, 2026

@saichandrapandraju Drafting patch to see if we can get this fixed for release

- Show detectors within a single probe instead of comparing across probes
- Y-axis now displays detector names, not probe names
- Remove cross-probe comparison logic and Hide N/A checkbox
- Add DetectorResultsTable showing DEFCON, passed, failed, total counts
- Simplify useDetectorChartOptions hook for new data flow
- Update all related tests

…robe metrics

- Refactor DetectorsView layout: probe header → detector breakdown → z-score chart
- Add severity badge and pass rate display to probe header
- Remove fail_count/prompt_count from UI (source unverified in backend)
- Extract ProgressBar component for detector results visualization
- Extract formatPercentage utility for consistent % display (no .00 decimals)
- Use hit_count directly from backend for failure counts (no client calculations)
- Add linked hover highlighting between results table and lollipop chart
- Add tooltip to module score badge explaining aggregation function
- Update all related tests to match new component structure
- Fix color mappings: DC-3=blue, DC-4=green for visual consistency

Backend (report_digest.py) provides 'passed' count, not 'hit_count'.
Code was defaulting to 0 failures when hit_count wasn't found.

- Calculate failures as total - passed when hit_count is missing
- Remove failure counts from Z-score y-axis labels (cleaner view)
- Update related tests

- Add 'passed' field to useFlattenedModules data transformation
- Failures now correctly derived from total - passed

- Set probe bar chart y-axis to always show 0-100% scale

- Use module.summary.group instead of module.group_name for display
- Shows 'LLM01: Prompt Injection' instead of 'llm01'
- Conditionally render Anchor only when group_link exists

- Add ModuleFilterChips component with multi-select support
- Show colored dots in x-axis labels to indicate probe module
- Update tooltip to show full probe name with module prefix
- Consistent module colors across filter chips and chart labels
- Support filtering by multiple modules simultaneously
- Single module displays as read-only badge

Updates document.title to include the target/model name for easier
tab identification when multiple reports are open.

Falls back through: target_name → model_name → plugins.model_name

When only one module exists, no need for color coding:
- Chart labels: no colored dot prefix
- Badge: gray color, no dot, read-only
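The tab-title fallback described in the commits above can be sketched as below. Only the fallback order (`target_name` → `model_name` → `plugins.model_name`) comes from the commit message; the report structure, function name, and the generic fallback string are assumptions.

```python
def report_title(report: dict) -> str:
    """Pick a document.title for the report page.

    Falls back through target_name -> model_name -> plugins.model_name
    (per the commit message); the "garak report" prefix and generic
    fallback here are illustrative.
    """
    name = (
        report.get("target_name")
        or report.get("model_name")
        or report.get("plugins", {}).get("model_name")
    )
    return f"garak report: {name}" if name else "garak report"
```

For example, a digest carrying only `plugins.model_name` still produces a target-specific tab title.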