@otavionvidia commented Jan 26, 2026

What this change does

This PR introduces a new React-based report UI for garak, replacing the previous Jinja template system with a modern, interactive frontend built on the Kaizen (KUI) design system.

Key Features:

  • Interactive probe & detector visualization: Bar charts for probe scores, lollipop charts for detector Z-score comparisons
  • DEFCON severity badges: Color-coded severity indicators (DC-1 through DC-5) for quick risk assessment
  • Z-Score visualization: Relative performance metrics with explanatory tooltips
  • Filtering & sorting: Hide N/A entries, filter by DEFCON level, alphabetical ordering
  • Dark/light theme support: Respects system preferences with manual toggle
  • Responsive design: Works across different screen sizes

Backend Changes:

  • Updated report_digest.py to output JSON data consumed by the new UI
  • Added total_evaluated and passed counts per detector
  • Removed Jinja template files (replaced by compiled React app in garak/analyze/ui/index.html)
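To make the backend change concrete, here is a rough sketch of a per-detector digest entry and how the frontend derives failures from it. Only the `total_evaluated` and `passed` field names come from this PR; the surrounding structure and the detector name are illustrative.

```python
import json

# Hypothetical per-detector entry in the JSON digest emitted by
# report_digest.py (detector name and overall shape are illustrative)
entry = {
    "detector": "always.Fail",   # hypothetical detector name
    "total_evaluated": 255,      # attempts scored by this detector
    "passed": 240,               # attempts that passed
}

# The UI derives the failure count instead of trusting a separate field
failures = entry["total_evaluated"] - entry["passed"]
print(json.dumps(entry), failures)
```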

Verification

Quick Test:

# Run garak with any target
garak -m test.Blank -p test.Test

# View the generated HTML report
open garak_runs/<run_id>/*.report.html

Full Test Suite:

# Run all tests
python -m pytest tests/

# Run specific analyze tests
python -m pytest tests/analyze/

Manual Verification:

  1. Report loads and displays probe data correctly
  2. Clicking a probe shows its detector comparison view
  3. Z-scores display correctly (including N/A for missing data)
  4. DEFCON badges show appropriate colors (DC-1=red, DC-2=yellow, DC-3=blue, DC-4=green, DC-5=teal)
  5. "Hide N/A" checkbox filters unavailable entries
  6. Theme toggle switches between light/dark modes
  7. Charts are alphabetically ordered by class name
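The DEFCON badge colors listed in step 4 amount to a small lookup table; a minimal sketch (the actual mapping lives in the React app, and the `gray` fallback is an assumption):

```python
# DEFCON severity badge colors as described in the verification steps.
# DC-1 is the most severe level; DC-5 the least severe.
DEFCON_COLORS = {
    1: "red",
    2: "yellow",
    3: "blue",    # changed from green to distinguish from DC-4
    4: "green",
    5: "teal",
}

def badge_color(defcon_level: int) -> str:
    """Return the badge color for a DEFCON level (gray fallback assumed)."""
    return DEFCON_COLORS.get(defcon_level, "gray")
```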

Frontend Development (optional):

cd garak-report
yarn install
yarn dev  # Development server at localhost:5173
yarn test # Run frontend unit tests
yarn build # Build production bundle

Notes

  • No specific hardware requirements
  • No external API dependencies for the UI itself
  • The compiled UI (garak/analyze/ui/index.html) is included in this PR

otavionvidia and others added 13 commits January 13, 2026 14:12
Remove ?? 0 fallback that incorrectly converted null zscore to 0,
causing tooltip to show '0.00' instead of 'N/A' for unavailable data.

Instead of rendering chart with only N/A markers, show StatusMessage
when all visible entries have null zscore.

- Remove Z-Score tooltip from DetectorChartHeader
- Add Z-Score label with info tooltip below chart (near X-axis)
- Remove ECharts xAxis name since we have custom label

Change DC-3 badge color from green to blue to visually
distinguish from DC-4 (green). CSS chart colors already use
different shades (green-200 vs green-600).

Collect detector types from ALL probes instead of just the selected
probe. This prevents detector sections from disappearing when
selecting a probe that doesn't have certain detectors.

Probes without a detector now appear as N/A in that section.

Only iterate over detectors from the selected probe, not all detector
types from all probes. This ensures the detector count matches what
the probe actually tested.

Show probe name, detector type, and failure/prompt counts even when
the chart displays 'No Data Available' empty state.

Backend (report_digest.py):
- Remove inaccurate probe-level prompt_count/fail_count (were incorrectly summed)
- Rename attempt_count → total_evaluated (matches source)
- Rename hit_count → passed (matches source, frontend computes failures)

Frontend:
- Remove legacy zscore_*/zscore field fallbacks (backend uses relative_*)
- Update types to use total_evaluated/passed from backend
- Add backward compatibility for old field names (attempt_count/hit_count)
- Frontend computes hit_count (failures) from total_evaluated - passed
- Remove probe-level prompt/failure count display (data was inaccurate)
- Detector-level counts still shown in y-axis labels
- Update all related tests

- Show prompt count in tooltip: 'Prompts: 255'
- Gives context for score percentages on hover

- Fixed 45° label rotation

- Probe bar chart: sort by class name alphabetically (A→Z left to right)
- Detector lollipop chart: reverse alphabetical (A at bottom, Z at top)
- Consistent ordering across views for better UX
- Add tests for sorting behavior
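The failure-count logic described in the commits above can be sketched as follows. This is a sketch only: the field names `total_evaluated`, `passed`, and the legacy `hit_count` come from the commit messages, and the function name is hypothetical.

```python
def failure_count(detector: dict) -> int:
    """Derive the failure count for a detector entry.

    Newer digests provide total_evaluated and passed; older ones
    carried a hit_count (failures) field directly, so fall back to it
    for backward compatibility instead of defaulting to 0.
    """
    if "hit_count" in detector:      # legacy field name
        return detector["hit_count"]
    return detector["total_evaluated"] - detector["passed"]
```

For example, `failure_count({"total_evaluated": 255, "passed": 240})` yields 15, while a legacy entry with only `hit_count` returns that value unchanged.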
@jmartin-tech mentioned this pull request Jan 26, 2026
@saichandrapandraju (Contributor) commented:
Hi @otavionvidia

Thanks for the refactor — this is a solid improvement to the report UI.

While trying out the new changes, I noticed a potential difference in how names/labels are handled compared to the pre-refactor behavior.

Previously, the report surfaced more informative, human-readable names as shown below -
[screenshot: previous report UI]

whereas with the new changes they seem to be grouped or labeled differently as shown below -
[screenshot: new report UI]

Below is the config used for these runs -

---
system:
  parallel_attempts: 16
  lite: true
run:
  generations: 1
  probe_tags: avid-effect:performance

plugins:
  # Target model configuration
  target_type: openai.OpenAICompatible
  target_name: Granite-3.3-8B-Instruct
  generators:
    openai:
      OpenAICompatible:
        uri: "<redacted>/v1"
        model: "Granite-3.3-8B-Instruct"
        api_key: "dummy"
        suppressed_params:
          - "n"
        max_tokens: 512

reporting:
  taxonomy: avid-effect # or 'owasp'

Similarly, here's the view for owasp taxonomy -

Old UI: [screenshot]

New UI: [screenshot]

I’m not sure if this change is intentional as part of the refactor, but wanted to flag it in case it’s an unintended regression. Happy to dig deeper!

cc @jmartin-tech

@jmartin-tech (Collaborator) commented:
@saichandrapandraju thank you for pointing this out. The taxonomy view has not been well tested here; in fact, I am not sure we accounted for it at all. The new flow is organized around the digest result, and groupings other than by probe package may need some thought.

@leondz (Collaborator) commented Jan 27, 2026

@saichandrapandraju Drafting patch to see if we can get this fixed for release

- Show detectors within a single probe instead of comparing across probes
- Y-axis now displays detector names, not probe names
- Remove cross-probe comparison logic and Hide N/A checkbox
- Add DetectorResultsTable showing DEFCON, passed, failed, total counts
- Simplify useDetectorChartOptions hook for new data flow
- Update all related tests

…robe metrics

- Refactor DetectorsView layout: probe header → detector breakdown → z-score chart
- Add severity badge and pass rate display to probe header
- Remove fail_count/prompt_count from UI (source unverified in backend)
- Extract ProgressBar component for detector results visualization
- Extract formatPercentage utility for consistent % display (no .00 decimals)
- Use hit_count directly from backend for failure counts (no client calculations)
- Add linked hover highlighting between results table and lollipop chart
- Add tooltip to module score badge explaining aggregation function
- Update all related tests to match new component structure
- Fix color mappings: DC-3=blue, DC-4=green for visual consistency

Backend (report_digest.py) provides 'passed' count, not 'hit_count'.
Code was defaulting to 0 failures when hit_count wasn't found.

- Calculate failures as total - passed when hit_count is missing
- Remove failure counts from Z-score y-axis labels (cleaner view)
- Update related tests

- Add 'passed' field to useFlattenedModules data transformation
- Failures now correctly derived from total - passed

- Set probe bar chart y-axis to always show 0-100% scale

- Use module.summary.group instead of module.group_name for display
- Shows 'LLM01: Prompt Injection' instead of 'llm01'
- Conditionally render Anchor only when group_link exists

- Add ModuleFilterChips component with multi-select support
- Show colored dots in x-axis labels to indicate probe module
- Update tooltip to show full probe name with module prefix
- Consistent module colors across filter chips and chart labels
- Support filtering by multiple modules simultaneously
- Single module displays as read-only badge

Updates document.title to include the target/model name for easier
tab identification when multiple reports are open.

Falls back through: target_name → model_name → plugins.model_name

When only one module exists, no need for color coding:
- Chart labels: no colored dot prefix
- Badge: gray color, no dot, read-only
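The tab-title fallback described in the commits above can be sketched as below. Only the fallback order (`target_name` → `model_name` → `plugins.model_name`) comes from the commit message; the report structure, function name, and the generic fallback string are assumptions.

```python
def report_title(report: dict) -> str:
    """Pick a document.title for the report page.

    Falls back through target_name -> model_name -> plugins.model_name
    (per the commit message); the "garak report" prefix and generic
    fallback here are illustrative.
    """
    name = (
        report.get("target_name")
        or report.get("model_name")
        or report.get("plugins", {}).get("model_name")
    )
    return f"garak report: {name}" if name else "garak report"
```

For example, a digest carrying only `plugins.model_name` still produces a target-specific tab title.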