Report Generator refactor #1573
Open: otavionvidia wants to merge 25 commits into NVIDIA:main from otavionvidia:kaizen-implementation
Remove ?? 0 fallback that incorrectly converted null zscore to 0, causing tooltip to show '0.00' instead of 'N/A' for unavailable data.
Instead of rendering chart with only N/A markers, show StatusMessage when all visible entries have null zscore.
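The two z-score fixes above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `formatZScore`, `allZScoresMissing`, and the `ZScore` type are hypothetical names, and the real components may structure this differently.

```typescript
// Hypothetical helpers; the PR's component code is not shown on this page.
type ZScore = number | null | undefined;

// Format a z-score for a tooltip. A missing value stays "N/A" rather
// than being coerced to 0 by a `?? 0` fallback.
function formatZScore(zscore: ZScore): string {
  if (zscore === null || zscore === undefined || Number.isNaN(zscore)) {
    return "N/A";
  }
  return zscore.toFixed(2);
}

// True when no visible entry has a usable z-score. A caller could then
// render an empty-state message instead of a chart of N/A markers.
function allZScoresMissing(entries: { zscore: ZScore }[]): boolean {
  return entries.every((entry) => formatZScore(entry.zscore) === "N/A");
}
```

A genuine z-score of 0 still formats as '0.00', so it stays distinguishable from missing data.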
- Remove Z-Score tooltip from DetectorChartHeader
- Add Z-Score label with info tooltip below chart (near X-axis)
- Remove ECharts xAxis name since we have custom label
Change DC-3 badge color from green to blue to visually distinguish from DC-4 (green). CSS chart colors already use different shades (green-200 vs green-600).
Collect detector types from ALL probes instead of just the selected probe. This prevents detector sections from disappearing when selecting a probe that doesn't have certain detectors. Probes without a detector now appear as N/A in that section.
Only iterate over detectors from the selected probe, not all detector types from all probes. This ensures the detector count matches what the probe actually tested.
Show probe name, detector type, and failure/prompt counts even when the chart displays 'No Data Available' empty state.
Backend (report_digest.py):
- Remove inaccurate probe-level prompt_count/fail_count (were incorrectly summed)
- Rename attempt_count → total_evaluated (matches source)
- Rename hit_count → passed (matches source, frontend computes failures)

Frontend:
- Remove legacy zscore_*/zscore field fallbacks (backend uses relative_*)
- Update types to use total_evaluated/passed from backend
- Add backward compatibility for old field names (attempt_count/hit_count)
- Frontend computes hit_count (failures) from total_evaluated - passed
- Remove probe-level prompt/failure count display (data was inaccurate)
- Detector-level counts still shown in y-axis labels
- Update all related tests

- Show prompt count in tooltip: 'Prompts: 255'
- Gives context for score percentages on hover
- Fixed 45° label rotation

- Probe bar chart: sort by class name alphabetically (A→Z left to right)
- Detector lollipop chart: reverse alphabetical (A at bottom, Z at top)
- Consistent ordering across views for better UX
- Add tests for sorting behavior
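As an illustration of the ordering described above, here is a minimal sketch. The `Named` interface and both function names are assumptions made for the example. It models only the visual reading order: which array order actually puts A at the bottom depends on how the chart library lays out its category axis, which is not assumed here.

```typescript
// Hypothetical shape: anything with a display name (probe class or detector).
interface Named {
  name: string;
}

// A→Z, left to right, as described for the probe bar chart.
function sortLeftToRight<T extends Named>(items: readonly T[]): T[] {
  // Copy first so the caller's array is not mutated.
  return [...items].sort((a, b) => a.name.localeCompare(b.name));
}

// Z→A when read from top to bottom, i.e. A at the bottom and Z at the top,
// as described for the detector lollipop chart.
function sortTopToBottom<T extends Named>(items: readonly T[]): T[] {
  return sortLeftToRight(items).reverse();
}
```

Deriving both orders from one comparator keeps the two views from drifting apart.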
Collaborator
@saichandrapandraju thank you for pointing this out. The taxonomy view has not really been well tested here; in fact, I am not sure we accounted for this at all. The new flow here is organized based on the …

Collaborator
@saichandrapandraju Drafting a patch to see if we can get this fixed for the release.
- Show detectors within a single probe instead of comparing across probes
- Y-axis now displays detector names, not probe names
- Remove cross-probe comparison logic and Hide N/A checkbox
- Add DetectorResultsTable showing DEFCON, passed, failed, total counts
- Simplify useDetectorChartOptions hook for new data flow
- Update all related tests

…robe metrics
- Refactor DetectorsView layout: probe header → detector breakdown → z-score chart
- Add severity badge and pass rate display to probe header
- Remove fail_count/prompt_count from UI (source unverified in backend)
- Extract ProgressBar component for detector results visualization
- Extract formatPercentage utility for consistent % display (no .00 decimals)
- Use hit_count directly from backend for failure counts (no client calculations)
- Add linked hover highlighting between results table and lollipop chart
- Add tooltip to module score badge explaining aggregation function
- Update all related tests to match new component structure
- Fix color mappings: DC-3=blue, DC-4=green for visual consistency
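The formatPercentage utility itself is not shown on this page. One plausible sketch of "no .00 decimals" formatting follows. The signature, the `maxDecimals` parameter, and the assumption that the input is already on a 0-100 scale are all hypothetical.

```typescript
// Hypothetical sketch; assumes `value` is already a percentage (0-100).
function formatPercentage(value: number, maxDecimals: number = 2): string {
  if (!Number.isFinite(value)) {
    return "N/A";
  }
  // toFixed rounds to the requested precision; parseFloat then drops
  // trailing zeros, so 100.00 becomes 100 and 87.50 becomes 87.5.
  return `${parseFloat(value.toFixed(maxDecimals))}%`;
}
```

For example, `formatPercentage(100)` gives '100%' and `formatPercentage(87.5)` gives '87.5%'.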
Backend (report_digest.py) provides 'passed' count, not 'hit_count'. Code was defaulting to 0 failures when hit_count wasn't found.
- Calculate failures as total - passed when hit_count is missing
- Remove failure counts from Z-score y-axis labels (cleaner view)
- Update related tests

- Add 'passed' field to useFlattenedModules data transformation
- Failures now correctly derived from total - passed
- Set probe bar chart y-axis to always show 0-100% scale
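The failure-count derivation described in these commits might look roughly like the sketch below. The field names come from the commit messages. The `DetectorCounts` interface, the `deriveFailures` name, and the choice to return `null` when nothing can be derived are assumptions for illustration.

```typescript
// Field names follow the commit messages; the interface is an assumption.
interface DetectorCounts {
  total_evaluated?: number;
  passed?: number;
  attempt_count?: number; // older name for total_evaluated
  hit_count?: number; // older failure count, if the payload still has it
}

// Returns the number of failures, or null when it cannot be derived.
// Returning null (not 0) avoids reporting "0 failures" for missing data.
function deriveFailures(counts: DetectorCounts): number | null {
  if (typeof counts.hit_count === "number") {
    return counts.hit_count;
  }
  const total = counts.total_evaluated ?? counts.attempt_count;
  if (typeof total === "number" && typeof counts.passed === "number") {
    // Clamp so inconsistent data cannot produce a negative count.
    return Math.max(total - counts.passed, 0);
  }
  return null;
}
```

With a payload of `{ total_evaluated: 255, passed: 200 }`, this yields 55 failures.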
- Use module.summary.group instead of module.group_name for display
- Shows 'LLM01: Prompt Injection' instead of 'llm01'
- Conditionally render Anchor only when group_link exists

- Add ModuleFilterChips component with multi-select support
- Show colored dots in x-axis labels to indicate probe module
- Update tooltip to show full probe name with module prefix
- Consistent module colors across filter chips and chart labels
- Support filtering by multiple modules simultaneously
- Single module displays as read-only badge
Updates document.title to include the target/model name for easier tab identification when multiple reports are open. Falls back through: target_name → model_name → plugins.model_name
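The fallback chain can be sketched as a pure function. The `ReportMeta` shape mirrors the field names in the commit message, while `resolveReportTitle`, the default base title, and the 'name - base' format are assumptions. The real code presumably assigns the result to `document.title`.

```typescript
// Shape inferred from the commit message; not the PR's actual type.
interface ReportMeta {
  target_name?: string | null;
  model_name?: string | null;
  plugins?: { model_name?: string | null };
}

// Pick the first non-empty name in the order
// target_name → model_name → plugins.model_name.
function resolveReportTitle(meta: ReportMeta, base: string = "Garak Report"): string {
  const candidates = [meta.target_name, meta.model_name, meta.plugins?.model_name];
  const name = candidates.find(
    (candidate) => typeof candidate === "string" && candidate.trim() !== ""
  );
  // The title format is an assumption made for this sketch.
  return name ? `${name} - ${base}` : base;
}
```

In a browser this would be used as `document.title = resolveReportTitle(meta)`. Keeping the resolver pure makes it testable without a DOM.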
When only one module exists, no need for color coding:
- Chart labels: no colored dot prefix
- Badge: gray color, no dot, read-only
What this change does
This PR introduces a new React-based report UI for garak, replacing the previous Jinja template system with a modern, interactive frontend built on the Kaizen (KUI) design system.
Key Features:
Backend Changes:
- … report_digest.py to output JSON data consumed by the new UI
- … total_evaluated and passed counts per detector
- … (garak/analyze/ui/index.html)
Verification
Quick Test:
Full Test Suite:
Manual Verification:
Frontend Development (optional):
Notes
- … (garak/analyze/ui/index.html) is included in this PR