Conversation

@patriciapampanelli (Collaborator)

Summary

Adds 95% bootstrap confidence intervals (CIs) to attack success rates, accounting for sampling variance and for detector imperfection via the Rogan-Gladen correction.

Changes

  • New: bootstrap_ci.py, detector_metrics.py - CI calculation with Se/Sp correction
  • Modified: evaluators/base.py - CI integration into eval pipeline and output
  • Modified: report_digest.py - CI propagation through reports

Methodology

  1. Resampling: Draws 10,000 bootstrap samples from the binary pass/fail results (with replacement)
  2. Correction: Adjusts each sample's observed rate using the Rogan-Gladen formula to account for detector error
  3. Interval extraction: Takes the 2.5th and 97.5th percentiles as CI bounds

The correction formula:

P_true = (P_obs + Sp - 1) / (Se + Sp - 1)
  • P_obs = observed failure rate in the resampled data
  • Se = detector sensitivity (probability of detecting a true attack)
  • Sp = detector specificity (probability of correctly passing a benign response)

Requires ≥30 evaluated outputs per probe-detector pair; falls back to a perfect detector (Se = Sp = 1.0) when detector metrics are unavailable.
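For concreteness, a minimal sketch of the resample-correct-extract procedure described above (illustrative only: the function and variable names are not the actual bootstrap_ci.py API, and the clamp to [0, 1] is an assumption, not something stated in this PR):

import numpy as np

def bootstrap_asr_ci(results, se=1.0, sp=1.0, num_iterations=10000, confidence_level=0.95):
    """Sketch: bootstrap CI for attack success rate with Rogan-Gladen correction."""
    results = np.asarray(results)  # binary pass/fail outcomes
    rng = np.random.default_rng()
    n = len(results)
    corrected = np.empty(num_iterations)
    for i in range(num_iterations):
        # 1. Resample with replacement and take the observed rate
        p_obs = rng.choice(results, size=n, replace=True).mean()
        # 2. Rogan-Gladen correction: P_true = (P_obs + Sp - 1) / (Se + Sp - 1)
        p_true = (p_obs + sp - 1.0) / (se + sp - 1.0)
        corrected[i] = min(max(p_true, 0.0), 1.0)  # clamp to [0, 1] (assumption)
    # 3. Percentile interval: 2.5th and 97.5th percentiles for a 95% CI
    alpha = 1.0 - confidence_level
    return tuple(np.percentile(corrected, [100 * alpha / 2, 100 * (1 - alpha / 2)]))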

Statistical Limitations

  • Se/Sp are treated as fixed (no propagation of detector-metric uncertainty)
  • Uses detector-level metrics only (not probe-specific); detector performance (Se/Sp) can vary by probe

Out of Scope

  • Probe-specific Se/Sp lookup

Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>
@erickgalinkin (Collaborator) left a comment

Would be nice to find a better way to print this. I'm mostly confident that this methodology can work, though I had trouble writing a formal proof that this gives us a true 95% CI.

Comment on lines 42 to 43
During console output, attack success rates may include confidence intervals displayed as: ``(attack success rate: 45.23%) ± 2.15``.
The ± margin represents the 95% confidence interval half-width in percentage points.
Collaborator

Realistically, our + and - won't be evenly distributed. We almost universally have asymmetric CIs.

Collaborator Author

Absolutely, yes, they are already calculated asymmetrically. I'll correct how the CIs are displayed.

Collaborator Author

Done. Updated to bracketed format [lower%, upper%].
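For illustration, the bracketed rendering might look roughly like this (a sketch; it assumes ci_lower and ci_upper are already expressed in percentage points, as in the ± snippet further down):

# Render the asymmetric CI as explicit bounds instead of a symmetric ± margin
ci_text = (
    f" [{ci_lower:.2f}%, {ci_upper:.2f}%]"
    if ci_lower is not None and ci_upper is not None
    else ""
)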

p_obs = resampled_results.mean()

# Apply Se/Sp correction to get true ASR
# TODO: propagate detector metric uncertainty (requires Se/Sp CIs in detector_metrics_summary.json)
Collaborator

<3

Comment on lines 254 to 258
ci_text = (
    f" ± {(ci_upper - ci_lower) / 2:.2f}"
    if ci_lower is not None and ci_upper is not None
    else ""
)
Collaborator

Doesn't this assume a symmetric distribution? I understand there's some lossiness in printing it this way, but I'd think that if failrate is, for example, 100%, we'd want something more like ci_lower <= failrate. Hard to manage, but I'm not completely sure how to avoid saying something like "100% ± 10%".

Collaborator

would love to do this based on a model of the distribution of probe:detector scores acquired during calibration, thus ditching the frequently-untrue symmetry assumption

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leondz I have a separate research branch where I try a totally different calculation. Working on checking how different my bounds (which are derived from a nonparametric test on an empirical CDF) are compared to these.

patriciapampanelli and others added 3 commits January 27, 2026 13:07
…ic ± format

Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>
Co-authored-by: Erick Galinkin <erick.galinkin@gmail.com>
Signed-off-by: Patricia Pampanelli <38949950+patriciapampanelli@users.noreply.github.com>
Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>
@leondz (Collaborator) left a comment

Shaping up well. A few minor requests around non-duplication and configuration, and some larger questions about where this code belongs and how to support CI calculation beyond Evaluator.

Status can be 0 (not sent to target), 1 (with target response but not evaluated), or 2 (with response and evaluation).
Eval-type entries are added after each probe/detector pair completes, and list the results used to compute the score.

Confidence Intervals (Optional)
Collaborator

What does the (Optional) refer to here?

Comment on lines 16 to 17
num_iterations: int = 10000,
confidence_level: float = 0.95,
Collaborator

these should be configurable; propose putting them in core config under reporting

Collaborator Author

Fixed. Now reads from _config.reporting.

Collaborator

Thanks!

The intent with _config is for objects to never read from it directly, but instead from a config parameter passed at instantiation. I think adherence to this pattern might block directly accessing _config in these methods, and then the question is where the data comes from. One solution might be to have the instantiated Evaluator - which is configured with access to those parameters - pass these values to this function; or even to pass this function its own config object. Could that make sense?

also paging @jmartin-tech for opinion
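For illustration, a rough sketch of the pattern described above - the Evaluator resolves the values from its own config at instantiation and passes them down, so the CI helper never touches the global _config. All names here are hypothetical except bootstrap_confidence_level, which is mentioned elsewhere in this thread:

def calculate_bootstrap_ci(results, se, sp, num_iterations, confidence_level):
    ...  # resample, apply Rogan-Gladen correction, take percentiles


class Evaluator:
    def __init__(self, config_root):
        # hypothetical keys, resolved once at instantiation from the reporting config
        self.num_iterations = config_root.reporting.bootstrap_iterations
        self.confidence_level = config_root.reporting.bootstrap_confidence_level

    def evaluate(self, results, se, sp):
        return calculate_bootstrap_ci(
            results, se, sp,
            num_iterations=self.num_iterations,
            confidence_level=self.confidence_level,
        )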

Comment on lines 89 to 90
num_iterations: int = 10000,
confidence_level: float = 0.95,
Collaborator

Should be configurable. Also, prefer defining defaults in just one place wherever possible

Collaborator Author

Fixed. Now reads from _config.reporting.

se,
sp,
)
except Exception as e:
Collaborator

avoid catching the Exception parent class; be specific

Collaborator Author

Fixed. Now catching specific ValueError.


# Add CI fields if calculation succeeded
if ci_lower is not None and ci_upper is not None:
    eval_record["confidence"] = "0.95"
Collaborator

draw this 0.95 value from one central place

Collaborator Author

Fixed. Reading from _config.reporting.bootstrap_confidence_level.


Collaborator

In this implementation, CIs are calculated only during active garak runs, before the eval result objects are logged.

Disadvantages:

  1. There's no route to calculating CIs post-hoc (e.g. for older runs)
  2. There's no route to recalculating CIs with different config
  3. Failures during the non-trivial CI calc procedure abort the run

Would prefer to factor this out and have it run at report digest compilation time. On the other hand, that fails the requirement to print CIs on the command line. That's tricky - can we get both? Calling report_digest already recalculates a great deal, so I wouldn't be averse to a "rebuild_cis" flag for when it's called as a CLI tool.

Collaborator

I'd still like a super-simple CI for the general case that ignores detector performance, clamped to 0.0-1.0. We can estimate a CI for cases where we don't have extensive detector perf information, and we can do it quickly.

Could be configured in core via e.g. reporting.confidence_interval_method with values:

  • None - no confidence interval calc/display
  • bootstrap - bootstrap only
  • simple - simple only
  • backoff - bootstrap where we can, simple in the gaps

backoff might be a bit much for this week, but some pattern like this is where I'd like this to go
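For reference, the simple case could be as small as a normal-approximation (Wald) interval on the raw pass/fail rate, ignoring detector Se/Sp and clamped to [0, 1]. A sketch only, not the proposed implementation; the function name is made up:

import math

def simple_asr_ci(results, confidence_level=0.95):
    """Wald interval on the observed rate, clamped to [0, 1]; ignores detector Se/Sp."""
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence_level]
    n = len(results)
    p = sum(results) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

One caveat with this form: at rates of exactly 0% or 100% the Wald half-width collapses to zero, so something like a Wilson score interval may be a better fit for the simple path.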

@leondz added the reporting label (Reporting, analysis, and other per-run result functions) on Jan 28, 2026
patriciapampanelli and others added 7 commits January 28, 2026 12:45
Co-authored-by: Leon Derczynski <leonderczynski@gmail.com>
Signed-off-by: Patricia Pampanelli <38949950+patriciapampanelli@users.noreply.github.com>
Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>
Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>
Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>
… config with None/bootstrap options

Signed-off-by: Patricia Pampanelli <ppampanelli@nvidia.com>