feat: MetaAttack model #441

Merged
jim-smith merged 54 commits into main from 428-meta-attack on May 26, 2026

@shamykyzer

Contributor

Closes #428

Added a MetaAttack class that runs multiple privacy attacks on the same target and combines their per-record results into one DataFrame.

  • Runs LiRA, QMIA, and/or Structural attacks (with optional repeated runs)
  • Extracts each record's vulnerability score from every attack
  • Aggregates scores in two levels:
    • Within-attack: mean, std, and consistency across repeated runs
    • Cross-attack: MIA ensemble mean (arithmetic + geometric), structural flag, and a count of how many attacks flagged each record
  • Outputs a vulnerability_matrix.csv and a standard JSON report
meta = MetaAttack(
    attacks=[("lira", {"n_shadow_models": 100}, 5), ("qmia", {}), ("structural", {})],
)
meta.attack(target)
df = meta.vulnerability_df  # one row per record, one column group per attack

MetaAttack is also available via the factory: factory.attack(target, "meta", attacks=[...])

ssrhaso and others added 25 commits March 13, 2026 18:48
Add MetaAttack(Attack) with validated constructor, _parse_attacks(),
and abstract method stubs. Register as "meta" in the attack factory.

Supports (name, params, n_reps) tuples with validation against
supported attacks (lira, qmia, structural). Loads k-anonymity
threshold from ACRO config when not explicitly provided.

Includes design spec and staged implementation plan.
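
The tuple validation described in this commit could look roughly like the sketch below. The function name `parse_attacks`, the `SUPPORTED` set, and the default of one repetition for two-element tuples are illustrative assumptions, not the code in `meta_attack.py`.

```python
from typing import Any

# Attack names come from the commit message; the constant name is hypothetical.
SUPPORTED = {"lira", "qmia", "structural"}


def parse_attacks(attacks: list) -> list[tuple[str, dict[str, Any], int]]:
    """Normalise (name, params) or (name, params, n_reps) tuples."""
    if not attacks:
        raise ValueError("attacks must contain at least one entry")
    parsed = []
    for spec in attacks:
        if not isinstance(spec, tuple) or len(spec) not in (2, 3):
            raise ValueError(f"expected (name, params[, n_reps]), got {spec!r}")
        name, params = spec[0], spec[1]
        n_reps = spec[2] if len(spec) == 3 else 1  # assumed default
        if name not in SUPPORTED:
            raise ValueError(f"unsupported attack: {name!r}")
        if not isinstance(params, dict):
            raise ValueError(f"params for {name!r} must be a dict")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(n_reps, bool) or not isinstance(n_reps, int) or n_reps < 1:
            raise ValueError(f"n_reps for {name!r} must be a positive int")
        parsed.append((name, params, n_reps))
    return parsed
```

Under these assumptions, `parse_attacks([("qmia", {})])` yields `[("qmia", {}, 1)]`, while an empty list or an unknown attack name raises `ValueError`.
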
Add _run_sub_attack() and the orchestration loop in _attack().
Each sub-attack runs in an isolated subdirectory under output_dir
to prevent shadow model and report collisions between runs.

MIA attacks (LiRA, QMIA) get report_individual=True injected
automatically. Structural always computes record-level results.
Sub-attack objects are collected for score extraction in Stage 3.
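
A minimal sketch of the isolation idea, assuming a `<name>_rep<i>` directory layout. The helper `prepare_run`, the `MIA_ATTACKS` constant, and the directory naming are hypothetical; only the `report_individual=True` injection for LiRA and QMIA is taken from the commit message.

```python
import copy
import os
import tempfile

MIA_ATTACKS = {"lira", "qmia"}  # names from the PR; the constant is hypothetical


def prepare_run(output_dir: str, name: str, params: dict, rep: int) -> tuple[str, dict]:
    """Return an isolated directory and a private params copy for one run."""
    run_dir = os.path.join(output_dir, f"{name}_rep{rep}")  # assumed layout
    os.makedirs(run_dir, exist_ok=True)
    run_params = copy.deepcopy(params)  # never mutate the caller's dict
    if name in MIA_ATTACKS:
        # MIA attacks need per-record output for later score extraction.
        run_params["report_individual"] = True
    return run_dir, run_params


with tempfile.TemporaryDirectory() as root:
    first_dir, first_params = prepare_run(root, "qmia", {}, 0)
    second_dir, _ = prepare_run(root, "qmia", {}, 1)
    distinct_dirs = first_dir != second_dir
```

Because each repetition writes under its own directory, shadow models and reports from one run cannot overwrite those of another.
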
Add _extract_mia_scores() and _extract_structural_scores() with a
field-mapping dict (_MIA_SCORE_FIELDS) for LiRA/QMIA score paths.

Wire extraction into _attack() loop: scores collected immediately
after each sub-attack run into mia_scores and structural_scores
dicts, keyed by attack name with one list per repetition.

- Guard against sub-attack not running: check return value from
  attack() and raise RuntimeError with clear message if empty
- Reject empty attacks list in _parse_attacks with ValueError
- Use copy.deepcopy(params) instead of shallow dict(params) to
  prevent nested mutable values leaking between repetitions
- Add logging.basicConfig to match peer attack file conventions
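
The difference between `dict(params)` and `copy.deepcopy(params)` shows up only with nested mutable values. The params dict below is a made-up example, not an actual attack configuration.

```python
import copy

# Hypothetical params with a nested dict, purely for illustration.
params = {"n_shadow_models": 100, "model_kwargs": {"max_depth": 3}}

# Shallow copy: the nested dict is shared, so a change leaks back.
shallow = dict(params)
shallow["model_kwargs"]["max_depth"] = 99
leaked = params["model_kwargs"]["max_depth"]

# Reset, then deep copy: the nested dict is duplicated, so nothing leaks.
params["model_kwargs"]["max_depth"] = 3
deep = copy.deepcopy(params)
deep["model_kwargs"]["max_depth"] = 99
isolated = params["model_kwargs"]["max_depth"]
```

Here `leaked` ends up as 99 and `isolated` stays at 3, which is why a deep copy is the safer choice when the same params are reused across repetitions.
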
Implement _build_dataframe() with:
- Level 1 (within-attack): mean, std, consistency per MIA attack
  across n_reps; mean k / majority vote for structural reps
- Level 2 (cross-attack): arithmetic and geometric mean of MIA
  per-attack means; binary structural flag; n_vulnerable count
- NaN padding for structural columns on test records
- Epsilon-stabilised geometric mean to handle log(0)

Wire into _attack(): DataFrame stored on self.vulnerability_df
after all sub-attacks complete and scores are extracted.
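
The two aggregation levels can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the `_build_dataframe()` code: the function names, the `mia_gmean` key, the population standard deviation, the `EPS` value, adding epsilon inside the logarithm, and the 0.5 threshold are all hypothetical. Only `mia_mean` and `n_vulnerable` are names that appear in the commit messages.

```python
import math
import statistics

EPS = 1e-10  # assumed stabiliser; the real value is not given in the PR


def within_attack(reps: list[list[float]]) -> list[dict]:
    """Level 1: per-record mean and std across repetitions of one attack."""
    return [
        {"mean": statistics.fmean(scores), "std": statistics.pstdev(scores)}
        for scores in zip(*reps)  # one tuple of scores per record
    ]


def geometric_mean(values: list[float], eps: float = EPS) -> float:
    """Geometric mean via logs, with eps added so log(0) never occurs."""
    return math.exp(sum(math.log(v + eps) for v in values) / len(values))


def cross_attack(per_attack_means: dict[str, list[float]], threshold: float = 0.5) -> list[dict]:
    """Level 2: combine per-attack means into one row per record."""
    names = sorted(per_attack_means)
    n_records = len(per_attack_means[names[0]])
    rows = []
    for i in range(n_records):
        vals = [per_attack_means[name][i] for name in names]
        rows.append({
            "mia_mean": statistics.fmean(vals),
            "mia_gmean": geometric_mean(vals),
            "n_vulnerable": sum(v >= threshold for v in vals),
        })
    return rows
```

The geometric mean drops sharply when any single attack gives a low score, so it behaves more like a consensus measure than the arithmetic mean does.
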
- Clip MIA scores to [0, 1] during extraction to handle LiRA Carlini
  modes that produce unbounded log-likelihood ratios
- Document LiRA score convention: score = CDF under out-distribution,
  high values = evidence for membership (not against)
- Replace 'v is True' identity check with truthiness test 'if v:' to
  handle numpy bools correctly
- Round averaged k-anonymity to int for multi-rep structural runs
  (fractional k is not meaningful)
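
Two of these fixes are easy to illustrate in isolation. The helper names below are hypothetical, and the rounding rule is Python's built-in `round`, which may differ from what the attack code uses.

```python
import statistics


def clip_scores(scores: list[float]) -> list[float]:
    """Clamp each score into [0, 1], e.g. for unbounded likelihood ratios."""
    return [min(1.0, max(0.0, s)) for s in scores]


def averaged_k(k_per_rep: list[int]) -> int:
    """Average k-anonymity over repetitions and round to a whole number.

    Python's round() rounds exact halves to the nearest even integer.
    """
    return round(statistics.fmean(k_per_rep))


clipped = clip_scores([-2.3, 0.4, 7.1])
k = averaged_k([3, 4, 4])
```

On the truthiness fix: `numpy.True_` is a distinct object from Python's `True`, so an identity test such as `v is True` is false for NumPy booleans even when the value is truthy.
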
Complete the MetaAttack pipeline:
- _compute_global_metrics: uses mia_mean as membership predictor with
  get_metrics() for AUC/TPR/Advantage; falls back to summary dict
  for structural-only configs
- _construct_metadata: enriches report with thresholds and key metrics
- _get_attack_metrics_instances: standard report structure with
  sub-attack summary and full DataFrame under "individual"
- CSV export: saves vulnerability_matrix.csv alongside JSON report
- _attack() now returns a proper report dict (no more NotImplementedError)

10 test cases covering:
- Validation: unsupported attack, invalid tuple, empty list, bad n_reps
- Integration: QMIA + structural basic run, DataFrame shape and columns
- Structural NaN for test records
- Repeated runs: std column exists, consistency in [0, 1]
- Threshold effects: lower threshold flags more records
- Global metrics: AUC and TPR in [0, 1]
- Report structure: standard nested JSON keys
- Factory integration: factory.attack(target, "meta", ...) works
- CSV export: vulnerability_matrix.csv written and loadable

Demonstrates end-to-end usage: synthetic data, Target construction,
MetaAttack with QMIA (2 reps) + structural, DataFrame inspection,
summary statistics, and top-10 most vulnerable records.
@shamykyzer shamykyzer self-assigned this Apr 12, 2026
@shamykyzer shamykyzer changed the title from "428 meta attack" to "MetaAttack" Apr 12, 2026
@shamykyzer shamykyzer changed the title from "MetaAttack" to "feat: MetaAttack model" Apr 12, 2026
@shamykyzer

shamykyzer commented May 11, 2026

Contributor Author

Hi @jim-smith, could you please review these changes that I addressed:

  • 3-way behaviour flag (run_all, use_existing_only, fill_missing) per your sketch
  • MetaAttack output appended to report_dir/report.json by default (keep_separate=True opts out)
  • use_existing_only and fill_missing read both canonical single-file report.json and subdirectory layouts
  • _make_pdf produces a PDF via report.create_meta_report with a bar chart of records flagged by N attacks
  • CHANGELOG and README mention MetaAttack
  • Graceful degradation across sub-attacks (narrow except, warnings on expected failures)
  • Structural n_reps clamped to 1 (deterministic)
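
One way to read the three behaviour modes is as a rule for deciding which sub-attacks still need to run. The sketch below is a guess based on the mode names alone: `plan_runs` and its arguments are hypothetical, and the actual semantics in `meta_attack.py` may differ.

```python
def plan_runs(requested: list[str], existing: set[str], behaviour: str = "run_all") -> list[str]:
    """Return the sub-attacks to execute under the given behaviour mode.

    requested: attack names the user asked for.
    existing:  attack names that already have results in the report.
    """
    if behaviour == "run_all":
        return list(requested)  # ignore existing results, run everything
    if behaviour == "use_existing_only":
        return []  # run nothing, rely on stored results
    if behaviour == "fill_missing":
        return [name for name in requested if name not in existing]
    raise ValueError(f"unknown behaviour: {behaviour!r}")


to_run = plan_runs(["lira", "qmia", "structural"], {"qmia"}, "fill_missing")
```

Under this reading, `fill_missing` with an existing QMIA result would run only LiRA and the structural attack.
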

Follow-ups (separate PRs / issues):

Thanks.


Overview
[Figure: meta_attack_overview]

Architecture flow with behaviour modes
[Figure: meta_attack_architecture]

Aggregation pipeline
[Figure: meta_attack_aggregation]

Class hierarchy
[Figure: meta_attack_class_hierarchy]

@shamykyzer

shamykyzer commented May 11, 2026

Contributor Author

also -- thank you Jim for the thorough review and the time you put into walking through the code.

Comment thread examples/sklearn/meta_attack_example.py
Comment thread sacroml/attacks/meta_attack.py Outdated
Comment thread sacroml/attacks/meta_attack.py Outdated
super().__init__(output_dir=output_dir, write_report=write_report)
# MetaAttack does not use shadow models; remove the empty directory
# created by the base class so the output directory stays clean.
with contextlib.suppress(OSError):

Contributor

Just checking: this doesn't remove shadow models created by other attacks? That would be catastrophic!

Contributor Author

Good thing to check; that would be catastrophic if true. As I read it, the safety holds via two lines, but please gut-check me.

base class at attacks/attack.py:43:

self.shadow_path = os.path.normpath(f"{self.output_dir}/shadow_models")

So self.shadow_path is rooted at this MetaAttack instance's own output_dir; it can never resolve to another attack's directory.

then at meta_attack.py lines 131 to 134:

with contextlib.suppress(OSError):
    os.rmdir(self.shadow_path)

os.rmdir raises OSError on a populated directory, and suppress swallows it, so if anything is in there the call does nothing rather than deleting destructively.

I think the worst case is "remove the empty shadow_models/ this MetaAttack just created, otherwise do nothing".
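
The `os.rmdir` behaviour can be checked in a throwaway directory. The helper name `remove_if_empty` and the file names below are made up for the demonstration; the pattern itself is the one quoted from `meta_attack.py`.

```python
import contextlib
import os
import tempfile


def remove_if_empty(path: str) -> bool:
    """Try to remove a directory; return True if it no longer exists."""
    with contextlib.suppress(OSError):
        os.rmdir(path)  # raises OSError when the directory is not empty
    return not os.path.exists(path)


with tempfile.TemporaryDirectory() as root:
    empty = os.path.join(root, "empty_shadow_models")
    os.mkdir(empty)

    populated = os.path.join(root, "populated_shadow_models")
    os.mkdir(populated)
    model_file = os.path.join(populated, "model_0.pkl")  # stand-in file
    with open(model_file, "wb") as handle:
        handle.write(b"placeholder")

    empty_removed = remove_if_empty(empty)
    populated_removed = remove_if_empty(populated)
    file_survived = os.path.exists(model_file)
```

The empty directory is removed, while the populated one and the file inside it are left untouched.
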

@jim-smith

Contributor

@shamykyzer I have made one comment in the example file. I will make a local copy of this branch and see what it gives me when I run it.

@shamykyzer shamykyzer requested a review from jim-smith May 14, 2026 08:30
@ssrhaso

ssrhaso commented May 22, 2026

Contributor

added unit tests covering the previously-uncovered branches in meta_attack.py (defensive guards, report-scanning edge cases, sub-attack failure paths, non-numeric scores) and the finite-auc branch in create_meta_report.

takes meta_attack.py from 87% to 100% and should clear the codecov/patch and codecov/project checks. test-only, no changes to attack code :)

@shamykyzer

shamykyzer commented May 25, 2026

Contributor Author

hi @jim-smith, just wanted to let you know this PR is ready for review

the constants file is in, the behaviour kwarg is in the example and README, and hasaan added the coverage on the 22nd

the follow ups are each their own PR now, #460 structural respecting report_individual, #461 worstcase per record scores, #462 canary test parametrised over QMIA and LiRA (worstcase is a follow up), #463 lira score to member_prob rename

mind taking another look when you have a chance

@jim-smith jim-smith merged commit 483b641 into main May 26, 2026
4 checks passed
@jim-smith jim-smith deleted the 428-meta-attack branch May 26, 2026 16:49
3 participants