feat: new instance-based attack for data leakage in SVM/kNN models by shamykyzer · Pull Request #431 · AI-SDC/SACRO-ML

shamykyzer · 2026-03-27T02:44:53Z

- New InstanceBasedAttack detects training data leakage in models that store raw training instances: support vectors (SVC, NuSVC, OneClassSVM) and stored neighbours (KNeighborsClassifier, KNeighborsRegressor)
- Compares stored instances against the training data via np.allclose and reports the first ten matches with feature previews, plus storage fraction and match fraction metrics
- Unwraps sklearn.Pipeline so the comparison runs in the final estimator's feature space
- Detects differentially private model variants and surfaces a mitigation note
- Registered in the factory under "instance_based"
- Match Fraction glossary updated to "A non-zero match fraction confirms data leakage" per Jim's review
- Module-level constants extracted: INSTANCE_MATCH_ATOL = 1e-8, N_EXAMPLES = 10, N_FEATURE_PREVIEW = 10
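A minimal sketch of the core check described above, for readers who want to see the idea in code. It assumes the attack reads support_vectors_ from the SVM estimators and the fitted sample matrix from the kNN estimators; scikit-learn keeps the latter in the private attribute _fit_X, so relying on it is an assumption that may break between versions. The definitions of storage fraction (stored rows / training rows) and match fraction (stored rows matching some training row / stored rows) are also assumptions, as the PR text does not spell them out. The helper names extract_stored_instances and leakage_metrics are hypothetical, not the sacroml API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.svm import SVC, NuSVC, OneClassSVM

INSTANCE_MATCH_ATOL = 1e-8  # value quoted in the PR description


def extract_stored_instances(model):
    """Return the raw rows a fitted model keeps, or None if it keeps none (hypothetical)."""
    if isinstance(model, (SVC, NuSVC, OneClassSVM)):
        return np.asarray(model.support_vectors_, dtype=float)
    if isinstance(model, (KNeighborsClassifier, KNeighborsRegressor)):
        # Assumption: scikit-learn stores the fitted rows in the private `_fit_X`.
        return np.asarray(model._fit_X, dtype=float)
    return None


def leakage_metrics(model, X_train, atol=INSTANCE_MATCH_ATOL):
    """Compare stored rows with training rows; the fraction definitions are assumptions."""
    stored = extract_stored_instances(model)
    if stored is None:
        return None  # graceful degradation: nothing stored to compare
    X_train = np.asarray(X_train, dtype=float)
    # close[i, j] is True when stored row i matches training row j; this is the
    # same test as np.allclose(stored[i], X_train[j], atol=atol), vectorised.
    close = np.isclose(stored[:, None, :], X_train[None, :, :], atol=atol).all(axis=2)
    matched_stored = close.any(axis=1)
    return {
        "storage_fraction": len(stored) / len(X_train),
        "match_fraction": float(matched_stored.mean()) if len(stored) else 0.0,
        "matched_pairs": np.argwhere(close).tolist(),  # [stored_idx, train_idx] pairs
    }


X, y = make_classification(n_samples=60, n_features=4, random_state=0)
svc_metrics = leakage_metrics(SVC().fit(X, y), X)
knn_metrics = leakage_metrics(KNeighborsClassifier(n_neighbors=3).fit(X, y), X)
other_metrics = leakage_metrics(LogisticRegression().fit(X, y), X)
```

Support vectors are copies of training rows, so an SVC fitted on X should show a match fraction of 1.0, while a kNN model stores every row (storage fraction 1.0). A model with no stored instances returns None rather than raising.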

Closes #59 ([New Feature Request] New Attack: Model contains training data)
Closes #454

codecov · 2026-03-27T02:55:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.66%. Comparing base (7176627) to head (a0c25cd).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
+ Coverage   99.65%   99.66%   +0.01%     
==========================================
  Files          27       28       +1     
  Lines        3439     3632     +193     
==========================================
+ Hits         3427     3620     +193     
  Misses         12       12



jim-smith

@shamykyzer Just a couple of minor changes to make, please.
If you want to raise 'move dealing with pipelines into utils.py' as a separate issue and leave that code in here for now, that is fine.


shamykyzer · 2026-05-22T10:36:38Z

Hi @jim-smith, could you please review this PR so far?

Thanks a lot.

- add report_individual option, gated like StructuralAttack so the per-record block only appears under the 'individual' key when set
- record all matched instances (n_examples now limits PDF display only)
- replace bespoke example_matches with an InstanceBasedRecordLevelResults dataclass of parallel lists, consistent with other attacks
- give InstanceBasedAttackResults field defaults to trim the graceful-degradation construction sites
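A sketch of the results shape this commit message describes. It assumes "parallel lists" means one list per field, with position k in every list describing the k-th match, and that report_individual only controls whether the full per-record block is emitted under an 'individual' key. The class names come from the commit message; their fields and the build_report function are hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class InstanceBasedRecordLevelResults:
    """Parallel lists: index k in every list describes the k-th match (assumed layout)."""

    stored_index: list[int] = field(default_factory=list)
    train_index: list[int] = field(default_factory=list)
    feature_preview: list[list[float]] = field(default_factory=list)


@dataclass
class InstanceBasedAttackResults:
    """Every field has a default, so a degraded run can be built with no arguments."""

    storage_fraction: float = 0.0
    match_fraction: float = 0.0
    record_level: InstanceBasedRecordLevelResults = field(
        default_factory=InstanceBasedRecordLevelResults
    )


def build_report(
    results: InstanceBasedAttackResults,
    report_individual: bool = False,
    n_examples: int = 10,
) -> dict:
    """Hypothetical report builder: all matches are kept, n_examples trims the display."""
    rec = results.record_level
    report = {
        "storage_fraction": results.storage_fraction,
        "match_fraction": results.match_fraction,
        "n_matches": len(rec.stored_index),
        "examples": list(zip(rec.stored_index, rec.train_index))[:n_examples],
    }
    if report_individual:  # per-record block only appears under this key
        report["individual"] = asdict(rec)
    return report


rec = InstanceBasedRecordLevelResults()
for stored_idx, train_idx in [(0, 4), (1, 9), (2, 17)]:
    rec.stored_index.append(stored_idx)
    rec.train_index.append(train_idx)
    rec.feature_preview.append([0.5, -1.2])
res = InstanceBasedAttackResults(storage_fraction=0.15, match_fraction=1.0, record_level=rec)

summary = build_report(res, n_examples=2)
detailed = build_report(res, report_individual=True, n_examples=2)
degraded = build_report(InstanceBasedAttackResults())
```

Keeping the record-level data as parallel lists makes it straightforward to serialise with asdict, and the field defaults mean the "unsupported model" path needs no special constructor arguments.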

Conflicts: CHANGELOG.md

jim-smith

Still needs changes. I think the way it is implemented is possibly more efficient, and the overall question, 'does this model contain training instances?', is answered correctly.

However, the way the record-level results are presented (is this stored instance present in the training set?) is inconsistent with the way they are presented for other attacks (is this training record stored in the model?). Quick change: create a new field,
individual_risk: np.ndarray = np.zeros(X_train.shape[0], dtype=int), and then use a for loop to set to 1 (True) the index of each training record you have stored in the individual-level results.
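The reviewer's suggestion can be sketched as follows: rather than reporting, per stored instance, whether it appears in the training set, build one flag per training record. This minimal sketch assumes the matched training indices are already available from the record-level results; individual_risk_from_matches is a hypothetical name, and the toy "model" is simulated by copying two training rows.

```python
import numpy as np


def individual_risk_from_matches(n_train: int, matched_train_indices) -> np.ndarray:
    """Return an int array with 1 for every training record found stored in the model."""
    individual_risk = np.zeros(n_train, dtype=int)
    for idx in matched_train_indices:
        individual_risk[idx] = 1  # this training record is present in the model
    return individual_risk


rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3))
stored = X_train[[1, 4]]  # pretend the model kept training rows 1 and 4

# Indices of training rows that match any stored row.
matched = [j for s in stored for j, x in enumerate(X_train) if np.allclose(s, x, atol=1e-8)]
risk = individual_risk_from_matches(len(X_train), matched)
print(risk)  # [0 1 0 0 1 0]
```

Indexing by training record means duplicate matches collapse to a single flag, and the array lines up with the per-record outputs of other attacks.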


shamykyzer self-assigned this Mar 27, 2026

shamykyzer requested review from jim-smith and rpreen March 27, 2026 02:45

shamykyzer marked this pull request as ready for review March 27, 2026 02:54

shamykyzer and others added 2 commits April 2, 2026 12:52

feat: add instance-based model attack to detect training data leakage in SVM/kNN models

7b963c0

style: pre-commit fixes

da9ed7d

shamykyzer force-pushed the new-attack-model branch from a901e02 to da9ed7d April 2, 2026 11:52

jim-smith reviewed Apr 24, 2026

Comment thread sacroml/attacks/instance_based_attack.py Outdated

jim-smith reviewed Apr 24, 2026

Comment thread sacroml/attacks/instance_based_attack.py Outdated

jim-smith reviewed Apr 24, 2026

Comment thread sacroml/attacks/instance_based_attack.py Outdated

jim-smith requested changes Apr 24, 2026

shamykyzer and others added 3 commits May 12, 2026 17:01

docs: clarify match fraction glossary to flag any non-zero leakage

6c10dc3

refactor: name magic numbers as N_EXAMPLES and N_FEATURE_PREVIEW constants

6359835

Merge branch 'main' into new-attack-model

0eb5e19

This was referenced May 15, 2026

chore: define shared atol semantics across attacks #454

Open

refactor: move _unwrap_model to utils.py #455

Closed
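Since the tolerance question is tracked separately in #454, a short illustration of how np.allclose uses atol may help. It tests |a - b| <= atol + rtol * |b| elementwise, with rtol=1e-05 by default, so passing only atol=1e-8 (the INSTANCE_MATCH_ATOL value in the PR description) still leaves a relative tolerance that dominates for large feature values. Passing rtol=0 is one way to make the threshold purely absolute; whether the attacks should do so is the open question, and this sketch does not settle it.

```python
import numpy as np

INSTANCE_MATCH_ATOL = 1e-8  # value quoted in the PR description

# np.allclose(a, b, rtol, atol) checks |a - b| <= atol + rtol * |b| elementwise.
a = np.array([1000.0])
b = np.array([1000.005])  # differs by 0.005

# Default rtol=1e-05: threshold is about 1e-8 + 1e-5 * 1000 = 0.01, so this matches.
loose = np.allclose(a, b, atol=INSTANCE_MATCH_ATOL)

# rtol=0: the threshold is exactly atol, so a 0.005 gap is not a match.
strict = np.allclose(a, b, rtol=0.0, atol=INSTANCE_MATCH_ATOL)

# Near zero the relative term is negligible and atol decides: 1e-7 exceeds ~1e-8.
near_zero = np.allclose(np.array([0.0]), np.array([1e-7]), atol=INSTANCE_MATCH_ATOL)

print(loose, strict, near_zero)  # True False False
```

In other words, the effective matching tolerance depends on feature scale unless rtol is pinned, which is why a shared definition across attacks is worth agreeing on.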

shamykyzer and others added 3 commits May 18, 2026 16:51

style: add type annotations to _unwrap_model

ffa061c

refactor: tighten _unwrap_model annotations to sklearn types

445a9da

Merge branch 'main' into new-attack-model

bcb370a

shamykyzer requested a review from jim-smith May 18, 2026 14:06

Merge branch 'main' into new-attack-model

afc47f0

shamykyzer mentioned this pull request May 21, 2026

refactor: move unwrap_model to sacroml.attacks.utils for reuse #458

Closed

shamykyzer added 2 commits May 21, 2026 14:15

refactor: extract INSTANCE_MATCH_ATOL constant for InstanceBasedAttack

ea331ab

refactor: move unwrap_model to sacroml.attacks.utils for reuse

a05adbc

This was linked to issues May 22, 2026

refactor: move _unwrap_model to utils.py #455

Closed

chore: define shared atol semantics across attacks #454

Open

revert: move unwrap_model to utils.py

a7e842f

shamykyzer removed a link to an issue May 22, 2026

refactor: move _unwrap_model to utils.py #455

Closed

shamykyzer added 2 commits May 25, 2026 11:35

test: cover graceful-degradation paths in InstanceBasedAttack

1e869be

style: rename test variables to match ruff pep8-naming allowlist

b014644

jim-smith requested changes May 26, 2026

shamykyzer and others added 4 commits May 26, 2026 16:58

refactor: move unwrap_model to sacroml.attacks.utils for reuse (#459)

6c52949

Merge remote-tracking branch 'origin/main' into new-attack-model

6268e39

Conflicts: CHANGELOG.md

style: satisfy pydocstringformatter on test docstring

ca71239

shamykyzer requested a review from jim-smith May 29, 2026 13:34

jim-smith requested changes Jun 5, 2026

Comment thread sacroml/attacks/instance_based_attack.py Outdated

Comment thread tests/attacks/test_instance_based_attack.py Outdated

Comment thread sacroml/attacks/instance_based_attack.py Outdated

JessUWE and others added 2 commits June 16, 2026 12:47

fix: reindex record-level results by training record in InstanceBasedAttack

048f499

Merge branch 'main' into new-attack-model

a0c25cd

JessUWE requested a review from jim-smith June 17, 2026 12:06

shamykyzer wants to merge 20 commits into main from new-attack-model

3 participants
