Is your feature request related to a problem? Please describe.
test_qmia_predicts_canaries in tests/attacks/test_qmia_attack.py (added in #435) is qmia-specific. lira and worstcase have no equivalent positive-signal test that verifies the attack actually flags deliberately-vulnerable records. duplicating the canary scaffolding (data generation, 9-nn boundary selection, label flip, target build) per attack would be churn, and per-attack thresholds would drift out of sync over time.
Describe the solution you'd like
follow-up from #435: @jim-smith suggested lifting test_qmia_predicts_canaries into a generic test_attack_predicts_canaries(attack_cls, tmp_path) parametrised across [QMIAAttack, LIRAAttack, WorstCaseAttack].
Describe alternatives you've considered
leave it qmia-specific and write separate canary tests for each attack ad hoc. rejected: duplicates the canary scaffolding, encourages thresholds to drift, and means the next attack added to the project starts without a positive-signal test.
Additional context
notes from the qmia version that may matter for lira/worstcase:
bootstrap=False was needed for the rf to actually memorise the canaries; default rf (bootstrap=True) dilutes the signal across trees that never saw the canary.
ranking canaries within the training slice was bimodal across seeds because the qmia quantile threshold extrapolates at rare label/location combos. the stable metric was canary-vs-test auc.
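for reference, a minimal sketch of what "canary-vs-test auc" means here: the probability that a randomly chosen canary gets a higher membership score than a randomly chosen test record, with ties counted as half. the function name and the plain-list inputs are assumptions for illustration, not the code from #435, which may well use a library auc routine instead.

```python
from typing import Sequence


def canary_vs_test_auc(
    canary_scores: Sequence[float], test_scores: Sequence[float]
) -> float:
    """Return P(canary score > test score), counting ties as 0.5.

    Hypothetical helper: scores are assumed to be per-record membership
    scores where higher means "more likely a training member".
    """
    if not canary_scores or not test_scores:
        raise ValueError("both score lists must be non-empty")

    wins = 0.0
    for c in canary_scores:
        for t in test_scores:
            if c > t:
                wins += 1.0   # canary ranked above the test record
            elif c == t:
                wins += 0.5   # tie: half credit
    return wins / (len(canary_scores) * len(test_scores))
```

because this compares canaries only against held-out test records, it does not depend on how canaries rank against other training records, which is where the seed-to-seed instability showed up.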
would consider extracting the canary fixture into a shared helper, then pytest.parametrize over the three attacks with per-attack auc thresholds - since shadow-model attacks will likely need looser bounds than qmia.
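a rough sketch of how the per-attack thresholds could live in one place so they cannot drift apart. everything here is an assumption for illustration: the table name, the check function, and especially the numbers, which are placeholders and not measured values.

```python
# Hypothetical per-attack AUC floors, keyed by attack class name.
# The numbers are placeholders only; real floors would need to be
# chosen from observed canary-vs-test AUCs across seeds.
CANARY_AUC_FLOORS = {
    "QMIAAttack": 0.80,
    "LIRAAttack": 0.70,       # placeholder; expected to need a looser bound
    "WorstCaseAttack": 0.70,  # placeholder
}


def check_canaries_flagged(attack_name: str, auc: float) -> None:
    """Raise AssertionError if `auc` is below the floor for `attack_name`."""
    if attack_name not in CANARY_AUC_FLOORS:
        # Fail loudly so a newly added attack cannot skip the canary test.
        raise KeyError(f"no canary AUC floor registered for {attack_name}")
    floor = CANARY_AUC_FLOORS[attack_name]
    assert auc >= floor, (
        f"{attack_name}: canary-vs-test AUC {auc:.3f} is below floor {floor:.3f}"
    )
```

the generic test could then be decorated with `@pytest.mark.parametrize("attack_name", sorted(CANARY_AUC_FLOORS))`, build the shared canary target once, run the named attack, and call check_canaries_flagged on the resulting auc. an attack with no entry in the table raises KeyError rather than passing silently.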