[REVIEW] ai-data-privacy: add AI data deletion propagation gates

## Skill Being Reviewed
**Skill name:** `ai-data-privacy`
**Skill path:** `skills/ai-security/ai-data-privacy/`

## False Positive Analysis

**Benign-looking deletion control that can be over-credited:**
```yaml
privacy:
  dsar_endpoint: /privacy/delete
  primary_store_delete: true
rag:
  vector_store: pgvector
  embedding_ttl_days: 365
analytics:
  prompt_logs: retained
backups:
  retention_days: 90
```

**Why this is a false positive:**

The system can delete the primary user record, but nothing in this configuration shows that deletion propagates to embeddings, vector-store replicas, cached retrieval chunks, prompt/completion logs, training dataset snapshots, fine-tuning artifacts, analytics exports, or backups. A DSAR endpoint can exist while AI-derived copies of personal data persist and remain retrievable.
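A minimal sketch of how a reviewer might turn that list into a gate. The store names and the `evidence` mapping (store name to a proof reference such as a ticket or log ID) are hypothetical and not part of the skill today.

```python
from __future__ import annotations

# Hypothetical names for the AI-derived stores listed in the paragraph above.
DERIVED_STORES = (
    "embeddings",
    "vector_replicas",
    "retrieval_cache",
    "prompt_logs",
    "training_snapshots",
    "fine_tuning_artifacts",
    "analytics_exports",
    "backups",
)


def unproven_stores(evidence: dict[str, str]) -> list[str]:
    """Return derived stores with no deletion-proof reference recorded."""
    return [s for s in DERIVED_STORES if not evidence.get(s, "").strip()]


# The example config proves deletion only for the primary store,
# so the evidence mapping for derived stores is empty.
gaps = unproven_stores({})
```

Under these assumptions, every derived store is reported as a gap until a proof reference is attached to it.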

## Coverage Gaps

**Missed variant 1: source document deletion does not remove embeddings**

The source document is deleted, but vector rows, chunk text, and search indexes remain available to RAG retrieval.
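One way to check for this variant is to delete a source document and count the chunk rows that still reference it. The sketch below uses an in-memory SQLite database as a stand-in for the real vector store; the `documents` and `chunks` tables and their columns are illustrative assumptions, not the schema of any particular deployment.

```python
import sqlite3


def residual_chunks(conn: sqlite3.Connection, doc_id: int) -> int:
    """Count chunk/embedding rows still tied to a source document."""
    row = conn.execute(
        "SELECT COUNT(*) FROM chunks WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    """CREATE TABLE chunks (
           id INTEGER PRIMARY KEY,
           doc_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
           chunk_text TEXT,
           embedding BLOB
       )"""
)
conn.execute("INSERT INTO documents (id, body) VALUES (1, 'user profile text')")
conn.executemany(
    "INSERT INTO chunks (doc_id, chunk_text, embedding) VALUES (?, ?, ?)",
    [(1, "chunk a", b"\x00"), (1, "chunk b", b"\x01")],
)

before = residual_chunks(conn, 1)
conn.execute("DELETE FROM documents WHERE id = 1")  # the DSAR deletion
after = residual_chunks(conn, 1)
```

Here the cascade removes the chunk rows along with the document. Without a source-to-chunk link of some kind, the same count would stay non-zero, which is the gap this variant describes.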

**Missed variant 2: training snapshots retain opted-out data**

Consent withdrawal stops new data from being ingested, but fine-tuning datasets and model checkpoints that were already created are not flagged for retraining, unlearning, or exclusion.
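A rough sketch of the missing check, assuming each snapshot can be mapped to the set of data-subject IDs it contains. The snapshot names and ID format are hypothetical.

```python
from __future__ import annotations


def snapshots_needing_action(
    snapshots: dict[str, set[str]], opted_out: set[str]
) -> dict[str, set[str]]:
    """Map each snapshot to the opted-out subjects it still contains."""
    flagged = {}
    for name, subjects in snapshots.items():
        overlap = subjects & opted_out
        if overlap:
            flagged[name] = overlap
    return flagged


# Hypothetical snapshot inventory and consent withdrawals.
snapshots = {
    "ft-dataset-v1": {"user-1", "user-2"},
    "ft-dataset-v2": {"user-3"},
}
flagged = snapshots_needing_action(snapshots, opted_out={"user-2"})
```

Any snapshot that appears in `flagged` would then need an explicit retrain, unlearning, or exclusion decision.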

**Missed variant 3: analytics and backup stores extend retention**

Prompt/completion logs, BI exports, and backup systems retain PII beyond the primary AI store's retention period.
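As a sketch, the comparison could look like the following. The derived-store values mirror the example config above (`embedding_ttl_days: 365`, `prompt_logs: retained`, `backups.retention_days: 90`), while the 30-day primary retention period is an assumed figure for illustration only.

```python
from __future__ import annotations

from typing import Optional


def stores_exceeding_primary(
    primary_days: int, derived: dict[str, Optional[int]]
) -> list[str]:
    """List stores that keep data longer than the primary store.

    A retention of None means "retained with no stated limit".
    """
    return sorted(
        name
        for name, days in derived.items()
        if days is None or days > primary_days
    )


derived = {"embeddings": 365, "prompt_logs": None, "backups": 90}
over = stores_exceeding_primary(30, derived)  # 30 days is an assumption
```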

## Edge Cases

- Deleting embeddings may require re-indexing or tombstoning if physical deletion is asynchronous.
- Legal hold can override deletion but must be documented with scope, authority, and expiration.
- Provider-hosted LLM retention and zero-data-retention settings need separate evidence from first-party stores.
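For the legal hold case, a minimal sketch of the documentation check might look like this. The `LegalHold` record and its field names are hypothetical. The rule encoded here (a hold blocks deletion only while scope, authority, and expiration are all recorded and the hold has not expired) is one possible reading of the bullet above, not an established policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LegalHold:
    scope: str                   # which data or stores the hold covers
    authority: str               # who ordered or approved the hold
    expires_on: Optional[date]   # None means no expiration was recorded


def hold_blocks_deletion(hold: LegalHold, today: date) -> bool:
    """True only for a fully documented hold that has not yet expired."""
    if not hold.scope.strip() or not hold.authority.strip():
        return False
    if hold.expires_on is None:
        return False
    return today <= hold.expires_on


documented = LegalHold("prompt logs for user-1", "General Counsel", date(2030, 1, 1))
undocumented = LegalHold("prompt logs for user-1", "", None)
```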

## Remediation Quality

- [x] Fix resolves the vulnerability
- [x] Fix doesn't introduce new security issues
- [x] Fix doesn't break functionality
- **Issues found:** Add deletion propagation evidence gates for source data, embeddings, vector indexes, logs, training snapshots, model artifacts, analytics exports, backups, provider retention, and legal holds.

## Comparison to Other Tools

| Tool | Catches this? | Notes |
|------|:---:|-------|
| DSAR workflow tools | Partial | Usually track primary application records, not all AI-derived data stores. |
| Data catalogs | Partial | Can inventory assets, but reviewers must verify propagation and deletion proofs. |
| Vector DB TTLs | Partial | May expire records eventually, but DSARs require targeted propagation and evidence. |

## Overall Assessment

**Strengths:** Strong privacy lifecycle coverage for training data, prompt/completion PII, retention, memorization, EU AI Act, and consent.

**Needs improvement:** Add operational deletion propagation evidence so reviewers can distinguish a DSAR endpoint from actual removal across AI-specific derived stores.

**Priority recommendations:**
1. Add a deletion propagation evidence checklist under data retention or consent.
2. Require source-to-derived mapping for embeddings, chunks, logs, snapshots, model artifacts, backups, analytics, and third-party provider stores.
3. Add output fields for propagation status, proof artifact, residual data risk, legal hold, and re-index/retrain/unlearning decision.
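
The output fields in recommendation 3 could be sketched as a small record type. The field names follow the recommendation, but the allowed values and the pass rule are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PropagationFinding:
    store: str                # e.g. "embeddings", "backups"
    propagation_status: str   # assumed values: "complete", "pending", "not_started"
    proof_artifact: str       # reference to a deletion log, ticket, or query result
    residual_data_risk: str   # assumed values: "none", "low", "high"
    legal_hold: bool          # True if a documented hold covers this store
    decision: str             # assumed values: "re-index", "retrain", "unlearn", "none"


def open_findings(findings: list[PropagationFinding]) -> list[str]:
    """Return stores that are neither proven deleted nor under legal hold."""
    return [
        f.store
        for f in findings
        if not f.legal_hold
        and not (f.propagation_status == "complete" and f.proof_artifact.strip())
    ]


findings = [
    PropagationFinding("embeddings", "complete", "job-log-17", "none", False, "re-index"),
    PropagationFinding("backups", "pending", "", "high", False, "none"),
    PropagationFinding("prompt_logs", "not_started", "", "high", True, "none"),
]
still_open = open_findings(findings)
```

In this sketch only `backups` stays open: `embeddings` has a proof artifact and `prompt_logs` is covered by a hold.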

## Sources Checked

- GDPR Article 17: https://gdpr-info.eu/art-17-gdpr/
- NIST AI RMF 1.0: https://www.nist.gov/itl/ai-risk-management-framework
- OWASP Top 10 for LLM Applications 2025: https://owasp.org/www-project-top-10-for-large-language-model-applications/

## Bounty Info
- [x] I have read and agree to the [CONTRIBUTING.md](../../CONTRIBUTING.md) bounty terms
- **Preferred payment method:** GitHub Sponsors

