Skill Being Reviewed
Skill name: ai-data-privacy
Skill path: skills/ai-security/ai-data-privacy/
False Positive Analysis
Benign-looking deletion control that can be over-credited:
privacy:
  dsar_endpoint: /privacy/delete
  primary_store_delete: true
rag:
  vector_store: pgvector
  embedding_ttl_days: 365
analytics:
  prompt_logs: retained
backups:
  retention_days: 90
Why this is a false positive:
The system can delete the primary user record, but the review does not prove that deletion propagates to embeddings, vector-store replicas, cached retrieval chunks, prompt/completion logs, training dataset snapshots, fine-tuning artifacts, analytics exports, and backups. A DSAR endpoint can exist while AI-derived copies of personal data persist and remain retrievable.
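One way to make this gap concrete is to list every derived store and ask which ones have positive deletion evidence. The sketch below is illustrative only: the store names and the `unverified_stores` helper are assumptions made for this review, not part of the skill or of any library.

```python
# Hypothetical names for the AI-derived stores listed in the paragraph above.
DERIVED_STORES = [
    "embeddings",
    "vector_replicas",
    "retrieval_cache",
    "prompt_logs",
    "training_snapshots",
    "fine_tuning_artifacts",
    "analytics_exports",
    "backups",
]


def unverified_stores(evidence: dict) -> list:
    """Return the derived stores that have no positive deletion evidence."""
    return [store for store in DERIVED_STORES if not evidence.get(store, False)]


# The sample config only proves deletion from the primary store.
evidence = {"primary_store": True}
print(unverified_stores(evidence))
```

With only `primary_store_delete: true` as evidence, every derived store is returned as unverified, which is the situation a reviewer should not credit as complete deletion.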
Coverage Gaps
Missed variant 1: source document deletion does not remove embeddings
The source document is deleted, but vector rows, chunk text, and search indexes remain available to RAG retrieval.
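A minimal sketch of how a reviewer might detect this variant, using an in-memory SQLite database as a stand-in for the real vector store. The `documents` and `chunks` tables, their columns, and the `orphaned_chunk_ids` helper are hypothetical; a pgvector deployment would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, owner TEXT);
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER, chunk_text TEXT);
    INSERT INTO documents VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO chunks VALUES (10, 1, 'alice chunk'), (11, 2, 'bob chunk');
    """
)


def orphaned_chunk_ids(conn):
    """Return ids of chunks whose source document no longer exists."""
    rows = conn.execute(
        "SELECT c.id FROM chunks c "
        "LEFT JOIN documents d ON d.id = c.document_id "
        "WHERE d.id IS NULL ORDER BY c.id"
    ).fetchall()
    return [row[0] for row in rows]


# Deleting only the source document leaves its chunk behind for retrieval.
conn.execute("DELETE FROM documents WHERE id = 1")
leftover = orphaned_chunk_ids(conn)

# Targeted propagation: remove the derived rows for the same document.
conn.execute("DELETE FROM chunks WHERE document_id = 1")
remaining = orphaned_chunk_ids(conn)
```

The tables here have no foreign key with cascading delete, so the first `DELETE` does not touch `chunks`. That mirrors the common case where embeddings live in a separate store from the source record.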
Missed variant 2: training snapshots retain opted-out data
Consent withdrawal removes new data from ingestion, but already-created fine-tuning datasets and model checkpoints are not flagged for retraining, unlearning, or exclusion.
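The check implied here can be sketched as a set intersection between withdrawn user ids and the users present in each existing snapshot. The `Snapshot` type, its fields, and `flag_for_review` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    name: str
    user_ids: set
    derived_checkpoints: list = field(default_factory=list)


def flag_for_review(snapshots, withdrawn_user_ids):
    """Map each affected snapshot name to the checkpoints trained from it."""
    flagged = {}
    for snap in snapshots:
        if snap.user_ids & withdrawn_user_ids:
            flagged[snap.name] = list(snap.derived_checkpoints)
    return flagged


snapshots = [
    Snapshot("ft-2024-01", {"u1", "u2"}, ["ckpt-a", "ckpt-b"]),
    Snapshot("ft-2024-02", {"u3"}, ["ckpt-c"]),
]
flagged = flag_for_review(snapshots, {"u2"})
```

Each flagged entry still needs a recorded decision (retrain, unlearn, or exclude); the sketch only identifies which artifacts are affected.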
Missed variant 3: analytics and backup stores extend retention
Prompt/completion logs, BI exports, and backup systems retain PII beyond the primary AI store's retention period.
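Effective retention is governed by the longest-lived copy, not by the primary store. A small sketch, assuming each store reports a retention period in days and `None` stands for indefinite retention (as with `prompt_logs: retained` in the sample config). The function name is hypothetical.

```python
from typing import Optional


def effective_retention_days(stores: dict) -> Optional[int]:
    """Longest retention across stores; None if any store retains indefinitely."""
    if any(days is None for days in stores.values()):
        return None
    return max(stores.values(), default=0)


# Values taken from the sample config; None models "retained" with no limit.
sample = {"embeddings": 365, "backups": 90, "prompt_logs": None}
result = effective_retention_days(sample)
```

For the sample config the result is `None`: the prompt logs have no stated limit, so the 365-day embedding TTL and 90-day backup window do not bound how long PII persists.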
Edge Cases
- Deleting embeddings may require re-indexing or tombstoning if physical deletion is asynchronous.
- Legal hold can override deletion but must be documented with scope, authority, and expiration.
- Provider-hosted LLM retention and zero-data-retention settings need separate evidence from first-party stores.
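For the first edge case, one common pattern is to filter retrieval results against a tombstone set until physical deletion finishes. The sketch below assumes results are dictionaries with a `chunk_id` key; that shape and the `filter_tombstoned` name are illustrative, not taken from the skill.

```python
def filter_tombstoned(results, tombstoned_chunk_ids):
    """Drop retrieval hits whose chunk has been marked deleted."""
    return [r for r in results if r["chunk_id"] not in tombstoned_chunk_ids]


hits = [
    {"chunk_id": 10, "text": "deleted user's chunk"},
    {"chunk_id": 11, "text": "unrelated chunk"},
]
visible = filter_tombstoned(hits, {10})
```

A tombstone only hides data from retrieval. Reviewers would still want evidence that the underlying rows are eventually purged or re-indexed.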
Remediation Quality
Comparison to Other Tools
| Tool | Catches this? | Notes |
| --- | --- | --- |
| DSAR workflow tools | Partial | Usually track primary application records, not all AI-derived data stores. |
| Data catalogs | Partial | Can inventory assets, but reviewers must verify propagation and deletion proofs. |
| Vector DB TTLs | Partial | May expire records eventually, but DSARs require targeted propagation and evidence. |
Overall Assessment
Strengths: Strong privacy lifecycle coverage for training data, prompt/completion PII, retention, memorization, EU AI Act, and consent.
Needs improvement: Add operational deletion propagation evidence so reviewers can distinguish a DSAR endpoint from actual removal across AI-specific derived stores.
Priority recommendations:
- Add a deletion propagation evidence checklist under data retention or consent.
- Require source-to-derived mapping for embeddings, chunks, logs, snapshots, model artifacts, backups, analytics, and third-party provider stores.
- Add output fields for propagation status, proof artifact, residual data risk, legal hold, and re-index/retrain/unlearning decision.
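The recommended output fields could take a shape like the following. This is a sketch under assumptions: the class names, enum values, and the `needs_follow_up` rule are hypothetical and would need to match whatever output schema the skill already uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    VERIFIED = "verified"
    PENDING = "pending"
    NOT_PROPAGATED = "not_propagated"


@dataclass
class PropagationFinding:
    store: str                      # e.g. "embeddings", "backups"
    propagation_status: Status
    proof_artifact: Optional[str]   # reference to a deletion log or job id
    residual_data_risk: str         # free-text or a severity label
    legal_hold: bool
    decision: str                   # "re-index", "retrain", "unlearn", or "none"


def needs_follow_up(finding: PropagationFinding) -> bool:
    """Open unless deletion is verified with proof or a legal hold is recorded."""
    if finding.legal_hold:
        return False
    verified = finding.propagation_status is Status.VERIFIED
    return not (verified and finding.proof_artifact)


finding = PropagationFinding(
    store="embeddings",
    propagation_status=Status.PENDING,
    proof_artifact=None,
    residual_data_risk="chunks still retrievable",
    legal_hold=False,
    decision="re-index",
)
```

Treating a legal hold as closing the finding is a simplification. Per the edge cases above, a hold should itself carry documented scope, authority, and expiration.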
Sources Checked
Bounty Info