UnitOneAI · catcherintheroad-hub · Jun 6, 2026
diff --git a/skills/ai-security/ai-data-privacy/SKILL.md b/skills/ai-security/ai-data-privacy/SKILL.md
@@ -240,6 +240,60 @@ Grep: "backup|snapshot|archive" in **/*.{yaml,yml,json,toml}
 
 ---
 
+### Step 3A -- Deletion Propagation Evidence
+
+Assess whether deletion, erasure, consent withdrawal, and retention expiry propagate from primary records into AI-specific derived stores. A DSAR endpoint or primary database delete is incomplete if embeddings, vector indexes, prompt logs, training snapshots, model artifacts, analytics exports, or backups can still retain or retrieve the personal data.
+
+**What to look for in code and configuration:**
+
+- **Source-to-derived mapping:** Can the system map a data subject, source document, or conversation ID to all derived chunks, embeddings, prompts, completions, training examples, evaluation examples, analytics exports, and backups?
+- **Vector store deletion:** When source content is deleted or access is revoked, are vector rows, chunk text, metadata filters, replicas, and retrieval caches tombstoned or physically removed?
+- **Training data snapshots:** Does deletion or consent withdrawal mark existing fine-tuning datasets, model checkpoints, adapters, and evaluation sets for exclusion, retraining, unlearning, or documented risk acceptance?
+- **Provider retention:** Are third-party LLM, embedding, logging, and analytics provider retention settings documented, including zero-data-retention or no-training configurations where applicable?
+- **Backup and archive handling:** Do backups, object-store versions, warehouse exports, and BI extracts have aligned retention, deletion windows, and legal-hold handling?
+- **Proof of propagation:** Does the DSAR workflow produce evidence that each downstream store was deleted, tombstoned, expired, or placed under a documented legal hold?
+
+**Detection methods using allowed tools:**
+
+```
+# Find deletion and DSAR workflow code
+Grep: "dsar|delete_request|erasure|right_to_delete|forget|consent_withdraw" in **/*.{py,ts,js,yaml,yml,json,md}
+Grep: "delete|tombstone|purge|reindex|invalidate|remove_embedding" in **/*.{py,ts,js,yaml,yml,json}
+
+# Find AI-derived stores that need propagation
+Grep: "embedding|vector|chunk|retrieval_cache|prompt_log|completion_log|fine_tune|checkpoint|dataset_snapshot" in **/*.{py,ts,js,yaml,yml,json,md}
+Grep: "backup|snapshot|archive|warehouse|analytics|export|legal_hold" in **/*.{py,sh,yaml,yml,json,toml,md}
+```
+
+**Deletion propagation evidence checklist:**
+
+| Store Type | Required Evidence | Common Violation |
+|---|---|---|
+| Source documents | Source IDs linked to data subject and retention basis | Document deleted without derived-store mapping |
+| Embeddings/vector indexes | Chunk IDs, vector IDs, metadata filters, replicas, and cache invalidation status | Source deleted but embeddings remain searchable |
+| Prompt/completion logs | Redaction, deletion, or retention exemption with access controls | Full prompts retained in observability tools |
+| Training snapshots | Dataset version, affected records, retraining/unlearning decision, and exclusion proof | Opted-out data remains in fine-tuning snapshot |
+| Model artifacts | Memorization risk assessment or retrain/unlearn decision when training data is removed | Model treated as unrelated to deletion request |
+| Analytics exports | Warehouse/table/export deletion status and retention window | BI exports outlive primary deletion |
+| Backups/archives | Restoration guardrails, deletion-on-restore process, legal-hold scope and expiry | Backup retention silently extends personal data retention |
+| Third-party providers | DPA, retention configuration, no-training setting, deletion confirmation | Provider retention assumed but not evidenced |
+
+**What constitutes a finding:**
+
+| Condition | Severity |
+|---|---|
+| DSAR or deletion workflow cannot map primary data to AI-derived embeddings, logs, or training snapshots | High |
+| Source deletion does not delete or tombstone vector store chunks and embeddings | High |
+| Consent withdrawal does not affect existing fine-tuning datasets or model artifact decisions | High |
+| Third-party LLM or embedding provider retention settings are undocumented | High |
+| Backup restore can resurrect deleted AI data without reapplying deletion ledger | Medium |
+| Analytics exports retain PII beyond primary retention without justification | Medium |
+| Legal holds lack scope, authority, and expiration metadata | Medium |
+
+**False positive to avoid:** Do not mark deletion compliance as passing merely because the primary application record can be deleted. Confirm propagation evidence for every AI-derived store, and document residual risk where physical deletion is delayed, legally blocked, or technically infeasible.
+
+---
+
 ### Step 4 -- Model Memorization Risk Assessment
 
 Evaluate the risk that models deployed in the system have memorized and can reproduce personal data from their training corpus.
@@ -408,10 +462,16 @@ Grep: "consent_check|is_consented|has_consent|filter_consented|exclude_opted_out
 [Description or reference to diagram showing personal data flows through AI components:
 user input -> prompt assembly -> LLM API -> completion -> output -> logging/storage]
 
+## Deletion Propagation Evidence
+
+| Data Subject / Source | Derived Stores | Embeddings Deleted | Logs Redacted/Deleted | Training Snapshot Action | Provider Retention Evidence | Backup / Legal Hold Status | Residual Risk |
+|---|---|---|---|---|---|---|---|
+| [subject/source ID] | [vector/logs/datasets/etc.] | [Yes/No/N/A] | [Yes/No/N/A] | [exclude/retrain/unlearn/accept] | [DPA/config/confirmation] | [status/expiry] | [Low/Medium/High] |
+
 ## Findings
 
 ### Finding [N]: [Title]
-- **Category:** [Training Data | Prompt/Completion PII | Data Retention | Memorization | EU AI Act | Consent]
+- **Category:** [Training Data | Prompt/Completion PII | Data Retention | Deletion Propagation | Memorization | EU AI Act | Consent]
 - **Severity:** [Critical | High | Medium | Low | Informational]
 - **OWASP LLM Category:** LLM02:2025 -- Sensitive Information Disclosure
 - **NIST AI RMF Function:** [GOVERN | MAP | MEASURE | MANAGE] [subcategory]
@@ -430,6 +490,7 @@ user input -> prompt assembly -> LLM API -> completion -> output -> logging/stor
 | Training data privacy | [Yes/Partial/No] | [description] | [severity] |
 | PII in prompts/completions | [Yes/Partial/No] | [description] | [severity] |
 | Data retention | [Yes/Partial/No] | [description] | [severity] |
+| Deletion propagation | [Yes/Partial/No] | [description] | [severity] |
 | Memorization risk | [Yes/Partial/No] | [description] | [severity] |
 | EU AI Act compliance | [Yes/Partial/No/N/A] | [description] | [severity] |
 | Consent management | [Yes/Partial/No] | [description] | [severity] |
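
The source-to-derived mapping and vector store deletion checks in Step 3A could be exercised with something like the Python sketch below. It is illustrative only and not part of this change: `DerivedRecord`, `orphaned_records`, and the store names are hypothetical, and a real system would read these records from its own vector store, log, and dataset metadata rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class DerivedRecord:
    """Hypothetical row describing one AI-derived copy of source data."""
    store: str          # e.g. "vector_chunks", "prompt_logs" (assumed names)
    record_id: str
    source_id: str      # source document or data subject the record derives from
    tombstoned: bool = False


def orphaned_records(
    deleted_source_ids: Set[str], derived: Iterable[DerivedRecord]
) -> List[DerivedRecord]:
    """Return derived records that still reference a deleted source
    and have not been tombstoned or removed."""
    return [
        r for r in derived
        if r.source_id in deleted_source_ids and not r.tombstoned
    ]


# Example data: doc-123 was deleted, but one of its chunks is still live.
records = [
    DerivedRecord("vector_chunks", "chunk-123-a", "doc-123"),
    DerivedRecord("prompt_logs", "log-9", "doc-123", tombstoned=True),
    DerivedRecord("vector_chunks", "chunk-999-a", "doc-999"),
]
leftovers = orphaned_records({"doc-123"}, records)
print([r.record_id for r in leftovers])  # ['chunk-123-a']
```

Any non-empty result corresponds to the "Source deleted but embeddings remain searchable" violation in the checklist.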

diff --git a/skills/ai-security/ai-data-privacy/tests/deletion-propagation-edge-cases.md b/skills/ai-security/ai-data-privacy/tests/deletion-propagation-edge-cases.md
@@ -0,0 +1,109 @@
+# Deletion Propagation Edge Cases
+
+These fixtures validate AI data privacy review behavior for deletion, erasure, consent withdrawal, and retention expiry across AI-derived stores.
+
+## Case 1: DSAR Deletes Primary User Only
+
+```yaml
+dsar:
+  endpoint: DELETE /privacy/users/{id}
+  deletes:
+    - users
+    - profiles
+  not_mapped:
+    - conversation_logs
+    - prompt_logs
+    - vector_chunks
+    - embeddings
+    - analytics_exports
+```
+
+**Expected result:** High-severity finding.
+
+**Reason:** The workflow deletes primary records but cannot prove removal from AI-derived stores that may still contain personal data.
+
+## Case 2: Source Document Deleted, Embeddings Remain Searchable
+
+```yaml
+rag:
+  source_document:
+    id: doc-123
+    deleted: true
+  vector_store:
+    chunks:
+      - id: chunk-123-a
+        source_id: doc-123
+        text_retained: true
+        embedding_retained: true
+    retrieval_cache:
+      invalidated: false
+```
+
+**Expected result:** High-severity finding.
+
+**Reason:** Deleted source content can still be retrieved through chunk text, embeddings, or cache entries.
+
+## Case 3: Consent Withdrawal Does Not Affect Training Snapshots
+
+```yaml
+consent:
+  user_id: user-77
+  ai_training_opt_out: true
+  changed_at: "2026-06-06"
+training_data:
+  snapshots:
+    - id: ft-2026-05-01
+      contains_user_id: user-77
+      excluded_after_withdrawal: false
+model_artifacts:
+  retraining_decision: none
+  unlearning_decision: none
+```
+
+**Expected result:** High-severity finding.
+
+**Reason:** Consent withdrawal is not propagated to existing fine-tuning data or model artifact risk decisions.
+
+## Case 4: Complete Propagation Ledger
+
+```yaml
+deletion_ledger:
+  request_id: dsar-456
+  subject_id: user-77
+  source_records:
+    deleted: true
+  embeddings:
+    vector_ids:
+      - vec-1
+      - vec-2
+    tombstoned: true
+    retrieval_cache_invalidated: true
+    reindexed_at: "2026-06-06T10:00:00Z"
+  prompt_logs:
+    redacted: true
+    retention_exception: none
+  training_snapshots:
+    affected:
+      - ft-2026-05-01
+    action: exclude_from_next_training
+    model_risk_decision: retrain_not_required_low_memorization_risk
+  analytics_exports:
+    purged: true
+  provider_retention:
+    llm_api: zero_data_retention_enabled
+    embedding_api: deletion_confirmed
+  backups:
+    restore_guardrail: reapply_deletion_ledger
+    legal_hold: none
+```
+
+**Expected result:** Pass for deletion propagation evidence if implementation evidence matches the ledger.
+
+**Reason:** The workflow maps primary records to derived stores, deletes or redacts each downstream copy, handles provider retention, and prevents backup restore from resurrecting deleted data.
+
+## Review Assertions
+
+- Do not credit a DSAR endpoint unless derived AI stores are mapped.
+- Confirm vector chunks, embeddings, metadata filters, replicas, and caches are deleted or tombstoned.
+- Confirm consent withdrawal affects existing training snapshots and model artifact decisions.
+- Confirm backup restore procedures reapply the deletion ledger.
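
As a rough illustration of how the review assertions above might be checked mechanically, the Python sketch below tests a deletion ledger shaped like the Case 4 fixture for missing evidence. It is a minimal sketch under assumptions: `EVIDENCE_RULES` and `missing_evidence` are hypothetical names, the field names mirror the Case 4 YAML, and the pass rules are illustrative rather than a definition of compliance. The ledger is written as a plain dict so the example needs no YAML parser.

```python
from __future__ import annotations

from typing import Any, Callable

# Values treated as "no evidence recorded" (an assumption for this sketch).
_UNSET = (None, "none", "unknown")

# Hypothetical rule table: store name -> check over that store's ledger entry.
EVIDENCE_RULES: dict[str, Callable[[dict[str, Any]], bool]] = {
    "source_records": lambda e: e.get("deleted") is True,
    "embeddings": lambda e: e.get("tombstoned") is True
    and e.get("retrieval_cache_invalidated") is True,
    "prompt_logs": lambda e: e.get("redacted") is True
    or e.get("retention_exception") not in _UNSET,
    "training_snapshots": lambda e: e.get("action") not in _UNSET
    and e.get("model_risk_decision") not in _UNSET,
    "analytics_exports": lambda e: e.get("purged") is True,
    "provider_retention": lambda e: bool(e)
    and all(v not in _UNSET for v in e.values()),
    "backups": lambda e: e.get("restore_guardrail") == "reapply_deletion_ledger",
}


def missing_evidence(ledger: dict[str, Any]) -> list[str]:
    """Return the stores whose ledger entry is absent or fails its rule."""
    gaps = []
    for store, rule in EVIDENCE_RULES.items():
        entry = ledger.get(store)
        if not isinstance(entry, dict) or not rule(entry):
            gaps.append(store)
    return gaps


# Ledger mirroring Case 4 (complete propagation).
case_4 = {
    "source_records": {"deleted": True},
    "embeddings": {"tombstoned": True, "retrieval_cache_invalidated": True},
    "prompt_logs": {"redacted": True, "retention_exception": "none"},
    "training_snapshots": {
        "action": "exclude_from_next_training",
        "model_risk_decision": "retrain_not_required_low_memorization_risk",
    },
    "analytics_exports": {"purged": True},
    "provider_retention": {
        "llm_api": "zero_data_retention_enabled",
        "embedding_api": "deletion_confirmed",
    },
    "backups": {"restore_guardrail": "reapply_deletion_ledger", "legal_hold": "none"},
}

# Ledger resembling Case 1: only the primary records were deleted.
case_1_like = {"source_records": {"deleted": True}}

print(missing_evidence(case_4))       # []
print(missing_evidence(case_1_like))  # every derived store is reported
```

An empty result only shows that the ledger claims full propagation. As the Case 4 expectation notes, a reviewer still needs to confirm that the implementation evidence matches the ledger.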