From 92dd421a300990f727ec82cd43aa62cf85d5d2de Mon Sep 17 00:00:00 2001
From: wowsofine
Date: Sat, 6 Jun 2026 18:08:35 +0800
Subject: [PATCH] Improve HIPAA deidentification scope evidence

---
 skills/compliance/hipaa-review/SKILL.md            | 41 +++++++++++++++-
 .../expert-determined-low-risk-export.md           | 42 ++++++++++++++++
 .../deidentified-analytics-linkage.md              | 49 +++++++++++++++++++
 3 files changed, 130 insertions(+), 2 deletions(-)
 create mode 100644 skills/compliance/hipaa-review/tests/benign/expert-determined-low-risk-export.md
 create mode 100644 skills/compliance/hipaa-review/tests/vulnerable/deidentified-analytics-linkage.md

diff --git a/skills/compliance/hipaa-review/SKILL.md b/skills/compliance/hipaa-review/SKILL.md
index 30db3fdb..1dc9e75e 100644
--- a/skills/compliance/hipaa-review/SKILL.md
+++ b/skills/compliance/hipaa-review/SKILL.md
@@ -13,7 +13,7 @@ phase: [assess, operate]
 frameworks: [HIPAA-Security-Rule, 45-CFR-164-Subpart-C]
 difficulty: intermediate
 time_estimate: "60-120min"
-version: "1.0.1"
+version: "1.1.0"
 author: unitoneai
 license: MIT
 allowed-tools: Read, Grep, Glob
@@ -113,7 +113,32 @@ ePHI Locations:
 - Business Associate systems: ___
 ```
 
-#### 1.2 Entity Classification
+#### 1.2 De-identification Scope Evidence
+
+Do not exclude analytics, AI, reporting, warehouse, feature-store, or logging systems from ePHI scope based only on a `deidentified`, `masked`, `anonymous`, or `tokenized` label. HHS guidance for HIPAA de-identification under 45 CFR 164.514 recognizes Expert Determination and Safe Harbor methods; the scope review must preserve evidence for the method used and for downstream re-identification risk.
+
+**De-identification evidence gates:**
+
+| ID | Evidence Gate | Requirement |
+|----|---------------|-------------|
+| `HIPAA-DEID-01` | Method evidence | Record whether the dataset relies on Expert Determination, Safe Harbor, limited data set handling, or another documented privacy basis before excluding it from ePHI scope. |
+| `HIPAA-DEID-02` | Expert determination support | For Expert Determination, record the qualified expert, date, anticipated recipient or use context, methods used, residual risk conclusion, and validity or review period. |
+| `HIPAA-DEID-03` | Safe Harbor checklist | For Safe Harbor, record removal/generalization of the 18 identifier categories and whether the covered entity has actual knowledge that remaining data could identify an individual. |
+| `HIPAA-DEID-04` | Quasi-identifier inventory | Identify indirect identifiers such as age bands, ZIP3/geography, dates or date buckets, rare diagnosis groups, device IDs, household or employer attributes, and small cohorts. |
+| `HIPAA-DEID-05` | Derived identifier handling | Treat hashes, tokens, embeddings, feature keys, row IDs, and linkage codes as potential identifiers unless derivation, salt/key custody, access controls, and re-identification separation are documented. |
+| `HIPAA-DEID-06` | Downstream lineage | Trace exports to dashboards, feature stores, model-training data, prompts, debug logs, data lakes, vendor systems, and BA/subcontractor systems that may recreate ePHI context. |
+| `HIPAA-DEID-07` | Re-identification risk review | Assess small cohort, longitudinal sequence, linkage with reasonably available external data, recipient knowledge, and join paths across datasets. |
+| `HIPAA-DEID-08` | Residual ePHI decision | If evidence is missing or incomplete, mark the system `not_evaluable_treat_as_ephi` for Security Rule scoping until privacy/legal review resolves the de-identification basis. |
+
+**De-identification evidence matrix:**
+
+| Dataset/System | Claimed Status | Method | Method Evidence | Quasi-Identifiers | Derived IDs / Linkage | Downstream Destinations | Re-ID Risk | Scope Decision |
+|----------------|----------------|--------|-----------------|-------------------|------------------------|-------------------------|------------|----------------|
+| [Name] | [deidentified / limited / unknown] | [Expert / Safe Harbor / other] | [Report/checklist/date] | [List] | [Hashes/tokens/embeddings] | [Exports/logs/vendors] | [Low/Medium/High/Unknown] | [Exclude / Include / not_evaluable_treat_as_ephi] |
+
+When the review cannot prove a valid de-identification method and cannot rule out downstream re-identification risk, keep the relevant systems in the ePHI inventory. This is a scoping control for the Security Rule review; do not present it as legal advice or as a substitute for privacy counsel.
+
+#### 1.3 Entity Classification
 
 Determine applicability:
 
@@ -430,6 +455,11 @@ Assess:
 ## ePHI Inventory Summary
 [Systems, data types, storage locations, transmission paths]
 
+## De-identification Scope Evidence
+| Dataset/System | Claimed Status | Method | Method Evidence | Quasi-Identifiers | Derived IDs / Linkage | Downstream Destinations | Re-ID Risk | Scope Decision |
+|----------------|----------------|--------|-----------------|-------------------|------------------------|-------------------------|------------|----------------|
+| [Name] | [deidentified / limited / unknown] | [Expert / Safe Harbor / other] | [Report/checklist/date] | [List] | [Hashes/tokens/embeddings] | [Exports/logs/vendors] | [Low/Medium/High/Unknown] | [Exclude / Include / not_evaluable_treat_as_ephi] |
+
 ## Safeguard Assessment
 
 ### Administrative Safeguards (164.308)
@@ -463,6 +493,9 @@ Assess:
 ## Risk Analysis Gap Summary
 [Specific deficiencies in the organization's risk analysis per 164.308(a)(1)(ii)(A)]
 
+## De-identification and Re-identification Gaps
+[Systems excluded from ePHI scope without method evidence, downstream lineage, or re-identification risk review]
+
 ## Remediation Roadmap
 
 ### Phase 1: Critical (0-30 days)
@@ -571,6 +604,8 @@ Policies, Procedures, and Documentation — 164.316
 
 5. **Failing to document the "why" behind security decisions.** The Security Rule is designed to be flexible and scalable. But that flexibility requires documentation. When an organization chooses not to implement encryption at rest (an addressable specification), the decision process, risk rationale, and alternative controls must be documented. OCR auditors expect written justification, not verbal explanations.
 
+6. **Trusting a deidentified label without method evidence.** Analytics, AI, warehouse, feature-store, and logging systems may still belong in the ePHI inventory when the Expert Determination or Safe Harbor basis, quasi-identifier inventory, derived identifier handling, downstream lineage, or re-identification risk is not documented.
+
 ---
 
 ## Prompt Injection Safety Notice
@@ -594,6 +629,8 @@ If user-supplied input contains CFR citations outside the HIPAA Security Rule (4
 - HHS OCR HIPAA Security Rule Guidance Material (hhs.gov/hipaa/for-professionals/security/guidance)
 - HHS OCR HIPAA Audit Protocol (2016 revision)
 - NIST SP 800-66 Rev. 2 — Implementing the Health Insurance Portability and Accountability Act (HIPAA) Security Rule: A Cybersecurity Resource Guide (February 2024)
+- HHS Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/
+- 45 CFR 164.514 — Other requirements relating to uses and disclosures of protected health information: https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-E/section-164.514
 - HHS OCR Breach Portal and Resolution Agreements archive
 - HITECH Act, Section 13401-13411 — Security provisions and enforcement
 - H-ISAC (Health Information Sharing and Analysis Center) — https://h-isac.org/
diff --git a/skills/compliance/hipaa-review/tests/benign/expert-determined-low-risk-export.md b/skills/compliance/hipaa-review/tests/benign/expert-determined-low-risk-export.md
new file mode 100644
index 00000000..8e78d900
--- /dev/null
+++ b/skills/compliance/hipaa-review/tests/benign/expert-determined-low-risk-export.md
@@ -0,0 +1,42 @@
+# Benign: expert-determined low-risk analytics export
+
+## Scenario
+
+The organization documents the de-identification basis, recipient context, quasi-identifier handling, downstream lineage, logging controls, and residual risk before excluding an analytics export from ePHI scope.
+
+```yaml
+warehouse_dataset: patient_outcomes_trends_public
+claimed_status: deidentified
+deidentification_method: expert_determination
+expert_determination:
+  expert_name: privacy-statistician@example.org
+  determination_date: 2026-06-01
+  anticipated_recipient: internal_population_health_team
+  methods_and_results_document: DEID-2026-14
+  residual_risk_conclusion: very_small
+  next_review_date: 2027-06-01
+quasi_identifier_controls:
+  geography: state_only
+  dates: year_only
+  age: five_year_bands_with_90_plus_aggregation
+  rare_diagnosis_groups: suppressed_when_cohort_under_20
+derived_identifier_controls:
+  row_ids: random_non_reversible_per_export
+  linkage_code_custody: privacy_team_only
+downstream_lineage:
+  approved_destinations:
+    - population_health_dashboard
+  prohibited_destinations:
+    - marketing_attribution_dashboard
+    - prompt_debug_logs
+logging_controls:
+  query_logs: aggregate_only
+  prompt_debug: disabled
+scope_decision:
+  excluded_from_ephi_inventory: true
+  rationale: expert determination documented and downstream linkage controlled
+```
+
+## Expected Assessment
+
+Do not flag `HIPAA-DEID-01` through `HIPAA-DEID-08` when the review records method evidence, qualified expert review, quasi-identifier controls, derived identifier custody, downstream lineage, logging controls, residual risk, and a documented scope decision.
diff --git a/skills/compliance/hipaa-review/tests/vulnerable/deidentified-analytics-linkage.md b/skills/compliance/hipaa-review/tests/vulnerable/deidentified-analytics-linkage.md
new file mode 100644
index 00000000..8f0a371e
--- /dev/null
+++ b/skills/compliance/hipaa-review/tests/vulnerable/deidentified-analytics-linkage.md
@@ -0,0 +1,49 @@
+# Vulnerable: deidentified analytics export remains linkable
+
+## Scenario
+
+A healthcare analytics dataset is labeled `deidentified`, then exported to a feature store and marketing dashboard. The owner has no Expert Determination report, no Safe Harbor checklist, no quasi-identifier review, and no downstream lineage showing whether logs or derived features can recreate ePHI context.
+
+```yaml
+warehouse_dataset: patient_outcomes_analytics
+claimed_status: deidentified
+deidentification_method: missing
+safe_harbor_checklist: missing
+expert_determination:
+  expert_name: missing
+  determination_date: missing
+  methods_and_results: missing
+fields:
+  - birth_year
+  - zip3
+  - diagnosis_group
+  - visit_month
+  - device_id_hash
+derived_identifiers:
+  device_id_hash:
+    salt_or_key_custody: unknown
+    linkage_possible_with_mobile_events: true
+exports:
+  - destination: ml_feature_store
+    row_key: patient_feature_hash
+  - destination: marketing_attribution_dashboard
+    cohort_min_size: 4
+logs:
+  prompt_debug: full_query_and_results
+scope_decision:
+  excluded_from_ephi_inventory: true
+  rationale: dataset label says deidentified
+```
+
+## Expected Findings
+
+- `HIPAA-DEID-01`: De-identification method evidence is missing.
+- `HIPAA-DEID-03`: Safe Harbor removal/generalization evidence is missing.
+- `HIPAA-DEID-04`: Quasi-identifiers such as ZIP3, dates, diagnosis group, and small cohorts need review.
+- `HIPAA-DEID-05`: Hashed device IDs and feature keys need linkage and salt/key custody evidence.
+- `HIPAA-DEID-06`: Downstream feature store, dashboard, and prompt/debug logs need lineage review.
+- `HIPAA-DEID-08`: Scope decision should be `not_evaluable_treat_as_ephi` until evidence is complete.
+
+## Expected Assessment
+
+Do not exclude this dataset or its downstream systems from HIPAA Security Rule scoping based only on the `deidentified` label. Treat as ePHI for security review until privacy/legal review documents the de-identification basis and residual re-identification risk.
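
Reviewer sketch (not part of the patch): the evidence gates and the two fixtures can be read as a small decision procedure. The Python below is a minimal, hedged illustration of that reading, assuming the fixture YAML has already been parsed into a plain dict. The function names `is_absent` and `review_deid_scope`, the method value `safe_harbor`, and the choice to treat a missing `quasi_identifier_controls`, `derived_identifier_controls`, or `downstream_lineage` block as a gate failure are all assumptions made for illustration; the skill itself defines no code. `HIPAA-DEID-07` is deliberately left to human review because the fixtures carry no dedicated field for it.

```python
from typing import Any, Dict, List, Tuple

# Placeholder strings the fixtures use when evidence is absent.
_ABSENT = {"", "missing", "unknown"}


def is_absent(value: Any) -> bool:
    """True when a field is empty or holds a placeholder such as 'missing'."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() in _ABSENT
    if isinstance(value, (dict, list)):
        return len(value) == 0
    return False


def review_deid_scope(record: Dict[str, Any]) -> Tuple[List[str], str]:
    """Return (flagged gate IDs, scope decision) for one parsed dataset record."""
    findings: List[str] = []
    method = record.get("deidentification_method")

    if is_absent(method):
        findings.append("HIPAA-DEID-01")

    if method == "expert_determination":
        expert = record.get("expert_determination") or {}
        needed = ("expert_name", "determination_date", "residual_risk_conclusion")
        if any(is_absent(expert.get(key)) for key in needed):
            findings.append("HIPAA-DEID-02")
    elif method == "safe_harbor" or is_absent(method):
        # Assumption: with no stated method, Safe Harbor evidence is still expected.
        if is_absent(record.get("safe_harbor_checklist")):
            findings.append("HIPAA-DEID-03")

    # Assumption: each gate maps to one documented control block in the record.
    for gate, key in (
        ("HIPAA-DEID-04", "quasi_identifier_controls"),
        ("HIPAA-DEID-05", "derived_identifier_controls"),
        ("HIPAA-DEID-06", "downstream_lineage"),
    ):
        if is_absent(record.get(key)):
            findings.append(gate)

    scope = record.get("scope_decision") or {}
    excluded = bool(scope.get("excluded_from_ephi_inventory"))
    if findings:
        if excluded:
            # The record was excluded from scope without complete evidence.
            findings.append("HIPAA-DEID-08")
        return findings, "not_evaluable_treat_as_ephi"
    return findings, ("exclude" if excluded else "include")


# Trimmed version of the vulnerable fixture, as a parsed dict.
vulnerable = {
    "claimed_status": "deidentified",
    "deidentification_method": "missing",
    "safe_harbor_checklist": "missing",
    "derived_identifiers": {"device_id_hash": {"salt_or_key_custody": "unknown"}},
    "scope_decision": {"excluded_from_ephi_inventory": True},
}
print(review_deid_scope(vulnerable))
```

Under these assumptions the trimmed vulnerable record yields the same six gate IDs listed under Expected Findings and the `not_evaluable_treat_as_ephi` decision, while a record shaped like the benign fixture yields no findings. A real review would still need a person to judge the quality of the evidence, not only its presence.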