Commit 92df329

Refactor the rules group classification
1 parent 04859bf commit 92df329

28 files changed

Lines changed: 992 additions & 65 deletions

src/fastssv/rules/README.md

Lines changed: 277 additions & 0 deletions
@@ -0,0 +1,277 @@
# FastSSV Rules Architecture

This directory contains all OMOP CDM validation rules, organized by the type of issue they tackle.

## Directory Structure

```
rules/
├── concept_standardization/     # Standard, valid, and domain-appropriate concepts
│   ├── standard_concept_enforcement.py
│   ├── invalid_reason_enforcement.py
│   ├── hierarchy_expansion.py
│   └── domain_segregation.py

├── temporal/                    # Temporal logic and observation period validation
│   ├── observation_period_anchoring.py
│   └── future_information_leakage.py

├── joins/                       # Table relationships and join path validation
│   ├── join_path_validation.py
│   └── maps_to_direction.py

├── data_quality/                # Schema compliance and unmapped data handling
│   ├── unmapped_concept_handling.py
│   └── schema_validation.py

├── domain_specific/             # Table-specific validation rules
│   └── measurement/
│       └── measurement_unit_validation.py

└── anti_patterns/               # Common mistakes and anti-patterns
    ├── no_string_identification.py
    ├── concept_code_requires_vocabulary_id.py
    ├── concept_lookup_context.py
    └── concept_name_lookup.py
```

## Rule Categories

### 1. Concept Standardization (4 rules)
**Purpose**: Ensures concepts are standard, valid, hierarchically complete, and domain-appropriate

- **standard_concept_enforcement**: Enforces `standard_concept = 'S'` or 'Maps to' relationship
- **invalid_reason_enforcement**: Filters out deprecated concepts via `invalid_reason IS NULL`
- **hierarchy_expansion**: Requires concept_ancestor for drug/condition hierarchy
- **domain_segregation**: Ensures clinical tables join to concepts with correct domain_id

**When to use**: Any query that filters or joins on concept_id fields

---

### 2. Temporal (2 rules)
**Purpose**: Validates temporal logic and prevents temporal bias in cohort studies

- **observation_period_anchoring**: Ensures temporal constraints are anchored to observation_period
- **future_information_leakage**: Detects temporal bias from cross-table date comparisons

**When to use**: Queries with date filters, cohort definitions, or temporal windows

---

### 3. Joins (2 rules)
**Purpose**: Ensures proper table relationships and join paths

- **join_path_validation**: Verifies concept/concept_relationship tables properly join to clinical tables
- **maps_to_direction**: Checks 'Maps to' relationship direction (concept_id_1 → concept_id_2)

**When to use**: Queries joining vocabulary tables to clinical tables
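
To make the 'Maps to' direction concrete: in `concept_relationship`, a 'Maps to' row points from the source concept (`concept_id_1`) to the standard concept (`concept_id_2`). The sketch below mimics that lookup over a few in-memory rows. The concept IDs are invented for illustration, and the helper names `to_standard` and `to_standard_reversed` are hypothetical, not part of FastSSV.

```python
from typing import List, Tuple

# (concept_id_1, concept_id_2, relationship_id); IDs are invented for illustration.
CONCEPT_RELATIONSHIP: List[Tuple[int, int, str]] = [
    (1001, 2001, "Maps to"),      # source code -> standard concept
    (2001, 1001, "Mapped from"),  # reverse relationship
    (2001, 2001, "Maps to"),      # standard concepts map to themselves
]


def to_standard(source_concept_id: int) -> List[int]:
    """Follow 'Maps to' in the correct direction: concept_id_1 -> concept_id_2."""
    return [
        c2
        for c1, c2, rel in CONCEPT_RELATIONSHIP
        if rel == "Maps to" and c1 == source_concept_id
    ]


def to_standard_reversed(source_concept_id: int) -> List[int]:
    """The reversed lookup: matching the source concept on concept_id_2."""
    return [
        c1
        for c1, c2, rel in CONCEPT_RELATIONSHIP
        if rel == "Maps to" and c2 == source_concept_id
    ]
```

With these rows, `to_standard(1001)` finds the standard concept, while the reversed lookup finds nothing for the same source concept. A reversed join of this kind is presumably what `maps_to_direction` is meant to catch.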

---

### 4. Data Quality (2 rules)
**Purpose**: Schema compliance and handling of unmapped/missing data

- **unmapped_concept_handling**: Warns when filtering by concept_id without handling concept_id = 0
- **schema_validation**: Validates column references against OMOP CDM schema

**When to use**: All queries (foundational checks)

---

### 5. Domain Specific (1 rule)
**Purpose**: Table-specific validation rules

#### Measurement
- **measurement_unit_validation**: Ensures numeric measurement filters include unit_concept_id

**When to use**: Queries filtering measurement.value_as_number

**Future domains**: drug/, condition/, visit/, etc.

---

### 6. Anti-Patterns (4 rules)
**Purpose**: Common mistakes and anti-patterns to avoid

- **no_string_identification**: Prevents string matching on *_source_value columns
- **concept_code_requires_vocabulary_id**: Ensures concept_code filters include vocabulary_id
- **concept_lookup_context**: Allows concept table string filters only in concept_id lookup contexts
- **concept_name_lookup**: Warns against filtering by concept_name (unstable, non-unique)

**When to use**: Educational - catches common beginner mistakes

---

## How Rules Work

### Registration
All rules use the `@register` decorator to auto-register themselves:

```python
from typing import List

from fastssv.core.base import Rule, RuleViolation, Severity
from fastssv.core.registry import register


@register
class MyRule(Rule):
    rule_id = "category.my_rule"
    name = "My Rule"
    severity = Severity.ERROR

    def validate(self, sql: str, dialect: str = "postgres") -> List[RuleViolation]:
        # Implementation
        pass
```

### Accessing Rules
```python
from fastssv.rules import get_all_rules, get_rules_by_category

# Get all rules
all_rules = get_all_rules()

# Get rules by category (legacy)
semantic_rules = get_rules_by_category("semantic")
vocabulary_rules = get_rules_by_category("vocabulary")
```

### Running Validation
```python
from fastssv.rules import get_all_rules

sql = "SELECT * FROM condition_occurrence WHERE condition_source_value = 'E11.9'"

for rule_cls in get_all_rules():
    rule = rule_cls()
    violations = rule.validate(sql)
    for v in violations:
        print(f"[{v.severity}] {v.message}")
```
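
The loop above prints violations as it goes. When a caller needs a single pass/fail answer (for example in CI), the results can be collected first. The sketch below is self-contained and uses hypothetical stand-ins: `Severity` and `RuleViolation` only approximate the classes in `fastssv.core.base`, `ToySourceValueRule` is a toy substring check rather than a real FastSSV rule, and `collect_violations` and `has_errors` are illustrative helper names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Type


# Hypothetical stand-ins for fastssv.core.base; the real classes may differ.
class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass
class RuleViolation:
    rule_id: str
    severity: Severity
    message: str


class ToySourceValueRule:
    """Toy rule based on a substring check, for illustration only."""

    rule_id = "anti_patterns.toy_source_value"

    def validate(self, sql: str, dialect: str = "postgres") -> List[RuleViolation]:
        if "_source_value =" in sql.lower():
            return [
                RuleViolation(
                    self.rule_id,
                    Severity.ERROR,
                    "String match on a *_source_value column",
                )
            ]
        return []


def collect_violations(sql: str, rule_classes: Iterable[Type]) -> List[RuleViolation]:
    """Run every rule and return all violations, errors listed first."""
    found: List[RuleViolation] = []
    for rule_cls in rule_classes:
        found.extend(rule_cls().validate(sql))
    # False sorts before True, so errors come first; the sort is stable.
    return sorted(found, key=lambda v: v.severity is not Severity.ERROR)


def has_errors(violations: Iterable[RuleViolation]) -> bool:
    """True if at least one violation has ERROR severity."""
    return any(v.severity is Severity.ERROR for v in violations)
```

Passing the sample query from this section to `collect_violations` with `[ToySourceValueRule]` yields one violation, and `has_errors` on that result returns `True`.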

---

## Adding New Rules

### Step 1: Choose the Right Category

- **Concept issues?** → `concept_standardization/`
- **Temporal issues?** → `temporal/`
- **Join issues?** → `joins/`
- **Data quality issues?** → `data_quality/`
- **Table-specific?** → `domain_specific/{table}/`
- **Common mistake?** → `anti_patterns/`

### Step 2: Create the Rule File

```python
"""My New Rule.

Brief description of what this rule validates and why it matters.
"""

from typing import List
from sqlglot import exp
from fastssv.core.base import Rule, RuleViolation, Severity
from fastssv.core.helpers import parse_sql
from fastssv.core.registry import register


@register
class MyNewRule(Rule):
    """One-line description."""

    rule_id = "category.my_new_rule"
    name = "My New Rule"
    description = "Detailed description of what this rule checks"
    severity = Severity.ERROR  # or Severity.WARNING
    suggested_fix = "How to fix violations"

    def validate(self, sql: str, dialect: str = "postgres") -> List[RuleViolation]:
        """Validate SQL and return list of violations."""
        violations = []

        trees, error = parse_sql(sql, dialect)
        if error:
            return []

        for tree in trees:
            # Your validation logic here
            pass

        return violations


__all__ = ["MyNewRule"]
```

### Step 3: Update Category `__init__.py`

Add your rule to the category's `__init__.py`:

```python
from .my_new_rule import MyNewRule

__all__ = [
    # ... existing rules
    "MyNewRule",
]
```

### Step 4: Test Your Rule

```python
from fastssv.rules import get_all_rules

# Your rule should auto-register
rules = get_all_rules()
print(f"Total rules: {len(rules)}")

# Test with sample SQL
test_sql = "..."
for rule_cls in rules:
    if rule_cls.rule_id == "category.my_new_rule":
        rule = rule_cls()
        violations = rule.validate(test_sql)
        print(violations)
```
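
Printing violations is a useful smoke test, but a rule is easier to maintain with one query that must be flagged and one that must pass. The sketch below shows that pattern with `unittest`. `ToyConceptCodeRule` is a hypothetical regex-based stand-in that returns plain strings, so the example runs without FastSSV installed; a real test would import the registered rule and inspect its `RuleViolation` objects instead.

```python
import re
import unittest
from typing import List


class ToyConceptCodeRule:
    """Hypothetical toy rule: concept_code filters should also name vocabulary_id."""

    rule_id = "anti_patterns.toy_concept_code"

    def validate(self, sql: str, dialect: str = "postgres") -> List[str]:
        lowered = sql.lower()
        uses_code = re.search(r"\bconcept_code\b", lowered) is not None
        uses_vocab = re.search(r"\bvocabulary_id\b", lowered) is not None
        if uses_code and not uses_vocab:
            return ["concept_code filter without vocabulary_id"]
        return []


class ToyConceptCodeRuleTest(unittest.TestCase):
    def test_flags_missing_vocabulary_id(self):
        sql = "SELECT concept_id FROM concept WHERE concept_code = 'E11.9'"
        self.assertEqual(len(ToyConceptCodeRule().validate(sql)), 1)

    def test_accepts_query_with_vocabulary_id(self):
        sql = (
            "SELECT concept_id FROM concept "
            "WHERE concept_code = 'E11.9' AND vocabulary_id = 'ICD10CM'"
        )
        self.assertEqual(ToyConceptCodeRule().validate(sql), [])
```

Saved in a test module, this can be run with `python -m unittest`. Pairing a failing and a passing query guards against a rule that never fires as well as one that fires on everything.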

---

## Migration Notes

### Before (Old Structure)
```
rules/
├── semantic/           # Mixed semantic rules
│   ├── standard_concept.py
│   ├── unmapped_concept.py
│   └── ...
└── vocabulary/         # Mixed vocabulary rules
    ├── no_string_id.py
    └── ...
```

### After (New Structure)
Rules are organized by **issue type**, not just "semantic" vs "vocabulary". This makes it easier to:
- Find rules related to a specific problem
- Add new rules to logical categories
- Understand what each category validates

### Legacy Compatibility
The old `semantic` and `vocabulary` categories still work via `get_rules_by_category()`, but new code should use the new structure.

---

## Related Documentation

- **Implementation Status**: See `/rules/IMPLEMENTATION_STATUS.md` for a checklist of all rules from `omop_rules.json`
- **Rule Reference**: See `/rules/omop_rules.json` for the full list of planned rules
- **Core API**: See `/src/fastssv/core/` for base classes and helpers

---

## Statistics

- **Total Rules**: 15 (as of 2025-03-13)
- **Categories**: 6
- **Coverage**: ~4% of omop_rules.json (15 of 350+ rules)
- **Focus**: Critical semantic violations that lead to incorrect analytical results
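
The totals can be cross-checked from the per-category counts given earlier in this README. The sketch below takes 350 as the size of `omop_rules.json`; that number is an assumption, since the README only gives it as a lower bound ("350+"), so the computed percentage is an upper bound on coverage.

```python
# Rule counts per category, as listed in the category sections of this README.
RULES_PER_CATEGORY = {
    "concept_standardization": 4,
    "temporal": 2,
    "joins": 2,
    "data_quality": 2,
    "domain_specific": 1,
    "anti_patterns": 4,
}

# Assumed lower bound, taken from "350+ rules" in the Statistics list.
PLANNED_RULES_LOWER_BOUND = 350

total = sum(RULES_PER_CATEGORY.values())
coverage_pct = 100 * total / PLANNED_RULES_LOWER_BOUND

print(f"{total} rules, at most {coverage_pct:.1f}% coverage")
# prints "15 rules, at most 4.3% coverage"
```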

src/fastssv/rules/__init__.py

Lines changed: 22 additions & 3 deletions
@@ -1,10 +1,25 @@
 """FastSSV validation rules module.
 
 This module auto-imports all rule submodules to trigger rule registration.
+
+Rules are organized by the type of issue they tackle:
+- concept_standardization: Rules for standard, valid, and domain-appropriate concepts
+- temporal: Rules for temporal logic and observation period validation
+- joins: Rules for proper table relationships and join paths
+- data_quality: Rules for schema compliance and unmapped data handling
+- domain_specific: Table-specific validation rules (measurement, drug, etc.)
+- anti_patterns: Common mistakes and anti-patterns to avoid
 """
 
 # Import rule modules to trigger registration via @register decorator
-from . import semantic, vocabulary
+from . import (
+    anti_patterns,
+    concept_standardization,
+    data_quality,
+    domain_specific,
+    joins,
+    temporal,
+)
 
 # For backward compatibility, also provide the legacy functions
 from fastssv.core.base import RuleViolation, Severity
@@ -65,8 +80,12 @@ def validate_omop_vocabulary_rules(sql: str, dialect: str = "postgres") -> list[
 
 __all__ = [
     # Rule modules
-    "semantic",
-    "vocabulary",
+    "concept_standardization",
+    "temporal",
+    "joins",
+    "data_quality",
+    "domain_specific",
+    "anti_patterns",
     # Legacy functions
     "validate_omop_semantic_rules",
     "validate_omop_vocabulary_rules",
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
"""Anti-Pattern Rules.

Rules catching common mistakes and anti-patterns to avoid.
"""

from .no_string_identification import NoStringIdentificationRule
from .concept_code_requires_vocabulary_id import ConceptCodeRequiresVocabularyIdRule
from .concept_lookup_context import ConceptLookupContextRule
from .concept_name_lookup import ConceptNameLookupRule

__all__ = [
    "NoStringIdentificationRule",
    "ConceptCodeRequiresVocabularyIdRule",
    "ConceptLookupContextRule",
    "ConceptNameLookupRule",
]

src/fastssv/rules/vocabulary/concept_code_vocab_id.py renamed to src/fastssv/rules/anti_patterns/concept_code_requires_vocabulary_id.py

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
"""Concept Standardization Rules.

Rules ensuring proper use of standard concepts, valid concepts,
hierarchically complete concept sets, and domain-appropriate concepts.
"""

from .standard_concept_enforcement import StandardConceptEnforcementRule
from .invalid_reason_enforcement import InvalidReasonEnforcementRule
from .hierarchy_expansion import HierarchyExpansionRule
from .domain_segregation import DomainSegregationRule

__all__ = [
    "StandardConceptEnforcementRule",
    "InvalidReasonEnforcementRule",
    "HierarchyExpansionRule",
    "DomainSegregationRule",
]

src/fastssv/rules/semantic/domain_segregation.py renamed to src/fastssv/rules/concept_standardization/domain_segregation.py

File renamed without changes.

src/fastssv/rules/semantic/hierarchy_expansion.py renamed to src/fastssv/rules/concept_standardization/hierarchy_expansion.py

File renamed without changes.
