Skip to content

Commit fd7f10b

Browse files
authored
Merge pull request #98 from igerber/feature/improve-pr-workflow
Add pre-merge-check skill and improve PR review workflow
2 parents 14bc71f + 608d83c commit fd7f10b

3 files changed

Lines changed: 329 additions & 6 deletions

File tree

Lines changed: 174 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,174 @@
1+
---
2+
description: Run pre-merge checks before submitting a PR
3+
argument-hint: ""
4+
---
5+
6+
# Pre-Merge Check
7+
8+
Run automated checks and display the pre-merge checklist before submitting a PR.
9+
10+
## Instructions
11+
12+
### 1. Identify Changed Files
13+
14+
Get the list of files that will be included in the PR:
15+
16+
```bash
17+
# Get all changed files (tracked modifications + staged + untracked)
18+
git diff --name-only HEAD
19+
git diff --cached --name-only
20+
git ls-files --others --exclude-standard
21+
```
22+
23+
Categorize files into:
24+
- **Methodology files**: `diff_diff/*.py` (excluding `__init__.py`)
25+
- **Test files**: `tests/*.py`
26+
- **Documentation files**: `*.md`, `*.rst`, `docs/**`
27+
28+
### 2. Run Automated Pattern Checks
29+
30+
#### 2.1 NaN Handling Pattern Check (for methodology files)
31+
32+
If any methodology files changed, run these pattern checks:
33+
34+
```bash
35+
# Check for bad t-stat pattern (should use np.nan, not 0.0 or 0)
36+
grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py || true
37+
38+
# Check for potential division issues without NaN handling
39+
grep -E -n "/[[:space:]]*se\>" diff_diff/*.py | grep -v "np.nan" | grep -v "#" || true
40+
```
41+
42+
**Report findings**: If matches found, warn about potential NaN handling issues.
43+
44+
#### 2.2 Test Existence Check
45+
46+
For each changed methodology file, check if corresponding test file was also changed:
47+
48+
| Methodology File | Expected Test File |
49+
|------------------|-------------------|
50+
| `diff_diff/staggered.py` | `tests/test_staggered.py` |
51+
| `diff_diff/estimators.py` | `tests/test_estimators.py` |
52+
| `diff_diff/twfe.py` | `tests/test_estimators.py` |
53+
| `diff_diff/synthetic_did.py` | `tests/test_estimators.py` |
54+
| `diff_diff/sun_abraham.py` | `tests/test_sun_abraham.py` |
55+
| `diff_diff/triple_diff.py` | `tests/test_triple_diff.py` |
56+
| `diff_diff/trop.py` | `tests/test_trop.py` |
57+
| `diff_diff/bacon.py` | `tests/test_bacon.py` |
58+
| `diff_diff/linalg.py` | `tests/test_linalg.py` |
59+
| `diff_diff/utils.py` | `tests/test_utils.py` |
60+
| `diff_diff/diagnostics.py` | `tests/test_diagnostics.py` |
61+
| `diff_diff/prep.py` | `tests/test_prep.py` |
62+
| `diff_diff/visualization.py` | `tests/test_visualization.py` |
63+
| `diff_diff/honest_did.py` | `tests/test_honest_did.py` |
64+
| `diff_diff/power.py` | `tests/test_power.py` |
65+
| `diff_diff/pretrends.py` | `tests/test_pretrends.py` |
66+
67+
**Report**: List any methodology files without corresponding test changes.
68+
69+
#### 2.3 Docstring Check (heuristic)
70+
71+
For changed `.py` files, run a quick check for functions without docstrings:
72+
```bash
73+
# Find public function definitions (heuristic check)
74+
grep -n "^def [^_]" <changed-py-files> | head -10
75+
grep -n "^ def [^_]" <changed-py-files> | head -10
76+
```
77+
78+
Note: This is a heuristic. Verify docstrings exist for new public functions.
79+
80+
### 3. Display Context-Specific Checklist
81+
82+
Based on what changed, display the appropriate checklist items:
83+
84+
#### Always Show (Core Checklist)
85+
```
86+
## Pre-Merge Checklist
87+
88+
Based on your changes to: <list of changed files>
89+
90+
### Behavioral Completeness
91+
- [ ] Happy path tested
92+
- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
93+
- [ ] Error/warning paths tested with behavioral assertions
94+
```
95+
96+
#### If Methodology Files Changed
97+
```
98+
### Inference Field Consistency
99+
- [ ] If SE can be 0/undefined, ALL inference fields (t-stat, p-value, CI) return NaN
100+
- [ ] Aggregation methods propagate NaN correctly
101+
- [ ] Bootstrap methods handle NaN in base estimates
102+
103+
### Control Group Logic (if adding new modes/code paths)
104+
- [ ] Control group composition verified for new code paths
105+
- [ ] "Not-yet-treated" excludes the treatment cohort itself
106+
- [ ] Parameter interactions tested with all aggregation methods
107+
```
108+
109+
#### If Documentation Files Changed
110+
```
111+
### Documentation Sync
112+
- [ ] Docstrings updated for changed function signatures
113+
- [ ] README updated if user-facing behavior changes
114+
- [ ] REGISTRY.md updated if methodology edge cases change
115+
```
116+
117+
#### If This Appears to Be a Bug Fix
118+
```
119+
### Pattern Consistency (Bug Fix)
120+
- [ ] Grepped for similar patterns across codebase before fixing
121+
- [ ] Fixed ALL occurrences, not just the one that was reported
122+
- [ ] Verified fix doesn't break other code paths
123+
```
124+
125+
### 4. Ask About Running Tests
126+
127+
Use AskUserQuestion to offer test options:
128+
129+
```
130+
Would you like to run tests now?
131+
132+
Options:
133+
1. Yes - run full test suite (pytest)
134+
2. Yes - run only tests for changed files
135+
3. No - skip tests for now
136+
```
137+
138+
If user chooses option 1:
139+
```bash
140+
pytest
141+
```
142+
143+
If user chooses option 2, run targeted tests based on changed files:
144+
```bash
145+
pytest tests/test_staggered.py tests/test_estimators.py # (example)
146+
```
147+
148+
### 5. Report Summary
149+
150+
```
151+
## Pre-Merge Check Complete
152+
153+
### Automated Checks
154+
- Pattern checks: [PASS/WARN - N potential issues found]
155+
- Test coverage: [PASS/WARN - N methodology files without test changes]
156+
157+
### Manual Checklist
158+
Review the checklist items above before running /submit-pr
159+
160+
### Findings to Address
161+
<list any warnings from pattern checks>
162+
163+
### Next Steps
164+
- Address any warnings above
165+
- Complete manual checklist items
166+
- When ready: /submit-pr "Your PR title"
167+
```
168+
169+
## Notes
170+
171+
- This skill is read-only - it only analyzes and reports, doesn't modify files
172+
- Run this BEFORE `/submit-pr` to catch issues early
173+
- Pattern checks are heuristics - review flagged items manually to confirm
174+
- If pattern checks find issues, fix them before submitting

.github/codex/prompts/pr_review.md

Lines changed: 38 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -8,12 +8,44 @@ TOP PRIORITY: Methodology adherence to source material.
88
3) Flag any mismatch, missing assumption check, incorrect variance/SE, or undocumented deviation as P0/P1.
99

1010
SECONDARY PRIORITIES (in order):
11-
2) Code quality
12-
3) Performance
13-
4) Maintainability
14-
5) Minimization of tech debt
15-
6) Security (including accidental secrets)
16-
7) Documentation + tests
11+
2) Edge case coverage (see checklist below)
12+
3) Code quality
13+
4) Performance
14+
5) Maintainability
15+
6) Minimization of tech debt
16+
7) Security (including accidental secrets)
17+
8) Documentation + tests
18+
19+
## Edge Case Review (learned from PR #97 analysis)
20+
21+
When reviewing new features or code paths, specifically check:
22+
23+
1. **Empty Result Sets**:
24+
- Does the code handle when filters produce no matching data?
25+
- Example: `base_period="varying"` with no valid pre-treatment periods
26+
- Flag as P1 if new code paths lack empty-data handling
27+
28+
2. **NaN/Inf Propagation**:
29+
- If SE can be 0 or undefined, are ALL inference fields (t-stat, p-value, CI) set to NaN?
30+
- Search for patterns: `if se > 0 else 0.0` → should be `else np.nan`
31+
- Check ALL occurrences of this pattern in affected files
32+
- Flag as P0 if statistical output could be misleading (e.g., t_stat=0.0 instead of NaN)
33+
34+
3. **Parameter Interactions**:
35+
- Does new parameter interact correctly with all aggregation methods?
36+
- Does new parameter interact correctly with bootstrap/inference?
37+
- Example: `anticipation` parameter must affect group aggregation filtering
38+
- Flag as P1 if new parameter isn't tested with all existing code paths
39+
40+
4. **Control/Comparison Group Logic**:
41+
- For new code paths, is the control group defined correctly?
42+
- Example: "not-yet-treated" should exclude the treatment cohort itself
43+
- Flag as P0 if control group composition could bias estimates
44+
45+
5. **Pattern Consistency**:
46+
- If the PR fixes a pattern bug, verify ALL occurrences were fixed
47+
- Command to check: `grep -n "pattern" diff_diff/*.py`
48+
- Flag as P1 if only partial fixes were made
1749

1850
Rules:
1951
- Review ONLY the changes introduced by this PR (diff) and the minimum surrounding context needed.

CLAUDE.md

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -514,3 +514,120 @@ When adding code that emits warnings or handles errors:
514514
4. **Protect arithmetic operations**:
515515
- [ ] Wrap ALL related operations in `np.errstate()`, not just the final one
516516
- [ ] Include division, matrix multiplication, and any operation that can overflow/underflow
517+
518+
### Reviewing New Features or Code Paths
519+
520+
When reviewing PRs that add new features, modes, or code paths (learned from PR #97 analysis):
521+
522+
1. **Edge Case Coverage**:
523+
- [ ] Empty result sets (no matching data for a filter condition)
524+
- [ ] NaN/Inf propagation through ALL inference fields (SE, t-stat, p-value, CI)
525+
- [ ] Parameter interactions (e.g., new param × existing aggregation methods)
526+
- [ ] Control/comparison group composition for all code paths
527+
528+
2. **Documentation Completeness**:
529+
- [ ] All new parameters have docstrings with type, default, and description
530+
- [ ] Methodology docs match implementation behavior (equations, edge cases)
531+
- [ ] Edge cases documented in `docs/methodology/REGISTRY.md`
532+
533+
3. **Logic Audit for New Code Paths**:
534+
- [ ] When adding new modes (like `base_period="varying"`), trace ALL downstream effects
535+
- [ ] Check aggregation methods handle the new mode correctly
536+
- [ ] Check bootstrap/inference methods handle the new mode correctly
537+
- [ ] Explicitly test control group composition in new code paths
538+
539+
4. **Pattern Consistency**:
540+
- [ ] Search for similar patterns in codebase (e.g., `t_stat = x / se if se > 0 else ...`)
541+
- [ ] Ensure new code follows established patterns or updates ALL instances
542+
- [ ] If fixing a pattern, grep for ALL occurrences first:
543+
```bash
544+
grep -n "if.*se.*> 0.*else" diff_diff/*.py
545+
```
546+
547+
### Fixing Bugs Across Multiple Locations
548+
549+
When a bug fix involves a pattern that appears in multiple places (learned from PR #97 analysis):
550+
551+
1. **Find All Instances First**:
552+
- [ ] Use grep/search to find ALL occurrences of the pattern before fixing
553+
- [ ] Document the locations found (file:line)
554+
- [ ] Example: `t_stat = effect / se if se > 0 else 0.0` appeared in 7 locations
555+
556+
2. **Fix Comprehensively in One Round**:
557+
- [ ] Fix ALL instances in the same PR/commit
558+
- [ ] Create a test that covers all locations
559+
- [ ] Don't fix incrementally across multiple review rounds
560+
561+
3. **Regression Test the Fix**:
562+
- [ ] Verify fix doesn't break other code paths
563+
- [ ] For early-return fixes: ensure downstream code still runs when needed
564+
- [ ] Example: Bootstrap early return must still compute per-effect SEs
565+
566+
4. **Common Patterns to Watch For**:
567+
- `if se > 0 else 0.0` → should be `else np.nan` for undefined statistics
568+
- `if len(data) > 0 else return` → check what downstream code expects
569+
- `mask = (condition)` → verify mask logic for all parameter combinations
570+
571+
### Pre-Merge Review Checklist
572+
573+
Final checklist before approving a PR:
574+
575+
1. **Behavioral Completeness**:
576+
- [ ] Happy path tested
577+
- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
578+
- [ ] Error/warning paths tested with behavioral assertions
579+
580+
2. **Inference Field Consistency**:
581+
- [ ] If one inference field (SE, t-stat, p-value) can be NaN, all related fields handle it
582+
- [ ] Aggregation methods propagate NaN correctly
583+
- [ ] Bootstrap methods handle NaN in base estimates
584+
585+
3. **Documentation Sync**:
586+
- [ ] Docstrings updated for all changed signatures
587+
- [ ] README updated if user-facing behavior changes
588+
- [ ] REGISTRY.md updated if methodology edge cases change
589+
590+
## Task Implementation Workflow
591+
592+
When implementing features or fixes, follow this workflow to ensure quality and catch issues early:
593+
594+
### Phase 1: Planning
595+
- Use `EnterPlanMode` for non-trivial tasks
596+
- Consult `docs/methodology/REGISTRY.md` for methodology-critical code
597+
- Identify all files and code paths that will be affected
598+
- For bug fixes: grep for the pattern first to find ALL occurrences
599+
600+
### Phase 2: Implementation
601+
- Follow the relevant development checklists above
602+
- Write tests alongside implementation (not after)
603+
- For bug fixes: fix ALL occurrences in the same commit
604+
- For new parameters: ensure all aggregation/bootstrap paths handle them
605+
606+
### Phase 3: Pre-Merge Review
607+
**Run `/pre-merge-check` before submitting**. This skill will:
608+
1. Run automated pattern checks on changed files (NaN handling, etc.)
609+
2. Check for missing test coverage
610+
3. Display context-specific checklist items based on what changed
611+
4. Optionally run the test suite
612+
613+
Address any warnings before proceeding.
614+
615+
### Phase 4: Submit
616+
- Use `/submit-pr` to create the PR
617+
- Automated AI review will run additional methodology and edge case checks
618+
- The PR template will prompt for methodology references if applicable
619+
620+
### Quick Reference: Common Patterns to Check
621+
622+
Before submitting methodology changes, verify these patterns:
623+
624+
```bash
625+
# Find potential NaN handling issues (should use np.nan, not 0.0)
626+
grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py
627+
628+
# Find all t_stat calculations to ensure consistency
629+
grep -n "t_stat.*=" diff_diff/*.py
630+
631+
# Find all inference field assignments
632+
grep -n "\(se\|t_stat\|p_value\|ci_lower\|ci_upper\).*=" diff_diff/*.py | head -30
633+
```

0 commit comments

Comments
 (0)