Merge pull request #98 from igerber/feature/improve-pr-workflow

igerber · web-flow · commit fd7f10b1a2b2 · 2026-01-22T10:04:12.000-05:00
Add pre-merge-check skill and improve PR review workflow
diff --git a/.claude/commands/pre-merge-check.md b/.claude/commands/pre-merge-check.md
@@ -0,0 +1,174 @@
+---
+description: Run pre-merge checks before submitting a PR
+argument-hint: ""
+---
+
+# Pre-Merge Check
+
+Run automated checks and display the pre-merge checklist before submitting a PR.
+
+## Instructions
+
+### 1. Identify Changed Files
+
+Get the list of files that will be included in the PR:
+
+```bash
+# Get all changed files (tracked modifications + staged + untracked)
+git diff --name-only HEAD
+git diff --cached --name-only
+git ls-files --others --exclude-standard
+```
+
+Categorize files into:
+- **Methodology files**: `diff_diff/*.py` (excluding `__init__.py`)
+- **Test files**: `tests/*.py`
+- **Documentation files**: `*.md`, `*.rst`, `docs/**`
+
+### 2. Run Automated Pattern Checks
+
+#### 2.1 NaN Handling Pattern Check (for methodology files)
+
+If any methodology files changed, run these pattern checks:
+
+```bash
+# Check for bad t-stat pattern (should use np.nan, not 0.0 or 0)
+grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py || true
+
+# Check for potential division issues without NaN handling
+grep -E -n "/[[:space:]]*se\>" diff_diff/*.py | grep -v "np.nan" | grep -v "#" || true
+```
+
+**Report findings**: If matches found, warn about potential NaN handling issues.
+
+#### 2.2 Test Existence Check
+
+For each changed methodology file, check if corresponding test file was also changed:
+
+| Methodology File | Expected Test File |
+|------------------|-------------------|
+| `diff_diff/staggered.py` | `tests/test_staggered.py` |
+| `diff_diff/estimators.py` | `tests/test_estimators.py` |
+| `diff_diff/twfe.py` | `tests/test_estimators.py` |
+| `diff_diff/synthetic_did.py` | `tests/test_estimators.py` |
+| `diff_diff/sun_abraham.py` | `tests/test_sun_abraham.py` |
+| `diff_diff/triple_diff.py` | `tests/test_triple_diff.py` |
+| `diff_diff/trop.py` | `tests/test_trop.py` |
+| `diff_diff/bacon.py` | `tests/test_bacon.py` |
+| `diff_diff/linalg.py` | `tests/test_linalg.py` |
+| `diff_diff/utils.py` | `tests/test_utils.py` |
+| `diff_diff/diagnostics.py` | `tests/test_diagnostics.py` |
+| `diff_diff/prep.py` | `tests/test_prep.py` |
+| `diff_diff/visualization.py` | `tests/test_visualization.py` |
+| `diff_diff/honest_did.py` | `tests/test_honest_did.py` |
+| `diff_diff/power.py` | `tests/test_power.py` |
+| `diff_diff/pretrends.py` | `tests/test_pretrends.py` |
+
+**Report**: List any methodology files without corresponding test changes.
+
+#### 2.3 Docstring Check (heuristic)
+
+For changed `.py` files, run a quick check for functions without docstrings:
+```bash
+# Find public function definitions (heuristic check)
+grep -n "^def [^_]" <changed-py-files> | head -10
+grep -n "^    def [^_]" <changed-py-files> | head -10
+```
+
+Note: This is a heuristic. Verify docstrings exist for new public functions.
+
+### 3. Display Context-Specific Checklist
+
+Based on what changed, display the appropriate checklist items:
+
+#### Always Show (Core Checklist)
+```
+## Pre-Merge Checklist
+
+Based on your changes to: <list of changed files>
+
+### Behavioral Completeness
+- [ ] Happy path tested
+- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
+- [ ] Error/warning paths tested with behavioral assertions
+```
+
+#### If Methodology Files Changed
+```
+### Inference Field Consistency
+- [ ] If SE can be 0/undefined, ALL inference fields (t-stat, p-value, CI) return NaN
+- [ ] Aggregation methods propagate NaN correctly
+- [ ] Bootstrap methods handle NaN in base estimates
+
+### Control Group Logic (if adding new modes/code paths)
+- [ ] Control group composition verified for new code paths
+- [ ] "Not-yet-treated" excludes the treatment cohort itself
+- [ ] Parameter interactions tested with all aggregation methods
+```
+
+#### If Documentation Files Changed
+```
+### Documentation Sync
+- [ ] Docstrings updated for changed function signatures
+- [ ] README updated if user-facing behavior changes
+- [ ] REGISTRY.md updated if methodology edge cases change
+```
+
+#### If This Appears to Be a Bug Fix
+```
+### Pattern Consistency (Bug Fix)
+- [ ] Grepped for similar patterns across codebase before fixing
+- [ ] Fixed ALL occurrences, not just the one that was reported
+- [ ] Verified fix doesn't break other code paths
+```
+
+### 4. Ask About Running Tests
+
+Use AskUserQuestion to offer test options:
+
+```
+Would you like to run tests now?
+
+Options:
+1. Yes - run full test suite (pytest)
+2. Yes - run only tests for changed files
+3. No - skip tests for now
+```
+
+If user chooses option 1:
+```bash
+pytest
+```
+
+If user chooses option 2, run targeted tests based on changed files:
+```bash
+pytest tests/test_staggered.py tests/test_estimators.py  # (example)
+```
+
+### 5. Report Summary
+
+```
+## Pre-Merge Check Complete
+
+### Automated Checks
+- Pattern checks: [PASS/WARN - N potential issues found]
+- Test coverage: [PASS/WARN - N methodology files without test changes]
+
+### Manual Checklist
+Review the checklist items above before running /submit-pr
+
+### Findings to Address
+<list any warnings from pattern checks>
+
+### Next Steps
+- Address any warnings above
+- Complete manual checklist items
+- When ready: /submit-pr "Your PR title"
+```
+
+## Notes
+
+- This skill is read-only - it only analyzes and reports, doesn't modify files
+- Run this BEFORE `/submit-pr` to catch issues early
+- Pattern checks are heuristics - review flagged items manually to confirm
+- If pattern checks find issues, fix them before submitting
diff --git a/.github/codex/prompts/pr_review.md b/.github/codex/prompts/pr_review.md
@@ -8,12 +8,44 @@ TOP PRIORITY: Methodology adherence to source material.
   3) Flag any mismatch, missing assumption check, incorrect variance/SE, or undocumented deviation as P0/P1.
 
 SECONDARY PRIORITIES (in order):
-2) Code quality
-3) Performance
-4) Maintainability
-5) Minimization of tech debt
-6) Security (including accidental secrets)
-7) Documentation + tests
+2) Edge case coverage (see checklist below)
+3) Code quality
+4) Performance
+5) Maintainability
+6) Minimization of tech debt
+7) Security (including accidental secrets)
+8) Documentation + tests
+
+## Edge Case Review (learned from PR #97 analysis)
+
+When reviewing new features or code paths, specifically check:
+
+1. **Empty Result Sets**:
+   - Does the code handle when filters produce no matching data?
+   - Example: `base_period="varying"` with no valid pre-treatment periods
+   - Flag as P1 if new code paths lack empty-data handling
+
+2. **NaN/Inf Propagation**:
+   - If SE can be 0 or undefined, are ALL inference fields (t-stat, p-value, CI) set to NaN?
+   - Search for patterns: `if se > 0 else 0.0` → should be `else np.nan`
+   - Check ALL occurrences of this pattern in affected files
+   - Flag as P0 if statistical output could be misleading (e.g., t_stat=0.0 instead of NaN)
+
+3. **Parameter Interactions**:
+   - Does new parameter interact correctly with all aggregation methods?
+   - Does new parameter interact correctly with bootstrap/inference?
+   - Example: `anticipation` parameter must affect group aggregation filtering
+   - Flag as P1 if new parameter isn't tested with all existing code paths
+
+4. **Control/Comparison Group Logic**:
+   - For new code paths, is the control group defined correctly?
+   - Example: "not-yet-treated" should exclude the treatment cohort itself
+   - Flag as P0 if control group composition could bias estimates
+
+5. **Pattern Consistency**:
+   - If the PR fixes a pattern bug, verify ALL occurrences were fixed
+   - Command to check: `grep -n "pattern" diff_diff/*.py`
+   - Flag as P1 if only partial fixes were made
 
 Rules:
 - Review ONLY the changes introduced by this PR (diff) and the minimum surrounding context needed.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -514,3 +514,120 @@ When adding code that emits warnings or handles errors:
 4. **Protect arithmetic operations**:
    - [ ] Wrap ALL related operations in `np.errstate()`, not just the final one
    - [ ] Include division, matrix multiplication, and any operation that can overflow/underflow
+
+### Reviewing New Features or Code Paths
+
+When reviewing PRs that add new features, modes, or code paths (learned from PR #97 analysis):
+
+1. **Edge Case Coverage**:
+   - [ ] Empty result sets (no matching data for a filter condition)
+   - [ ] NaN/Inf propagation through ALL inference fields (SE, t-stat, p-value, CI)
+   - [ ] Parameter interactions (e.g., new param × existing aggregation methods)
+   - [ ] Control/comparison group composition for all code paths
+
+2. **Documentation Completeness**:
+   - [ ] All new parameters have docstrings with type, default, and description
+   - [ ] Methodology docs match implementation behavior (equations, edge cases)
+   - [ ] Edge cases documented in `docs/methodology/REGISTRY.md`
+
+3. **Logic Audit for New Code Paths**:
+   - [ ] When adding new modes (like `base_period="varying"`), trace ALL downstream effects
+   - [ ] Check aggregation methods handle the new mode correctly
+   - [ ] Check bootstrap/inference methods handle the new mode correctly
+   - [ ] Explicitly test control group composition in new code paths
+
+4. **Pattern Consistency**:
+   - [ ] Search for similar patterns in codebase (e.g., `t_stat = x / se if se > 0 else ...`)
+   - [ ] Ensure new code follows established patterns or updates ALL instances
+   - [ ] If fixing a pattern, grep for ALL occurrences first:
+     ```bash
+     grep -n "if.*se.*> 0.*else" diff_diff/*.py
+     ```
+
+### Fixing Bugs Across Multiple Locations
+
+When a bug fix involves a pattern that appears in multiple places (learned from PR #97 analysis):
+
+1. **Find All Instances First**:
+   - [ ] Use grep/search to find ALL occurrences of the pattern before fixing
+   - [ ] Document the locations found (file:line)
+   - [ ] Example: `t_stat = effect / se if se > 0 else 0.0` appeared in 7 locations
+
+2. **Fix Comprehensively in One Round**:
+   - [ ] Fix ALL instances in the same PR/commit
+   - [ ] Create a test that covers all locations
+   - [ ] Don't fix incrementally across multiple review rounds
+
+3. **Regression Test the Fix**:
+   - [ ] Verify fix doesn't break other code paths
+   - [ ] For early-return fixes: ensure downstream code still runs when needed
+   - [ ] Example: Bootstrap early return must still compute per-effect SEs
+
+4. **Common Patterns to Watch For**:
+   - `if se > 0 else 0.0` → should be `else np.nan` for undefined statistics
+   - `if len(data) > 0 else return` → check what downstream code expects
+   - `mask = (condition)` → verify mask logic for all parameter combinations
+
+### Pre-Merge Review Checklist
+
+Final checklist before approving a PR:
+
+1. **Behavioral Completeness**:
+   - [ ] Happy path tested
+   - [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
+   - [ ] Error/warning paths tested with behavioral assertions
+
+2. **Inference Field Consistency**:
+   - [ ] If one inference field (SE, t-stat, p-value) can be NaN, all related fields handle it
+   - [ ] Aggregation methods propagate NaN correctly
+   - [ ] Bootstrap methods handle NaN in base estimates
+
+3. **Documentation Sync**:
+   - [ ] Docstrings updated for all changed signatures
+   - [ ] README updated if user-facing behavior changes
+   - [ ] REGISTRY.md updated if methodology edge cases change
+
+## Task Implementation Workflow
+
+When implementing features or fixes, follow this workflow to ensure quality and catch issues early:
+
+### Phase 1: Planning
+- Use `EnterPlanMode` for non-trivial tasks
+- Consult `docs/methodology/REGISTRY.md` for methodology-critical code
+- Identify all files and code paths that will be affected
+- For bug fixes: grep for the pattern first to find ALL occurrences
+
+### Phase 2: Implementation
+- Follow the relevant development checklists above
+- Write tests alongside implementation (not after)
+- For bug fixes: fix ALL occurrences in the same commit
+- For new parameters: ensure all aggregation/bootstrap paths handle them
+
+### Phase 3: Pre-Merge Review
+**Run `/pre-merge-check` before submitting**. This skill will:
+1. Run automated pattern checks on changed files (NaN handling, etc.)
+2. Check for missing test coverage
+3. Display context-specific checklist items based on what changed
+4. Optionally run the test suite
+
+Address any warnings before proceeding.
+
+### Phase 4: Submit
+- Use `/submit-pr` to create the PR
+- Automated AI review will run additional methodology and edge case checks
+- The PR template will prompt for methodology references if applicable
+
+### Quick Reference: Common Patterns to Check
+
+Before submitting methodology changes, verify these patterns:
+
+```bash
+# Find potential NaN handling issues (should use np.nan, not 0.0)
+grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py
+
+# Find all t_stat calculations to ensure consistency
+grep -n "t_stat.*=" diff_diff/*.py
+
+# Find all inference field assignments
+grep -n "\(se\|t_stat\|p_value\|ci_lower\|ci_upper\).*=" diff_diff/*.py | head -30
+```