Commit 6751fcb

igerber and claude committed
Add METHODOLOGY_REVIEW.md for tracking estimator review progress
Create tracking document to monitor methodology reviews for all 12 estimators against academic references and R implementations. Includes:
- Review status summary table for all estimators
- Detailed notes template for each estimator organized by category
- Review process guidelines with checklist
- Priority ordering and deviation documentation format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent f406c08 commit 6751fcb

1 file changed

Lines changed: 334 additions & 0 deletions

METHODOLOGY_REVIEW.md

@@ -0,0 +1,334 @@
# Methodology Review

This document tracks the progress of reviewing each estimator's implementation against the Methodology Registry and academic references. It ensures that implementations are correct, consistent, and well-documented.

For the methodology registry with academic foundations and key equations, see [docs/methodology/REGISTRY.md](docs/methodology/REGISTRY.md).

---

## Overview

Each estimator in diff-diff should be periodically reviewed to ensure:
1. **Correctness**: Implementation matches the academic paper's equations
2. **Reference alignment**: Behavior matches reference implementations (R packages, Stata commands)
3. **Edge case handling**: Documented edge cases are handled correctly
4. **Standard errors**: SE formulas match the documented approach
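
As a minimal illustration of the kind of correctness check item 1 describes, the sketch below verifies on synthetic data that the 2x2 difference-in-differences of cell means equals the OLS coefficient on the treated-by-post interaction. It uses only NumPy and does not call diff-diff; the sample size, coefficients, and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic 2x2 design; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 400
treated = rng.integers(0, 2, size=n)  # 1 = treated group
post = rng.integers(0, 2, size=n)     # 1 = post-treatment period
true_att = 2.0
y = (1.0 + 0.5 * treated + 0.3 * post
     + true_att * treated * post + rng.normal(0.0, 1.0, size=n))


def cell_mean(group: int, period: int) -> float:
    """Mean outcome for one (group, period) cell."""
    return float(y[(treated == group) & (post == period)].mean())


# DiD from the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# DiD as the interaction coefficient in a saturated OLS regression.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_ols = float(beta[3])

# The two agree up to floating-point error because the model is saturated.
assert np.isclose(did_means, did_ols)
```

A real review would replace the hand-rolled regression with the estimator under review and compare its point estimate against the cell-mean calculation.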

---

## Review Status Summary

| Estimator | Module | R Reference | Status | Last Review |
|-----------|--------|-------------|--------|-------------|
| DifferenceInDifferences | `estimators.py` | `fixest::feols()` | Not Started | - |
| MultiPeriodDiD | `estimators.py` | `fixest::feols()` | Not Started | - |
| TwoWayFixedEffects | `twfe.py` | `fixest::feols()` | Not Started | - |
| CallawaySantAnna | `staggered.py` | `did::att_gt()` | Not Started | - |
| SunAbraham | `sun_abraham.py` | `fixest::sunab()` | Not Started | - |
| SyntheticDiD | `synthetic_did.py` | `synthdid::synthdid_estimate()` | Not Started | - |
| TripleDifference | `triple_diff.py` | (forthcoming) | Not Started | - |
| TROP | `trop.py` | (forthcoming) | Not Started | - |
| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | Not Started | - |
| HonestDiD | `honest_did.py` | `HonestDiD` package | Not Started | - |
| PreTrendsPower | `pretrends.py` | `pretrends` package | Not Started | - |
| PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | Not Started | - |

**Status legend:**
- **Not Started**: No formal review conducted
- **In Progress**: Review underway
- **Complete**: Review finished, implementation verified

---

## Detailed Review Notes

### Core DiD Estimators

#### DifferenceInDifferences

| Field | Value |
|-------|-------|
| Module | `estimators.py` |
| Primary Reference | Wooldridge (2010), Angrist & Pischke (2009) |
| R Reference | `fixest::feols()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### MultiPeriodDiD

| Field | Value |
|-------|-------|
| Module | `estimators.py` |
| Primary Reference | Freyaldenhoven et al. (2021) |
| R Reference | `fixest::feols()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### TwoWayFixedEffects

| Field | Value |
|-------|-------|
| Module | `twfe.py` |
| Primary Reference | Wooldridge (2010), Ch. 10 |
| R Reference | `fixest::feols()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

### Modern Staggered Estimators

#### CallawaySantAnna

| Field | Value |
|-------|-------|
| Module | `staggered.py` |
| Primary Reference | Callaway & Sant'Anna (2021) |
| R Reference | `did::att_gt()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### SunAbraham

| Field | Value |
|-------|-------|
| Module | `sun_abraham.py` |
| Primary Reference | Sun & Abraham (2021) |
| R Reference | `fixest::sunab()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

### Advanced Estimators

#### SyntheticDiD

| Field | Value |
|-------|-------|
| Module | `synthetic_did.py` |
| Primary Reference | Arkhangelsky et al. (2021) |
| R Reference | `synthdid::synthdid_estimate()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### TripleDifference

| Field | Value |
|-------|-------|
| Module | `triple_diff.py` |
| Primary Reference | Ortiz-Villavicencio & Sant'Anna (2025) |
| R Reference | (forthcoming) |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### TROP

| Field | Value |
|-------|-------|
| Module | `trop.py` |
| Primary Reference | Athey, Imbens, Qu & Viviano (2025) |
| R Reference | (forthcoming) |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

### Diagnostics & Sensitivity

#### BaconDecomposition

| Field | Value |
|-------|-------|
| Module | `bacon.py` |
| Primary Reference | Goodman-Bacon (2021) |
| R Reference | `bacondecomp::bacon()` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### HonestDiD

| Field | Value |
|-------|-------|
| Module | `honest_did.py` |
| Primary Reference | Rambachan & Roth (2023) |
| R Reference | `HonestDiD` package |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### PreTrendsPower

| Field | Value |
|-------|-------|
| Module | `pretrends.py` |
| Primary Reference | Roth (2022) |
| R Reference | `pretrends` package |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

#### PowerAnalysis

| Field | Value |
|-------|-------|
| Module | `power.py` |
| Primary Reference | Bloom (1995), Burlig et al. (2020) |
| R Reference | `pwr` / `DeclareDesign` |
| Status | Not Started |
| Last Review | - |

**Corrections Made:**
- (None yet)

**Outstanding Concerns:**
- (None yet)

---

## Review Process Guidelines

### Review Checklist

For each estimator, complete the following steps:

- [ ] **Read primary academic source** - Review the key paper(s) cited in REGISTRY.md
- [ ] **Compare key equations** - Verify implementation matches equations in REGISTRY.md
- [ ] **Run benchmark against R reference** - Execute `benchmarks/run_benchmarks.py --estimator <name>` if available
- [ ] **Verify edge case handling** - Check behavior matches REGISTRY.md documentation
- [ ] **Check standard error formula** - Confirm SE computation matches reference
- [ ] **Document any deviations** - Add notes explaining intentional differences with rationale
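
The benchmark and standard-error steps above come down to comparing numbers from the Python implementation against numbers exported from the R reference. A minimal sketch of such a comparison follows. The helper `compare_to_reference`, the tolerance defaults, and the example values are all hypothetical: they are not taken from `benchmarks/run_benchmarks.py` or from any real R output.

```python
import math


def compare_to_reference(py, ref, rtol=1e-6, atol=1e-8):
    """Compare Python results to reference values (hypothetical helper).

    Returns {name: (python_value, reference_value, within_tolerance)}
    for every quantity present in the reference.
    """
    report = {}
    for name, ref_value in ref.items():
        # A quantity missing on the Python side is NaN, which never matches.
        py_value = py.get(name, math.nan)
        ok = math.isclose(py_value, ref_value, rel_tol=rtol, abs_tol=atol)
        report[name] = (py_value, ref_value, ok)
    return report


# Placeholder numbers for illustration only, not real estimator output.
python_result = {"att": 1.2345678, "se": 0.4321}
r_reference = {"att": 1.2345679, "se": 0.4400}

report = compare_to_reference(python_result, r_reference)
mismatches = [name for name, (_, _, ok) in report.items() if not ok]
print(mismatches)  # quantities that need investigation or a documented deviation
```

Any name left in `mismatches` is a candidate for either "Corrections Made" or a documented deviation, depending on which side is wrong.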

### When to Update This Document

1. **After completing a review**: Update status to "Complete" and add date
2. **When making corrections**: Document what was fixed in the "Corrections Made" section
3. **When identifying issues**: Add to "Outstanding Concerns" for future investigation
4. **When deviating from reference**: Document the deviation and rationale

### Deviation Documentation

When our implementation intentionally differs from the reference implementation, document:

1. **What differs**: Specific behavior or formula that differs
2. **Why**: Rationale (e.g., "defensive enhancement", "bug in R package", "follows updated paper")
3. **Impact**: Whether results differ in practice
4. **Cross-reference**: Update REGISTRY.md edge cases section

Example:
```
**Deviation (2025-01-15)**: CallawaySantAnna returns NaN for t_stat when SE is non-finite,
whereas R's `did::att_gt` would error. This is a defensive enhancement that provides
more graceful handling of edge cases while still signaling invalid inference to users.
```
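
The defensive behavior described in the example deviation can be sketched as follows. The function name `safe_t_stat` is hypothetical and is not claimed to be how CallawaySantAnna is implemented. Treating a zero or negative SE as invalid is an added assumption beyond the "non-finite" case named in the example.

```python
import math


def safe_t_stat(estimate: float, se: float) -> float:
    """Return estimate / se, or NaN when the SE cannot support inference.

    Hypothetical helper: NaN signals invalid inference instead of raising.
    """
    # Non-finite SE (NaN or inf) is the case named in the example deviation;
    # se <= 0 is an extra guard assumed here to avoid division by zero.
    if not math.isfinite(se) or se <= 0:
        return math.nan
    return estimate / se


print(safe_t_stat(1.0, 0.5))       # ordinary case
print(safe_t_stat(1.0, math.nan))  # NaN SE propagates as NaN
print(safe_t_stat(1.0, math.inf))  # infinite SE also yields NaN
```

Returning NaN keeps downstream result tables intact while still making it obvious that the statistic is unusable.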
304+
305+
### Priority Order
306+
307+
Suggested order for reviews based on usage and complexity:
308+
309+
1. **High priority** (most used, complex methodology):
310+
- CallawaySantAnna
311+
- SyntheticDiD
312+
- HonestDiD
313+
314+
2. **Medium priority** (commonly used, simpler methodology):
315+
- DifferenceInDifferences
316+
- TwoWayFixedEffects
317+
- MultiPeriodDiD
318+
- SunAbraham
319+
- BaconDecomposition
320+
321+
3. **Lower priority** (newer or less commonly used):
322+
- TripleDifference
323+
- TROP
324+
- PreTrendsPower
325+
- PowerAnalysis
326+
327+
---

## Related Documents

- [REGISTRY.md](docs/methodology/REGISTRY.md) - Academic foundations and key equations
- [ROADMAP.md](ROADMAP.md) - Feature roadmap
- [TODO.md](TODO.md) - Technical debt tracking
- [CLAUDE.md](CLAUDE.md) - Development guidelines
