Commit fe9fd5a

Merge pull request #14 from igerber/claude/research-library-features-bru1W
2 parents 9e7223f + d124a02 commit fe9fd5a

8 files changed

Lines changed: 2517 additions & 3 deletions

CLAUDE.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,10 +41,20 @@ mypy diff_diff
4141
- `MultiPeriodDiD` - Event-study style DiD with period-specific treatment effects
4242
- `SyntheticDiD` - Synthetic control combined with DiD (Arkhangelsky et al. 2021)
4343

44+
- **`diff_diff/staggered.py`** - Staggered adoption DiD estimators:
45+
- `CallawaySantAnna` - Callaway & Sant'Anna (2021) estimator for heterogeneous treatment timing
46+
- `CallawaySantAnnaResults` - Results with group-time ATT(g,t) and aggregations
47+
- `GroupTimeEffect` - Container for individual group-time effects
48+
4449
- **`diff_diff/results.py`** - Dataclass containers for estimation results:
4550
- `DiDResults`, `MultiPeriodDiDResults`, `SyntheticDiDResults`, `PeriodEffect`
4651
- Each provides `summary()`, `to_dict()`, `to_dataframe()` methods
4752

53+
- **`diff_diff/visualization.py`** - Plotting functions:
54+
- `plot_event_study` - Publication-ready event study coefficient plots
55+
- `plot_group_effects` - Treatment effects by cohort visualization
56+
- Works with MultiPeriodDiD, CallawaySantAnna, or DataFrames
57+
4858
- **`diff_diff/utils.py`** - Statistical utilities:
4959
- Robust/cluster standard errors (`compute_robust_se`)
5060
- Parallel trends tests (`check_parallel_trends`, `check_parallel_trends_robust`, `equivalence_test_trends`)

README.md

Lines changed: 145 additions & 0 deletions
@@ -69,7 +69,10 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
6969
- **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
7070
- **Panel data support**: Two-way fixed effects estimator for panel designs
7171
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
72+
- **Staggered adoption**: Callaway-Sant'Anna (2021) estimator for heterogeneous treatment timing
7273
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
74+
- **Event study plots**: Publication-ready visualization of treatment effects
75+
- **Parallel trends testing**: Multiple methods including equivalence tests
7376
- **Data prep utilities**: Helper functions for common data preparation tasks
7477

7578
## Data Preparation
@@ -560,6 +563,148 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
560563
================================================================================
561564
```
562565

566+
### Staggered Difference-in-Differences (Callaway-Sant'Anna)
567+
568+
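When units adopt treatment at different times, traditional TWFE estimators can be biased if treatment effects differ across cohorts or change over time, because already-treated units end up serving as controls. The Callaway-Sant'Anna estimator avoids this by estimating a separate effect ATT(g,t) for each treatment cohort and period, using only never-treated or not-yet-treated units as controls.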
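To make the ATT(g,t) idea concrete, here is a minimal sketch on a toy panel. It is not the library's implementation: the data, the helper names `att_gt` and `event_study`, and the unweighted averaging by relative time are all assumptions made for illustration, and the library may weight cohorts differently.

```python
from collections import defaultdict
from statistics import mean

# Toy balanced panel (illustrative data, not from the library).
# Each row: (unit, period, first_treat, outcome); first_treat == 0 means never treated.
UNIT_LEVEL = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 9.0}
FIRST_TREAT = {"A": 3, "B": 4, "C": 0, "D": 0}
COHORT_EFFECT = {3: 2.0, 4: 1.0}  # the true effect differs by cohort

rows = []
for unit, level in UNIT_LEVEL.items():
    g = FIRST_TREAT[unit]
    for period in range(1, 6):
        effect = COHORT_EFFECT[g] if g and period >= g else 0.0
        rows.append((unit, period, g, level + 0.5 * period + effect))


def att_gt(rows, g, t):
    """2x2 DiD for cohort g at period t (hypothetical helper).

    Compares the change from the baseline period g - 1 to period t
    between cohort g and the never-treated units.
    """
    base = g - 1
    outcome = {(u, p): y for u, p, _, y in rows}
    cohort = {u: ft for u, _, ft, _ in rows}

    def mean_change(units):
        return mean(outcome[(u, t)] - outcome[(u, base)] for u in units)

    treated = [u for u, ft in cohort.items() if ft == g]
    controls = [u for u, ft in cohort.items() if ft == 0]
    return mean_change(treated) - mean_change(controls)


def event_study(rows, cohorts, periods):
    """Average post-treatment ATT(g,t) by relative time e = t - g (unweighted)."""
    by_rel_time = defaultdict(list)
    for g in cohorts:
        for t in periods:
            if t >= g:
                by_rel_time[t - g].append(att_gt(rows, g, t))
    return {e: mean(vals) for e, vals in sorted(by_rel_time.items())}


dynamic = event_study(rows, cohorts=[3, 4], periods=range(1, 6))
print(att_gt(rows, 3, 3), att_gt(rows, 4, 5), dynamic)
```

Each cohort is only ever compared with units that are untreated in both periods, so the cohort-specific effects are recovered separately instead of being mixed together.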
569+
570+
```python
571+
from diff_diff import CallawaySantAnna
572+
573+
# Panel data with staggered treatment
574+
# 'first_treat' = period when unit was first treated (0 if never treated)
575+
cs = CallawaySantAnna()
576+
results = cs.fit(
577+
panel_data,
578+
outcome='sales',
579+
unit='firm_id',
580+
time='year',
581+
first_treat='first_treat', # 0 for never-treated, else first treatment year
582+
aggregate='event_study' # Compute event study effects
583+
)
584+
585+
# View results
586+
results.print_summary()
587+
588+
# Access group-time effects ATT(g,t)
589+
for (group, time), effect in results.group_time_effects.items():
590+
print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")
591+
592+
# Event study effects (averaged by relative time)
593+
for rel_time, effect in results.event_study_effects.items():
594+
print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
595+
596+
# Convert to DataFrame
597+
df = results.to_dataframe(level='event_study')
598+
```
599+
600+
Output:
601+
```
602+
=====================================================================================
603+
Callaway-Sant'Anna Staggered Difference-in-Differences Results
604+
=====================================================================================
605+
606+
Total observations: 600
607+
Treated units: 35
608+
Control units: 15
609+
Treatment cohorts: 3
610+
Time periods: 8
611+
Control group: never_treated
612+
613+
-------------------------------------------------------------------------------------
614+
Overall Average Treatment Effect on the Treated
615+
-------------------------------------------------------------------------------------
616+
Parameter Estimate Std. Err. t-stat P>|t| Sig.
617+
-------------------------------------------------------------------------------------
618+
ATT 2.5000 0.3521 7.101 0.0000 ***
619+
-------------------------------------------------------------------------------------
620+
621+
95% Confidence Interval: [1.8099, 3.1901]
622+
623+
-------------------------------------------------------------------------------------
624+
Event Study (Dynamic) Effects
625+
-------------------------------------------------------------------------------------
626+
Rel. Period Estimate Std. Err. t-stat P>|t| Sig.
627+
-------------------------------------------------------------------------------------
628+
0 2.1000 0.4521 4.645 0.0000 ***
629+
1 2.5000 0.4123 6.064 0.0000 ***
630+
2 2.8000 0.5234 5.349 0.0000 ***
631+
-------------------------------------------------------------------------------------
632+
633+
Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
634+
=====================================================================================
635+
```
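The test statistic and confidence interval in the summary follow from the estimate and its standard error. The sketch below assumes a normal approximation, which is consistent with the interval shown; whether the library uses a normal or a t reference distribution is not stated here, and `normal_inference` is a hypothetical helper, not part of the package.

```python
from statistics import NormalDist


def normal_inference(estimate, se, alpha=0.05):
    """Return (t_stat, p_value, (ci_low, ci_high)) under a normal approximation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    t_stat = estimate / se
    p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
    return t_stat, p_value, (estimate - z_crit * se, estimate + z_crit * se)


# Overall ATT row from the summary above.
t_stat, p_value, ci = normal_inference(2.5, 0.3521)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
```

The interval matches the reported one, and the t-statistic agrees up to rounding in the last displayed digit, since the summary prints a rounded standard error.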
636+
637+
**When to use Callaway-Sant'Anna vs TWFE:**
638+
639+
| Scenario | Use TWFE | Use Callaway-Sant'Anna |
640+
|----------|----------|------------------------|
641+
| All units treated at same time | ✓ | ✓ |
642+
| Staggered adoption, homogeneous effects | ✓ | ✓ |
643+
| Staggered adoption, heterogeneous effects | ✗ | ✓ |
644+
| Need event study with staggered timing | ✗ | ✓ |
645+
| Fewer than ~20 treated units | ✓ | Depends on design |
646+
647+
**Parameters:**
648+
649+
```python
650+
CallawaySantAnna(
651+
control_group='never_treated', # or 'not_yet_treated'
652+
anticipation=0, # Periods before treatment with effects
653+
estimation_method='dr', # 'dr', 'ipw', or 'reg'
654+
alpha=0.05, # Significance level
655+
cluster=None, # Column for cluster SEs
656+
n_bootstrap=0, # Must be 0 (bootstrap not yet implemented)
657+
seed=None # Random seed
658+
)
659+
```
660+
661+
**Current limitations:**
662+
- Bootstrap inference (`n_bootstrap > 0`) is not yet implemented
663+
- Covariate adjustment for conditional parallel trends is not yet implemented
664+
665+
### Event Study Visualization
666+
667+
Create publication-ready event study plots:
668+
669+
```python
670+
import pandas as pd  # needed for the DataFrame example below
from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna
671+
672+
# From MultiPeriodDiD
673+
did = MultiPeriodDiD()
674+
results = did.fit(data, outcome='y', treatment='treated',
675+
time='period', post_periods=[3, 4, 5])
676+
plot_event_study(results, title="Treatment Effects Over Time")
677+
678+
# From CallawaySantAnna (with event study aggregation)
679+
cs = CallawaySantAnna()
680+
results = cs.fit(data, outcome='y', unit='unit', time='period',
681+
first_treat='first_treat', aggregate='event_study')
682+
plot_event_study(results, title="Staggered DiD Event Study")
683+
684+
# From a DataFrame
685+
df = pd.DataFrame({
686+
'period': [-2, -1, 0, 1, 2],
687+
'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
688+
'se': [0.3, 0.25, 0.0, 0.4, 0.45]
689+
})
690+
plot_event_study(df, reference_period=0)
691+
692+
# With customization
693+
ax = plot_event_study(
694+
results,
695+
title="Dynamic Treatment Effects",
696+
xlabel="Years Relative to Treatment",
697+
ylabel="Effect on Sales ($1000s)",
698+
color="#2563eb",
699+
marker="o",
700+
shade_pre=True, # Shade pre-treatment region
701+
show_zero_line=True, # Horizontal line at y=0
702+
show_reference_line=True, # Vertical line at reference period
703+
figsize=(10, 6),
704+
show=False # Don't call plt.show(), return axes
705+
)
706+
```
707+
563708
### Synthetic Difference-in-Differences
564709

565710
Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.

TODO.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
1+
# diff-diff Library Roadmap
2+
3+
This document tracks planned features and improvements for the diff-diff library.
4+
5+
## Priority 1: Critical Improvements
6+
7+
### Wild Cluster Bootstrap
8+
**Status**: Not Started
9+
**Effort**: Medium
10+
**Impact**: High
11+
12+
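Cluster-robust standard errors tend to be biased downward when there are few clusters (fewer than about 50), which leads to over-rejection. The wild cluster bootstrap gives more reliable inference in that setting, even with as few as 5-10 clusters.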
13+
14+
**Implementation Notes**:
15+
- Add `wild_bootstrap_se()` function in `utils.py`
16+
- Support Rademacher and Webb weights
17+
- Integrate with existing estimators via parameter
18+
- Reference: Cameron, Gelbach, and Miller (2008)
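As a starting point for this item, the sketch below shows the two weight schemes and a bootstrap p-value for the simplest possible case: testing whether the mean of one summary value per cluster is zero, with the null imposed. The function names, the cluster-level input, and the intercept-only setup are assumptions for illustration; the planned `wild_bootstrap_se()` would need to work with regression residuals from the estimators.

```python
import math
import random
from statistics import mean, stdev

# Webb six-point weights: mean 0 and variance 1, each point drawn with probability 1/6.
WEBB_POINTS = (
    -math.sqrt(1.5), -1.0, -math.sqrt(0.5),
    math.sqrt(0.5), 1.0, math.sqrt(1.5),
)


def draw_cluster_weights(n_clusters, kind="rademacher", rng=None):
    """Draw one bootstrap weight per cluster."""
    rng = rng or random.Random()
    if kind == "rademacher":
        return [rng.choice((-1.0, 1.0)) for _ in range(n_clusters)]
    if kind == "webb":
        return [rng.choice(WEBB_POINTS) for _ in range(n_clusters)]
    raise ValueError(f"unknown weight type: {kind!r}")


def t_stat(values):
    """t-statistic for H0: mean == 0."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def wild_bootstrap_pvalue(cluster_scores, kind="rademacher", n_boot=999, seed=0):
    """Two-sided p-value from re-weighting each cluster's score (null imposed)."""
    rng = random.Random(seed)
    observed = t_stat(cluster_scores)
    hits = 0
    for _ in range(n_boot):
        weights = draw_cluster_weights(len(cluster_scores), kind, rng)
        boot = [w * s for w, s in zip(weights, cluster_scores)]
        if abs(t_stat(boot)) >= abs(observed) - 1e-12:
            hits += 1
    return hits / n_boot


scores = [2.1, 1.8, 2.5, 1.9, 2.2, 2.4]  # illustrative cluster-level values
p_value = wild_bootstrap_pvalue(scores)
print(f"wild bootstrap p-value: {p_value:.3f}")
```

With only six clusters, Rademacher weights allow just 64 distinct sign patterns, which is why Webb weights are often preferred when the number of clusters is very small.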
19+
20+
### Placebo Tests Module
21+
**Status**: Not Started
22+
**Effort**: Medium
23+
**Impact**: Medium
24+
25+
Implement standard diagnostic tools for DiD:
26+
- Fake treatment timing tests (assign treatment before it actually occurred)
27+
- Fake treatment group tests (run DiD on never-treated units)
28+
- Permutation-based inference
29+
30+
**Implementation Notes**:
31+
- Create `diff_diff/diagnostics.py` module
32+
- Add `run_placebo_test()` function
33+
- Support multiple placebo specifications
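Permutation-based inference could look roughly like the sketch below, which reshuffles treatment labels across units and recomputes a simple DiD estimate each time. It assumes one pre-to-post outcome change per unit; `did_estimate` and `permutation_pvalue` are hypothetical names, not the planned `run_placebo_test()` interface.

```python
import random
from statistics import mean


def did_estimate(changes, treated_flags):
    """Mean pre-to-post change of treated units minus that of control units."""
    treated = [c for c, d in zip(changes, treated_flags) if d]
    control = [c for c, d in zip(changes, treated_flags) if not d]
    return mean(treated) - mean(control)


def permutation_pvalue(changes, treated_flags, n_perm=2000, seed=0):
    """Two-sided p-value from randomly reassigning treatment labels to units."""
    rng = random.Random(seed)
    observed = did_estimate(changes, treated_flags)
    flags = list(treated_flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # placebo assignment with the same number of treated units
        if abs(did_estimate(changes, flags)) >= abs(observed) - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Illustrative unit-level changes: five treated units followed by five controls.
changes = [2.9, 3.1, 2.7, 3.3, 3.0, 1.0, 0.8, 1.2, 0.9, 1.1]
treated_flags = [True] * 5 + [False] * 5
observed, p_value = permutation_pvalue(changes, treated_flags)
print(f"DiD estimate: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

If the observed estimate sits far in the tail of the placebo distribution, the effect is unlikely to be an artifact of which units happened to be labelled as treated.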
34+
35+
---
36+
37+
## Priority 2: Advanced Methods
38+
39+
### Honest DiD / Sensitivity Analysis (Rambachan-Roth)
40+
**Status**: Not Started
41+
**Effort**: High
42+
**Impact**: High
43+
44+
Pre-trends testing has low power and can exacerbate bias. Sensitivity analysis asks: "How robust are results to violations of parallel trends?"
45+
46+
**Features**:
47+
- Compute bounds under restrictions on trend deviations
48+
- Confidence intervals valid under partial identification
49+
- Breakdown analysis visualization
50+
51+
**References**:
52+
- Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies.
53+
- R package: `HonestDiD`
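The flavour of the relative-magnitudes restriction can be shown with a deliberately simplified sketch. It treats the estimated pre-period coefficients as the true trend deviations and ignores sampling uncertainty entirely, so it is not the inference procedure from the paper or the `HonestDiD` package; the function names and inputs are assumptions.

```python
def max_pre_jump(pre_coefs):
    """Largest period-to-period change in the pre-treatment coefficients.

    pre_coefs are ordered in time and end just before the reference period,
    whose coefficient is normalised to zero.
    """
    path = list(pre_coefs) + [0.0]
    return max(abs(later - earlier) for earlier, later in zip(path, path[1:]))


def relative_magnitude_bounds(pre_coefs, first_post_coef, m_bar):
    """Bounds on the first post-period effect if the post-treatment trend
    deviation is at most m_bar times the largest pre-treatment jump."""
    half_width = m_bar * max_pre_jump(pre_coefs)
    return first_post_coef - half_width, first_post_coef + half_width


def breakdown_m(pre_coefs, first_post_coef):
    """Smallest m_bar at which the bounds first include zero."""
    return abs(first_post_coef) / max_pre_jump(pre_coefs)


pre = [0.10, 0.05]   # illustrative pre-period event-study coefficients
post = 2.5           # illustrative first post-period coefficient
low, high = relative_magnitude_bounds(pre, post, m_bar=1.0)
print(f"bounds at m_bar=1: [{low:.2f}, {high:.2f}], breakdown m_bar: {breakdown_m(pre, post):.1f}")
```

A large breakdown value means the post-treatment violation of parallel trends would have to be many times larger than anything seen before treatment to overturn the sign of the estimate.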
54+
55+
### Borusyak-Jaravel-Spiess Imputation Estimator
56+
**Status**: Not Started
57+
**Effort**: High
58+
**Impact**: Medium
59+
60+
Alternative to Callaway-Sant'Anna that's more efficient when parallel trends hold across all periods.
61+
62+
**Implementation Notes**:
63+
- Impute Y(0) for treated observations using control outcomes
64+
- Support both regression and matrix completion approaches
65+
- Reference: Borusyak, Jaravel, and Spiess (2024)
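The regression variant of the imputation idea can be sketched as follows: fit unit and period effects on untreated observations only, predict Y(0) for treated observations, and average the differences. This is a toy version under stated assumptions (no covariates, no standard errors, every unit and period has at least one untreated observation); `impute_att` is a hypothetical name and the alternating-means fit is just one simple way to solve the two-way fixed effects least-squares problem.

```python
from statistics import mean

# Illustrative panel: rows are (unit, period, treated_now, outcome).
UNIT_LEVEL = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 9.0}
FIRST_TREAT = {"A": 3, "B": 3, "C": 0, "D": 0}  # 0 means never treated
TRUE_EFFECT = 2.0

panel = []
for unit, level in UNIT_LEVEL.items():
    g = FIRST_TREAT[unit]
    for period in range(1, 5):
        treated_now = bool(g) and period >= g
        panel.append((unit, period, treated_now, level + 0.5 * period + TRUE_EFFECT * treated_now))


def impute_att(rows, n_iter=200):
    """Average of Y - imputed Y(0) over treated observations."""
    untreated = [(u, t, y) for u, t, d, y in rows if not d]
    alpha = {u: 0.0 for u, *_ in rows}     # unit effects
    gamma = {t: 0.0 for _, t, *_ in rows}  # period effects

    # Alternate between updating unit and period effects on untreated data.
    for _ in range(n_iter):
        for u in alpha:
            alpha[u] = mean(y - gamma[t] for uu, t, y in untreated if uu == u)
        for t in gamma:
            gamma[t] = mean(y - alpha[u] for u, tt, y in untreated if tt == t)

    # Impute the untreated potential outcome for each treated observation.
    effects = [y - (alpha[u] + gamma[t]) for u, t, d, y in rows if d]
    return mean(effects)


att = impute_att(panel)
print(f"imputation ATT: {att:.3f}")
```

Because the fixed effects are estimated without any treated observations, treated outcomes never contaminate the counterfactual, which is the key difference from a pooled TWFE regression.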
66+
67+
### Sun-Abraham Estimator
68+
**Status**: Not Started
69+
**Effort**: Medium
70+
**Impact**: Medium
71+
72+
Interaction-weighted estimator for staggered DiD. Focuses on "cohort-specific average treatment effects on the treated" (CATT).
73+
74+
**Reference**: Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.
75+
76+
---
77+
78+
## Priority 3: Machine Learning Extensions
79+
80+
### Double/Debiased ML for DiD
81+
**Status**: Not Started
82+
**Effort**: High
83+
**Impact**: Medium
84+
85+
For high-dimensional settings with many covariates. Uses machine learning for nuisance parameter estimation.
86+
87+
**Implementation Notes**:
88+
- Integrate with scikit-learn estimators
89+
- Support cross-fitting
90+
- Implement DR-DiD with ML components
91+
- Reference: Chernozhukov et al. (2018), Chang (2020)
92+
93+
### Parallel Trends Forest
94+
**Status**: Not Started
95+
**Effort**: High
96+
**Impact**: Medium
97+
98+
Uses machine learning to construct optimal control samples when using DiD in relatively long panels with little randomization.
99+
100+
**Reference**: Shahn et al. (2023)
101+
102+
---
103+
104+
## Priority 4: Usability Enhancements
105+
106+
### Power Analysis Tools
107+
**Status**: Not Started
108+
**Effort**: Medium
109+
**Impact**: Medium
110+
111+
Help practitioners determine sample size requirements:
112+
- Minimum detectable effect given sample size
113+
- Required sample size for target power
114+
- Visualization of power curves
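The first two quantities have simple closed forms once a standard error for the DiD estimate is assumed. The sketch below uses a normal approximation and takes the standard error as given; both function names are hypothetical, and mapping a sample size to a standard error depends on the design and is left out.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def minimum_detectable_effect(se, alpha=0.05, power=0.80):
    """Smallest effect detectable with the given power in a two-sided test."""
    return (STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)) * se


def power_for_effect(effect, se, alpha=0.05):
    """Probability of rejecting H0: effect == 0 when the true effect is `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


se = 0.5  # assumed standard error of the DiD estimate
mde = minimum_detectable_effect(se)
print(f"MDE at 80% power: {mde:.3f}")
print(f"power to detect an effect of 1.0: {power_for_effect(1.0, se):.3f}")
```

Evaluating `power_for_effect` over a grid of effect sizes gives the points needed for a power curve.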
115+
116+
### Enhanced Visualization
117+
**Status**: Partial
118+
**Effort**: Low
119+
**Impact**: Medium
120+
121+
Current: Basic event study plots implemented.
122+
123+
**Additions needed**:
124+
- Pre-trends shading with significance markers
125+
- Comparison plots across specifications
126+
- Synthetic control weight visualization
127+
- Interactive plots (optional Plotly support)
128+
129+
### Improved Formula Interface
130+
**Status**: Not Started
131+
**Effort**: Low
132+
**Impact**: Low
133+
134+
Current formula support is basic. Enhancements:
135+
- Support for multiple interactions
136+
- Polynomial terms
137+
- Factor notation (C() for categorical)
138+
- Formula objects like patsy/formulaic
139+
140+
---
141+
142+
## Code Quality
143+
144+
### Add Test Coverage for Utils
145+
**Status**: Not Started
146+
**Effort**: Low
147+
**Impact**: Medium
148+
149+
The `utils.py` module has no dedicated tests. Need coverage for:
150+
- `check_parallel_trends()`
151+
- `check_parallel_trends_robust()`
152+
- `equivalence_test_trends()`
153+
- Synthetic control weight functions
154+
155+
### Implement `predict()` Method
156+
**Status**: Not Started
157+
**Effort**: Low
158+
**Impact**: Low
159+
160+
`DifferenceInDifferences.predict()` currently raises `NotImplementedError`. Implementation requires storing column names during fit.
161+
162+
---
163+
164+
## Documentation
165+
166+
### Example Notebooks
167+
**Status**: Not Started
168+
**Effort**: Medium
169+
**Impact**: High
170+
171+
Create Jupyter notebooks demonstrating:
172+
1. Basic 2x2 DiD with real-world data
173+
2. Staggered adoption with CallawaySantAnna
174+
3. Synthetic DiD walkthrough
175+
4. Parallel trends testing and diagnostics
176+
5. Visualization and reporting
177+
178+
### API Reference
179+
**Status**: Partial
180+
**Effort**: Medium
181+
**Impact**: Medium
182+
183+
Docstrings exist, but there is no built API documentation site yet. Consider:
184+
- Sphinx/ReadTheDocs setup
185+
- mkdocs-material
186+
187+
---
188+
189+
## Completed Features (v0.4.0)
190+
191+
- [x] Callaway-Sant'Anna estimator for staggered DiD
192+
- [x] Event study visualization (`plot_event_study`)
193+
- [x] Group effects visualization (`plot_group_effects`)
194+
- [x] Export TwoWayFixedEffects in public API
195+
- [x] Export parallel trends testing utilities
196+
- [x] CallawaySantAnnaResults with event study and group aggregation
197+
- [x] Comprehensive test coverage for new estimator (17 tests)
