Commit 2afcde7

Merge pull request #48 from igerber/claude/implement-ddd-estimator-mWDwi
2 parents 9b622cd + 20a126e commit 2afcde7

8 files changed

Lines changed: 2827 additions & 10 deletions

File tree

CLAUDE.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,13 @@ mypy diff_diff
6060
- Alternative to Callaway-Sant'Anna with different weighting scheme
6161
- Useful robustness check when both estimators agree
6262

63+
- **`diff_diff/triple_diff.py`** - Triple Difference (DDD) estimator:
64+
- `TripleDifference` - Ortiz-Villavicencio & Sant'Anna (2025) estimator for DDD designs
65+
- `TripleDifferenceResults` - Results with ATT, SEs, cell means, diagnostics
66+
- `triple_difference()` - Convenience function for quick estimation
67+
- Regression adjustment, IPW, and doubly robust estimation methods
68+
- Proper covariate handling (unlike naive DDD implementations)
69+
6370
- **`diff_diff/bacon.py`** - Goodman-Bacon decomposition for TWFE diagnostics:
6471
- `BaconDecomposition` - Decompose TWFE into weighted 2x2 comparisons (Goodman-Bacon 2021)
6572
- `BaconDecompositionResults` - Results with comparison weights and estimates by type
@@ -148,13 +155,15 @@ mypy diff_diff
148155
- `05_honest_did.ipynb` - Honest DiD sensitivity analysis for parallel trends violations
149156
- `06_power_analysis.ipynb` - Power analysis for study design, MDE, simulation-based power
150157
- `07_pretrends_power.ipynb` - Pre-trends power analysis (Roth 2022), MDV, power curves
158+
- `08_triple_diff.ipynb` - Triple Difference (DDD) estimation with proper covariate handling
151159

152160
### Test Structure
153161

154162
Tests mirror the source modules:
155163
- `tests/test_estimators.py` - Tests for DifferenceInDifferences, TWFE, MultiPeriodDiD, SyntheticDiD
156164
- `tests/test_staggered.py` - Tests for CallawaySantAnna
157165
- `tests/test_sun_abraham.py` - Tests for SunAbraham interaction-weighted estimator
166+
- `tests/test_triple_diff.py` - Tests for Triple Difference (DDD) estimator
158167
- `tests/test_bacon.py` - Tests for Goodman-Bacon decomposition
159168
- `tests/test_utils.py` - Tests for parallel trends, robust SE, synthetic weights
160169
- `tests/test_diagnostics.py` - Tests for placebo tests

README.md

Lines changed: 140 additions & 0 deletions
@@ -71,6 +71,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
7171
- **Panel data support**: Two-way fixed effects estimator for panel designs
7272
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
7373
- **Staggered adoption**: Callaway-Sant'Anna (2021) and Sun-Abraham (2021) estimators for heterogeneous treatment timing
74+
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
7475
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
7576
- **Event study plots**: Publication-ready visualization of treatment effects
7677
- **Parallel trends testing**: Multiple methods including equivalence tests
@@ -93,6 +94,8 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`:
9394
| `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
9495
| `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
9596
| `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power |
97+
| `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves |
98+
| `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling |
9699

97100
## Data Preparation
98101

@@ -866,6 +869,77 @@ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
866869
# If results differ substantially, investigate heterogeneity
867870
```
868871

872+
### Triple Difference (DDD)
873+
874+
Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
875+
876+
```python
877+
from diff_diff import TripleDifference, triple_difference
878+
879+
# Basic usage
880+
ddd = TripleDifference(estimation_method='dr') # doubly robust (recommended)
881+
results = ddd.fit(
882+
data,
883+
outcome='wages',
884+
group='policy_state', # 1=state enacted policy, 0=control state
885+
partition='female', # 1=women (affected by policy), 0=men
886+
time='post' # 1=post-policy, 0=pre-policy
887+
)
888+
889+
# View results
890+
results.print_summary()
891+
print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
892+
893+
# With covariates (properly incorporated, unlike naive DDD)
894+
results = ddd.fit(
895+
data,
896+
outcome='wages',
897+
group='policy_state',
898+
partition='female',
899+
time='post',
900+
covariates=['age', 'education', 'experience']
901+
)
902+
```
903+
904+
**Estimation methods:**
905+
906+
| Method | Description | When to use |
907+
|--------|-------------|-------------|
908+
| `"dr"` | Doubly robust | Recommended. Consistent if either the outcome model or the propensity score model is correctly specified |
909+
| `"reg"` | Regression adjustment | Simple outcome regression with full interactions |
910+
| `"ipw"` | Inverse probability weighting | When propensity score model is well-specified |
911+
912+
```python
913+
# Compare estimation methods
914+
for method in ['reg', 'ipw', 'dr']:
915+
est = TripleDifference(estimation_method=method)
916+
res = est.fit(data, outcome='y', group='g', partition='p', time='t')
917+
print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})")
918+
```
919+
920+
**Convenience function:**
921+
922+
```python
923+
# One-liner estimation
924+
results = triple_difference(
925+
data,
926+
outcome='wages',
927+
group='policy_state',
928+
partition='female',
929+
time='post',
930+
covariates=['age', 'education'],
931+
estimation_method='dr'
932+
)
933+
```
934+
935+
**Why use DDD instead of DiD?**
936+
937+
DDD allows for violations of parallel trends that are:
938+
- Group-specific (e.g., economic shocks in treatment states)
939+
- Partition-specific (e.g., trends affecting women everywhere)
940+
941+
As long as these biases are additive, DDD differences them out. The key assumption is that, absent treatment, the *differential* trend between eligible and ineligible units would be the same in the treated and control groups.
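
The differencing argument can be illustrated with a minimal sketch that works directly on the eight cell means (group x partition x period). The helpers `ddd_from_cell_means` and `simulate_means` and all numbers are hypothetical and are not part of `diff_diff`. The sketch covers only the unconditional, no-covariate case, not the regression adjustment, IPW, or doubly robust estimators.

```python
# Hypothetical illustration only: these helpers are not part of the diff_diff API.

def ddd_from_cell_means(means):
    """Triple difference from a dict keyed by (group, partition, post)."""
    def did_within_group(group):
        # Change over time for eligible units minus change for ineligible units.
        eligible_change = means[(group, 1, 1)] - means[(group, 1, 0)]
        ineligible_change = means[(group, 0, 1)] - means[(group, 0, 0)]
        return eligible_change - ineligible_change

    # Difference of the two within-group DiDs.
    return did_within_group(1) - did_within_group(0)


def simulate_means(effect, group_shock, partition_trend):
    """Noise-free cell means with additive group- and partition-specific trends."""
    means = {}
    for g in (0, 1):
        for p in (0, 1):
            for t in (0, 1):
                y = 10.0 + 2.0 * g + 1.0 * p + 0.5 * t  # levels and common trend
                y += group_shock * g * t        # shock hitting the treated group only
                y += partition_trend * p * t    # trend hitting eligible units everywhere
                y += effect * g * p * t         # treatment effect on treated-eligible units
                means[(g, p, t)] = y
    return means


means = simulate_means(effect=3.0, group_shock=1.5, partition_trend=-0.7)

# A plain DiD on eligible units only (treated vs. control group) absorbs the group shock.
naive_did = (means[(1, 1, 1)] - means[(1, 1, 0)]) - (means[(0, 1, 1)] - means[(0, 1, 0)])
ddd = ddd_from_cell_means(means)

print(f"DiD on eligible units: {naive_did:.2f}")
print(f"DDD: {ddd:.2f}")
```

In this noise-free setup the DiD on eligible units equals the true effect plus the group-specific shock, while the triple difference recovers the effect because both additive trends cancel.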
942+
869943
### Event Study Visualization
870944

871945
Create publication-ready event study plots:
@@ -1661,6 +1735,60 @@ SunAbraham(
16611735
| `print_summary(alpha)` | Print summary to stdout |
16621736
| `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |
16631737

1738+
### TripleDifference
1739+
1740+
```python
1741+
TripleDifference(
1742+
estimation_method='dr', # 'dr' (doubly robust), 'reg', or 'ipw'
1743+
robust=True, # Use HC1 robust standard errors
1744+
cluster=None, # Column for cluster-robust SEs
1745+
alpha=0.05, # Significance level for CIs
1746+
pscore_trim=0.01 # Propensity score trimming threshold
1747+
)
1748+
```
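
As a rough sketch of what a trimming threshold such as `pscore_trim` typically does, the snippet below clips estimated propensity scores to `[trim, 1 - trim]` before forming odds weights. This is an assumption about the general technique, not a description of how `TripleDifference` applies the parameter internally; `trim_pscores` and `odds_weights` are hypothetical helpers.

```python
# Hypothetical helpers: they illustrate propensity score trimming in general,
# not the internal behavior of diff_diff.

def trim_pscores(pscores, trim=0.01):
    """Clip propensity scores into [trim, 1 - trim]."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must lie in [0, 0.5)")
    lower, upper = trim, 1.0 - trim
    return [min(max(p, lower), upper) for p in pscores]


def odds_weights(pscores):
    """Odds p / (1 - p), the form of weight often used to reweight comparison units."""
    return [p / (1.0 - p) for p in pscores]


raw = [0.0005, 0.20, 0.50, 0.9995]
trimmed = trim_pscores(raw, trim=0.01)

# Without trimming, the score near 1 produces a very large weight.
print(max(odds_weights(raw)))
print(max(odds_weights(trimmed)))
```

Clipping bounds the largest weight, which trades a small amount of bias for lower variance when some scores sit close to 0 or 1.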
1749+
1750+
**fit() Parameters:**
1751+
1752+
| Parameter | Type | Description |
1753+
|-----------|------|-------------|
1754+
| `data` | DataFrame | Input data |
1755+
| `outcome` | str | Outcome variable column name |
1756+
| `group` | str | Group indicator column (0/1): 1=treated group |
1757+
| `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible |
1758+
| `time` | str | Time indicator column (0/1): 1=post-treatment |
1759+
| `covariates` | list | Covariate column names for adjustment |
1760+
1761+
### TripleDifferenceResults
1762+
1763+
**Attributes:**
1764+
1765+
| Attribute | Description |
1766+
|-----------|-------------|
1767+
| `att` | Average Treatment effect on the Treated |
1768+
| `se` | Standard error of ATT |
1769+
| `t_stat` | T-statistic |
1770+
| `p_value` | P-value for H0: ATT = 0 |
1771+
| `conf_int` | Tuple of (lower, upper) confidence bounds |
1772+
| `n_obs` | Total number of observations |
1773+
| `n_treated_eligible` | Obs in treated group & eligible partition |
1774+
| `n_treated_ineligible` | Obs in treated group & ineligible partition |
1775+
| `n_control_eligible` | Obs in control group & eligible partition |
1776+
| `n_control_ineligible` | Obs in control group & ineligible partition |
1777+
| `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
1778+
| `group_means` | Dict of cell means for diagnostics |
1779+
| `pscore_stats` | Propensity score statistics (IPW/DR only) |
1780+
| `is_significant` | Boolean for significance at alpha |
1781+
| `significance_stars` | String of significance stars |
1782+
1783+
**Methods:**
1784+
1785+
| Method | Description |
1786+
|--------|-------------|
1787+
| `summary(alpha)` | Get formatted summary string |
1788+
| `print_summary(alpha)` | Print summary to stdout |
1789+
| `to_dict()` | Convert to dictionary |
1790+
| `to_dataframe()` | Convert to pandas DataFrame |
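
As a rough guide to how the inference attributes listed above relate to one another, the sketch below derives a test statistic, two-sided p-value, and confidence interval from a point estimate and standard error. It assumes a normal approximation, which is an assumption made for illustration: the library may use a t-distribution or another reference distribution. `normal_inference` is a hypothetical helper, not a `diff_diff` function.

```python
from statistics import NormalDist

# Hypothetical helper: assumes a standard normal reference distribution.
def normal_inference(att, se, alpha=0.05):
    """Return test statistic, two-sided p-value, and (1 - alpha) confidence interval."""
    if se <= 0:
        raise ValueError("se must be positive")
    z = NormalDist()
    t_stat = att / se
    p_value = 2.0 * (1.0 - z.cdf(abs(t_stat)))
    crit = z.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    conf_int = (att - crit * se, att + crit * se)
    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "conf_int": conf_int,
        "is_significant": p_value < alpha,
    }


summary = normal_inference(att=2.0, se=1.0, alpha=0.05)
print(summary)
```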
1791+
16641792
### HonestDiD
16651793

16661794
```python
@@ -2024,6 +2152,18 @@ This library implements methods from the following scholarly works:
20242152

20252153
- **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)
20262154

2155+
### Triple Difference (DDD)
2156+
2157+
- **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)
2158+
2159+
This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.
2160+
2161+
- **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)
2162+
2163+
Classic paper introducing the Triple Difference design for policy evaluation.
2164+
2165+
- **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)
2166+
20272167
### Parallel Trends and Pre-Trend Testing
20282168

20292169
- **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)

ROADMAP.md

Lines changed: 23 additions & 10 deletions
@@ -6,19 +6,19 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
66

77
---
88

9-
## Current Status (v1.2.0)
9+
## Current Status (v1.3.0)
1010

1111
diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis:
1212

13-
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Synthetic DiD
13+
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Synthetic DiD, Triple Difference (DDD)
1414
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap
1515
- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
1616
- **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
1717
- **Study design**: Power analysis tools
1818

1919
---
2020

21-
## Near-Term Enhancements (v1.3)
21+
## Near-Term Enhancements (v1.4)
2222

2323
High-value additions building on our existing foundation.
2424

@@ -42,14 +42,27 @@ Two-stage approach gaining traction in applied work. First residualizes outcomes
4242

4343
**Reference**: Gardner (2022). *Working Paper*.
4444

45-
### Triple Difference (DDD) Estimators
45+
### Staggered Triple Difference (DDD)
4646

47-
Extends DiD to settings requiring a third differencing dimension. Common DDD implementations are invalid when covariates are needed for identification.
47+
Extend the existing `TripleDifference` estimator to handle staggered adoption settings. The current implementation handles 2-period DDD; this extends to multi-period designs.
4848

49-
- Regression adjustment, IPW, and doubly robust DDD estimators
50-
- Staggered adoption support with multiple comparison groups
51-
- Proper covariate integration (naive "two DiD difference" approaches fail)
52-
- Bias reduction and precision gains over standard approaches
49+
**Multi-period/Staggered Support:**
50+
- Group-time ATT(g,t) for DDD designs with variation in treatment timing
51+
- Handle settings where groups adopt at different times
52+
- Multiple comparison groups (never-treated, not-yet-treated in either dimension)
53+
- `StaggeredTripleDifference` class or extended `TripleDifference` with `first_treat` parameter
54+
55+
**Event Study Aggregation:**
56+
- Dynamic treatment effects over time (event study coefficients)
57+
- Pre-treatment placebo effects for parallel trends assessment
58+
- `aggregate='event_study'` parameter like `CallawaySantAnna`
59+
- Integration with `plot_event_study()` visualization
60+
61+
**Multiplier Bootstrap Inference:**
62+
- Multiplier bootstrap for valid inference in staggered settings
63+
- Rademacher, Mammen, and Webb weight options (matching existing estimators)
64+
- `n_bootstrap` parameter and `DDDBootstrapResults` class
65+
- Clustered bootstrap for panel data
5366

5467
**Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). *Working Paper*. R package: `triplediff`.
5568

@@ -61,7 +74,7 @@ Extends DiD to settings requiring a third differencing dimension. Common DDD imp
6174

6275
---
6376

64-
## Medium-Term Enhancements (v1.4+)
77+
## Medium-Term Enhancements (v1.5+)
6578

6679
Extending diff-diff to handle more complex settings.
6780

diff_diff/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -81,6 +81,11 @@
8181
SunAbraham,
8282
SunAbrahamResults,
8383
)
84+
from diff_diff.triple_diff import (
85+
TripleDifference,
86+
TripleDifferenceResults,
87+
triple_difference,
88+
)
8489
from diff_diff.utils import (
8590
WildBootstrapResults,
8691
check_parallel_trends,
@@ -107,6 +112,7 @@
107112
"SyntheticDiD",
108113
"CallawaySantAnna",
109114
"SunAbraham",
115+
"TripleDifference",
110116
# Bacon Decomposition
111117
"BaconDecomposition",
112118
"BaconDecompositionResults",
@@ -123,6 +129,8 @@
123129
"GroupTimeEffect",
124130
"SunAbrahamResults",
125131
"SABootstrapResults",
132+
"TripleDifferenceResults",
133+
"triple_difference",
126134
# Visualization
127135
"plot_event_study",
128136
"plot_group_effects",
