igerber
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 10 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 145 additions & 0 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 197 additions & 0 deletions
@@ -41,10 +41,20 @@ mypy diff_diff
   - `MultiPeriodDiD` - Event-study style DiD with period-specific treatment effects
   - `SyntheticDiD` - Synthetic control combined with DiD (Arkhangelsky et al. 2021)
 
+- **`diff_diff/staggered.py`** - Staggered adoption DiD estimators:
+  - `CallawaySantAnna` - Callaway & Sant'Anna (2021) estimator for heterogeneous treatment timing
+  - `CallawaySantAnnaResults` - Results with group-time ATT(g,t) and aggregations
+  - `GroupTimeEffect` - Container for individual group-time effects
+
 - **`diff_diff/results.py`** - Dataclass containers for estimation results:
   - `DiDResults`, `MultiPeriodDiDResults`, `SyntheticDiDResults`, `PeriodEffect`
   - Each provides `summary()`, `to_dict()`, `to_dataframe()` methods
 
+- **`diff_diff/visualization.py`** - Plotting functions:
+  - `plot_event_study` - Publication-ready event study coefficient plots
+  - `plot_group_effects` - Treatment effects by cohort visualization
+  - Works with MultiPeriodDiD, CallawaySantAnna, or DataFrames
+
 - **`diff_diff/utils.py`** - Statistical utilities:
   - Robust/cluster standard errors (`compute_robust_se`)
   - Parallel trends tests (`check_parallel_trends`, `check_parallel_trends_robust`, `equivalence_test_trends`)
 
@@ -69,7 +69,10 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 - **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
 - **Panel data support**: Two-way fixed effects estimator for panel designs
 - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
+- **Staggered adoption**: Callaway-Sant'Anna (2021) estimator for heterogeneous treatment timing
 - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
+- **Event study plots**: Publication-ready visualization of treatment effects
+- **Parallel trends testing**: Multiple methods including equivalence tests
 - **Data prep utilities**: Helper functions for common data preparation tasks
 
 ## Data Preparation
@@ -560,6 +563,148 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 ================================================================================
 ```
 
+### Staggered Difference-in-Differences (Callaway-Sant'Anna)
+
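+When treatment is adopted at different times by different units, traditional TWFE estimators can be biased if treatment effects vary across cohorts or over time, because already-treated units end up serving as controls. The Callaway-Sant'Anna estimator avoids these comparisons by estimating a separate effect for each cohort and period, ATT(g,t), which remains valid under staggered adoption as long as parallel trends holds.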
+
+```python
+from diff_diff import CallawaySantAnna
+
+# Panel data with staggered treatment
+# 'first_treat' = period when unit was first treated (0 if never treated)
+cs = CallawaySantAnna()
+results = cs.fit(
+    panel_data,
+    outcome='sales',
+    unit='firm_id',
+    time='year',
+    first_treat='first_treat',  # 0 for never-treated, else first treatment year
+    aggregate='event_study'      # Compute event study effects
+)
+
+# View results
+results.print_summary()
+
+# Access group-time effects ATT(g,t)
+for (group, time), effect in results.group_time_effects.items():
+    print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")
+
+# Event study effects (averaged by relative time)
+for rel_time, effect in results.event_study_effects.items():
+    print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
+
+# Convert to DataFrame
+df = results.to_dataframe(level='event_study')
+```
+
+Output:
+```
+=====================================================================================
+          Callaway-Sant'Anna Staggered Difference-in-Differences Results
+=====================================================================================
+
+Total observations:                     600
+Treated units:                           35
+Control units:                           15
+Treatment cohorts:                        3
+Time periods:                             8
+Control group:                never_treated
+
+-------------------------------------------------------------------------------------
+                  Overall Average Treatment Effect on the Treated
+-------------------------------------------------------------------------------------
+Parameter         Estimate     Std. Err.     t-stat      P>|t|   Sig.
+-------------------------------------------------------------------------------------
+ATT                 2.5000       0.3521       7.101     0.0000   ***
+-------------------------------------------------------------------------------------
+
+95% Confidence Interval: [1.8099, 3.1901]
+
+-------------------------------------------------------------------------------------
+                          Event Study (Dynamic) Effects
+-------------------------------------------------------------------------------------
+Rel. Period       Estimate     Std. Err.     t-stat      P>|t|   Sig.
+-------------------------------------------------------------------------------------
+0                   2.1000       0.4521       4.645     0.0000   ***
+1                   2.5000       0.4123       6.064     0.0000   ***
+2                   2.8000       0.5234       5.349     0.0000   ***
+-------------------------------------------------------------------------------------
+
+Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
+=====================================================================================
+```
+
+**When to use Callaway-Sant'Anna vs TWFE:**
+
+| Scenario | Use TWFE | Use Callaway-Sant'Anna |
+|----------|----------|------------------------|
+| All units treated at same time | ✓ | ✓ |
+| Staggered adoption, homogeneous effects | ✓ | ✓ |
+| Staggered adoption, heterogeneous effects | ✗ | ✓ |
+| Need event study with staggered timing | ✗ | ✓ |
+| Fewer than ~20 treated units | ✓ | Depends on design |
+
+**Parameters:**
+
+```python
+CallawaySantAnna(
+    control_group='never_treated',  # or 'not_yet_treated'
+    anticipation=0,                  # Number of pre-treatment periods with anticipation effects
+    estimation_method='dr',          # 'dr', 'ipw', or 'reg'
+    alpha=0.05,                      # Significance level
+    cluster=None,                    # Column for cluster SEs
+    n_bootstrap=0,                   # Must be 0 (bootstrap not yet implemented)
+    seed=None                        # Random seed
+)
+```
+
+**Current limitations:**
+- Bootstrap inference (`n_bootstrap > 0`) is not yet implemented
+- Covariate adjustment for conditional parallel trends is not yet implemented
+
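For intuition about the group-time effects reported above: with a never-treated control group and no covariates, ATT(g,t) reduces to a 2x2 comparison, namely the change in mean outcome for cohort g between period g-1 and period t, minus the same change among never-treated units. The sketch below illustrates that special case on toy data, along with a cohort-size-weighted average by relative time. The names `att_gt` and `event_study` and the dictionary layout are hypothetical and are not the library's API; the library's default `'dr'` method is more involved than this.

```python
from statistics import mean

# Toy balanced panel: {unit: (first_treat, {period: outcome})}.
# first_treat == 0 marks a never-treated unit, as in the README example.
panel = {
    "a": (3, {1: 1.0, 2: 2.0, 3: 5.0, 4: 6.5}),
    "b": (3, {1: 0.0, 2: 1.0, 3: 4.0, 4: 5.5}),
    "e": (4, {1: 0.0, 2: 1.0, 3: 2.0, 4: 5.0}),
    "c": (0, {1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}),
    "d": (0, {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}),
}


def att_gt(panel, g, t):
    """Unconditional ATT(g,t) with never-treated controls and base period g-1."""
    base = g - 1
    treated = [y[t] - y[base] for ft, y in panel.values() if ft == g]
    control = [y[t] - y[base] for ft, y in panel.values() if ft == 0]
    return mean(treated) - mean(control)


def event_study(panel, periods):
    """Average post-treatment ATT(g,t) by relative time e = t - g.

    Cohorts are weighted by their number of units (an assumption of this sketch).
    """
    cohorts = sorted({ft for ft, _ in panel.values() if ft != 0})
    sizes = {g: sum(1 for ft, _ in panel.values() if ft == g) for g in cohorts}
    by_e = {}
    for g in cohorts:
        for t in periods:
            if t >= g:
                by_e.setdefault(t - g, []).append((sizes[g], att_gt(panel, g, t)))
    return {
        e: sum(n * a for n, a in pairs) / sum(n for n, _ in pairs)
        for e, pairs in sorted(by_e.items())
    }


print(att_gt(panel, 3, 3))                # 2.0
print(event_study(panel, [1, 2, 3, 4]))   # {0: 2.0, 1: 2.5}
```

Unit `e` (first treated in period 4) is never used as a control here, which corresponds to `control_group='never_treated'`; a `'not_yet_treated'` variant would also admit units whose first treatment comes after period t.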
+### Event Study Visualization
+
+Create publication-ready event study plots:
+
+```python
+import pandas as pd
+
+from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna
+
+# From MultiPeriodDiD
+did = MultiPeriodDiD()
+results = did.fit(data, outcome='y', treatment='treated',
+                  time='period', post_periods=[3, 4, 5])
+plot_event_study(results, title="Treatment Effects Over Time")
+
+# From CallawaySantAnna (with event study aggregation)
+cs = CallawaySantAnna()
+results = cs.fit(data, outcome='y', unit='unit', time='period',
+                 first_treat='first_treat', aggregate='event_study')
+plot_event_study(results, title="Staggered DiD Event Study")
+
+# From a DataFrame
+df = pd.DataFrame({
+    'period': [-2, -1, 0, 1, 2],
+    'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
+    'se': [0.3, 0.25, 0.0, 0.4, 0.45]
+})
+plot_event_study(df, reference_period=0)
+
+# With customization
+ax = plot_event_study(
+    results,
+    title="Dynamic Treatment Effects",
+    xlabel="Years Relative to Treatment",
+    ylabel="Effect on Sales ($1000s)",
+    color="#2563eb",
+    marker="o",
+    shade_pre=True,           # Shade pre-treatment region
+    show_zero_line=True,      # Horizontal line at y=0
+    show_reference_line=True, # Vertical line at reference period
+    figsize=(10, 6),
+    show=False                # Don't call plt.show(), return axes
+)
+```
+
 ### Synthetic Difference-in-Differences
 
 Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.
 
@@ -0,0 +1,197 @@
+# diff-diff Library Roadmap
+
+This document tracks planned features and improvements for the diff-diff library.
+
+## Priority 1: Critical Improvements
+
+### Wild Cluster Bootstrap
+**Status**: Not Started
+**Effort**: Medium
+**Impact**: High
+
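+Standard cluster-robust standard errors tend to be biased downward when there are few clusters (roughly fewer than 50), which leads to over-rejection. The wild cluster bootstrap gives more reliable inference in this setting, even with as few as 5-10 clusters.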
+
+**Implementation Notes**:
+- Add `wild_bootstrap_se()` function in `utils.py`
+- Support Rademacher and Webb weights
+- Integrate with existing estimators via parameter
+- Reference: Cameron, Gelbach, and Miller (2008)
+
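A minimal sketch of the two weight schemes named in the notes above, assuming one weight is drawn per cluster and shared by every observation in that cluster. Rademacher weights take the values -1 and +1 with equal probability; Webb's six-point distribution takes ±sqrt(1/2), ±1, ±sqrt(3/2) with probability 1/6 each. The function name `draw_cluster_weights` is hypothetical and only illustrates the sampling step, not the planned `wild_bootstrap_se()`.

```python
import math
import random

RADEMACHER = (-1.0, 1.0)
# Webb six-point support: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2)
WEBB = tuple(s * math.sqrt(k / 2) for s in (-1, 1) for k in (1, 2, 3))


def draw_cluster_weights(cluster_ids, kind="rademacher", rng=None):
    """Return one bootstrap weight per observation, constant within each cluster."""
    if kind == "rademacher":
        support = RADEMACHER
    elif kind == "webb":
        support = WEBB
    else:
        raise ValueError(f"unknown weight type: {kind!r}")
    rng = rng or random.Random()
    # Draw once per distinct cluster, then broadcast to observations.
    per_cluster = {c: rng.choice(support) for c in sorted(set(cluster_ids))}
    return [per_cluster[c] for c in cluster_ids]


clusters = ["s1", "s1", "s2", "s3", "s2"]
weights = draw_cluster_weights(clusters, kind="webb", rng=random.Random(0))
```

In each bootstrap replication these weights would multiply the residuals before the model is re-estimated. Both distributions have mean 0 and variance 1; the six-point support yields more distinct bootstrap samples than Rademacher when clusters are very few.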
+### Placebo Tests Module
+**Status**: Not Started
+**Effort**: Medium
+**Impact**: Medium
+
+Implement standard diagnostic tools for DiD:
+- Fake treatment timing tests (assign treatment before it actually occurred)
+- Fake treatment group tests (run DiD on never-treated units)
+- Permutation-based inference
+
+**Implementation Notes**:
+- Create `diff_diff/diagnostics.py` module
+- Add `run_placebo_test()` function
+- Support multiple placebo specifications
+
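The permutation-based inference item could look roughly like the following, assuming a simple two-period design in which each unit is summarized by its post-minus-pre outcome change. Treatment labels are reshuffled across units, the DiD estimate is recomputed for each fake assignment, and the p-value is the share of placebo estimates at least as large in magnitude as the observed one. The names `did_estimate` and `permutation_p_value` are hypothetical and are not the planned `run_placebo_test()` interface.

```python
import random
from statistics import mean


def did_estimate(changes, treated):
    """Difference in mean outcome changes: treated units minus control units."""
    t = [d for u, d in changes.items() if u in treated]
    c = [d for u, d in changes.items() if u not in treated]
    return mean(t) - mean(c)


def permutation_p_value(changes, treated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the DiD estimate."""
    rng = random.Random(seed)
    units = sorted(changes)
    observed = did_estimate(changes, treated)
    hits = 0
    for _ in range(n_perm):
        # Assign "treatment" to a random subset of the same size.
        fake = set(rng.sample(units, len(treated)))
        if abs(did_estimate(changes, fake)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: post-minus-pre change per unit.
changes = {
    "t1": 3.0, "t2": 3.1, "t3": 2.9, "t4": 3.0,
    "c1": 1.0, "c2": 1.1, "c3": 0.9, "c4": 1.0, "c5": 1.2, "c6": 0.8,
}
observed, p_value = permutation_p_value(changes, {"t1", "t2", "t3", "t4"})
```

With clustered or staggered designs, the reshuffling would need to respect the assignment mechanism (for example, permuting at the cluster level), which this sketch does not attempt.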
+---
+
+## Priority 2: Advanced Methods
+
+### Honest DiD / Sensitivity Analysis (Rambachan-Roth)
+**Status**: Not Started
+**Effort**: High
+**Impact**: High
+
+Pre-trend tests often have low power, and conditioning on passing them can exacerbate bias. Sensitivity analysis instead asks: "How robust are the results to violations of parallel trends?"
+
+**Features**:
+- Compute bounds under restrictions on trend deviations
+- Confidence intervals valid under partial identification
+- Breakdown analysis visualization
+
+**References**:
+- Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies.
+- R package: `HonestDiD`
+
+### Borusyak-Jaravel-Spiess Imputation Estimator
+**Status**: Not Started
+**Effort**: High
+**Impact**: Medium
+
+Alternative to Callaway-Sant'Anna that's more efficient when parallel trends hold across all periods.
+
+**Implementation Notes**:
+- Impute Y(0) for treated observations using control outcomes
+- Support both regression and matrix completion approaches
+- Reference: Borusyak, Jaravel, and Spiess (2024)
+
+### Sun-Abraham Estimator
+**Status**: Not Started
+**Effort**: Medium
+**Impact**: Medium
+
+Interaction-weighted estimator for staggered DiD. Focuses on "cohort-specific average treatment effects on the treated" (CATT).
+
+**Reference**: Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.
+
+---
+
+## Priority 3: Machine Learning Extensions
+
+### Double/Debiased ML for DiD
+**Status**: Not Started
+**Effort**: High
+**Impact**: Medium
+
+For high-dimensional settings with many covariates. Uses machine learning for nuisance parameter estimation.
+
+**Implementation Notes**:
+- Integrate with scikit-learn estimators
+- Support cross-fitting
+- Implement DR-DiD with ML components
+- Reference: Chernozhukov et al. (2018), Chang (2020)
+
+### Parallel Trends Forest
+**Status**: Not Started
+**Effort**: High
+**Impact**: Medium
+
+Uses machine learning to construct optimal control samples when using DiD in relatively long panels with little randomization.
+
+**Reference**: Shahn et al. (2023)
+
+---
+
+## Priority 4: Usability Enhancements
+
+### Power Analysis Tools
+**Status**: Not Started
+**Effort**: Medium
+**Impact**: Medium
+
+Help practitioners determine sample size requirements:
+- Minimum detectable effect given sample size
+- Required sample size for target power
+- Visualization of power curves
+
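As a rough illustration of the first two items, assuming a two-sided z-test and a known standard error for the DiD estimate, the minimum detectable effect is `(z_{1-alpha/2} + z_{power}) * se`. The function names below are hypothetical, and a real implementation would also need to derive `se` from the design (sample size, number of periods, serial correlation).

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal


def minimum_detectable_effect(se, alpha=0.05, power=0.80):
    """Smallest true effect detected with the given power by a two-sided z-test."""
    return (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) * se


def approx_power(effect, se, alpha=0.05):
    """Power of a two-sided z-test against a true effect of the given size."""
    crit = _Z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    # Probability the test statistic lands in either rejection region.
    return (1 - _Z.cdf(crit - shift)) + _Z.cdf(-crit - shift)


mde = minimum_detectable_effect(se=0.35)
print(f"MDE at 80% power: {mde:.3f}")
print(f"Power at that effect: {approx_power(mde, se=0.35):.3f}")
```

If the standard error shrinks in proportion to `1 / sqrt(n)`, halving the MDE requires roughly four times the sample, which is the relationship a power-curve plot would make visible.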
+### Enhanced Visualization
+**Status**: Partial
+**Effort**: Low
+**Impact**: Medium
+
+Current: Basic event study plots implemented.
+
+**Additions needed**:
+- Pre-trends shading with significance markers
+- Comparison plots across specifications
+- Synthetic control weight visualization
+- Interactive plots (optional Plotly support)
+
+### Improved Formula Interface
+**Status**: Not Started
+**Effort**: Low
+**Impact**: Low
+
+Current formula support is basic. Enhancements:
+- Support for multiple interactions
+- Polynomial terms
+- Factor notation (C() for categorical)
+- Formula objects like patsy/formulaic
+
+---
+
+## Code Quality
+
+### Add Test Coverage for Utils
+**Status**: Not Started
+**Effort**: Low
+**Impact**: Medium
+
+The `utils.py` module has no dedicated tests. Need coverage for:
+- `check_parallel_trends()`
+- `check_parallel_trends_robust()`
+- `equivalence_test_trends()`
+- Synthetic control weight functions
+
+### Implement `predict()` Method
+**Status**: Not Started
+**Effort**: Low
+**Impact**: Low
+
+`DifferenceInDifferences.predict()` currently raises `NotImplementedError`. Implementation requires storing column names during fit.
+
+---
+
+## Documentation
+
+### Example Notebooks
+**Status**: Not Started
+**Effort**: Medium
+**Impact**: High
+
+Create Jupyter notebooks demonstrating:
+1. Basic 2x2 DiD with real-world data
+2. Staggered adoption with CallawaySantAnna
+3. Synthetic DiD walkthrough
+4. Parallel trends testing and diagnostics
+5. Visualization and reporting
+
+### API Reference
+**Status**: Partial
+**Effort**: Medium
+**Impact**: Medium
+
+Docstrings exist, but there is no built API documentation site yet. Consider:
+- Sphinx/ReadTheDocs setup
+- mkdocs-material
+
+---
+
+## Completed Features (v0.4.0)
+
+- [x] Callaway-Sant'Anna estimator for staggered DiD
+- [x] Event study visualization (`plot_event_study`)
+- [x] Group effects visualization (`plot_group_effects`)
+- [x] Export TwoWayFixedEffects in public API
+- [x] Export parallel trends testing utilities
+- [x] CallawaySantAnnaResults with event study and group aggregation
+- [x] Comprehensive test coverage for new estimator (17 tests)