| 1 | 1 | # diff-diff |
| 2 | 2 | |
| 3 | | -A library for computing difference in differences (diff-in-diff or DiD), a quasi-experimental method of statistical analysis. |
| 3 | +A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs. |
| 4 | + |
| 5 | +## Installation |
| 6 | + |
| 7 | +```bash |
| 8 | +pip install diff-diff |
| 9 | +``` |
| 10 | + |
| 11 | +Or install from source: |
| 12 | + |
| 13 | +```bash |
| 14 | +git clone https://github.com/igerber/diff-diff.git |
| 15 | +cd diff-diff |
| 16 | +pip install -e . |
| 17 | +``` |
| 18 | + |
| 19 | +## Quick Start |
| 20 | + |
| 21 | +```python |
| 22 | +import pandas as pd |
| 23 | +from diff_diff import DifferenceInDifferences |
| 24 | + |
| 25 | +# Create sample data |
| 26 | +data = pd.DataFrame({ |
| 27 | +    'outcome': [10, 11, 15, 18, 9, 10, 12, 12], |
| 28 | +    'treated': [1, 1, 1, 1, 0, 0, 0, 0], |
| 29 | +    'post': [0, 0, 1, 1, 0, 0, 1, 1] |
| 30 | +}) |
| 31 | + |
| 32 | +# Fit the model |
| 33 | +did = DifferenceInDifferences() |
| 34 | +results = did.fit(data, outcome='outcome', treatment='treated', time='post') |
| 35 | + |
| 36 | +# View results |
| 37 | +print(results) # DiDResults(ATT=3.5000*, SE=1.2583, p=0.0367) |
| 38 | +results.print_summary() |
| 39 | +``` |
| 40 | + |
| 41 | +Output: |
| 42 | +``` |
| 43 | +====================================================================== |
| 44 | + Difference-in-Differences Estimation Results |
| 45 | +====================================================================== |
| 46 | + |
| 47 | +Observations: 8 |
| 48 | +Treated units: 4 |
| 49 | +Control units: 4 |
| 50 | +R-squared: 0.9123 |
| 51 | + |
| 52 | +---------------------------------------------------------------------- |
| 53 | +Parameter Estimate Std. Err. t-stat P>|t| |
| 54 | +---------------------------------------------------------------------- |
| 55 | +ATT 3.5000 1.2583 2.782 0.0367 |
| 56 | +---------------------------------------------------------------------- |
| 57 | + |
| 58 | +95% Confidence Interval: [0.3912, 6.6088] |
| 59 | + |
| 60 | +Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1 |
| 61 | +====================================================================== |
| 62 | +``` |
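
For a 2x2 design like the one in the Quick Start, the ATT is the change in the treated group's mean outcome minus the change in the control group's mean outcome. The following is a minimal sketch of that arithmetic in plain Python, independent of the library. The `group_mean` helper and the small dataset are illustrative assumptions, not part of diff-diff.

```python
from statistics import mean

# Illustrative rows: (outcome, treated, post). Not produced by the library.
rows = [
    (10, 1, 0), (11, 1, 0), (15, 1, 1), (18, 1, 1),
    (9, 0, 0), (10, 0, 0), (12, 0, 1), (12, 0, 1),
]

def group_mean(rows, treated, post):
    """Mean outcome for one (treated, post) cell (hypothetical helper)."""
    return mean(y for y, d, t in rows if d == treated and t == post)

# Before/after change within each group
treated_change = group_mean(rows, 1, 1) - group_mean(rows, 1, 0)
control_change = group_mean(rows, 0, 1) - group_mean(rows, 0, 0)

# Difference-in-differences: treated change net of the control change
did_estimate = treated_change - control_change

print(treated_change, control_change, did_estimate)  # 6.0 2.5 3.5
```

Without covariates, this quantity equals the coefficient on the `treated:post` interaction in an ordinary least squares regression of the outcome on treatment, time, and their interaction.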
| 63 | + |
| 64 | +## Features |
| 65 | + |
| 66 | +- **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()` |
| 67 | +- **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals |
| 68 | +- **Multiple interfaces**: Column names or R-style formulas |
| 69 | +- **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors |
| 70 | +- **Panel data support**: Two-way fixed effects estimator for panel designs |
| 71 | + |
| 72 | +## Usage |
| 73 | + |
| 74 | +### Basic DiD with Column Names |
| 75 | + |
| 76 | +```python |
| 77 | +from diff_diff import DifferenceInDifferences |
| 78 | + |
| 79 | +did = DifferenceInDifferences(robust=True, alpha=0.05) |
| 80 | +results = did.fit( |
| 81 | +    data, |
| 82 | +    outcome='sales', |
| 83 | +    treatment='treated', |
| 84 | +    time='post_policy' |
| 85 | +) |
| 86 | + |
| 87 | +# Access results |
| 88 | +print(f"ATT: {results.att:.4f}") |
| 89 | +print(f"Standard Error: {results.se:.4f}") |
| 90 | +print(f"P-value: {results.p_value:.4f}") |
| 91 | +print(f"95% CI: {results.conf_int}") |
| 92 | +print(f"Significant: {results.is_significant}") |
| 93 | +``` |
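
With `robust=True` the estimator reports HC1 standard errors. For a saturated 2x2 model (intercept, treatment, time, and their interaction, with no covariates) the HC1 variance of the interaction coefficient has a simple closed form: sum each cell's squared residuals divided by the squared cell size, then scale by n / (n - k) with k = 4. The sketch below is a hand calculation under those assumptions. `hc1_se_2x2` and the sample rows are hypothetical, and the library's own small-sample conventions may differ.

```python
from collections import defaultdict
from math import sqrt

def hc1_se_2x2(rows):
    """HC1 standard error of the DiD interaction in a saturated 2x2 model.

    rows: iterable of (outcome, treated, post) with treated/post in {0, 1}.
    Assumes all four cells are populated and there are no covariates.
    """
    cells = defaultdict(list)
    for y, d, t in rows:
        cells[(d, t)].append(y)

    n = sum(len(ys) for ys in cells.values())
    k = 4  # intercept, treated, post, treated:post

    hc0_var = 0.0
    for ys in cells.values():
        # In a saturated model the fitted value is the cell mean.
        cell_mean = sum(ys) / len(ys)
        ssr = sum((y - cell_mean) ** 2 for y in ys)
        hc0_var += ssr / len(ys) ** 2

    # HC1 applies the n / (n - k) degrees-of-freedom correction to HC0.
    return sqrt(hc0_var * n / (n - k))

# Hypothetical data: (outcome, treated, post)
rows = [
    (20, 1, 0), (22, 1, 0), (30, 1, 1), (34, 1, 1),
    (18, 0, 0), (20, 0, 0), (23, 0, 1), (25, 0, 1),
]
se = hc1_se_2x2(rows)
print(round(se, 4))  # 2.6458
```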
| 94 | + |
| 95 | +### Using Formula Interface |
| 96 | + |
| 97 | +```python |
| 98 | +# R-style formula syntax |
| 99 | +results = did.fit(data, formula='outcome ~ treated * post') |
| 100 | + |
| 101 | +# Explicit interaction syntax |
| 102 | +results = did.fit(data, formula='outcome ~ treated + post + treated:post') |
| 103 | + |
| 104 | +# With covariates |
| 105 | +results = did.fit(data, formula='outcome ~ treated * post + age + income') |
| 106 | +``` |
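
In R-style formula syntax, `a * b` is shorthand for `a + b + a:b`, so the first two formulas above describe the same model. The sketch below spells out that expansion for two-variable terms. `expand_star` is a hypothetical helper written for illustration only; it is not part of diff-diff and does not handle three-way or nested terms.

```python
def expand_star(formula):
    """Expand each `a * b` term on the right-hand side into `a + b + a:b`."""
    lhs, rhs = (part.strip() for part in formula.split("~", 1))
    expanded = []
    for term in (t.strip() for t in rhs.split("+")):
        if "*" in term:
            # Only two-way products are supported in this sketch.
            a, b = (v.strip() for v in term.split("*"))
            expanded += [a, b, f"{a}:{b}"]
        else:
            expanded.append(term)
    return f"{lhs} ~ {' + '.join(expanded)}"

result = expand_star("outcome ~ treated * post + age + income")
print(result)
# outcome ~ treated + post + treated:post + age + income
```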
| 107 | + |
| 108 | +### Including Covariates |
| 109 | + |
| 110 | +```python |
| 111 | +results = did.fit( |
| 112 | +    data, |
| 113 | +    outcome='outcome', |
| 114 | +    treatment='treated', |
| 115 | +    time='post', |
| 116 | +    covariates=['age', 'income', 'education'] |
| 117 | +) |
| 118 | +``` |
| 119 | + |
| 120 | +### Cluster-Robust Standard Errors |
| 121 | + |
| 122 | +```python |
| 123 | +did = DifferenceInDifferences(cluster='state') |
| 124 | +results = did.fit( |
| 125 | +    data, |
| 126 | +    outcome='outcome', |
| 127 | +    treatment='treated', |
| 128 | +    time='post' |
| 129 | +) |
| 130 | +``` |
| 131 | + |
| 132 | +### Two-Way Fixed Effects (Panel Data) |
| 133 | + |
| 134 | +```python |
| 135 | +from diff_diff.estimators import TwoWayFixedEffects |
| 136 | + |
| 137 | +twfe = TwoWayFixedEffects() |
| 138 | +results = twfe.fit( |
| 139 | +    panel_data, |
| 140 | +    outcome='outcome', |
| 141 | +    treatment='treated', |
| 142 | +    time='year', |
| 143 | +    unit='firm_id' |
| 144 | +) |
| 145 | +``` |
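
On a balanced panel, a two-way fixed effects coefficient can be obtained by subtracting unit means and period means from both the outcome and the treatment indicator, adding back the overall mean, and regressing one demeaned series on the other. The sketch below illustrates that within transformation in plain Python. It is not the library's implementation: `twfe_coefficient` is a hypothetical helper, and it assumes the treatment indicator equals 1 only for treated units in post-treatment periods.

```python
from collections import defaultdict

def twfe_coefficient(records):
    """Two-way fixed effects slope via double demeaning (balanced panel only).

    records: list of (unit, period, outcome, d), where d = 1 when the unit
    is under treatment in that period and 0 otherwise.
    """
    def demean(index):
        # index 2 selects the outcome, index 3 the treatment indicator
        overall = sum(r[index] for r in records) / len(records)
        by_unit, by_period = defaultdict(list), defaultdict(list)
        for r in records:
            by_unit[r[0]].append(r[index])
            by_period[r[1]].append(r[index])
        unit_mean = {u: sum(v) / len(v) for u, v in by_unit.items()}
        period_mean = {p: sum(v) / len(v) for p, v in by_period.items()}
        return [r[index] - unit_mean[r[0]] - period_mean[r[1]] + overall
                for r in records]

    y_tilde, d_tilde = demean(2), demean(3)
    return (sum(y * d for y, d in zip(y_tilde, d_tilde))
            / sum(d * d for d in d_tilde))

# Synthetic panel: outcome = unit effect + year effect + 2.0 * d
unit_effect = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
year_effect = {2019: 0.0, 2020: 1.0, 2021: 3.0}
records = []
for unit, alpha_i in unit_effect.items():
    for year, gamma_t in year_effect.items():
        d = 1 if unit in ("A", "B") and year == 2021 else 0
        records.append((unit, year, alpha_i + gamma_t + 2.0 * d, d))

beta = twfe_coefficient(records)
print(round(beta, 6))  # 2.0, the effect built into the synthetic data
```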
| 146 | + |
| 147 | +## Working with Results |
| 148 | + |
| 149 | +### Export Results |
| 150 | + |
| 151 | +```python |
| 152 | +# As dictionary |
| 153 | +results.to_dict() |
| 154 | +# {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...} |
| 155 | + |
| 156 | +# As DataFrame |
| 157 | +df = results.to_dataframe() |
| 158 | +``` |
| 159 | + |
| 160 | +### Check Significance |
| 161 | + |
| 162 | +```python |
| 163 | +if results.is_significant: |
| 164 | +    print(f"Effect is significant at {did.alpha} level") |
| 165 | + |
| 166 | +# Get significance stars |
| 167 | +print(f"ATT: {results.att:.4f}{results.significance_stars}") |
| 168 | +# ATT: 3.5000* |
| 169 | +``` |
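
The star codes follow the legend printed at the bottom of the summary (`'***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1`). A minimal sketch of that mapping is shown below. The function is a standalone illustration rather than the library's code, and the use of strict inequalities at each threshold is an assumption.

```python
def significance_stars(p_value):
    """Map a p-value to the star codes listed in the summary legend."""
    thresholds = ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, "."))
    for threshold, stars in thresholds:
        # Assumption: a p-value must fall strictly below the threshold.
        if p_value < threshold:
            return stars
    return ""

label = f"ATT: {3.5:.4f}{significance_stars(0.0367)}"
print(label)  # ATT: 3.5000*
```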
| 170 | + |
| 171 | +### Access Full Regression Output |
| 172 | + |
| 173 | +```python |
| 174 | +# All coefficients |
| 175 | +results.coefficients |
| 176 | +# {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5} |
| 177 | + |
| 178 | +# Variance-covariance matrix |
| 179 | +results.vcov |
| 180 | + |
| 181 | +# Residuals and fitted values |
| 182 | +results.residuals |
| 183 | +results.fitted_values |
| 184 | + |
| 185 | +# R-squared |
| 186 | +results.r_squared |
| 187 | +``` |
| 188 | + |
| 189 | +## Checking Assumptions |
| 190 | + |
| 191 | +### Parallel Trends |
| 192 | + |
| 193 | +```python |
| 194 | +from diff_diff.utils import check_parallel_trends |
| 195 | + |
| 196 | +trends = check_parallel_trends( |
| 197 | +    data, |
| 198 | +    outcome='outcome', |
| 199 | +    time='period', |
| 200 | +    treatment_group='treated' |
| 201 | +) |
| 202 | + |
| 203 | +print(f"Treated trend: {trends['treated_trend']:.4f}") |
| 204 | +print(f"Control trend: {trends['control_trend']:.4f}") |
| 205 | +print(f"Difference p-value: {trends['p_value']:.4f}") |
| 206 | +print(f"Parallel trends plausible: {trends['parallel_trends_plausible']}") |
| 207 | +``` |
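
One simple way to probe parallel trends is to fit a linear time trend to each group's pre-treatment outcomes and compare the two slopes. The returned keys `treated_trend` and `control_trend` suggest a comparison of this kind, but the README does not show how `check_parallel_trends` computes them, so the sketch below is an assumption about the general idea rather than a description of the utility. `ols_slope`, `group_slope`, and the data are hypothetical, and no p-value is computed.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def group_slope(rows, treated):
    """Linear time trend of the outcome within one group."""
    xs = [p for p, d, _ in rows if d == treated]
    ys = [y for _, d, y in rows if d == treated]
    return ols_slope(xs, ys)

# Hypothetical pre-treatment observations: (period, treated, outcome)
pre = [
    (1, 1, 10.0), (2, 1, 11.0), (3, 1, 12.0),
    (1, 0, 8.0), (2, 0, 9.1), (3, 0, 9.9),
]

treated_trend = group_slope(pre, 1)
control_trend = group_slope(pre, 0)
print(f"Treated trend: {treated_trend:.4f}")
print(f"Control trend: {control_trend:.4f}")
print(f"Difference: {treated_trend - control_trend:.4f}")
```

Similar pre-period slopes are consistent with parallel trends but do not prove the assumption, which concerns what the treated group would have done after treatment had it not been treated.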
| 208 | + |
| 209 | +## API Reference |
| 210 | + |
| 211 | +### DifferenceInDifferences |
| 212 | + |
| 213 | +```python |
| 214 | +DifferenceInDifferences( |
| 215 | +    robust=True,   # Use HC1 robust standard errors |
| 216 | +    cluster=None,  # Column for cluster-robust SEs |
| 217 | +    alpha=0.05     # Significance level for CIs |
| 218 | +) |
| 219 | +``` |
| 220 | + |
| 221 | +**Methods:** |
| 222 | + |
| 223 | +| Method | Description | |
| 224 | +|--------|-------------| |
| 225 | +| `fit(data, outcome, treatment, time, formula, covariates)` | Fit the DiD model | |
| 226 | +| `summary()` | Get formatted summary string | |
| 227 | +| `print_summary()` | Print summary to stdout | |
| 228 | +| `get_params()` | Get estimator parameters (sklearn-compatible) | |
| 229 | +| `set_params(**params)` | Set estimator parameters (sklearn-compatible) | |
| 230 | + |
| 231 | +### DiDResults |
| 232 | + |
| 233 | +**Attributes:** |
| 234 | + |
| 235 | +| Attribute | Description | |
| 236 | +|-----------|-------------| |
| 237 | +| `att` | Average Treatment effect on the Treated | |
| 238 | +| `se` | Standard error of ATT | |
| 239 | +| `t_stat` | T-statistic | |
| 240 | +| `p_value` | P-value for H0: ATT = 0 | |
| 241 | +| `conf_int` | Tuple of (lower, upper) confidence bounds | |
| 242 | +| `n_obs` | Number of observations | |
| 243 | +| `n_treated` | Number of treated units | |
| 244 | +| `n_control` | Number of control units | |
| 245 | +| `r_squared` | R-squared of regression | |
| 246 | +| `coefficients` | Dictionary of all coefficients | |
| 247 | +| `is_significant` | Boolean for significance at alpha | |
| 248 | +| `significance_stars` | String of significance stars | |
| 249 | + |
| 250 | +**Methods:** |
| 251 | + |
| 252 | +| Method | Description | |
| 253 | +|--------|-------------| |
| 254 | +| `summary(alpha)` | Get formatted summary string | |
| 255 | +| `print_summary(alpha)` | Print summary to stdout | |
| 256 | +| `to_dict()` | Convert to dictionary | |
| 257 | +| `to_dataframe()` | Convert to pandas DataFrame | |
| 258 | + |
| 259 | +## Requirements |
| 260 | + |
| 261 | +- Python >= 3.9 |
| 262 | +- numpy >= 1.20 |
| 263 | +- pandas >= 1.3 |
| 264 | +- scipy >= 1.7 |
| 265 | + |
| 266 | +## Development |
| 267 | + |
| 268 | +```bash |
| 269 | +# Install with dev dependencies |
| 270 | +pip install -e ".[dev]" |
| 271 | + |
| 272 | +# Run tests |
| 273 | +pytest |
| 274 | + |
| 275 | +# Format and lint code |
| 276 | +black diff_diff tests |
| 277 | +ruff check diff_diff tests |
| 278 | +``` |
| 279 | + |
| 280 | +## License |
| 281 | + |
| 282 | +MIT License |