Commit fbc0ec9

Add initial diff-diff library implementation
Implement difference-in-differences (DiD) library with:
- DifferenceInDifferences estimator with sklearn-like API (fit, get_params, set_params)
- DiDResults class with statsmodels-style output (summary tables, coefficients, p-values)
- Support for formula interface (R-style) and column name interface
- Heteroskedasticity-robust (HC1) and cluster-robust standard errors
- TwoWayFixedEffects estimator for panel data
- Utility functions for parallel trends testing
- Comprehensive test suite (16 tests)
- pyproject.toml for modern Python packaging
1 parent d1306c4 commit fbc0ec9

9 files changed

Lines changed: 1768 additions & 1 deletion

.gitignore

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Testing
.tox/
.coverage
.coverage.*
htmlcov/
.pytest_cache/
.hypothesis/

# Environments
.env
.venv
env/
venv/
ENV/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# Jupyter
.ipynb_checkpoints/

# OS
.DS_Store
Thumbs.db

README.md

Lines changed: 280 additions & 1 deletion
@@ -1,3 +1,282 @@
# diff-diff

-A library for computing difference in differences (diff-in-diff or DiD), a quasi-experimental method of statistical analysis.
+A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.

## Installation

```bash
pip install diff-diff
```

Or install from source:

```bash
git clone https://github.com/igerber/diff-diff.git
cd diff-diff
pip install -e .
```

## Quick Start

```python
import pandas as pd
from diff_diff import DifferenceInDifferences

# Create sample data
data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1]
})

# Fit the model
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')

# View results
print(results)  # DiDResults(ATT=3.5000*, SE=1.2583, p=0.0367)
results.print_summary()
```

Example output (illustrative values):
```
======================================================================
             Difference-in-Differences Estimation Results
======================================================================

Observations: 8
Treated units: 4
Control units: 4
R-squared: 0.9123

----------------------------------------------------------------------
Parameter          Estimate   Std. Err.    t-stat     P>|t|
----------------------------------------------------------------------
ATT                  3.5000      1.2583     2.782    0.0367
----------------------------------------------------------------------

95% Confidence Interval: [0.3912, 6.6088]

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
======================================================================
```
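
For intuition, the ATT in a two-group, two-period design is the change in the treated group's mean outcome minus the change in the control group's mean outcome. The sketch below computes that by hand using only the standard library. `cell_mean` and `did_estimate` are illustrative names, not part of diff-diff, and the sketch does not attempt standard errors. For the Quick Start data the four cell means are 10.5, 16.5, 9.5 and 12.5, so the hand calculation gives (16.5 - 10.5) - (12.5 - 9.5) = 3.0. The figures in the sample output above are therefore best read as illustrating the layout.

```python
from statistics import mean

# Quick Start data, written as (outcome, treated, post) rows.
rows = [
    (10, 1, 0), (11, 1, 0), (15, 1, 1), (18, 1, 1),
    (9, 0, 0), (10, 0, 0), (12, 0, 1), (13, 0, 1),
]


def cell_mean(rows, treated, post):
    """Mean outcome for one treatment-by-period cell (illustrative helper)."""
    return mean(y for y, d, t in rows if d == treated and t == post)


def did_estimate(rows):
    """Difference-in-differences computed from the four cell means."""
    treated_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    control_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return treated_change - control_change


att = did_estimate(rows)
print(att)  # 3.0
```

In a regression of the outcome on `treated`, `post` and their interaction, the interaction coefficient equals this quantity when no covariates are included.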

## Features

- **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
- **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals
- **Multiple interfaces**: Column names or R-style formulas
- **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
- **Panel data support**: Two-way fixed effects estimator for panel designs

## Usage

### Basic DiD with Column Names

```python
from diff_diff import DifferenceInDifferences

did = DifferenceInDifferences(robust=True, alpha=0.05)
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post_policy'
)

# Access results
print(f"ATT: {results.att:.4f}")
print(f"Standard Error: {results.se:.4f}")
print(f"P-value: {results.p_value:.4f}")
print(f"95% CI: {results.conf_int}")
print(f"Significant: {results.is_significant}")
```

### Using Formula Interface

```python
# R-style formula syntax
results = did.fit(data, formula='outcome ~ treated * post')

# Explicit interaction syntax
results = did.fit(data, formula='outcome ~ treated + post + treated:post')

# With covariates
results = did.fit(data, formula='outcome ~ treated * post + age + income')
```
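
The first two formulas above describe the same model under the usual R convention, where `a * b` is shorthand for `a + b + a:b`. The sketch below shows that expansion with a small hypothetical helper, `expand_formula`. It is not the parser diff-diff uses, and it only handles `+`, `:` and two-way `*` products.

```python
def expand_formula(formula):
    """Split 'y ~ a * b + c' into (outcome, right-hand-side terms).

    Hypothetical helper for illustration: 'a * b' expands to 'a', 'b'
    and 'a:b'. Only '+', ':' and two-way '*' products are supported.
    """
    lhs, rhs = (part.strip() for part in formula.split("~", 1))
    terms = []
    for chunk in rhs.split("+"):
        chunk = chunk.strip()
        if "*" in chunk:
            left, right = (name.strip() for name in chunk.split("*"))
            candidates = [left, right, f"{left}:{right}"]
        else:
            # Normalise whitespace around ':' in explicit interactions.
            candidates = [":".join(name.strip() for name in chunk.split(":"))]
        for term in candidates:
            if term not in terms:  # keep first occurrence, drop duplicates
                terms.append(term)
    return lhs, terms


print(expand_formula("outcome ~ treated * post"))
print(expand_formula("outcome ~ treated + post + treated:post"))
```

Both calls return the same outcome name and the same three terms, which is why either spelling identifies the DiD effect through the `treated:post` interaction.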

### Including Covariates

```python
results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='post',
    covariates=['age', 'income', 'education']
)
```

### Cluster-Robust Standard Errors

```python
did = DifferenceInDifferences(cluster='state')
results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='post'
)
```

### Two-Way Fixed Effects (Panel Data)

```python
from diff_diff.estimators import TwoWayFixedEffects

twfe = TwoWayFixedEffects()
results = twfe.fit(
    panel_data,
    outcome='outcome',
    treatment='treated',
    time='year',
    unit='firm_id'
)
```
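
As background, a two-way fixed effects coefficient on a balanced panel can be obtained by removing unit means and period means from both the outcome and the treatment indicator, then regressing one on the other. The sketch below does this with the standard library. It is a textbook illustration under stated assumptions, not the `TwoWayFixedEffects` implementation: `twfe_estimate` is a hypothetical name, the panel must be balanced, and the treatment column is assumed to be 1 only for treated units in post-treatment periods.

```python
from collections import defaultdict
from statistics import mean


def twfe_estimate(rows):
    """Two-way fixed effects coefficient on a balanced panel.

    rows: iterable of (unit, period, treated_now, outcome), where
    treated_now is 1 only for treated units in post-treatment periods.
    """
    rows = list(rows)

    def demean(index):
        # Subtract unit and period means, then add back the grand mean.
        by_unit, by_time = defaultdict(list), defaultdict(list)
        for row in rows:
            by_unit[row[0]].append(row[index])
            by_time[row[1]].append(row[index])
        grand = mean(row[index] for row in rows)
        return [
            row[index] - mean(by_unit[row[0]]) - mean(by_time[row[1]]) + grand
            for row in rows
        ]

    d_tilde = demean(2)
    y_tilde = demean(3)
    # Slope of the demeaned outcome on the demeaned treatment indicator.
    return sum(d * y for d, y in zip(d_tilde, y_tilde)) / sum(d * d for d in d_tilde)


# Two firms, two years; firm "A" is treated in the second year.
panel = [
    ("A", 2019, 0, 10), ("A", 2020, 1, 16),
    ("B", 2019, 0, 9), ("B", 2020, 0, 12),
]
beta = twfe_estimate(panel)
print(beta)
```

With two units and two periods this reduces to the simple difference-in-differences, (16 - 10) - (12 - 9).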

## Working with Results

### Export Results

```python
# As dictionary
results.to_dict()
# {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}

# As DataFrame
df = results.to_dataframe()
```

### Check Significance

```python
if results.is_significant:
    print(f"Effect is significant at {did.alpha} level")

# Get significance stars
print(f"ATT: {results.att:.4f}{results.significance_stars}")
# ATT: 3.5000*
```
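
The star codes follow the legend printed at the bottom of the summary (`'***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1`). A minimal sketch of that mapping is shown below. The function name `stars_for` is hypothetical, and the use of strict `<` comparisons at each threshold is an assumption about how boundary values are treated.

```python
def stars_for(p_value):
    """Map a p-value to the star codes shown in the summary legend."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""


print(f"ATT: {3.5:.4f}{stars_for(0.0367)}")  # ATT: 3.5000*
```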

### Access Full Regression Output

```python
# All coefficients
results.coefficients
# {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}

# Variance-covariance matrix
results.vcov

# Residuals and fitted values
results.residuals
results.fitted_values

# R-squared
results.r_squared
```

## Checking Assumptions

### Parallel Trends

```python
from diff_diff.utils import check_parallel_trends

trends = check_parallel_trends(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated'
)

print(f"Treated trend: {trends['treated_trend']:.4f}")
print(f"Control trend: {trends['control_trend']:.4f}")
print(f"Difference p-value: {trends['p_value']:.4f}")
print(f"Parallel trends plausible: {trends['parallel_trends_plausible']}")
```
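
One common way to examine pre-treatment trends is to fit a separate linear time trend to each group and compare the slopes. The sketch below illustrates that idea with the standard library. It is an assumption about the general approach, not a description of what `check_parallel_trends` does internally. `ols_slope` and `pre_trend_gap` are hypothetical names, and the sketch reports only the slopes and their difference, without a p-value.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def pre_trend_gap(rows):
    """Compare group trends.

    rows: (period, treated_group, outcome) for pre-treatment periods only.
    """
    slopes = {}
    for group in (0, 1):
        xs = [t for t, g, _ in rows if g == group]
        ys = [y for _, g, y in rows if g == group]
        slopes[group] = ols_slope(xs, ys)
    return {
        "treated_trend": slopes[1],
        "control_trend": slopes[0],
        "trend_difference": slopes[1] - slopes[0],
    }


# Made-up pre-period data: treated outcomes rise by 2 per period, control by 1.
pre_rows = [
    (1, 1, 10), (2, 1, 12), (3, 1, 14),
    (1, 0, 5), (2, 0, 6), (3, 0, 7),
]
gap = pre_trend_gap(pre_rows)
print(gap)
```

A trend difference far from zero, as in this made-up data, is a warning sign that the groups were already diverging before treatment.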

## API Reference

### DifferenceInDifferences

```python
DifferenceInDifferences(
    robust=True,    # Use HC1 robust standard errors
    cluster=None,   # Column for cluster-robust SEs
    alpha=0.05      # Significance level for CIs
)
```

**Methods:**

| Method | Description |
|--------|-------------|
| `fit(data, outcome, treatment, time, formula, covariates)` | Fit the DiD model |
| `summary()` | Get formatted summary string |
| `print_summary()` | Print summary to stdout |
| `get_params()` | Get estimator parameters (sklearn-compatible) |
| `set_params(**params)` | Set estimator parameters (sklearn-compatible) |
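
The `get_params()` / `set_params()` pair follows the scikit-learn convention: constructor arguments are stored under their own names, `get_params()` returns them as a dictionary, and `set_params()` updates them and returns the estimator. The class below is a hypothetical stand-in, `SketchEstimator`, that shows the convention with the three constructor arguments listed above. It is not the diff-diff source, and the error raised for unknown names is an assumption.

```python
class SketchEstimator:
    """Hypothetical stand-in showing the sklearn parameter convention."""

    def __init__(self, robust=True, cluster=None, alpha=0.05):
        # Constructor arguments are stored unchanged under the same names.
        self.robust = robust
        self.cluster = cluster
        self.alpha = alpha

    def get_params(self, deep=True):
        """Return the constructor parameters as a dictionary."""
        return {"robust": self.robust, "cluster": self.cluster, "alpha": self.alpha}

    def set_params(self, **params):
        """Update parameters in place and return self so calls can be chained."""
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name!r}")
            setattr(self, name, value)
        return self


est = SketchEstimator().set_params(alpha=0.1, cluster="state")
print(est.get_params())
```

Returning `self` from `set_params()` is what allows the chained call in the last lines.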

### DiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `att` | Average Treatment effect on the Treated |
| `se` | Standard error of ATT |
| `t_stat` | T-statistic |
| `p_value` | P-value for H0: ATT = 0 |
| `conf_int` | Tuple of (lower, upper) confidence bounds |
| `n_obs` | Number of observations |
| `n_treated` | Number of treated units |
| `n_control` | Number of control units |
| `r_squared` | R-squared of regression |
| `coefficients` | Dictionary of all coefficients |
| `is_significant` | Boolean for significance at alpha |
| `significance_stars` | String of significance stars |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |

## Requirements

- Python >= 3.9
- numpy >= 1.20
- pandas >= 1.3
- scipy >= 1.7

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format and lint code
black diff_diff tests
ruff check diff_diff tests
```

## License

MIT License

diff_diff/__init__.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
"""
diff-diff: A library for Difference-in-Differences analysis.

This library provides sklearn-like estimators for causal inference
using the difference-in-differences methodology.
"""

from diff_diff.estimators import DifferenceInDifferences
from diff_diff.results import DiDResults

__version__ = "0.1.0"
__all__ = ["DifferenceInDifferences", "DiDResults"]
