
Commit d7043f0

Merge main into spillover-conley-wave-c-event-study to resolve [Unreleased] CHANGELOG conflict (PR #457 BaconDecomposition R parity goldens)
# Conflicts:
#	CHANGELOG.md
2 parents eb35ccf + 25d5ed4 commit d7043f0

19 files changed

Lines changed: 1303 additions & 310 deletions

CHANGELOG.md

Lines changed: 4 additions & 2 deletions
Large diffs are not rendered by default.

METHODOLOGY_REVIEW.md

Lines changed: 27 additions & 24 deletions
Large diffs are not rendered by default.

TODO.md

Lines changed: 2 additions & 7 deletions
@@ -74,19 +74,17 @@ Deferred items from PR reviews that were not addressed before merge.
 
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
-| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium |
 | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
 | dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | dCDH by_path: survey-aware backward-horizon (`placebo + predict_het + survey_design`) raises `NotImplementedError` because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
-| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
-| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
 | Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega*`_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` || Low |
-| TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` || Low |
+| TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low |
+| TripleDifference power auto-routing: `power.simulate_power` ignores `n_periods` for DDD because `_ddd_dgp_kwargs` is hard-coded to the cross-sectional `generate_ddd_data`. Now that `generate_ddd_panel_data` exists (Wave 4), add a new `_EstimatorProfile` registry entry (or extend the existing one) to route to the panel DGP when `n_periods > 2`. | `power.py`, `prep_dgp.py` | follow-up | Low |
 | StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium |
 | StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium |
 | StaggeredTripleDifference: per-cohort group-effect SEs include WIF (conservative vs R's wif=NULL). Documented in REGISTRY. Could override mixin for exact R match. | `staggered_triple_diff.py` | #245 | Low |
@@ -170,9 +168,6 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier A — Quick wins (≤1 day, ≤3 CI rounds expected)
 
-- EfficientDiD `control_group="last_cohort"` REGISTRY-vs-code alignment with `anticipation>0` (`efficient_did.py`, one design decision)
-- TripleDifference: add `generate_ddd_panel_data` for panel DDD power analysis (`prep_dgp.py`, `power.py`)
-- TROP: extract shared data-setup helper between `fit()` and `_fit_global()` (~150 LoC dedup; `trop.py`, `trop_global.py`, `trop_local.py`)
 - WooldridgeDiD: optional efficiency hint when method/outcome pairing is sub-optimal (NOT a canonical-link violation per W2023 Prop 3.1 — see Methodology/Correctness row for the corrected framing)
 
 (SyntheticDiD `placebo_effects` → `variance_effects` rename moved to Tier B — the user-facing field rename + one-release deprecation alias is too large for ≤1 day / ≤3 CI rounds.)

benchmarks/R/generate_bacon_golden.R

Lines changed: 44 additions & 12 deletions
@@ -7,9 +7,21 @@
 #
 # The diff-diff BaconDecomposition implementation (`diff_diff/bacon.py`) with
 # the default ``weights="exact"`` is expected to match the values in this JSON
-# to atol=1e-6 on the per-component (treated, control, type) tuples, and to
-# match the TWFE coefficient to the same tolerance. The ``weights="approximate"``
-# path is a library-only optimization and is NOT covered by this parity harness.
+# at atol=1e-6 along a three-tier contract:
+# (1) aggregate TWFE coefficient + weights-sum on all 3 fixtures;
+# (2) direct per-component (treated, control, type) parity on the 2
+# non-remap fixtures AND on the 6 timing-vs-timing rows of
+# `always_treated_remapped`;
+# (3) cohort-level fold-back parity for the U bucket on
+# `always_treated_remapped` — Python's paper-footnote-11 remap folds
+# R's separate `Later vs Always Treated` + `Treated vs Untreated`
+# rows into a single `treated_vs_never` cell per cohort, so the
+# aggregate is invariant per Theorem 1 but the per-component
+# breakdown differs by convention. See REGISTRY notes:
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# The ``weights="approximate"`` path is a library-only optimization and is
+# NOT covered by this parity harness.
 #
 # Three fixtures:
 # 1. uniform_3groups_with_never_treated — 3 timing groups + never-treated U;
@@ -18,8 +30,8 @@
 # 2. two_groups_no_never_treated — 2 timing groups only; tests the
 # timing-only decomposition where the s_{kU} terms drop.
 # 3. always_treated_remapped — 3 timing groups + 1 always-treated cohort
-# (first_treat = 1). Validates that Python's warn+remap of t_i < 1 into
-# U matches R bacondecomp's native behavior.
+# (first_treat = 1). Validates the convention-divergent U-bucket
+# fold-back on Python's warn+remap of always-treated units into U.
 #
 # Run:
 # cd benchmarks/R && Rscript generate_bacon_golden.R
@@ -193,11 +205,21 @@ df2 <- build_panel(
 fixture_2 <- extract_bacon(df2, "two_groups_no_never_treated")
 
 cat("Building fixture 3: always_treated_remapped...\n")
-# 3 timing-cohorts + 5 always-treated units (first_treat = 1, i.e., treated
-# in every observable period) + 30 never-treated. R's bacondecomp natively
-# groups the first_treat=1 cohort with U (since they are treated throughout
-# every observable period and never serve as a within-window control), which
-# matches what diff-diff's warn+remap does in Python.
+# 3 timing-cohorts (3, 4, 5) + 5 always-treated units (first_treat = 1, i.e.,
+# treated in every observable period) + 25 never-treated. R's bacondecomp
+# keeps the first_treat=1 cohort as a *separate* timing cohort (not in U) and
+# emits a `Later vs Always Treated` comparison row for each later cohort
+# alongside the standard `Treated vs Untreated` row. Python's paper-footnote-11
+# convention remaps these units into the U bucket and folds R's two columns
+# of components into a single `treated_vs_never` cell per treated cohort.
+# The aggregate (TWFE coefficient + weights-sum) is invariant per Theorem 1,
+# but the per-component breakdown differs by convention — see REGISTRY
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# `tests/test_methodology_bacon.py::TestBaconParityR` carves out the U-bucket
+# rows for direct per-component parity (keeping the 6 timing-vs-timing rows
+# under direct parity) and asserts the U-bucket fold-back separately via
+# `test_always_treated_remapped_fold_back_matches_r` at atol=1e-6.
 df3 <- build_panel(
 n_units_per_cohort = 25L,
 n_periods = 6L,
@@ -220,8 +242,18 @@ out <- list(
 r_version = R.version.string,
 description = paste(
 "Goodman-Bacon (2021) decomposition parity goldens for diff-diff",
-"BaconDecomposition. Parity target: atol=1e-6 on per-component",
-"(treated, control, type) tuples plus the TWFE coefficient."
+"BaconDecomposition. Parity target at atol=1e-6:",
+"(1) aggregate TWFE coefficient + weights-sum across all 3 fixtures;",
+"(2) direct per-component (treated, control, type) parity on the 2",
+"non-remap fixtures AND on the 6 timing-vs-timing rows of",
+"always_treated_remapped;",
+"(3) cohort-level fold-back parity for the U bucket on",
+"always_treated_remapped (Python's paper-footnote-11 remap folds",
+"R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows",
+"into a single treated_vs_never cell per cohort; aggregate is",
+"invariant per Theorem 1, breakdown differs by convention).",
+"See REGISTRY Note (R parity convention divergence on always-treated)",
+"+ Deviation (first-period boundary extension)."
 )
 ),
 uniform_3groups_with_never_treated = fixture_1,
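
Note on tier (3): one way to picture the fold-back described in the comments above is to collapse, for each treated cohort, R's `Later vs Always Treated` and `Treated vs Untreated` rows into a single cell whose weight is the sum of the two weights and whose estimate is their weight-averaged estimate. The Python sketch below illustrates only that arithmetic. It is not the code in `tests/test_methodology_bacon.py`; the function name `fold_u_bucket` is hypothetical, the row keys are assumed to mirror the `treated` / `type` / `weight` / `estimate` columns of a `bacondecomp` result, and the numbers are made up.

```python
from collections import defaultdict

# Row types that, under the assumption stated above, land in Python's
# single `treated_vs_never` cell for a given treated cohort.
U_BUCKET_TYPES = {"Later vs Always Treated", "Treated vs Untreated"}


def fold_u_bucket(rows):
    """Fold R-style U-bucket rows into one (weight, estimate) cell per cohort.

    `rows` is a list of dicts with keys: treated, type, weight, estimate.
    Rows of any other type (timing-vs-timing) are ignored here.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cohort -> [sum w, sum w * estimate]
    for row in rows:
        if row["type"] in U_BUCKET_TYPES:
            sums[row["treated"]][0] += row["weight"]
            sums[row["treated"]][1] += row["weight"] * row["estimate"]
    folded = {}
    for cohort, (w, wb) in sums.items():
        # Weight-averaged estimate; guard against an all-zero-weight cohort.
        folded[cohort] = {"weight": w, "estimate": wb / w if w else float("nan")}
    return folded


# Made-up illustrative rows for a single treated cohort (first_treat = 3).
example_rows = [
    {"treated": 3, "type": "Later vs Always Treated", "weight": 0.10, "estimate": 1.0},
    {"treated": 3, "type": "Treated vs Untreated", "weight": 0.20, "estimate": 2.5},
    {"treated": 3, "type": "Earlier vs Later Treated", "weight": 0.05, "estimate": 9.0},
]
folded_example = fold_u_bucket(example_rows)
```

Under this folding rule the summed weight and the summed weight-times-estimate product are the same before and after, which is the sense in which the aggregate stays invariant while the per-component rows differ.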

benchmarks/data/r_bacondecomp_golden.json

Lines changed: 211 additions & 0 deletions
Large diffs are not rendered by default.

diff_diff/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -125,6 +125,7 @@
 generate_continuous_did_data,
 generate_did_data,
 generate_ddd_data,
+generate_ddd_panel_data,
 generate_event_study_data,
 generate_factor_data,
 generate_panel_data,
@@ -409,6 +410,7 @@
 "generate_staggered_data",
 "generate_factor_data",
 "generate_ddd_data",
+"generate_ddd_panel_data",
 "generate_panel_data",
 "generate_event_study_data",
 "generate_staggered_ddd_data",

diff_diff/bacon.py

Lines changed: 19 additions & 4 deletions
@@ -475,7 +475,15 @@ def fit(
 excluding the never-treated sentinels ``0`` and ``np.inf``)
 are automatically remapped to the ``U`` (untreated) bucket
 per Goodman-Bacon (2021) footnote 11, with a
-``UserWarning``. Detection uses ordered-time logic on the
+``UserWarning``. **Library boundary extension:** the paper
+uses the strict inequality ``t_i < 1`` (units treated
+*before* the first observable period); the library uses the
+**inclusive** ``first_treat <= min(time)`` rule, additionally
+folding units treated *at* the first observable period
+(``first_treat == min(time)``) into ``U`` because such units
+have no untreated cell in-panel. See REGISTRY's
+``**Deviation (first-period boundary extension on
+always-treated remap)**`` block for the full contract. Detection uses ordered-time logic on the
 **time axis** so panels whose ``time`` column contains
 negative or zero-crossing labels (e.g. event-time
 ``time ∈ [-2,..,3]``) are handled correctly; the ``0``
@@ -1302,9 +1310,16 @@ def bacon_decompose(
 >>> from diff_diff import bacon_decompose
 >>>
 >>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights
->>> # (weights="exact"); intended to match R bacondecomp::bacon() at
->>> # atol=1e-6 (R parity goldens pending — see TODO.md "R parity
->>> # goldens generation" for the deferred validation step).
+>>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on
+>>> # the aggregate (TWFE coefficient + weights-sum) across all panels,
+>>> # and on the per-component breakdown when there are no
+>>> # always-treated / first-period-treated cohorts (i.e. all
+>>> # non-sentinel first_treat values are strictly greater than
+>>> # min(time)). For panels with always-treated units, the
+>>> # per-component breakdown diverges by convention (Python remaps
+>>> # to U per paper footnote 11; R emits `Later vs Always Treated`);
+>>> # see REGISTRY note on R parity convention divergence. Validated
+>>> # via tests/test_methodology_bacon.py::TestBaconParityR.
 >>> results = bacon_decompose(
 ... data=panel_df,
 ... outcome='earnings',
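
Note on the boundary rule: the difference between the paper's strict inequality and the library's inclusive `first_treat <= min(time)` rule in the `fit()` docstring above can be shown with a small predicate. This is a minimal sketch, not the detection code in `diff_diff/bacon.py`; the helper name `remaps_to_u` is hypothetical, and it does not model the ordered-time handling of zero-crossing time labels that the docstring mentions (here `0` is always read as a never-treated sentinel).

```python
import math


def remaps_to_u(first_treat, min_time, inclusive=True):
    """Return True if a unit would be folded into the U bucket.

    inclusive=True  -> rule described for the library: first_treat <= min(time)
    inclusive=False -> strict reading of the paper's rule, i.e. treated
                       before the first observable period
    The never-treated sentinels 0 and inf are already in U and are excluded.
    """
    if first_treat == 0 or first_treat == math.inf:
        return False
    if inclusive:
        return first_treat <= min_time
    return first_treat < min_time


# Panel observed over periods 1..6, so min(time) = 1.
min_time = 1
treated_at_first_period = remaps_to_u(1, min_time)            # inclusive rule
treated_at_first_period_strict = remaps_to_u(1, min_time, inclusive=False)
```

The two rules only disagree for units with `first_treat == min(time)`; the docstring's stated reason for folding them is that such units have no untreated cell in the panel.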

diff_diff/efficient_did.py

Lines changed: 8 additions & 4 deletions
@@ -162,9 +162,11 @@ class EfficientDiD(EfficientDiDBootstrapMixin):
 Which units serve as the comparison group:
 ``"never_treated"`` requires a never-treated cohort (raises if
 none exist); ``"last_cohort"`` reclassifies the latest treatment
-cohort as pseudo-never-treated and drops post-treatment periods
-for that cohort. Distinct from CallawaySantAnna's
-``"not_yet_treated"`` — see REGISTRY.md for details.
+cohort as pseudo-never-treated and drops periods at
+``t >= last_g - anticipation`` so the pseudo-control's
+pre-treatment window excludes anticipation-contaminated periods.
+Distinct from CallawaySantAnna's ``"not_yet_treated"`` — see
+REGISTRY.md for details.
 n_bootstrap : int, default 0
 Number of multiplier bootstrap iterations (0 = analytical only).
 bootstrap_weights : str, default ``"rademacher"``
@@ -173,7 +175,9 @@ class EfficientDiD(EfficientDiDBootstrapMixin):
 Random seed for reproducibility.
 anticipation : int, default 0
 Number of anticipation periods (shifts the effective treatment
-boundary forward by this amount).
+boundary forward by this amount). When combined with
+``control_group="last_cohort"``, also trims the pseudo-control
+period set at ``t >= last_g - anticipation`` (see REGISTRY.md).
 sieve_k_max : int or None
 Maximum polynomial degree for sieve ratio estimation. None = auto
 (``min(floor(n_gp^{1/5}), 5)``). Only used with covariates.
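
Note on the `last_cohort` trim: the rule stated in the docstring above, dropping periods with `t >= last_g - anticipation`, amounts to a simple cutoff on the period set. The sketch below shows only that cutoff; the helper name `kept_periods_last_cohort` is hypothetical and this is not the code in `diff_diff/efficient_did.py`.

```python
def kept_periods_last_cohort(periods, last_g, anticipation=0):
    """Periods retained when the last cohort serves as pseudo-never-treated.

    Drops every period t with t >= last_g - anticipation, so that with
    anticipation > 0 the periods just before last_g are excluded as well.
    """
    cutoff = last_g - anticipation
    return [t for t in periods if t < cutoff]


periods = [1, 2, 3, 4, 5, 6]
kept_default = kept_periods_last_cohort(periods, last_g=5)                  # anticipation = 0
kept_anticip = kept_periods_last_cohort(periods, last_g=5, anticipation=1)  # one earlier period dropped
```

With `anticipation=0` the cutoff coincides with `t >= last_g`, which matches the removed TODO row's remark that the two formulations are identical in the default case.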

diff_diff/prep.py

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@
 from diff_diff.prep_dgp import ( # noqa: F401
 generate_continuous_did_data,
 generate_ddd_data,
+generate_ddd_panel_data,
 generate_did_data,
 generate_event_study_data,
 generate_factor_data,
