igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 4 additions & 2 deletions
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
Lines changed: 27 additions & 24 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 2 additions & 7 deletions
diff --git a/benchmarks/R/generate_bacon_golden.R b/benchmarks/R/generate_bacon_golden.R
Lines changed: 44 additions & 12 deletions
diff --git a/benchmarks/data/r_bacondecomp_golden.json b/benchmarks/data/r_bacondecomp_golden.json
Lines changed: 211 additions & 0 deletions
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
Lines changed: 2 additions & 0 deletions
diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py
Lines changed: 19 additions & 4 deletions
diff --git a/diff_diff/efficient_did.py b/diff_diff/efficient_did.py
Lines changed: 8 additions & 4 deletions
diff --git a/diff_diff/prep.py b/diff_diff/prep.py
Lines changed: 1 addition & 0 deletions
@@ -74,19 +74,17 @@ Deferred items from PR reviews that were not addressed before merge.
 
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
-| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium |
 | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
 | dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | dCDH by_path: survey-aware backward-horizon (`placebo + predict_het + survey_design`) raises `NotImplementedError` because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
-| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
-| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
 | Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
-| TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |
+| TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low |
+| TripleDifference power auto-routing: `power.simulate_power` ignores `n_periods` for DDD because `_ddd_dgp_kwargs` is hard-coded to the cross-sectional `generate_ddd_data`. Now that `generate_ddd_panel_data` exists (Wave 4), add a new `_EstimatorProfile` registry entry (or extend the existing one) to route to the panel DGP when `n_periods > 2`. | `power.py`, `prep_dgp.py` | follow-up | Low |
 | StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium |
 | StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium |
 | StaggeredTripleDifference: per-cohort group-effect SEs include WIF (conservative vs R's wif=NULL). Documented in REGISTRY. Could override mixin for exact R match. | `staggered_triple_diff.py` | #245 | Low |
@@ -170,9 +168,6 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier A — Quick wins (≤1 day, ≤3 CI rounds expected)
 
-- EfficientDiD `control_group="last_cohort"` REGISTRY-vs-code alignment with `anticipation>0` (`efficient_did.py`, one design decision)
-- TripleDifference: add `generate_ddd_panel_data` for panel DDD power analysis (`prep_dgp.py`, `power.py`)
-- TROP: extract shared data-setup helper between `fit()` and `_fit_global()` (~150 LoC dedup; `trop.py`, `trop_global.py`, `trop_local.py`)
 - WooldridgeDiD: optional efficiency hint when method/outcome pairing is sub-optimal (NOT a canonical-link violation per W2023 Prop 3.1 — see Methodology/Correctness row for the corrected framing)
 
 (SyntheticDiD `placebo_effects` → `variance_effects` rename moved to Tier B — the user-facing field rename + one-release deprecation alias is too large for ≤1 day / ≤3 CI rounds.)
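
The new TROP follow-up row proposes deduplicating the bootstrap resampling loop behind a `fit_callable` parameter. A minimal sketch of that shape follows, using only the standard library. Everything here is an assumption for illustration: `bootstrap_variance`, its arguments, and the toy `diff_in_means` refit are hypothetical and are not the library's `_bootstrap_variance` / `_bootstrap_variance_global` signatures.

```python
import random
import statistics
from typing import Callable, List, Sequence


def bootstrap_variance(
    treated_units: Sequence[str],
    control_units: Sequence[str],
    fit_callable: Callable[[List[str]], float],
    n_bootstrap: int = 200,
    seed: int = 0,
) -> float:
    """Stratified unit bootstrap shared by several estimators (hypothetical).

    Treated and control units are resampled separately, with replacement,
    so every draw keeps both strata non-empty. The per-draw refit is
    delegated to ``fit_callable``, which is the only estimator-specific part.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bootstrap):
        draw = rng.choices(treated_units, k=len(treated_units))
        draw += rng.choices(control_units, k=len(control_units))
        estimates.append(fit_callable(draw))
    # Sample variance of the refitted estimates across draws.
    return statistics.variance(estimates)


# Toy data: one outcome per unit; the "estimator" is a difference in means.
outcomes = {"t1": 3.0, "t2": 5.0, "c1": 1.0, "c2": 2.0, "c3": 0.0}
treated = ["t1", "t2"]
control = ["c1", "c2", "c3"]


def diff_in_means(units: List[str]) -> float:
    t = [outcomes[u] for u in units if u.startswith("t")]
    c = [outcomes[u] for u in units if u.startswith("c")]
    return statistics.mean(t) - statistics.mean(c)


var_estimate = bootstrap_variance(treated, control, diff_in_means)
```

With this split, a local and a global variant would differ only in the callable they pass, which is the point of the proposed dedup.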
 
@@ -7,9 +7,21 @@
 #
 # The diff-diff BaconDecomposition implementation (`diff_diff/bacon.py`) with
 # the default ``weights="exact"`` is expected to match the values in this JSON
-# to atol=1e-6 on the per-component (treated, control, type) tuples, and to
-# match the TWFE coefficient to the same tolerance. The ``weights="approximate"``
-# path is a library-only optimization and is NOT covered by this parity harness.
+# at atol=1e-6 along a three-tier contract:
+#   (1) aggregate TWFE coefficient + weights-sum on all 3 fixtures;
+#   (2) direct per-component (treated, control, type) parity on the 2
+#       non-remap fixtures AND on the 6 timing-vs-timing rows of
+#       `always_treated_remapped`;
+#   (3) cohort-level fold-back parity for the U bucket on
+#       `always_treated_remapped` — Python's paper-footnote-11 remap folds
+#       R's separate `Later vs Always Treated` + `Treated vs Untreated`
+#       rows into a single `treated_vs_never` cell per cohort, so the
+#       aggregate is invariant per Theorem 1 but the per-component
+#       breakdown differs by convention. See REGISTRY notes:
+#       `**Note (R parity convention divergence on always-treated)**` and
+#       `**Deviation (first-period boundary extension on always-treated remap)**`.
+# The ``weights="approximate"`` path is a library-only optimization and is
+# NOT covered by this parity harness.
 #
 # Three fixtures:
 #   1. uniform_3groups_with_never_treated — 3 timing groups + never-treated U;
@@ -18,8 +30,8 @@
 #   2. two_groups_no_never_treated — 2 timing groups only; tests the
 #      timing-only decomposition where the s_{kU} terms drop.
 #   3. always_treated_remapped — 3 timing groups + 1 always-treated cohort
-#      (first_treat = 1). Validates that Python's warn+remap of t_i < 1 into
-#      U matches R bacondecomp's native behavior.
+#      (first_treat = 1). Validates the convention-divergent U-bucket
+#      fold-back on Python's warn+remap of always-treated units into U.
 #
 # Run:
 #   cd benchmarks/R && Rscript generate_bacon_golden.R
@@ -193,11 +205,21 @@ df2 <- build_panel(
 fixture_2 <- extract_bacon(df2, "two_groups_no_never_treated")
 
 cat("Building fixture 3: always_treated_remapped...\n")
-# 3 timing-cohorts + 5 always-treated units (first_treat = 1, i.e., treated
-# in every observable period) + 30 never-treated. R's bacondecomp natively
-# groups the first_treat=1 cohort with U (since they are treated throughout
-# every observable period and never serve as a within-window control), which
-# matches what diff-diff's warn+remap does in Python.
+# 3 timing-cohorts (3, 4, 5) + 5 always-treated units (first_treat = 1, i.e.,
+# treated in every observable period) + 25 never-treated. R's bacondecomp
+# keeps the first_treat=1 cohort as a *separate* timing cohort (not in U) and
+# emits a `Later vs Always Treated` comparison row for each later cohort
+# alongside the standard `Treated vs Untreated` row. Python's paper-footnote-11
+# convention remaps these units into the U bucket and folds R's two columns
+# of components into a single `treated_vs_never` cell per treated cohort.
+# The aggregate (TWFE coefficient + weights-sum) is invariant per Theorem 1,
+# but the per-component breakdown differs by convention — see REGISTRY
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# `tests/test_methodology_bacon.py::TestBaconParityR` excludes the U-bucket
+# rows from direct per-component parity (keeping the 6 timing-vs-timing rows
+# under direct parity) and asserts the U-bucket fold-back separately via
+# `test_always_treated_remapped_fold_back_matches_r` at atol=1e-6.
 df3 <- build_panel(
   n_units_per_cohort   = 25L,
   n_periods            = 6L,
@@ -220,8 +242,18 @@ out <- list(
     r_version            = R.version.string,
     description          = paste(
       "Goodman-Bacon (2021) decomposition parity goldens for diff-diff",
-      "BaconDecomposition. Parity target: atol=1e-6 on per-component",
-      "(treated, control, type) tuples plus the TWFE coefficient."
+      "BaconDecomposition. Parity target at atol=1e-6:",
+      "(1) aggregate TWFE coefficient + weights-sum across all 3 fixtures;",
+      "(2) direct per-component (treated, control, type) parity on the 2",
+      "non-remap fixtures AND on the 6 timing-vs-timing rows of",
+      "always_treated_remapped;",
+      "(3) cohort-level fold-back parity for the U bucket on",
+      "always_treated_remapped (Python's paper-footnote-11 remap folds",
+      "R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows",
+      "into a single treated_vs_never cell per cohort; aggregate is",
+      "invariant per Theorem 1, breakdown differs by convention).",
+      "See REGISTRY Note (R parity convention divergence on always-treated)",
+      "+ Deviation (first-period boundary extension)."
     )
   ),
   uniform_3groups_with_never_treated = fixture_1,
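
Tier (3) of the parity contract folds R's two U-bucket rows per cohort into one cell. The sketch below shows one plausible reading of that fold-back: weights are summed and the estimate is the weight-weighted mean, so each cohort's contribution `weight * estimate` to the TWFE coefficient is preserved. The row keys (`treated`, `type`, `weight`, `estimate`) and the numbers are assumptions for illustration; the actual JSON schema and the logic in `test_always_treated_remapped_fold_back_matches_r` may differ.

```python
import math
from collections import defaultdict

# R comparison types that Python's remap folds into `treated_vs_never`.
U_BUCKET_TYPES = {"Later vs Always Treated", "Treated vs Untreated"}


def fold_back(r_rows):
    """Collapse R's U-bucket rows into one (weight, estimate) per cohort."""
    weight = defaultdict(float)
    contribution = defaultdict(float)
    for row in r_rows:
        if row["type"] in U_BUCKET_TYPES:
            weight[row["treated"]] += row["weight"]
            contribution[row["treated"]] += row["weight"] * row["estimate"]
    # Weighted mean keeps weight * estimate equal to the summed contribution.
    return {g: (weight[g], contribution[g] / weight[g]) for g in weight}


# Made-up rows for a single treated cohort (first_treat = 3).
r_rows = [
    {"treated": 3, "type": "Later vs Always Treated", "weight": 0.10, "estimate": 2.0},
    {"treated": 3, "type": "Treated vs Untreated", "weight": 0.30, "estimate": 4.0},
    {"treated": 3, "type": "Earlier vs Later Treated", "weight": 0.60, "estimate": 1.0},
]

folded = fold_back(r_rows)
folded_weight, folded_estimate = folded[3]
```

The timing-vs-timing row is left out of the fold, matching the contract's split between direct parity (tier 2) and fold-back parity (tier 3).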
 
@@ -125,6 +125,7 @@
     generate_continuous_did_data,
     generate_did_data,
     generate_ddd_data,
+    generate_ddd_panel_data,
     generate_event_study_data,
     generate_factor_data,
     generate_panel_data,
@@ -409,6 +410,7 @@
     "generate_staggered_data",
     "generate_factor_data",
     "generate_ddd_data",
+    "generate_ddd_panel_data",
     "generate_panel_data",
     "generate_event_study_data",
     "generate_staggered_ddd_data",
 
@@ -475,7 +475,16 @@ def fit(
             excluding the never-treated sentinels ``0`` and ``np.inf``)
             are automatically remapped to the ``U`` (untreated) bucket
             per Goodman-Bacon (2021) footnote 11, with a
-            ``UserWarning``. Detection uses ordered-time logic on the
+            ``UserWarning``. **Library boundary extension:** the paper
+            uses the strict inequality ``t_i < 1`` (units treated
+            *before* the first observable period); the library uses the
+            **inclusive** ``first_treat <= min(time)`` rule, additionally
+            folding units treated *at* the first observable period
+            (``first_treat == min(time)``) into ``U`` because such units
+            have no untreated cell in-panel. See REGISTRY's
+            ``**Deviation (first-period boundary extension on
+            always-treated remap)**`` block for the full contract.
+            Detection uses ordered-time logic on the
             **time axis** so panels whose ``time`` column contains
             negative or zero-crossing labels (e.g. event-time
             ``time ∈ [-2,..,3]``) are handled correctly; the ``0``
@@ -1302,9 +1310,16 @@ def bacon_decompose(
     >>> from diff_diff import bacon_decompose
     >>>
     >>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights
-    >>> # (weights="exact"); intended to match R bacondecomp::bacon() at
-    >>> # atol=1e-6 (R parity goldens pending — see TODO.md "R parity
-    >>> # goldens generation" for the deferred validation step).
+    >>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on
+    >>> # the aggregate (TWFE coefficient + weights-sum) across all panels,
+    >>> # and on the per-component breakdown when there are no
+    >>> # always-treated / first-period-treated cohorts (i.e. all
+    >>> # non-sentinel first_treat values are strictly greater than
+    >>> # min(time)). For panels with always-treated units, the
+    >>> # per-component breakdown diverges by convention (Python remaps
+    >>> # to U per paper footnote 11; R emits `Later vs Always Treated`);
+    >>> # see REGISTRY note on R parity convention divergence. Validated
+    >>> # via tests/test_methodology_bacon.py::TestBaconParityR.
     >>> results = bacon_decompose(
     ...     data=panel_df,
     ...     outcome='earnings',
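
The boundary extension described in the `fit()` docstring can be illustrated with a small standalone sketch contrasting the inclusive library rule with the paper's strict rule. `always_treated_mask` is a hypothetical helper, not part of `diff_diff`, and it simplifies by always treating `0` and infinity as never-treated sentinels; it does not reproduce the library's handling of zero-crossing time labels.

```python
import math
from typing import Iterable, List


def always_treated_mask(
    first_treat: Iterable[float],
    time_values: Iterable[float],
    inclusive: bool = True,
) -> List[bool]:
    """Flag cohorts that would be remapped into the U bucket.

    inclusive=True  -> first_treat <= min(time)  (library rule in the docstring)
    inclusive=False -> first_treat <  min(time)  (paper's strict t_i < 1 analogue)
    """
    t_min = min(time_values)
    mask = []
    for ft in first_treat:
        if ft == 0 or math.isinf(ft):
            # Never-treated sentinels are never flagged (simplification).
            mask.append(False)
        elif inclusive:
            mask.append(ft <= t_min)
        else:
            mask.append(ft < t_min)
    return mask


periods = range(1, 7)                 # observable periods 1..6
cohorts = [0, math.inf, 1, 3, 4]      # two sentinels, then first_treat = 1, 3, 4

library_rule = always_treated_mask(cohorts, periods)
paper_rule = always_treated_mask(cohorts, periods, inclusive=False)
```

Only the `first_treat = 1` cohort differs between the two rules: it has no untreated cell in a panel starting at period 1, so the inclusive rule flags it while the strict rule does not.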
 
@@ -162,9 +162,11 @@ class EfficientDiD(EfficientDiDBootstrapMixin):
         Which units serve as the comparison group:
         ``"never_treated"`` requires a never-treated cohort (raises if
         none exist); ``"last_cohort"`` reclassifies the latest treatment
-        cohort as pseudo-never-treated and drops post-treatment periods
-        for that cohort.  Distinct from CallawaySantAnna's
-        ``"not_yet_treated"`` — see REGISTRY.md for details.
+        cohort as pseudo-never-treated and drops periods at
+        ``t >= last_g - anticipation`` so the pseudo-control's
+        pre-treatment window excludes anticipation-contaminated periods.
+        Distinct from CallawaySantAnna's ``"not_yet_treated"`` — see
+        REGISTRY.md for details.
     n_bootstrap : int, default 0
         Number of multiplier bootstrap iterations (0 = analytical only).
     bootstrap_weights : str, default ``"rademacher"``
@@ -173,7 +175,9 @@ class EfficientDiD(EfficientDiDBootstrapMixin):
         Random seed for reproducibility.
     anticipation : int, default 0
         Number of anticipation periods (shifts the effective treatment
-        boundary forward by this amount).
+        boundary forward by this amount). When combined with
+        ``control_group="last_cohort"``, also trims the pseudo-control
+        period set at ``t >= last_g - anticipation`` (see REGISTRY.md).
     sieve_k_max : int or None
         Maximum polynomial degree for sieve ratio estimation. None = auto
         (``min(floor(n_gp^{1/5}), 5)``). Only used with covariates.
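
The `last_cohort` trimming rule in the updated docstring, dropping periods at `t >= last_g - anticipation`, can be checked by hand with a short sketch. `last_cohort_periods` is a hypothetical helper written for illustration, not the `EfficientDiD` implementation, and it assumes `cohorts` contains only finite treatment times (no never-treated sentinels).

```python
from typing import Iterable, List, Tuple


def last_cohort_periods(
    periods: Iterable[int],
    cohorts: Iterable[int],
    anticipation: int = 0,
) -> Tuple[int, List[int]]:
    """Return the pseudo-control cohort and the periods kept after trimming."""
    last_g = max(cohorts)
    cutoff = last_g - anticipation
    # Periods at or beyond the cutoff are dropped: t >= last_g - anticipation.
    kept = [t for t in periods if t < cutoff]
    return last_g, kept


periods = range(1, 9)   # periods 1..8
cohorts = [4, 6]        # treatment starts at 4 and 6; cohort 6 becomes the pseudo-control

no_anticipation = last_cohort_periods(periods, cohorts)
one_period = last_cohort_periods(periods, cohorts, anticipation=1)
```

With `anticipation=0` the kept window ends at period 5; with `anticipation=1` it ends at period 4, which is the extra trimming the docstring now documents.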
 
@@ -19,6 +19,7 @@
 from diff_diff.prep_dgp import (  # noqa: F401
     generate_continuous_did_data,
     generate_ddd_data,
+    generate_ddd_panel_data,
     generate_did_data,
     generate_event_study_data,
     generate_factor_data,