CHANGELOG.md (1 addition & 1 deletion)
@@ -13,7 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **WooldridgeDiD `weights="cohort_share"` on `aggregate()` (paper W2025 Eq. 7.4 / Eq. 7.6).** `WooldridgeDiDResults.aggregate(type, weights="cell" | "cohort_share")` exposes the paper's cohort-share aggregation as an opt-in alternative to the default cell-count weighting (which matches Stata `jwdid_estat`). Under `weights="cohort_share"`, per-cell weights are `∝ N_g` (per-cohort unit count): for `type="simple"` (paper Eq. 7.4) the simple-overall ATT normalizes across all post-treatment cells; for `type="event"` (paper Eq. 7.6 cohort-share-by-exposure) the normalization is per-event-time across cohorts present at event-time `e`. `type="group"` and `type="calendar"` raise `ValueError` under cohort_share (no paper closed-form). The Bell-McCaffrey contrast DOF (`vcov_type="hc2_bm"`) is rebuilt under the active weighting scheme so SE + DOF reflect the actual aggregation. On balanced panels with uniform within-cohort cell counts the two schemes coincide (paper Section 7.5 footnote). New `_n_g_per_cohort` field on `WooldridgeDiDResults` carries the per-cohort unit counts; populated in all three fit paths (OLS / logit / Poisson). Closes TODO row 95.
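The two weighting schemes in the entry above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `aggregate_att`, its arguments, and the toy numbers are all hypothetical, and only the `"simple"` and `"event"` aggregations are covered.

```python
def aggregate_att(att, n_cell, n_g, weights="cell", kind="simple"):
    """Weighted average of per-cell ATTs keyed by (cohort g, period t).

    Hypothetical helper for illustration; not the library's implementation.
    """
    if weights not in ("cell", "cohort_share"):
        raise ValueError(f"unknown weights: {weights!r}")
    if kind not in ("simple", "event"):
        raise ValueError(f"unsupported aggregation in this sketch: {kind!r}")

    def weight(cell):
        g, _t = cell
        # "cell": observation count in the (g, t) cell.
        # "cohort_share": number of units in cohort g, the same for every t.
        return float(n_cell[cell]) if weights == "cell" else float(n_g[g])

    def wavg(cells):
        total = sum(weight(c) for c in cells)
        return sum(weight(c) * att[c] for c in cells) / total

    if kind == "simple":
        # One normalization across all post-treatment cells.
        return wavg(list(att))

    # kind == "event": normalize separately at each event time e = t - g,
    # over the cohorts that are present at that event time.
    event_times = sorted({t - g for g, t in att})
    return {e: wavg([c for c in att if c[1] - c[0] == e]) for e in event_times}


# Toy inputs: two cohorts (first treated in periods 2 and 3), three post cells.
att = {(2, 2): 1.0, (2, 3): 2.0, (3, 3): 3.0}
n_g = {2: 30, 3: 10}
n_cell_balanced = {(2, 2): 30, (2, 3): 30, (3, 3): 10}
n_cell_unbalanced = {(2, 2): 30, (2, 3): 15, (3, 3): 10}

simple_cs = aggregate_att(att, n_cell_unbalanced, n_g, "cohort_share", "simple")
simple_cell = aggregate_att(att, n_cell_unbalanced, n_g, "cell", "simple")
event_cs = aggregate_att(att, n_cell_unbalanced, n_g, "cohort_share", "event")
balanced_cell = aggregate_att(att, n_cell_balanced, n_g, "cell", "simple")
balanced_cs = aggregate_att(att, n_cell_balanced, n_g, "cohort_share", "simple")
```

When every cell of a cohort holds the same number of observations, the cell counts are proportional to `N_g` and the two schemes give the same answer (`balanced_cell` and `balanced_cs`). They diverge only once cell counts vary within a cohort, as in `n_cell_unbalanced`.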
- **WooldridgeDiD `cohort_trends=True` for paper W2025 Section 8 / Eq. 8.1 heterogeneous cohort-specific linear trends.** `WooldridgeDiD(cohort_trends=True)` adds linear `dg_i · t` interactions for each treated cohort to the design matrix. Under the heterogeneous-trends DGP `y = c_i + α_t + δ_g · t + τ · w_{it} + u_{it}`, the parameter recovers `τ` even when parallel trends fails (paper Section 8.3). **OLS-path only:** `cohort_trends=True` + `method ∈ {"logit","poisson"}` raises `NotImplementedError` at `__init__` per paper Section 8's OLS scope; the error message cites the paper section explicitly. **Auto-routes to full-dummy mode** regardless of `vcov_type` (matching the absorb→fixed_effects auto-route pattern at `feedback_absorb_to_fixed_effects_auto_route`): composing `dg_i · t` with the within-transformation yields `(dg_i − mean(dg_i)) · (t − mean(t))` which is algebraically correct but non-trivial to verify on every panel shape; the full-dummy auto-route keeps math closure verified on the same paths already locked by PR #483's HC2 / HC2-BM / classical R-parity goldens. New `cohort_trend_coefs: Dict[g → δ_g]` attribute on `WooldridgeDiDResults` (empty dict under default `cohort_trends=False`). Closes the PR-A Requirements Checklist heterogeneous-trends gap (item 11 in `docs/methodology/papers/wooldridge-2025-review.md`).
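The heterogeneous-trends idea in the entry above can be checked on a small simulated panel. The sketch below is an assumption-laden illustration rather than the library's code path: it uses plain NumPy least squares on a full-dummy design, a noise-free DGP (`u_{it} = 0`) so the recovery is exact, and made-up cohort sizes and coefficients. It does not call `WooldridgeDiD`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 12 units, 6 periods, cohorts first treated at t=3 and t=4,
# plus a never-treated group (first-treatment period set to infinity).
n_periods = 6
first_treated = np.repeat([np.inf, 3.0, 4.0], 4)
n_units = len(first_treated)
tau = 2.0                       # homogeneous treatment effect
delta = {3.0: 0.5, 4.0: 0.3}    # cohort-specific linear trend slopes

unit = np.repeat(np.arange(n_units), n_periods)
t = np.tile(np.arange(n_periods), n_units)
g = first_treated[unit]
w = (t >= g).astype(float)      # treatment indicator w_it

# y = c_i + alpha_t + delta_g * t + tau * w_it, with no noise term.
c = rng.normal(size=n_units)
alpha = rng.normal(size=n_periods)
slope = np.array([delta.get(gi, 0.0) for gi in g])
y = c[unit] + alpha[t] + slope * t + tau * w


def dummies(codes, n_levels, drop_first=False):
    """One-hot encode integer codes; optionally drop the first level."""
    d = (codes[:, None] == np.arange(n_levels)[None, :]).astype(float)
    return d[:, 1:] if drop_first else d


# Full-dummy fixed effects: all unit dummies, time dummies minus the first.
fe = np.hstack([dummies(unit, n_units), dummies(t, n_periods, drop_first=True)])

# One d_g * t interaction column per treated cohort.
trends = np.column_stack([(g == gg) * t for gg in delta])

X_trends = np.column_stack([fe, trends, w])
X_twfe = np.column_stack([fe, w])

# The treatment coefficient is the last column in both designs.
tau_trends = np.linalg.lstsq(X_trends, y, rcond=None)[0][-1]
tau_twfe = np.linalg.lstsq(X_twfe, y, rcond=None)[0][-1]
print(f"with cohort trends: {tau_trends:.6f}  plain TWFE: {tau_twfe:.6f}")
```

With the trend interactions in the design, the treatment coefficient matches `tau` up to floating-point error. The plain TWFE fit omits those columns, so its coefficient generally absorbs part of the cohort trends.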
- **WooldridgeDiD R-parity goldens for `etwfe(family="poisson")` + `etwfe(family="logit")`.** `benchmarks/R/generate_wooldridge_golden.R` extended to fit R `etwfe` on Poisson + logit DGPs and persist log-link coefficient + SE goldens to `benchmarks/data/wooldridge_golden.json` (Poisson + logit blocks alongside the existing OLS vcov_type blocks). `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`. The R goldens cover diff-diff's nonlinear surfaces only at the surface level (fit completes + log-link goldens present + structured correctly); numerical cell-level R-parity between diff-diff's response-scale ATT (paper W2023 ASF / APE) and R `etwfe`'s log-link cell coefficient is deferred; this requires either `emfx()`-based APE extraction on the R side or link-function inversion with baseline-mean adjustment (new TODO row added).
-
- **`tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes + 1 library-deviations class.** Adds 48 new tests (60 total in the file, including the existing 12 vcov_type R-parity tests from PR #483) covering Theorem 3.1 Mundlak ≡ TWFE equivalence, Proposition 5.1 imputation ≡ POLS, Section 6 event study, Section 7 aggregation paths (paper Eqs. 7.4 / 7.6 hand-calc), Section 8 heterogeneous trends, Section 10 unbalanced panels + time-varying covariates, plus a `TestW2025LibraryDeviations` class consolidating 5 surviving deviations (HC1 finite-sample factor, QMLE sandwich `(n-1)/(n-k)`, nonlinear-vs-fixest, logit cohort+time dummies, anticipation + aggregation). Per-class seed decorrelation via `_BASE_SEED_*` module constants mirrors the HAD precedent at `tests/test_methodology_had.py:78-83`. New DGP helpers (`_make_two_cohort_three_period_panel`, `_make_three_cohort_four_period_panel`, `_make_heterogeneous_trends_panel`, `_make_unbalanced_panel`) reusable across the methodology classes. Two new surface-only R-parity classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the Poisson + logit goldens at the structural level.
+
- **`tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes + 1 library-deviations class.** Net ~70 new tests across 10 classes (joining the existing 12 vcov_type R-parity tests from PR #483) covering Theorem 3.1 Mundlak ≡ TWFE equivalence, Proposition 5.1 imputation ≡ POLS, Section 6 event study, Section 7 aggregation paths (paper Eqs. 7.4 / 7.6 hand-calc + survey/bootstrap/never-treated rejections), Section 8 heterogeneous trends (per-cohort identification + all-treated last-cohort drop + survey/never-treated cross-product rejections + reporting metadata), Section 10 unbalanced panels + time-varying covariates, plus a `TestW2025LibraryDeviations` class consolidating 5 surviving deviations (HC1 finite-sample factor, QMLE sandwich `(n-1)/(n-k)`, nonlinear-vs-fixest, logit cohort+time dummies, anticipation + aggregation). Per-class seed decorrelation via `_BASE_SEED_*` module constants mirrors the HAD precedent at `tests/test_methodology_had.py:78-83`. New DGP helpers (`_make_two_cohort_three_period_panel`, `_make_three_cohort_four_period_panel`, `_make_heterogeneous_trends_panel`, `_make_unbalanced_panel`) reusable across the methodology classes. Two new surface-only R-parity classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the Poisson + logit goldens at the structural level.
- **WooldridgeDiD `vcov_type` parameter, OLS path (Phase 1b PR 3/8).** `WooldridgeDiD(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` on `method="ols"` (defaults to `"hc1"`, preserves prior behavior at machine precision — the WLS-CR1 sandwich is algebraically invariant between the prior within-transform path and the new branched path, differing only by float64 multiplication ordering at sub-ULP scale; the full 106-test `tests/test_wooldridge.py` baseline still passes unchanged). `hc2_bm` auto-routes to a full-dummy saturated design (`[intercept, X_design, unit_dummies, time_dummies]`) + clubSandwich WLS-CR2 algebra (PR #475) — matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at `atol=1e-10` on the new `benchmarks/data/wooldridge_golden.json` fixture. `classical`/`hc2` supported via full-dummy + auto-drop of the unit auto-cluster (one-way families); explicit `cluster="X"` + one-way family raises at the linalg validator. Per-cell + aggregate p-values/CIs on `classical`/`hc2` paths use the residual DOF `n - rank(X)` (matches R `lm()` / `coef_test()` t-distribution), not normal-theory. **Bell-McCaffrey Satterthwaite DOF is threaded across ALL hc2_bm user-facing inference surfaces**: (1) per-cell `group_time_effects[(g, t)]` use `coef_test()$df_Satt` (matches R at atol=1e-6 from CI inversion); (2) overall ATT uses the post-period-aggregation contrast DOF from `_compute_cr2_bm_contrast_dof` (matches R `Wald_test(test="HTZ")$df_denom` at atol=1e-10); (3) `.aggregate("group" | "calendar" | "event")` recomputes contrast-specific BM DOFs lazily from BM artifacts stored on the Results object — the REDUCED kept-column design (`X_red`), cluster_ids, reduced bread matrix, and reduced-space coef-index map (using the reduced kept-column design after rank-deficient drops keeps the bread non-singular and matches the subspace `solve_ols` actually estimated in). 
Fail-closed (all-NaN inference) when BM DOF unavailable, mirrors PR #475 R7 and PR #479 R3. `method ∈ {"logit","poisson"}` + `vcov_type != "hc1"` raises `NotImplementedError` at `__init__` (GLM CR2-BM-on-pseudo-residuals composition needs derivation; deferred to follow-up TODO row). `SurveyDesign` + `vcov_type != "hc1"` raises `NotImplementedError` at `fit()` (survey TSL overrides analytical sandwich). `n_bootstrap > 0` + one-way (`hc2`/`classical`) raises at `fit()` regardless of `cluster=` setting (multiplier bootstrap is intrinsically clustered, but one-way vcov_type does not compose with cluster_ids — either the auto-cluster is dropped when `cluster=None` leaving the bootstrap with no cluster to draw at, or the linalg validator rejects one-way + cluster_ids when `cluster=X`). `conley` rejected at `__init__` with a deferral pointer. `vcov_type`, `cluster_name`, `n_clusters` added to `WooldridgeDiDResults` for downstream introspection (per `feedback_results_vcov_label_cluster_metadata`). Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs to follow: CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
- **`SpilloverDiD(survey_design=SurveyDesign.subpopulation(...))` full-design retention via zero-pad scores (Wave E.3).** Closes the Wave E.1/E.2/follow-up documented limitation at `REGISTRY.md:3249`: `SurveyDesign.subpopulation()`-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design — `n_psu` / `n_strata` / `df_survey` / Binder TSL per-stratum centering reflect the FULL domain rather than the post-`finite_mask` fit sample. **Documented synthesis (library-convention adoption, NOT new methodology):** Wave E.3 adopts the canonical "zero-pad scores to full panel + retain full-design resolved survey" pattern from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation lead regression — Omega_0 scores zero-padded back to full panel length) and `diff_diff/prep.py:1401-1432` (DCDH cell variance — IF zero-padded outside the cell). Wave E.3 propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat. **Mechanical realization (one new `_compute_gmm_corrected_meat` kwarg):** the gamma_hat / Psi build stays on SURVEY-FINITE-MASK inputs (`X_1_sparse_fit`, `X_10_sparse_fit`, `eps_10_fit` built on `survey_finite_mask = finite_mask & survey_weights > 0`; `X_2_kept_gamma`, `eps_2_fit_gamma`, `survey_weights_fit_gamma` projected from the fit-sample frame down to survey_finite_mask) so the drop-first stage-1 FE column space is bit-identical to the pre-E.3 path. `_compute_gmm_corrected_meat` gains a new optional kwarg `score_pad_mask: Optional[np.ndarray] = None`: when supplied, the helper zero-pads the fit-sample `Psi` to full panel length AFTER construction but BEFORE kernel dispatch via `Psi_padded[score_pad_mask] = Psi`. 
Kernel-dispatch arrays (`cluster_ids`, `conley_coords`, `conley_time`, `conley_unit`, `resolved_survey`) are passed at FULL length so the meat helpers (Binder TSL / stratified-Conley / serial Bartlett) see the full-domain PSU / strata / centroid / time geometry. The `_validate_conley_kwargs` call inside the helper reads `n_for_conley = len(score_pad_mask)` when the kwarg is set so the Conley shape checks see the full-length geometry. **`gamma_hat` invariance:** the gamma_hat solve operates on fit-sample inputs throughout — bit-identical to the pre-E.3 path (critical for the case where `_build_butts_fe_design_csr`'s `pd.factorize` re-compaction would drop a different unit's column under a full-length FE build than under a fit-length one). **Bread invariance:** `A_22 = X_2_kept' W X_2_kept` at `spillover.py:3187-3214` still uses fit-length `X_2_kept` because `A_22_full = X_2_full' W_full X_2_full` equals `A_22_kept` when zero-weight rows contribute zero. **A2 invariant:** warn-and-drop and `SurveyDesign.subpopulation()` drops are treated identically — both apply the zero-pad mechanism. The "both mechanisms compose cleanly" case (subpop-excluded row that is ALSO warn-and-dropped) produces `Psi = 0` from either cause; the PSU still counts toward `n_psu_full`. Hand-computation methodology anchor at `_scratch/wave_e3_smoke.py` codifies the A2 invariant on 4 PSU × 4 period × 3 obs synthetic. **Subpopulation parity vs upstream-subset:** `df_survey` matches the full domain regardless of how many rows the subpopulation mask excludes (mirrors R `svyglm(design=subset(d, mask))` vs `svyglm(design=svydesign(data=data[mask], ...))`). SE may differ by design — subpopulation retains zero-padded PSU geometry; upstream-subset drops PSUs entirely. 
**Pre-E.3 baseline parity:** when `finite_mask.all() == True` AND all weights `> 0`, the Wave E.3 zero-pad is a no-op — ATT + SE + n_psu + df_survey match pre-E.3 baseline values via FIXED GOLDEN values at `test_c` (`rtol=1e-12, atol=1e-12`). **Cross-surface n_psu consistency:** top-level `res.n_psu` reads from `len(resolved_survey_fit.weights)` on the implicit-PSU branch (was `int(finite_mask.sum())` pre-codex-R1-P2-fix); this keeps `res.n_psu == res.survey_metadata.n_psu` on weights-only / strata-only survey designs under warn-and-drop. Regression at `test_c2`. **Restrictions inherited:** replicate-weight variance + subpopulation continues to raise `NotImplementedError` at the Wave E.1 gate. TwoStageDiD's analogous `finite_mask + design-subset` pattern at `two_stage.py:567-601` is NOT yet adopted to Wave E.3 — separate parity follow-up tracked in `TODO.md` (an expected-divergence test was attempted but TwoStageDiD's always-treated handling at `two_stage.py:294-336` differs from SpilloverDiD's per-unit Omega_0 check, so the divergence didn't materialize on the standard fixture; the parity follow-up should add its own targeted regression). **Implementation:** `spillover.py:2845-2896` design-subset block deleted; `survey_weights_fit = survey_weights[finite_mask]` retained for the stage-2 OLS solve which still operates on the fit sample; `cluster_ids_full[finite_mask]` subset dropped on the survey path. `_compute_gmm_corrected_meat` call at `spillover.py:3163` now receives FIT-LENGTH gamma_hat-construction inputs (unchanged) plus FULL-LENGTH kernel-dispatch arrays (`cluster_ids_for_meat`, `conley_*_for_meat`, `resolved_survey_fit`) plus the new `score_pad_mask=survey_finite_mask` kwarg; no-survey path passes `score_pad_mask=None` and uses fit-length variables throughout (bit-identical to pre-E.3). 
`_compute_gmm_corrected_meat` at `two_stage.py:62-80` adds one new optional kwarg `score_pad_mask: Optional[np.ndarray] = None` and one post-Psi-construction zero-pad block; the `_validate_conley_kwargs` call uses `n_for_conley = len(score_pad_mask)` when the kwarg is set. Within-unit-constancy validator at `spillover.py:2913` updated to operate on full-length unit array. Second `compute_survey_metadata` recompute at `spillover.py:2954-2959` uses full-length `raw_w`. No `_compute_stratified_meat_from_psu_scores` / `_compute_stratified_conley_meat` / `_compute_stratified_serial_bartlett_meat` signature changes. **Tests:** new `TestSpilloverDiDWaveE3SubpopulationFullDesign` and `TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy` classes in `tests/test_spillover.py` (19 tests: pre-E.3 baseline parity via pinned goldens, n_psu cross-surface consistency on implicit-PSU branch, A2 invariant (zero-pad mechanics via mock-spy), subpopulation × explicit-PSU parity, conley + lag>0 + subpopulation × explicit-PSU / cluster-injection / weights-only branches, cluster-as-PSU + subpopulation parity, unit with BOTH zero weight AND no Omega_0 support, gamma_hat-build sample excludes zero-weight rows, n_obs / n_treated / n_control / n_far_away_obs reflect count_mask, warn-drop SE drift golden, ATT bit-equality under PSU-last-sort exclusion, exact event-study n_obs propagation, event-study on both is_staggered branches with analytical + conley+lag variants). Pre-existing Wave E.1 `test_p2_finite_mask_forces_drop_under_survey` assertion flipped from `n_psu=8` (subset) to `n_psu=10` (full domain) to reflect the new contract.
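The zero-pad mechanics and the bread-invariance argument in the Wave E.3 entry above can be illustrated with a few lines of NumPy. This is a sketch under assumptions, not the code in `spillover.py` or `two_stage.py`: the array sizes, the PSU layout, and the lowercase variable names (`psi_fit`, `psi_padded`, `psu_full`) are hypothetical, and random numbers stand in for the real scores and design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full panel: 4 PSUs x 3 rows each, 2 score columns.
n_full, k = 12, 2
psu_full = np.repeat(np.arange(4), 3)

# PSU 3 lies outside the subpopulation (zero weight); row 0 is warn-and-dropped.
weights_full = np.ones(n_full)
weights_full[psu_full == 3] = 0.0
finite_mask = np.ones(n_full, dtype=bool)
finite_mask[0] = False
score_pad_mask = finite_mask & (weights_full > 0)   # rows that enter the fit

# Scores are built on the fit sample only (random stand-ins here).
psi_fit = rng.normal(size=(int(score_pad_mask.sum()), k))

# Zero-pad to full panel length: excluded rows carry a zero score.
psi_padded = np.zeros((n_full, k))
psi_padded[score_pad_mask] = psi_fit

# PSU totals on the FULL design: the excluded PSU has a zero total,
# but it still counts toward the number of PSUs.
n_psu_full = len(np.unique(psu_full))
psu_totals = np.zeros((n_psu_full, k))
np.add.at(psu_totals, psu_full, psi_padded)

# Bread invariance for the zero-weight case: dropping zero-weight rows
# leaves X' W X unchanged because those rows contribute nothing.
X_full = rng.normal(size=(n_full, k))
keep = weights_full > 0
A_full = X_full.T @ (weights_full[:, None] * X_full)
A_kept = X_full[keep].T @ (weights_full[keep][:, None] * X_full[keep])
```

The design point is that the excluded rows change the geometry seen by the variance kernel (PSU and strata counts) without changing any score sum, which is why the padded path reduces to the unpadded one when nothing is excluded.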
- **ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473) and ContinuousDiD precedent (PR #476). REGISTRY `## ChaisemartinDHaultfoeuille` gains a formal `### Deviations from the paper / from R / library extensions` block consolidating 7 documented deviations into a single AI-review-recognized labeled surface (per CLAUDE.md "Documenting Deviations (AI Review Compatibility)"): (D1) equal-cell weighting (deviation from BOTH AER 2020 Equation 3 AND R `DIDmultiplegtDYN`); (D2) period-based vs cohort-based stable controls; (D3) balanced-baseline panel + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError`; (D4) SE normalization `N_l` vs R `G` (~4% smaller analytical SE); (D5) singleton-cohort degeneracy → NaN with `UserWarning`; (D6) `<50%` switcher warning at far horizons (library extension citing Favara-Imbs application, footnote 14 of NBER WP 29873); (D7) Phase 3 `DID^X` covariate first-stage equal-cell weights. R cross-language coverage holds at documented tolerance bands in `tests/test_chaisemartin_dhaultfoeuille_parity.py` (`POINT_RTOL = 1e-4` on pure-direction point estimates, `MIXED_POINT_RTOL = 0.025` on mixed-direction, `PURE_DIRECTION_SE_RTOL = 0.05` on pure-direction SE, `SE_RTOL = 0.10` on multi-horizon SE, `se_rtol=0.15` on the long-panel `L_max=5` joiners-only scenario where cell-count-weighting compounds). 
No source code changes, no new tests, no new docstrings — consolidation only against the existing 12 methodology tests (`tests/test_methodology_chaisemartin_dhaultfoeuille.py`), 26 R-parity tests (`tests/test_chaisemartin_dhaultfoeuille_parity.py`), 352 unit tests (`tests/test_chaisemartin_dhaultfoeuille.py`), survey suites (`tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, three cell-period coverage suites), and two primary-source DCDH paper reviews on disk (2020 AER + 2022/2023 NBER WP 29873 via PR #478; the `dechaisemartin-2026-review.md` on disk is HAD's primary source, not DCDH's, and is referenced as adjacent context only). The REGISTRY Deviations block uses semantic section-name anchors (rather than fragile line numbers) for back-references to other parts of the DCDH section — an intentional divergence from the PR #476 ContinuousDiD precedent reflecting PR-A wording-drift CI feedback that flagged line-number cross-references as drift-prone in long sections. `METHODOLOGY_REVIEW.md` DCDH row promoted **In Progress** → **Complete**; L27 In Progress example paragraph re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 (DCDH) removed and items #7-#11 renumbered to #6-#10.