
Commit 5b9f289

wooldridge: CI R11 P3 fixes — paper-review aggregation table reflects shipped opt-in surface + test-count genericization
1 parent b9986c5 commit 5b9f289

3 files changed

Lines changed: 10 additions & 10 deletions


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **WooldridgeDiD `weights="cohort_share"` on `aggregate()` (paper W2025 Eq. 7.4 / Eq. 7.6).** `WooldridgeDiDResults.aggregate(type, weights="cell" | "cohort_share")` exposes the paper's cohort-share aggregation as an opt-in alternative to the default cell-count weighting (which matches Stata `jwdid_estat`). Under `weights="cohort_share"`, per-cell weights are `∝ N_g` (per-cohort unit count): for `type="simple"` (paper Eq. 7.4) the simple-overall ATT normalizes across all post-treatment cells; for `type="event"` (paper Eq. 7.6 cohort-share-by-exposure) the normalization is per-event-time across cohorts present at event-time `e`. `type="group"` and `type="calendar"` raise `ValueError` under cohort_share (no paper closed-form). The Bell-McCaffrey contrast DOF (`vcov_type="hc2_bm"`) is rebuilt under the active weighting scheme so SE + DOF reflect the actual aggregation. On balanced panels with uniform within-cohort cell counts the two schemes coincide (paper Section 7.5 footnote). New `_n_g_per_cohort` field on `WooldridgeDiDResults` carries the per-cohort unit counts; populated in all three fit paths (OLS / logit / Poisson). Closes TODO row 95.
- **WooldridgeDiD `cohort_trends=True` for paper W2025 Section 8 / Eq. 8.1 heterogeneous cohort-specific linear trends.** `WooldridgeDiD(cohort_trends=True)` adds linear `dg_i · t` interactions for each treated cohort to the design matrix. Under the heterogeneous-trends DGP `y = c_i + α_t + δ_g · t + τ · w_{it} + u_{it}`, the parameter recovers `τ` even when parallel trends fails (paper Section 8.3). **OLS-path only:** `cohort_trends=True` + `method ∈ {"logit","poisson"}` raises `NotImplementedError` at `__init__` per paper Section 8's OLS scope; the error message cites the paper section explicitly. **Auto-routes to full-dummy mode** regardless of `vcov_type` (matching the absorb→fixed_effects auto-route pattern at `feedback_absorb_to_fixed_effects_auto_route`): composing `dg_i · t` with the within-transformation yields `(dg_i − mean(dg_i)) · (t − mean(t))` which is algebraically correct but non-trivial to verify on every panel shape; the full-dummy auto-route keeps math closure verified on the same paths already locked by PR #483's HC2 / HC2-BM / classical R-parity goldens. New `cohort_trend_coefs: Dict[g → δ_g]` attribute on `WooldridgeDiDResults` (empty dict under default `cohort_trends=False`). Closes the PR-A Requirements Checklist heterogeneous-trends gap (item 11 in `docs/methodology/papers/wooldridge-2025-review.md`).
- **WooldridgeDiD R-parity goldens for `etwfe(family="poisson")` + `etwfe(family="logit")`.** `benchmarks/R/generate_wooldridge_golden.R` extended to fit R `etwfe` on Poisson + logit DGPs and persist log-link coefficient + SE goldens to `benchmarks/data/wooldridge_golden.json` (Poisson + logit blocks alongside the existing OLS vcov_type blocks). `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`. The R goldens cover diff-diff's nonlinear surfaces only at the surface level (fit completes + log-link goldens present + structured correctly); numerical cell-level R-parity between diff-diff's response-scale ATT (paper W2023 ASF / APE) and R `etwfe`'s log-link cell coefficient is deferred — requires either `emfx()`-based APE extraction on the R side or link-function inversion with baseline-mean adjustment (new TODO row added).
Line 16, removed:
- **`tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes + 1 library-deviations class.** Adds 48 new tests (60 total in the file, including the existing 12 vcov_type R-parity tests from PR #483) covering Theorem 3.1 Mundlak ≡ TWFE equivalence, Proposition 5.1 imputation ≡ POLS, Section 6 event study, Section 7 aggregation paths (paper Eqs. 7.4 / 7.6 hand-calc), Section 8 heterogeneous trends, Section 10 unbalanced panels + time-varying covariates, plus a `TestW2025LibraryDeviations` class consolidating 5 surviving deviations (HC1 finite-sample factor, QMLE sandwich `(n-1)/(n-k)`, nonlinear-vs-fixest, logit cohort+time dummies, anticipation + aggregation). Per-class seed decorrelation via `_BASE_SEED_*` module constants mirrors the HAD precedent at `tests/test_methodology_had.py:78-83`. New DGP helpers (`_make_two_cohort_three_period_panel`, `_make_three_cohort_four_period_panel`, `_make_heterogeneous_trends_panel`, `_make_unbalanced_panel`) reusable across the methodology classes. Two new surface-only R-parity classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the Poisson + logit goldens at the structural level.
Line 16, added:
- **`tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes + 1 library-deviations class.** Net ~70 new tests across 10 classes (joining the existing 12 vcov_type R-parity tests from PR #483) covering Theorem 3.1 Mundlak ≡ TWFE equivalence, Proposition 5.1 imputation ≡ POLS, Section 6 event study, Section 7 aggregation paths (paper Eqs. 7.4 / 7.6 hand-calc + survey/bootstrap/never-treated rejections), Section 8 heterogeneous trends (per-cohort identification + all-treated last-cohort drop + survey/never-treated cross-product rejections + reporting metadata), Section 10 unbalanced panels + time-varying covariates, plus a `TestW2025LibraryDeviations` class consolidating 5 surviving deviations (HC1 finite-sample factor, QMLE sandwich `(n-1)/(n-k)`, nonlinear-vs-fixest, logit cohort+time dummies, anticipation + aggregation). Per-class seed decorrelation via `_BASE_SEED_*` module constants mirrors the HAD precedent at `tests/test_methodology_had.py:78-83`. New DGP helpers (`_make_two_cohort_three_period_panel`, `_make_three_cohort_four_period_panel`, `_make_heterogeneous_trends_panel`, `_make_unbalanced_panel`) reusable across the methodology classes. Two new surface-only R-parity classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the Poisson + logit goldens at the structural level.
- **WooldridgeDiD `vcov_type` parameter, OLS path (Phase 1b PR 3/8).** `WooldridgeDiD(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` on `method="ols"` (defaults to `"hc1"`, preserves prior behavior at machine precision — the WLS-CR1 sandwich is algebraically invariant between the prior within-transform path and the new branched path, differing only by float64 multiplication ordering at sub-ULP scale; the full 106-test `tests/test_wooldridge.py` baseline still passes unchanged). `hc2_bm` auto-routes to a full-dummy saturated design (`[intercept, X_design, unit_dummies, time_dummies]`) + clubSandwich WLS-CR2 algebra (PR #475) — matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at `atol=1e-10` on the new `benchmarks/data/wooldridge_golden.json` fixture. `classical`/`hc2` supported via full-dummy + auto-drop of the unit auto-cluster (one-way families); explicit `cluster="X"` + one-way family raises at the linalg validator. Per-cell + aggregate p-values/CIs on `classical`/`hc2` paths use the residual DOF `n - rank(X)` (matches R `lm()` / `coef_test()` t-distribution), not normal-theory. **Bell-McCaffrey Satterthwaite DOF is threaded across ALL hc2_bm user-facing inference surfaces**: (1) per-cell `group_time_effects[(g, t)]` use `coef_test()$df_Satt` (matches R at atol=1e-6 from CI inversion); (2) overall ATT uses the post-period-aggregation contrast DOF from `_compute_cr2_bm_contrast_dof` (matches R `Wald_test(test="HTZ")$df_denom` at atol=1e-10); (3) `.aggregate("group" | "calendar" | "event")` recomputes contrast-specific BM DOFs lazily from BM artifacts stored on the Results object — the REDUCED kept-column design (`X_red`), cluster_ids, reduced bread matrix, and reduced-space coef-index map (using the reduced kept-column design after rank-deficient drops keeps the bread non-singular and matches the subspace `solve_ols` actually estimated in). 
Fail-closed (all-NaN inference) when BM DOF unavailable, mirrors PR #475 R7 and PR #479 R3. `method ∈ {"logit","poisson"}` + `vcov_type != "hc1"` raises `NotImplementedError` at `__init__` (GLM CR2-BM-on-pseudo-residuals composition needs derivation; deferred to follow-up TODO row). `SurveyDesign` + `vcov_type != "hc1"` raises `NotImplementedError` at `fit()` (survey TSL overrides analytical sandwich). `n_bootstrap > 0` + one-way (`hc2`/`classical`) raises at `fit()` regardless of `cluster=` setting (multiplier bootstrap is intrinsically clustered, but one-way vcov_type does not compose with cluster_ids — either the auto-cluster is dropped when `cluster=None` leaving the bootstrap with no cluster to draw at, or the linalg validator rejects one-way + cluster_ids when `cluster=X`). `conley` rejected at `__init__` with a deferral pointer. `vcov_type`, `cluster_name`, `n_clusters` added to `WooldridgeDiDResults` for downstream introspection (per `feedback_results_vcov_label_cluster_metadata`). Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs to follow: CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
- **`SpilloverDiD(survey_design=SurveyDesign.subpopulation(...))` full-design retention via zero-pad scores (Wave E.3).** Closes the Wave E.1/E.2/follow-up documented limitation at `REGISTRY.md:3249`: `SurveyDesign.subpopulation()`-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design — `n_psu` / `n_strata` / `df_survey` / Binder TSL per-stratum centering reflect the FULL domain rather than the post-`finite_mask` fit sample. **Documented synthesis (library-convention adoption, NOT new methodology):** Wave E.3 adopts the canonical "zero-pad scores to full panel + retain full-design resolved survey" pattern from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation lead regression — Omega_0 scores zero-padded back to full panel length) and `diff_diff/prep.py:1401-1432` (DCDH cell variance — IF zero-padded outside the cell). Wave E.3 propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat. **Mechanical realization (one new `_compute_gmm_corrected_meat` kwarg):** the gamma_hat / Psi build stays on SURVEY-FINITE-MASK inputs (`X_1_sparse_fit`, `X_10_sparse_fit`, `eps_10_fit` built on `survey_finite_mask = finite_mask & survey_weights > 0`; `X_2_kept_gamma`, `eps_2_fit_gamma`, `survey_weights_fit_gamma` projected from the fit-sample frame down to survey_finite_mask) so the drop-first stage-1 FE column space is bit-identical to the pre-E.3 path. `_compute_gmm_corrected_meat` gains a new optional kwarg `score_pad_mask: Optional[np.ndarray] = None`: when supplied, the helper zero-pads the fit-sample `Psi` to full panel length AFTER construction but BEFORE kernel dispatch via `Psi_padded[score_pad_mask] = Psi`. 
Kernel-dispatch arrays (`cluster_ids`, `conley_coords`, `conley_time`, `conley_unit`, `resolved_survey`) are passed at FULL length so the meat helpers (Binder TSL / stratified-Conley / serial Bartlett) see the full-domain PSU / strata / centroid / time geometry. The `_validate_conley_kwargs` call inside the helper reads `n_for_conley = len(score_pad_mask)` when the kwarg is set so the Conley shape checks see the full-length geometry. **`gamma_hat` invariance:** the gamma_hat solve operates on fit-sample inputs throughout — bit-identical to the pre-E.3 path (critical for the case where `_build_butts_fe_design_csr`'s `pd.factorize` re-compaction would drop a different unit's column under a full-length FE build than under a fit-length one). **Bread invariance:** `A_22 = X_2_kept' W X_2_kept` at `spillover.py:3187-3214` still uses fit-length `X_2_kept` because `A_22_full = X_2_full' W_full X_2_full` equals `A_22_kept` when zero-weight rows contribute zero. **A2 invariant:** warn-and-drop and `SurveyDesign.subpopulation()` drops are treated identically — both apply the zero-pad mechanism. The "both mechanisms compose cleanly" case (subpop-excluded row that is ALSO warn-and-dropped) produces `Psi = 0` from either cause; the PSU still counts toward `n_psu_full`. Hand-computation methodology anchor at `_scratch/wave_e3_smoke.py` codifies the A2 invariant on 4 PSU × 4 period × 3 obs synthetic. **Subpopulation parity vs upstream-subset:** `df_survey` matches the full domain regardless of how many rows the subpopulation mask excludes (mirrors R `svyglm(design=subset(d, mask))` vs `svyglm(design=svydesign(data=data[mask], ...))`). SE may differ by design — subpopulation retains zero-padded PSU geometry; upstream-subset drops PSUs entirely. 
**Pre-E.3 baseline parity:** when `finite_mask.all() == True` AND all weights `> 0`, the Wave E.3 zero-pad is a no-op — ATT + SE + n_psu + df_survey match pre-E.3 baseline values via FIXED GOLDEN values at `test_c` (`rtol=1e-12, atol=1e-12`). **Cross-surface n_psu consistency:** top-level `res.n_psu` reads from `len(resolved_survey_fit.weights)` on the implicit-PSU branch (was `int(finite_mask.sum())` pre-codex-R1-P2-fix); this keeps `res.n_psu == res.survey_metadata.n_psu` on weights-only / strata-only survey designs under warn-and-drop. Regression at `test_c2`. **Restrictions inherited:** replicate-weight variance + subpopulation continues to raise `NotImplementedError` at the Wave E.1 gate. TwoStageDiD's analogous `finite_mask + design-subset` pattern at `two_stage.py:567-601` is NOT yet adopted to Wave E.3 — separate parity follow-up tracked in `TODO.md` (an expected-divergence test was attempted but TwoStageDiD's always-treated handling at `two_stage.py:294-336` differs from SpilloverDiD's per-unit Omega_0 check, so the divergence didn't materialize on the standard fixture; the parity follow-up should add its own targeted regression). **Implementation:** `spillover.py:2845-2896` design-subset block deleted; `survey_weights_fit = survey_weights[finite_mask]` retained for the stage-2 OLS solve which still operates on the fit sample; `cluster_ids_full[finite_mask]` subset dropped on the survey path. `_compute_gmm_corrected_meat` call at `spillover.py:3163` now receives FIT-LENGTH gamma_hat-construction inputs (unchanged) plus FULL-LENGTH kernel-dispatch arrays (`cluster_ids_for_meat`, `conley_*_for_meat`, `resolved_survey_fit`) plus the new `score_pad_mask=survey_finite_mask` kwarg; no-survey path passes `score_pad_mask=None` and uses fit-length variables throughout (bit-identical to pre-E.3). 
`_compute_gmm_corrected_meat` at `two_stage.py:62-80` adds one new optional kwarg `score_pad_mask: Optional[np.ndarray] = None` and one post-Psi-construction zero-pad block; the `_validate_conley_kwargs` call uses `n_for_conley = len(score_pad_mask)` when the kwarg is set. Within-unit-constancy validator at `spillover.py:2913` updated to operate on full-length unit array. Second `compute_survey_metadata` recompute at `spillover.py:2954-2959` uses full-length `raw_w`. No `_compute_stratified_meat_from_psu_scores` / `_compute_stratified_conley_meat` / `_compute_stratified_serial_bartlett_meat` signature changes. **Tests:** new `TestSpilloverDiDWaveE3SubpopulationFullDesign` and `TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy` classes in `tests/test_spillover.py` (19 tests: pre-E.3 baseline parity via pinned goldens, n_psu cross-surface consistency on implicit-PSU branch, A2 invariant (zero-pad mechanics via mock-spy), subpopulation × explicit-PSU parity, conley + lag>0 + subpopulation × explicit-PSU / cluster-injection / weights-only branches, cluster-as-PSU + subpopulation parity, unit with BOTH zero weight AND no Omega_0 support, gamma_hat-build sample excludes zero-weight rows, n_obs / n_treated / n_control / n_far_away_obs reflect count_mask, warn-drop SE drift golden, ATT bit-equality under PSU-last-sort exclusion, exact event-study n_obs propagation, event-study on both is_staggered branches with analytical + conley+lag variants). Pre-existing Wave E.1 `test_p2_finite_mask_forces_drop_under_survey` assertion flipped from `n_psu=8` (subset) to `n_psu=10` (full domain) to reflect the new contract.
- **ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473) and ContinuousDiD precedent (PR #476). REGISTRY `## ChaisemartinDHaultfoeuille` gains a formal `### Deviations from the paper / from R / library extensions` block consolidating 7 documented deviations into a single AI-review-recognized labeled surface (per CLAUDE.md "Documenting Deviations (AI Review Compatibility)"): (D1) equal-cell weighting (deviation from BOTH AER 2020 Equation 3 AND R `DIDmultiplegtDYN`); (D2) period-based vs cohort-based stable controls; (D3) balanced-baseline panel + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError`; (D4) SE normalization `N_l` vs R `G` (~4% smaller analytical SE); (D5) singleton-cohort degeneracy → NaN with `UserWarning`; (D6) `<50%` switcher warning at far horizons (library extension citing Favara-Imbs application, footnote 14 of NBER WP 29873); (D7) Phase 3 `DID^X` covariate first-stage equal-cell weights. R cross-language coverage holds at documented tolerance bands in `tests/test_chaisemartin_dhaultfoeuille_parity.py` (`POINT_RTOL = 1e-4` on pure-direction point estimates, `MIXED_POINT_RTOL = 0.025` on mixed-direction, `PURE_DIRECTION_SE_RTOL = 0.05` on pure-direction SE, `SE_RTOL = 0.10` on multi-horizon SE, `se_rtol=0.15` on the long-panel `L_max=5` joiners-only scenario where cell-count-weighting compounds). 
No source code changes, no new tests, no new docstrings — consolidation only against the existing 12 methodology tests (`tests/test_methodology_chaisemartin_dhaultfoeuille.py`), 26 R-parity tests (`tests/test_chaisemartin_dhaultfoeuille_parity.py`), 352 unit tests (`tests/test_chaisemartin_dhaultfoeuille.py`), survey suites (`tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, three cell-period coverage suites), and two primary-source DCDH paper reviews on disk (2020 AER + 2022/2023 NBER WP 29873 via PR #478; the `dechaisemartin-2026-review.md` on disk is HAD's primary source, not DCDH's, and is referenced as adjacent context only). The REGISTRY Deviations block uses semantic section-name anchors (rather than fragile line numbers) for back-references to other parts of the DCDH section — an intentional divergence from the PR #476 ContinuousDiD precedent reflecting PR-A wording-drift CI feedback that flagged line-number cross-references as drift-prone in long sections. `METHODOLOGY_REVIEW.md` DCDH row promoted **In Progress** → **Complete**; L27 In Progress example paragraph re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 (DCDH) removed and items #7-#11 renumbered to #6-#10.
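The two weighting schemes in the `weights="cohort_share"` entry can be illustrated with a minimal sketch. It assumes each post-treatment cell is a plain tuple `(g, t, att, n_cell, n_g)` and that "cell-count weighting" means weighting by the number of observations in the cell. `aggregate_simple` and `aggregate_event` are hypothetical helpers, not the `WooldridgeDiDResults.aggregate` implementation, and they ignore SEs and the Bell-McCaffrey DOF entirely.

```python
import numpy as np

# Hypothetical cell layout: (cohort g, period t, ATT(g,t), n_cell = obs in cell, n_g = units in cohort)
CELLS = [
    (2, 2, 1.0, 30, 30),
    (2, 3, 2.0, 20, 30),  # attrition: only 20 of the 30 cohort-2 units observed at t=3
    (3, 3, 4.0, 10, 10),
]


def _weights(cells, weights):
    """Per-cell weights: observation count ("cell") or cohort unit count N_g ("cohort_share")."""
    if weights == "cell":
        return np.array([c[3] for c in cells], dtype=float)
    if weights == "cohort_share":
        return np.array([c[4] for c in cells], dtype=float)
    raise ValueError(f"unknown weights={weights!r}")


def aggregate_simple(cells, weights="cell"):
    """Simple overall ATT: normalize weights across ALL post-treatment cells."""
    post = [c for c in cells if c[1] >= c[0]]
    w = _weights(post, weights)
    att = np.array([c[2] for c in post])
    return float(w @ att / w.sum())


def aggregate_event(cells, weights="cell"):
    """Event-study ATT: normalize separately at each event time e = t - g."""
    out = {}
    for e in sorted({c[1] - c[0] for c in cells if c[1] >= c[0]}):
        at_e = [c for c in cells if c[1] - c[0] == e]
        w = _weights(at_e, weights)
        att = np.array([c[2] for c in at_e])
        out[e] = float(w @ att / w.sum())
    return out


print(aggregate_simple(CELLS, "cell"), aggregate_simple(CELLS, "cohort_share"))
```

With the attrited cell the simple ATT differs between schemes (110/60 under cell weighting vs 130/70 under cohort share). Setting every `n_cell` equal to its `n_g` makes the two coincide, which is the balanced-panel case the entry describes.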
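The heterogeneous-trends entry (`cohort_trends=True`) rests on adding `dg_i · t` columns to a full-dummy design. The sketch below simulates the DGP `y = c_i + α_t + δ_g · t + τ · w_it + u_it` and fits it by plain least squares with and without those columns. The cohort layout, slopes, and the single pooled `τ` (instead of per-cell `(g, t)` effects) are simplifying assumptions for illustration; `simulate_panel` and `fit_tau` are hypothetical names, not library functions.

```python
import numpy as np

COHORTS = (0, 4, 6)          # 0 = never treated (convention assumed for this sketch)
DELTAS = {4: 0.5, 6: 0.8}    # cohort-specific linear trend slopes delta_g
TAU, T, N_PER = 2.0, 8, 60   # true effect, number of periods, units per cohort


def simulate_panel(seed=0):
    """Balanced panel from y = c_i + alpha_t + delta_g * t + tau * w_it + u_it."""
    rng = np.random.default_rng(seed)
    unit, time, cohort = [], [], []
    for k, g in enumerate(COHORTS):
        for i in range(N_PER):
            for t in range(1, T + 1):
                unit.append(k * N_PER + i)
                time.append(t)
                cohort.append(g)
    unit, time, cohort = map(np.array, (unit, time, cohort))
    c_i = rng.normal(size=unit.max() + 1)[unit]               # unit effects
    alpha_t = np.linspace(0.0, 1.0, T)[time - 1]              # common time effects
    trend = np.array([DELTAS.get(int(g), 0.0) for g in cohort]) * time
    w = ((cohort > 0) & (time >= cohort)).astype(float)       # treated from period g on
    y = c_i + alpha_t + trend + TAU * w + rng.normal(scale=0.05, size=unit.size)
    return unit, time, cohort, w, y


def fit_tau(unit, time, cohort, w, y, cohort_trends):
    """Least squares on [unit dummies, time dummies, (dg_i * t), w]; returns the w coefficient."""
    cols = [np.eye(unit.max() + 1)[unit], np.eye(T)[time - 1][:, 1:]]
    if cohort_trends:
        for g in DELTAS:                                      # one trend column per treated cohort
            cols.append(((cohort == g) * time)[:, None].astype(float))
    cols.append(w[:, None])
    beta, *_ = np.linalg.lstsq(np.hstack(cols), y, rcond=None)
    return float(beta[-1])


panel = simulate_panel(seed=0)
tau_with = fit_tau(*panel, cohort_trends=True)
tau_without = fit_tau(*panel, cohort_trends=False)
print(tau_with, tau_without)
```

Because the treated cohorts trend upward, the specification without `dg_i · t` loads part of the trend onto `w` and overstates `τ`; the cohort-trend specification uses each cohort's pre-periods to pin down `δ_g` and recovers `τ` up to noise.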
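The R-parity entry notes that a log-link cell coefficient is not directly comparable to a response-scale ATT. A minimal sketch of why: under an assumed average-partial-effect definition, the cell ATT averages `m(η + β) − m(η)` over the treated units, so the same `β` maps to different ATTs depending on the baseline means. `response_scale_att` is a hypothetical helper, not part of diff-diff or R `etwfe`.

```python
import numpy as np


def _inv_link(x, link):
    """Inverse link m(.): exp for Poisson/log, logistic for logit."""
    if link == "log":
        return np.exp(x)
    if link == "logit":
        return 1.0 / (1.0 + np.exp(-x))
    raise ValueError(f"unknown link={link!r}")


def response_scale_att(eta0, beta_cell, link="log"):
    """Average of m(eta0 + beta) - m(eta0) over treated units in one (g, t) cell.

    eta0 holds each treated unit's untreated linear index (assumed known here).
    """
    eta0 = np.asarray(eta0, dtype=float)
    return float(np.mean(_inv_link(eta0 + beta_cell, link) - _inv_link(eta0, link)))


beta = np.log(1.5)  # same log-link coefficient (a 50% proportional effect) in both cells
print(response_scale_att(np.log([2.0, 4.0]), beta))    # baseline means 2 and 4
print(response_scale_att(np.log([20.0, 40.0]), beta))  # baseline means 20 and 40
```

For the log link the ATT equals `mean(exp(η0)) · (exp(β) − 1)`, which is the "baseline-mean adjustment" the entry refers to; the logit case has no such multiplicative shortcut and needs the unit-level average.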
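One of the new methodology classes covers a Mundlak ≡ TWFE equivalence. The classic balanced-panel version of that result can be checked numerically: pooled OLS of `y` on an intercept, `w_it`, the unit mean `w̄_i`, and time dummies gives the same `w` coefficient as TWFE with unit and time dummies. This is a sketch of the textbook statement under a simulated balanced panel; the exact form of the paper's Theorem 3.1 (for example, cohort-level versus unit-level averages) is not reproduced here, and `twfe_vs_mundlak` is a hypothetical name.

```python
import numpy as np


def twfe_vs_mundlak(seed=0, n_units=50, T=6):
    """Return the w coefficient from TWFE and from the Mundlak pooled-OLS regression."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), T)
    time = np.tile(np.arange(T), n_units)
    c = rng.normal(size=n_units)
    # Treatment correlated with the unit effect, so pooled OLS without w_bar_i would be biased.
    w = (rng.normal(size=n_units * T) + c[unit] > 0).astype(float)
    y = c[unit] + 0.3 * time + 1.5 * w + rng.normal(size=n_units * T)

    d_unit = np.eye(n_units)[unit]
    d_time = np.eye(T)[time][:, 1:]                        # drop the first period
    x_twfe = np.column_stack([w, d_unit, d_time])

    w_bar_i = (np.bincount(unit, weights=w) / T)[unit]     # per-unit mean of w (balanced panel)
    x_mundlak = np.column_stack([np.ones_like(w), w, w_bar_i, d_time])

    b_twfe = np.linalg.lstsq(x_twfe, y, rcond=None)[0][0]
    b_mundlak = np.linalg.lstsq(x_mundlak, y, rcond=None)[0][1]
    return float(b_twfe), float(b_mundlak)


print(twfe_vs_mundlak())
```

The equivalence holds because, in a balanced panel, the two-way demeaned `w` is orthogonal to both regressor sets, so by Frisch-Waugh-Lovell both regressions isolate the same residual variation in `w`.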
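The `vcov_type` entry distinguishes `classical`, `hc1`, and `hc2`, with t-inference on the residual DOF `n - rank(X)`. The sketch below gives the textbook one-way forms of those three sandwiches in NumPy. It is an assumption-laden illustration: the library's default `"hc1"` is described as a unit-clustered WLS-CR1 sandwich, and `"hc2_bm"` additionally needs CR2 adjustment matrices and a Satterthwaite DOF, neither of which is sketched here. `ols_vcov` is a hypothetical helper.

```python
import numpy as np


def ols_vcov(X, y, vcov_type="hc1"):
    """OLS coefficients, a one-way covariance matrix, and the residual DOF n - rank(X)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)          # assumes full column rank
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta
    df_resid = n - np.linalg.matrix_rank(X)
    if vcov_type == "classical":
        return beta, (e @ e / df_resid) * xtx_inv, df_resid
    if vcov_type == "hc1":
        omega = e**2 * n / (n - k)            # HC0 scaled by n / (n - k)
    elif vcov_type == "hc2":
        h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)   # leverages h_ii
        omega = e**2 / (1.0 - h)
    else:
        raise ValueError(f"unsupported vcov_type={vcov_type!r}")
    meat = X.T @ (omega[:, None] * X)
    return beta, xtx_inv @ meat @ xtx_inv, df_resid


X = np.ones((5, 1))                           # intercept-only model
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
for vt in ("classical", "hc1", "hc2"):
    print(vt, ols_vcov(X, y, vt)[1][0, 0])
```

In the intercept-only case every leverage is `1/n`, so all three estimators reduce to `s²/n` (0.74 here) with 4 residual DOF; they diverge once the design has non-uniform leverage.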
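The Wave E.3 entry describes scattering fit-sample scores back to full panel length (`Psi_padded[score_pad_mask] = Psi`) so the variance kernel sees the full-domain PSU geometry, and argues that zero-weight rows leave the bread `X' W X` unchanged. The sketch below shows both points with a simplified one-stage cluster meat carrying a `G/(G-1)` factor. It is not the library's Binder TSL, stratified-Conley, or Gardner GMM code; `zero_pad_scores` and `cluster_meat` are hypothetical helpers.

```python
import numpy as np


def zero_pad_scores(psi_fit, score_pad_mask):
    """Scatter fit-sample score rows back to full panel length; excluded rows stay zero."""
    psi_full = np.zeros((score_pad_mask.shape[0], psi_fit.shape[1]))
    psi_full[score_pad_mask] = psi_fit
    return psi_full


def cluster_meat(psi, cluster_ids):
    """Simplified cluster-robust meat (no strata) with a G/(G-1) small-sample factor."""
    labels = np.unique(cluster_ids)
    G = labels.size
    sums = np.vstack([psi[cluster_ids == c].sum(axis=0) for c in labels])
    return (G / (G - 1)) * (sums.T @ sums), G


rng = np.random.default_rng(0)
psu = np.repeat(np.arange(4), 2)               # full domain: 4 PSUs x 2 rows
mask = np.array([True] * 6 + [False] * 2)      # PSU 3 lies entirely outside the fit sample
psi_fit = rng.normal(size=(mask.sum(), 2))     # scores exist only for fit-sample rows

meat_full, G_full = cluster_meat(zero_pad_scores(psi_fit, mask), psu)  # PSU 3 still counted
meat_sub, G_sub = cluster_meat(psi_fit, psu[mask])                     # upstream subset

# Bread invariance: zero-weight rows contribute nothing to X' W X.
X_full = rng.normal(size=(8, 2))
w_full = np.where(mask, rng.uniform(0.5, 2.0, size=8), 0.0)
A_full = X_full.T @ (w_full[:, None] * X_full)
A_fit = X_full[mask].T @ (w_full[mask][:, None] * X_full[mask])
print(G_full, G_sub)
```

The cluster score sums are identical in both runs, so the two meats differ only through `G` (4 vs 3). This mirrors the entry's point that zero-padding changes the PSU count and DOF bookkeeping rather than the score contributions. With an all-`True` mask the padding is a no-op.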
