igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 2 additions & 0 deletions
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
Lines changed: 30 additions & 15 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 - **Covariate names that collide with reserved structural terms now raise `ValueError` instead of silently corrupting the coefficient dict (`DifferenceInDifferences`, `MultiPeriodDiD`, `TwoWayFixedEffects`).** These estimators build their `coefficients` dict by zipping a variable-name list -- structural term names PLUS the user covariate column names appended verbatim -- with the fitted coefficient vector. A covariate whose name equaled a reserved structural name (`const`; the treatment/time column names; the `{treatment}:{time}` interaction; MultiPeriodDiD `period_{p}` dummies and `{treatment}:period_{p}` interactions; `TwoWayFixedEffects` `ATT`; fixed-effect / unit / time dummy names; or an internal `_`-prefixed working column such as `_treat_time` / `_did_treatment` / `_treatment_post`) silently **overwrote** that structural coefficient via Python dict last-write-wins -- e.g. a covariate named `const` dropped the intercept -- with no error or warning. A new shared `validate_covariate_names` helper (`diff_diff/utils.py`) is now called in each of the three `fit()` methods before the design matrix is built; it raises `ValueError` on a collision (the comparison is case-sensitive, so e.g. `Const` is still allowed) **and** on duplicate names within `covariates` (which collapse to a single dict entry the same way). Fixed-effect/unit/time dummy reserved names are taken from the same `pd.get_dummies(..., drop_first=True)` call used to build them, so they match exactly (including for pandas `Categorical` columns with a non-default category order). For `TwoWayFixedEffects` the guard fires on **all** variance paths: the default within-transform path returns only `{"ATT": att}` (no covariate is a dict key there), but a covariate named `_treatment_post` would still clobber the internal interaction column, so guarding both paths is uniform and forward-compatible. 
**Potentially breaking:** a fit that previously *succeeded* with a colliding (or duplicated) covariate name -- silently returning a corrupted coefficient dict -- now raises; rename the covariate column(s). The staggered / influence-function estimators (CallawaySantAnna, SunAbraham, StaggeredTripleDifference, EfficientDiD, TwoStageDiD, ImputationDiD, WooldridgeDiD, dCDH, StackedDiD) key results by `(g, t)` tuples / relative-time indices, never covariate names, and `TripleDifference` / `SyntheticControl` / `SyntheticDiD` do not expose covariates by name, so none are affected. New tests in `tests/test_utils.py`, `tests/test_estimators.py`, and `tests/test_estimators_vcov_type.py`.
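The failure mode in the Fixed entry above can be reproduced in a few lines. This is a minimal sketch, not the library code: `check_covariate_names` is a hypothetical stand-in for the `validate_covariate_names` helper (whose signature is not shown in this diff), and the structural names are assumed example values.

```python
def build_coefficients(names, values):
    """Zip names with fitted values; a repeated name keeps only the last value."""
    return dict(zip(names, values))


def check_covariate_names(covariates, reserved):
    """Hypothetical guard: reject reserved-name collisions and duplicates.

    The comparison is case-sensitive, as the changelog entry describes.
    """
    seen = set()
    for name in covariates:
        if name in reserved:
            raise ValueError(f"covariate name {name!r} collides with a reserved term")
        if name in seen:
            raise ValueError(f"duplicate covariate name {name!r}")
        seen.add(name)


structural = ["const", "treated", "post", "treated:post"]  # assumed example names
covariates = ["age", "const"]  # user column that shadows the intercept

# Without a guard, the covariate's coefficient silently replaces the intercept:
# six fitted values collapse into five dict entries.
corrupted = build_coefficients(structural + covariates, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Calling `check_covariate_names(covariates, set(structural))` before building the dict raises `ValueError` for `const`, while a differently cased name such as `Const` passes.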
 
+### Changed
+- **EfficientDiD methodology-review-tracker promotion: In Progress → Complete, with a covariate outcome-regression upgrade (behavior change).** Completes the source-validation pass (PR-B) of the Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1) audit — PR-A (#515) added the paper review on file; this PR validates the source against the code, eliminates the one real deviation, adds paper-equation Verified Components, and flips the tracker. **Behavior change:** the covariate doubly-robust path's outcome regression `m̂(X)` was a **linear OLS** working model — consistent (doubly robust) but attaining the semiparametric efficiency bound only when the conditional mean is linear in the covariates. It is replaced by a **polynomial sieve** (total degree up to K, AIC/BIC order selection, the same basis family as the propensity-ratio sieve), so with the sieve propensity ratio and the kernel-smoothed conditional `Ω*(X)` all nuisances are estimated nonparametrically and the covariate path attains the bound under the paper's regularity conditions (Section 4 / Theorem 4.1). The order is chosen by an OLS information criterion `IC = n·ln(RSS/n) + c_n·p_K`, where `p_K = comb(K+d, d)` is the sieve basis dimension (number of fitted coefficients; `c_n = 2` AIC, `ln(n)` BIC), on the within-group (survey-weighted) residual sum of squares, using the **raw** within-group observation count for both `n` and the penalty so the selected order — and hence `m̂` — is invariant to the survey-weight scale (the existing `test_survey_phase3.py` scale-invariance asserts still hold to `atol=1e-8`). **Degree 1 reproduces the prior linear OLS up to floating point**, so AIC/BIC degrades to linear when the conditional mean is linear and covariate-fit numbers change only when a higher order is selected (i.e. 
when linear was inadequate); `sieve_k_max=1` forces every covariate-path sieve to degree 1 (it recovers the linear outcome-regression component but also degree-1-constrains the propensity sieves, so it does **not** reproduce the exact pre-PR estimator). The sieve is a *growing* sieve — the candidate degree is `floor(n_group^{1/5})` with **no fixed ceiling**, giving a basis dimension `p_K = comb(K+d,d)` bounded by `n_basis < n_group` (so `p_K/n → 0` for the low-dimensional covariate settings typical of DiD; Assumption C.1's rate is on the dimension, not the degree). This satisfies C.1's growing-sieve uniform-consistency / `o_p(n^{-1/2})` product-rate conditions (Theorem 4.1) under which the bound is attained asymptotically; a frozen finite-order sieve would not. (High-dimensional `X` faces the usual curse of dimensionality, where the paper's ML-nuisance option applies.) This also removes the prior hard `K≤5` cap from the two pre-existing propensity-ratio / inverse-propensity sieves (a no-op for groups under ~3,125 units, where `floor(n^{1/5}) < 5` anyway; it only activates higher orders at large n). The small-group overfit cap (`n_basis < n_group`), the rank-guard + partial-skip warnings, and the WLS survey path mirror the propensity sieve; if every degree is rank-skipped the estimator falls back to the intercept-only within-group mean (distinct from the propensity sieve's constant-ratio-1 fallback). The no-covariate path, weights, generated outcomes, `Ω*`, SE, aggregation, and Hausman are **unchanged** — the audit verified them correct against the paper (no other corrections). The Theorem A.1 Hausman statistic computation was extracted into a behavior-preserving `_hausman_quadratic_form` helper for unit-testability. 
New `tests/test_methodology_efficient_did.py` with paper-equation-numbered Verified Components (Eq 3.5/3.13 inverse-covariance weights + the min-variance property; Eq 3.9 generated-outcome telescoping; Eq 3.13/§4.1 no-covariate closed form; Corollary 3.1/3.2 PT-Post = Callaway-Sant'Anna; Theorem 4.1 SE = `sqrt(mean(EIF²)/n)`; Theorem A.1 / Eq A.2 Hausman with the restricted−efficient covariance direction, the effective-rank DOF safeguard on a rank-deficient `V`, and the covariance-direction guard; plus the sieve nonlinear-recovery / linear-degradation / efficiency-gain checks). The HRS Table 6 anchor (`tests/test_efficient_did_validation.py::TestHRSReplication`, a derived openICPSR 116186 subset) is tightened from 0.1·SE to **0.05·SE** (the fit is deterministic; all cells are < 0.03·SE), with the data license/redistribution and the 656-vs-652 sample difference documented in `tests/data/README.md`. REGISTRY `## EfficientDiD` Notes updated (outcome regression now sieve + bound-attainment under Assumption C.1; new K=1-fallback edge-case Note); module/class docstrings and the paper review's "open working-model choice" pointer reconciled; `METHODOLOGY_REVIEW.md` row promoted to **Complete** (`Last Review = 2026-06-01`) with a Verified Components / Corrections Made / Deviations detail block; priority queue pruned.
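The order-selection rule in the Changed entry above (`IC = n·ln(RSS/n) + c_n·p_K` with `p_K = comb(K+d, d)`, candidate degrees up to `floor(n_group^{1/5})`, and the `n_basis < n_group` cap) can be sketched as follows. This is an unweighted illustration under stated assumptions: the function names are hypothetical, there is no survey weighting or rank guard, and it returns `None` where the entry describes an intercept-only fallback.

```python
import itertools
import math

import numpy as np


def max_degree(n_group):
    """Largest integer K with K**5 <= n_group, i.e. floor(n_group ** (1/5))."""
    k = 0
    while (k + 1) ** 5 <= n_group:
        k += 1
    return k


def polynomial_basis(X, degree):
    """All monomials of total degree <= degree, including the intercept."""
    n, d = X.shape
    columns = []
    for total in range(degree + 1):
        for combo in itertools.combinations_with_replacement(range(d), total):
            col = np.ones(n)
            for j in combo:
                col = col * X[:, j]
            columns.append(col)
    return np.column_stack(columns)


def select_degree(X, y, criterion="bic"):
    """Pick the degree minimizing IC = n*ln(RSS/n) + c_n*p_K (unweighted sketch)."""
    n, d = X.shape
    c_n = 2.0 if criterion == "aic" else math.log(n)
    best_degree, best_ic = None, math.inf
    for degree in range(1, max_degree(n) + 1):
        p_k = math.comb(degree + d, d)  # sieve basis dimension
        if p_k >= n:  # small-group overfit cap: n_basis < n_group
            break
        basis = polynomial_basis(X, degree)
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        rss = float(np.sum((y - basis @ coef) ** 2))
        ic = n * math.log(rss / n) + c_n * p_k
        if ic < best_ic:
            best_degree, best_ic = degree, ic
    return best_degree
```

The integer loop in `max_degree` avoids floating-point rounding at exact fifth powers. The basis built for degree `K` has exactly `comb(K+d, d)` columns, so the penalty term counts the fitted coefficients.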
 ## [3.5.0] - 2026-06-01
 
 ### Added
 
@@ -50,7 +50,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
 | ImputationDiD | `imputation.py` | `didimputation` | **In Progress** | — |
 | TwoStageDiD | `two_stage.py` | `did2s` | **In Progress** | — |
 | WooldridgeDiD (ETWFE) | `wooldridge.py` | `etwfe` (R) / `jwdid` (Stata) | **Complete** | 2026-05-22 |
-| EfficientDiD | `efficient_did.py` | (no canonical R package) | **In Progress** | — |
+| EfficientDiD | `efficient_did.py` | (no canonical R package) | **Complete** | 2026-06-01 |
 
 ### Continuous & Universal-Treatment Estimators
 
@@ -631,20 +631,36 @@ and covariate-adjusted specifications.)
 | Module | `efficient_did.py`, `efficient_did_bootstrap.py`, `efficient_did_covariates.py`, `efficient_did_weights.py` |
 | Primary Reference | Chen, Sant'Anna & Xie (2025), *Efficient Difference-in-Differences and Event Study Estimators* |
 | R Reference | (no canonical R package; paper compares against `did` / `DIDmultiplegt` / BJS / Gardner / Wooldridge as benchmarks rather than providing a reference implementation) |
-| Status | **In Progress** |
-| Last Review | — |
+| Status | **Complete** |
+| Last Review | 2026-06-01 |
 
 **Documentation in place:**
-- REGISTRY.md section: `## EfficientDiD` (full Theorem 4.1 EIF, sieve-based propensity-ratio estimation with AIC/BIC, kernel-smoothed conditional covariance, Hausman pretest for PT-All vs PT-Post, survey support)
-- Implementation: 130 unit tests in `tests/test_efficient_did.py` + 12 validation tests in `tests/test_efficient_did_validation.py`
-- Hausman pretest: implemented per Theorem A.1 with Moore-Penrose pseudoinverse for finite-sample non-PSD variance-difference matrix
-- Survey support: pweight + strata/PSU/FPC via TSL on EIF scores; covariates DR path with WLS outcome regression and weighted sieve normal equations
+- REGISTRY.md section: `## EfficientDiD` (full Theorem 4.1 EIF, sieve-based propensity-ratio and outcome-regression estimation with AIC/BIC, kernel-smoothed conditional covariance, Hausman pretest for PT-All vs PT-Post, survey support)
 - Paper review on file: `docs/methodology/papers/chen-santanna-xie-2025-review.md` (PR-A, 2026-05-31) — faithful paper-sourced transcription of arXiv:2506.17729v1 (assumptions S/O/NA/PT-Post/PT-All; Theorem 3.1/3.2 EIFs + Corollaries 3.1/3.2; §4 sieve/kernel DR estimation; Theorem 4.1 SEs; Theorem A.1 Hausman; HRS Table 6 anchor)
-
-**Outstanding for promotion (PR-B source validation; paper review now on file):**
-- Dedicated `tests/test_methodology_efficient_did.py` with Theorem 3.2 / Equation 3.5 / Equation 4.3 numbered Verified Components walk-through
-- Cross-language anchor: the paper's empirical replication uses HRS data following Sun-Abraham (2021); a same-data benchmark against the paper's reported numbers (or a same-DGP MC against R alternatives) would substantiate the EIF construction
-- Documented deviations: linear OLS working models for outcome regressions vs. paper's general nonparametric specification (DR safety net acknowledged but not separately validated); fixed-weight bootstrap aggregation vs. WIF-corrected analytical aggregation
+- Tests: `tests/test_efficient_did.py` (unit/API), `tests/test_efficient_did_validation.py` (HRS Table 6 + Compustat MC), and `tests/test_methodology_efficient_did.py` (PR-B paper-equation Verified Components)
+
+**Corrections Made (PR-B source validation):** (None — implementation verified correct.) The PR-B walk-through traced each paper result against the source and found the no-covariate path (multi-baseline efficiency recovered via the `g'=g` same-cohort pairs), the generated outcome (Eq 3.9), the optimal weights (Eq 3.5/3.13), the conditional covariance Ω*(X) (Eq 3.12), the analytical SE (Theorem 4.1), the cohort-size event-study aggregation (with the `(G_g − π_g)` WIF correction), and the Hausman covariance direction (Eq A.2, restricted − efficient) all correct.
+
+**Implementation change (deliberate, decided with the maintainer — eliminates a deviation rather than fixing a bug):**
+- Covariate outcome regression upgraded from linear OLS to a **polynomial sieve** (AIC/BIC order selection, same basis family as the propensity-ratio sieve; a *growing* sieve with no fixed order ceiling — `floor(n_group^{1/5})`, bounded by `n_basis < n_group` — which, since C.1's rate is on the sieve *dimension* `p_K = comb(K+d,d)` (not the polynomial degree, which differ once `d > 1`), satisfies Assumption C.1's uniform-consistency / `o_p(n^{-1/2})` product-rate conditions for the low-dimensional covariate settings typical of DiD) so the doubly-robust covariate path attains the semiparametric efficiency bound asymptotically under the paper's nonparametric-nuisance specification (Section 4 / Theorem 4.1), not only when the conditional mean is linear. Degree 1 reproduces the prior linear OLS *outcome regression*; `sieve_k_max=1` forces all covariate-path sieves to degree 1 (it recovers the linear outcome component but also degree-1-constrains the propensity sieves, so it does **not** reproduce the exact pre-PR estimator). Removing the hard `K≤5` cap also updates the two pre-existing propensity-ratio / inverse-propensity sieves (a no-op for groups under ~3,125 units). `diff_diff/efficient_did_covariates.py::estimate_outcome_regression`.
+- Extracted `_hausman_quadratic_form` (behavior-preserving) so the Theorem A.1 statistic and effective-rank DOF logic are unit-testable in isolation.
+
+**Verified Components** (`tests/test_methodology_efficient_did.py`, paper-equation-numbered):
+- [x] Inverse-covariance optimal weights `1'Ω*⁻¹/(1'Ω*⁻¹1)` (Eq 3.5 / 3.13) + the min-variance property + the singular-Ω* pseudoinverse path.
+- [x] No-covariate generated outcome (Eq 3.9): `g'=g` telescopes to the per-baseline DiD (Eq 3.3); `g'=∞` to the period-1 long-difference.
+- [x] No-covariate efficient ATT = `weights @ generated_outcomes` (Eq 3.13 / §4.1), rebuilt independently from within-group sample means/covariances.
+- [x] PT-Post just-identified reduction = Callaway-Sant'Anna single-baseline (Corollary 3.1 single-date exact at 1e-9; Corollary 3.2 staggered).
+- [x] Analytical SE = `sqrt(mean(EIF²)/n)` (Theorem 4.1).
+- [x] Hausman statistic (Theorem A.1 / Eq A.2): `H = Δ'V⁺Δ`, `V = aCov(ẼS) − aCov(ÊS)` (restricted − efficient); `df = |E|` (well-conditioned) and `df = effective_rank < |E|` (rank-deficient safeguard); covariance-direction guard.
+- [x] Covariate sieve outcome regression: recovers a nonlinear-in-X conditional mean (K≥2), reproduces linear OLS on linear data (K=1), and beats a forced-linear working model under a nonlinear-nuisance conditional-PT DGP.
+- [x] Empirical anchor: HRS Table 6 (`tests/test_efficient_did_validation.py::TestHRSReplication`) on the paper's data (a derived openICPSR 116186 subset) matches all six ATT(g,t) + ES(0)/ES(1)/ES(2) + ES_avg to < 0.03 published SE; the Compustat MC confirms unbiasedness, the CS efficiency gain, coverage, and SE calibration.
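Two of the Verified Components above reduce to short formulas: the inverse-covariance weights `1'Ω*⁻¹/(1'Ω*⁻¹1)` and the analytical SE `sqrt(mean(EIF²)/n)`. The sketch below illustrates both with NumPy. The function names are hypothetical and the code is not taken from `efficient_did_weights.py`. Using the pseudoinverse for a singular Ω* is an assumption based on the "singular-Ω* pseudoinverse path" wording.

```python
import numpy as np


def efficient_weights(omega):
    """Inverse-covariance weights 1'Ω⁻¹ / (1'Ω⁻¹1); pinv also covers a singular Ω."""
    ones = np.ones(omega.shape[0])
    raw = np.linalg.pinv(omega) @ ones  # equals (1'Ω⁻¹)' for symmetric Ω
    return raw / raw.sum()


def analytical_se(eif):
    """Standard error sqrt(mean(EIF²) / n) from per-unit influence-function values."""
    eif = np.asarray(eif, dtype=float)
    return float(np.sqrt(np.mean(eif ** 2) / eif.size))


# Two baselines with variances 1 and 4: the noisier one gets the smaller weight.
omega = np.diag([1.0, 4.0])
weights = efficient_weights(omega)
```

For this diagonal example the weights are `[0.8, 0.2]`, giving a combined variance `w'Ωw` of 0.8 versus 1.25 under equal weights, which is the min-variance property the first checklist item refers to.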
+
+**Deviations (ratified; each carries a recognized REGISTRY label):**
+- `overall_att` is the cohort-size-weighted post-treatment average (Callaway-Sant'Anna convention), not the paper's uniform `ES_avg` (Eq 2.3); `ES_avg` is recoverable from the event-study output.
+- The multiplier bootstrap re-aggregates with fixed cohort-size weights (matching the CS bootstrap); the analytical path carries the `(G_g − π_g)` WIF weight-estimation correction.
+- The Hausman χ² uses the effective rank of `V` as degrees of freedom (a finite-sample safeguard equal to `|E|` when `V` is well-conditioned), rather than a fixed `|E|`.
+- `vcov_type` is permanently narrow to `{"hc1"}` (the IF-based variance has no single design matrix for analytical-sandwich families); the polynomial sieve basis and the Silverman kernel bandwidth are working-model choices the paper leaves open.
+- SEs are i.i.d.-asymptotics (`sqrt(mean(EIF²)/n)` or multiplier bootstrap); cluster/survey EIF variants are documented library extensions beyond the paper's stated scope.
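The Hausman item and the effective-rank deviation above both refer to `H = Δ'V⁺Δ` with `V = aCov(ẼS) − aCov(ÊS)` (restricted minus efficient). A minimal sketch of that quadratic form follows. `hausman_sketch` is a hypothetical name, not the extracted `_hausman_quadratic_form` helper, and the use of NumPy's default rank tolerance is an assumption.

```python
import numpy as np


def hausman_sketch(delta, cov_restricted, cov_efficient):
    """Return (H, df) with H = Δ' V⁺ Δ and V = restricted − efficient covariance."""
    delta = np.asarray(delta, dtype=float)
    v = np.asarray(cov_restricted, dtype=float) - np.asarray(cov_efficient, dtype=float)
    h = float(delta @ np.linalg.pinv(v) @ delta)  # Moore-Penrose pseudoinverse
    df = int(np.linalg.matrix_rank(v))  # effective rank; len(delta) when V is full rank
    return h, df


# Well-conditioned case: V = diag(1, 4), so df equals the number of estimates.
h_full, df_full = hausman_sketch([1.0, 2.0], np.diag([2.0, 5.0]), np.eye(2))

# Rank-deficient case: V = [[1, 1], [1, 1]] has rank 1, so df drops below |E|.
h_def, df_def = hausman_sketch([1.0, 1.0], np.full((2, 2), 2.0), np.ones((2, 2)))
```

In the first call the statistic is 2 on 2 degrees of freedom. In the second, the pseudoinverse still yields a finite statistic (1) and the degrees of freedom fall to 1, which is the safeguard the deviation note describes.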
 
 ---
 
@@ -1410,11 +1426,10 @@ more graceful handling of edge cases while still signaling invalid inference to
 
 Promotion priority for the **In Progress** entries, ordered by what's blocked on substantive review work (top of list = needs review next) vs. consolidation pass (bottom of list = mostly tracker walk-through):
 
-**Substantive-review-blocked (still missing a methodology test file / R parity — and, except for EfficientDiD, a paper review):**
+**Substantive-review-blocked (still missing a methodology test file / R parity and a paper review):**
 
 1. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
-2. **EfficientDiD** — **paper review on file** (PR-A, `chen-santanna-xie-2025-review.md`); remaining PR-B work is the source-validation pass — `tests/test_methodology_efficient_did.py` (Theorem 3.1/3.2 / Eq 3.5 / Eq 4.3 Verified Components), the HRS Table 6 cross-language anchor, and the documented deviations against Chen, Sant'Anna & Xie (2025).
-3. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
+2. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
 
 **Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
 
 
@@ -162,6 +162,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Rust-backend HC2 implementation. Current Rust path only supports HC1; HC2 and CR2 Bell-McCaffrey fall through to the NumPy backend. For large-n fits this is noticeable. | `rust/src/linalg.rs` | Phase 1a | Low |
 | CR2 Bell-McCaffrey DOF uses a naive `O(n² k)` per-coefficient loop over cluster pairs. Pustejovsky-Tipton (2018) Appendix B has a scores-based formulation that avoids the full `n × n` `M` matrix. Switch when a user hits a large-`n` cluster-robust design. | `linalg.py::_compute_cr2_bm` | Phase 1a | Low |
 | `SyntheticControl` retains a full `_SyntheticControlFitSnapshot` (pivoted outcome/predictor panels) on EVERY fit to support the opt-in `in_space_placebo()`, so callers who never run the placebo still pay O(units × periods × predictor-vars) memory (same as `SyntheticDiD`'s always-on snapshot for `in_time_placebo`). Store a compact array/index representation instead of per-variable DataFrames, or build the snapshot lazily on first placebo call (would need to retain the source data, ~same cost). | `synthetic_control.py` snapshot build, `synthetic_control_results.py::_SyntheticControlFitSnapshot` | follow-up | Low |
+| EfficientDiD DR (covariate) path rebuilds the full polynomial sieve basis `_polynomial_sieve_basis(X, K)` for every candidate `K` inside each of the three nuisance fits (outcome regression, propensity ratio, inverse propensity), per `fit()`. After the growing-sieve cap removal (PR-B), large covariate-adjusted fits at large `n` pay more avoidable basis-construction cost. Cache the basis per `(X, K)` within a `fit()` and share it across the nuisance helpers. | `diff_diff/efficient_did_covariates.py` (the three sieve helpers) | PR-B follow-up | Low |
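The caching idea in the EfficientDiD row above can be illustrated with a small per-fit memo keyed by `K`. This is a sketch under assumptions: `BasisCache` and `power_basis` are hypothetical names, the stand-in builder handles one covariate only, and the real `_polynomial_sieve_basis` may take different arguments.

```python
import numpy as np


class BasisCache:
    """Memoize a basis builder per degree K for one fixed covariate matrix X."""

    def __init__(self, X, build_basis):
        self._X = X
        self._build_basis = build_basis  # callable (X, K) -> 2-D array
        self._store = {}
        self.builds = 0  # counts actual constructions, for illustration only

    def get(self, K):
        if K not in self._store:
            self._store[K] = self._build_basis(self._X, K)
            self.builds += 1
        return self._store[K]


def power_basis(X, K):
    """Stand-in builder for a single covariate: columns 1, x, ..., x**K."""
    return np.column_stack([X[:, 0] ** k for k in range(K + 1)])


X = np.arange(6.0).reshape(-1, 1)
cache = BasisCache(X, power_basis)

# Three nuisance fits, each scanning K = 1..3, share one cache.
for _nuisance in ("outcome", "propensity_ratio", "inverse_propensity"):
    for K in (1, 2, 3):
        basis = cache.get(K)
```

With the shared cache, the nine lookups trigger three basis constructions instead of nine. A cache object created inside `fit()` and passed to the helpers would be discarded when the fit returns, so nothing persists across fits.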
 
 #### Testing/Docs