Validate the EfficientDiD implementation against Chen, Sant'Anna & Xie
(2025, arXiv:2506.17729v1) and promote the methodology-review row to Complete.
- Upgrade the covariate doubly-robust outcome regression m_hat(X) from linear
OLS to a polynomial sieve (AIC/BIC order selection, same basis family as the
propensity-ratio sieve) so the covariate path attains the semiparametric
efficiency bound asymptotically under the paper's growing-sieve regularity
conditions (Assumption C.1 / Theorem 4.1), not only when the conditional mean
is linear. The OLS-RSS criterion uses the raw within-group count for n and the
penalty, so order selection is survey-weight-scale invariant. Degree 1
reproduces the prior linear OLS (set sieve_k_max=1 to force it).
- Make the sieves genuinely growing: remove the hard K<=5 ceiling across all
three nuisance sieves (outcome regression + the two pre-existing propensity
sieves) so the candidate order grows as floor(n_group^(1/5)) bounded by
n_basis<n_group (K/n->0) -- the regime Assumption C.1(5)/(6) require. A frozen
finite-order sieve would not generically attain the bound. No-op for groups
under ~3,125 units (floor(n^(1/5))<5 there); only activates higher orders at
large n. Behavior change for covariate fits.
- Extract _hausman_quadratic_form (behavior-preserving) for unit-testability.
- Add tests/test_methodology_efficient_did.py: paper-equation Verified
Components (Eq 3.5/3.13 weights, Eq 3.9 generated-outcome telescoping, Sec 4.1
closed form, Cor 3.1/3.2 PT-Post=CS, Thm 4.1 SE, Thm A.1 Hausman incl.
rank-deficient-V DOF + covariance-direction guard, sieve recovery, weighted
scale-invariance + fallback, growing-sieve order>5 at large n).
- Tighten the HRS Table 6 anchor to 0.05*SE; document the openICPSR 116186 data
license + the 656-vs-652 sample difference.
- Reconcile the now-stale linear-OLS claims to the growing sieve across REGISTRY,
the paper review, docstrings, api/efficient_did.rst, choosing_estimator.rst,
and llms-full.txt.
- METHODOLOGY_REVIEW.md row -> Complete; CHANGELOG entry; priority queue pruned.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
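The order-selection rule described in the message can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `max_sieve_degree` and `select_order` are hypothetical names, the residual sums of squares are taken as given rather than fitted, and the fifth root is computed in integer arithmetic to avoid floating-point rounding at exact powers such as 3,125.

```python
import math
from typing import Mapping


def max_sieve_degree(n_group: int, d: int) -> int:
    """Largest candidate degree K = floor(n_group^(1/5)), reduced until
    the basis dimension comb(K + d, d) is strictly below n_group."""
    k = 0
    # Integer fifth root: largest k with k**5 <= n_group.
    while (k + 1) ** 5 <= n_group:
        k += 1
    # Small-group overfit cap: require n_basis < n_group.
    while k > 0 and math.comb(k + d, d) >= n_group:
        k -= 1
    return k


def select_order(rss_by_degree: Mapping[int, float], n: int, d: int,
                 criterion: str = "bic") -> int:
    """Pick the degree minimising IC = n*ln(RSS/n) + c_n * p_K,
    where p_K = comb(K + d, d) and n is the raw within-group count."""
    c_n = 2.0 if criterion == "aic" else math.log(n)

    def ic(k: int) -> float:
        p_k = math.comb(k + d, d)  # number of fitted coefficients
        return n * math.log(rss_by_degree[k] / n) + c_n * p_k

    return min(rss_by_degree, key=ic)
```

With two covariates, a group of 3,124 units still tops out at degree 4, while 3,125 units is the first size at which degree 5 becomes a candidate. In `select_order`, a large drop in RSS from degree 1 to degree 2 selects degree 2, while a negligible drop leaves the linear fit in place.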
CHANGELOG.md
+2 lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Fixed
- **Covariate names that collide with reserved structural terms now raise `ValueError` instead of silently corrupting the coefficient dict (`DifferenceInDifferences`, `MultiPeriodDiD`, `TwoWayFixedEffects`).** These estimators build their `coefficients` dict by zipping a variable-name list -- structural term names PLUS the user covariate column names appended verbatim -- with the fitted coefficient vector. A covariate whose name equaled a reserved structural name (`const`; the treatment/time column names; the `{treatment}:{time}` interaction; MultiPeriodDiD `period_{p}` dummies and `{treatment}:period_{p}` interactions; `TwoWayFixedEffects` `ATT`; fixed-effect / unit / time dummy names; or an internal `_`-prefixed working column such as `_treat_time` / `_did_treatment` / `_treatment_post`) silently **overwrote** that structural coefficient via Python dict last-write-wins -- e.g. a covariate named `const` dropped the intercept -- with no error or warning. A new shared `validate_covariate_names` helper (`diff_diff/utils.py`) is now called in each of the three `fit()` methods before the design matrix is built; it raises `ValueError` on a collision (the comparison is case-sensitive, so e.g. `Const` is still allowed) **and** on duplicate names within `covariates` (which collapse to a single dict entry the same way). Fixed-effect/unit/time dummy reserved names are taken from the same `pd.get_dummies(..., drop_first=True)` call used to build them, so they match exactly (including for pandas `Categorical` columns with a non-default category order). For `TwoWayFixedEffects` the guard fires on **all** variance paths: the default within-transform path returns only `{"ATT": att}` (no covariate is a dict key there), but a covariate named `_treatment_post` would still clobber the internal interaction column, so guarding both paths is uniform and forward-compatible. 
**Potentially breaking:** a fit that previously *succeeded* with a colliding (or duplicated) covariate name -- silently returning a corrupted coefficient dict -- now raises; rename the covariate column(s). The staggered / influence-function estimators (CallawaySantAnna, SunAbraham, StaggeredTripleDifference, EfficientDiD, TwoStageDiD, ImputationDiD, WooldridgeDiD, dCDH, StackedDiD) key results by `(g, t)` tuples / relative-time indices, never covariate names, and `TripleDifference` / `SyntheticControl` / `SyntheticDiD` do not expose covariates by name, so none are affected. New tests in `tests/test_utils.py`, `tests/test_estimators.py`, and `tests/test_estimators_vcov_type.py`.

+
### Changed
+
- **EfficientDiD methodology-review-tracker promotion: In Progress → Complete, with a covariate outcome-regression upgrade (behavior change).** Completes the source-validation pass (PR-B) of the Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1) audit — PR-A (#515) added the paper review on file; this PR validates the source against the code, eliminates the one real deviation, adds paper-equation Verified Components, and flips the tracker. **Behavior change:** the covariate doubly-robust path's outcome regression `m̂(X)` was a **linear OLS** working model — consistent (doubly robust) but attaining the semiparametric efficiency bound only when the conditional mean is linear in the covariates. It is replaced by a **polynomial sieve** (total degree up to K, AIC/BIC order selection, the same basis family as the propensity-ratio sieve), so with the sieve propensity ratio and the kernel-smoothed conditional `Ω*(X)` all nuisances are estimated nonparametrically and the covariate path attains the bound under the paper's regularity conditions (Section 4 / Theorem 4.1). The order is chosen by an OLS information criterion `IC = n·ln(RSS/n) + c_n·p_K`, where `p_K = comb(K+d, d)` is the sieve basis dimension (number of fitted coefficients; `c_n = 2` AIC, `ln(n)` BIC), on the within-group (survey-weighted) residual sum of squares, using the **raw** within-group observation count for both `n` and the penalty so the selected order — and hence `m̂` — is invariant to the survey-weight scale (the existing `test_survey_phase3.py` scale-invariance asserts still hold to `atol=1e-8`). **Degree 1 reproduces the prior linear OLS up to floating point**, so AIC/BIC degrades to linear when the conditional mean is linear and covariate-fit numbers change only when a higher order is selected (i.e. 
when linear was inadequate); `sieve_k_max=1` forces every covariate-path sieve to degree 1 (it recovers the linear outcome-regression component but also degree-1-constrains the propensity sieves, so it does **not** reproduce the exact pre-PR estimator). The sieve is a *growing* sieve — the candidate degree is `floor(n_group^{1/5})` with **no fixed ceiling**, giving a basis dimension `p_K = comb(K+d,d)` bounded by `n_basis < n_group` (so `p_K/n → 0` for the low-dimensional covariate settings typical of DiD; Assumption C.1's rate is on the dimension, not the degree). This satisfies C.1's growing-sieve uniform-consistency / `o_p(n^{-1/2})` product-rate conditions (Theorem 4.1) under which the bound is attained asymptotically; a frozen finite-order sieve would not. (High-dimensional `X` faces the usual curse of dimensionality, where the paper's ML-nuisance option applies.) This also removes the prior hard `K≤5` cap from the two pre-existing propensity-ratio / inverse-propensity sieves (a no-op for groups under ~3,125 units, where `floor(n^{1/5}) < 5` anyway; it only activates higher orders at large n). The small-group overfit cap (`n_basis < n_group`), the rank-guard + partial-skip warnings, and the WLS survey path mirror the propensity sieve; if every degree is rank-skipped the estimator falls back to the intercept-only within-group mean (distinct from the propensity sieve's constant-ratio-1 fallback). The no-covariate path, weights, generated outcomes, `Ω*`, SE, aggregation, and Hausman are **unchanged** — the audit verified them correct against the paper (no other corrections). The Theorem A.1 Hausman statistic computation was extracted into a behavior-preserving `_hausman_quadratic_form` helper for unit-testability. 
New `tests/test_methodology_efficient_did.py` with paper-equation-numbered Verified Components (Eq 3.5/3.13 inverse-covariance weights + the min-variance property; Eq 3.9 generated-outcome telescoping; Eq 3.13/§4.1 no-covariate closed form; Corollary 3.1/3.2 PT-Post = Callaway-Sant'Anna; Theorem 4.1 SE = `sqrt(mean(EIF²)/n)`; Theorem A.1 / Eq A.2 Hausman with the restricted−efficient covariance direction, the effective-rank DOF safeguard on a rank-deficient `V`, and the covariance-direction guard; plus the sieve nonlinear-recovery / linear-degradation / efficiency-gain checks). The HRS Table 6 anchor (`tests/test_efficient_did_validation.py::TestHRSReplication`, a derived openICPSR 116186 subset) is tightened from 0.1·SE to **0.05·SE** (the fit is deterministic; all cells are < 0.03·SE), with the data license/redistribution and the 656-vs-652 sample difference documented in `tests/data/README.md`. REGISTRY `## EfficientDiD` Notes updated (outcome regression now sieve + bound-attainment under Assumption C.1; new K=1-fallback edge-case Note); module/class docstrings and the paper review's "open working-model choice" pointer reconciled; `METHODOLOGY_REVIEW.md` row promoted to **Complete** (`Last Review = 2026-06-01`) with a Verified Components / Corrections Made / Deviations detail block; priority queue pruned.
| Primary Reference | Chen, Sant'Anna & Xie (2025), *Efficient Difference-in-Differences and Event Study Estimators*|
| R Reference | (no canonical R package; paper compares against `did` / `DIDmultiplegt` / BJS / Gardner / Wooldridge as benchmarks rather than providing a reference implementation) |
-
| Status | **In Progress** |
-
| Last Review | — |
+
| Status | **Complete** |
+
| Last Review | 2026-06-01 |

**Documentation in place:**
-
- REGISTRY.md section: `## EfficientDiD` (full Theorem 4.1 EIF, sieve-based propensity-ratio estimation with AIC/BIC, kernel-smoothed conditional covariance, Hausman pretest for PT-All vs PT-Post, survey support)
-
- Implementation: 130 unit tests in `tests/test_efficient_did.py` + 12 validation tests in `tests/test_efficient_did_validation.py`
-
- Hausman pretest: implemented per Theorem A.1 with Moore-Penrose pseudoinverse for finite-sample non-PSD variance-difference matrix
-
- Survey support: pweight + strata/PSU/FPC via TSL on EIF scores; covariates DR path with WLS outcome regression and weighted sieve normal equations
+
- REGISTRY.md section: `## EfficientDiD` (full Theorem 4.1 EIF, sieve-based propensity-ratio and outcome-regression estimation with AIC/BIC, kernel-smoothed conditional covariance, Hausman pretest for PT-All vs PT-Post, survey support)
- Paper review on file: `docs/methodology/papers/chen-santanna-xie-2025-review.md` (PR-A, 2026-05-31) — faithful paper-sourced transcription of arXiv:2506.17729v1 (assumptions S/O/NA/PT-Post/PT-All; Theorem 3.1/3.2 EIFs + Corollaries 3.1/3.2; §4 sieve/kernel DR estimation; Theorem 4.1 SEs; Theorem A.1 Hausman; HRS Table 6 anchor)
-
-
**Outstanding for promotion (PR-B source validation; paper review now on file):**
- Cross-language anchor: the paper's empirical replication uses HRS data following Sun-Abraham (2021); a same-data benchmark against the paper's reported numbers (or a same-DGP MC against R alternatives) would substantiate the EIF construction
-
- Documented deviations: linear OLS working models for outcome regressions vs. paper's general nonparametric specification (DR safety net acknowledged but not separately validated); fixed-weight bootstrap aggregation vs. WIF-corrected analytical aggregation
**Corrections Made (PR-B source validation):** (None — implementation verified correct.) The PR-B walk-through traced each paper result against the source and found the no-covariate path (multi-baseline efficiency recovered via the `g'=g` same-cohort pairs), the generated outcome (Eq 3.9), the optimal weights (Eq 3.5/3.13), the conditional covariance Ω*(X) (Eq 3.12), the analytical SE (Theorem 4.1), the cohort-size event-study aggregation (with the `(G_g − π_g)` WIF correction), and the Hausman covariance direction (Eq A.2, restricted − efficient) all correct.
+
+
**Implementation change (deliberate, decided with the maintainer — eliminates a deviation rather than fixing a bug):**
+
- Covariate outcome regression upgraded from linear OLS to a **polynomial sieve** (AIC/BIC order selection, same basis family as the propensity-ratio sieve; a *growing* sieve with no fixed order ceiling — `floor(n_group^{1/5})`, bounded by `n_basis < n_group` — which, since C.1's rate is on the sieve *dimension* `p_K = comb(K+d,d)` (not the polynomial degree, which differ once `d > 1`), satisfies Assumption C.1's uniform-consistency / `o_p(n^{-1/2})` product-rate conditions for the low-dimensional covariate settings typical of DiD) so the doubly-robust covariate path attains the semiparametric efficiency bound asymptotically under the paper's nonparametric-nuisance specification (Section 4 / Theorem 4.1), not only when the conditional mean is linear. Degree 1 reproduces the prior linear OLS *outcome regression*; `sieve_k_max=1` forces all covariate-path sieves to degree 1 (it recovers the linear outcome component but also degree-1-constrains the propensity sieves, so it does **not** reproduce the exact pre-PR estimator). Removing the hard `K≤5` cap also updates the two pre-existing propensity-ratio / inverse-propensity sieves (a no-op for groups under ~3,125 units). `diff_diff/efficient_did_covariates.py::estimate_outcome_regression`.
+
- Extracted `_hausman_quadratic_form` (behaviour-preserving) so the Theorem A.1 statistic and effective-rank DOF logic are unit-testable in isolation.
- [x] Covariate sieve outcome regression: recovers a nonlinear-in-X conditional mean (K≥2), reproduces linear OLS on linear data (K=1), and beats a forced-linear working model under a nonlinear-nuisance conditional-PT DGP.
+
- [x] Empirical anchor: HRS Table 6 (`tests/test_efficient_did_validation.py::TestHRSReplication`) on the paper's data (a derived openICPSR 116186 subset) matches all six ATT(g,t) + ES(0)/ES(1)/ES(2) + ES_avg to < 0.03 published SE; the Compustat MC confirms unbiasedness, the CS efficiency gain, coverage, and SE calibration.
+
+
**Deviations (ratified; each carries a recognized REGISTRY label):**
+
- `overall_att` is the cohort-size-weighted post-treatment average (Callaway-Sant'Anna convention), not the paper's uniform `ES_avg` (Eq 2.3); `ES_avg` is recoverable from the event-study output.
+
- The multiplier bootstrap re-aggregates with fixed cohort-size weights (matching the CS bootstrap); the analytical path carries the `(G_g − π_g)` WIF weight-estimation correction.
+
- The Hausman χ² uses the effective rank of `V` as degrees of freedom (a finite-sample safeguard equal to `|E|` when `V` is well-conditioned), rather than a fixed `|E|`.
+
- `vcov_type` is permanently narrow to `{"hc1"}` (the IF-based variance has no single design matrix for analytical-sandwich families); the polynomial sieve basis and the Silverman kernel bandwidth are working-model choices the paper leaves open.
+
- SEs are i.i.d.-asymptotics (`sqrt(mean(EIF²)/n)` or multiplier bootstrap); cluster/survey EIF variants are documented library extensions beyond the paper's stated scope.
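As a small worked illustration of the i.i.d. analytical standard error named in the last bullet, `sqrt(mean(EIF²)/n)`, the sketch below computes it from a vector of per-unit efficient-influence-function values. The function name `eif_standard_error` is hypothetical and the input is assumed to be already centred EIF scores; it is not the library's implementation.

```python
import math
from typing import Sequence


def eif_standard_error(eif: Sequence[float]) -> float:
    """Analytical SE from per-unit EIF values: sqrt(mean(EIF^2) / n)."""
    n = len(eif)
    if n == 0:
        raise ValueError("need at least one EIF value")
    mean_sq = sum(v * v for v in eif) / n  # plug-in estimate of E[EIF^2]
    return math.sqrt(mean_sq / n)
```

For example, four EIF values of magnitude 1 give `mean(EIF²) = 1`, so the standard error is `sqrt(1/4) = 0.5`.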

---

@@ -1410,11 +1426,10 @@ more graceful handling of edge cases while still signaling invalid inference to
Promotion priority for the **In Progress** entries, ordered by what's blocked on substantive review work (top of list = needs review next) vs. consolidation pass (bottom of list = mostly tracker walk-through):
-
**Substantive-review-blocked (still missing a methodology test file / R parity — and, except for EfficientDiD, a paper review):**
+
**Substantive-review-blocked (still missing a methodology test file / R parity and a paper review):**

1. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
-
2. **EfficientDiD** — **paper review on file** (PR-A, `chen-santanna-xie-2025-review.md`); remaining PR-B work is the source-validation pass — `tests/test_methodology_efficient_did.py` (Theorem 3.1/3.2 / Eq 3.5 / Eq 4.3 Verified Components), the HRS Table 6 cross-language anchor, and the documented deviations against Chen, Sant'Anna & Xie (2025).
-
3. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
+
2. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.

**Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
TODO.md
+1 line changed: 1 addition & 0 deletions
@@ -162,6 +162,7 @@ Deferred items from PR reviews that were not addressed before merge.
| Rust-backend HC2 implementation. Current Rust path only supports HC1; HC2 and CR2 Bell-McCaffrey fall through to the NumPy backend. For large-n fits this is noticeable. |`rust/src/linalg.rs`| Phase 1a | Low |
| CR2 Bell-McCaffrey DOF uses a naive `O(n² k)` per-coefficient loop over cluster pairs. Pustejovsky-Tipton (2018) Appendix B has a scores-based formulation that avoids the full `n × n` `M` matrix. Switch when a user hits a large-`n` cluster-robust design. | `linalg.py::_compute_cr2_bm` | Phase 1a | Low |
|`SyntheticControl` retains a full `_SyntheticControlFitSnapshot` (pivoted outcome/predictor panels) on EVERY fit to support the opt-in `in_space_placebo()`, so callers who never run the placebo still pay O(units × periods × predictor-vars) memory (same as `SyntheticDiD`'s always-on snapshot for `in_time_placebo`). Store a compact array/index representation instead of per-variable DataFrames, or build the snapshot lazily on first placebo call (would need to retain the source data, ~same cost). |`synthetic_control.py` snapshot build, `synthetic_control_results.py::_SyntheticControlFitSnapshot`| follow-up | Low |
+
| EfficientDiD DR (covariate) path rebuilds the full polynomial sieve basis `_polynomial_sieve_basis(X, K)` for every candidate `K` inside each of the three nuisance fits (outcome regression, propensity ratio, inverse propensity), per `fit()`. After the growing-sieve cap removal (PR-B), large covariate-adjusted fits at large `n` pay more avoidable basis-construction cost. Cache the basis per `(X, K)` within a `fit()` and share it across the nuisance helpers. |`diff_diff/efficient_did_covariates.py` (the three sieve helpers) | PR-B follow-up | Low |
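The caching idea in this TODO row could look roughly like the sketch below. It is a stand-in under assumptions: `polynomial_basis` is an illustrative pure-Python substitute for the library's `_polynomial_sieve_basis(X, K)` (whose actual signature and return type are not shown here), and `BasisCache` is a hypothetical helper that holds one covariate matrix for the duration of a `fit()` and memoises the basis per degree `K`.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence


def polynomial_basis(X: Sequence[Sequence[float]], K: int) -> List[List[float]]:
    """All monomials of total degree <= K in the columns of X, intercept first.
    Stand-in for the real basis builder; yields comb(K + d, d) columns."""
    d = len(X[0])
    # Each monomial is a multiset of column indices; the empty one is the intercept.
    terms = [t for deg in range(K + 1)
             for t in combinations_with_replacement(range(d), deg)]
    return [[math.prod(row[j] for j in term) for term in terms] for row in X]


class BasisCache:
    """Builds the basis for each degree at most once for a fixed X."""

    def __init__(self, X: Sequence[Sequence[float]]) -> None:
        self._X = X
        self._by_degree: Dict[int, List[List[float]]] = {}
        self.builds = 0  # number of actual basis constructions

    def get(self, K: int) -> List[List[float]]:
        if K not in self._by_degree:
            self._by_degree[K] = polynomial_basis(self._X, K)
            self.builds += 1
        return self._by_degree[K]
```

If the three nuisance helpers each loop over the same candidate degrees and call `cache.get(K)`, the basis is constructed once per degree instead of three times. The cached lists are shared, so callers in this sketch must treat them as read-only.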