Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- **CI + local AI PR-reviewer model upgraded `gpt-5.4` → `gpt-5.5`.** The CI Codex reviewer (`.github/workflows/ai_pr_review.yml`) and the local `/ai-review-local` default (`.claude/scripts/openai_review.py` `DEFAULT_MODEL`) now run `gpt-5.5` @ `xhigh` effort / `read-only` sandbox (all other invocation settings unchanged). Validated empirically before the swap via the `tools/reviewer-eval/` A/B harness: on a real-bug corpus plus a k=6 big-diff de-risk, `gpt-5.5` matched-or-beat `gpt-5.4` on every test-backed recall case (including a bug buried in a ~3k-line methodology diff), added zero false positives, and ran faster; an end-to-end CI canary confirmed the action environment (`openai/codex-action@v1`, codex CLI 0.135.0) runs `gpt-5.5` and catches a planted P0. `gpt-5.5` is also added to the reasoning-model set (`_is_reasoning_model`, for the api-backend timeout/token limits) and to the `PRICING` table at OpenAI's confirmed standard rates (`gpt-5.5` $5/$30, `gpt-5.5-pro` $30/$180 per 1M input/output tokens) — the production CI + local reviewer run `gpt-5.5` via the flat-rate **codex backend**, but `--backend auto` falls back to the metered API path when the codex CLI is unavailable, so the cost estimate must stay accurate there. `gpt-5.4` remains accepted.
- **EfficientDiD methodology-review-tracker promotion: In Progress → Complete, with a covariate outcome-regression upgrade (behavior change).** Completes the source-validation pass (PR-B) of the Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1) audit — PR-A (#515) added the paper review on file; this PR validates the source against the code, eliminates the one real deviation, adds paper-equation Verified Components, and flips the tracker. **Behavior change:** the covariate doubly-robust path's outcome regression `m̂(X)` was a **linear OLS** working model — consistent (doubly robust) but attaining the semiparametric efficiency bound only when the conditional mean is linear in the covariates. It is replaced by a **polynomial sieve** (total degree up to K, AIC/BIC order selection, the same basis family as the propensity-ratio sieve), so with the sieve propensity ratio and the kernel-smoothed conditional `Ω*(X)` all nuisances are estimated nonparametrically and the covariate path attains the bound under the paper's regularity conditions (Section 4 / Theorem 4.1). The order is chosen by an OLS information criterion `IC = n·ln(RSS/n) + c_n·p_K`, where `p_K = comb(K+d, d)` is the sieve basis dimension (number of fitted coefficients; `c_n = 2` AIC, `ln(n)` BIC), on the within-group (survey-weighted) residual sum of squares, using the within-group **positive-weight support** count for both `n` and the penalty (the raw row count when unweighted) so the selected order — and hence `m̂` — is invariant both to the survey-weight scale and to zero-weight (survey-subpopulation / padded) rows. The weighted RSS / Gram / loss totals are already inert to zero-weight rows; keying order selection (auto-k_max, the `n_basis` admissibility cap, and the IC sample-size terms) off the positive-weight support — in the outcome regression **and** both propensity sieves — keeps selection inert too. The auto Silverman bandwidth for the kernel-smoothed conditional `Ω*(X)` is likewise evaluated on the positive-weight support, so the overidentified (`H > 1`) DR path — where `Ω*(X)` feeds the per-unit efficient weights — is invariant as well (otherwise a zero-weight row with an extreme covariate would inflate the unweighted std and move the bandwidth). So a zero-weight-padded survey fit returns a bit-identical point estimate to the positive-weight-only fit (verified end-to-end in `tests/test_methodology_efficient_did.py::TestSurveyZeroWeightInvariance`, both a just-identified `H=1` case pinning sieve-order selection and an overidentified `H>1` case pinning the kernel bandwidth, plus a helper-level auto-order regression test; each fails under the respective raw-count / all-row selector; the existing `test_survey_phase3.py` scale-invariance asserts still hold to `atol=1e-8`). **Degree 1 reproduces the prior linear OLS up to floating point**, so AIC/BIC degrades to linear when the conditional mean is linear and covariate-fit numbers change only when a higher order is selected (i.e. when linear was inadequate); `sieve_k_max=1` forces every covariate-path sieve to degree 1 (it recovers the linear outcome-regression component but also degree-1-constrains the propensity sieves, so it does **not** reproduce the exact pre-PR estimator). The sieve is a *growing* sieve — the candidate degree is `floor(n_pos^{1/5})` (the positive-weight support `n_pos`, the raw group size when unweighted) with **no fixed ceiling**, giving a basis dimension `p_K = comb(K+d,d)` bounded by `n_basis < n_pos` (so `p_K/n → 0` for the low-dimensional covariate settings typical of DiD; Assumption C.1's rate is on the dimension, not the degree). This satisfies C.1's growing-sieve uniform-consistency / `o_p(n^{-1/2})` product-rate conditions (Theorem 4.1) under which the bound is attained asymptotically; a frozen finite-order sieve would not. (High-dimensional `X` faces the usual curse of dimensionality, where the paper's ML-nuisance option applies.) This also removes the prior hard `K≤5` cap from the two pre-existing propensity-ratio / inverse-propensity sieves (a no-op for groups under ~3,125 units, where `floor(n^{1/5}) < 5` anyway; it only activates higher orders at large n). The small-group overfit cap (`n_basis < n_group`), the rank-guard + partial-skip warnings, and the WLS survey path mirror the propensity sieve; if every degree is rank-skipped the estimator falls back to the intercept-only within-group mean (distinct from the propensity sieve's constant-ratio-1 fallback). The no-covariate path, weights, generated outcomes, `Ω*`, SE, aggregation, and Hausman are **unchanged** — the audit verified them correct against the paper (no other corrections). The Theorem A.1 Hausman statistic computation was extracted into a behavior-preserving `_hausman_quadratic_form` helper for unit-testability. New `tests/test_methodology_efficient_did.py` with paper-equation-numbered Verified Components (Eq 3.5/3.13 inverse-covariance weights + the min-variance property; Eq 3.9 generated-outcome telescoping; Eq 3.13/§4.1 no-covariate closed form; Corollary 3.1/3.2 PT-Post = Callaway-Sant'Anna; Theorem 4.1 SE = `sqrt(mean(EIF²)/n)`; Theorem A.1 / Eq A.2 Hausman with the restricted−efficient covariance direction, the effective-rank DOF safeguard on a rank-deficient `V`, and the covariance-direction guard; plus the sieve nonlinear-recovery / linear-degradation / efficiency-gain checks). The HRS Table 6 anchor (`tests/test_efficient_did_validation.py::TestHRSReplication`, a derived openICPSR 116186 subset) is tightened from 0.1·SE to **0.05·SE** (the fit is deterministic; all cells are < 0.03·SE), with the data license/redistribution and the 656-vs-652 sample difference documented in `tests/data/README.md`. REGISTRY `## EfficientDiD` Notes updated (outcome regression now sieve + bound-attainment under Assumption C.1; new K=1-fallback edge-case Note); module/class docstrings and the paper review's "open working-model choice" pointer reconciled; `METHODOLOGY_REVIEW.md` row promoted to **Complete** (`Last Review = 2026-06-01`) with a Verified Components / Corrections Made / Deviations detail block; priority queue pruned.

## [3.5.0] - 2026-06-01

Expand Down
45 changes: 30 additions & 15 deletions METHODOLOGY_REVIEW.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
| ImputationDiD | `imputation.py` | `didimputation` | **In Progress** | — |
| TwoStageDiD | `two_stage.py` | `did2s` | **In Progress** | — |
| WooldridgeDiD (ETWFE) | `wooldridge.py` | `etwfe` (R) / `jwdid` (Stata) | **Complete** | 2026-05-22 |
| EfficientDiD | `efficient_did.py` | (no canonical R package) | **In Progress** | |
| EfficientDiD | `efficient_did.py` | (no canonical R package) | **Complete** | 2026-06-01 |

### Continuous & Universal-Treatment Estimators

Expand Down Expand Up @@ -631,20 +631,36 @@ and covariate-adjusted specifications.)
| Module | `efficient_did.py`, `efficient_did_bootstrap.py`, `efficient_did_covariates.py`, `efficient_did_weights.py` |
| Primary Reference | Chen, Sant'Anna & Xie (2025), *Efficient Difference-in-Differences and Event Study Estimators* |
| R Reference | (no canonical R package; paper compares against `did` / `DIDmultiplegt` / BJS / Gardner / Wooldridge as benchmarks rather than providing a reference implementation) |
| Status | **In Progress** |
| Last Review | |
| Status | **Complete** |
| Last Review | 2026-06-01 |

**Documentation in place:**
- REGISTRY.md section: `## EfficientDiD` (full Theorem 4.1 EIF, sieve-based propensity-ratio estimation with AIC/BIC, kernel-smoothed conditional covariance, Hausman pretest for PT-All vs PT-Post, survey support)
- Implementation: 130 unit tests in `tests/test_efficient_did.py` + 12 validation tests in `tests/test_efficient_did_validation.py`
- Hausman pretest: implemented per Theorem A.1 with Moore-Penrose pseudoinverse for finite-sample non-PSD variance-difference matrix
- Survey support: pweight + strata/PSU/FPC via TSL on EIF scores; covariates DR path with WLS outcome regression and weighted sieve normal equations
- REGISTRY.md section: `## EfficientDiD` (full Theorem 4.1 EIF, sieve-based propensity-ratio and outcome-regression estimation with AIC/BIC, kernel-smoothed conditional covariance, Hausman pretest for PT-All vs PT-Post, survey support)
- Paper review on file: `docs/methodology/papers/chen-santanna-xie-2025-review.md` (PR-A, 2026-05-31) — faithful paper-sourced transcription of arXiv:2506.17729v1 (assumptions S/O/NA/PT-Post/PT-All; Theorem 3.1/3.2 EIFs + Corollaries 3.1/3.2; §4 sieve/kernel DR estimation; Theorem 4.1 SEs; Theorem A.1 Hausman; HRS Table 6 anchor)

**Outstanding for promotion (PR-B source validation; paper review now on file):**
- Dedicated `tests/test_methodology_efficient_did.py` with Theorem 3.2 / Equation 3.5 / Equation 4.3 numbered Verified Components walk-through
- Cross-language anchor: the paper's empirical replication uses HRS data following Sun-Abraham (2021); a same-data benchmark against the paper's reported numbers (or a same-DGP MC against R alternatives) would substantiate the EIF construction
- Documented deviations: linear OLS working models for outcome regressions vs. paper's general nonparametric specification (DR safety net acknowledged but not separately validated); fixed-weight bootstrap aggregation vs. WIF-corrected analytical aggregation
- Tests: `tests/test_efficient_did.py` (unit/API), `tests/test_efficient_did_validation.py` (HRS Table 6 + Compustat MC), and `tests/test_methodology_efficient_did.py` (PR-B paper-equation Verified Components)

**Corrections Made (PR-B source validation):** (None — implementation verified correct.) The PR-B walk-through traced each paper result against the source and found the no-covariate path (multi-baseline efficiency recovered via the `g'=g` same-cohort pairs), the generated outcome (Eq 3.9), the optimal weights (Eq 3.5/3.13), the conditional covariance Ω*(X) (Eq 3.12), the analytical SE (Theorem 4.1), the cohort-size event-study aggregation (with the `(G_g − π_g)` WIF correction), and the Hausman covariance direction (Eq A.2, restricted − efficient) all correct.

**Implementation change (deliberate, decided with the maintainer — eliminates a deviation rather than fixing a bug):**
- Covariate outcome regression upgraded from linear OLS to a **polynomial sieve** (AIC/BIC order selection, same basis family as the propensity-ratio sieve; a *growing* sieve with no fixed order ceiling — `floor(n_pos^{1/5})` over the positive-weight support `n_pos` (raw group size when unweighted; zero-weight survey rows are inert for order selection), bounded by `n_basis < n_pos` — which, since C.1's rate is on the sieve *dimension* `p_K = comb(K+d,d)` (not the polynomial degree, which differ once `d > 1`), satisfies Assumption C.1's uniform-consistency / `o_p(n^{-1/2})` product-rate conditions for the low-dimensional covariate settings typical of DiD) so the doubly-robust covariate path attains the semiparametric efficiency bound asymptotically under the paper's nonparametric-nuisance specification (Section 4 / Theorem 4.1), not only when the conditional mean is linear. Degree 1 reproduces the prior linear OLS *outcome regression*; `sieve_k_max=1` forces all covariate-path sieves to degree 1 (it recovers the linear outcome component but also degree-1-constrains the propensity sieves, so it does **not** reproduce the exact pre-PR estimator). Removing the hard `K≤5` cap also updates the two pre-existing propensity-ratio / inverse-propensity sieves (a no-op for groups under ~3,125 units). `diff_diff/efficient_did_covariates.py::estimate_outcome_regression`.
- Extracted `_hausman_quadratic_form` (behaviour-preserving) so the Theorem A.1 statistic and effective-rank DOF logic are unit-testable in isolation.

**Verified Components** (`tests/test_methodology_efficient_did.py`, paper-equation-numbered):
- [x] Inverse-covariance optimal weights `1'Ω*⁻¹/(1'Ω*⁻¹1)` (Eq 3.5 / 3.13) + the min-variance property + the singular-Ω* pseudoinverse path.
- [x] No-covariate generated outcome (Eq 3.9): `g'=g` telescopes to the per-baseline DiD (Eq 3.3); `g'=∞` to the period-1 long-difference.
- [x] No-covariate efficient ATT = `weights @ generated_outcomes` (Eq 3.13 / §4.1), rebuilt independently from within-group sample means/covariances.
- [x] PT-Post just-identified reduction = Callaway-Sant'Anna single-baseline (Corollary 3.1 single-date exact at 1e-9; Corollary 3.2 staggered).
- [x] Analytical SE = `sqrt(mean(EIF²)/n)` (Theorem 4.1).
- [x] Hausman statistic (Theorem A.1 / Eq A.2): `H = Δ'V⁺Δ`, `V = aCov(ẼS) − aCov(ÊS)` (restricted − efficient); `df = |E|` (well-conditioned) and `df = effective_rank < |E|` (rank-deficient safeguard); covariance-direction guard.
- [x] Covariate sieve outcome regression: recovers a nonlinear-in-X conditional mean (K≥2), reproduces linear OLS on linear data (K=1), and beats a forced-linear working model under a nonlinear-nuisance conditional-PT DGP.
- [x] Empirical anchor: HRS Table 6 (`tests/test_efficient_did_validation.py::TestHRSReplication`) on the paper's data (a derived openICPSR 116186 subset) matches all six ATT(g,t) + ES(0)/ES(1)/ES(2) + ES_avg to < 0.03 published SE; the Compustat MC confirms unbiasedness, the CS efficiency gain, coverage, and SE calibration.

**Deviations (ratified; each carries a recognized REGISTRY label):**
- `overall_att` is the cohort-size-weighted post-treatment average (Callaway-Sant'Anna convention), not the paper's uniform `ES_avg` (Eq 2.3); `ES_avg` is recoverable from the event-study output.
- The multiplier bootstrap re-aggregates with fixed cohort-size weights (matching the CS bootstrap); the analytical path carries the `(G_g − π_g)` WIF weight-estimation correction.
- The Hausman χ² uses the effective rank of `V` as degrees of freedom (a finite-sample safeguard equal to `|E|` when `V` is well-conditioned), rather than a fixed `|E|`.
- `vcov_type` is permanently narrow to `{"hc1"}` (the IF-based variance has no single design matrix for analytical-sandwich families); the polynomial sieve basis and the Silverman kernel bandwidth are working-model choices the paper leaves open.
- SEs are i.i.d.-asymptotics (`sqrt(mean(EIF²)/n)` or multiplier bootstrap); cluster/survey EIF variants are documented library extensions beyond the paper's stated scope.

---

Expand Down Expand Up @@ -1410,11 +1426,10 @@ more graceful handling of edge cases while still signaling invalid inference to

Promotion priority for the **In Progress** entries, ordered by what's blocked on substantive review work (top of list = needs review next) vs. consolidation pass (bottom of list = mostly tracker walk-through):

**Substantive-review-blocked (still missing a methodology test file / R parity and, except for EfficientDiD, a paper review):**
**Substantive-review-blocked (still missing a methodology test file / R parity and a paper review):**

1. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
2. **EfficientDiD** — **paper review on file** (PR-A, `chen-santanna-xie-2025-review.md`); remaining PR-B work is the source-validation pass — `tests/test_methodology_efficient_did.py` (Theorem 3.1/3.2 / Eq 3.5 / Eq 4.3 Verified Components), the HRS Table 6 cross-language anchor, and the documented deviations against Chen, Sant'Anna & Xie (2025).
3. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
2. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.

**Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**

Expand Down
Loading
Loading