DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally by igerber · Pull Request #458 · igerber/diff-diff

igerber · 2026-05-16T20:35:47Z

Summary

Lift the NotImplementedError raised by DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) (previously at diff_diff/estimators.py:373; the replacement auto-route now sits at line 382). For HC2/HC2-BM fits, the auto-route internally promotes the absorb columns to fixed_effects=, so the existing full-dummy-design code path computes the algebraically correct vcov.
Empirical methodology gate: read the clubSandwich source (R/CR-adjustments.R) before writing code. Confirmed that the unweighted CR2 algebra (A_g = (I - H_gg)^{-1/2}, with H built on the full model matrix) is what diff-diff's existing _compute_cr2_bm already produces. The singleton-cluster CR2 trick (cluster=1:n) reduces to one-way HC2-BM with Satterthwaite DOF.
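A minimal numpy sketch of the algebra cited above, assuming plain unweighted OLS: `cr2_vcov` applies A_g = (I - H_gg)^{-1/2} with H built from the full design, and with singleton clusters it collapses to the HC2 sandwich. The function names are illustrative only; this is not diff-diff's `_compute_cr2_bm`.

```python
import numpy as np

def inv_sqrt_psd(M):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cr2_vcov(X, y, cluster):
    """Unweighted CR2 sandwich: A_g = (I - H_gg)^{-1/2}, H from the full design X."""
    bread = np.linalg.inv(X.T @ X)
    e = y - X @ (bread @ X.T @ y)
    H = X @ bread @ X.T
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        idx = np.flatnonzero(cluster == g)
        A_g = inv_sqrt_psd(np.eye(len(idx)) - H[np.ix_(idx, idx)])
        u_g = X[idx].T @ A_g @ e[idx]  # adjusted cluster score
        meat += np.outer(u_g, u_g)
    return bread @ meat @ bread

def hc2_vcov(X, y):
    """HC2 sandwich: squared residuals rescaled by 1 / (1 - h_ii)."""
    bread = np.linalg.inv(X.T @ X)
    e = y - X @ (bread @ X.T @ y)
    h = np.einsum("ij,jk,ik->i", X, bread, X)  # diagonal of the hat matrix
    meat = (X * (e ** 2 / (1 - h))[:, None]).T @ X
    return bread @ meat @ bread

rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

V_hc2 = hc2_vcov(X, y)
V_cr2_singleton = cr2_vcov(X, y, np.arange(n))              # cluster = 1:n
V_cr2_grouped = cr2_vcov(X, y, np.repeat(np.arange(8), 5))  # 8 clusters of 5
```

With one observation per cluster, H_gg is the scalar h_ii, so A_g is (1 - h_ii)^{-1/2} and each cluster score reproduces one HC2 meat term.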
R-parity at ~1e-10 vs lm() + sandwich::vcovHC(type="HC2") and lm() + clubSandwich::vcovCR(cluster=..., type="CR2") via new absorbed_fe_did scenario in benchmarks/data/clubsandwich_cr2_golden.json and new tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity test class.
TODO row 100 partial drain: DiD-absorb sub-gate addressed; TWFE and MPD-absorb sub-gates remain as documented follow-ups (different fit-path structure).

Methodology references (required if estimator / math changes)

Method name(s): HC2 leverage (Eicker-Huber-White), CR2 Bell-McCaffrey Satterthwaite DOF (Bell & McCaffrey 2002; Imbens & Kolesar 2016; Pustejovsky & Tipton 2018).
Paper / source link(s): Pustejovsky & Tipton (2018, J Business & Economic Statistics) §3.3 "absorbing form"; clubSandwich source at jepusto/clubSandwich/R/CR-adjustments.R.
Any intentional deviations from the source (and why): None. The auto-route produces algebra bit-equal to clubSandwich's vcovCR.lm(... type="CR2") for the unweighted case. Weighted variant (with the more elaborate Theta quadratic correction documented in CR-adjustments.R's inverse_var=FALSE branch) is deferred to a follow-up.

Validation

Tests added/updated:
- New tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity class (2 tests) — R-parity at 1e-10 against absorbed_fe_did golden scenario.
- tests/test_estimators_vcov_type.py::test_did_absorb_rejects_hc2_and_hc2_bm flipped from "raises NotImplementedError" to "auto-routes; matches fixed_effects= path bit-equal" (renamed test_did_absorb_hc2_and_hc2_bm_auto_route).
- tests/test_linalg_hc2_bm.py::test_cr2_parity_with_golden skips scenarios that don't fit the simple y ~ x 2-column contract.
Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial updates.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-16T20:43:07Z

Overall Assessment

Needs changes

Executive Summary

The core routing idea is methodologically plausible: HC2 depends on projection-matrix leverage, and CR2 depends on cluster-block hat-matrix adjustments; Pustejovsky-Tipton’s absorbed-form result supports the unweighted fixed-effects case, so recovering the full FE hat via a full-dummy path is consistent with the source material.
DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) is now supported in code, but the official methodology registry still says that exact surface must raise NotImplementedError.
The new parity harness generates clustered CR2 and Bell-McCaffrey df targets, but the tests only pin the unclustered SE/ATT path and explicitly discard the df target.
No security issues stood out in the changed files.

Methodology

P1 Impact: the shipped methodology contract is now self-contradictory. docs/methodology/REGISTRY.md L2550-L2553 still documents DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) as rejected, while diff_diff/estimators.py L366-L386 now silently auto-routes that case through fixed_effects=. Under your review rubric, this is an undocumented methodology/default-behavior change, and the changelog note at CHANGELOG.md does not mitigate it. Concrete fix: replace the stale REGISTRY note with a new **Note:** describing the supported DiD auto-route, its unweighted scope, the remaining TWFE/MPD rejections, and the user-visible result-object behavior.

Code Quality

No material findings in the changed implementation beyond the contract issues below.

Performance

No blocker. The dummy-expansion cost is the expected tradeoff of this implementation choice.

Maintainability

P2 Impact: the auto-route changes more than the variance path. Once diff_diff/estimators.py L382-L386 flips absorb into fixed_effects, the stored coefficients, vcov, residuals, fitted_values, and r_squared all come from the full-dummy branch at diff_diff/estimators.py L635-L678. The changelog only documents result.coefficients, and existing absorb guidance still talks about within-transformed R² semantics (for example tests/test_methodology_did.py L1247-L1252). Concrete fix: either normalize the returned DiDResults back to the absorb-style surface, or document and regression-test the full set of changed fields in REGISTRY.
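Why `r_squared` moves under the route: a small numpy sketch, assuming a one-way unit effect and plain OLS (diff-diff's own R² code is not shown in this thread). By FWL the slope and residuals agree, so SSR is identical. Only the denominator differs: the full-dummy R² uses the total sum of squares of the raw outcome, the within R² uses the demeaned outcome, and demeaning can only shrink that total.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, T = 20, 4
unit = np.repeat(np.arange(n_units), T)
x = rng.normal(size=n_units * T)
# Large unit effects make the gap between the two R^2 definitions obvious.
y = rng.normal(scale=3.0, size=n_units)[unit] + 0.5 * x + rng.normal(size=n_units * T)

def demean_by(v, g):
    """Subtract group means (one-way within transform)."""
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]

def ols_resid(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

# Full-dummy fit: slope plus one dummy per unit (the dummies span the intercept).
beta_full, e_full = ols_resid(np.column_stack([x, np.eye(n_units)[unit]]), y)
# Absorbed fit: demeaned outcome on the demeaned slope column.
y_w = demean_by(y, unit)
beta_w, e_w = ols_resid(demean_by(x, unit)[:, None], y_w)

r2_full = 1 - e_full @ e_full / np.sum((y - y.mean()) ** 2)
r2_within = 1 - e_w @ e_w / np.sum(y_w ** 2)
```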

Tech Debt

No separate finding. Updating TODO.md to keep only the remaining TWFE/MPD gates is appropriate.

Security

No findings.

Documentation/Tests

P1 Impact: the new validation does not actually pin the newly opened clustered CR2 path or the Bell-McCaffrey df-based inference. The R generator writes both clustered CR2 targets and per-coefficient df at benchmarks/R/generate_clubsandwich_golden.R L97-L117, and the golden JSON carries vcov_cr2 / dof_cr2 at benchmarks/data/clubsandwich_cr2_golden.json L45-L48. But tests/test_estimators_vcov_type.py L1032-L1053 only checks unclustered hc2_bm SE/ATT, explicitly throws away expected_dof_slope, and never exercises DifferenceInDifferences(cluster="unit", vcov_type="hc2_bm").fit(..., absorb=[...]). That leaves a silent mis-inference lane for both clustered CR2 and BM p-value/CI propagation. Concrete fix: add estimator-level golden tests for the clustered absorb+CR2 path using vcov_cr2/dof_cr2, and assert one-way hc2_bm inference through p_value/conf_int (or equivalent df-sensitive quantities), not just se.

Path to Approval

Update docs/methodology/REGISTRY.md so the DiD absorbed-FE HC2/HC2-BM surface is documented as supported via auto-route, with an explicit Note covering scope and remaining exclusions.
Add an end-to-end golden test for DifferenceInDifferences(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=["unit","period"]) against the stored CR2 targets.
Add a df-sensitive regression test for one-way hc2_bm on the absorb path, and either document or normalize the broader DiDResults surface change (coefficients, vcov, residuals, fitted_values, r_squared).

R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.

**P1.1: REGISTRY contradicted code.** REGISTRY.md:2552 still said `DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError. Replaced the blanket-rejection Note with a per-estimator status block: the DiD path is now SUPPORTED via auto-route (with the full DiDResults surface change documented inline); the TWFE and MultiPeriodDiD paths still reject and are tracked as follow-ups.

**P1.2: Parity tests missed clustered CR2 and df-sensitive inference.** The previous test class pinned only unclustered HC2-BM SE/ATT and explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises `DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])` against the R `vcovCR(..., cluster=d$unit, type="CR2")` target, asserting that SE and ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts that HC2 and HC2-BM give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM Satterthwaite DOF must propagate to inference; the CI is wider under BM). This catches silent regressions where the auto-route passes the SE through but uses n-k for inference.

**P2: CHANGELOG only mentioned `result.coefficients`.** The auto-route also affects `vcov`, `residuals`, `fitted_values`, and `r_squared` (all come from the full-dummy fit under the route; `r_squared` is computed on the un-demeaned outcome and will typically be higher than the within-R²). Extended the CHANGELOG entry with the full `DiDResults`-surface contract change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
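To see why identical SEs can still give different p-values and CIs, here is a sketch of a one-way Bell-McCaffrey Satterthwaite df for a single contrast c, assuming the Imbens-Kolesar (2016) form with a homoskedastic working model. `hc2_bm_dof` is a hypothetical helper; it is not diff-diff's `linalg.py` code and has not been checked against the stored `dof_hc2_bm` target. A smaller df means a larger t critical value and therefore a wider CI at the same SE, which is the direction the df-sensitive test pins.

```python
import numpy as np

def hc2_bm_dof(X, c):
    """Satterthwaite df for c'beta under HC2, assuming e ~ N(0, s^2 I)."""
    n = X.shape[0]
    bread = np.linalg.inv(X.T @ X)
    H = X @ bread @ X.T
    M = np.eye(n) - H                      # residual-maker matrix
    h = np.diag(H)
    # w_i: weight on squared residual i in the HC2 variance of c'beta, after the 1/sqrt(1-h) rescaling.
    w = (X @ bread @ c) / np.sqrt(1.0 - h)
    # V_HC2 = s^2 z' M W^2 M z; its nonzero eigenvalues equal those of W M W.
    lam = np.linalg.eigvalsh(w[:, None] * M * w[None, :])
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(4)
n = 30
treat = (np.arange(n) < 6).astype(float)   # small treated group
X = np.column_stack([np.ones(n), rng.normal(size=n), treat])
c = np.array([0.0, 0.0, 1.0])
df_bm = hc2_bm_dof(X, c)                   # at most n - k, often far below it
```

For an intercept-only design every weight is equal and the formula returns exactly n - 1, a convenient sanity check.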

github-actions · 2026-05-16T20:53:15Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d88354606957eafe4bc4030ccdd2a0f2f23687f3

Overall Assessment

Looks good

Executive Summary

The prior P1 methodology-contract issue is resolved: the registry now documents the DiD absorbed-FE auto-route, its remaining scope limits, and the broader DiDResults surface change at docs/methodology/REGISTRY.md L2551-L2555; the changelog mirrors that at CHANGELOG.md.
The prior P1 validation gap is largely resolved: the PR adds absorbed-FE parity fixtures plus estimator-level checks for auto-route equivalence, clustered CR2, and df-sensitive HC2 vs HC2-BM inference at benchmarks/R/generate_clubsandwich_golden.R L72-L118, tests/test_estimators_vcov_type.py L668-L708, and tests/test_estimators_vcov_type.py L1032-L1136.
The routing choice itself is consistent with the cited methodology: Pustejovsky-Tipton extend CR2/BRL to fixed-effects models, including absorbed-FE application, and clubSandwich's CR2 path uses Satterthwaite degrees of freedom; the package also tests CR2 against HC behavior when all clusters are singletons. (jepusto.com)
[Newly identified] One non-blocking scope mismatch remains: the legacy multi-absorb survey-weight guard still fires before the new auto-route, so the blanket "SUPPORTED" wording is slightly broader than the current implementation.

Methodology

Severity: P2
Impact: [Newly identified] DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) is documented as supported in docs/methodology/REGISTRY.md L2552-L2555, but DiD.fit() still raises for survey_design= with len(absorb) > 1 because the weighted multi-absorb guard at diff_diff/estimators.py L347-L355 runs before the auto-route at diff_diff/estimators.py L366-L386. The old survey regression test still locks in that rejection at tests/test_survey.py L2648-L2671. This is a scope/documentation mismatch, not a silent numbers bug.
Concrete fix: Either move the auto-route ahead of the survey-weight guard when the fit is going to the full-dummy fixed_effects= path, or narrow the REGISTRY/CHANGELOG wording to exclude survey-weighted multi-absorb fits.

Code Quality

No findings.

Performance

No findings. The full-dummy cost is the explicit tradeoff of this implementation choice.

Maintainability

No findings. The earlier ambiguity about the broader DiDResults surface is now explicitly documented in docs/methodology/REGISTRY.md and CHANGELOG.md.

Tech Debt

No findings. Remaining absorbed-FE HC2/HC2-BM follow-up work is now correctly narrowed to TwoWayFixedEffects and MultiPeriodDiD in TODO.md.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: The new fixture stores one-way and clustered Satterthwaite targets at benchmarks/R/generate_clubsandwich_golden.R L97-L117, but the estimator tests only assert exact SE/ATT parity for the clustered CR2 path and only directional df sensitivity for the one-way BM path at tests/test_estimators_vcov_type.py L1032-L1136. Because BM df is computed centrally in diff_diff/linalg.py L2799-L2841 and consumed in diff_diff/linalg.py L3181-L3226, I do not see evidence of a current correctness bug; this is test hardening, not a blocker.
Concrete fix: Optionally add exact p_value/CI or critical-value assertions against the stored dof_hc2_bm and dof_cr2 targets.
Reviewer note: I could not execute the suite in this environment because pytest and numpy are not installed.

R2 review flagged that REGISTRY/CHANGELOG documented `DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the legacy guard rejecting `len(absorb) > 1` together with survey weights (estimators.py:347) fired BEFORE the auto-route, so weighted multi-absorb fits still raised.

The guard's rationale ("single-pass demeaning isn't the correct weighted FWL projection for N>1 absorbed dimensions") doesn't apply when we auto-route to fixed_effects=: the fixed_effects= path builds the full-dummy design and solves WLS directly, with no within-transform.

Reorder: move the auto-route block above the multi-absorb-survey guard. The guard now fires only when absorb was NOT consumed by the auto-route (i.e., hc1/classical/conley/etc., the paths that still demean). Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the new placement against silent regression. The existing `test_survey.py` multi-absorb-survey rejection tests continue to pass (they use the default vcov_type=hc1 path, which still hits the guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
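The guard's rationale can be checked numerically. This sketch assumes a balanced unit-by-period panel with arbitrary positive weights (not diff-diff's survey machinery): one pass of weighted demeaning by unit and then by period does not reproduce the full-dummy WLS slope, iterating the two projections to convergence does, and the full-dummy solve needs no demeaning at all.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, T = 12, 5
unit = np.repeat(np.arange(n_units), T)
period = np.tile(np.arange(T), n_units)
n = n_units * T
w = rng.uniform(0.2, 3.0, size=n)          # survey-style weights (assumed)
x = rng.normal(size=n) + 0.3 * period
y = rng.normal(size=n_units)[unit] + 0.5 * period + 2.0 * x + rng.normal(size=n)

def wdemean(v, g, w):
    """Subtract weighted group means (a W-orthogonal projection)."""
    return v - (np.bincount(g, weights=w * v) / np.bincount(g, weights=w))[g]

def wls(X, y, w):
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Exact: full-dummy WLS (all unit dummies plus period dummies minus one).
D = np.column_stack([np.eye(n_units)[unit], np.eye(T)[period][:, 1:]])
b_full = wls(np.column_stack([x, D]), y, w)[0]

# Single pass: weighted demeaning by unit, then by period.
x1 = wdemean(wdemean(x, unit, w), period, w)
y1 = wdemean(wdemean(y, unit, w), period, w)
b_one_pass = wls(x1[:, None], y1, w)[0]

# Alternating projections iterated to convergence recover the weighted FWL answer.
xk, yk = x.copy(), y.copy()
for _ in range(500):
    xk = wdemean(wdemean(xk, unit, w), period, w)
    yk = wdemean(wdemean(yk, unit, w), period, w)
b_iterated = wls(xk[:, None], yk, w)[0]
```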

github-actions · 2026-05-16T21:01:04Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7d5ecd6d1889a1c565b116d30dc57ba3ff6d4f05

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review scope issue is resolved: DifferenceInDifferences.fit() now auto-routes absorb to fixed_effects before the legacy multi-absorb survey guard, and the scope notes/tests were updated to match (diff_diff/estimators.py:L347-L382, tests/test_estimators_vcov_type.py:L1098-L1138, docs/methodology/REGISTRY.md:L2551-L2555, TODO.md:L100-L100).
The methodology change is consistent with the existing registry rationale: HC2/CR2 small-sample corrections are now sent through the full-dummy FE path for DiD, while MultiPeriodDiD and TwoWayFixedEffects remain explicitly out of scope (docs/methodology/REGISTRY.md:L2551-L2555).
Validation is materially better than in the last review: the PR adds a new clubSandwich golden scenario plus estimator-level checks for unclustered HC2-BM, clustered CR2, survey multi-absorb ordering, and df-sensitive HC2 vs HC2-BM inference (benchmarks/R/generate_clubsandwich_golden.R:L72-L118, tests/test_estimators_vcov_type.py:L981-L1178).
I did not find any new unmitigated P0 or P1 issues in the changed diff.
[Newly identified] One P3 documentation-precision issue remains around the survey path.

Methodology
No findings. The changed estimator path in diff_diff/estimators.py:L347-L382 matches the registry’s full-hat-matrix rationale for HC2 / HC2-BM on absorbed FE, and the updated scope split in docs/methodology/REGISTRY.md:L2551-L2555 is now internally consistent.

Code Quality
No findings.

Performance
No findings. The explicit full-dummy route is the intended tradeoff for obtaining the correct leverage/hat-matrix quantities.

Maintainability
No findings.

Tech Debt
No findings. The remaining absorbed-FE HC2/HC2-BM follow-up work is still correctly narrowed to TwoWayFixedEffects and MultiPeriodDiD in TODO.md:L100-L100.

Security
No findings.

Documentation/Tests

Severity: P3 [Newly identified]
Impact: The new support wording in CHANGELOG.md:L11-L11 and docs/methodology/REGISTRY.md:L2552-L2555 reads as though absorb + hc2/hc2_bm now directly computes HC2/Bell-McCaffrey variance on every fit path. On survey fits, the library still bypasses analytical HC/CR variance and uses survey TSL or replicate variance instead (diff_diff/linalg.py:L2706-L2713, diff_diff/linalg.py:L2778-L2786, diff_diff/linalg.py:L2913-L2966; summary suppression rationale at diff_diff/results.py:L213-L221). This is not a correctness bug on the shipped path, but the wording is broader than the actual variance behavior.
Concrete fix: Add one sentence to the CHANGELOG/REGISTRY note clarifying that survey_design= continues to use survey variance, and that the new auto-route only changes the FE handling / removes the old absorbed-FE rejection.

Reviewer note: I could not execute the automated tests here because pytest and numpy are not installed in this environment.

R3 informational P3: REGISTRY/CHANGELOG wording could be read as implying survey fits compute HC2/HC2-BM analytically. Survey fits actually use Taylor-series linearization or replicate-weight variance regardless of `vcov_type`; the auto-route only changes the FE handling and removes the prior absorbed-FE rejection. Added one-sentence clarifications in both surfaces so the documentation matches the variance dispatch in linalg.py / results.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T21:10:04Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 366ef8481b231e21321209cd02cc50ffc75519d6

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review documentation concern is resolved: CHANGELOG.md:L11 and docs/methodology/REGISTRY.md:L2551-L2555 now explicitly state that survey_design= keeps survey variance rather than analytical HC2/HC2-BM, and tests/test_estimators_vcov_type.py:L1098-L1138 pins the route ordering.
I did not find an unmitigated methodology defect in the new DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) path.
Validation is materially better: tests/test_estimators_vcov_type.py:L664-L708 checks absorb-vs-fixed-effects equivalence, and tests/test_estimators_vcov_type.py:L981-L1178 adds absorbed-FE R/clubSandwich parity, clustered CR2 parity, survey-route placement, and HC2-vs-HC2-BM df-sensitive inference coverage.
[Newly identified] Minor contract drift remains: the new auto-route in diff_diff/estimators.py:L347-L376 now runs before the longstanding absorb + fixed_effects rejection in diff_diff/estimators.py:L390-L396, but the surrounding API docs and existing rejection tests still describe that combination as invalid.

Methodology
No findings. The new route in diff_diff/estimators.py:L347-L376 matches the registry rationale in docs/methodology/REGISTRY.md:L2551-L2555: HC2 relies on model leverage (hatvalues), clubSandwich CR2 builds cluster adjustments from cluster hat blocks and an inverse square root, and Pustejovsky-Tipton explicitly discuss that BRL/CR2 adjustments for fixed-effects models depend on the full regression representation rather than a naively absorbed fit.

Code Quality
No findings.

Performance
No findings. Routing to the full-dummy path is the expected tradeoff for a leverage-based vcov family.

Maintainability

Severity: P3 [Newly identified]
Impact: diff_diff/estimators.py:L347-L376 rewrites absorb into fixed_effects before the mixed-input guard, so DifferenceInDifferences(vcov_type="hc2"|"hc2_bm") now silently accepts absorb= together with fixed_effects=. That is probably numerically fine, but it drifts from the existing public contract still described in the fit() docstring at diff_diff/estimators.py:L254-L261 and the unchanged rejection tests at tests/test_survey.py:L3179-L3224.
Concrete fix: Either move the absorb and fixed_effects validation above the new auto-route to preserve the old API, or explicitly document and regression-test that mixed fixed_effects + absorb is intentionally supported on the HC2/HC2-BM full-dummy path.

Tech Debt
No findings. The remaining absorbed-FE HC2/HC2-BM work is correctly narrowed to MPD/TWFE in TODO.md:L100.

Security
No findings.

Documentation/Tests
No additional findings. The new coverage is strong for the changed lane, and the prior survey-scope wording issue is addressed. I could not execute the test suite in this environment because pytest and pandas are not installed.

R4 informational P3: my prior reordering moved the auto-route ahead of the multi-absorb-survey guard (correct) but ALSO left it ahead of the existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm the user-facing rejection vanished and the two arguments were silently merged.

Fix: move the `absorb + fixed_effects` validation ABOVE the auto-route so the public-contract rejection fires regardless of vcov_type, and add a regression test that pins the rejection across hc1/hc2/hc2_bm to prevent silent regression on this contract. The legacy multi-absorb + survey-weights guard stays BELOW the auto-route (intentional from R2: when auto-routing, that guard's demeaning rationale doesn't apply).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
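The final check order after this fix can be summarized as a hypothetical routing helper. `route_absorb`, its arguments, and the exception types are illustrative assumptions, not diff-diff's actual `fit()` internals; the sketch only encodes the order of the three checks.

```python
def route_absorb(vcov_type, absorb=None, fixed_effects=None, survey_weighted=False):
    """Hypothetical sketch of the guard ordering; returns (absorb, fixed_effects)."""
    absorb = list(absorb or [])
    fixed_effects = list(fixed_effects or [])
    # 1. Public-contract check first, independent of vcov_type.
    if absorb and fixed_effects:
        raise ValueError("absorb= and fixed_effects= are mutually exclusive")
    # 2. Auto-route: leverage-based vcov families need the full FE hat matrix.
    if absorb and vcov_type in {"hc2", "hc2_bm"}:
        absorb, fixed_effects = [], absorb
    # 3. Legacy guard applies only when absorb still goes through demeaning.
    if survey_weighted and len(absorb) > 1:
        raise NotImplementedError(
            "single-pass demeaning isn't the correct weighted FWL projection "
            "for N>1 absorbed dimensions"
        )
    return absorb, fixed_effects
```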

github-actions · 2026-05-16T21:16:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3a8a20195508af1526d43e6c2618a5c9b4cc0f6f

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review concern is resolved: the absorb + fixed_effects rejection now runs before the HC2/HC2-BM auto-route in diff_diff/estimators.py:L347-L361, and regression coverage was added in tests/test_estimators_vcov_type.py:L1098-L1127.
I did not find any unmitigated P0/P1 methodology defects in the changed DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) path. The implementation matches the registry rationale in docs/methodology/REGISTRY.md:L2551-L2555.
Validation is materially stronger: absorb-vs-fixed-effects equivalence is pinned in tests/test_estimators_vcov_type.py:L664-L708, CR2/HC2-BM golden coverage is added in tests/test_estimators_vcov_type.py:L1032-L1096, and the new R fixture is generated in benchmarks/R/generate_clubsandwich_golden.R:L72-L118.
Remaining absorbed-FE HC2/HC2-BM limitations are correctly narrowed to TwoWayFixedEffects and MultiPeriodDiD in TODO.md:L100 and docs/methodology/REGISTRY.md:L2553-L2555.
I could not execute the targeted tests here because the environment is missing pytest and numpy; this review is based on static diff/code inspection.

Methodology

No findings.

Code Quality

No findings.

Performance

No findings. Routing leverage-based HC2/CR2 work to the full-dummy design is the expected tradeoff for correctness here.

Maintainability

No findings.

Tech Debt

No findings. The remaining unsupported absorbed-FE HC2/HC2-BM surfaces are properly tracked in TODO.md:L100.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: CHANGELOG.md:L11 and docs/methodology/REGISTRY.md:L2552 say the new absorbed-FE DiD route was validated against both sandwich::vcovHC(type="HC2") and clubSandwich::vcovCR(...), but the committed R golden generator only materializes CR2-derived targets (vcov_did_hc2_bm and clustered vcov_did_cr2) in benchmarks/R/generate_clubsandwich_golden.R:L95-L117, and the new hc2 test in tests/test_estimators_vcov_type.py:L1055-L1063 only checks finite SE plus ATT parity, not an external HC2 SE target. That leaves the external HC2-parity claim under-pinned in-tree for this scenario.
Concrete fix: Add an R sandwich::vcovHC(type="HC2") target for absorbed_fe_did and assert the treat_post SE in tests/test_estimators_vcov_type.py:L1055-L1063, or narrow the changelog/registry wording so it only claims external golden parity for the CR2/HC2-BM lanes.

R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH `sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the R generator only materialized CR2-derived targets for `absorbed_fe_did`. The HC2-parity claim was verified in a throwaway smoke test but not pinned in-tree. Added `vcov_hc2` to the R generator output (computed via `sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the treat_post slope SE matches at 1e-10. Replaces the prior weaker `test_absorb_hc2_matches_full_dummy_design` (which only checked finite-SE + ATT parity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T21:22:34Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 597d0fc48b35317663749da8d96d1eb2d685e156

Overall Assessment

✅ Looks good

Executive Summary

The prior HC2-parity gap is resolved: the R generator now materializes an external sandwich::vcovHC(type="HC2") target for the absorbed-FE DiD scenario, and the test suite now asserts the treat_post SE against it in benchmarks/R/generate_clubsandwich_golden.R:L72-L123 and tests/test_estimators_vcov_type.py:L1055-L1074.
The estimator change remains methodologically consistent with the registry: DifferenceInDifferences.fit() only auto-routes absorb to fixed_effects for vcov_type in {"hc2","hc2_bm"}, matching the documented full-hat-matrix rationale in diff_diff/estimators.py:L363-L404 and docs/methodology/REGISTRY.md:L2550-L2555.
The earlier contract-drift risk is still closed off: absorb + fixed_effects is rejected before the auto-route, with regression coverage in diff_diff/estimators.py:L347-L361 and tests/test_estimators_vcov_type.py:L1109-L1137.
Clustered CR2 parity and the weighted survey multi-absorb placement are both explicitly covered in tests/test_estimators_vcov_type.py:L1076-L1179.
The generic CR2 golden harness correctly skips the new multi-column DiD fixture and leaves that case to the DiD-specific parity class in tests/test_linalg_hc2_bm.py:L548-L574.
I did not find any unmitigated P0/P1 issues in the changed files. I could not execute pytest here because this environment is missing both pytest and numpy, so this is a static review.

Methodology

No findings. The change in diff_diff/estimators.py:L363-L404 matches the methodology note in docs/methodology/REGISTRY.md:L2550-L2555, and the remaining unsupported absorbed-FE HC2/HC2-BM paths are accurately narrowed to TWFE and MPD in TODO.md:L100-L101.

Code Quality

No findings. The PR preserves the absorb + fixed_effects mutual-exclusion contract before any auto-route logic in diff_diff/estimators.py:L347-L361.

Performance

No findings. Routing only the HC2/HC2-BM branch to the full-dummy path is the expected correctness tradeoff; HC1/CR1 absorbed fits remain on the cheaper within-transformed route in diff_diff/estimators.py:L372-L388.

Maintainability

No findings. The new benchmark/test split is clean: the DiD-specific absorbed-FE scenario lives in the estimator parity tests, while the generic y ~ x CR2 harness stays focused on _compute_cr2_bm inputs in tests/test_linalg_hc2_bm.py:L548-L574.

Tech Debt

No findings. The follow-up surface is documented and tracked in TODO.md:L100-L101.

Security

No findings.

Documentation/Tests

No findings. The previous documentation/test mismatch around external HC2 parity is addressed by the added R generator target and matching assertion in benchmarks/R/generate_clubsandwich_golden.R:L97-L123 and tests/test_estimators_vcov_type.py:L1055-L1074.

Lifts the `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})` NotImplementedError at `estimators.py:373` (previous) → auto-route at line 382 (new).

FWL preserves coefficients and residuals under the within-transform but not the hat matrix, so HC2 leverage and CR2 Bell-McCaffrey DOF need the FULL FE hat. Internally promoting `absorb=` to `fixed_effects=` for HC2/HC2-BM fits builds the full-dummy design and routes through the existing fixed-effects code path, which already computes the correct vcov.

Verified by reading clubSandwich's `R/CR-adjustments.R` source: the unweighted CR2 branch's `A_g = (I - H_gg)^{-1/2}`, with H built on the full model matrix, is exactly what diff-diff's existing `_compute_cr2_bm` produces. Singleton-cluster CR2 (`cluster=1:n`) reduces to one-way HC2-BM Satterthwaite DOF, the PT2018-blessed workaround we use for the unclustered HC2-BM goldens.

Parity tested at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` via the new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against the new `absorbed_fe_did` scenario in `benchmarks/data/clubsandwich_cr2_golden.json` (regenerated via the extended `benchmarks/R/generate_clubsandwich_golden.R`).

Out of scope (TODO.md partial drain): the `TwoWayFixedEffects` and `MultiPeriodDiD(absorb=...)` rejections remain; they have a different fit-path structure that needs separate surgery. Weighted variants (`hc2_bm + weights`) and Conley + absorb paths are unchanged.

Behavioral note: under the auto-route, `result.coefficients` now contains the FE-dummy entries (matching the `fixed_effects=` path), not the slope-only view a plain `absorb=` returns. Downstream consumers reading `result.att` are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
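A numpy sketch of the FWL point, assuming a balanced panel with a one-way unit effect and unweighted OLS: the slope and residuals agree between the full-dummy and within designs, but the full hat diagonal equals the within hat diagonal plus 1/T (the unit-dummy projection). Every HC2 weight 1/(1 - h_ii) is therefore larger on the full design, and the slope's HC2 SE differs. The helpers are illustrative, not diff-diff code.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, T = 15, 4
unit = np.repeat(np.arange(n_units), T)
n = n_units * T
x = rng.normal(size=n)
y = rng.normal(size=n_units)[unit] + 1.5 * x + rng.normal(size=n)

def demean_by(v, g):
    """Subtract group means (one-way within transform)."""
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]

def hat_diag(X):
    """Leverages h_ii via a reduced QR decomposition (X assumed full rank)."""
    Q, _ = np.linalg.qr(X)
    return (Q ** 2).sum(axis=1)

def hc2_se(X, e, h, j):
    """HC2 standard error of coefficient j."""
    bread = np.linalg.inv(X.T @ X)
    meat = (X * (e ** 2 / (1 - h))[:, None]).T @ X
    return np.sqrt((bread @ meat @ bread)[j, j])

# Full-dummy design: slope column plus one dummy per unit.
X_full = np.column_stack([x, np.eye(n_units)[unit]])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
e_full = y - X_full @ beta_full
h_full = hat_diag(X_full)

# Within (absorbed) design: demeaned slope column only.
X_w = demean_by(x, unit)[:, None]
y_w = demean_by(y, unit)
beta_w = np.linalg.lstsq(X_w, y_w, rcond=None)[0]
e_w = y_w - X_w @ beta_w
h_w = hat_diag(X_w)

se_full = hc2_se(X_full, e_full, h_full, 0)   # uses the full FE hat
se_within = hc2_se(X_w, e_w, h_w, 0)          # what a naive absorbed HC2 would report
```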

R1 review surfaced 2 P1 + 1 P2; all in-scope fixes. **P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said `DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError. Replaced the blanket-rejection Note with a per-estimator status block: DiD path is now SUPPORTED via auto-route (with the full DiDResults surface change documented inline); TWFE and MultiPeriodDiD paths still reject and are tracked as follow-ups. **P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.** The previous test class pinned only unclustered HC2-BM SE/ATT and explicitly discarded the df target. Two new tests: - `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises `DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])` against the R `vcovCR(..., cluster=d$unit, type="CR2")` target, asserting SE+ATT match at 1e-10. - `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM Satterthwaite DOF must propagate to inference; CI width is wider under BM). This catches silent regressions where the auto-route passes SE through but uses n-k for inference. **P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all come from the full-dummy fit under the route; `r_squared` is computed on the un-demeaned outcome and will typically be higher than the within-R²). Extended the CHANGELOG entry with the full `DiDResults`-surface contract change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R2 review flagged that REGISTRY/CHANGELOG documented `DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347 fired BEFORE the auto-route, so weighted multi-absorb fits still raised. The guard's rationale ("single-pass demeaning isn't the correct weighted FWL projection for N>1 absorbed dimensions") doesn't apply when we're auto-routing to fixed_effects= — the fixed_effects= path builds the full-dummy design and solves WLS directly with no within-transform. Reorder: move the auto-route block above the multi-absorb-survey guard. The guard now only fires when absorb was NOT consumed by the auto-route (i.e., hc1/classical/conley/etc. — paths that still demean). Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the new placement against silent regression. The existing `test_survey.py` multi-absorb-survey rejection tests continue to pass (they use the default vcov_type=hc1 path which still hits the guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R3 informational P3: REGISTRY/CHANGELOG wording could be read as implying survey fits compute HC2/HC2-BM analytically. Survey fits actually use Taylor-series linearization or replicate-weight variance regardless of `vcov_type` — the auto-route only changes the FE handling and removes the prior absorbed-FE reject. Added one-sentence clarifications in both surfaces so the documentation matches the variance dispatch in linalg.py / results.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R4 informational P3: my prior reordering moved the auto-route ahead of the multi-absorb-survey guard (correct) but ALSO left it ahead of the existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm the user-facing rejection vanished — the two args silently merged. Move the `absorb + fixed_effects` validation ABOVE the auto-route so the public-contract rejection fires regardless of vcov_type. Add a regression test that pins the rejection across hc1/hc2/hc2_bm to prevent silent regression on this contract. The legacy multi-absorb + survey-weights guard stays BELOW the auto-route (intentional from R2: when auto-routing, the demeaning rationale of that guard doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH `sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the R generator only materialized CR2-derived targets for `absorbed_fe_did`. The HC2-parity claim had been verified in a throwaway smoke test but was not pinned in-tree. Added `vcov_hc2` to the R generator output, computed via `sandwich::vcovHC(fit_did, type = "HC2")`. Also added a corresponding Python parity test, `test_absorb_hc2_matches_sandwich_vcovhc`, which asserts that the treat_post slope SE matches at 1e-10. It replaces the weaker `test_absorb_hc2_matches_full_dummy_design`, which only checked finite SEs and ATT parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
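For reference, the HC2 estimator targeted by `sandwich::vcovHC(type = "HC2")` rescales each squared residual by 1/(1 - h_ii). Here h_ii is the leverage from the full model matrix, which is why absorbed FEs must appear as explicit dummies for this vcov. The sketch below is a minimal numpy version for unweighted, full-rank OLS. It is an illustration only, not diff-diff's implementation, and it does not handle rank-deficient designs.

```python
import numpy as np

def vcov_hc2(X, y):
    """HC2 sandwich (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)) X (X'X)^-1, unweighted OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)          # assumes X has full column rank
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Leverages h_ii = x_i' (X'X)^-1 x_i, computed without the n x n hat matrix.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    meat = (X * (e ** 2 / (1.0 - h))[:, None]).T @ X
    return beta, XtX_inv @ meat @ XtX_inv
```

A quick sanity check: for an intercept-only model every h_ii = 1/n, so the HC2 variance of the mean reduces to the familiar s^2/n with the n - 1 denominator.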

github-actions · 2026-05-17T00:08:47Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b49324b55bb23ca3966b34d5a49f9e50c1231b6

Overall Assessment

✅ Looks good

Executive Summary

Previous re-review P1+ items appear addressed: the absorb + fixed_effects rejection now fires before the auto-route, preserving the public API contract in diff_diff/estimators.py:L347-L360.
The implementation is methodologically aligned with the registry: only DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) is lifted, while TWFE and MPD remain rejected and explicitly tracked in docs/methodology/REGISTRY.md:L2550-L2557 and TODO.md:L99-L100.
The new parity surface is materially stronger: the R generator now emits absorbed-FE DiD HC2, singleton-cluster HC2-BM, and unit-cluster CR2 targets in benchmarks/R/generate_clubsandwich_golden.R:L72-L123, and the estimator tests exercise those paths in tests/test_estimators_vcov_type.py:L981-L1219.
I did not find any unmitigated P0/P1 issues in the changed files.
Static review only: I could not execute pytest here because pytest is not installed in this environment.

Methodology

No findings. The new route in diff_diff/estimators.py:L363-L404 matches the methodology note in docs/methodology/REGISTRY.md:L2551-L2555, and the remaining unsupported absorbed-FE surfaces are correctly narrowed to TWFE/MPD in TODO.md:L99-L100.

Code Quality

No findings. The contract-preserving absorb + fixed_effects rejection is correctly placed ahead of the auto-route in diff_diff/estimators.py:L347-L360.

Performance

No findings. The expensive full-dummy path is limited to the hc2 / hc2_bm branch, while HC1/CR1 absorbed fits stay on the cheaper within-transformed route in diff_diff/estimators.py:L363-L404.

Maintainability

No findings. The implementation, registry note, TODO narrowing, R benchmark generator, and estimator parity tests are internally consistent across the changed files.

Tech Debt

No findings. The remaining absorbed-FE HC2/HC2-BM work is explicitly tracked rather than silently broadened in scope (TODO.md:L99-L100).

Security

No findings.

Documentation/Tests

Severity: P3.
Impact: The new absorbed-FE fixture records exact dof_hc2_bm and dof_cr2, but the estimator tests only pin ATT/SE parity and a qualitative HC2-vs-HC2-BM inference difference. A future regression in exact Satterthwaite DOF propagation on this rank-deficient FE design could still pass, as long as BM still widens the CI.
Concrete fix: assert exact p_value/conf_int parity (or derived DOF parity) for the treat_post coefficient, using the stored fixture values, on both the singleton-cluster and unit-cluster CR2 paths.
References: benchmarks/R/generate_clubsandwich_golden.R:L102-L123, tests/test_estimators_vcov_type.py:L1040-L1053, tests/test_estimators_vcov_type.py:L1076-L1107, tests/test_estimators_vcov_type.py:L1181-L1219.
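The finding turns on how a DOF feeds into inference. With CI = est ± t_{1-α/2, dof} · SE, a DOF regression moves the interval and the p-value even when SE parity still holds. The sketch below makes that dependence explicit using only the standard library. The helper names are hypothetical, and the Simpson-rule t CDF is chosen here only to avoid a SciPy dependency; `scipy.stats.t` would be the usual choice.

```python
import math

def t_pdf(x, df):
    # Student-t density; lgamma keeps non-integer Satterthwaite DOF stable.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, n=2000):
    # F(t) = 1/2 + integral_0^t pdf, by composite Simpson's rule (n even).
    if t < 0:
        return 1.0 - t_cdf(-t, df, n)
    h = t / n
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3

def t_quantile(p, df, tol=1e-12):
    # Upper-tail quantile for p in (0.5, 1), by bisection on the CDF.
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conf_int(estimate, se, dof, alpha=0.05):
    q = t_quantile(1 - alpha / 2, dof)
    return estimate - q * se, estimate + q * se

def p_value(estimate, se, dof):
    return 2.0 * (1.0 - t_cdf(abs(estimate / se), dof))
```

A parity test along the lines suggested could compare `conf_int(att, se, dof)`, built from the fixture's stored DOF, against the estimator's reported interval at tight tolerance. How the fixture exposes those values is an assumption here. Because a smaller BM DOF always widens the interval, a qualitative "BM is wider" check cannot catch a wrong but still-small DOF.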

…ixed_effects=

Mirrors PR igerber#458 (DiD-absorb auto-route) on MultiPeriodDiD. When absorb= is paired with vcov_type in {hc2, hc2_bm}, the fit promotes the absorb columns to fixed_effects= internally. The existing full-dummy-design code path then computes the algebraically correct vcov on the event-study design (treated + period_X dummies + treated:period_X interactions + factor(unit)). Verified at ~1e-15 against lm() + sandwich::vcovHC(type="HC2") and lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2") on a new 5-cohort x 5-period mpd_absorbed_fe_did fixture.

Includes a three-guard reorder, matching the DiD ordering. The auto-route now sits BETWEEN the absorb + fixed_effects mutual-exclusion check (above) and the multi-absorb + survey-weights rejection (below). The survey-replicate absorb-refit branch at estimators.py:1689 is short-circuited under the auto-route. The standard compute_replicate_vcov path applies on the fixed full-dummy design, so no per-replicate refit is needed.

Tests: the new TestMPDAbsorbedFERParity class (7 tests) mirrors PR igerber#458's TestDiDAbsorbedFERParity. It pins parity targets on per-period interaction coefficients (treated:period_4) to avoid the treated x unit collinearity baked into MPD's time-invariant ever-treated indicator. The existing test_multi_period_absorb_rejects_hc2_and_hc2_bm is deleted. The REGISTRY.md per-estimator status block is updated: MPD moves REJECT -> SUPPORTED, and TWFE remains the only REJECT case. TODO row 99 is narrowed to TWFE-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
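The treated x unit collinearity mentioned above can be seen on a toy design. This numpy sketch builds an illustrative balanced 5 x 5 panel with units 0-1 ever-treated. It is not the mpd_absorbed_fe_did fixture. The design has the columns listed in the commit message: treated, period_X dummies, treated:period_X interactions, and factor(unit). Because `treated` is time-invariant, it equals the sum of the treated units' dummies. The design therefore loses exactly one rank, but the interaction coefficients stay identified.

```python
import numpy as np

# Illustrative balanced panel: 5 units x 5 periods, units 0 and 1 ever-treated.
n_units, n_periods = 5, 5
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated = (unit < 2).astype(float)             # time-invariant ever-treated indicator

D_unit = np.eye(n_units)[unit]                 # factor(unit), spans the intercept
D_period = np.eye(n_periods)[period][:, 1:]    # period_1..period_4, period 0 is reference
inter = treated[:, None] * D_period            # treated:period_1 .. treated:period_4

X = np.column_stack([treated, D_period, inter, D_unit])
k = X.shape[1]
rank = np.linalg.matrix_rank(X)                # one column short: treated is redundant

# Noise-free outcome with a known treated:period_4 effect of 1.5.
rng = np.random.default_rng(0)
y = rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[period] + 1.5 * inter[:, 3]

# lstsq returns the minimum-norm solution; estimable coefficients are still exact.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
att_period_4 = beta[1 + (n_periods - 1) + 3]   # column index of treated:period_4
print(k, rank, att_period_4)
```

The coefficient on `treated` itself is not identified here, because it trades off against the unit dummies. That is why parity targets anchored on per-period interactions are the stable choice.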

…vg_att inference

Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates. MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657 previously raised NotImplementedError because _compute_cr2_bm returns only per-coefficient Satterthwaite DOF. The post-period-average ATT (`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound contrast, so it needed a cluster-aware contrast DOF helper.

The new _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices. It uses the identical Pustejovsky-Tipton 2018 Section 4 algebra (`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`, `DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to call the new helper via a private _cr2_bm_dof_inner with `contrasts=eye(k)`. A refactor regression at atol=1e-10 confirms the per-coefficient DOFs are preserved; the matmul ordering differs slightly from the prior inline loop.

MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced in PR igerber#459) to branch on effective_cluster_ids. It uses the one-way _compute_bm_dof_from_contrasts when that is None and the cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation (length n). They are NOT subscripted by the rank-deficient column-drop mask `_kept`, which indexes coefficients, not observations.

R parity is verified at atol=1e-10 against clubSandwich's Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new mpd_clustered_avg_att_dof fixture in benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test, and its df_denom IS the BM Satterthwaite DOF. The pre-flight smoke test against this same R target passed at atol=1e-13 before any source edits.

Tests:
- TestCR2BMContrastDOF (4 new tests): refactor regression vs library, R parity for the compound contrast, shape validation, cluster-count validation.
- test_multi_period_cluster_plus_hc2_bm_rejected flipped to test_multi_period_cluster_plus_hc2_bm_produces_finite_inference (end-to-end MPD wire-through with finite avg_att / period_effects inference assertions).

After this PR, 3 of 6 HC2/HC2-BM gates are lifted: DiD-absorb (igerber#458), MPD-absorb (igerber#459) and MPD-cluster-contrast-DOF (this PR). Remaining: TWFE absorb (Gate 1) and weighted HC2-BM (Gates 4-5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
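The contrast-DOF algebra can be sketched densely in numpy to check small cases by hand. This is an illustrative reimplementation and not diff-diff's `_compute_cr2_bm_contrast_dof`. It assumes unweighted OLS, a full-rank X, and I - H_gg positive definite in every cluster. It uses the "G-matrix" form of the Bell-McCaffrey / Imbens-Kolesar Satterthwaite DOF, in which each `omega_g = A_g X_g bread_inv c` is mapped back to observation space through the columns of (I - H) and B = G'G. The sketch forms the n x n hat matrix, so it is suited to checking small examples, not to production use.

```python
import numpy as np

def _inv_sqrt_sym(M):
    # Symmetric inverse square root via eigendecomposition (assumes M is PD).
    vals, vecs = np.linalg.eigh(M)
    return (vecs / np.sqrt(vals)) @ vecs.T

def cr2_bm_contrast_dof(X, cluster, C):
    """Hypothetical sketch: CR2 Satterthwaite DOF for each column c of C.

    X: (n, k) design; cluster: length-n per-observation IDs; C: (k,) or (k, m).
    """
    X = np.asarray(X, dtype=float)
    cluster = np.asarray(cluster)
    C = np.asarray(C, dtype=float).reshape(X.shape[1], -1)
    n = X.shape[0]
    bread_inv = np.linalg.inv(X.T @ X)
    H = X @ bread_inv @ X.T
    I_minus_H = np.eye(n) - H
    groups = [np.flatnonzero(cluster == g) for g in np.unique(cluster)]
    dofs = []
    for c in C.T:
        cols = []
        for idx in groups:
            A_g = _inv_sqrt_sym(np.eye(idx.size) - H[np.ix_(idx, idx)])
            omega_g = A_g @ X[idx] @ bread_inv @ c        # length n_g
            cols.append(I_minus_H[:, idx] @ omega_g)       # back to all n observations
        G = np.column_stack(cols)                          # (n, n_clusters)
        B = G.T @ G
        dofs.append(np.trace(B) ** 2 / np.trace(B @ B))
    return np.array(dofs)

def avg_att_contrast(k, post_idx):
    # c such that c'beta = (1/n_post) * sum of the post-period coefficients.
    c = np.zeros(k)
    c[list(post_idx)] = 1.0 / len(post_idx)
    return c
```

Two closed-form cases are easy to verify. With singleton clusters (`cluster = 1:n`) and an intercept-only model, the DOF for the mean is n - 1, which matches the one-way HC2-BM reduction noted in the PR summary. With G balanced clusters it is G - 1. Passing `np.eye(k)` reproduces the per-coefficient DOFs column by column, which mirrors the refactor-regression idea above.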

igerber added the ready-for-ci label (Triggers CI test workflows) on May 16, 2026

igerber and others added 6 commits May 16, 2026 20:00

igerber force-pushed the wave-5-hc2-cr2-bm-extensions branch from 597d0fc to 3b49324 on May 17, 2026 00:03

igerber merged commit a7bd40d into main May 17, 2026
26 checks passed

igerber deleted the wave-5-hc2-cr2-bm-extensions branch May 17, 2026 01:09

igerber mentioned this pull request May 17, 2026

Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route #459

Merged


DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally #458

igerber merged 6 commits into main from wave-5-hc2-cr2-bm-extensions

igerber commented May 16, 2026

github-actions Bot commented May 16, 2026

github-actions Bot commented May 16, 2026

github-actions Bot commented May 16, 2026

github-actions Bot commented May 16, 2026

github-actions Bot commented May 16, 2026

github-actions Bot commented May 16, 2026

github-actions Bot commented May 17, 2026

