DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally#458
Conversation
|
Overall Assessment Needs changes Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.
**P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said
`DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError.
Replaced the blanket-rejection Note with a per-estimator status block:
DiD path is now SUPPORTED via auto-route (with the full DiDResults
surface change documented inline); TWFE and MultiPeriodDiD paths still
reject and are tracked as follow-ups.
**P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.**
The previous test class pinned only unclustered HC2-BM SE/ATT and
explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises
`DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])`
against the R `vcovCR(..., cluster=d$unit, type="CR2")` target,
asserting SE+ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM
give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM
Satterthwaite DOF must propagate to inference; CI width is wider
under BM). This catches silent regressions where the auto-route
passes SE through but uses n-k for inference.
**P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route
also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all
come from the full-dummy fit under the route; `r_squared` is computed
on the un-demeaned outcome and will typically be higher than the
within-R²). Extended the CHANGELOG entry with the full
`DiDResults`-surface contract change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R2 review flagged that REGISTRY/CHANGELOG documented
`DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the
legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347
fired BEFORE the auto-route, so weighted multi-absorb fits still raised.
The guard's rationale ("single-pass demeaning isn't the correct weighted
FWL projection for N>1 absorbed dimensions") doesn't apply when we're
auto-routing to fixed_effects= — the fixed_effects= path builds the
full-dummy design and solves WLS directly with no within-transform.
Reorder: move the auto-route block above the multi-absorb-survey guard.
The guard now only fires when absorb was NOT consumed by the auto-route
(i.e., hc1/classical/conley/etc. — paths that still demean).
Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the
new placement against silent regression. The existing `test_survey.py`
multi-absorb-survey rejection tests continue to pass (they use the
default vcov_type=hc1 path which still hits the guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology Code Quality Performance Maintainability Tech Debt Security Documentation/Tests
Reviewer note: I could not execute the automated tests here because |
R3 informational P3: REGISTRY/CHANGELOG wording could be read as implying survey fits compute HC2/HC2-BM analytically. Survey fits actually use Taylor-series linearization or replicate-weight variance regardless of `vcov_type` — the auto-route only changes the FE handling and removes the prior absorbed-FE reject. Added one-sentence clarifications in both surfaces so the documentation matches the variance dispatch in linalg.py / results.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology Code Quality Performance Maintainability
Tech Debt Security Documentation/Tests |
R4 informational P3: my prior reordering moved the auto-route ahead of the multi-absorb-survey guard (correct) but ALSO left it ahead of the existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm the user-facing rejection vanished — the two args silently merged. Move the `absorb + fixed_effects` validation ABOVE the auto-route so the public-contract rejection fires regardless of vcov_type. Add a regression test that pins the rejection across hc1/hc2/hc2_bm to prevent silent regression on this contract. The legacy multi-absorb + survey-weights guard stays BELOW the auto-route (intentional from R2: when auto-routing, the demeaning rationale of that guard doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. Code Quality No findings. Performance No findings. Routing leverage-based HC2/CR2 work to the full-dummy design is the expected tradeoff for correctness here. Maintainability No findings. Tech Debt No findings. The remaining unsupported absorbed-FE HC2/HC2-BM surfaces are properly tracked in Security No findings. Documentation/Tests Severity: P3 |
R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH `sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the R generator only materialized CR2-derived targets for `absorbed_fe_did`. The HC2-parity claim was verified in a throwaway smoke test but not pinned in-tree. Added `vcov_hc2` to the R generator output (computed via `sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the treat_post slope SE matches at 1e-10. Replaces the prior weaker `test_absorb_hc2_matches_full_dummy_design` (which only checked finite-SE + ATT parity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. The change in Code Quality No findings. The PR preserves the Performance No findings. Routing only the HC2/HC2-BM branch to the full-dummy path is the expected correctness tradeoff; HC1/CR1 absorbed fits remain on the cheaper within-transformed route in Maintainability No findings. The new benchmark/test split is clean: the DiD-specific absorbed-FE scenario lives in the estimator parity tests, while the generic Tech Debt No findings. The follow-up surface is documented and tracked in Security No findings. Documentation/Tests No findings. The previous documentation/test mismatch around external HC2 parity is addressed by the added R generator target and matching assertion in |
Lifts `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`
NotImplementedError at `estimators.py:373` (previous) → auto-route at line
382 (new). FWL preserves coefficients and residuals under within-transform
but not the hat matrix, so HC2 leverage and CR2 Bell-McCaffrey DOF need
the FULL FE hat. Internally promoting `absorb=` to `fixed_effects=` for
HC2/HC2-BM fits builds the full-dummy design and routes through the
existing fixed-effects code path, which already computes the correct vcov.
Verified by reading clubSandwich's `R/CR-adjustments.R` source (the CR2
unweighted branch's `A_g = (I - H_gg)^{-1/2}` with H built on the full
model matrix is exactly what diff-diff's existing `_compute_cr2_bm`
produces). Singleton-cluster CR2 (`cluster=1:n`) reduces to one-way
HC2-BM Satterthwaite DOF — the PT2018-blessed workaround we use for the
unclustered HC2-BM goldens.
Parity tested at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and
`lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` via new
`tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against
new `absorbed_fe_did` scenario in `benchmarks/data/clubsandwich_cr2_golden.json`
(regenerated via the extended `benchmarks/R/generate_clubsandwich_golden.R`).
Out of scope (TODO.md partial drain): `TwoWayFixedEffects` and
`MultiPeriodDiD(absorb=...)` rejections remain — they have different
fit-path structure that needs separate surgery. Weighted variants
(`hc2_bm + weights`) and Conley + absorb paths are unchanged.
Behavioral note: under the auto-route, `result.coefficients` now contains
the FE-dummy entries (matching the `fixed_effects=` path), not the
slope-only view a plain `absorb=` returns. Downstream consumers reading
`result.att` are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.
**P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said
`DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError.
Replaced the blanket-rejection Note with a per-estimator status block:
DiD path is now SUPPORTED via auto-route (with the full DiDResults
surface change documented inline); TWFE and MultiPeriodDiD paths still
reject and are tracked as follow-ups.
**P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.**
The previous test class pinned only unclustered HC2-BM SE/ATT and
explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises
`DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])`
against the R `vcovCR(..., cluster=d$unit, type="CR2")` target,
asserting SE+ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM
give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM
Satterthwaite DOF must propagate to inference; CI width is wider
under BM). This catches silent regressions where the auto-route
passes SE through but uses n-k for inference.
**P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route
also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all
come from the full-dummy fit under the route; `r_squared` is computed
on the un-demeaned outcome and will typically be higher than the
within-R²). Extended the CHANGELOG entry with the full
`DiDResults`-surface contract change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 review flagged that REGISTRY/CHANGELOG documented
`DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the
legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347
fired BEFORE the auto-route, so weighted multi-absorb fits still raised.
The guard's rationale ("single-pass demeaning isn't the correct weighted
FWL projection for N>1 absorbed dimensions") doesn't apply when we're
auto-routing to fixed_effects= — the fixed_effects= path builds the
full-dummy design and solves WLS directly with no within-transform.
Reorder: move the auto-route block above the multi-absorb-survey guard.
The guard now only fires when absorb was NOT consumed by the auto-route
(i.e., hc1/classical/conley/etc. — paths that still demean).
Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the
new placement against silent regression. The existing `test_survey.py`
multi-absorb-survey rejection tests continue to pass (they use the
default vcov_type=hc1 path which still hits the guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R3 informational P3: REGISTRY/CHANGELOG wording could be read as implying survey fits compute HC2/HC2-BM analytically. Survey fits actually use Taylor-series linearization or replicate-weight variance regardless of `vcov_type` — the auto-route only changes the FE handling and removes the prior absorbed-FE reject. Added one-sentence clarifications in both surfaces so the documentation matches the variance dispatch in linalg.py / results.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R4 informational P3: my prior reordering moved the auto-route ahead of the multi-absorb-survey guard (correct) but ALSO left it ahead of the existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm the user-facing rejection vanished — the two args silently merged. Move the `absorb + fixed_effects` validation ABOVE the auto-route so the public-contract rejection fires regardless of vcov_type. Add a regression test that pins the rejection across hc1/hc2/hc2_bm to prevent silent regression on this contract. The legacy multi-absorb + survey-weights guard stays BELOW the auto-route (intentional from R2: when auto-routing, the demeaning rationale of that guard doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH `sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the R generator only materialized CR2-derived targets for `absorbed_fe_did`. The HC2-parity claim was verified in a throwaway smoke test but not pinned in-tree. Added `vcov_hc2` to the R generator output (computed via `sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the treat_post slope SE matches at 1e-10. Replaces the prior weaker `test_absorb_hc2_matches_full_dummy_design` (which only checked finite-SE + ATT parity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
597d0fc to
3b49324
Compare
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ixed_effects= Mirrors PR igerber#458 (DiD-absorb auto-route) on MultiPeriodDiD: when absorb= is paired with vcov_type in {hc2, hc2_bm}, the fit promotes the absorb columns to fixed_effects= internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (treated + period_X dummies + treated:period_X interactions + factor(unit)). Verified at ~1e-15 vs lm() + sandwich::vcovHC(type="HC2") and lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2") on a new 5-cohort x 5-period mpd_absorbed_fe_did fixture. Includes three-guard reorder so the auto-route sits BETWEEN the absorb + fixed_effects mutual-exclusion check (above) and the multi-absorb + survey-weights reject (below), matching the DiD ordering. The survey-replicate absorb-refit branch at estimators.py:1689 is short- circuited under the auto-route (the standard compute_replicate_vcov path applies on the fixed full-dummy design; no per-replicate refit needed). Tests: new TestMPDAbsorbedFERParity class (7 tests) mirrors PR igerber#458's TestDiDAbsorbedFERParity, pinning parity targets on per-period interaction coefficients (treated:period_4) to avoid the treated x unit collinearity baked into MPD's time-invariant ever-treated indicator. Existing test_multi_period_absorb_rejects_hc2_and_hc2_bm deleted. REGISTRY.md per-estimator status block updated (MPD moves REJECT -> SUPPORTED; TWFE remains the only REJECT case). TODO row 99 narrowed to TWFE-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vg_att inference
Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates:
MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657
previously raised NotImplementedError because _compute_cr2_bm returns
per-coefficient Satterthwaite DOF only — the post-period-average ATT
(`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound
contrast that needed a cluster-aware contrast DOF helper.
New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the
per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast
matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra
(`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`,
`DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to
call the new helper via a private _cr2_bm_dof_inner with
`contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the
per-coefficient DOFs are preserved (matmul ordering differs slightly
from the prior inline loop).
MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced
in PR igerber#459) to branch on effective_cluster_ids: one-way
_compute_bm_dof_from_contrasts when None, cluster-aware
_compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation
length n and are NOT subscripted by the rank-deficient column-drop
mask `_kept` (which indexes coefficients, not observations).
R parity verified at atol=1e-10 against clubSandwich's
Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new
mpd_clustered_avg_att_dof fixture in
benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint
matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the
BM Satterthwaite DOF. The pre-flight smoke test against this same R
target passed at atol=1e-13 before any source edits.
Tests:
- TestCR2BMContrastDOF (4 new tests): refactor regression vs library,
R-parity for compound contrast, shape validation, cluster-count
validation.
- test_multi_period_cluster_plus_hc2_bm_rejected flipped to
test_multi_period_cluster_plus_hc2_bm_produces_finite_inference
(end-to-end MPD wire-through with finite avg_att / period_effects
inference assertions).
After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb igerber#458,
MPD-absorb igerber#459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE
absorb (Gate 1), weighted HC2-BM (Gates 4-5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})NotImplementedError atdiff_diff/estimators.py:382(was line 373). Auto-route promotes absorb columns tofixed_effects=internally for HC2/HC2-BM fits so the existing full-dummy-design code path computes the algebraically correct vcov.R/CR-adjustments.R) before writing code. Confirmed the unweighted CR2 algebra (A_g = (I - H_gg)^{-1/2}with H on the full model matrix) is what diff-diff's existing_compute_cr2_bmalready produces. Singleton-cluster CR2 trick (cluster=1:n) reduces to one-way HC2-BM Satterthwaite DOF.lm() + sandwich::vcovHC(type="HC2")andlm() + clubSandwich::vcovCR(cluster=..., type="CR2")via newabsorbed_fe_didscenario inbenchmarks/data/clubsandwich_cr2_golden.jsonand newtests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParitytest class.Methodology references (required if estimator / math changes)
jepusto/clubSandwich/R/CR-adjustments.R.vcovCR.lm(... type="CR2")for the unweighted case. Weighted variant (with the more elaborateThetaquadratic correction documented inCR-adjustments.R'sinverse_var=FALSEbranch) is deferred to a follow-up.Validation
tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParityclass (2 tests) — R-parity at 1e-10 againstabsorbed_fe_didgolden scenario.tests/test_estimators_vcov_type.py::test_did_absorb_rejects_hc2_and_hc2_bmflipped from "raises NotImplementedError" to "auto-routes; matchesfixed_effects=path bit-equal" (renamedtest_did_absorb_hc2_and_hc2_bm_auto_route).tests/test_linalg_hc2_bm.py::test_cr2_parity_with_goldenskips scenarios that don't fit the simpley ~ x2-column contract.Security / privacy
Generated with Claude Code