Skip to content

Commit 034356e

Browse files
igerberclaude
andcommitted
linalg: rank-guarded generalized inverse for DR/OR influence-function SEs
A near-singular covariate Gram matrix (constant/collinear covariate) made the per-cell propensity-score Hessian / outcome-regression bread blow up in the influence-function SE of CallawaySantAnna, TripleDifference, and StaggeredTripleDifference: np.linalg.solve/inv raise LinAlgError only on EXACTLY singular matrices, so a near-singular Gram returned a garbage inverse (~1e13) that flowed into the SE while the ATT point estimate stayed valid. Reproduced: CS dr overall_se 5.1e13, TD reg se 1.8e17, SDDD SEs 30-100x inflated. - Add shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to linalg.py: when rank-deficient it inverts a COLUMN-DROPPED principal submatrix (pivoted QR on the symmetric-equilibrated Gram) — the SAME generalized-inverse convention the point estimate uses (_detect_rank_deficiency / R lm()). The fast path returns solve(A, I) unchanged (R-parity); all-NaN only on true rank-0. - Because it uses the point estimate's column-drop (not a minimum-norm pseudo-inverse, which diverges when the IF multiplier leaves range(A)), the analytical SE equals the well-conditioned near-collinear limit (verified se_ratio ~ 1 across reg/ipw/dr) for every per-cell bread, control AND treated. A covariate rank-deficient only within one cell still enters the other cells' full-rank fits, so a degenerate covariate spec legitimately moves the ATT/SE (surfaced by the aggregate warning) — no min-norm divergence. - CS: route _safe_inv through the helper; fix the var_psi>0-else-0.0 mask so a rank-0 cell yields NaN. SDDD: route the OR-IF and PS-Hessian inversions. TD: route 7 inv/pinv sites + add the per-fit aggregate rank-guard warning. - Rank-guard warning suppressed under rank_deficient_action="silent" (uniform); "error" is enforced upstream at the point-estimate solve (raises before any IF SE), so the IF guard is reached only under warn/silent. - Docs: REGISTRY rank-guard Notes (column-drop = full-rank limit + 1e-10 rationale + error enforcement), CHANGELOG entry, TODO structural-inverse follow-up. - Tests: helper contract, finite-SE/warning/golden-unchanged + survey-weighted + RCS/notyettreated + error-mode + cell-aliasing (control vs drop-one, treated vs near-collinear full-rank limit) for CS/TD/SDDD. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent dd18dc5 commit 034356e

12 files changed

Lines changed: 1000 additions & 119 deletions

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1818
- **TwoStageDiD `vcov_type` threading + Results metadata (Phase 1b interstitial #5, final, permanently narrow).** `TwoStageDiD(vcov_type=...)` now accepts `{"hc1"}` only (default), completing the Phase 1b initiative across all 8 standalone estimators. Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` are REJECTED at `__init__` / `fit()` with methodology-rooted messages: TwoStageDiD's variance is the Gardner (2022) two-stage GMM cluster-sandwich whose meat is the per-cluster GMM-corrected score `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}`, which folds first-stage FE estimation uncertainty into the score — there is no single hat matrix spanning both stages on which HC2 leverage or Bell-McCaffrey Satterthwaite DOF can be defined, and the Gardner correction has not been derived for the leverage-corrected/homoskedastic meat (no reference implementation; mirrors the SpilloverDiD `classical` rejection). `cluster=` and `survey_design=` paths are numerically unchanged (bit-identical for healthy fits). **`TwoStageDiDResults` additions (no rename, no BC break):** new `vcov_type` / `cluster_name` / `n_clusters` fields + `to_dict()` method. `summary()` renders a `Variance estimator: <label>` line after the survey block (suppressed under bootstrap — `Inference method: bootstrap` + `Bootstrap replications: <n>` shown instead — and under any survey design). Default `cluster=None` renders `"CR1 cluster-robust at <unit>, G=<n_units>"` because the Gardner sandwich auto-clusters at the unit column (did2s no-FSA convention — the `CR1` label carries no `(n-1)/(n-p)` factor, matching R `did2s`; same convention as ImputationDiD's Theorem 3 variance). Defensive `n_clusters<2` NaN guard added to the multiplier-bootstrap path (was ≈0 SE from BLAS roundoff) plus a survey-PSU `n_psu<2` parity guard. `cluster=` with a replicate-weight survey design now raises `NotImplementedError` (replicate-refit variance ignores `cluster=`). `vcov_type='conley'` deferred to a TODO follow-up row.
1919

2020
### Fixed
21-
- **Scale-invariant rank detection and least-squares solve in the shared OLS backend (`diff_diff/linalg.py` + `rust/src/linalg.rs`).** `_detect_rank_deficiency` ran pivoted QR on the raw design matrix with a rank threshold anchored to the largest pivot diagonal, so a covariate on a large raw scale (e.g. population, income in cents, market cap) inflated the threshold and **false-dropped the intercept/treatment/interaction columns to NaN on an otherwise full-rank model** — a `DifferenceInDifferences` fit with a covariate ×1 or ×1e4 returned the correct ATT while the same covariate ×1e8 returned `ATT=NaN`. Even after detection, the `scipy.lstsq(cond=1e-7)` solve (and the Rust SVD truncation) truncated the genuine small-scale direction relative to the huge column, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency, adopting the higher equilibrated rank only when the raw drop was scale-induced; the least-squares solve equilibrates columns and unscales the coefficients. This is applied in both the Python and Rust backends, making rank detection and the fit invariant to per-column scaling while leaving everything else unchanged: it is a no-op for full-rank well-conditioned designs (R-parity goldens unaffected) and does **not** change which column is dropped in a *well-scaled* collinear design (the established raw pivot selection is preserved); a scale-induced under-count instead adopts the scale-corrected equilibrated selection (which may differ from the raw choice but retains an identified full-rank subset). Also fixes a cryptic `IndexError: arrays used as indices must be of integer type` when a design collapsed to rank 0 (e.g. a constant FE-collinear covariate in `ImputationDiD`/`TwoStageDiD`): `solve_ols` now returns all-NaN coefficients cleanly, and `solve_poisson` raises a clear `ValueError`. (`solve_logit` always prepends an intercept, so rank 0 is unreachable there — its index array is just made `dtype=int` for consistency.) New regression tests in `tests/test_linalg.py` assert scale-invariance of fitted values/t-stats, a finite ATT through the public DiD estimator with a 1e8-scale covariate, rank-0 NaN handling (OLS) / clear `ValueError` (`solve_poisson`), that the scale repair preserves the raw collinear drop selection for well-scaled genuinely-collinear designs, and that the mixed scale+collinearity case retains an identified full-rank subset (huge independent column kept) with valid inference on the kept coefficients. Both backends verified equivalent (`tests/test_rust_backend.py`). **Scope:** this covers covariate fits routed through `solve_ols` — DiD, TWFE, MultiPeriodDiD, ImputationDiD, TwoStageDiD, and TripleDifference. CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve` on `X'X` / `scipy.lstsq(cond=1e-7)` that are not yet equilibrated; making those scale-robust — plus the DR/OR influence-function SE rank-guard for CS / TripleDifference / StaggeredTripleDifference (local matrix inverses) — is deferred (see TODO.md).
21+
- **Scale-invariant rank detection and least-squares solve in the shared OLS backend (`diff_diff/linalg.py` + `rust/src/linalg.rs`).** `_detect_rank_deficiency` ran pivoted QR on the raw design matrix with a rank threshold anchored to the largest pivot diagonal, so a covariate on a large raw scale (e.g. population, income in cents, market cap) inflated the threshold and **false-dropped the intercept/treatment/interaction columns to NaN on an otherwise full-rank model** — a `DifferenceInDifferences` fit with a covariate ×1 or ×1e4 returned the correct ATT while the same covariate ×1e8 returned `ATT=NaN`. Even after detection, the `scipy.lstsq(cond=1e-7)` solve (and the Rust SVD truncation) truncated the genuine small-scale direction relative to the huge column, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency, adopting the higher equilibrated rank only when the raw drop was scale-induced; the least-squares solve equilibrates columns and unscales the coefficients. This is applied in both the Python and Rust backends, making rank detection and the fit invariant to per-column scaling while leaving everything else unchanged: it is a no-op for full-rank well-conditioned designs (R-parity goldens unaffected) and does **not** change which column is dropped in a *well-scaled* collinear design (the established raw pivot selection is preserved); a scale-induced under-count instead adopts the scale-corrected equilibrated selection (which may differ from the raw choice but retains an identified full-rank subset). Also fixes a cryptic `IndexError: arrays used as indices must be of integer type` when a design collapsed to rank 0 (e.g. a constant FE-collinear covariate in `ImputationDiD`/`TwoStageDiD`): `solve_ols` now returns all-NaN coefficients cleanly, and `solve_poisson` raises a clear `ValueError`. (`solve_logit` always prepends an intercept, so rank 0 is unreachable there — its index array is just made `dtype=int` for consistency.) New regression tests in `tests/test_linalg.py` assert scale-invariance of fitted values/t-stats, a finite ATT through the public DiD estimator with a 1e8-scale covariate, rank-0 NaN handling (OLS) / clear `ValueError` (`solve_poisson`), that the scale repair preserves the raw collinear drop selection for well-scaled genuinely-collinear designs, and that the mixed scale+collinearity case retains an identified full-rank subset (huge independent column kept) with valid inference on the kept coefficients. Both backends verified equivalent (`tests/test_rust_backend.py`). **Scope:** this covers covariate fits routed through `solve_ols` — DiD, TWFE, MultiPeriodDiD, ImputationDiD, TwoStageDiD, and TripleDifference. CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve` on `X'X` / `scipy.lstsq(cond=1e-7)` that are not yet equilibrated; making those *point-estimate* solves scale-robust is deferred (see TODO.md). The companion DR/OR influence-function SE rank-guard for CS / TripleDifference / StaggeredTripleDifference (local matrix inverses) is now addressed in the dedicated entry below.
22+
- **Rank-guarded doubly-robust / outcome-regression influence-function standard errors (`CallawaySantAnna`, `TripleDifference`, `StaggeredTripleDifference`).** The per-cell propensity-score Hessian and outcome-regression bread (`X'WX`) are inverted for the influence-function SE via `np.linalg.solve`/`inv`, which raise `LinAlgError` only on *exactly* singular matrices. A **near**-singular Gram from a constant or collinear covariate did not raise, so a garbage inverse (entries ~1e13) flowed straight into the SE — reproduced: `CallawaySantAnna(estimation_method="dr")` `overall_se` ~5.1e13, `TripleDifference(estimation_method="reg")` `se` ~1.8e17, `StaggeredTripleDifference` SEs inflated 30–100× — while the **ATT point estimate stayed valid** (only the SE was wrong). A new shared helper `_rank_guarded_inv` (`diff_diff/linalg.py`) symmetrically equilibrates the Gram (`D^{-1/2} A D^{-1/2}`, `D=diag(A)`) and, when rank-deficient, inverts a **column-dropped** principal submatrix — keeping the most-independent columns via pivoted QR, the **same generalized-inverse convention the point estimate uses** (`_detect_rank_deficiency` / R's `lm()`; `rcond=1e-10` relative-eigenvalue rank threshold). An all-NaN inverse (NaN SE) is returned only on true rank-0. Using the point estimate's column-drop convention (rather than a minimum-norm pseudo-inverse, which diverges sharply when the IF multiplier leaves `range(A)` — e.g. a control or treated-sub-cell bread multiplied by a mean from a cell where the covariate is not collinear) makes the analytical SE equal the **well-conditioned near-collinear limit**: replacing the exactly-collinear covariate with a near-collinear (full-rank) one yields the same SE to working precision (verified `se_ratio ≈ 1` across `reg`/`ipw`/`dr`, for every per-cell bread, control and treated). A covariate that is rank-deficient only within one cell still legitimately enters the other cells' full-rank fits, so the ATT/SE reflect that (poor) covariate specification, consistently with the point estimate. The well-conditioned fast path returns `np.linalg.solve(A, I)` unchanged (R-parity goldens unaffected). Each estimator emits ONE aggregate `UserWarning` reporting the dropped redundant direction(s) + max condition number (`TripleDifference` gains this consolidated warning, which it previously lacked), suppressed under `rank_deficient_action="silent"`. New regression tests in `tests/test_linalg.py` (helper contract), `tests/test_staggered.py`, `tests/test_methodology_triple_diff.py`, and `tests/test_methodology_staggered_triple_diff.py`. **Scope:** the live bug is the SE rank-guard; the estimator-local *point-estimate* coefficient solves remain scale-equilibration follow-ups (no reproduced bug). `EfficientDiD` (covariate path) was diagnosed and is already rank-safe (`pinv(rcond=tol/max_eigval)`); `ContinuousDiD`/`SpilloverDiD` have no user-covariate path.
2223
- **Bertanha-Imbens 2014 citation correction (16 sites across 5 files).** A verification spike confirmed the citation across `diff_diff/linalg.py` (×8), `diff_diff/conley.py` (×1), `diff_diff/guides/llms-full.txt` (×2), `docs/methodology/REGISTRY.md` (×4), and `docs/api/spillover.rst` (×1) was incorrect — NBER w20773 *External Validity in Fuzzy Regression Discontinuity Designs* (JBES 2020, 38(3):593-612) by Bertanha & Imbens covers fuzzy RD external validity, NOT weighted spatial-HAC under sampling weights. Replaced across all 16 sites with the open-problem framing: "weighted spatial-HAC under probability sampling is an open methodological question; no canonical extension of Conley (1999) exists for the combination." At the four `REGISTRY.md` sites the replacement is wrapped in the canonical `**Note (open methodological question):**` label per CLAUDE.md "Documenting Deviations (AI Review Compatibility)". REGISTRY ConleySpatialHAC section gains a new `**Note (deferral status, 2026-05-26):**` splitting the boundary into three parts: **Shipped** — SpilloverDiD + Conley + survey via Wave E.1/E.2/E.3 (PR #468/#474/#482), TwoStageDiD + Conley + survey via Wave E.3 parity (PR #485). **Deferred (generic linalg surface, any `weight_type`)** — DiD/MPD/TWFE/LinearRegression generic path + Conley + `survey_design=`; `LinearRegression` / `compute_robust_vcov` Conley + `weights=` rejected for `pweight`, `aweight`, AND `fweight` (weighted Conley is not implemented on the generic linalg surface). **Open methodological question (subset)** — the `pweight` / `survey_design` portion of the deferral additionally lacks a canonical methodological extension of Conley (1999) for weighted spatial-HAC under probability sampling. **No source-code logic changes:** verified via diff-in-diff pytest output before and after the citation strip (175 passed + 14 warnings, bit-identical pass set on `tests/test_conley_vcov.py`). **Historical CHANGELOG entries (pre-[Unreleased]) intentionally retain the Bertanha-Imbens 2014 attribution** as accurate records of what was claimed at the time of each release; the [Unreleased] entry above supersedes those rationales going forward.
2324

2425
## [3.4.2] - 2026-05-25

TODO.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -80,8 +80,7 @@ Deferred items from PR reviews that were not addressed before merge.
8080
| dCDH by_path: survey-aware backward-horizon (`placebo + predict_het + survey_design`) raises `NotImplementedError` because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
8181
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
8282
| CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve(X'X)` / `scipy.lstsq(cond=1e-7)` that bypass `solve_ols`, so they are NOT scale-equilibrated — a large-scale covariate can in principle perturb the nuisance fit (TripleDifference's OR fit already routes through `solve_ols` and is covered). Route the local OR fits through the shared scale-robust solver (or equilibrate locally). | `staggered.py`, `staggered_triple_diff.py` | covariate-review | Medium |
83-
| DR/OR influence-function SE rank-guard: `_safe_inv` (and `inv`/`pinv` siblings) only fall back on `LinAlgError`, not near-singularity, so a constant/collinear covariate yields garbage per-cell SE (~1e13 in CS DR; ~3018 in SDDD) that propagates into `overall_se`. Add a per-cell rank check (reuse `_detect_rank_deficiency`) that NaN-s the cell, mirroring the covariate-reg path. Sites: CS (9 `_safe_inv`), TripleDifference (7 inv/pinv), StaggeredTripleDifference (4); SunAbraham not affected. | `staggered.py`, `triple_diff.py`, `staggered_triple_diff.py` | covariate-review | Medium |
84-
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
83+
| Adopt the shared `_rank_guarded_inv` for the *structural* (non-covariate) matrix inverses that share the `LinAlgError`-only fallback pattern and can become near-singular: `continuous_did.py:1056` (dose B-spline basis), `spillover.py:3371` (ring-solve, partially guarded via `kept_cols`), `two_stage.py:3154` (TSL Stage-2 variance), `imputation.py:2403`, `had.py:2413`, `conley.py:1109`. These invert internal bases users cannot perturb with `covariates=` (so not the covariate-triggered SE bug already fixed by the DR/OR rank-guard) — lower priority; the `_rank_guarded_inv` helper is the seam. | `continuous_did.py`, `spillover.py`, `two_stage.py`, `imputation.py`, `had.py`, `conley.py` | dr-or-se-rank-guard | Low || ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
8584
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
8685
| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
8786
| SyntheticControl: `SyntheticControlResults` not wired into the practitioner / DiagnosticReport / BusinessReport routing, so routing SCM results through those tools yields generic parallel-trends/HonestDiD guidance that doesn't fit SCM. Add SCM to the native-routed rejection sets (mirror SDiD/TROP) and surface SCM-native diagnostics (pre-fit / in-space placebo / in-time placebo / leave-one-out). Deferred to PR-2, where it pairs with the placebo-inference layer those reports would surface. | `practitioner.py`, `diagnostic_report.py`, `business_report.py` | SCM PR-1 → PR-2 | Medium |

0 commit comments

Comments
 (0)