You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
linalg: rank-guarded generalized inverse for DR/OR influence-function SEs
A near-singular covariate Gram matrix (constant/collinear covariate) made the
per-cell propensity-score Hessian / outcome-regression bread blow up in the
influence-function SE of CallawaySantAnna, TripleDifference, and
StaggeredTripleDifference: np.linalg.solve/inv raise LinAlgError only on EXACTLY
singular matrices, so a near-singular Gram returned a garbage inverse (~1e13)
that flowed into the SE while the ATT point estimate stayed valid. Reproduced:
CS dr overall_se 5.1e13, TD reg se 1.8e17, SDDD SEs 30-100x inflated.
- Add shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to linalg.py: when
rank-deficient it inverts a COLUMN-DROPPED principal submatrix (pivoted QR on
the symmetric-equilibrated Gram) — the SAME generalized-inverse convention the
point estimate uses (_detect_rank_deficiency / R lm()). The fast path returns
solve(A, I) unchanged (R-parity); all-NaN only on true rank-0.
- Because it uses the point estimate's column-drop (not a minimum-norm
pseudo-inverse, which diverges when the IF multiplier leaves range(A)), the
analytical SE equals the well-conditioned near-collinear limit (verified
se_ratio ~ 1 across reg/ipw/dr) for every per-cell bread, control AND treated.
A covariate rank-deficient only within one cell still enters the other cells'
full-rank fits, so a degenerate covariate spec legitimately moves the ATT/SE
(surfaced by the aggregate warning) — no min-norm divergence.
- CS: route _safe_inv through the helper; fix the var_psi>0-else-0.0 mask so a
rank-0 cell yields NaN. SDDD: route the OR-IF and PS-Hessian inversions.
TD: route 7 inv/pinv sites + add the per-fit aggregate rank-guard warning.
- Rank-guard warning suppressed under rank_deficient_action="silent" (uniform);
"error" is enforced upstream at the point-estimate solve (raises before any
IF SE), so the IF guard is reached only under warn/silent.
- Docs: REGISTRY rank-guard Notes (column-drop = full-rank limit + 1e-10
rationale + error enforcement), CHANGELOG entry, TODO structural-inverse
follow-up.
- Tests: helper contract, finite-SE/warning/golden-unchanged + survey-weighted +
RCS/notyettreated + error-mode + cell-aliasing (control vs drop-one,
treated vs near-collinear full-rank limit) for CS/TD/SDDD.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: CHANGELOG.md
+2-1Lines changed: 2 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -18,7 +18,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
18
18
- **TwoStageDiD `vcov_type` threading + Results metadata (Phase 1b interstitial #5, final, permanently narrow).** `TwoStageDiD(vcov_type=...)` now accepts `{"hc1"}` only (default), completing the Phase 1b initiative across all 8 standalone estimators. Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` are REJECTED at `__init__` / `fit()` with methodology-rooted messages: TwoStageDiD's variance is the Gardner (2022) two-stage GMM cluster-sandwich whose meat is the per-cluster GMM-corrected score `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}`, which folds first-stage FE estimation uncertainty into the score — there is no single hat matrix spanning both stages on which HC2 leverage or Bell-McCaffrey Satterthwaite DOF can be defined, and the Gardner correction has not been derived for the leverage-corrected/homoskedastic meat (no reference implementation; mirrors the SpilloverDiD `classical` rejection). `cluster=` and `survey_design=` paths are numerically unchanged (bit-identical for healthy fits). **`TwoStageDiDResults` additions (no rename, no BC break):** new `vcov_type` / `cluster_name` / `n_clusters` fields + `to_dict()` method. `summary()` renders a `Variance estimator: <label>` line after the survey block (suppressed under bootstrap — `Inference method: bootstrap` + `Bootstrap replications: <n>` shown instead — and under any survey design). Default `cluster=None` renders `"CR1 cluster-robust at <unit>, G=<n_units>"` because the Gardner sandwich auto-clusters at the unit column (did2s no-FSA convention — the `CR1` label carries no `(n-1)/(n-p)` factor, matching R `did2s`; same convention as ImputationDiD's Theorem 3 variance). Defensive `n_clusters<2` NaN guard added to the multiplier-bootstrap path (was ≈0 SE from BLAS roundoff) plus a survey-PSU `n_psu<2` parity guard. `cluster=` with a replicate-weight survey design now raises `NotImplementedError` (replicate-refit variance ignores `cluster=`). `vcov_type='conley'` deferred to a TODO follow-up row.
19
19
20
20
### Fixed
21
-
- **Scale-invariant rank detection and least-squares solve in the shared OLS backend (`diff_diff/linalg.py` + `rust/src/linalg.rs`).** `_detect_rank_deficiency` ran pivoted QR on the raw design matrix with a rank threshold anchored to the largest pivot diagonal, so a covariate on a large raw scale (e.g. population, income in cents, market cap) inflated the threshold and **false-dropped the intercept/treatment/interaction columns to NaN on an otherwise full-rank model** — a `DifferenceInDifferences` fit with a covariate ×1 or ×1e4 returned the correct ATT while the same covariate ×1e8 returned `ATT=NaN`. Even after detection, the `scipy.lstsq(cond=1e-7)` solve (and the Rust SVD truncation) truncated the genuine small-scale direction relative to the huge column, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency, adopting the higher equilibrated rank only when the raw drop was scale-induced; the least-squares solve equilibrates columns and unscales the coefficients. This is applied in both the Python and Rust backends, making rank detection and the fit invariant to per-column scaling while leaving everything else unchanged: it is a no-op for full-rank well-conditioned designs (R-parity goldens unaffected) and does **not** change which column is dropped in a *well-scaled* collinear design (the established raw pivot selection is preserved); a scale-induced under-count instead adopts the scale-corrected equilibrated selection (which may differ from the raw choice but retains an identified full-rank subset). Also fixes a cryptic `IndexError: arrays used as indices must be of integer type` when a design collapsed to rank 0 (e.g. a constant FE-collinear covariate in `ImputationDiD`/`TwoStageDiD`): `solve_ols` now returns all-NaN coefficients cleanly, and `solve_poisson` raises a clear `ValueError`. (`solve_logit` always prepends an intercept, so rank 0 is unreachable there — its index array is just made `dtype=int` for consistency.) New regression tests in `tests/test_linalg.py` assert scale-invariance of fitted values/t-stats, a finite ATT through the public DiD estimator with a 1e8-scale covariate, rank-0 NaN handling (OLS) / clear `ValueError` (`solve_poisson`), that the scale repair preserves the raw collinear drop selection for well-scaled genuinely-collinear designs, and that the mixed scale+collinearity case retains an identified full-rank subset (huge independent column kept) with valid inference on the kept coefficients. Both backends verified equivalent (`tests/test_rust_backend.py`). **Scope:** this covers covariate fits routed through `solve_ols` — DiD, TWFE, MultiPeriodDiD, ImputationDiD, TwoStageDiD, and TripleDifference. CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve` on `X'X` / `scipy.lstsq(cond=1e-7)` that are not yet equilibrated; making those scale-robust — plus the DR/OR influence-function SE rank-guard for CS / TripleDifference / StaggeredTripleDifference (local matrix inverses) — is deferred (see TODO.md).
21
+
- **Scale-invariant rank detection and least-squares solve in the shared OLS backend (`diff_diff/linalg.py` + `rust/src/linalg.rs`).** `_detect_rank_deficiency` ran pivoted QR on the raw design matrix with a rank threshold anchored to the largest pivot diagonal, so a covariate on a large raw scale (e.g. population, income in cents, market cap) inflated the threshold and **false-dropped the intercept/treatment/interaction columns to NaN on an otherwise full-rank model** — a `DifferenceInDifferences` fit with a covariate ×1 or ×1e4 returned the correct ATT while the same covariate ×1e8 returned `ATT=NaN`. Even after detection, the `scipy.lstsq(cond=1e-7)` solve (and the Rust SVD truncation) truncated the genuine small-scale direction relative to the huge column, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency, adopting the higher equilibrated rank only when the raw drop was scale-induced; the least-squares solve equilibrates columns and unscales the coefficients. This is applied in both the Python and Rust backends, making rank detection and the fit invariant to per-column scaling while leaving everything else unchanged: it is a no-op for full-rank well-conditioned designs (R-parity goldens unaffected) and does **not** change which column is dropped in a *well-scaled* collinear design (the established raw pivot selection is preserved); a scale-induced under-count instead adopts the scale-corrected equilibrated selection (which may differ from the raw choice but retains an identified full-rank subset). Also fixes a cryptic `IndexError: arrays used as indices must be of integer type` when a design collapsed to rank 0 (e.g. a constant FE-collinear covariate in `ImputationDiD`/`TwoStageDiD`): `solve_ols` now returns all-NaN coefficients cleanly, and `solve_poisson` raises a clear `ValueError`. (`solve_logit` always prepends an intercept, so rank 0 is unreachable there — its index array is just made `dtype=int` for consistency.) New regression tests in `tests/test_linalg.py` assert scale-invariance of fitted values/t-stats, a finite ATT through the public DiD estimator with a 1e8-scale covariate, rank-0 NaN handling (OLS) / clear `ValueError` (`solve_poisson`), that the scale repair preserves the raw collinear drop selection for well-scaled genuinely-collinear designs, and that the mixed scale+collinearity case retains an identified full-rank subset (huge independent column kept) with valid inference on the kept coefficients. Both backends verified equivalent (`tests/test_rust_backend.py`). **Scope:** this covers covariate fits routed through `solve_ols` — DiD, TWFE, MultiPeriodDiD, ImputationDiD, TwoStageDiD, and TripleDifference. CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve` on `X'X` / `scipy.lstsq(cond=1e-7)` that are not yet equilibrated; making those *point-estimate* solves scale-robust is deferred (see TODO.md). The companion DR/OR influence-function SE rank-guard for CS / TripleDifference / StaggeredTripleDifference (local matrix inverses) is now addressed in the dedicated entry below.
22
+
- **Rank-guarded doubly-robust / outcome-regression influence-function standard errors (`CallawaySantAnna`, `TripleDifference`, `StaggeredTripleDifference`).** The per-cell propensity-score Hessian and outcome-regression bread (`X'WX`) are inverted for the influence-function SE via `np.linalg.solve`/`inv`, which raise `LinAlgError` only on *exactly* singular matrices. A **near**-singular Gram from a constant or collinear covariate did not raise, so a garbage inverse (entries ~1e13) flowed straight into the SE — reproduced: `CallawaySantAnna(estimation_method="dr")` `overall_se` ~5.1e13, `TripleDifference(estimation_method="reg")` `se` ~1.8e17, `StaggeredTripleDifference` SEs inflated 30–100× — while the **ATT point estimate stayed valid** (only the SE was wrong). A new shared helper `_rank_guarded_inv` (`diff_diff/linalg.py`) symmetrically equilibrates the Gram (`D^{-1/2} A D^{-1/2}`, `D=diag(A)`) and, when rank-deficient, inverts a **column-dropped** principal submatrix — keeping the most-independent columns via pivoted QR, the **same generalized-inverse convention the point estimate uses** (`_detect_rank_deficiency` / R's `lm()`; `rcond=1e-10` relative-eigenvalue rank threshold). An all-NaN inverse (NaN SE) is returned only on true rank-0. Using the point estimate's column-drop convention (rather than a minimum-norm pseudo-inverse, which diverges sharply when the IF multiplier leaves `range(A)` — e.g. a control or treated-sub-cell bread multiplied by a mean from a cell where the covariate is not collinear) makes the analytical SE equal the **well-conditioned near-collinear limit**: replacing the exactly-collinear covariate with a near-collinear (full-rank) one yields the same SE to working precision (verified `se_ratio ≈ 1` across `reg`/`ipw`/`dr`, for every per-cell bread, control and treated). A covariate that is rank-deficient only within one cell still legitimately enters the other cells' full-rank fits, so the ATT/SE reflect that (poor) covariate specification, consistently with the point estimate. The well-conditioned fast path returns `np.linalg.solve(A, I)` unchanged (R-parity goldens unaffected). Each estimator emits ONE aggregate `UserWarning` reporting the dropped redundant direction(s) + max condition number (`TripleDifference` gains this consolidated warning, which it previously lacked), suppressed under `rank_deficient_action="silent"`. New regression tests in `tests/test_linalg.py` (helper contract), `tests/test_staggered.py`, `tests/test_methodology_triple_diff.py`, and `tests/test_methodology_staggered_triple_diff.py`. **Scope:** the live bug is the SE rank-guard; the estimator-local *point-estimate* coefficient solves remain scale-equilibration follow-ups (no reproduced bug). `EfficientDiD` (covariate path) was diagnosed and is already rank-safe (`pinv(rcond=tol/max_eigval)`); `ContinuousDiD`/`SpilloverDiD` have no user-covariate path.
22
23
- **Bertanha-Imbens 2014 citation correction (16 sites across 5 files).** A verification spike confirmed the citation across `diff_diff/linalg.py` (×8), `diff_diff/conley.py` (×1), `diff_diff/guides/llms-full.txt` (×2), `docs/methodology/REGISTRY.md` (×4), and `docs/api/spillover.rst` (×1) was incorrect — NBER w20773 *External Validity in Fuzzy Regression Discontinuity Designs* (JBES 2020, 38(3):593-612) by Bertanha & Imbens covers fuzzy RD external validity, NOT weighted spatial-HAC under sampling weights. Replaced across all 16 sites with the open-problem framing: "weighted spatial-HAC under probability sampling is an open methodological question; no canonical extension of Conley (1999) exists for the combination." At the four `REGISTRY.md` sites the replacement is wrapped in the canonical `**Note (open methodological question):**` label per CLAUDE.md "Documenting Deviations (AI Review Compatibility)". REGISTRY ConleySpatialHAC section gains a new `**Note (deferral status, 2026-05-26):**` splitting the boundary into three parts: **Shipped** — SpilloverDiD + Conley + survey via Wave E.1/E.2/E.3 (PR #468/#474/#482), TwoStageDiD + Conley + survey via Wave E.3 parity (PR #485). **Deferred (generic linalg surface, any `weight_type`)** — DiD/MPD/TWFE/LinearRegression generic path + Conley + `survey_design=`; `LinearRegression` / `compute_robust_vcov` Conley + `weights=` rejected for `pweight`, `aweight`, AND `fweight` (weighted Conley is not implemented on the generic linalg surface). **Open methodological question (subset)** — the `pweight` / `survey_design` portion of the deferral additionally lacks a canonical methodological extension of Conley (1999) for weighted spatial-HAC under probability sampling. **No source-code logic changes:** verified via diff-in-diff pytest output before and after the citation strip (175 passed + 14 warnings, bit-identical pass set on `tests/test_conley_vcov.py`). **Historical CHANGELOG entries (pre-[Unreleased]) intentionally retain the Bertanha-Imbens 2014 attribution** as accurate records of what was claimed at the time of each release; the [Unreleased] entry above supersedes those rationales going forward.
Copy file name to clipboardExpand all lines: TODO.md
+1-2Lines changed: 1 addition & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -80,8 +80,7 @@ Deferred items from PR reviews that were not addressed before merge.
80
80
| dCDH by_path: survey-aware backward-horizon (`placebo + predict_het + survey_design`) raises `NotImplementedError` because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. |`diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md`| follow-up | Medium |
81
81
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) |`staggered.py`|#256| Low |
82
82
| CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve(X'X)` / `scipy.lstsq(cond=1e-7)` that bypass `solve_ols`, so they are NOT scale-equilibrated — a large-scale covariate can in principle perturb the nuisance fit (TripleDifference's OR fit already routes through `solve_ols` and is covered). Route the local OR fits through the shared scale-robust solver (or equilibrate locally). |`staggered.py`, `staggered_triple_diff.py`| covariate-review | Medium |
83
-
| DR/OR influence-function SE rank-guard: `_safe_inv` (and `inv`/`pinv` siblings) only fall back on `LinAlgError`, not near-singularity, so a constant/collinear covariate yields garbage per-cell SE (~1e13 in CS DR; ~3018 in SDDD) that propagates into `overall_se`. Add a per-cell rank check (reuse `_detect_rank_deficiency`) that NaN-s the cell, mirroring the covariate-reg path. Sites: CS (9 `_safe_inv`), TripleDifference (7 inv/pinv), StaggeredTripleDifference (4); SunAbraham not affected. |`staggered.py`, `triple_diff.py`, `staggered_triple_diff.py`| covariate-review | Medium |
84
-
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels |`imputation.py`|#141| Medium (deferred — only triggers when sparse solver fails) |
83
+
| Adopt the shared `_rank_guarded_inv` for the *structural* (non-covariate) matrix inverses that share the `LinAlgError`-only fallback pattern and can become near-singular: `continuous_did.py:1056` (dose B-spline basis), `spillover.py:3371` (ring-solve, partially guarded via `kept_cols`), `two_stage.py:3154` (TSL Stage-2 variance), `imputation.py:2403`, `had.py:2413`, `conley.py:1109`. These invert internal bases users cannot perturb with `covariates=` (so not the covariate-triggered SE bug already fixed by the DR/OR rank-guard) — lower priority; the `_rank_guarded_inv` helper is the seam. |`continuous_did.py`, `spillover.py`, `two_stage.py`, `imputation.py`, `had.py`, `conley.py`| dr-or-se-rank-guard | Low || ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels |`imputation.py`|#141| Medium (deferred — only triggers when sparse solver fails) |
85
84
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) |`estimators.py`|#218| Medium |
86
85
| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation |`continuous_did.py`, `efficient_did.py`, `stacked_did.py`|#226| Low |
87
86
| SyntheticControl: `SyntheticControlResults` not wired into the practitioner / DiagnosticReport / BusinessReport routing, so routing SCM results through those tools yields generic parallel-trends/HonestDiD guidance that doesn't fit SCM. Add SCM to the native-routed rejection sets (mirror SDiD/TROP) and surface SCM-native diagnostics (pre-fit / in-space placebo / in-time placebo / leave-one-out). Deferred to PR-2, where it pairs with the placebo-inference layer those reports would surface. |`practitioner.py`, `diagnostic_report.py`, `business_report.py`| SCM PR-1 → PR-2 | Medium |
0 commit comments