igerber
diff --git a/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎TODO.md‎
Lines changed: 2 additions & 0 deletions b/‎TODO.md‎
Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **TwoStageDiD `vcov_type` threading + Results metadata (Phase 1b interstitial #5, final, permanently narrow).** `TwoStageDiD(vcov_type=...)` now accepts `{"hc1"}` only (default), completing the Phase 1b initiative across all 8 standalone estimators. Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` are REJECTED at `__init__` / `fit()` with methodology-rooted messages: TwoStageDiD's variance is the Gardner (2022) two-stage GMM cluster-sandwich whose meat is the per-cluster GMM-corrected score `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}`, which folds first-stage FE estimation uncertainty into the score — there is no single hat matrix spanning both stages on which HC2 leverage or Bell-McCaffrey Satterthwaite DOF can be defined, and the Gardner correction has not been derived for the leverage-corrected/homoskedastic meat (no reference implementation; mirrors the SpilloverDiD `classical` rejection). `cluster=` and `survey_design=` paths are numerically unchanged (bit-identical for healthy fits). **`TwoStageDiDResults` additions (no rename, no BC break):** new `vcov_type` / `cluster_name` / `n_clusters` fields + `to_dict()` method. `summary()` renders a `Variance estimator: <label>` line after the survey block (suppressed under bootstrap — `Inference method: bootstrap` + `Bootstrap replications: <n>` shown instead — and under any survey design). Default `cluster=None` renders `"CR1 cluster-robust at <unit>, G=<n_units>"` because the Gardner sandwich auto-clusters at the unit column (did2s no-FSA convention — the `CR1` label carries no `(n-1)/(n-p)` factor, matching R `did2s`; same convention as ImputationDiD's Theorem 3 variance). Defensive `n_clusters<2` NaN guard added to the multiplier-bootstrap path (was ≈0 SE from BLAS roundoff) plus a survey-PSU `n_psu<2` parity guard. `cluster=` with a replicate-weight survey design now raises `NotImplementedError` (replicate-refit variance ignores `cluster=`). `vcov_type='conley'` deferred to a TODO follow-up row.
 
 ### Fixed
+- **Scale-invariant rank detection and least-squares solve in the shared OLS backend (`diff_diff/linalg.py` + `rust/src/linalg.rs`).** `_detect_rank_deficiency` ran pivoted QR on the raw design matrix with a rank threshold anchored to the largest pivot diagonal, so a covariate on a large raw scale (e.g. population, income in cents, market cap) inflated the threshold and **false-dropped the intercept/treatment/interaction columns to NaN on an otherwise full-rank model** — a `DifferenceInDifferences` fit with a covariate ×1 or ×1e4 returned the correct ATT while the same covariate ×1e8 returned `ATT=NaN`. Even after detection, the `scipy.lstsq(cond=1e-7)` solve (and the Rust SVD truncation) truncated the genuine small-scale direction relative to the huge column, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency, adopting the higher equilibrated rank only when the raw drop was scale-induced; the least-squares solve equilibrates columns and unscales the coefficients. This is applied in both the Python and Rust backends, making rank detection and the fit invariant to per-column scaling while leaving everything else unchanged: it is a no-op for full-rank well-conditioned designs (R-parity goldens unaffected) and does **not** change which column is dropped in a *well-scaled* collinear design (the established raw pivot selection is preserved); a scale-induced under-count instead adopts the scale-corrected equilibrated selection (which may differ from the raw choice but retains an identified full-rank subset). Also fixes a cryptic `IndexError: arrays used as indices must be of integer type` when a design collapsed to rank 0 (e.g. a constant FE-collinear covariate in `ImputationDiD`/`TwoStageDiD`): `solve_ols` now returns all-NaN coefficients cleanly, and `solve_poisson` raises a clear `ValueError`. (`solve_logit` always prepends an intercept, so rank 0 is unreachable there — its index array is just made `dtype=int` for consistency.) New regression tests in `tests/test_linalg.py` assert scale-invariance of fitted values/t-stats, a finite ATT through the public DiD estimator with a 1e8-scale covariate, rank-0 NaN handling (OLS) / clear `ValueError` (`solve_poisson`), that the scale repair preserves the raw collinear drop selection for well-scaled genuinely-collinear designs, and that the mixed scale+collinearity case retains an identified full-rank subset (huge independent column kept) with valid inference on the kept coefficients. Both backends verified equivalent (`tests/test_rust_backend.py`). **Scope:** this covers covariate fits routed through `solve_ols` — DiD, TWFE, MultiPeriodDiD, ImputationDiD, TwoStageDiD, and TripleDifference. CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve` on `X'X` / `scipy.lstsq(cond=1e-7)` that are not yet equilibrated; making those scale-robust — plus the DR/OR influence-function SE rank-guard for CS / TripleDifference / StaggeredTripleDifference (local matrix inverses) — is deferred (see TODO.md).
 - **Bertanha-Imbens 2014 citation correction (16 sites across 5 files).** A verification spike confirmed the citation across `diff_diff/linalg.py` (×8), `diff_diff/conley.py` (×1), `diff_diff/guides/llms-full.txt` (×2), `docs/methodology/REGISTRY.md` (×4), and `docs/api/spillover.rst` (×1) was incorrect — NBER w20773 *External Validity in Fuzzy Regression Discontinuity Designs* (JBES 2020, 38(3):593-612) by Bertanha & Imbens covers fuzzy RD external validity, NOT weighted spatial-HAC under sampling weights. Replaced across all 16 sites with the open-problem framing: "weighted spatial-HAC under probability sampling is an open methodological question; no canonical extension of Conley (1999) exists for the combination." At the four `REGISTRY.md` sites the replacement is wrapped in the canonical `**Note (open methodological question):**` label per CLAUDE.md "Documenting Deviations (AI Review Compatibility)". REGISTRY ConleySpatialHAC section gains a new `**Note (deferral status, 2026-05-26):**` splitting the boundary into three parts: **Shipped** — SpilloverDiD + Conley + survey via Wave E.1/E.2/E.3 (PR #468/#474/#482), TwoStageDiD + Conley + survey via Wave E.3 parity (PR #485). **Deferred (generic linalg surface, any `weight_type`)** — DiD/MPD/TWFE/LinearRegression generic path + Conley + `survey_design=`; `LinearRegression` / `compute_robust_vcov` Conley + `weights=` rejected for `pweight`, `aweight`, AND `fweight` (weighted Conley is not implemented on the generic linalg surface). **Open methodological question (subset)** — the `pweight` / `survey_design` portion of the deferral additionally lacks a canonical methodological extension of Conley (1999) for weighted spatial-HAC under probability sampling. **No source-code logic changes:** verified via diff-in-diff pytest output before and after the citation strip (175 passed + 14 warnings, bit-identical pass set on `tests/test_conley_vcov.py`). **Historical CHANGELOG entries (pre-[Unreleased]) intentionally retain the Bertanha-Imbens 2014 attribution** as accurate records of what was claimed at the time of each release; the [Unreleased] entry above supersedes those rationales going forward.
 
 ## [3.4.2] - 2026-05-25
 
@@ -79,6 +79,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | dCDH by_path: survey-aware backward-horizon (`placebo + predict_het + survey_design`) raises `NotImplementedError` because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
+| CallawaySantAnna and StaggeredTripleDifference fit their covariate outcome-regression nuisance via estimator-local `cho_solve(X'X)` / `scipy.lstsq(cond=1e-7)` that bypass `solve_ols`, so they are NOT scale-equilibrated — a large-scale covariate can in principle perturb the nuisance fit (TripleDifference's OR fit already routes through `solve_ols` and is covered). Route the local OR fits through the shared scale-robust solver (or equilibrate locally). | `staggered.py`, `staggered_triple_diff.py` | covariate-review | Medium |
+| DR/OR influence-function SE rank-guard: `_safe_inv` (and `inv`/`pinv` siblings) only fall back on `LinAlgError`, not near-singularity, so a constant/collinear covariate yields garbage per-cell SE (~1e13 in CS DR; ~3018 in SDDD) that propagates into `overall_se`. Add a per-cell rank check (reuse `_detect_rank_deficiency`) that NaN-s the cell, mirroring the covariate-reg path. Sites: CS (9 `_safe_inv`), TripleDifference (7 inv/pinv), StaggeredTripleDifference (4); SunAbraham not affected. | `staggered.py`, `triple_diff.py`, `staggered_triple_diff.py` | covariate-review | Medium |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |