igerber
diff --git a/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎TODO.md‎
Lines changed: 1 addition & 0 deletions b/‎TODO.md‎
Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **TwoStageDiD `vcov_type` threading + Results metadata (Phase 1b interstitial #5, final, permanently narrow).** `TwoStageDiD(vcov_type=...)` now accepts `{"hc1"}` only (default), completing the Phase 1b initiative across all 8 standalone estimators. Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` are REJECTED at `__init__` / `fit()` with methodology-rooted messages: TwoStageDiD's variance is the Gardner (2022) two-stage GMM cluster-sandwich whose meat is the per-cluster GMM-corrected score `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}`, which folds first-stage FE estimation uncertainty into the score — there is no single hat matrix spanning both stages on which HC2 leverage or Bell-McCaffrey Satterthwaite DOF can be defined, and the Gardner correction has not been derived for the leverage-corrected/homoskedastic meat (no reference implementation; mirrors the SpilloverDiD `classical` rejection). `cluster=` and `survey_design=` paths are numerically unchanged (bit-identical for healthy fits). **`TwoStageDiDResults` additions (no rename, no BC break):** new `vcov_type` / `cluster_name` / `n_clusters` fields + `to_dict()` method. `summary()` renders a `Variance estimator: <label>` line after the survey block (suppressed under bootstrap — `Inference method: bootstrap` + `Bootstrap replications: <n>` shown instead — and under any survey design). Default `cluster=None` renders `"CR1 cluster-robust at <unit>, G=<n_units>"` because the Gardner sandwich auto-clusters at the unit column (did2s no-FSA convention — the `CR1` label carries no `(n-1)/(n-p)` factor, matching R `did2s`; same convention as ImputationDiD's Theorem 3 variance). Defensive `n_clusters<2` NaN guard added to the multiplier-bootstrap path (was ≈0 SE from BLAS roundoff) plus a survey-PSU `n_psu<2` parity guard. `cluster=` with a replicate-weight survey design now raises `NotImplementedError` (replicate-refit variance ignores `cluster=`). `vcov_type='conley'` deferred to a TODO follow-up row.
 
 ### Fixed
+- **Scale-invariant rank detection and least-squares solve in the shared OLS backend (`diff_diff/linalg.py` + `rust/src/linalg.rs`).** `_detect_rank_deficiency` ran pivoted QR on the raw design matrix with a rank threshold anchored to the largest pivot diagonal, so a covariate on a large raw scale (e.g. population, income in cents, market cap) inflated the threshold and **false-dropped the intercept/treatment/interaction columns to NaN on an otherwise full-rank model** — a `DifferenceInDifferences` fit with a covariate ×1 or ×1e4 returned the correct ATT while the same covariate ×1e8 returned `ATT=NaN`. Even after detection, the `scipy.lstsq(cond=1e-7)` solve (and the Rust SVD truncation) truncated the genuine small-scale direction relative to the huge column, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency, adopting the higher equilibrated rank only when the raw drop was scale-induced; the least-squares solve equilibrates columns and unscales the coefficients. This is applied in both the Python and Rust backends, making rank detection and the fit invariant to per-column scaling while leaving everything else unchanged: it is a no-op for full-rank well-conditioned designs (R-parity goldens unaffected) and does **not** change which column is dropped in a genuinely collinear design (the established raw pivot selection is preserved). Also fixes a cryptic `IndexError: arrays used as indices must be of integer type` when a design collapsed to rank 0 (e.g. a constant FE-collinear covariate in `ImputationDiD`/`TwoStageDiD`): `solve_ols` now returns all-NaN coefficients cleanly, and `solve_poisson` raises a clear `ValueError`. (`solve_logit` always prepends an intercept, so rank 0 is unreachable there — its index array is just made `dtype=int` for consistency.) New regression tests in `tests/test_linalg.py` assert scale-invariance of fitted values/t-stats, a finite ATT through the public DiD estimator with a 1e8-scale covariate, rank-0 NaN handling (OLS) / clear `ValueError` (`solve_poisson`), and that the scale repair preserves the raw collinear drop selection (including the mixed scale+collinearity case). Both backends verified equivalent (`tests/test_rust_backend.py`). **Scope:** this covers solves routed through `solve_ols` (DiD, TWFE, MultiPeriodDiD); CallawaySantAnna / TripleDifference / StaggeredTripleDifference perform covariate outcome-regression / doubly-robust nuisance solves locally (`cho_solve` on `X'X`, `scipy.lstsq(cond=1e-7)`) that are not yet equilibrated — making those scale-robust is deferred to the DR/OR rank-guard follow-up (see TODO.md).
 - **Bertanha-Imbens 2014 citation correction (16 sites across 5 files).** A verification spike confirmed the citation across `diff_diff/linalg.py` (×8), `diff_diff/conley.py` (×1), `diff_diff/guides/llms-full.txt` (×2), `docs/methodology/REGISTRY.md` (×4), and `docs/api/spillover.rst` (×1) was incorrect — NBER w20773 *External Validity in Fuzzy Regression Discontinuity Designs* (JBES 2020, 38(3):593-612) by Bertanha & Imbens covers fuzzy RD external validity, NOT weighted spatial-HAC under sampling weights. Replaced across all 16 sites with the open-problem framing: "weighted spatial-HAC under probability sampling is an open methodological question; no canonical extension of Conley (1999) exists for the combination." At the four `REGISTRY.md` sites the replacement is wrapped in the canonical `**Note (open methodological question):**` label per CLAUDE.md "Documenting Deviations (AI Review Compatibility)". REGISTRY ConleySpatialHAC section gains a new `**Note (deferral status, 2026-05-26):**` splitting the boundary into three parts: **Shipped** — SpilloverDiD + Conley + survey via Wave E.1/E.2/E.3 (PR #468/#474/#482), TwoStageDiD + Conley + survey via Wave E.3 parity (PR #485). **Deferred (generic linalg surface, any `weight_type`)** — DiD/MPD/TWFE/LinearRegression generic path + Conley + `survey_design=`; `LinearRegression` / `compute_robust_vcov` Conley + `weights=` rejected for `pweight`, `aweight`, AND `fweight` (weighted Conley is not implemented on the generic linalg surface). **Open methodological question (subset)** — the `pweight` / `survey_design` portion of the deferral additionally lacks a canonical methodological extension of Conley (1999) for weighted spatial-HAC under probability sampling. **No source-code logic changes:** verified via diff-in-diff pytest output before and after the citation strip (175 passed + 14 warnings, bit-identical pass set on `tests/test_conley_vcov.py`). **Historical CHANGELOG entries (pre-[Unreleased]) intentionally retain the Bertanha-Imbens 2014 attribution** as accurate records of what was claimed at the time of each release; the [Unreleased] entry above supersedes those rationales going forward.
 
 ## [3.4.2] - 2026-05-25
 
@@ -79,6 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | dCDH by_path: survey-aware backward-horizon (`placebo + predict_het + survey_design`) raises `NotImplementedError` because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
+| CS/TripleDifference/StaggeredTripleDifference covariate outcome-regression & doubly-robust nuisance solves use estimator-local `cho_solve(X'X)` / `scipy.lstsq(cond=1e-7)` that bypass `solve_ols`, so they are NOT scale-equilibrated — a large-scale covariate can still perturb the nuisance fit (and therefore ATT/SE). Route through the shared scale-robust solver (or equilibrate locally). Bundle with the DR/OR influence-function rank-guard follow-up. | `staggered.py`, `triple_diff.py`, `staggered_triple_diff.py` | covariate-review | Medium |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |