SpilloverDiD: address review polish — SE clamp, perf hoist, TODO

igerber · claude · igerber · commit 6f3ade490449 · 2026-05-15T18:04:16.000-04:00
- Clamp each diagonal with `max(vcov[i, i], 0.0)` before the sqrt when
  extracting the ATT and per-ring SEs (spillover.py:L1740-L1762). Matches
  the sibling-estimator convention at two_stage.py:1183, estimators.py:606,
  stacked_did.py:515. Indefinite Conley sandwiches and near-singular cases
  can produce numerically tiny negative diagonals, which would otherwise
  NaN the full inference row.
- Hoist row_pos out of the per-cohort loop in
  _compute_nearest_treated_distance_staggered (spillover.py:L400-L425).
  row_pos depends only on row_unit and unit_to_pos, both invariant across
  the cohort iteration, so it is now built once (O(n_rows)) instead of once
  per cohort (O(n_rows × n_cohorts)) on dense staggered fits.
- Add a TODO.md row tracking the sparse cKDTree path for the staggered
  helper as a Wave B follow-up. This resolves the stale code-comment
  reference in spillover.py:L365-L369.
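
A minimal sketch of the SE clamp in isolation, assuming a plain NumPy `vcov` matrix. `diag_se` is a hypothetical helper for illustration, not a function in spillover.py; it mirrors the guard in the diff (missing or non-finite diagonal gives NaN) and shows that a round-off negative diagonal yields NaN without the clamp and 0.0 with it.

```python
import numpy as np


def diag_se(vcov, idx):
    """Hypothetical helper: SE from one vcov diagonal, clamped at zero."""
    # Same guard as the diff: a missing or non-finite diagonal gives a NaN SE.
    if vcov is None or not np.isfinite(vcov[idx, idx]):
        return float("nan")
    # max(..., 0.0) maps a round-off negative such as -1e-18 to 0.0,
    # so np.sqrt never sees a negative argument.
    return float(np.sqrt(max(vcov[idx, idx], 0.0)))


# Toy vcov whose second diagonal is a tiny negative round-off value.
vcov = np.array([[4.0, 0.1], [0.1, -1e-18]])

with np.errstate(invalid="ignore"):
    # Without the clamp, sqrt of a negative float64 is NaN.
    unclamped = float(np.sqrt(vcov[1, 1]))

print(unclamped, diag_se(vcov, 0), diag_se(vcov, 1))  # nan 2.0 0.0
```

Non-negative diagonals pass through `max` unchanged, so the clamp can only alter results that were previously NaN.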

139 tests pass; no behavior change on existing fixtures (the clamp only
changes results for negative diagonals, which no existing fixture
produces; the hoist is a pure refactor; the TODO is bookkeeping).
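
To illustrate why the hoist is a pure refactor, here is a toy version of the per-cohort running-minimum update. `running_min_distance` and its `cohort_dists` input (onset time mapped to a per-unit distance array) are hypothetical simplifications, not the real helper's signature; the `hoist` flag only toggles where `row_pos` is built.

```python
import numpy as np


def running_min_distance(row_unit, row_time, unit_to_pos, cohort_dists, hoist=True):
    """Hypothetical toy version of the per-cohort running-minimum update.

    cohort_dists maps each onset time to an (n_units,) array of distances
    from every unit to the nearest unit treated by that onset.
    """
    d_it = np.full(len(row_unit), np.inf)
    if hoist:
        # Post-change shape: built once, since it depends only on
        # row_unit and unit_to_pos.
        row_pos = np.array([unit_to_pos[u] for u in row_unit], dtype=np.intp)
    for onset, dists_to_cohort in sorted(cohort_dists.items()):
        affected_rows = row_time >= onset
        if not affected_rows.any():
            continue
        if not hoist:
            # Pre-change shape: the identical array is rebuilt per cohort.
            row_pos = np.array([unit_to_pos[u] for u in row_unit], dtype=np.intp)
        row_cohort_dist = dists_to_cohort[row_pos]
        # Keep the running minimum across cohorts for affected rows only.
        update = affected_rows & (row_cohort_dist < d_it)
        d_it[update] = row_cohort_dist[update]
    return d_it


row_unit = np.array(["a", "a", "b", "b", "c", "c"])
row_time = np.array([1, 2, 1, 2, 1, 2])
unit_to_pos = {"a": 0, "b": 1, "c": 2}
cohort_dists = {1: np.array([0.0, 3.0, 5.0]), 2: np.array([0.0, 1.0, 0.0])}

hoisted = running_min_distance(row_unit, row_time, unit_to_pos, cohort_dists, hoist=True)
looped = running_min_distance(row_unit, row_time, unit_to_pos, cohort_dists, hoist=False)
print(hoisted)  # [0. 0. 3. 1. 5. 0.]
```

Both settings return identical distances; the loop version simply repeats the same O(n_rows) list comprehension for every cohort that has affected rows.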

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/TODO.md b/TODO.md
@@ -125,6 +125,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
 | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
 | Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
+| `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low |
 
 #### Performance
 
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
@@ -397,6 +397,10 @@ def _compute_nearest_treated_distance_staggered(
         # in `_validate_spillover_inputs`, but defensively return inf.
         return d_it, row_unit, row_time
 
+    # Row's unit position. Invariant across cohort iterations — compute
+    # once outside the loop.
+    row_pos = np.array([unit_to_pos[uid] for uid in row_unit], dtype=np.intp)
+
     # For each unique onset time, compute (n_units, n_treated_by_then) pairwise
     # distances ONCE, then assign to rows whose t >= that onset (carrying forward
     # the minimum across cohorts).
@@ -416,8 +420,6 @@ def _compute_nearest_treated_distance_staggered(
         affected_rows = row_time >= onset
         if not affected_rows.any():
             continue
-        # Row's unit position -> per-row distance from this cohort.
-        row_pos = np.array([unit_to_pos[uid] for uid in row_unit], dtype=np.intp)
         row_cohort_dist = dists_to_cohort[row_pos]
         # Only update rows where this cohort's distance is smaller than the
         # current d_it (carries the running minimum across cohorts).
@@ -1737,8 +1739,13 @@ def fit(
             )
             df_resid = 0
 
+        # Clamp negative diagonals to 0 before sqrt: indefinite Conley or
+        # near-singular sandwich variances can produce numerically tiny
+        # negative values that would otherwise NaN the entire inference
+        # row. Matches the sibling-estimator convention
+        # (two_stage.py:1183, estimators.py:606, stacked_did.py:515).
         tau_se = (
-            float(np.sqrt(vcov[0, 0]))
+            float(np.sqrt(max(vcov[0, 0], 0.0)))
             if vcov is not None and np.isfinite(vcov[0, 0])
             else float("nan")
         )
@@ -1750,7 +1757,7 @@ def fit(
             idx = 1 + j  # 0 is treatment; rings follow.
             coef_j = float(coef[idx])
             se_j = (
-                float(np.sqrt(vcov[idx, idx]))
+                float(np.sqrt(max(vcov[idx, idx], 0.0)))
                 if vcov is not None and np.isfinite(vcov[idx, idx])
                 else float("nan")
             )