Skip to content

Commit 6f3ade4

Browse files
igerberclaude
andcommitted
SpilloverDiD: address review polish — SE clamp, perf hoist, TODO
- Clamp `max(vcov[i, i], 0.0)` before sqrt for ATT and per-ring SE extraction (spillover.py:L1740-L1762). Matches the sibling-estimator convention at two_stage.py:1183, estimators.py:606, stacked_did.py:515. Prevents numerically tiny negative diagonals from indefinite Conley sandwiches or near-singular cases from NaN-ing the full inference row. - Hoist row_pos out of the per-cohort loop in _compute_nearest_treated_distance_staggered (spillover.py:L400-L425). row_pos depends only on row_unit and unit_to_pos, both invariant across the cohort iteration; one O(n_rows) array build instead of O(n_rows × n_cohorts) on dense staggered fits. - Add TODO.md row tracking the sparse cKDTree path for the staggered helper as Wave B follow-up. Resolves the stale code-comment reference in spillover.py:L365-L369. 139 tests pass; no behavior change on existing fixtures (the clamp is defensive against unrealizable values; the hoist is a refactor; the TODO is bookkeeping). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent fc48d69 commit 6f3ade4

2 files changed

Lines changed: 12 additions & 4 deletions

File tree

TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -125,6 +125,7 @@ Deferred items from PR reviews that were not addressed before merge.
125125
| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
126126
| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
127127
| Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
128+
| `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low |
128129

129130
#### Performance
130131

diff_diff/spillover.py

Lines changed: 11 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -397,6 +397,10 @@ def _compute_nearest_treated_distance_staggered(
397397
# in `_validate_spillover_inputs`, but defensively return inf.
398398
return d_it, row_unit, row_time
399399

400+
# Row's unit position. Invariant across cohort iterations — compute
401+
# once outside the loop.
402+
row_pos = np.array([unit_to_pos[uid] for uid in row_unit], dtype=np.intp)
403+
400404
# For each unique onset time, compute (n_units, n_treated_by_then) pairwise
401405
# distances ONCE, then assign to rows whose t >= that onset (carrying forward
402406
# the minimum across cohorts).
@@ -416,8 +420,6 @@ def _compute_nearest_treated_distance_staggered(
416420
affected_rows = row_time >= onset
417421
if not affected_rows.any():
418422
continue
419-
# Row's unit position -> per-row distance from this cohort.
420-
row_pos = np.array([unit_to_pos[uid] for uid in row_unit], dtype=np.intp)
421423
row_cohort_dist = dists_to_cohort[row_pos]
422424
# Only update rows where this cohort's distance is smaller than the
423425
# current d_it (carries the running minimum across cohorts).
@@ -1737,8 +1739,13 @@ def fit(
17371739
)
17381740
df_resid = 0
17391741

1742+
# Clamp negative diagonals to 0 before sqrt: indefinite Conley or
1743+
# near-singular sandwich variances can produce numerically tiny
1744+
# negative values that would otherwise NaN the entire inference
1745+
# row. Matches the sibling-estimator convention
1746+
# (two_stage.py:1183, estimators.py:606, stacked_did.py:515).
17401747
tau_se = (
1741-
float(np.sqrt(vcov[0, 0]))
1748+
float(np.sqrt(max(vcov[0, 0], 0.0)))
17421749
if vcov is not None and np.isfinite(vcov[0, 0])
17431750
else float("nan")
17441751
)
@@ -1750,7 +1757,7 @@ def fit(
17501757
idx = 1 + j # 0 is treatment; rings follow.
17511758
coef_j = float(coef[idx])
17521759
se_j = (
1753-
float(np.sqrt(vcov[idx, idx]))
1760+
float(np.sqrt(max(vcov[idx, idx], 0.0)))
17541761
if vcov is not None and np.isfinite(vcov[idx, idx])
17551762
else float("nan")
17561763
)

0 commit comments

Comments
 (0)