Address PR #456 R7 review (1 P2 perf + 1 P2 docs)

igerber · claude · igerber · commit 75ff0e2aef65 · 2026-05-16T15:53:06.000-04:00
P2 perf: extend the trigger_onset_per_row precomputation to the non-
staggered event-study branch. Previously only the staggered fit() call
populated `trigger_onset_per_row_cached`; the non-staggered branch left
it as None, so `_compute_event_time_per_row` fell back to its dense
cohort loop AGAIN, defeating the sparse cutoff in
`_compute_nearest_treated_distance_static` and adding avoidable cost
on large non-staggered panels.

Fix: on the non-staggered branch, derive trigger_onset_per_row directly
from `unit_to_d` (already built from the static distance result) and
the single shared effective onset. In the non-staggered case the
trigger collapses to a constant: any unit within `d_bar` triggers at
`shared_effective_onset`; far-away units have NaN trigger. No second
pairwise pass needed.

P2 docs: remove the unwired `diagnostic_report.event_study_diagnostics`
consumability claim from CHANGELOG, llms-full.txt, results.py docstring,
REGISTRY.md, and api/spillover.rst. SpilloverDiDResults is NOT
registered in DiagnosticReport's `_APPLICABILITY` / `_PT_METHOD` tables,
so `DiagnosticReport(spillover_result)` does not route to event-study
diagnostics. `plot_event_study` integration IS wired and keeps its
claim. Added a TODO row tracking the deferred DiagnosticReport routing.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/TODO.md b/TODO.md
@@ -136,6 +136,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
 | Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
 | `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low |
+| `SpilloverDiDResults` in `DiagnosticReport` dispatch tables. Wave C event-study emits a TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias that `plot_event_study` consumes via the new `reference_period` attribute fallback in `_extract_plot_data`, but `SpilloverDiDResults` is NOT registered in `DiagnosticReport`'s `_APPLICABILITY` / `_PT_METHOD` tables — so `DiagnosticReport(spillover_result)` doesn't currently route to event-study diagnostics. Registering requires (a) deciding which diagnostics apply (parallel trends, pre-trends power, heterogeneity, design-effect) AND (b) adding an end-to-end test. | `diff_diff/diagnostic_report.py::_APPLICABILITY`, `_PT_METHOD` | follow-up (Wave C) | Low |
 
 #### Performance
 
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
@@ -502,7 +502,7 @@ sp.fit(
 
 - `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up.
 - `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
-- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` / `diagnostic_report.event_study_diagnostics` consumption (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up).
+- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up).
 - Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
 - Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
 
diff --git a/diff_diff/results.py b/diff_diff/results.py
@@ -415,7 +415,11 @@ class SpilloverDiDResults(DiDResults):
     # Includes the reference period row with coef=0.0, se=0.0, n_obs=0.
     event_study_effects: Optional[Dict[int, Dict[str, Any]]] = field(default=None)
     # TwoStageDiD-compatible alias for ``att_dynamic`` consumable by
-    # ``plot_event_study`` and ``diagnostic_report.event_study_diagnostics``.
+    # ``plot_event_study`` (wired in Wave C via the ``reference_period``
+    # attribute fallback in ``_extract_plot_data``). ``DiagnosticReport``
+    # routing is NOT yet wired — registering ``SpilloverDiDResults`` in
+    # ``DiagnosticReport``'s applicability/method tables is a planned
+    # follow-up (see TODO.md).
     # Schema mirrors ``two_stage.py:1355-1389``:
     #   {k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}
     # Reference row uses ``conf_int = (0.0, 0.0)`` (TwoStageDiD parity).
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
@@ -2272,6 +2272,25 @@ def fit(
                 np.inf,
                 d_it_per_row,
             )
+            # PR #456 R7 perf fix: derive trigger_onset_per_row directly
+            # from the static distance result for the event-study path. In
+            # the non-staggered case there's only one cohort onset, so the
+            # trigger collapses to the shared effective onset for any unit
+            # within d_bar (NaN for far-away units). Avoids hitting
+            # `_compute_event_time_per_row`'s dense-fallback cohort loop.
+            if self.event_study:
+                d_per_unit_inrange = np.array(
+                    [
+                        (
+                            shared_effective_onset
+                            if unit_to_d.get(u, np.inf) <= self._effective_d_bar
+                            else np.nan
+                        )
+                        for u in unit_vals
+                    ],
+                    dtype=np.float64,
+                )
+                trigger_onset_per_row_cached = d_per_unit_inrange
 
         # Step 6: build ring indicators per row (Butts Eq 6 time-varying form).
         ring_masks = _build_ring_indicators(d_it_per_row, list(self.rings))
diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst
@@ -190,8 +190,10 @@ planned as follow-up enhancements:
   event-time) spillover effects ``delta_jk`` as a ``att_dynamic``
   DataFrame plus MultiIndex ``spillover_effects``. The
   ``event_study_effects: Dict[int, Dict]`` alias mirrors
-  ``TwoStageDiD``'s schema for ``plot_event_study`` /
-  ``diagnostic_report.event_study_diagnostics`` consumption. Reference
+  ``TwoStageDiD``'s schema for ``plot_event_study`` consumption (the
+  plotter prefers the new ``reference_period`` attribute over the
+  legacy ``n_obs==0`` heuristic). ``DiagnosticReport`` routing for
+  ``SpilloverDiDResults`` is queued as a follow-up. Reference
   period ``-1 - anticipation`` (TwoStageDiD parity). ``horizon_max``
   bins event-times into endpoint pools (no row drop — divergence
   from TwoStageDiD's filtering semantic, intentional per
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2996,7 +2996,7 @@ where `D_it` is the treatment indicator (1 if unit `i` is treated by time `t`) a
 
 ### Event-study mode (Wave C, `event_study=True`)
 
-Replaces the aggregate spec with the per-event-time × ring decomposition from Butts Section 5 / Table 2. Direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` are emitted in `att_dynamic` and a MultiIndex `spillover_effects`. A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias (mirroring `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` and `diagnostic_report.event_study_diagnostics`.
+Replaces the aggregate spec with the per-event-time × ring decomposition from Butts Section 5 / Table 2. Direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` are emitted in `att_dynamic` and a MultiIndex `spillover_effects`. A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias (mirroring `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (`DiagnosticReport` routing is NOT yet wired; queued as a follow-up.)
 
 **Note (two-clock K_it):** Butts Section 5 uses one symbol `K_it`, but operationally there are TWO event-time clocks. The **direct-effect clock** is `K_direct_{it} = t - effective_first_treat(i)` for ever-treated unit rows (NaN for never-treated). The **spillover-exposure clock** is `K_spill_{it} = t - min{ effective_first_treat(j) : d(i, j) ≤ d_bar AND effective_first_treat(j) ≤ t }` — time since `i` first became exposed to ANY treated neighbor (running min across activated cohorts). `K_spill` is structurally non-negative (pre-trigger rows are NaN and contribute to stage 1 only); spillover placebos at `k < 0` are therefore NOT meaningful and rectangular emission of those cells uses NaN coef + `n_obs = 0`. The trigger cohort for unit `i` is the earliest activated cohort whose treated members fall within `d_bar` of `i` — NOT necessarily the geographically nearest cohort.