spillover-tva: address CI codex R7 P2 — 2-decimal precision contract

igerber · claude · igerber · commit 99c0428319b7 · 2026-05-25T14:20:18.000-04:00
R7 noted that the notebook, README, and CHANGELOG publish 2-decimal
numbers (-4.29, -7.34, -4.53, -5.38, -2.59) but the drift tests only
enforce round-to-1 precision. Future drift smaller than 0.05 could
leave the prose stale without failing any test.

Mixed fix:

- WELL-CONDITIONED headline pins tightened to round-to-2 (matching
  what the prose actually quotes):
    * test_naive_att_endpoint_matches_quoted: -4.3 → -4.29
    * test_spillover_did_recovers_tau_total: -7.3 → -7.34
    * test_spillover_did_recovers_delta_1: -4.5 → -4.53

- BORDERLINE rings=[0,50] grid point left at round-to-1 (per the R5
  reviewer's BLAS-safety guidance). Notebook §4 prose coarsened to
  match: "-5.38" → "~-5.4" and "-2.59" → "~-2.6", with an explicit
  parenthetical explaining WHY this point uses 1-decimal precision
  while §5 uses 2 (borderline-rank-deficient design under d_bar=50
  can shift across BLAS paths).

Headline values are stable across BLAS paths to better than 0.005
(verified on Apple Silicon; cross-platform variance on well-
conditioned fits is sub-ULP). The drift-test docstrings now
explicitly document the 2-decimal vs 1-decimal contract for future
maintainers.

Co-Authored-By: Claude Opus 4.7 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/docs/tutorials/23_spillover_tva.ipynb b/docs/tutorials/23_spillover_tva.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "94144842",
+   "id": "e918f4ab",
    "metadata": {},
    "source": [
     "# Spillover-aware DiD with `SpilloverDiD` \u2014 a TVA-style worked example\n",
@@ -47,7 +47,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "25550a15",
+   "id": "793f0506",
    "metadata": {},
    "source": [
     "## 2. The synthetic panel\n",
@@ -75,7 +75,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1d5920a5",
+   "id": "44843f78",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -157,7 +157,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "56a1f7e8",
+   "id": "777da50e",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -180,7 +180,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "55bde532",
+   "id": "a06e2dfa",
    "metadata": {},
    "source": [
     "## 3. The naive headline \u2014 multi-period TWFE on the full sample\n",
@@ -194,7 +194,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bfef8286",
+   "id": "bd3092be",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -224,7 +224,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f48bbaf7",
+   "id": "9ec6a20e",
    "metadata": {},
    "source": [
     "The naive estimate is roughly **-4.29**, about 58% of the true\n",
@@ -241,7 +241,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "389eca88",
+   "id": "8ab9fcf7",
    "metadata": {},
    "source": [
     "## 4. Choosing the spillover bandwidth\n",
@@ -267,7 +267,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "771bdfd6",
+   "id": "e768b4ce",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -294,19 +294,22 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ee054b67",
+   "id": "1467301a",
    "metadata": {},
    "source": [
     "At `d_bar = 50` km the ring is too narrow: near-controls in the\n",
     "50-78 km band ARE exposed in the DGP (they're within the true 100 km\n",
     "spillover horizon and carry $\\delta_1 = -4.5$), but the ring spec\n",
     "misclassifies them as far-away clean controls. Both estimates suffer\n",
-    "from the misspecification \u2014 $\\tau$ deflates to -5.38 because the\n",
+    "from the misspecification \u2014 $\\tau$ deflates to ~-5.4 because the\n",
     "\"clean control\" arm now contains genuinely-affected units, and the\n",
-    "spillover coefficient $\\delta_1$ attenuates to -2.59 because the\n",
+    "spillover coefficient $\\delta_1$ attenuates to ~-2.6 because the\n",
     "$S = 1$ ring averages 50-78 km exposure into the cleaner\n",
     "$S = 0$ comparison. This is the registry-documented failure mode for\n",
-    "undershooting `d_bar`.\n",
+    "undershooting `d_bar`. (The exact values are quoted to one decimal\n",
+    "here rather than two because the `d_bar = 50` design is borderline-\n",
+    "rank-deficient \u2014 the two-decimal value can shift slightly across BLAS\n",
+    "paths even at the same locked seed.)\n",
     "\n",
     "At `d_bar = 100` km the ring covers the entire DGP near-control band\n",
     "(0.1\u00b0-0.7\u00b0 latitude \u2248 11-78 km). $\\tau$ recovers to -7.34 and\n",
@@ -327,7 +330,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3234a2c8",
+   "id": "626ba92d",
    "metadata": {},
    "source": [
     "## 5. Fit `SpilloverDiD` and interpret\n",
@@ -339,7 +342,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9ca37cfd",
+   "id": "68e05342",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -366,7 +369,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a0d51daa",
+   "id": "dc82c017",
    "metadata": {},
    "source": [
     "With the spillover term in the regression, `SpilloverDiD` cleanly\n",
@@ -384,7 +387,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b668b174",
+   "id": "b46af080",
    "metadata": {},
    "source": [
     "## 6. Robust inference with Conley spatial-HAC\n",
@@ -408,7 +411,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "116b4b3c",
+   "id": "380eacc7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -442,7 +445,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7169c550",
+   "id": "5e96dc31",
    "metadata": {},
    "source": [
     "Point estimates are identical across all three rows \u2014 the variance\n",
@@ -492,7 +495,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4485f89f",
+   "id": "8bf9c211",
    "metadata": {},
    "source": [
     "## 7. Practitioner takeaways and where to go next\n",
diff --git a/tests/test_t23_spillover_tva_drift.py b/tests/test_t23_spillover_tva_drift.py
@@ -241,21 +241,32 @@ def test_naive_twfe_understates_tau_total(naive_fit):
 
 
 def test_naive_att_endpoint_matches_quoted(naive_fit):
-    """§3 quoted endpoint: round-to-1 pin (looser than round-to-2 for BLAS safety)."""
-    assert round(naive_fit.att, 1) == -4.3
+    """§3 quoted endpoint: 2-decimal pin matching the published `-4.29`
+    in the notebook, README, and CHANGELOG. The well-conditioned naive
+    MultiPeriodDiD fit is stable across BLAS paths to better than 0.005,
+    so 2-decimal pinning is safe (in contrast to the borderline-rank-
+    deficient `rings=[0,50]` sensitivity point, which we keep at
+    round-to-1)."""
+    assert round(naive_fit.att, 2) == -4.29
 
 
 def test_spillover_did_recovers_tau_total(spillover_fit):
-    """§5 quoted: SpilloverDiD tau_total ≈ -7.34 ± 0.12, recovers true -7.4."""
+    """§5 quoted: SpilloverDiD tau_total = -7.34, recovers true -7.4
+    (within 0.5 tolerance bound). Endpoint pinned to 2 decimals
+    matching the published `-7.34` in the notebook, README, and
+    CHANGELOG — well-conditioned fit, BLAS-stable at 2 decimals."""
     assert abs(spillover_fit.att - TAU_TOTAL) < 0.5
-    assert round(spillover_fit.att, 1) == -7.3
+    assert round(spillover_fit.att, 2) == -7.34
 
 
 def test_spillover_did_recovers_delta_1(spillover_fit):
-    """§5 quoted: SpilloverDiD delta_1 ≈ -4.53 ± 0.07, recovers true -4.5."""
+    """§5 quoted: SpilloverDiD delta_1 = -4.53, recovers true -4.5
+    (within 0.5 tolerance bound). Endpoint pinned to 2 decimals
+    matching the published `-4.53` in the notebook, README, and
+    CHANGELOG — well-conditioned fit, BLAS-stable at 2 decimals."""
     delta_1 = float(spillover_fit.spillover_effects.iloc[0]["coef"])
     assert abs(delta_1 - DELTA_1) < 0.5
-    assert round(delta_1, 1) == -4.5
+    assert round(delta_1, 2) == -4.53
 
 
 def test_rings_sensitivity_grid_endpoints(panel):