P1 - HAD design label convention was reversed across T21. Per
REGISTRY:2267 + had.py:7-33, the convention is:
- Design 1' = continuous_at_zero (d_lower = 0, QUG case) - that's T21
- Design 1 = continuous_near_d_lower (d_lower > 0) - that's T20

T21 had Design 1 / Design 1' swapped throughout. Fixed in the build
script (Section 1 paper-step taxonomy, Section 2 panel framing,
Section 3 reading-the-verdict, Section 7 Extensions). Notebook
re-executed and review extract regenerated.

Two residual "QUG selects/picks the identification path" leakages from
the original prose also surfaced (Section 7 + Summary checklist). Both
contradicted the explicit QUG-vs-_detect_design separation locked by
test_had_design_auto_lands_on_continuous_at_zero. Reworded to keep the
two rules independent ("QUG fail-to-reject and `design="auto"`
heuristic both pointed independently"; "QUG is a statistical test on
H0; `design="auto"` calls _detect_design() which uses a min/median
heuristic - both pointed to continuous_at_zero on this panel").

P2 (MT1) - T21 was mapped under had_pretests.py in doc-deps.yaml but
the drift test now also locks HAD(design="auto") / _detect_design()
behavior from had.py via test_had_design_auto_lands_on_continuous_at_zero.
Add T21 entry to the had.py docs block with a note on the
_detect_design() drift coverage so a future had.py design-selection
change does not miss T21 in the manual docs-impact map.

All 16 drift tests still pass on Rust; nbmake clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/_review/t21_notebook_extract.md (5 additions & 5 deletions)
@@ -28,7 +28,7 @@ This tutorial picks up where T20 left off. We re-run the brand campaign on a pan
 de Chaisemartin et al. (2026) Section 4.2 lays out a four-step workflow for HAD identification:

-1. **Step 1 - QUG support-infimum test (paper Theorem 4):** is the support of the dose distribution consistent with `d_lower = 0` (Design 1, `continuous_at_zero`, target = `WAS`)? Or is the support strictly above zero (Design 1', `continuous_near_d_lower`, target = `WAS_d_lower`)? The two designs identify different estimands; getting this right matters.
+1. **Step 1 - QUG support-infimum test (paper Theorem 4):** is the support of the dose distribution consistent with `d_lower = 0` (Design 1', `continuous_at_zero`, target = `WAS`)? Or is the support strictly above zero (Design 1, `continuous_near_d_lower`, target = `WAS_d_lower`)? The two designs identify different estimands; getting this right matters.
 2. **Step 2 - Parallel pre-trends (paper Assumption 7):** does the differenced outcome behave the same way across dose groups in the *pre-treatment* periods? Same identifying logic as classic DiD.
 3. **Step 3 - Linearity / homogeneity (paper Assumption 8):** is `E[dY | D]` linear in `D`, so that the WAS reading reflects the average per-dose marginal effect rather than masking heterogeneity bias?
 4. **Step 4 - Boundary continuity (paper Assumptions 5, 6):** local-linearity of the dose-response near the boundary `d_lower`. **Non-testable**; argued from domain knowledge.
@@ -37,7 +37,7 @@ The library bundles the testable steps into one entry point: `did_had_pretest_wo
 ## 2. The Panel

-We use a panel close in shape to T20's brand campaign (60 DMAs over 8 weeks, regional add-on spend on top of a national TV blast at week 5, true per-$1K lift = 100 weekly visits). The one difference: regional spend in this tutorial is drawn from `Uniform[$0.01K, $50K]` instead of T20's `Uniform[$5K, $50K]`. The true support of the dose distribution is therefore strictly positive (down to about $10), but very near zero - some markets barely participated in the regional add-on. Two independent things follow from that small `D_(1)`. (a) The QUG test in Step 1 will fail to reject `H0: d_lower = 0`, which means the data are **statistically consistent with** the `continuous_at_zero` (Design 1) identification path even though the true simulation lower bound is positive. (b) Independently, HAD's `design="auto"` detection - which uses a separate min/median heuristic, NOT the QUG p-value (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`) - also lands on `continuous_at_zero` here, because `D_(1) / median(D)` is below 0.01 on this panel. Both checks point to the same identification path on this panel, but they are independent rules; the workflow's `_detect_design` does not consume the pre-test outcomes. The point of this tutorial is not to assert that the data is Design 1 from the DGP up; the point is to read what the workflow concludes from the data and what it leaves open.
+We use a panel close in shape to T20's brand campaign (60 DMAs over 8 weeks, regional add-on spend on top of a national TV blast at week 5, true per-$1K lift = 100 weekly visits). The one difference: regional spend in this tutorial is drawn from `Uniform[$0.01K, $50K]` instead of T20's `Uniform[$5K, $50K]`. The true support of the dose distribution is therefore strictly positive (down to about $10), but very near zero - some markets barely participated in the regional add-on. Two independent things follow from that small `D_(1)`. (a) The QUG test in Step 1 will fail to reject `H0: d_lower = 0`, which means the data are **statistically consistent with** the `continuous_at_zero` (Design 1') identification path even though the true simulation lower bound is positive. (b) Independently, HAD's `design="auto"` detection - which uses a separate min/median heuristic, NOT the QUG p-value (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`) - also lands on `continuous_at_zero` here, because `D_(1) / median(D)` is below 0.01 on this panel. Both checks point to the same identification path on this panel, but they are independent rules; the workflow's `_detect_design` does not consume the pre-test outcomes. The point of this tutorial is not to assert that the data is Design 1' from the DGP up; the point is to read what the workflow concludes from the data and what it leaves open.
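
For readers of this diff, the min/median rule quoted in the Section 2 paragraph (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`) can be sketched in a few lines. This is an illustration only, not the library's `_detect_design()`: the function name `detect_design_sketch`, the `ratio_threshold` parameter, and the two-way fallback to `continuous_near_d_lower` are assumptions, and the real selector may have further branches that this sketch ignores.

```python
import numpy as np


def detect_design_sketch(dose, ratio_threshold=0.01):
    """Hypothetical stand-in for the min/median heuristic described above.

    Returns "continuous_at_zero" when the smallest dose is below
    ratio_threshold times the median absolute dose; otherwise falls back
    to "continuous_near_d_lower" (an assumed two-way split).
    """
    d = np.asarray(dose, dtype=float)
    if d.size == 0:
        raise ValueError("dose vector must not be empty")
    # The rule looks only at the dose vector; no pre-test p-value is consulted.
    if d.min() < ratio_threshold * np.median(np.abs(d)):
        return "continuous_at_zero"
    return "continuous_near_d_lower"


# Smallest dose far below 1% of the median: the rule fires.
near_zero = detect_design_sketch([0.05, 10, 20, 30, 40])
# Smallest dose well above 1% of the median: the rule does not fire.
away_from_zero = detect_design_sketch([5, 10, 20, 30, 40])
```

The sketch takes no QUG input at all, which is the separation this commit's rewording is meant to preserve. Whether the rule fires on a simulated `Uniform[$0.01K, $50K]` panel depends on the realized minimum: with a median near $25K, the smallest dose would need to fall below roughly $0.25K.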
 **Reading the overall verdict.** Three things to note.

-- **Step 1 (QUG) fails to reject:** the test statistic `T = D_(1) / (D_(2) - D_(1)) ~ 3.86` lands well below its critical value (`1/alpha - 1 = 19` at alpha = 0.05); the data are statistically consistent with `d_lower = 0`. (Failing to reject is non-rejection, not proof - the true support could still be slightly above zero in finite samples; here it is, by construction of the DGP. QUG's outcome supports interpreting the data as Design 1, but the QUG test is independent of HAD's `design="auto"` selector - which uses the min/median heuristic described in Section 2 to reach the same `continuous_at_zero` decision on this panel.)
+- **Step 1 (QUG) fails to reject:** the test statistic `T = D_(1) / (D_(2) - D_(1)) ~ 3.86` lands well below its critical value (`1/alpha - 1 = 19` at alpha = 0.05); the data are statistically consistent with `d_lower = 0`. (Failing to reject is non-rejection, not proof - the true support could still be slightly above zero in finite samples; here it is, by construction of the DGP. QUG's outcome supports interpreting the data as Design 1', but the QUG test is independent of HAD's `design="auto"` selector - which uses the min/median heuristic described in Section 2 to reach the same `continuous_at_zero` decision on this panel.)
 - **Step 3 (linearity) fails to reject** on both Stute (CvM) and Yatchew-HR. The diagnostics do not flag heterogeneity bias on the dose dimension, so reading the WAS as an average per-dose marginal effect is supported by these tests (subject to finite-sample power).
 - **Step 2 (Assumption 7 pre-trends) is not run on this path.** The verdict says so verbatim: `"Assumption 7 pre-trends test NOT run (paper step 2 deferred to Phase 3 follow-up)"`. With a single pre-period (the avg over weeks 1-4), there is nothing to compare against - we need at least two pre-periods to run a parallel-trends test on the dose dimension. The structural fields back this up: `pretrends_joint` and `homogeneity_joint` on the report are both `None` (the joint-Stute output containers don't get populated on the two-period path).
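
The Step 1 arithmetic in the verdict bullets can be checked by hand. The sketch below computes `T = D_(1) / (D_(2) - D_(1))` from the two smallest doses and compares it with the critical value `1/alpha - 1` quoted in the text. The function names are hypothetical and this is not the library's QUG implementation; it assumes only the statistic and the rejection rule as stated in the bullet (reject when `T` exceeds the critical value).

```python
import numpy as np


def qug_statistic(dose):
    """T = D_(1) / (D_(2) - D_(1)), built from the two smallest doses."""
    d = np.sort(np.asarray(dose, dtype=float))
    if d.size < 2:
        raise ValueError("need at least two dose values")
    gap = d[1] - d[0]
    if gap <= 0:
        raise ValueError("the two smallest doses are tied; T is undefined")
    return float(d[0] / gap)


def qug_critical_value(alpha=0.05):
    """Critical value 1/alpha - 1, as quoted in the verdict text."""
    return 1.0 / alpha - 1.0


def qug_decision(dose, alpha=0.05):
    """Hypothetical helper: reject H0: d_lower = 0 when T exceeds the critical value."""
    t = qug_statistic(dose)
    crit = qug_critical_value(alpha)
    return {"statistic": t, "critical_value": crit, "reject": bool(t > crit)}


# Toy dose vector chosen so that T is about 3.86, matching the value in the text.
result = qug_decision([10.0, 3.86, 4.86], alpha=0.05)
```

With `T` near 3.86 against a critical value of 19, `result["reject"]` is `False`. That is the fail-to-reject reading described above, and it says nothing about which design `design="auto"` resolves to.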
@@ -423,7 +423,7 @@ Pre-test results travel awkwardly to non-technical audiences. The template below
 ## 7. Extensions

-This tutorial covered the composite pre-test workflow on a single panel where QUG led the workflow to select the `continuous_at_zero` (Design 1) identification path. A few directions we did not exercise here:
+This tutorial covered the composite pre-test workflow on a single panel where QUG fail-to-reject and HAD's `design="auto"` heuristic both pointed independently to the `continuous_at_zero` (Design 1') identification path. A few directions we did not exercise here:

 - **Survey-weighted / population-weighted inference** - HAD's pre-test workflow accepts `survey_design=` (or the deprecated `survey=` / `weights=` aliases) for design-based inference. The QUG step is permanently deferred under survey weighting (extreme-value theory under complex sampling is not a settled toolkit); the linearity family runs with PSU-level Mammen multiplier bootstrap (Stute and joint variants) and weighted OLS + weighted variance components (Yatchew). A follow-up tutorial covers this path end-to-end.
 - **`trends_lin=True` (Pierce-Schott Eq 17 / 18 detrending)** - mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)`. Forwards into both joint pre-trends and joint homogeneity wrappers; consumes the placebo at `base_period - 1` and skips Step 2 if no earlier placebo survives the drop. Useful when you suspect linear time trends correlated with dose but want to keep the joint-Stute machinery.
@@ -444,5 +444,5 @@ See the [`HeterogeneousAdoptionDiD` API reference](../api/had.html) and the [`HA
 - Upgrade to the multi-period (`aggregate="event_study"`) path to add the joint Stute pre-trends and joint homogeneity diagnostics. The verdict then reads "TWFE admissible under Section 4 assumptions" when none of the three testable diagnostics rejects - that is non-rejection evidence under finite-sample power and test specification, not proof.
 - Step 4 (paper Assumptions 5 / 6, boundary continuity) is **non-testable** from data - argue from domain knowledge.
 - The Yatchew-HR test exposes two null modes: `null="linearity"` (paper Theorem 7, default; what the workflow calls under the hood) and `null="mean_independence"` (Phase 4 R-parity with R `YatchewTest::yatchew_test(order=0)`, useful on placebo pre-period data).
-- QUG fail-to-reject means the data are statistically consistent with `d_lower = 0`; it does not prove the true support starts at zero. The workflow uses the QUG outcome to pick the identification path (`continuous_at_zero` vs `continuous_near_d_lower`); finite-sample uncertainty in that decision is a remaining caveat.
+- QUG fail-to-reject means the data are statistically consistent with `d_lower = 0`; it does not prove the true support starts at zero. The QUG test and HAD's `design="auto"` selector are independent rules: QUG is a statistical test on `H0: d_lower = 0`; `design="auto"` calls `_detect_design()` which uses a min/median heuristic on the dose vector. Both pointed to `continuous_at_zero` on this panel; finite-sample uncertainty in either decision is a remaining caveat.
 - Bootstrap p-values are RNG-dependent. The drift test for this notebook lives in `tests/test_t21_had_pretest_workflow_drift.py` and uses tolerance bands per backend (Rust vs pure-Python).
note: "Drift-locks `HAD(design=\"auto\")` resolution to `continuous_at_zero` on T21's panel via `tests/test_t21_had_pretest_workflow_drift.py::test_had_design_auto_lands_on_continuous_at_zero`; changes to `_detect_design()` heuristic should re-validate T21"