Commit c5a0ee3

igerber and claude committed
Address PR #409 R4 review (1 P1, 1 P2)
P1 - HAD design label convention was reversed across T21. Per REGISTRY:2267 + had.py:7-33, the convention is:

- Design 1' = continuous_at_zero (d_lower = 0, QUG case) - that's T21
- Design 1 = continuous_near_d_lower (d_lower > 0) - that's T20

T21 had Design 1 / Design 1' swapped throughout. Fixed in the build script (Section 1 paper-step taxonomy, Section 2 panel framing, Section 3 reading-the-verdict, Section 7 Extensions). Notebook re-executed and review extract regenerated.

Two residual "QUG selects/picks the identification path" leakages from the original prose also surfaced (Section 7 + Summary checklist). Both contradicted the explicit QUG-vs-_detect_design separation locked by test_had_design_auto_lands_on_continuous_at_zero. Reworded to keep the two rules independent ("QUG fail-to-reject and `design="auto"` heuristic both pointed independently"; "QUG is a statistical test on H0; `design="auto"` calls _detect_design() which uses a min/median heuristic - both pointed to continuous_at_zero on this panel").

P2 (MT1) - T21 was mapped under had_pretests.py in doc-deps.yaml but the drift test now also locks HAD(design="auto") / _detect_design() behavior from had.py via test_had_design_auto_lands_on_continuous_at_zero. Add T21 entry to the had.py docs block with a note on the _detect_design() drift coverage so a future had.py design-selection change does not miss T21 in the manual docs-impact map.

All 16 drift tests still pass on Rust; nbmake clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
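For readers of this commit, the min/median rule that `design="auto"` relies on can be illustrated with a minimal sketch. It is written from the rule as the changed T21 text states it (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`), not from `had.py` itself. The function name `detect_design_sketch`, the `ratio_threshold` parameter, and the fallback to `continuous_near_d_lower` are assumptions made for illustration; the real `_detect_design()` may have further branches.

```python
import numpy as np


def detect_design_sketch(dose, ratio_threshold=0.01):
    """Hypothetical stand-in for the min/median heuristic described in T21.

    Returns "continuous_at_zero" when the smallest dose is below
    ratio_threshold * median(|dose|). Any other case falls back to
    "continuous_near_d_lower", which is an assumption of this sketch.
    """
    d = np.asarray(dose, dtype=float)
    if d.min() < ratio_threshold * np.median(np.abs(d)):
        return "continuous_at_zero"
    return "continuous_near_d_lower"


# Evenly spaced grids stand in for the uniform dose draws (in $K) of each tutorial.
t21_like = np.linspace(0.01, 50.0, 60)  # support reaching down to about $10
t20_like = np.linspace(5.0, 50.0, 60)   # support bounded well away from zero

print(detect_design_sketch(t21_like))
print(detect_design_sketch(t20_like))
```

The sketch never looks at a test statistic or p-value, which is the separation from QUG that the reworded prose is meant to preserve.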
1 parent f269599 commit c5a0ee3

3 files changed

Lines changed: 57 additions & 54 deletions

docs/_review/t21_notebook_extract.md

Lines changed: 5 additions & 5 deletions
@@ -28,7 +28,7 @@ This tutorial picks up where T20 left off. We re-run the brand campaign on a pan
 
 de Chaisemartin et al. (2026) Section 4.2 lays out a four-step workflow for HAD identification:
 
-1. **Step 1 - QUG support-infimum test (paper Theorem 4):** is the support of the dose distribution consistent with `d_lower = 0` (Design 1, `continuous_at_zero`, target = `WAS`)? Or is the support strictly above zero (Design 1', `continuous_near_d_lower`, target = `WAS_d_lower`)? The two designs identify different estimands; getting this right matters.
+1. **Step 1 - QUG support-infimum test (paper Theorem 4):** is the support of the dose distribution consistent with `d_lower = 0` (Design 1', `continuous_at_zero`, target = `WAS`)? Or is the support strictly above zero (Design 1, `continuous_near_d_lower`, target = `WAS_d_lower`)? The two designs identify different estimands; getting this right matters.
 2. **Step 2 - Parallel pre-trends (paper Assumption 7):** does the differenced outcome behave the same way across dose groups in the *pre-treatment* periods? Same identifying logic as classic DiD.
 3. **Step 3 - Linearity / homogeneity (paper Assumption 8):** is `E[dY | D]` linear in `D`, so that the WAS reading reflects the average per-dose marginal effect rather than masking heterogeneity bias?
 4. **Step 4 - Boundary continuity (paper Assumptions 5, 6):** local-linearity of the dose-response near the boundary `d_lower`. **Non-testable**; argued from domain knowledge.
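As a reading aid for Step 1, the QUG decision can be sketched from the formula the T21 text gives in its verdict section: `T = D_(1) / (D_(2) - D_(1))`, compared against the critical value `1/alpha - 1`. This is a minimal illustration under those stated formulas, not the library's implementation. The function names and the example doses are hypothetical.

```python
import numpy as np


def qug_statistic(dose):
    """T = D_(1) / (D_(2) - D_(1)), using the two smallest observed doses."""
    d = np.sort(np.asarray(dose, dtype=float))
    if d.size < 2:
        raise ValueError("need at least two dose observations")
    d1, d2 = d[0], d[1]
    if d2 == d1:
        raise ValueError("the two smallest doses are tied; T is undefined")
    return d1 / (d2 - d1)


def qug_critical_value(alpha=0.05):
    """Critical value 1/alpha - 1 (19 at alpha = 0.05)."""
    return 1.0 / alpha - 1.0


def qug_rejects(dose, alpha=0.05):
    """True when H0: d_lower = 0 is rejected at level alpha."""
    return qug_statistic(dose) > qug_critical_value(alpha)


# Hypothetical doses: the smallest value is close to zero relative to the gap
# to the second smallest, so the statistic stays under the critical value.
near_zero = [0.5, 0.75, 3.0, 12.0, 40.0]
# Hypothetical doses: the two smallest values are bunched far from zero.
away_from_zero = [5.0, 5.1, 9.0, 20.0, 45.0]

print(qug_statistic(near_zero), qug_rejects(near_zero))
print(qug_statistic(away_from_zero), qug_rejects(away_from_zero))
```

A `False` from `qug_rejects` only means the data are statistically consistent with `d_lower = 0`; as the tutorial stresses, that is non-rejection rather than proof.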
@@ -37,7 +37,7 @@ The library bundles the testable steps into one entry point: `did_had_pretest_wo
 
 ## 2. The Panel
 
-We use a panel close in shape to T20's brand campaign (60 DMAs over 8 weeks, regional add-on spend on top of a national TV blast at week 5, true per-$1K lift = 100 weekly visits). The one difference: regional spend in this tutorial is drawn from `Uniform[$0.01K, $50K]` instead of T20's `Uniform[$5K, $50K]`. The true support of the dose distribution is therefore strictly positive (down to about $10), but very near zero - some markets barely participated in the regional add-on. Two independent things follow from that small `D_(1)`. (a) The QUG test in Step 1 will fail to reject `H0: d_lower = 0`, which means the data are **statistically consistent with** the `continuous_at_zero` (Design 1) identification path even though the true simulation lower bound is positive. (b) Independently, HAD's `design="auto"` detection - which uses a separate min/median heuristic, NOT the QUG p-value (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`) - also lands on `continuous_at_zero` here, because `D_(1) / median(D)` is below 0.01 on this panel. Both checks point to the same identification path on this panel, but they are independent rules; the workflow's `_detect_design` does not consume the pre-test outcomes. The point of this tutorial is not to assert that the data is Design 1 from the DGP up; the point is to read what the workflow concludes from the data and what it leaves open.
+We use a panel close in shape to T20's brand campaign (60 DMAs over 8 weeks, regional add-on spend on top of a national TV blast at week 5, true per-$1K lift = 100 weekly visits). The one difference: regional spend in this tutorial is drawn from `Uniform[$0.01K, $50K]` instead of T20's `Uniform[$5K, $50K]`. The true support of the dose distribution is therefore strictly positive (down to about $10), but very near zero - some markets barely participated in the regional add-on. Two independent things follow from that small `D_(1)`. (a) The QUG test in Step 1 will fail to reject `H0: d_lower = 0`, which means the data are **statistically consistent with** the `continuous_at_zero` (Design 1') identification path even though the true simulation lower bound is positive. (b) Independently, HAD's `design="auto"` detection - which uses a separate min/median heuristic, NOT the QUG p-value (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`) - also lands on `continuous_at_zero` here, because `D_(1) / median(D)` is below 0.01 on this panel. Both checks point to the same identification path on this panel, but they are independent rules; the workflow's `_detect_design` does not consume the pre-test outcomes. The point of this tutorial is not to assert that the data is Design 1' from the DGP up; the point is to read what the workflow concludes from the data and what it leaves open.
 
```python
import numpy as np
@@ -152,7 +152,7 @@ homogeneity_joint populated? False
 
 **Reading the overall verdict.** Three things to note.
 
-- **Step 1 (QUG) fails to reject:** the test statistic `T = D_(1) / (D_(2) - D_(1)) ~ 3.86` lands well below its critical value (`1/alpha - 1 = 19` at alpha = 0.05); the data are statistically consistent with `d_lower = 0`. (Failing to reject is non-rejection, not proof - the true support could still be slightly above zero in finite samples; here it is, by construction of the DGP. QUG's outcome supports interpreting the data as Design 1, but the QUG test is independent of HAD's `design="auto"` selector - which uses the min/median heuristic described in Section 2 to reach the same `continuous_at_zero` decision on this panel.)
+- **Step 1 (QUG) fails to reject:** the test statistic `T = D_(1) / (D_(2) - D_(1)) ~ 3.86` lands well below its critical value (`1/alpha - 1 = 19` at alpha = 0.05); the data are statistically consistent with `d_lower = 0`. (Failing to reject is non-rejection, not proof - the true support could still be slightly above zero in finite samples; here it is, by construction of the DGP. QUG's outcome supports interpreting the data as Design 1', but the QUG test is independent of HAD's `design="auto"` selector - which uses the min/median heuristic described in Section 2 to reach the same `continuous_at_zero` decision on this panel.)
 - **Step 3 (linearity) fails to reject** on both Stute (CvM) and Yatchew-HR. The diagnostics do not flag heterogeneity bias on the dose dimension, so reading the WAS as an average per-dose marginal effect is supported by these tests (subject to finite-sample power).
 - **Step 2 (Assumption 7 pre-trends) is not run on this path.** The verdict says so verbatim: `"Assumption 7 pre-trends test NOT run (paper step 2 deferred to Phase 3 follow-up)"`. With a single pre-period (the avg over weeks 1-4), there is nothing to compare against - we need at least two pre-periods to run a parallel-trends test on the dose dimension. The structural fields back this up: `pretrends_joint` and `homogeneity_joint` on the report are both `None` (the joint-Stute output containers don't get populated on the two-period path).

@@ -423,7 +423,7 @@ Pre-test results travel awkwardly to non-technical audiences. The template below
 
 ## 7. Extensions
 
-This tutorial covered the composite pre-test workflow on a single panel where QUG led the workflow to select the `continuous_at_zero` (Design 1) identification path. A few directions we did not exercise here:
+This tutorial covered the composite pre-test workflow on a single panel where QUG fail-to-reject and HAD's `design="auto"` heuristic both pointed independently to the `continuous_at_zero` (Design 1') identification path. A few directions we did not exercise here:
 
 - **Survey-weighted / population-weighted inference** - HAD's pre-test workflow accepts `survey_design=` (or the deprecated `survey=` / `weights=` aliases) for design-based inference. The QUG step is permanently deferred under survey weighting (extreme-value theory under complex sampling is not a settled toolkit); the linearity family runs with PSU-level Mammen multiplier bootstrap (Stute and joint variants) and weighted OLS + weighted variance components (Yatchew). A follow-up tutorial covers this path end-to-end.
 - **`trends_lin=True` (Pierce-Schott Eq 17 / 18 detrending)** - mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)`. Forwards into both joint pre-trends and joint homogeneity wrappers; consumes the placebo at `base_period - 1` and skips Step 2 if no earlier placebo survives the drop. Useful when you suspect linear time trends correlated with dose but want to keep the joint-Stute machinery.
@@ -444,5 +444,5 @@ See the [`HeterogeneousAdoptionDiD` API reference](../api/had.html) and the [`HA
 - Upgrade to the multi-period (`aggregate="event_study"`) path to add the joint Stute pre-trends and joint homogeneity diagnostics. The verdict then reads "TWFE admissible under Section 4 assumptions" when none of the three testable diagnostics rejects - that is non-rejection evidence under finite-sample power and test specification, not proof.
 - Step 4 (paper Assumptions 5 / 6, boundary continuity) is **non-testable** from data - argue from domain knowledge.
 - The Yatchew-HR test exposes two null modes: `null="linearity"` (paper Theorem 7, default; what the workflow calls under the hood) and `null="mean_independence"` (Phase 4 R-parity with R `YatchewTest::yatchew_test(order=0)`, useful on placebo pre-period data).
-- QUG fail-to-reject means the data are statistically consistent with `d_lower = 0`; it does not prove the true support starts at zero. The workflow uses the QUG outcome to pick the identification path (`continuous_at_zero` vs `continuous_near_d_lower`); finite-sample uncertainty in that decision is a remaining caveat.
+- QUG fail-to-reject means the data are statistically consistent with `d_lower = 0`; it does not prove the true support starts at zero. The QUG test and HAD's `design="auto"` selector are independent rules: QUG is a statistical test on `H0: d_lower = 0`; `design="auto"` calls `_detect_design()` which uses a min/median heuristic on the dose vector. Both pointed to `continuous_at_zero` on this panel; finite-sample uncertainty in either decision is a remaining caveat.
 - Bootstrap p-values are RNG-dependent. The drift test for this notebook lives in `tests/test_t21_had_pretest_workflow_drift.py` and uses tolerance bands per backend (Rust vs pure-Python).

docs/doc-deps.yaml

Lines changed: 3 additions & 0 deletions
@@ -388,6 +388,9 @@ sources:
       - path: diff_diff/guides/llms-full.txt
         section: "HeterogeneousAdoptionDiD"
         type: user_guide
+      - path: docs/tutorials/21_had_pretest_workflow.ipynb
+        type: tutorial
+        note: "Drift-locks `HAD(design=\"auto\")` resolution to `continuous_at_zero` on T21's panel via `tests/test_t21_had_pretest_workflow_drift.py::test_had_design_auto_lands_on_continuous_at_zero`; changes to `_detect_design()` heuristic should re-validate T21"
 
   diff_diff/had_pretests.py:
     drift_risk: medium
