Skip to content

Commit d4b9091

Browse files
igerberclaude
andcommitted
Address PR #402 R4 review (1 P1, 1 P3)
P1 HAD Step-3 overstated pretest coverage on weighted/survey fits: practitioner_next_steps() said did_had_pretest_workflow runs QUG on both the overall and event-study paths without noting that the workflow explicitly skips QUG whenever survey_design= / survey= / weights= is supplied (Phase 4.5 C0 deferral, had_pretests.py:4488-4495 + REGISTRY § "QUG Null Test" Note (Phase 4.5 C0)). On weighted fits the workflow emits a UserWarning and returns a linearity-conditional verdict only. Both _handle_had and _handle_had_event_study Step-3 why-text + code snippets now explicitly state that survey-weighted fits skip QUG and yield a linearity-conditional verdict (the weighted verdict is conditional on QUG holding by assumption). The event-study text also notes that joint Stute pre-trends and joint homogeneity-linearity themselves remain available under survey weighting via the PSU-level Mammen multiplier bootstrap. P3 REGISTRY § HeterogeneousAdoptionDiD requirements checklist was stale: marked "Phase 5: practitioner_next_steps() integration" and "Phase 5 (remaining): llms-full.txt section" as pending. Updated to reflect this PR landing wave 1 of Phase 5; only T21 (HAD pretest workflow tutorial) and T22 (weighted/survey HAD tutorial) remain queued, both tracked in TODO.md. Tests added (1 new, 89 total): - test_had_step_3_flags_qug_under_survey_deferral: asserts both HAD handler variants surface the QUG-under-survey skip and the linearity-conditional-verdict caveat. Without this caveat agents may assume step 1 / Design 1' vs Design 1 was checked on weighted fits when the library deliberately does not check it there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 00ed3ab commit d4b9091

3 files changed

Lines changed: 70 additions & 12 deletions

File tree

diff_diff/practitioner.py

Lines changed: 30 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -857,18 +857,26 @@ def _handle_had(results: Any):
857857
baker_step=3,
858858
label="Run the HAD pretest battery",
859859
why=(
860-
"On a two-period panel did_had_pretest_workflow runs "
861-
"paper Section 4.2 step 1 (QUG support-infimum test - "
860+
"On a two-period unweighted panel did_had_pretest_workflow "
861+
"runs paper Section 4.2 step 1 (QUG support-infimum test - "
862862
"decides Design 1' vs Design 1) and step 3 (Stute / "
863863
"Yatchew-HR Assumption 8 linearity tests). Step 2 "
864864
"(Assumption 7 pre-trends) is NOT covered on the overall "
865865
"path - a single pre-period cannot support the joint "
866866
"Stute variant - and the returned verdict explicitly "
867867
"flags that gap. To close step 2, refit on a multi-period "
868-
"panel with aggregate='event_study'. Assumptions 3 / 5 / 6 "
869-
"(uniform continuity at the boundary, Design 1 sign / "
870-
"WAS_d_lower identification) are NOT testable via "
871-
"pre-trends - the workflow vets only what can be vetted."
868+
"panel with aggregate='event_study'. On survey-weighted "
869+
"fits (survey_design= / survey= / weights=) the workflow "
870+
"skips QUG with a UserWarning (permanent Phase 4.5 C0 "
871+
"deferral - extreme order statistics are not smooth "
872+
"functionals of the empirical CDF) and returns a "
873+
"linearity-conditional verdict only - so step 1 coverage "
874+
"is unweighted-only and the reported verdict on weighted "
875+
"fits is conditional on QUG holding by assumption. "
876+
"Assumptions 3 / 5 / 6 (uniform continuity at the "
877+
"boundary, Design 1 sign / WAS_d_lower identification) "
878+
"are NOT testable via pre-trends - the workflow vets only "
879+
"what can be vetted."
872880
),
873881
code=(
874882
"from diff_diff import did_had_pretest_workflow\n"
@@ -879,7 +887,9 @@ def _handle_had(results: Any):
879887
"print(report.summary())\n"
880888
"# verdict explicitly flags the Assumption 7 gap on the\n"
881889
"# overall path; aggregate='event_study' on a multi-period\n"
882-
"# panel adds joint Stute pre-trends + joint homogeneity-linearity."
890+
"# panel adds joint Stute pre-trends + joint homogeneity-linearity.\n"
891+
"# Passing survey_design= / weights= skips QUG (Phase 4.5 C0)\n"
892+
"# and returns a linearity-conditional verdict only."
883893
),
884894
step_name="parallel_trends",
885895
),
@@ -997,11 +1007,21 @@ def _handle_had_event_study(results: Any):
9971007
baker_step=3,
9981008
label="Run the HAD pretest battery (event-study mode)",
9991009
why=(
1000-
"On multi-period panels, did_had_pretest_workflow with "
1001-
"aggregate='event_study' runs QUG plus joint Stute "
1010+
"On multi-period unweighted panels, did_had_pretest_workflow "
1011+
"with aggregate='event_study' runs QUG plus joint Stute "
10021012
"pre-trends plus joint homogeneity-linearity Stute. The "
10031013
"joint Stute variants close the paper Section 4.2 step-2 "
1004-
"gap that the overall path explicitly flags as deferred."
1014+
"gap that the overall path explicitly flags as deferred. "
1015+
"On survey-weighted fits (survey_design= / survey= / "
1016+
"weights=) the workflow skips QUG with a UserWarning "
1017+
"(permanent Phase 4.5 C0 deferral) and returns a "
1018+
"linearity-conditional verdict only - so step 1 coverage "
1019+
"is unweighted-only on the event-study path too, and the "
1020+
"weighted verdict is conditional on QUG holding by "
1021+
"assumption. The joint Stute pre-trends and joint "
1022+
"homogeneity-linearity tests themselves remain available "
1023+
"under survey weighting via PSU-level Mammen multiplier "
1024+
"bootstrap."
10051025
),
10061026
code=(
10071027
"from diff_diff import did_had_pretest_workflow\n"

docs/methodology/REGISTRY.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2548,9 +2548,10 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
25482548
- [x] Phase 3: `did_had_pretest_workflow()` composite helper. Two-period `data`-only entry point (Phase 2a overall-path dispatch); reduces panel via `_aggregate_first_difference` and runs all three IMPLEMENTED tests at a shared `alpha`. `seed` forwards to `stute_test` only (QUG and Yatchew are deterministic). Returns `HADPretestReport` with priority-ordered verdict string. Because Phase 3 ships steps 1 + 3 of the paper's four-step workflow but **not** step 2 (Assumption 7 pre-trends test via Equation 18), the fail-to-reject verdict explicitly flags the Assumption 7 gap rather than claiming unconditional TWFE safety: `"QUG and linearity diagnostics fail-to-reject; Assumption 7 pre-trends test NOT run (paper step 2 deferred to Phase 3 follow-up)"`. Verdict priority follows the paper's one-way rule (TWFE admissible only if NO test rejects): **conclusive rejections are the primary verdict and are NEVER hidden by inconclusive status** — any unresolved-step note is appended via `"; additional steps unresolved: ..."` rather than replacing the rejection. The pure `"inconclusive - QUG NaN"` / `"inconclusive - both Stute and Yatchew linearity tests NaN"` forms only fire when NO conclusive test rejects AND a required step is unresolved. The partial-workflow fail-to-reject verdict may carry a `"(Yatchew NaN - skipped)"` (or Stute) suffix when one linearity test is NaN but the other is conclusive (step 3 resolved via the paper's "Stute OR Yatchew" wording). Bundled rejection-reason strings name each failed assumption in the conclusive-rejection case. `all_pass` is `True` iff QUG is conclusive AND at least one of Stute/Yatchew is conclusive AND no conclusive test rejects. **Non-negative-dose contract**: all three raw linearity helpers (`qug_test`, `stute_test`, `yatchew_hr_test`) raise a front-door `ValueError` on any `d < 0`, mirroring the `_validate_had_panel` guard (paper Section 2 HAD support restriction). Multi-period panels pre-slice to `(F-1, F)` before calling; joint-horizon dispatch deferred to Phase 3 follow-up.
25492549
- [ ] Phase 4: Pierce-Schott (2016) replication harness reproduces Figure 2 values.
25502550
- [ ] Phase 4: Full DGP 1/2/3 coverage-rate reproduction from Table 1.
2551-
- [ ] Phase 5: `practitioner_next_steps()` integration for HAD results.
2551+
- [x] Phase 5 (wave 1, PR #402): `practitioner_next_steps()` integration for HAD results - `_handle_had` and `_handle_had_event_study` route both result classes through HAD-specific Baker et al. (2025) step guidance with bidirectional HAD ↔ ContinuousDiD Step-4 routing closure. The `_check_nan_att` helper extends to ndarray `att` (HAD event-study) via `np.all(np.isnan(arr))` semantics; scalar path bit-exact preserved.
2552+
- [x] Phase 5 (wave 1, PR #402): `llms-full.txt` HeterogeneousAdoptionDiD section + result-class blocks + `## HAD Pretests` index + Choosing-an-Estimator row landed; constructor / fit() signatures match the real API (regression-tested via `inspect.signature`); result-class field tables enumerate every public dataclass field (regression-tested via `dataclasses.fields()`); `llms-practitioner.txt` Step 4 decision tree distinguishes ContinuousDiD (per-dose ATT(d), needs never-treated) from HeterogeneousAdoptionDiD (WAS, universal-rollout-compatible).
25522553
- [x] Phase 5 (partial): README catalog one-liner, bundled `llms.txt` `## Estimators` entry, `docs/api/had.rst` (autoclass for the three classes), and `docs/references.rst` citation landed in PR #372 docs refresh.
2553-
- [ ] Phase 5 (remaining): Tutorial notebook + `llms-full.txt` HeterogeneousAdoptionDiD section (preserving the UTF-8 fingerprint).
2554+
- [ ] Phase 5 (remaining): T21 HAD pretest workflow tutorial + T22 weighted/survey HAD tutorial - tracked in `TODO.md`.
25542555
- [ ] Documentation of non-testability of Assumptions 5 and 6.
25552556
- [ ] Warnings for staggered treatment timing (redirect to `ChaisemartinDHaultfoeuille`).
25562557
- [ ] `NotImplementedError` phase pointer when `covariates=` is passed (Theorem 6 future work).

tests/test_practitioner.py

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -690,6 +690,43 @@ def test_handle_continuous_step_4_snippet_is_valid_python(self, mock_continuous_
690690
if code.strip():
691691
ast.parse(code) # raises SyntaxError on failure
692692

693+
def test_had_step_3_flags_qug_under_survey_deferral(
694+
self, mock_had_results, mock_had_event_study_results
695+
):
696+
# Per diff_diff/had_pretests.py:4488-4495 + REGISTRY § "QUG Null
697+
# Test" Note (Phase 4.5 C0): when survey_design= / survey= /
698+
# weights= is supplied, did_had_pretest_workflow skips the QUG
699+
# step with a UserWarning and returns a linearity-conditional
700+
# verdict only. Both HAD handler variants must surface this
701+
# caveat so agents do not assume step 1 / Design 1' vs Design 1
702+
# was checked on weighted fits when the library deliberately
703+
# cannot check it there.
704+
for fixture in (mock_had_results, mock_had_event_study_results):
705+
output = practitioner_next_steps(fixture, verbose=False)
706+
step_3_steps = [s for s in output["next_steps"] if s["baker_step"] == 3]
707+
assert len(step_3_steps) == 1
708+
text = (step_3_steps[0].get("why", "") + " " + step_3_steps[0].get("code", "")).lower()
709+
# Must mention that survey-weighted fits skip QUG.
710+
assert "skip" in text and "qug" in text, (
711+
"Step-3 text must explicitly say survey-weighted fits "
712+
"skip QUG (Phase 4.5 C0 deferral). Without this caveat "
713+
"agents may assume step 1 / Design 1' vs Design 1 was "
714+
"checked on weighted fits when the library deliberately "
715+
"does not check it there."
716+
)
717+
# Must mention "linearity-conditional" verdict OR equivalent
718+
# framing so agents know the weighted verdict is conditional
719+
# on QUG holding by assumption.
720+
assert (
721+
"linearity-conditional" in text
722+
or "linearity conditional" in text
723+
or "qug holding by assumption" in text
724+
), (
725+
"Step-3 text must describe the weighted verdict as "
726+
"linearity-conditional / conditional on QUG holding by "
727+
"assumption."
728+
)
729+
693730
def test_had_step_3_pretest_assumption_labels_correct(self, mock_had_results):
694731
# Per docs/methodology/REGISTRY.md and diff_diff/had_pretests.py
695732
# docstrings:

0 commit comments

Comments
 (0)