From b04bf2bf73f907657db2125f5e2a7dca15a18c86 Mon Sep 17 00:00:00 2001
From: igerber
Date: Mon, 1 Jun 2026 19:49:33 -0400
Subject: [PATCH] HAD fit(): extensive-margin warning + covariates= NotImplementedError

Two fit-time UX additions to HeterogeneousAdoptionDiD.fit() with NO change to any estimate or standard error (TODO L73 + L74):

- Overall path emits a UserWarning when >=10% of units have an exactly-zero post-period dose (library-convention cutoff _HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC). A substantial untreated mass suggests a genuine extensive margin where a standard DiD may be preferable (de Chaisemartin et al. 2026, Section 2 / Assumption 3). The paper retains small untreated shares (Garrett et al. 12/2954 ~ 0.4%), so the cutoff sits ~25x above that. Overall-path-only: the event-study path REQUIRES never-treated units per Appendix B.2. Closes paper-review checklist L191.
- fit(covariates=...) raises NotImplementedError via an explicit keyword-only param, pointing to the deferred Appendix B.1 / Theorem 6 covariate-adjusted extension, instead of a bare TypeError from an unknown kwarg.

REGISTRY HeterogeneousAdoptionDiD documents both as library Notes + ticks the covariates implementation-checklist item; the new covariates param is propagated to the llms-full.txt guide signature block (pinned by tests/test_guides.py). New behavioral tests in test_had.py + deviation locks in TestHADDeviations. Retires the two TODO rows.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CHANGELOG.md                                  |   1 +
 TODO.md                                       |   2 -
 diff_diff/guides/llms-full.txt                |   1 +
 diff_diff/had.py                              |  77 ++++++++++
 docs/methodology/REGISTRY.md                  |   4 +-
 .../papers/dechaisemartin-2026-review.md      |   2 +-
 tests/test_had.py                             | 133 ++++++++++++++++++
 tests/test_methodology_had.py                 |  64 +++++++++
 8 files changed, 280 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 09f669db..27c7aeb9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`SyntheticControl` cross-validation + inverse-variance `V`-selection (ADH 2015 §; Abadie 2021 §3.2(a), Eq. 9).** Two new `v_method` values complete the ADH-2015/Abadie-2021 `V`-selection menu (joining `"nested"` / `"custom"`), each threaded through the in-space / leave-one-out / in-time placebo refits so a diagnostic uses the **same** estimator as the headline fit. **`v_method="cv"`** selects the diagonal predictor-importance `V` by out-of-sample cross-validation: the pre-period is split positionally at `v_cv_t0` (new constructor param; default `len(pre)//2`, Abadie 2021's `t0 = T0/2`) into a training and a validation window, `V` is chosen to minimize the validation-window outcome MSPE of the training-fit weights (`mspe_v` now reports this validation MSPE under cv), and the final reported weights are re-estimated on the validation-window predictors (ADH 2015 step 4).
Each predictor spec is **re-aggregated** over each window (its mean/sum/identity recomputed over only the periods that fall in that window — a separate `dataprep` per window, exactly as ADH 2015's CV does, since R `Synth` has no built-in CV function), so the V-search is genuinely out-of-sample for every predictor type and the same `V*` drives both fits with no zeroed coordinate (`v_weights` reproduce `donor_weights` on the validation-window predictors, and `predictor_balance` is reported on that validation-window basis). **Fully-spanning precondition (fail-closed):** re-aggregating a predictor on each window requires it to be observed in **both** windows, so `cv` **requires every predictor to span both the training and validation windows** and raises `ValueError` otherwise — satisfied by ADH 2015's shared covariate / multi-period `special_predictors` (which span the windows) but NOT by the default per-period outcome lags (each is single-period and lives in one window only), so `cv` with the bare default predictors is rejected with guidance to pass spanning predictors. In-time-placebo truncation that breaks the fully-spanning precondition (a kept spec stops spanning both windows at the truncated split) marks that date `infeasible`. A second fail-closed gate covers windows that span but carry **no cross-donor variation** (every re-aggregated predictor constant across the donors, so `X0·W` is constant in `W` → a flat, unidentified weight solve that would otherwise return arbitrary "converged" weights — even when the treated unit differs, since donor distinguishability, not treated-vs-donor variation, identifies `W`): the headline fit raises `ValueError`, in-space placebo refits whose donor pool is indistinguishable in a window are dropped from the reference set, and such in-time-truncated dates are marked `infeasible`. 
Abadie 2021 footnote 7's CV non-uniqueness is handled by a **deterministic tie-break** (prefer the `V` closest to uniform among ties), making the selected `V*` among equally-good optima independent of the multistart evaluation order. The cv fit is reproducible for a fixed `seed` (like `nested`) but is not seed-independent — the multistart fills any slots beyond the distinct heuristic starts with seed-dependent random Dirichlet draws, so the tie-break removes start-order dependence among ties, not seed dependence. The tie-break is convergence-aware (a non-converged optimizer candidate cannot displace a converged incumbent on an objective tie). If the training-window solve that defines `mspe_v` truncates (e.g. `inner_max_iter` too small), the fit fails closed — `mspe_v=NaN` and the fit is marked non-converged — rather than reporting an invalid Eq. 9 criterion. **`v_method="inverse_variance"`** uses the closed form `v_h = 1/Var(X_h)` (variance over donors+treated on the unstandardized predictors), applied to the **raw** predictors so the effective objective is the unit-variance-rescaled `Σ_h diff_h²/Var_h` (Abadie 2021 §3.2(a)); the `standardize` pre-scaling is intentionally bypassed on this branch (inverse-variance weighting *is* the unit-variance rescaling — applying it on already-standardized rows would double-rescale to `Σ_h diff_h²/Var_h²`), so it is equivalent to uniform `V` on standardized predictors. No search (`mspe_v=None`); a zero-variance row gets 0 weight and an all-zero-variance panel falls back to uniform `V` with a warning. `custom_v` is rejected (fail-closed) for both methods and `v_cv_t0` is rejected unless `v_method="cv"`. 
On the degenerate **single-donor** path (`J=1` forces `w=[1]`) `V` is unidentified — every `V` yields the same synthetic — so `v_weights` is **uniform** and `mspe_v=None` for ALL `v_method`s (cv / inverse_variance included; their selected / closed-form `V` would be inert), with a `UserWarning`; the donor weights / gap / ATT are unaffected. An explicitly pinned `v_cv_t0` that no longer fits the truncated pre-fake window is nulled to the `//2` default for the placebo refit (a pinned value that still fits the truncated window is kept). **Validation:** R `Synth` has no built-in CV function (ADH 2015's CV is a manual `dataprep`+`synth` re-run), so cv is anchored by deterministic equivalence to the R-anchored `custom_v` path (the step-3 validation MSPE of the training-window fit and the step-4 validation-window weights each match a `custom_v=V*` fit on the correspondingly re-aggregated predictors) plus cv self-consistency (`in_time_placebo` under cv == a fresh cv fit on the backdated panel to 1e-7); inverse-variance is anchored bit-for-bit to a `custom_v=1/Var(X)` fit. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new `**Note:**` labels for the per-window re-aggregation convention, the flat-MSPE tie-break, and inverse-variance), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`. The remaining ADH-2015 items (`W^reg` extrapolation diagnostic, sparse-SC subset search) stay tracked in `TODO.md`. - **Firpo & Possebom (2018) SCM inference paper review on file (PR-A).** Added `docs/methodology/papers/firpo-possebom-2018-review.md`, a faithful, paper-sourced fidelity review of Firpo & Possebom (2018, *Journal of Causal Inference* 6(2), DOI 10.1515/jci-2016-0026) — the Step-1 artifact for the forthcoming SCM **confidence-set / CI-by-test-inversion** track (PR-B) layered on the existing `SyntheticControl` estimator (classic SCM has no analytical SE; `se`/`p_value`/`conf_int` are NaN). 
Transcribes (paper-sourced only, no code-deviation verdicts) the benchmark RMSPE-ratio permutation test (Eqs. 4–6), the sensitivity-analysis parametric p-value weights with worst/best-case `φ̲`/`φ̄` (Eqs. 7–9), the sharp-null `RMSPE^f` test (Eqs. 10–13), the **confidence sets by test inversion** (Eq. 14) with the operational constant-effect CI (Eqs. 15–16) and linear-effect CS (Eqs. 17–18), the general test-statistic framework + Monte Carlo size/power of five statistics (Eq. 19, Section 5), and the multiple-outcome FWER (Eqs. 23–24) and multiple-treated-unit pooled (Eqs. 25–26) extensions; the requirements checklist flags the PR-B target (sharp-null test + constant/linear CI + benchmark + one-sided) versus the deferred sensitivity-analysis and multi-outcome/treated extensions. Docs-only; no code change. Registered in `docs/references.rst` (Synthetic Control Method section) and `docs/doc-deps.yaml`; REGISTRY `## SyntheticControl` gains a `firpo-possebom-2018-review.md` reviews-on-file pointer. +- **`HeterogeneousAdoptionDiD.fit()` fit-time extensive-margin warning + `covariates=` not-implemented pointer.** Two UX additions to the HAD `fit()` surface, with **no change to any estimate or standard error**. (1) The **overall** path now emits a `UserWarning` when a non-trivial fraction (`>= 10%`, a library-convention cutoff in `_HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC`) of units have an exactly-zero post-period dose — a genuine untreated mass for which a standard DiD using those units as controls may be more appropriate (de Chaisemartin et al. 2026, Section 2 / Assumption 3). The paper retains *small* untreated shares (e.g. 12/2954 in Garrett et al., with close-to-nominal coverage), so the 10% cutoff sits ~25× above that; the warning is **overall-path-only** because the event-study path *requires* never-treated units per Appendix B.2. Previously the recommendation surfaced only via `qug_test()`'s zero-dose warning when the user ran the pre-tests. 
(2) `HeterogeneousAdoptionDiD.fit(covariates=...)` now raises `NotImplementedError` with a pointer to the deferred Appendix B.1 / Theorem 6 covariate-adjusted extension (via an explicit keyword-only `covariates=` param) instead of a bare `TypeError` from an unknown kwarg; pre-residualize the outcome on the covariates as a workaround. Documented in `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD; new tests in `tests/test_had.py` and `tests/test_methodology_had.py`.
 ### Fixed
 - **Covariate names that collide with reserved structural terms now raise `ValueError` instead of silently corrupting the coefficient dict (`DifferenceInDifferences`, `MultiPeriodDiD`, `TwoWayFixedEffects`).** These estimators build their `coefficients` dict by zipping a variable-name list -- structural term names PLUS the user covariate column names appended verbatim -- with the fitted coefficient vector. A covariate whose name equaled a reserved structural name (`const`; the treatment/time column names; the `{treatment}:{time}` interaction; MultiPeriodDiD `period_{p}` dummies and `{treatment}:period_{p}` interactions; `TwoWayFixedEffects` `ATT`; fixed-effect / unit / time dummy names; or an internal `_`-prefixed working column such as `_treat_time` / `_did_treatment` / `_treatment_post`) silently **overwrote** that structural coefficient via Python dict last-write-wins -- e.g. a covariate named `const` dropped the intercept -- with no error or warning. A new shared `validate_covariate_names` helper (`diff_diff/utils.py`) is now called in each of the three `fit()` methods before the design matrix is built; it raises `ValueError` on a collision (the comparison is case-sensitive, so e.g. `Const` is still allowed) **and** on duplicate names within `covariates` (which collapse to a single dict entry the same way).
Fixed-effect/unit/time dummy reserved names are taken from the same `pd.get_dummies(..., drop_first=True)` call used to build them, so they match exactly (including for pandas `Categorical` columns with a non-default category order). For `TwoWayFixedEffects` the guard fires on **all** variance paths: the default within-transform path returns only `{"ATT": att}` (no covariate is a dict key there), but a covariate named `_treatment_post` would still clobber the internal interaction column, so guarding both paths is uniform and forward-compatible. **Potentially breaking:** a fit that previously *succeeded* with a colliding (or duplicated) covariate name -- silently returning a corrupted coefficient dict -- now raises; rename the covariate column(s). The staggered / influence-function estimators (CallawaySantAnna, SunAbraham, StaggeredTripleDifference, EfficientDiD, TwoStageDiD, ImputationDiD, WooldridgeDiD, dCDH, StackedDiD) key results by `(g, t)` tuples / relative-time indices, never covariate names, and `TripleDifference` / `SyntheticControl` / `SyntheticDiD` do not expose covariates by name, so none are affected. New tests in `tests/test_utils.py`, `tests/test_estimators.py`, and `tests/test_estimators_vcov_type.py`. diff --git a/TODO.md b/TODO.md index bba75ee9..c2fb6662 100644 --- a/TODO.md +++ b/TODO.md @@ -139,8 +139,6 @@ Deferred items from PR reviews that were not addressed before merge. | `HeterogeneousAdoptionDiD` Phase 3 R-parity: Phase 3 ships coverage-rate validation on synthetic DGPs (not tight point parity against `chaisemartin::stute_test` / `yatchew_test`). Tight numerical parity requires aligning bootstrap seed semantics and `B` across numpy/R and is deferred. | `tests/test_had_pretests.py` | Phase 3 | Low | | `HeterogeneousAdoptionDiD` Phase 3 nprobust bandwidth for Stute: some Stute variants on continuous regressors use nprobust-style optimal bandwidth selection. Phase 3 uses OLS residuals from a 2-parameter linear fit (no bandwidth selection). 
nprobust integration is a future enhancement; not in paper scope. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low | | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. **Waived in tracker-promotion PR (2026-05-20):** R parity at `atol=1e-8` on the same 3 DGPs (`tests/test_did_had_parity.py`) is a strictly stronger correctness anchor than reproducing Figure 2's pointwise CIs on the LBD-restricted PNTR panel; paper Section 5.2 self-acknowledges NP estimators too noisy to be informative there. Table 1 coverage-rate MC would re-verify the CCF asymptotic coverage already pinned by R parity (Python ≡ R ≡ paper). See REGISTRY HAD Deviations Notes #3 / #4 for full scope-caveat statements. Re-open if user demand emerges for an empirical-application replication harness. | `benchmarks/`, `tests/` | Phase 2a | Low | -| `HeterogeneousAdoptionDiD` `covariates=` kwarg with Theorem 6 multivariate-covariate extension: current behavior is a Python `TypeError` (the `covariates=` kwarg is absent from `HAD.fit()` signature) — fail-closed, but doesn't surface the Theorem 6 future-work pointer to the user. Add an explicit `**kwargs`-trap with `NotImplementedError` and a Theorem 6 / `nprobust` multivariate-NP-regression pointer. ~10 LoC follow-up. | `diff_diff/had.py::HeterogeneousAdoptionDiD.fit` | follow-up | Low | -| `HeterogeneousAdoptionDiD` extensive-margin / positive-mass-of-untreated warning on the main `fit()` path. Paper recommends warning users with positive zero-dose mass that standard DiD may be more appropriate. Currently surfaced via the `qug_test()` zero-dose `UserWarning` (which only fires when the user runs pre-tests). Add a fit-time `UserWarning` when the panel's post-period dose contains a non-trivial fraction at exactly zero, with a "consider running standard DiD" pointer. Paper-review checklist L191 in `dechaisemartin-2026-review.md` left unchecked pending this addition. 
| `diff_diff/had.py::HeterogeneousAdoptionDiD.fit` | follow-up | Low | | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low | | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium | | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. 
| `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index 6ccf4a04..b4a89596 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -763,6 +763,7 @@ had.fit(
     *,
     survey_design: SurveyDesign | None = None,  # Canonical survey-design kwarg (weights, strata, PSU, FPC)
     trends_lin: bool = False,  # Eq 17 linear-trend detrending. Requires aggregate="event_study"; needs F>=3 (pre-period depth) for the regression; rejects ALL weighting entry paths (survey_design= / survey= / weights= all raise NotImplementedError under trends_lin).
+    covariates: Any | None = None,  # NOT IMPLEMENTED — non-None raises NotImplementedError (deferred Appendix B.1 / Theorem 6 covariate-adjusted extension; pre-residualize the outcome on covariates as a workaround)
 ) -> HeterogeneousAdoptionDiDResults | HeterogeneousAdoptionDiDEventStudyResults
 ```
diff --git a/diff_diff/had.py b/diff_diff/had.py
index a3881a8f..47b63c62 100644
--- a/diff_diff/had.py
+++ b/diff_diff/had.py
@@ -121,6 +121,17 @@ _MASS_POINT_VCOV_SUPPORTED = ("classical", "hc1")
 _MASS_POINT_VCOV_UNSUPPORTED = ("hc2", "hc2_bm")
+# Extensive-margin / positive-untreated-mass warning (TODO L74). The paper (de
+# Chaisemartin et al. 2026, Section 2 / Assumption 3) defines HAD for the case
+# where no genuine untreated group exists, and recommends users with a real
+# untreated mass consider a standard DiD instead. The paper prescribes "warn"
+# but NO numeric cutoff, and explicitly RETAINS small untreated shares (Garrett
+# et al.: 12/2954 ~ 0.4%, with nominal coverage), so this fit-time UserWarning
+# fires only above a library-convention fraction of EXACTLY-zero post-period
+# doses. Overall path ONLY — the event-study path requires never-treated units
+# per Appendix B.2, so an untreated mass is expected there, not a misuse signal.
+_HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC = 0.10
+
 # Target-parameter label per design. Design 1' targets the WAS (Assumption 3);
 # Design 1 targets WAS_{d_lower} (Assumption 5 or 6), which also applies to
 # the mass-point path (paper Section 3.2.4).
@@ -2844,6 +2855,7 @@ def fit(
         *,
         survey_design: Any = None,
         trends_lin: bool = False,
+        covariates: Any = None,
     ) -> Union[HeterogeneousAdoptionDiDResults, HeterogeneousAdoptionDiDEventStudyResults]:
         """Fit the HAD estimator.
@@ -2973,6 +2985,14 @@ def fit(
             ``survey`` / ``weights``); raises ``NotImplementedError`` if combined. Default ``False`` preserves bit-exact backcompat with all pre-PR fits.
+        covariates : array-like or None, default None, keyword-only
+            NOT YET IMPLEMENTED. Reserved for the covariate-adjusted HAD
+            identification of de Chaisemartin et al. (2026), Appendix B.1 /
+            Theorem 6 (the multivariate-covariate extension). A non-None
+            value raises ``NotImplementedError`` with a pointer to that
+            extension; pre-residualize the outcome on the covariates before
+            calling ``fit()``, or omit ``covariates=`` for the unconditional
+            WAS estimand.
         Returns
         -------
@@ -2984,6 +3004,15 @@ def fit(
             staggered panels auto-filters to the last cohort plus never-treated): per-event-time WAS estimates with per-
             horizon arrays.
+
+        Notes
+        -----
+        On the ``aggregate="overall"`` path, ``fit()`` emits a ``UserWarning``
+        when a non-trivial fraction (``>= 10%``, a library convention) of
+        units have exactly-zero post-period dose — a genuine untreated mass
+        for which a standard DiD may be more appropriate (de Chaisemartin
+        et al. 2026, Section 2). The event-study path does not warn: it
+        *requires* never-treated units per Appendix B.2.
         """
         # ---- aggregate / survey_design / survey / weights validation ----
         if aggregate not in _VALID_AGGREGATES:
@@ -2995,6 +3024,26 @@ def fit(
         if n_set > 1:
             raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN)
+        # ---- covariates= future-work trap (TODO L73) ----
+        # Covariate-adjusted HAD identification (de Chaisemartin et al. 2026,
+        # Appendix B.1 / Theorem 6 — the multivariate-covariate extension) is not
+        # implemented. An explicit param + NotImplementedError surfaces the roadmap
+        # (vs the bare TypeError a missing kwarg would raise) while leaving
+        # unknown-kwarg typos a normal TypeError. Placed after the survey/weights
+        # mutex and before the event-study dispatch so the single raise covers BOTH
+        # aggregate="overall" and aggregate="event_study".
+        if covariates is not None:
+            raise NotImplementedError(
+                "HeterogeneousAdoptionDiD.fit(covariates=...) is not yet "
+                "implemented. Covariate-adjusted HAD identification (de "
+                "Chaisemartin et al. 2026, Appendix B.1 / Theorem 6 — the "
+                "multivariate-covariate extension) requires a multivariate "
+                "nonparametric regression of dY on (D, X) at the dose boundary, "
+                "which is not derived here. Pre-residualize the outcome on the "
+                "covariates before calling fit(), or omit covariates= for the "
+                "unconditional WAS estimand."
+            )
+
         # ---- trends_lin scope gates (PR #389 / Phase 4 R-parity).
         # `trends_lin=True` implements paper Eq 17 linear-trend detrending
         # (per-group slope from Y[F-1]-Y[F-2], applied to per-event-time
@@ -3129,6 +3178,34 @@ def fit(
                 None,
             )
+        # ---- Extensive-margin / positive-untreated-mass warning (TODO L74) ----
+        # d_arr is the per-unit post-period dose D_{g,2} (D_{g,1}=0, so dD = D_2);
+        # exactly-zero entries are genuinely untreated units. The `== 0.0` test
+        # mirrors the qug_test `d == 0` convention (had_pretests.py). Fraction-only
+        # (no absolute floor); fires at/above the library-convention cutoff. See the
+        # _HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC definition for the paper rationale.
+        # This runs on the overall path only: the event-study dispatch returns
+        # above, and the event-study path requires never-treated units (App. B.2).
+        n_zero = int((d_arr == 0.0).sum())
+        if n_zero and n_zero / d_arr.shape[0] >= _HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC:
+            frac_zero = n_zero / d_arr.shape[0]
+            warnings.warn(
+                f"{n_zero}/{d_arr.shape[0]} units ({frac_zero:.0%}) have exactly-"
+                f"zero post-period dose. HeterogeneousAdoptionDiD targets a "
+                f"Weighted Average Slope under the assumption that all units "
+                f"receive a positive, heterogeneous dose with no genuine control "
+                f"group (de Chaisemartin et al. 2026, Section 2 / Assumption 3). A "
+                f"substantial untreated mass suggests a genuine extensive margin, "
+                f"where a standard DiD using the untreated units as controls may be "
+                f"more appropriate. (The paper retains small untreated shares — "
+                f"e.g. 12/2954 in Garrett et al. — with nominal coverage; this "
+                f"warning fires only at/above a "
+                f"{_HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC:.0%} library-convention "
+                f"cutoff.)",
+                UserWarning,
+                stacklevel=2,
+            )
+
         # Resolve survey/weights into per-unit weights + optional
         # ResolvedSurveyDesign (for PSU/strata/FPC composition).
         # - `weights=` → per-row array, no PSU/strata composition.
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 859541fc..8da65b5a 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2929,6 +2929,8 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
 - **Note:** Pierce-Schott (2016) Figure 2 replication harness deferred. The paper's empirical application self-acknowledges (Section 5.2; mirrored in `dechaisemartin-2026-review.md:321`) that "NP estimators are too noisy to be informative" on the LBD-restricted PNTR panel. R parity at `atol=1e-8` on 3 DGPs × 5 method combos via `tests/test_did_had_parity.py` (bit-exact, `rtol=0`) is a stronger correctness anchor than reproducing pointwise CIs on LBD-restricted data.
**Scope caveat:** R parity locks point estimate, SE, and CI bounds bit-exactly to R's bounds — it does NOT independently verify the asymptotic-coverage properties of the bias-corrected CI in small samples. Paper Table 1 documents under-coverage at small G (89% at G=100 on DGP 1, 93% at G=500, 95% at G=2500); this is inherited from the CCF asymptotic theory itself, and Python is exact-parity with R at the limit-law machinery. - **Note:** Table 1 coverage-rate reproduction deferred. Paper Section 3.1.5 reports 2,000-iter Monte Carlo coverage rates at `G ∈ {100, 500, 2500}` on DGPs 1/2/3. The existing `tests/test_did_had_parity.py` R parity at `atol=1e-8` on the same 3 DGPs reproduces the exact point estimate and SE algorithm to bit-exact tolerance; coverage-rate MC would re-verify the CCF asymptotic coverage already pinned by R parity (Python ≡ R ≡ paper) at the sample-mean level. **Scope caveat (mirrors above):** R parity does NOT re-prove asymptotic-coverage at small G; paper Table 1's 89% / 93% / 95% under-coverage band is valid for both R and Python. - **Library extension:** Staggered-timing fail-closed. Paper Appendix B.2 prescribes "Warn" when staggered treatment timing is detected; library raises `ValueError` at `diff_diff/had.py:1511` when multiple first-treat cohorts are detected without `first_treat_col`. Library extension toward stricter safety: `UserWarning` would let the silent-misuse bug class through (HAD's Appendix B.2 only identifies the LAST cohort under staggered timing); fail-closed forces the user to either supply `first_treat_col` (which activates auto-filter to last-cohort + never-treated per Appendix B.2) or redirect to `ChaisemartinDHaultfoeuille` (`did_multiplegt_dyn`). Lock in `tests/test_methodology_had.py::TestHADDeviations`. +- **Note:** Extensive-margin / positive-untreated-mass fit-time warning (library convention). The paper (de Chaisemartin et al. 
2026, Section 2 / Assumption 3) defines HAD for the case where no genuine untreated group exists and recommends (Section 4 practitioner checklist) that a user with a positive mass of untreated units consider a standard DiD instead — but it prescribes only "warn" with NO numeric cutoff, and explicitly RETAINS small untreated shares (the Garrett et al. bonus-depreciation application keeps 12 untreated counties out of 2,954 ≈ 0.4%, with simulations showing close-to-nominal coverage even at `f_{D_2}(0) = 0`). The library therefore emits a `UserWarning` at `HeterogeneousAdoptionDiD.fit()` time only when the fraction of units with EXACTLY-zero post-period dose is `>= 0.10` (`_HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC` in `diff_diff/had.py`) — a 10% library-convention cutoff chosen to sit ~25× above the paper's kept 0.4% example, so valid small-share fits are not nagged while a substantial untreated mass is flagged. **Overall path only:** the warning is emitted after the `aggregate="event_study"` dispatch returns, because the event-study path REQUIRES never-treated (zero-dose) units per Appendix B.2 (the last-cohort filter retains them), so an untreated mass is expected there, not a misuse signal. Surfaces the recommendation at fit time rather than only via `qug_test()`'s zero-dose `UserWarning` (which fires only when the user runs the pretests). Lock in `tests/test_methodology_had.py::TestHADDeviations::test_extensive_margin_warning_is_10pct_library_convention`. +- **Note:** `covariates=` is reserved but NOT implemented. `HeterogeneousAdoptionDiD.fit(covariates=...)` raises `NotImplementedError` — an explicit keyword-only param, so the message points to the deferred extension instead of letting an unknown kwarg surface as a bare `TypeError`. Covariate-adjusted HAD identification is the paper's Appendix B.1 / Theorem 6 multivariate-covariate extension (a multivariate nonparametric regression of ΔY on (D, X) at the dose boundary), which is not derived in the library. 
Workaround: pre-residualize the outcome on the covariates before calling `fit()`, or omit `covariates=` for the unconditional WAS estimand. Lock in `tests/test_methodology_had.py::TestHADDeviations::test_covariates_not_implemented_is_documented`. **Requirements checklist (tracks implementation phase completion):** - [x] Phase 1a: Epanechnikov / triangular / uniform kernels with closed-form `κ_k` constants (`diff_diff/local_linear.py`). @@ -2978,7 +2980,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in - [x] Phase 5 (wave 2 second slice): T22 weighted/survey HAD tutorial (`docs/tutorials/22_had_survey_design.ipynb`) - shipped as the follow-up to PR #432. End-to-end walkthrough of `HeterogeneousAdoptionDiD` + `did_had_pretest_workflow` under `SurveyDesign(weights, strata, psu, fpc)` on a BRFSS-shape state-rollout panel (5 strata x 6 PSUs/stratum x 2 states/PSU = 60 states; post-stratification raking weights with CV ~ 0.30; FPC = 30 PSUs/stratum). Companion drift-test file `tests/test_t22_had_survey_design_drift.py` (32 tests pinning panel composition, naive-vs-survey SE inflation direction, design auto-detection, event-study cband-vs-pointwise width ordering, `_QUG_DEFERRED_SUFFIX` substring on `report.verdict` for both overall and event-study paths, the distinct `report.summary()` QUG-skip note on the event-study path, deterministic Yatchew sigma2_*, bootstrap p-value anchored windows of total width 0.30 (± 0.15 around seeded centers) per `feedback_strata_bootstrap_path_divergence`, workflow-surface separation between overall and event-study paths, and the weighted point-estimation contract via the `_fit_continuous` algebraic identity). - [x] Documentation of non-testability of Assumptions 5 and 6. 
**Closed 2026-05-20:** `HeterogeneousAdoptionDiD` class docstring carries a "Non-testable assumptions (paper Section 3.1.2)" Notes block; `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections carry "Scope (what this test does NOT cover)" clauses explicitly stating they verify ADJACENT identifying conditions (QUG: support-infimum null `d_lower = 0`; Stute / Yatchew: Assumption 8 linearity; `joint_pretrends_test`: Assumption 7 mean-independence) and CANNOT test Assumptions 5 or 6. The composite workflow verdict string does NOT mention Assumptions 5 or 6 — it only flags the Assumption 7 step-2 gap on the two-period `aggregate="overall"` path. The Assumption 5/6 non-testability caveat is surfaced separately by (a) `HAD.fit()`'s fit-time `UserWarning` in `diff_diff/had.py` (search for "---- Assumption 5/6 warning on Design 1 paths ----") which fires whenever the resolved design is Design 1 family (`continuous_near_d_lower` or `mass_point`), and (b) T21 (HAD pretest workflow tutorial) tutorial prose. - [x] Warnings for staggered treatment timing (redirect to `ChaisemartinDHaultfoeuille`). **Closed 2026-05-20:** fail-closed `ValueError` at `diff_diff/had.py:1511` (see Deviations § "Library extension: Staggered-timing fail-closed" for the rationale on raising vs warning). -- [ ] `NotImplementedError` phase pointer when `covariates=` is passed (Theorem 6 future work). **Status 2026-05-20:** current behavior is a Python `TypeError` (the `covariates=` kwarg is not in the `HAD.fit()` signature). Adding an explicit `**kwargs`-trap with `NotImplementedError` and a Theorem 6 pointer is a follow-up PR; tracked in `TODO.md` as Low priority — the existing TypeError is fail-closed. +- [x] `NotImplementedError` phase pointer when `covariates=` is passed (Theorem 6 future work). 
**Closed 2026-06-01:** `HAD.fit()` now takes an explicit keyword-only `covariates=None` param and raises `NotImplementedError` (with the Appendix B.1 / Theorem 6 multivariate-covariate-extension pointer + a pre-residualization workaround) when it is not None, replacing the prior bare `TypeError` from the absent kwarg. See the `- **Note:**` ("`covariates=` is reserved but NOT implemented") above and `diff_diff/had.py::HeterogeneousAdoptionDiD.fit`; locked by `tests/test_methodology_had.py::TestHADDeviations::test_covariates_not_implemented_is_documented`. --- diff --git a/docs/methodology/papers/dechaisemartin-2026-review.md b/docs/methodology/papers/dechaisemartin-2026-review.md index c943e382..375a3e1a 100644 --- a/docs/methodology/papers/dechaisemartin-2026-review.md +++ b/docs/methodology/papers/dechaisemartin-2026-review.md @@ -188,7 +188,7 @@ Alternative to Stute when `G` is large or heteroskedasticity is suspected. - [x] Yatchew heteroskedasticity-robust linearity test. **Phase 3 implementation (2026-04):** `yatchew_hr_test()` in `diff_diff/had_pretests.py`. Test statistic `T_hr = sqrt(G)·(σ²_lin - σ²_diff)/σ²_W` from paper Equation 29. `σ²_diff` normalizes by `2G` (paper-literal), NOT `2(G-1)` (finite-sample equivalent but tests pin the paper-literal form). Standard-normal critical value, one-sided. - [x] Composite workflow `did_had_pretest_workflow()` (paper Section 4.2-4.3). **Phase 3 implementation (2026-04):** `aggregate="overall"` (default, two-period) runs QUG + Stute + Yatchew on a two-period panel; step 2 is NOT run on this path because a two-period panel has no pre-period placebo horizon. **Phase 3 follow-up (2026-04):** `aggregate="event_study"` (multi-period) runs QUG at F + joint pre-trends Stute + joint homogeneity-linearity Stute; closes the paper step-2 gap. - [x] Warnings for staggered treatment timing (direct users to existing `ChaisemartinDHaultfoeuille` in diff-diff). 
**Phase 4 closure (2026-05-20):** fail-closed `ValueError` at `diff_diff/had.py:1511` when multiple first-treat cohorts are detected without `first_treat_col`; the error message directs the user to either supply `first_treat_col` (which activates the last-cohort + never-treated auto-filter per Appendix B.2) or to use `ChaisemartinDHaultfoeuille` (`did_multiplegt_dyn`) for full staggered support. The fail-closed choice (over `UserWarning`) is documented in REGISTRY Deviations § "Staggered-timing fail-closed" as a library extension toward stricter safety than the paper's "Warn" prescription. -- [ ] Warnings for extensive-margin effects / positive mass of untreated (not fatal; suggests running existing DiD). **Status 2026-05-20 (partial):** `qug_test()` filters zero-dose observations upfront with a `UserWarning` naming the exclusion count — surfaces the *presence* of extensive-margin / positive-mass-of-untreated units to users running pre-tests. The paper-language "suggests running existing DiD" recommendation is NOT a separate fit-time warning on the main `HeterogeneousAdoptionDiD.fit()` path; this item remains open as a Low-priority follow-up tracked in `TODO.md`. +- [x] Warnings for extensive-margin effects / positive mass of untreated (not fatal; suggests running existing DiD). **Closed 2026-06-01:** `HeterogeneousAdoptionDiD.fit()` now emits a fit-time `UserWarning` on the **overall** path when `>= 10%` of units have an exactly-zero post-period dose — pointing the user to a standard DiD per the Section 4 recommendation. The 10% cutoff is a library convention (the paper prescribes "warn" with NO numeric threshold and explicitly retains small untreated shares, e.g. Garrett et al.'s 12/2954 ≈ 0.4% with close-to-nominal coverage), chosen ~25× above that kept example. Overall-path-only because the event-study path *requires* never-treated units per Appendix B.2 (so an untreated mass is expected there, not a misuse signal). 
This complements the pre-existing `qug_test()` zero-dose `UserWarning`, which surfaces the *presence* of extensive-margin / positive-mass-of-untreated units only when the user runs the pre-tests. Documented in REGISTRY § HeterogeneousAdoptionDiD ("Note (Extensive-margin / positive-untreated-mass fit-time warning)"); locked by `tests/test_methodology_had.py::TestHADDeviations::test_extensive_margin_warning_is_10pct_library_convention`. - [x] Documentation of non-testability of Assumptions 5 and 6. **Phase 4 closure (2026-05-20):** `HeterogeneousAdoptionDiD.fit()` emits a `UserWarning` at fit time when `resolved_design ∈ {continuous_near_d_lower, mass_point}` (Design 1 family) explicitly flagging that point identification of `WAS_{d_lower}` requires Assumption 6, sign identification requires Assumption 5, and NEITHER is testable via pre-trends (`diff_diff/had.py`, search for "---- Assumption 5/6 warning on Design 1 paths ----"). The `HeterogeneousAdoptionDiD` class docstring + `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections cross-reference this and explicitly state that the available pre-tests verify ADJACENT identifying conditions: QUG tests the Theorem 4 / Design 1' support-infimum null `d_lower = 0` — adjacent evidence on the `d_lower = 0` clause of Assumption 4 only, NOT a test of full Assumption 4's boundary-density / conditional-mean smoothness / variance regularity statement; the raw `stute_test` / `yatchew_hr_test` helpers test Assumption 8 linearity (residuals from `dy ~ 1 + d`); `joint_pretrends_test` tests Assumption 7 mean-independence (intercept-only residuals via `null_form="mean_independence"`). None of these test Assumptions 5 or 6 directly. The composite workflow verdict string does NOT mention Assumptions 5 or 6 — it only flags the Assumption 7 step-2 gap on the two-period `aggregate="overall"` path. 
The Assumption 5/6 caveat is surfaced separately by the Design 1 fit-time `UserWarning` and by T21 tutorial prose. - [x] Multi-period event-study extension (Appendix B.2). **Phase 2b implementation (2026-04):** `aggregate="event_study"` returns per-event-time WAS estimates using uniform `F-1` anchor. Staggered-timing contract (see L190 closure for full statement): when `first_treat_col` is supplied, the panel auto-filters to last-cohort + never-treated units with a `UserWarning` per Appendix B.2 prescription; when omitted on a multi-cohort panel, the estimator raises `ValueError` (fail-closed, see REGISTRY § "Library extension: Staggered-timing fail-closed"). Pointwise CIs per horizon (no joint cross-horizon covariance; matches paper's Pierce-Schott Figure 2). Pre-period placebos at `e <= -2`; the anchor `e = -1` is skipped since `ΔY = 0` there by construction. - [x] Joint Stute tests (paper Section 4.2 step 2 + Section 4.3 joint extension, pages 23-25 + 32). **Phase 3 follow-up (2026-04):** `stute_joint_pretest()` (residuals-in core) + `joint_pretrends_test()` (mean-independence null) + `joint_homogeneity_test()` (linearity null) in `diff_diff/had_pretests.py`. Sum-of-CvMs aggregation, shared-η Mammen wild bootstrap across horizons (Delgado-Manteiga 2001), per-horizon exact-linear short-circuit. **Eq (18) linear-trend detrending variant SHIPPED (PR #389):** the `trends_lin: bool = False` keyword-only kwarg on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test` applies the per-group linear-trend slope `Y[g, F-1] - Y[g, F-2]` adjustment. R parity validated against `DIDHAD::did_had(..., trends_lin=TRUE)` v2.0.0 (`Credible-Answers/did_had`) — see REGISTRY § "Note (Phase 4 — Eq 17 / Eq 18 linear-trend detrending shipped)". The Pierce-Schott (2016) NUMERICAL REPLICATION against the published p=0.51 anchor on the LBD-restricted panel is waived per REGISTRY Deviations Note #3. 
diff --git a/tests/test_had.py b/tests/test_had.py index 2e2273c7..e9e187c2 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -5635,3 +5635,136 @@ def test_mass_point_default_vcov_robust_true_survey_allowed(self): r = est.fit(panel, "outcome", "dose", "period", "unit", survey=sd) assert r.vcov_type == "hc1" assert r.variance_formula == "survey_binder_tsl_2sls" + + +# ============================================================================= +# TODO L74: extensive-margin / positive-untreated-mass fit-time warning +# ============================================================================= + +_EXTENSIVE_MARGIN_SUBSTR = "exactly-zero post-period dose" + + +def _panel_with_zero_fraction(G, n_zero, seed=0): + """continuous_at_zero 2-period panel with EXACTLY ``n_zero`` zero post doses. + + The positive interior is drawn from Uniform(0.2, 1.0) so no accidental + zeros sneak in — the exactly-zero fraction is precisely ``n_zero / G``. + """ + rng = np.random.default_rng(seed) + d = rng.uniform(0.2, 1.0, G) + d[:n_zero] = 0.0 + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + return _make_panel(d, dy) + + +class TestExtensiveMarginWarning: + """The overall ``fit()`` path warns above a 10% exactly-zero-dose cutoff. + + Locks the TODO L74 fit-time UserWarning: HAD targets a WAS assuming no + genuine untreated group, so a substantial exactly-zero (untreated) mass + suggests a real extensive margin where standard DiD may be preferable. + """ + + def test_fires_above_threshold(self): + # 40/200 = 20% exactly-zero -> warning fires. + panel = _panel_with_zero_fraction(200, 40, seed=0) + with pytest.warns(UserWarning, match=_EXTENSIVE_MARGIN_SUBSTR): + HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + + def test_fires_exactly_at_threshold(self): + # 20/200 = 10% exactly -> the >= cutoff fires at the boundary. 
+ panel = _panel_with_zero_fraction(200, 20, seed=1) + with pytest.warns(UserWarning, match=_EXTENSIVE_MARGIN_SUBSTR): + HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + + def test_message_names_count_and_pct(self): + panel = _panel_with_zero_fraction(200, 40, seed=0) + with pytest.warns(UserWarning) as rec: + HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + msgs = [str(w.message) for w in rec if _EXTENSIVE_MARGIN_SUBSTR in str(w.message)] + assert len(msgs) == 1 + # Names the count/total and percentage, and points to standard DiD. + assert "40/200" in msgs[0] + assert "20%" in msgs[0] + assert "standard DiD" in msgs[0] + + def test_no_fire_all_positive(self): + # No exactly-zero units -> no extensive-margin warning. + panel = _panel_with_zero_fraction(200, 0, seed=2) + with warnings.catch_warnings(record=True) as rec: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + assert not any(_EXTENSIVE_MARGIN_SUBSTR in str(w.message) for w in rec) + + def test_no_fire_just_below_threshold(self): + # 19/200 = 9.5% < 10% -> no warning (boundary no-fire). + panel = _panel_with_zero_fraction(200, 19, seed=3) + with warnings.catch_warnings(record=True) as rec: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + assert not any(_EXTENSIVE_MARGIN_SUBSTR in str(w.message) for w in rec) + + def test_event_study_with_never_treated_does_not_warn(self): + # Scope lock: the event-study path REQUIRES never-treated units + # (Appendix B.2), so a 20% never-treated mass must NOT trip the + # overall-path extensive-margin warning. The warning code sits after + # the event-study dispatch returns, so it is structurally unreachable + # here — this test guards against a future re-placement regressing it. 
+ rng = np.random.default_rng(4) + d_at_F = rng.uniform(0.2, 1.0, 200) + d_at_F[:40] = 0.0 # 20% never-treated (dose 0 at every period) + panel = _make_multi_period_panel(d_at_F, n_periods=5, F=3, seed=4) + with warnings.catch_warnings(record=True) as rec: + warnings.simplefilter("always") + _fit_es( + HeterogeneousAdoptionDiD(), + panel, + "outcome", + "dose", + "period", + "unit", + ) + assert not any(_EXTENSIVE_MARGIN_SUBSTR in str(w.message) for w in rec) + + +class TestCovariatesTrap: + """TODO L73: ``fit(covariates=...)`` raises NotImplementedError. + + Covariate-adjusted HAD (de Chaisemartin et al. 2026, Appendix B.1 / + Theorem 6) is not implemented; the explicit param surfaces the roadmap + instead of a bare ``TypeError`` from an unknown kwarg. + """ + + def test_covariates_raises_overall(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + with pytest.raises(NotImplementedError, match="Appendix B.1"): + HeterogeneousAdoptionDiD().fit( + panel, "outcome", "dose", "period", "unit", covariates=["x"] + ) + + def test_covariates_raises_event_study(self): + # Raises before the event-study dispatch, so any panel suffices. + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + with pytest.raises(NotImplementedError, match="multivariate"): + HeterogeneousAdoptionDiD().fit( + panel, + "outcome", + "dose", + "period", + "unit", + aggregate="event_study", + covariates=["x"], + ) + + def test_covariates_none_default_does_not_raise(self): + # The default covariates=None preserves the pre-PR fit path. 
+ d, dy = _dgp_continuous_at_zero(400, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD().fit( + panel, "outcome", "dose", "period", "unit", covariates=None + ) + assert np.isfinite(r.att) diff --git a/tests/test_methodology_had.py b/tests/test_methodology_had.py index 93da68df..baa4ba02 100644 --- a/tests/test_methodology_had.py +++ b/tests/test_methodology_had.py @@ -1273,3 +1273,67 @@ def _make_event_study_panel(rng: np.random.Generator, G: int) -> pd.DataFrame: } ) return pd.DataFrame(rows) + + @staticmethod + def _zero_fraction_panel(n_zero: int, seed: int, G: int = 200) -> pd.DataFrame: + """continuous_at_zero 2-period panel with EXACTLY ``n_zero`` zero doses.""" + rng = np.random.default_rng(seed) + d = rng.uniform(0.2, 1.0, G) + d[:n_zero] = 0.0 + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + units = np.repeat(np.arange(G), 2) + periods = np.tile([1, 2], G) + dose = np.column_stack([np.zeros(G), d]).ravel() + outcome = np.column_stack([np.zeros(G), dy]).ravel() + return pd.DataFrame({"unit": units, "period": periods, "dose": dose, "outcome": outcome}) + + def test_extensive_margin_warning_is_10pct_library_convention(self) -> None: + """Locks TODO L74: the extensive-margin warning is a 10% library convention. + + The paper (de Chaisemartin et al. 2026, Section 2 / Assumption 3) + prescribes warning users with a positive untreated mass but gives NO + numeric cutoff, and explicitly RETAINS small untreated shares (Garrett + et al. 12/2954 ~ 0.4%, nominal coverage). The library picks a 10% + exactly-zero-dose fraction as the fire threshold — documented in + REGISTRY § HeterogeneousAdoptionDiD. This pins both the constant and + the fire/no-fire boundary so the convention cannot drift silently. 
+ """ + from diff_diff.had import _HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC + + assert _HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC == 0.10 + + substr = "exactly-zero post-period dose" + # At/above 10% (20/200) -> fires. + with pytest.warns(UserWarning, match=substr): + HeterogeneousAdoptionDiD().fit( + self._zero_fraction_panel(20, seed=_BASE_SEED_DEVIATIONS + 10), + "outcome", + "dose", + "period", + "unit", + ) + # Just below 10% (19/200 = 9.5%) -> does not fire. + with warnings.catch_warnings(record=True) as rec: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD().fit( + self._zero_fraction_panel(19, seed=_BASE_SEED_DEVIATIONS + 11), + "outcome", + "dose", + "period", + "unit", + ) + assert not any(substr in str(w.message) for w in rec) + + def test_covariates_not_implemented_is_documented(self) -> None: + """Locks TODO L73: fit(covariates=...) raises NotImplementedError. + + Covariate-adjusted HAD identification (de Chaisemartin et al. 2026, + Appendix B.1 / Theorem 6) is deferred; the explicit ``covariates=`` + param raises NotImplementedError with the paper pointer rather than a + bare TypeError. Documented in REGISTRY § HeterogeneousAdoptionDiD. + """ + panel = self._zero_fraction_panel(1, seed=_BASE_SEED_DEVIATIONS + 12) + with pytest.raises(NotImplementedError, match="Theorem 6"): + HeterogeneousAdoptionDiD().fit( + panel, "outcome", "dose", "period", "unit", covariates=["x1"] + )
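
Reviewer note (not part of the patch): the fire/no-fire boundary pinned by the tests above can be sketched as a standalone check. This is a minimal illustration, not the code in `diff_diff/had.py`. The helper name `warn_if_extensive_margin`, the local constant name, and the exact message wording are assumptions made for illustration. Only the 0.10 cutoff, the `>=` comparison, and the message fragments the tests assert on (`"exactly-zero post-period dose"`, count/total, percentage, `"standard DiD"`) come from the patch.

```python
import warnings

import numpy as np

# 0.10 is the value pinned by the deviation-lock test; this name is a local stand-in.
ZERO_DOSE_FRAC_CUTOFF = 0.10


def warn_if_extensive_margin(post_dose, cutoff=ZERO_DOSE_FRAC_CUTOFF):
    """Hypothetical helper: warn when the exactly-zero dose share reaches the cutoff.

    Returns True if the warning fired, False otherwise.
    """
    post_dose = np.asarray(post_dose, dtype=float)
    n_total = post_dose.size
    if n_total == 0:
        return False
    # Count doses that are exactly zero (no tolerance), as the tests describe.
    n_zero = int(np.count_nonzero(post_dose == 0.0))
    frac = n_zero / n_total
    # ">=" so the warning fires at the boundary (20/200) but not at 19/200.
    if frac >= cutoff:
        warnings.warn(
            f"{n_zero}/{n_total} units ({frac:.0%}) have an exactly-zero "
            "post-period dose; a standard DiD may be preferable.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dose = rng.uniform(0.2, 1.0, 200)  # strictly positive interior, no accidental zeros
    dose[:40] = 0.0                    # 40/200 exactly-zero units
    warn_if_extensive_margin(dose)
```

The comparison is on the raw fraction rather than a rounded percentage, so 19/200 (9.5%) stays below the cutoff while 20/200 (10%) meets it, matching the boundary cases in `TestExtensiveMarginWarning`.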