diff --git a/CHANGELOG.md b/CHANGELOG.md index 27c7aeb9..bb7ffed8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **`SyntheticControl` cross-validation + inverse-variance `V`-selection (ADH 2015 §; Abadie 2021 §3.2(a), Eq. 9).** Two new `v_method` values complete the ADH-2015/Abadie-2021 `V`-selection menu (joining `"nested"` / `"custom"`), each threaded through the in-space / leave-one-out / in-time placebo refits so a diagnostic uses the **same** estimator as the headline fit. **`v_method="cv"`** selects the diagonal predictor-importance `V` by out-of-sample cross-validation: the pre-period is split positionally at `v_cv_t0` (new constructor param; default `len(pre)//2`, Abadie 2021's `t0 = T0/2`) into a training and a validation window, `V` is chosen to minimize the validation-window outcome MSPE of the training-fit weights (`mspe_v` now reports this validation MSPE under cv), and the final reported weights are re-estimated on the validation-window predictors (ADH 2015 step 4). Each predictor spec is **re-aggregated** over each window (its mean/sum/identity recomputed over only the periods that fall in that window — a separate `dataprep` per window, exactly as ADH 2015's CV does, since R `Synth` has no built-in CV function), so the V-search is genuinely out-of-sample for every predictor type and the same `V*` drives both fits with no zeroed coordinate (`v_weights` reproduce `donor_weights` on the validation-window predictors, and `predictor_balance` is reported on that validation-window basis). **Fully-spanning precondition (fail-closed):** re-aggregating a predictor on each window requires it to be observed in **both** windows, so `cv` **requires every predictor to span both the training and validation windows** and raises `ValueError` otherwise — satisfied by ADH 2015's shared covariate / multi-period `special_predictors` (which span the windows) but NOT by the default per-period outcome lags (each is single-period and lives in one window only), so `cv` with the bare default predictors is rejected with guidance to pass spanning predictors. In-time-placebo truncation that breaks the fully-spanning precondition (a kept spec stops spanning both windows at the truncated split) marks that date `infeasible`. A second fail-closed gate covers windows that span but carry **no cross-donor variation** (every re-aggregated predictor constant across the donors, so `X0·W` is constant in `W` → a flat, unidentified weight solve that would otherwise return arbitrary "converged" weights — even when the treated unit differs, since donor distinguishability, not treated-vs-donor variation, identifies `W`): the headline fit raises `ValueError`, in-space placebo refits whose donor pool is indistinguishable in a window are dropped from the reference set, and such in-time-truncated dates are marked `infeasible`. Abadie 2021 footnote 7's CV non-uniqueness is handled by a **deterministic tie-break** (prefer the `V` closest to uniform among ties), making the selected `V*` among equally-good optima independent of the multistart evaluation order. The cv fit is reproducible for a fixed `seed` (like `nested`) but is not seed-independent — the multistart fills any slots beyond the distinct heuristic starts with seed-dependent random Dirichlet draws, so the tie-break removes start-order dependence among ties, not seed dependence. The tie-break is convergence-aware (a non-converged optimizer candidate cannot displace a converged incumbent on an objective tie). If the training-window solve that defines `mspe_v` truncates (e.g. `inner_max_iter` too small), the fit fails closed — `mspe_v=NaN` and the fit is marked non-converged — rather than reporting an invalid Eq. 9 criterion. **`v_method="inverse_variance"`** uses the closed form `v_h = 1/Var(X_h)` (variance over donors+treated on the unstandardized predictors), applied to the **raw** predictors so the effective objective is the unit-variance-rescaled `Σ_h diff_h²/Var_h` (Abadie 2021 §3.2(a)); the `standardize` pre-scaling is intentionally bypassed on this branch (inverse-variance weighting *is* the unit-variance rescaling — applying it on already-standardized rows would double-rescale to `Σ_h diff_h²/Var_h²`), so it is equivalent to uniform `V` on standardized predictors. No search (`mspe_v=None`); a zero-variance row gets 0 weight and an all-zero-variance panel falls back to uniform `V` with a warning. `custom_v` is rejected (fail-closed) for both methods and `v_cv_t0` is rejected unless `v_method="cv"`. On the degenerate **single-donor** path (`J=1` forces `w=[1]`) `V` is unidentified — every `V` yields the same synthetic — so `v_weights` is **uniform** and `mspe_v=None` for ALL `v_method`s (cv / inverse_variance included; their selected / closed-form `V` would be inert), with a `UserWarning`; the donor weights / gap / ATT are unaffected. An explicitly pinned `v_cv_t0` that no longer fits the truncated pre-fake window is nulled to the `//2` default for the placebo refit (a pinned value that still fits the truncated window is kept). **Validation:** R `Synth` has no built-in CV function (ADH 2015's CV is a manual `dataprep`+`synth` re-run), so cv is anchored by deterministic equivalence to the R-anchored `custom_v` path (the step-3 validation MSPE of the training-window fit and the step-4 validation-window weights each match a `custom_v=V*` fit on the correspondingly re-aggregated predictors) plus cv self-consistency (`in_time_placebo` under cv == a fresh cv fit on the backdated panel to 1e-7); inverse-variance is anchored bit-for-bit to a `custom_v=1/Var(X)` fit. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new `**Note:**` labels for the per-window re-aggregation convention, the flat-MSPE tie-break, and inverse-variance), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`. The remaining ADH-2015 items (`W^reg` extrapolation diagnostic, sparse-SC subset search) stay tracked in `TODO.md`. - **Firpo & Possebom (2018) SCM inference paper review on file (PR-A).** Added `docs/methodology/papers/firpo-possebom-2018-review.md`, a faithful, paper-sourced fidelity review of Firpo & Possebom (2018, *Journal of Causal Inference* 6(2), DOI 10.1515/jci-2016-0026) — the Step-1 artifact for the forthcoming SCM **confidence-set / CI-by-test-inversion** track (PR-B) layered on the existing `SyntheticControl` estimator (classic SCM has no analytical SE; `se`/`p_value`/`conf_int` are NaN). Transcribes (paper-sourced only, no code-deviation verdicts) the benchmark RMSPE-ratio permutation test (Eqs. 4–6), the sensitivity-analysis parametric p-value weights with worst/best-case `φ̲`/`φ̄` (Eqs. 7–9), the sharp-null `RMSPE^f` test (Eqs. 10–13), the **confidence sets by test inversion** (Eq. 14) with the operational constant-effect CI (Eqs. 15–16) and linear-effect CS (Eqs. 17–18), the general test-statistic framework + Monte Carlo size/power of five statistics (Eq. 19, Section 5), and the multiple-outcome FWER (Eqs. 23–24) and multiple-treated-unit pooled (Eqs. 25–26) extensions; the requirements checklist flags the PR-B target (sharp-null test + constant/linear CI + benchmark + one-sided) versus the deferred sensitivity-analysis and multi-outcome/treated extensions. Docs-only; no code change. Registered in `docs/references.rst` (Synthetic Control Method section) and `docs/doc-deps.yaml`; REGISTRY `## SyntheticControl` gains a `firpo-possebom-2018-review.md` reviews-on-file pointer. +- **`SyntheticControl` confidence sets by test inversion (Firpo & Possebom 2018 §4, PR-B).** Classic SCM gains the uncertainty quantification it has lacked — a confidence set for the treatment-effect *path* — without changing its always-NaN analytical inference contract. Two opt-in `SyntheticControlResults` methods built ON TOP of the in-space placebo: `test_sharp_null(effect, gamma=0.1)` tests a sharp null `H_0: α_1t = f(t)` (Eq 11; `effect` a scalar constant effect or a length-`n_post` post-period path) by subtracting `f(t)` from every unit's post-period gaps and re-ranking the modified RMSPE ratio `RMSPE^f` (Eqs 12–13 at `φ=0`, `v=(1,…,1)`), and `confidence_set(family="constant"|"linear", gamma=0.1, bounds=None, n_grid=200)` inverts that test into a confidence set — a constant-in-time interval (Eqs 15–16) or a linear-in-time slope set (Eqs 17–18) — keeping every value whose sharp null is not rejected at the paper's **strict** `p^f > γ` boundary (Eq 14). The whole computation is a **pure re-ranking of the gap paths `in_space_placebo()` already computes** (no synthetic-control refits): under a common-effect null the donor synthetics and the pre-period MSPE denominators are unchanged — only the post gaps shift by `f(t)` — so each grid value costs an `O(J)` rank, not a refit. With `bounds=None` the set is recovered **EXACTLY** by piecewise-constant breakpoint inversion: `p^c` is constant between the real roots of the placebo-vs-treated comparison quadratics, so `p` is evaluated once per induced interval AND at each breakpoint (a tie under `≥` can lift `p` above γ there, yielding an isolated accepted point) — NO centering/monotonicity assumption, so accepted tails, disjoint components, and unbounded/empty sets are all handled (a poor-pre-fit treated unit can have its accepted region in the tails). `bounds=(lo,hi)` instead scans a fixed grid (grid-limited); `n_grid` controls only the returned inspection table when `bounds=None`. Results: a pickle-surviving `effect_confidence_set` summary (`{family, parameter, gamma, lower, upper, contiguous, status, …}`, `status ∈ {"ran","empty","unbounded"}`) + a `get_confidence_set_df()` grid table, surfaced under `estimator_native_diagnostics.confidence_set`. **The analytical `conf_int`/`se`/`t_stat`/`p_value` stay NaN** — this is a permutation set at level `1−γ` (γ granular in `1/(J+1)`), possibly a set / unbounded / non-contiguous, so it cannot be coerced into the Wald-interval `conf_int` tuple; it is kept separate exactly as `placebo_p_value` is kept off `p_value`. **Fail-closed:** `γ < 1/(J+1)` (no value rejectable — fn 8) or a treated unit lacking the best pre-fit → `"unbounded"` (`±inf` + warning); no interval or breakpoint accepted → `"empty"` (NaN endpoints); a non-contiguous accepted region (disjoint components / an isolated singleton) → the `[lower, upper]` hull with `contiguous=False` + warning; `< 2` donors / a non-converged treated fit / an unpickled result (no placebo reference set) → `ValueError`. `test_sharp_null(0)` is held bit-for-bit equal to `placebo_p_value` (Eq 5 = Eq 13) by reusing each unit's **per-unit** floored pre-period denominator persisted from the placebo run. **Scope:** the sensitivity-analysis weights (`φ≠0`, Eqs 7–9), the general test-statistic menu (Eq 19), one-sided (§7's signed-`t` statistic), and the multiple-outcome/treated extensions (§6) are deferred (flagged in the paper review checklist). **Validation:** no R anchor (R `Synth` has no test inversion; the authors' Code Ocean capsule was not consulted) — self-consistency to the (Basque-R-anchored) `placebo_p_value`, a numpy oracle on Eqs 12–14 (incl. the strict `p=γ` boundary and the per-unit floor), invariants (the point estimate lies in the constant set for a well-posed fit; a center-rejected/tails-accepted regression; an isolated-breakpoint singleton; monotone-in-γ), and a coverage simulation. Consumes the PR-A `firpo-possebom-2018-review.md`; documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new methodology block + `**Note:**` labels for the boundary convention, the grid choice, the non-analytical `conf_int` contract, and the no-R-anchor validation), `docs/api/synthetic_control.rst`, and the LLM guides. - **`HeterogeneousAdoptionDiD.fit()` fit-time extensive-margin warning + `covariates=` not-implemented pointer.** Two UX additions to the HAD `fit()` surface, with **no change to any estimate or standard error**. (1) The **overall** path now emits a `UserWarning` when a non-trivial fraction (`>= 10%`, a library-convention cutoff in `_HAD_EXTENSIVE_MARGIN_ZERO_DOSE_FRAC`) of units have an exactly-zero post-period dose — a genuine untreated mass for which a standard DiD using those units as controls may be more appropriate (de Chaisemartin et al. 2026, Section 2 / Assumption 3). The paper retains *small* untreated shares (e.g. 12/2954 in Garrett et al., with close-to-nominal coverage), so the 10% cutoff sits ~25× above that; the warning is **overall-path-only** because the event-study path *requires* never-treated units per Appendix B.2. Previously the recommendation surfaced only via `qug_test()`'s zero-dose warning when the user ran the pre-tests. (2) `HeterogeneousAdoptionDiD.fit(covariates=...)` now raises `NotImplementedError` with a pointer to the deferred Appendix B.1 / Theorem 6 covariate-adjusted extension (via an explicit keyword-only `covariates=` param) instead of a bare `TypeError` from an unknown kwarg; pre-residualize the outcome on the covariates as a workaround. Documented in `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD; new tests in `tests/test_had.py` and `tests/test_methodology_had.py`. ### Fixed diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py index fd65ea61..c48cdfe6 100644 --- a/diff_diff/diagnostic_report.py +++ b/diff_diff/diagnostic_report.py @@ -2458,6 +2458,49 @@ def _scm_native(self, r: Any) -> Dict[str, Any]: "placebo (opt-in; refits per backdated date)." ), } + + # Test-inversion confidence set (Firpo & Possebom 2018 §4): opt-in, surfaced once + # the user has run results.confidence_set() (it reuses the in-space placebo + # reference set — no refits). The analytical conf_int stays NaN; this is a SEPARATE + # permutation set at level 1 - gamma, possibly unbounded or non-contiguous. + ecs = getattr(r, "effect_confidence_set", None) + if ecs is not None: + ecs_status = ecs.get("status") + _lo, _hi = ecs.get("lower"), ecs.get("upper") + block = { + "status": ecs_status, + "family": ecs.get("family"), + "parameter": ecs.get("parameter"), + "gamma": _to_python_float(ecs.get("gamma")), + # Emit each endpoint independently: a finite float, else None for a non-finite + # side (NaN for an empty set, +/-inf for an unbounded tail) -- keeps the dict + # JSON-safe while preserving the FINITE side of a one-sided unbounded set. + "lower": float(_lo) if isinstance(_lo, (int, float)) and np.isfinite(_lo) else None, + "upper": float(_hi) if isinstance(_hi, (int, float)) and np.isfinite(_hi) else None, + "contiguous": bool(ecs.get("contiguous")), + "n_placebos": _to_python_scalar(ecs.get("n_placebos")), + } + if ecs_status == "unbounded": + block["reason"] = ( + "confidence_set() ran but the set is unbounded (gamma below the " + "1/(J+1) permutation granularity, or the treated unit lacks the best " + "pre-treatment fit); endpoint(s) are +/-inf." + ) + elif ecs_status == "empty": + block["reason"] = ( + "confidence_set() ran but the set is empty (every effect in the " + "family is rejected at gamma); endpoints are NaN." + ) + out["confidence_set"] = block + else: + out["confidence_set"] = { + "status": "not_run", + "reason": ( + "Call results.confidence_set() for a test-inversion confidence set of " + "the effect path (Firpo-Possebom 2018; opt-in, reuses the in-space " + "placebo reference set)." + ), + } return out # -- Heterogeneity helpers -------------------------------------------- diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index b4a89596..b64b532a 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -617,7 +617,7 @@ scm.fit( ) -> SyntheticControlResults ``` -**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. `att` is the mean post-period gap. Significance via in-space placebo permutation inference: `results.in_space_placebo()` reassigns treatment to each donor, refits against the other J-1 donors (the real treated unit is excluded from every placebo pool), and sets `placebo_p_value = rank/(n_placebos+1)` from the post/pre RMSPE-ratio. The permutation `placebo_p_value` is a SEPARATE field from the (NaN) `p_value`; `is_significant` stays bound to `p_value`. **ADH-2015 §4 robustness (opt-in, analytical inference unchanged):** `results.leave_one_out()` drops each reportably-weighted donor (weight > 1e-6) and re-fits (per-drop ATT/`delta_att` — large `delta_att` ⇒ single-donor dependence); `results.in_time_placebo()` backdates the intervention and checks for a spurious pre-period gap (TRUNCATE windowing — predictor windows in the held-out region are dropped). Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation). +**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. `att` is the mean post-period gap. Significance via in-space placebo permutation inference: `results.in_space_placebo()` reassigns treatment to each donor, refits against the other J-1 donors (the real treated unit is excluded from every placebo pool), and sets `placebo_p_value = rank/(n_placebos+1)` from the post/pre RMSPE-ratio. The permutation `placebo_p_value` is a SEPARATE field from the (NaN) `p_value`; `is_significant` stays bound to `p_value`. **ADH-2015 §4 robustness (opt-in, analytical inference unchanged):** `results.leave_one_out()` drops each reportably-weighted donor (weight > 1e-6) and re-fits (per-drop ATT/`delta_att` — large `delta_att` ⇒ single-donor dependence); `results.in_time_placebo()` backdates the intervention and checks for a spurious pre-period gap (TRUNCATE windowing — predictor windows in the held-out region are dropped). **Confidence sets by test inversion (Firpo-Possebom 2018 §4, opt-in, also non-analytical):** `results.test_sharp_null(effect, gamma=0.1)` tests `H_0: α_1t = f(t)` by re-ranking the in-space placebo gaps (no refits; `test_sharp_null(0)` == `placebo_p_value`), and `results.confidence_set(family="constant"|"linear", gamma=0.1)` inverts it into a confidence set for the effect path (constant-effect interval / linear-slope set, strict `p>gamma`), surfaced on `effect_confidence_set` / `get_confidence_set_df()` — the analytical `conf_int` still stays NaN. Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation). **Usage:** @@ -1300,8 +1300,9 @@ Returned by `SyntheticControl.fit()`. | `pre_periods`, `post_periods` | `list` | Calendar-sorted periods | | `v_method`, `standardize` | `str` | Echoed configuration | | `v_cv_t0` | `int \| None` | Resolved cv train/validation split index (None unless v_method="cv") | +| `effect_confidence_set` | `dict \| None` | Test-inversion confidence-set summary (Firpo-Possebom 2018 §4): `{family, parameter, gamma, lower, upper, contiguous, status ("ran"/"empty"/"unbounded"), ...}`; None until `confidence_set()` runs. SEPARATE from the always-NaN analytical `conf_int` (a permutation set at level 1−gamma, possibly a set/unbounded). | -**Methods:** `in_space_placebo()` (opt-in permutation inference; refits one synthetic control per donor), `get_placebo_df()` (per-unit RMSPE-ratio table incl. the treated row), `leave_one_out()` (ADH-2015 §4 donor robustness; drops each reportably-weighted donor (weight > 1e-6) → per-drop ATT/`delta_att` table) + `get_leave_one_out_df()`/`get_leave_one_out_gaps()`, `in_time_placebo()` (ADH-2015 §4 backdating placebo; reassigns the intervention earlier, TRUNCATE windowing, placebo ATT ~0 if no real pre-effect) + `get_in_time_placebo_df()`/`get_in_time_placebo_gaps()`, `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()` +**Methods:** `in_space_placebo()` (opt-in permutation inference; refits one synthetic control per donor), `get_placebo_df()` (per-unit RMSPE-ratio table incl. the treated row), `leave_one_out()` (ADH-2015 §4 donor robustness; drops each reportably-weighted donor (weight > 1e-6) → per-drop ATT/`delta_att` table) + `get_leave_one_out_df()`/`get_leave_one_out_gaps()`, `in_time_placebo()` (ADH-2015 §4 backdating placebo; reassigns the intervention earlier, TRUNCATE windowing, placebo ATT ~0 if no real pre-effect) + `get_in_time_placebo_df()`/`get_in_time_placebo_gaps()`, `test_sharp_null(effect, gamma=0.1)` (Firpo-Possebom 2018 §4: test a sharp null α_1t=f(t) by re-ranking the in-space placebo gaps — `effect` is a scalar or a post-period array; `test_sharp_null(0)` is identically `placebo_p_value`), `confidence_set(family="constant"|"linear", gamma=0.1, bounds=None, n_grid=200)` (invert that test for a confidence set of the effect path — a constant-effect interval / linear-slope set; strict p>gamma membership; exact piecewise-constant breakpoint inversion when bounds=None, else a fixed grid; `conf_int` stays NaN) + `get_confidence_set_df()`, `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()` ### TripleDifferenceResults diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index 250068de..50faa40b 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -60,7 +60,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance - [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`), both composed with the Wave D Gardner GMM correction; `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores at the meat-helper boundary (Wave E.3, R `svyrecvar(subset())` form) (replicate weights queued as follow-up) - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units -- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, predictor-importance V via nested / cv (out-of-sample, ADH 2015; needs predictors spanning both train/val windows, so default single-period lags are rejected) / inverse-variance (1/Var on raw predictors, bypasses standardize) / custom, gap path + pre-RMSPE; no analytical SE (inference fields NaN), significance via in-space placebo permutation inference (`in_space_placebo()`, post/pre RMSPE-ratio, p = rank/(n_placebos+1)); ADH-2015 §4 robustness: `leave_one_out()` donor-robustness + `in_time_placebo()` backdating placebo +- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, predictor-importance V via nested / cv (out-of-sample, ADH 2015; needs predictors spanning both train/val windows, so default single-period lags are rejected) / inverse-variance (1/Var on raw predictors, bypasses standardize) / custom, gap path + pre-RMSPE; no analytical SE (inference fields NaN), significance via in-space placebo permutation inference (`in_space_placebo()`, post/pre RMSPE-ratio, p = rank/(n_placebos+1)); ADH-2015 §4 robustness: `leave_one_out()` donor-robustness + `in_time_placebo()` backdating placebo; confidence sets by test inversion (Firpo-Possebom 2018 §4): `test_sharp_null()` + `confidence_set(family="constant"|"linear")` re-rank the placebo gaps into a confidence set for the effect path (the analytical `conf_int` stays NaN) - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves - [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html): de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲=0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲>0`, continuous-near-d̲ or mass-point), with multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release (repeated cross-sections rejected by the validator). Alias `HAD`. diff --git a/diff_diff/synthetic_control.py b/diff_diff/synthetic_control.py index 5a04fa75..7b396047 100644 --- a/diff_diff/synthetic_control.py +++ b/diff_diff/synthetic_control.py @@ -1761,6 +1761,22 @@ def _mspe(gap_path: Dict[Any, float], periods: List[Any]) -> float: return float(np.mean(g**2)) +def _floored_pre_mspe(pre_gaps: np.ndarray, scale: float) -> float: + """Pre-period MSPE with the scale-aware floor used as the RMSPE-ratio denominator. + + ``= max(mean(pre_gaps**2), 1e-8 * max(scale, 1)**2)``. Factored out of + ``_rmspe_ratio`` so the sharp-null test inversion (Firpo & Possebom 2018) can reuse + the SAME per-unit floored denominator the in-space placebo used — the floor scale is + PER-UNIT (``scale`` = ``max|Z1|`` of that unit's pre-period outcomes), so this is + what guarantees ``test_sharp_null(0) == placebo_p_value`` bit-for-bit even when the + floor bites (a near-perfect pre-fit). The denominator is GRID-INVARIANT under a sharp + null whose ``f`` is zero in the pre-period (the operational constant/linear families). + """ + pre_mspe = float(np.mean(pre_gaps**2)) if pre_gaps.size else float("nan") + floor = 1e-8 * max(float(scale), 1.0) ** 2 + return max(pre_mspe, floor) + + def _rmspe_ratio(pre_gaps: np.ndarray, post_gaps: np.ndarray, scale: float) -> float: """Post/pre RMSPE ratio — the in-space placebo test statistic (ADH 2010 §2.4). @@ -1777,10 +1793,339 @@ def _rmspe_ratio(pre_gaps: np.ndarray, post_gaps: np.ndarray, scale: float) -> f ``scale`` is the magnitude of the unit's pre-period outcomes. Mirrors the ``_fit_tol`` poor-fit guard in ``fit()``. """ - pre_mspe = float(np.mean(pre_gaps**2)) if pre_gaps.size else float("nan") post_mspe = float(np.mean(post_gaps**2)) if post_gaps.size else float("nan") - floor = 1e-8 * max(float(scale), 1.0) ** 2 - return float(np.sqrt(post_mspe / max(pre_mspe, floor))) + return float(np.sqrt(post_mspe / _floored_pre_mspe(pre_gaps, scale))) + + +# ============================================================================= +# Sharp-null test inversion (Firpo & Possebom 2018, "Synthetic Control Method: +# Inference, Sensitivity Analysis and Confidence Sets," J. Causal Inference 6(2)). +# These helpers RE-RANK the gap paths the in-space placebo already computed (Eqs +# 12-13, φ=0, v=(1,...,1)); no synthetic-control refits. The benchmark f≡0 case is +# identically the existing in-space placebo permutation (Eq 5 = Eq 13 at f≡0). +# ============================================================================= + + +def _constant_f_post(c: float, n_post: int) -> np.ndarray: + """Constant-in-time post-period effect path f(t)=c (Firpo & Possebom Eq 15).""" + return np.full(n_post, float(c), dtype=float) + + +def _linear_f_post(c_tilde: float, n_post: int) -> np.ndarray: + """Linear-in-time post-period effect path f(t)=c̃·(t−T0) (Firpo & Possebom Eq 17). + + ``(t−T0)`` is the 1-based post-period index (1 for the first post period), so the + path is ``c̃·[1, 2, …, n_post]`` aligned to calendar-sorted ``post_periods``. + """ + return float(c_tilde) * np.arange(1, n_post + 1, dtype=float) + + +def _rmspe_f_ratio( + gap_path: Dict[Any, float], + post_periods: List[Any], + f_post: np.ndarray, + pre_denom: float, +) -> float: + """RMSPE^f ratio under a common-effect sharp null (Firpo & Possebom Eq 12). + + Subtracts the post-period effect path ``f_post`` (length ``len(post_periods)``, + aligned to calendar-sorted ``post_periods``) from the unit's post gaps, then divides + by the GRID-INVARIANT floored pre-period denominator ``pre_denom`` (the pre window is + ``f``-free for the operational constant/linear families, Eqs 15/17, so the + denominator is unchanged across the inversion grid). Returns + ``sqrt(mean_post((g_t − f_t)**2) / pre_denom)``. + """ + post = np.array([gap_path[p] for p in post_periods], dtype=float) + resid = post - np.asarray(f_post, dtype=float) + post_mspe = float(np.mean(resid**2)) if resid.size else float("nan") + return float(np.sqrt(post_mspe / pre_denom)) + + +def _sharp_null_pvalue( + treated_gap: Dict[Any, float], + placebo_gaps: Dict[Any, Dict[Any, float]], + post_periods: List[Any], + f_post: np.ndarray, + pre_denoms: Dict[Any, float], + treated_id: Any, +) -> Tuple[float, float, int]: + """Permutation p-value for a common-effect sharp null (Firpo & Possebom Eq 13, φ=0, v=1). + + ``p^f = (1 + #{converged placebos j : RMSPE^f_j ≥ RMSPE^f_1}) / (n_ref + 1)`` — the + treated unit is the "+1" and ``n_ref`` is the number of converged placebos (the SAME + reference set the in-space placebo built). ``placebo_gaps`` maps each converged donor + to its gap path; ``pre_denoms`` maps each unit (treated + those donors) to its floored + pre-denominator. Ties counted via ``>=`` so the p-value is conservative, matching + ``in_space_placebo``. Returns ``(p, rmspe_f_treated, n_ref)``. + """ + f_post = np.asarray(f_post, dtype=float) + r1 = _rmspe_f_ratio(treated_gap, post_periods, f_post, pre_denoms[treated_id]) + n_ref = 0 + n_ge = 0 + for j, g in placebo_gaps.items(): + rj = _rmspe_f_ratio(g, post_periods, f_post, pre_denoms[j]) + n_ref += 1 + if rj >= r1: + n_ge += 1 + p = (1 + n_ge) / (n_ref + 1) + return p, r1, n_ref + + +def _invert_sharp_null( + treated_gap: Dict[Any, float], + placebo_gaps: Dict[Any, Dict[Any, float]], + post_periods: List[Any], + pre_denoms: Dict[Any, float], + treated_id: Any, + family: str, + gamma: float, + *, + bounds: Optional[Tuple[float, float]] = None, + n_grid: int = 200, +) -> Dict[str, Any]: + """Invert the sharp-null RMSPE^f test over a one-parameter effect family. + + Returns the confidence set ``{ param : p^param > gamma }`` (Firpo & Possebom Eqs + 14/16/18; STRICT inequality -- ``p == gamma`` is excluded). ``family`` is ``"constant"`` + (``f(t)=param``, Eq 15) or ``"linear"`` (``f(t)=param*(t-T0)``, Eq 17). + + ``p^param`` is a piecewise-constant step function of the scalar ``param`` (each placebo's + indicator flips only at the real roots of a quadratic in ``param``), so with + ``bounds=None`` the EXACT set is recovered with NO shape assumption (no "centered + interval" / monotonicity): the placebo breakpoints partition the line, ``p`` is evaluated + once per induced interval (and the two unbounded tails), and the union of intervals with + ``p > gamma`` is the set -- correctly handling accepted tails, disjoint components, and the + empty / unbounded cases. With explicit ``bounds`` a fixed ``linspace(*bounds, n_grid)`` + grid is scanned instead (grid-limited membership). Each ``p`` is a pure O(J) re-ranking. + + Result dict: ``{lower, upper, contiguous, status, n_ref, point_estimate, grid}`` where + ``status`` is ``"ran"`` / ``"empty"`` / ``"unbounded"``, ``point_estimate`` is the treated + unit's own RMSPE-minimizing ``param`` (the ATT for constant, the no-intercept OLS slope + for linear -- reported for reference, NOT assumed to maximize ``p``), and ``grid`` is a + list of ``(param, p_value, in_set)`` rows for the returned table. + """ + if family not in ("constant", "linear"): + raise ValueError(f"family must be 'constant' or 'linear', got {family!r}") + n_post = len(post_periods) + # Family shape s_t and per-unit moments of A_u(param) = mean_t((g_ut - param*s_t)**2) + # = S2*param**2 - 2*P_u*param + Q_u (s_t = 1 constant / (t - T0) linear; S2 = mean(s**2)). + s = ( + np.ones(n_post, dtype=float) + if family == "constant" + else np.arange(1, n_post + 1, dtype=float) + ) + S2 = float(np.mean(s**2)) if n_post else 0.0 + + def moments(gap_path: Dict[Any, float], unit: Any) -> Tuple[float, float, float]: + g = np.array([gap_path[p] for p in post_periods], dtype=float) + return float(np.mean(g * s)), float(np.mean(g**2)), pre_denoms[unit] + + P1, Q1, D1 = moments(treated_gap, treated_id) + placebo_mom = [moments(gp, j) for j, gp in placebo_gaps.items()] + n_ref = len(placebo_mom) + # The treated unit's own RMSPE-minimizing param: reported as the point estimate ONLY + # (it is NOT assumed to maximize p -- see the exact-inversion comment below). + center = float(P1 / S2) if S2 > 0 else 0.0 + + def pval(param: float) -> float: + # p^param = (1 + #{placebos j : A_j/D_j >= A_1/D_1}) / (n_ref + 1). Comparing A/D is + # monotone-equivalent to comparing the RMSPE ratios sqrt(A/D), so this is identical to + # _sharp_null_pvalue / test_sharp_null (the squared form just avoids the sqrt). + r1 = ((S2 * param - 2.0 * P1) * param + Q1) / D1 + n_ge = sum( + 1 for (Pj, Qj, Dj) in placebo_mom if ((S2 * param - 2.0 * Pj) * param + Qj) / Dj >= r1 + ) + return (1 + n_ge) / (n_ref + 1) + + def grid_rows(lo: float, hi: float) -> List[Tuple[float, float, bool]]: + if not (np.isfinite(lo) and np.isfinite(hi)) or hi <= lo: + return [] + out: List[Tuple[float, float, bool]] = [] + for x in np.linspace(lo, hi, max(int(n_grid), 2)): + p = pval(float(x)) + out.append((float(x), float(p), bool(p > gamma))) + return out + + # ----- fixed-grid path (user-supplied bounds) ----- + if bounds is not None: + lo_b, hi_b = float(bounds[0]), float(bounds[1]) + if hi_b <= lo_b: + raise ValueError(f"bounds must be (lo, hi) with hi > lo, got {bounds!r}") + rows = grid_rows(lo_b, hi_b) + accepted = [x for x, _, ok in rows if ok] + if not accepted: + return { + "lower": np.nan, + "upper": np.nan, + "contiguous": True, + "status": "empty", + "n_ref": n_ref, + "point_estimate": center, + "grid": rows, + } + lower, upper = min(accepted), max(accepted) + contiguous = all(ok for x, _, ok in rows if lower <= x <= upper) + return { + "lower": lower, + "upper": upper, + "contiguous": contiguous, + "status": "ran", + "n_ref": n_ref, + "point_estimate": center, + "grid": rows, + } + + # ----- exact piecewise-constant inversion (bounds=None; no shape assumption) ----- + # p^param is a step function of param: each placebo's indicator flips only at the real + # roots of A_j(param)*D1 - A_1(param)*Dj = 0, a quadratic in param. So the EXACT set is + # recovered with NO "centered interval" / monotonicity assumption -- collect every placebo + # breakpoint, evaluate p once on each open interval they induce (plus the two unbounded + # tails), and union the intervals with p > gamma. This correctly handles accepted tails, + # disjoint components, and the empty / unbounded cases: a poor-pre-fit treated unit can + # have its accepted region in the TAILS, not around the point estimate. + if gamma < 1.0 / (n_ref + 1): + # p >= 1/(J+1) everywhere (the treated ranks itself), so nothing is ever rejected + # (Firpo & Possebom fn 8 -- the discrete granularity); the set is all of R. + return { + "lower": -np.inf, + "upper": np.inf, + "contiguous": True, + "status": "unbounded", + "n_ref": n_ref, + "point_estimate": center, + "grid": [], + } + + def quad_roots(a: float, b: float, c: float) -> List[float]: + # Real roots of a*x**2 + b*x + c, degenerate-safe (linear / constant fall through). + if abs(a) <= 1e-15: + return [] if abs(b) <= 1e-15 else [-c / b] + disc = b * b - 4.0 * a * c + if disc < 0.0: + return [] + sq = disc**0.5 + return [(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)] + + # g_j(param) = A_j(param)*D1 - A_1(param)*Dj = a*param^2 + b*param + c; the indicator + # 1[g_j(param) >= 0] == 1[RMSPE^param_j >= RMSPE^param_1], and breakpoints (where an + # indicator flips) are the real roots of each g_j. + coeffs = [ + (S2 * (D1 - Dj), -2.0 * (Pj * D1 - P1 * Dj), Qj * D1 - Q1 * Dj) + for (Pj, Qj, Dj) in placebo_mom + ] + raw_breaks: List[float] = [] + for a, b, c in coeffs: + raw_breaks.extend(quad_roots(a, b, c)) + breaks: List[float] = [] + for r in sorted(raw_breaks): + # dedup near-equal roots so each interval midpoint stays strictly interior + if not breaks or abs(r - breaks[-1]) > 1e-12 + 1e-9 * abs(r): + breaks.append(r) + + def pval_break(r: float) -> float: + # p AT a breakpoint: strict-`>` membership (Eq 14) combined with the conservative + # tie-counting `>=` (Eqs 5/13) means a placebo that EXACTLY ties the treated at a root + # is counted, so p can spike above gamma there even when both neighboring open + # intervals are rejected (a tangent / co-located root => an isolated accepted point). + # A relative tolerance registers the tie robustly in floating point. + n_ge = 0 + for a, b, c in coeffs: + gj = (a * r + b) * r + c + if gj >= -(1e-9 * (abs(a) * r * r + abs(b) * abs(r) + abs(c)) + 1e-12): + n_ge += 1 + return (1 + n_ge) / (n_ref + 1) + + if not breaks: + # p is constant in param (e.g. every placebo shares the treated's moments + denom). + if pval(center) > gamma: + return { + "lower": -np.inf, + "upper": np.inf, + "contiguous": True, + "status": "unbounded", + "n_ref": n_ref, + "point_estimate": center, + "grid": [], + } + return { + "lower": np.nan, + "upper": np.nan, + "contiguous": True, + "status": "empty", + "n_ref": n_ref, + "point_estimate": center, + "grid": [], + } + + # Atoms along the line: cell_0, b_0, cell_1, b_1, ..., b_{m-1}, cell_m (2m+1 atoms). p is + # constant on each open cell (sampled at an interior point) and is evaluated WITH the tie + # tolerance at each breakpoint. The set = union of accepted atoms; consecutive accepted + # atoms share a boundary => one connected component, a rejected atom splits components + # (so an accepted breakpoint whose neighbor cells are both rejected is a singleton). + m = len(breaks) + cell_pts = ( + [breaks[0] - 1.0] + + [0.5 * (breaks[k] + breaks[k + 1]) for k in range(m - 1)] + + [breaks[-1] + 1.0] + ) + cell_in = [pval(x) > gamma for x in cell_pts] + break_in = [pval_break(r) > gamma for r in breaks] + acc = [cell_in[i // 2] if i % 2 == 0 else break_in[(i - 1) // 2] for i in range(2 * m + 1)] + if not any(acc): + return { + "lower": np.nan, + "upper": np.nan, + "contiguous": True, + "status": "empty", + "n_ref": n_ref, + "point_estimate": center, + "grid": [], + } + + def atom_extent(i: int) -> Tuple[float, float]: + if i % 2 == 0: + k = i // 2 # open cell k = (left boundary, right boundary) + return (-np.inf if k == 0 else breaks[k - 1], np.inf if k == m else breaks[k]) + b = breaks[(i - 1) // 2] # a breakpoint singleton + return (b, b) + + components: List[Tuple[float, float]] = [] + i = 0 + while i < len(acc): + if not acc[i]: + i += 1 + continue + j = i + while j + 1 < len(acc) and acc[j + 1]: + j += 1 + components.append((atom_extent(i)[0], atom_extent(j)[1])) + i = j + 1 + + lower = min(comp[0] for comp in components) + upper = max(comp[1] for comp in components) + contiguous = len(components) == 1 + status = "ran" if (np.isfinite(lower) and np.isfinite(upper)) else "unbounded" + # Display grid: the finite hull, else a padded breakpoint range (for inspection only). + if np.isfinite(lower) and np.isfinite(upper): + if upper <= lower: + # A pure singleton set (lower == upper, a lone accepted breakpoint): a zero-width + # range yields an empty grid_rows(), so emit the single accepted point explicitly + # (with its tie-spike p) — the inspection table must reflect the non-empty set. + grid = [(float(lower), float(pval_break(lower)), True)] + else: + grid = grid_rows(lower, upper) + else: + pad = max(1.0, breaks[-1] - breaks[0]) + grid = grid_rows(breaks[0] - pad, breaks[-1] + pad) + return { + "lower": float(lower), + "upper": float(upper), + "contiguous": bool(contiguous), + "status": status, + "n_ref": n_ref, + "point_estimate": center, + "grid": grid, + } def _placebo_fit_unit( diff --git a/diff_diff/synthetic_control_results.py b/diff_diff/synthetic_control_results.py index 60ef44ad..55866f3d 100644 --- a/diff_diff/synthetic_control_results.py +++ b/diff_diff/synthetic_control_results.py @@ -200,6 +200,16 @@ class SyntheticControlResults: rmspe_ratio: float = np.nan n_placebos: int = 0 n_failed: int = 0 + # Confidence set for the treatment-effect path by test inversion (Firpo & Possebom + # 2018, "Synthetic Control Method: Inference, Sensitivity Analysis and Confidence + # Sets," J. Causal Inference 6(2), §4), populated by ``confidence_set()``. A small + # summary dict ``{family, parameter, gamma, lower, upper, contiguous, boundary, + # point_estimate, n_grid, n_placebos, status}``; None until ``confidence_set()`` runs. + # DELIBERATELY SEPARATE from the always-NaN analytical ``conf_int`` (the Wald interval + # classic SCM does not have): this is a PERMUTATION set at level ``1-gamma`` (with + # ``gamma`` granular in ``1/(J+1)``), and may be a set / unbounded / non-contiguous — + # mirrors how ``placebo_p_value`` is kept distinct from the (NaN) ``p_value``. + effect_confidence_set: Optional[Dict[str, Any]] = None def __post_init__(self) -> None: # Internal state set per instance by ``fit()`` / ``in_space_placebo()``. @@ -223,6 +233,13 @@ def __post_init__(self) -> None: # None (not run), "ran", "treated_fit_nonconverged", "too_few_donors", # "all_placebos_failed". A small string, so it survives pickling. self._placebo_status: Optional[str] = None + # Per-unit floored pre-period denominators (treated + each converged placebo), + # captured by in_space_placebo() so the sharp-null test inversion + # (test_sharp_null / confidence_set, Firpo & Possebom 2018) re-ranks against the + # SAME denominators the placebo run used (the test_sharp_null(0) == placebo_p_value + # anchor). Each value uses that unit's OWN pre-outcome scale; the pre window is + # f-free so the denominator is grid-invariant. Small dict → survives pickling. + self._placebo_pre_denoms: Optional[Dict[Any, float]] = None # --- ADH 2015 §4 robustness diagnostics (opt-in, populated by --- # --- leave_one_out() / in_time_placebo()). Same panel-vs-scalar split as --- @@ -256,6 +273,11 @@ def __post_init__(self) -> None: # periods, all predictors dropped, or a zero-mass surviving custom_v). Surfaced # alongside _in_time_n_failed so a mixed no-success run reports an accurate mix. self._in_time_n_infeasible: int = 0 + # Firpo & Possebom (2018) §4 test-inversion confidence set (opt-in, populated by + # confidence_set()). The grid table {param, p_value, in_set} is small / NOT + # panel-derived, so it survives pickling by default (NOT nulled by __getstate__); + # the public ``effect_confidence_set`` summary dataclass field likewise survives. + self._confidence_set_df: Optional[pd.DataFrame] = None def __getstate__(self) -> Dict[str, Any]: """Exclude panel-derived internal state from pickling. @@ -379,6 +401,45 @@ def summary(self, alpha: Optional[float] = None) -> str: "", ] ) + # Test-inversion confidence set (Firpo & Possebom 2018, §4), if computed. Like the + # placebo p-value this is permutation-based; the analytical conf_int stays n/a. + ecs = self.effect_confidence_set + if ecs is not None: + fam = ecs["family"] + param = ecs["parameter"] + conf_pct = 100.0 * (1.0 - ecs["gamma"]) + lines.append( + f"Confidence set by test inversion (Firpo-Possebom 2018; {fam} effect " + f"f(t), parameter {param}):" + ) + if ecs["status"] == "ran": + note = "" if ecs["contiguous"] else " (non-contiguous; [lower, upper] hull)" + lines.append( + f" {conf_pct:.1f}% set:".ljust(34) + + f"[{ecs['lower']:.4f}, {ecs['upper']:.4f}]{note}" + ) + elif ecs["status"] == "unbounded": + tail = ( + " and NON-CONTIGUOUS (hull shown; see get_confidence_set_df())" + if not ecs["contiguous"] + else "" + ) + lines.append( + " Unbounded (gamma below the 1/(J+1) granularity, or the treated " + f"unit is not the best pre-fit){tail}." + ) + else: # "empty" + lines.append( + " Empty: every effect in this family is rejected at " + f"gamma={ecs['gamma']:.3g}." + ) + lines.extend( + [ + "(Permutation-based; the analytical conf_int above stays n/a.)", + "-" * 75, + "", + ] + ) # Three states: (1) placebo never run -> point to in_space_placebo(); # (2) run with a valid reference set -> show the permutation p-value; # (3) run but infeasible (no placebo entered the rank, e.g. J<2 or all @@ -483,6 +544,17 @@ def to_dict(self) -> Dict[str, Any]: "n_placebos": self.n_placebos, "n_failed": self.n_failed, } + # Test-inversion confidence set (Firpo & Possebom 2018), flattened to scalars so + # to_dataframe() stays a single row of scalars; all None until confidence_set() + # runs. The analytical conf_int_lower/upper above stay NaN (no Wald interval). + ecs = self.effect_confidence_set + result["effect_ci_family"] = ecs["family"] if ecs else None + result["effect_ci_parameter"] = ecs["parameter"] if ecs else None + result["effect_ci_gamma"] = ecs["gamma"] if ecs else None + result["effect_ci_lower"] = ecs["lower"] if ecs else None + result["effect_ci_upper"] = ecs["upper"] if ecs else None + result["effect_ci_contiguous"] = ecs["contiguous"] if ecs else None + result["effect_ci_status"] = ecs["status"] if ecs else None if self.survey_metadata is not None: sm = self.survey_metadata result["weight_type"] = sm.weight_type @@ -646,9 +718,16 @@ def in_space_placebo( "older estimator version. Re-fit to enable in-space placebo " "inference." ) - from diff_diff.synthetic_control import _mspe, _placebo_fit_unit + from diff_diff.synthetic_control import _floored_pre_mspe, _mspe, _placebo_fit_unit snap = self._fit_snapshot + # A rebuilt placebo reference set invalidates any previously computed confidence set + # (test_sharp_null / confidence_set re-rank against THIS reference set), so drop the + # cached confidence-set outputs up front — a stale set must never be reported after an + # explicit in_space_placebo() re-run (e.g. with a different n_starts). The snapshot + # check above has already passed, so the reference IS about to be rebuilt on every exit. + self.effect_confidence_set = None + self._confidence_set_df = None donors = list(snap.donor_ids) n_donors = len(donors) if n_starts is None: @@ -697,6 +776,7 @@ def in_space_placebo( self.n_placebos = 0 self.n_failed = 0 self._placebo_gaps = {} + self._placebo_pre_denoms = {} self._placebo_status = "treated_fit_nonconverged" self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS) return self._placebo_df.copy() @@ -713,6 +793,7 @@ def in_space_placebo( self.n_placebos = 0 self.n_failed = 0 self._placebo_gaps = {} + self._placebo_pre_denoms = {} self._placebo_status = "too_few_donors" self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS) return self._placebo_df.copy() @@ -803,6 +884,20 @@ def in_space_placebo( stacklevel=2, ) + # Persist each unit's floored pre-period denominator (treated + every converged + # placebo) so the sharp-null test inversion (test_sharp_null / confidence_set, + # Firpo & Possebom 2018) re-ranks against the SAME denominators this run used — + # the test_sharp_null(0) == placebo_p_value anchor. The pre window is f-free so the + # denominator is grid-invariant; each unit's floor uses its OWN pre-outcome scale. + outcome_pivot = snap.pivots[snap.outcome] + pre_denoms: Dict[Any, float] = {} + for unit, gp in [(snap.treated_id, self.gap_path), *placebo_gaps.items()]: + pre_gaps_u = np.array([gp[p] for p in snap.pre_periods], dtype=float) + z1_u = outcome_pivot.loc[snap.pre_periods, unit].to_numpy(dtype=float) + scale_u = float(np.max(np.abs(z1_u))) if z1_u.size else 0.0 + pre_denoms[unit] = _floored_pre_mspe(pre_gaps_u, scale_u) + self._placebo_pre_denoms = pre_denoms + self.placebo_p_value = float(p_value) self.n_placebos = int(n_placebos) self.n_failed = int(n_failed) @@ -1401,3 +1496,328 @@ def get_in_time_placebo_gaps(self) -> pd.DataFrame: } ) return pd.DataFrame(rows, columns=["placebo_period", "period", "gap", "phase"]) + + # ===================================================================== + # Confidence sets by test inversion (Firpo & Possebom 2018, §4) + # ===================================================================== + + def _require_placebo_reference(self, n_starts: Optional[int]) -> None: + """Ensure an in-space placebo reference set is available for test inversion. + + Lazily runs :meth:`in_space_placebo` when no reference set has been built yet + (raising the same ValueError as that method if the fit snapshot is missing, e.g. + on an unpickled result). If a reference set already exists, a non-None ``n_starts`` + is **ignored with a UserWarning** — the test inversion reuses the single stored set + (every sharp null re-ranks the SAME gaps), so honouring ``n_starts`` would mean an + expensive O(J) re-fit that the caller did not ask for. Raises ValueError when no + valid reference set could be produced (fewer than 2 donors, a non-converged treated + fit, or all donor refits failed) — there is then no permutation distribution to + invert. + """ + if self._placebo_gaps is None: + # Builds the reference set; raises ValueError if the snapshot is unavailable. + self.in_space_placebo(n_starts=n_starts) + elif n_starts is not None: + warnings.warn( + "n_starts is ignored: the in-space placebo reference set was already " + "computed and is reused (every sharp null / grid value re-ranks the same " + "placebo gaps). Re-run in_space_placebo(n_starts=...) explicitly to rebuild " + "it with a different multistart count.", + UserWarning, + stacklevel=3, + ) + if not self._placebo_gaps or self._placebo_status != "ran": + reasons = { + "treated_fit_nonconverged": ( + "the treated unit's own SCM fit did not converge at fit time, so its " + "RMSPE ratio is not a valid optimum to rank against placebos" + ), + "too_few_donors": ( + "fewer than 2 donors are available (each placebo is fit against the " + "other donors)" + ), + "all_placebos_failed": ( + "every donor refit failed to converge, so no placebo entered the " + "reference set" + ), + } + default_reason = "no valid in-space placebo reference set was produced" + status = self._placebo_status + reason = reasons.get(status, default_reason) if status is not None else default_reason + raise ValueError( + "Confidence set / sharp-null test requires a valid in-space placebo " + f"reference set, but {reason}. (See the in_space_placebo() warning above.)" + ) + + @staticmethod + def _coerce_effect_path(effect: Any, n_post: int) -> np.ndarray: + """Coerce ``effect`` to a length-``n_post`` post-period effect path ``f(t)``. + + A scalar broadcasts to a constant path (Eq 11 with ``f(t) = c``); a 1-D array must + have one finite value per post period, aligned to the calendar-sorted + ``post_periods``. Fails closed on a wrong length or any non-finite value. + """ + arr = np.asarray(effect, dtype=float) + if arr.ndim == 0: + f = np.full(n_post, float(arr), dtype=float) + elif arr.ndim == 1: + if arr.shape[0] != n_post: + raise ValueError( + f"effect path has length {arr.shape[0]} but there are {n_post} " + "post-treatment periods; pass a scalar (a constant-in-time effect) or " + f"a length-{n_post} array aligned to post_periods (calendar order)." + ) + f = arr + else: + raise ValueError( + "effect must be a scalar (constant effect) or a 1-D array (one value per " + f"post period), got a {arr.ndim}-D array." + ) + if not np.all(np.isfinite(f)): + raise ValueError("effect contains non-finite (NaN/inf) values.") + return f + + def test_sharp_null( + self, + effect: Any, + *, + gamma: float = 0.1, + n_starts: Optional[int] = None, + ) -> pd.Series: + """Test a sharp null hypothesis on the treatment-effect path (Firpo & Possebom 2018, §4.1). + + Tests ``H_0^f: α_{1,t} = f(t)`` for every post period (Eq 11) by subtracting the + hypothesized effect path ``f(t)`` from the post-period gaps of EVERY unit and + re-ranking the treated unit's modified RMSPE ratio against the placebo distribution + (Eqs 12–13 at ``φ = 0``, ``v = (1,…,1)`` — the equal-weights benchmark). The + synthetic controls are NOT refit: this reuses the gap paths and per-unit + denominators :meth:`in_space_placebo` already computed (run lazily here if needed). + At ``effect = 0`` the p-value is identically the benchmark ``placebo_p_value`` + (Eq 5 = Eq 13 with ``f ≡ 0``). + + Parameters + ---------- + effect : float or array-like + The hypothesized post-period effect ``f(t)``: a scalar (a constant-in-time + effect, Eq 11), or a length-``n_post_periods`` array aligned to ``post_periods`` + in calendar order (an arbitrary path — e.g. an intervention cost path or a + theory-predicted shape). + gamma : float, default 0.1 + Test level; the null is rejected when ``p^f < gamma``. The permutation p-value + is granular in ``1/(J+1)`` (Firpo & Possebom fn 8), so not every nominal level + is attainable. + n_starts : int, optional + Multistart count for the lazy :meth:`in_space_placebo` run; ignored (with a + warning) if the reference set already exists. + + Returns + ------- + pandas.Series + ``p_value`` (``p^f``), ``reject`` (``p^f < gamma``), ``gamma``, + ``rmspe_f_treated`` (the treated unit's modified RMSPE ratio), ``n_placebos`` + (reference-set size), ``n_failed``. + + Raises + ------ + ValueError + If ``gamma`` is not in ``(0, 1)``, ``effect`` has the wrong shape / non-finite + values, or no valid placebo reference set is available (see + :meth:`in_space_placebo`). + """ + from diff_diff.synthetic_control import _sharp_null_pvalue + + if not (0.0 < float(gamma) < 1.0): + raise ValueError(f"gamma must be in (0, 1), got {gamma!r}") + self._require_placebo_reference(n_starts) + post_periods = list(self.post_periods) + f_post = self._coerce_effect_path(effect, len(post_periods)) + assert self._placebo_gaps is not None and self._placebo_pre_denoms is not None + p, r1, n_ref = _sharp_null_pvalue( + self.gap_path, + self._placebo_gaps, + post_periods, + f_post, + self._placebo_pre_denoms, + self.treated_unit, + ) + return pd.Series( + { + "p_value": float(p), + "reject": bool(p < float(gamma)), + "gamma": float(gamma), + "rmspe_f_treated": float(r1), + "n_placebos": int(n_ref), + "n_failed": int(self.n_failed), + } + ) + + def confidence_set( + self, + *, + family: str = "constant", + gamma: float = 0.1, + bounds: Optional[Tuple[float, float]] = None, + n_grid: int = 200, + n_starts: Optional[int] = None, + ) -> pd.DataFrame: + """Confidence set for the treatment-effect path by test inversion (Firpo & Possebom 2018, §4.2). + + Inverts the sharp-null test (:meth:`test_sharp_null`) over a one-parameter effect + family: the confidence set is every parameter value whose sharp null is **not + rejected**, ``{ param : p^param > gamma }`` (Eq 14, **strict** inequality). Two + families are supported: + + - ``family="constant"`` — ``f(t) = c`` (Eq 15); the set is a confidence **interval** + for a constant-in-time effect (Eq 16). The parameter column is ``c``. + - ``family="linear"`` — ``f(t) = c̃·(t − T0)`` with the 1-based post-period index + ``(t − T0)`` (Eq 17); the set is a confidence **set** over the slope ``c̃`` + (Eq 18). The parameter column is ``c_tilde``. + + The inversion is a pure re-ranking of the stored placebo gaps (no synthetic-control + refits): :meth:`in_space_placebo` is run lazily if needed, then each value only + recomputes ``p^param``. With ``bounds=None`` the set is recovered **exactly**: + ``p^param`` is piecewise-constant (each placebo's indicator flips only at the real + roots of a quadratic in ``param``), so the placebo breakpoints partition the line, + ``p`` is evaluated once per induced interval AND at each breakpoint (where a tie + under ``≥`` can lift ``p`` above ``gamma``), and the union of accepted + intervals/points is the set — with NO centering or monotonicity assumption (accepted + tails and disjoint components are handled). With explicit ``bounds`` a fixed + ``linspace(*bounds, n_grid)`` grid is scanned instead (grid-limited membership). + + **Boundary convention (paper-sourced, Eq 14):** membership is the *strict* inequality + ``p^param > gamma``. The permutation p-value is discrete (a multiple of ``1/(J+1)``), + so ``p = gamma`` is reachable and is **excluded** from the set. + + The result is stored on the object: the summary on + :attr:`effect_confidence_set` (``{family, parameter, gamma, lower, upper, + contiguous, boundary, point_estimate, n_grid, n_placebos, status}``, surviving + pickling) and the full grid on :meth:`get_confidence_set_df`. The analytical + ``conf_int`` / ``se`` stay NaN — this is a separate permutation object. + + Parameters + ---------- + family : {"constant", "linear"}, default "constant" + The one-parameter effect family to invert over. + gamma : float, default 0.1 + Confidence level is ``1 − gamma``; ``p^param > gamma`` defines membership. + bounds : (float, float), optional + Fixed ``(lo, hi)`` grid for the parameter. Default None uses exact breakpoint + inversion (a fixed grid is used only when ``bounds`` is supplied). + n_grid : int, default 200 + Number of grid points evaluated for the returned table (>= 2). + n_starts : int, optional + Multistart count for the lazy :meth:`in_space_placebo` run; ignored (with a + warning) if the reference set already exists. + + Returns + ------- + pandas.DataFrame + Columns ``param`` (``c`` for constant, ``c̃`` for linear), ``p_value`` + (``p^param``), ``in_set`` (``p^param > gamma``). Empty for an ``"empty"`` set; + an ``"unbounded"`` exact set with finite breakpoints still returns an inspection + grid over a padded breakpoint range (see :attr:`effect_confidence_set` + ``status``). + + Raises + ------ + ValueError + If ``family`` is unknown, ``gamma`` not in ``(0, 1)``, ``n_grid < 2``, ``bounds`` + malformed, or no valid placebo reference set is available. + """ + from diff_diff.synthetic_control import _invert_sharp_null + + if family not in ("constant", "linear"): + raise ValueError(f"family must be 'constant' or 'linear', got {family!r}") + if not (0.0 < float(gamma) < 1.0): + raise ValueError(f"gamma must be in (0, 1), got {gamma!r}") + if not isinstance(n_grid, (int, np.integer)) or n_grid < 2: + raise ValueError(f"n_grid must be an integer >= 2, got {n_grid!r}") + if bounds is not None: + # Guard the type/length BEFORE indexing so a malformed scalar raises the + # documented ValueError (not a bare TypeError from len()/subscription). + if ( + not isinstance(bounds, (tuple, list, np.ndarray)) + or len(bounds) != 2 + or not all(isinstance(b, (int, float, np.integer, np.floating)) for b in bounds) + or not all(np.isfinite(float(b)) for b in bounds) + ): + raise ValueError(f"bounds must be a finite (lo, hi) pair, got {bounds!r}") + if float(bounds[1]) <= float(bounds[0]): + raise ValueError(f"bounds must satisfy hi > lo, got {bounds!r}") + self._require_placebo_reference(n_starts) + assert self._placebo_gaps is not None and self._placebo_pre_denoms is not None + res = _invert_sharp_null( + self.gap_path, + self._placebo_gaps, + list(self.post_periods), + self._placebo_pre_denoms, + self.treated_unit, + family, + float(gamma), + bounds=(None if bounds is None else (float(bounds[0]), float(bounds[1]))), + n_grid=int(n_grid), + ) + status = res["status"] + if status == "unbounded": + extra = ( + " The accepted set is ALSO non-contiguous (e.g. two accepted tails with a " + "rejected middle, NOT the whole line), so [lower, upper] is only the hull — " + "inspect get_confidence_set_df() for the structure." + if not res["contiguous"] + else "" + ) + warnings.warn( + "Confidence set is unbounded: either gamma is below the permutation " + "granularity 1/(J+1) (so no effect is ever rejected — Firpo & Possebom " + "fn 8), or the treated unit does not have the best pre-treatment fit (so " + "the RMSPE ratio does not grow without bound on one side). Reported " + "endpoint(s) are +/-inf." + extra, + UserWarning, + stacklevel=2, + ) + elif status == "empty": + warnings.warn( + f"Confidence set is empty: every {family} effect in this family is " + f"rejected at gamma={gamma:.3g} (the largest attainable p-value does not " + "exceed gamma). Endpoints are NaN.", + UserWarning, + stacklevel=2, + ) + elif not res["contiguous"]: + warnings.warn( + "Confidence set is non-contiguous (the discrete permutation p-value dips " + "below gamma at an interior grid point); [lower, upper] is reported as the " + "hull. Inspect get_confidence_set_df() for the full grid.", + UserWarning, + stacklevel=2, + ) + self.effect_confidence_set = { + "family": family, + "parameter": "c" if family == "constant" else "c_tilde", + "gamma": float(gamma), + "lower": float(res["lower"]), + "upper": float(res["upper"]), + "contiguous": bool(res["contiguous"]), + "boundary": "strict", + "point_estimate": float(res["point_estimate"]), + "n_grid": int(n_grid), + "n_placebos": int(res["n_ref"]), + "status": status, + } + self._confidence_set_df = pd.DataFrame(res["grid"], columns=["param", "p_value", "in_set"]) + return self._confidence_set_df.copy() + + def get_confidence_set_df(self) -> pd.DataFrame: + """Get the test-inversion confidence-set grid table (see :meth:`confidence_set`). + + Columns: ``param`` (``c`` constant / ``c̃`` linear), ``p_value`` (``p^param``), + ``in_set`` (``p^param > gamma``). Survives pickling. Raises if + :meth:`confidence_set` has not been run. + + Returns + ------- + pandas.DataFrame + """ + if self._confidence_set_df is None: + raise ValueError("No confidence set yet; call confidence_set() first.") + return self._confidence_set_df.copy() diff --git a/docs/api/synthetic_control.rst b/docs/api/synthetic_control.rst index b50e840a..3ffd197d 100644 --- a/docs/api/synthetic_control.rst +++ b/docs/api/synthetic_control.rst @@ -36,6 +36,21 @@ earlier pre-date and checks for a spurious gap before the true treatment date (t backdating placebo; ``placebo_att`` should be ~0). Both re-run the validated solver and leave the analytical inference fields NaN. +**Confidence sets by test inversion (Firpo & Possebom 2018 §4, opt-in):** +:meth:`~diff_diff.SyntheticControlResults.test_sharp_null` tests a sharp null +``H_0: alpha_1t = f(t)`` (a scalar constant effect, or a post-period effect path) by +re-ranking the stored in-space placebo gaps — no refits, and ``test_sharp_null(0)`` is +identically ``placebo_p_value`` — and +:meth:`~diff_diff.SyntheticControlResults.confidence_set` (``family="constant"`` or +``"linear"``) inverts that test into a confidence set for the effect path: a +constant-effect interval (Eqs. 15–16) or a linear-slope set (Eqs. 17–18), with the +paper's strict ``p > gamma`` membership (Eq. 14), computed by exact piecewise-constant +breakpoint inversion (or a fixed grid when ``bounds=`` is supplied). The set is +summarized on ``effect_confidence_set`` and returned by +:meth:`~diff_diff.SyntheticControlResults.get_confidence_set_df`; the analytical +``conf_int`` stays NaN (this is a separate permutation set at level ``1 - gamma``, +possibly a set / unbounded / non-contiguous). + **Distinct from** :class:`~diff_diff.SyntheticDiD` (Arkhangelsky et al. 2021), which adds time weights and ridge regularization; classic SCM uses **donor weights only** plus the outer ``V`` search. @@ -88,6 +103,9 @@ Results container for synthetic control estimation. ~SyntheticControlResults.in_time_placebo ~SyntheticControlResults.get_in_time_placebo_df ~SyntheticControlResults.get_in_time_placebo_gaps + ~SyntheticControlResults.test_sharp_null + ~SyntheticControlResults.confidence_set + ~SyntheticControlResults.get_confidence_set_df ~SyntheticControlResults.summary ~SyntheticControlResults.print_summary ~SyntheticControlResults.to_dict diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index 5f0857bc..a24cb239 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -635,6 +635,8 @@ sources: - path: docs/methodology/REGISTRY.md section: "SyntheticControl" type: methodology + - path: docs/methodology/papers/firpo-possebom-2018-review.md + type: methodology - path: docs/api/synthetic_control.rst type: api_reference - path: diff_diff/guides/llms-full.txt diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 8da65b5a..4a537bbf 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -1996,6 +1996,10 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d - **Leave-one-out donor robustness** (`leave_one_out()`): drops each **reportably-weighted** donor and re-fits the treated unit against the reduced pool, returning a per-drop ATT / `delta_att` table (a `status="baseline"` row first, then one row per dropped donor sorted by `|delta_att|`). A large `delta_att` flags single-donor dependence (a single *dominant* donor is still dropped — the others absorb its mass — and its large `delta_att` is the intended signal, not a failure). The reporting stack's headline donor-sensitivity number is `max_abs_delta_att` = `max |delta_att|` over the drops (baseline-relative, so a uniform shift of every drop away from the full-fit ATT is not masked the way a raw ATT range would be). `get_leave_one_out_gaps()` returns the per-drop trajectories for the overlay plot. Fails closed on a non-converged treated fit or `< 2` donors. - **In-time (backdating) placebo** (`in_time_placebo()`): reassigns the intervention to an earlier pre-date `t_f`, re-fits using ONLY pre-`t_f` information (TRUNCATE convention — see Note), and reports the placebo "effect" over the held-out window `[t_f, T0)` — ~0 if there is no real pre-period effect (ADH 2015 Fig. 4, German reunification backdated to 1975). Sweeps every feasible interior pre-date by default (≥2 pre-fake + ≥1 post-fake); an explicit post-period / non-pre date raises, a valid-but-dimensionally-infeasible date yields a `status="infeasible"` row (no raise). +**Confidence sets by test inversion (Firpo & Possebom 2018, §4, opt-in):** A confidence set for the treatment-effect path, built ON TOP of the in-space placebo and surfaced under `estimator_native_diagnostics` (the analytical inference contract is unchanged — see the non-analytical Note below). Under a common-effect sharp null `H_0^f: α_1t = f(t)` (Eq 11) the donor synthetic controls and the pre-period MSPE denominators do not change — only the post-period gaps shift by `f(t)` — so the test is a pure **re-ranking** of the gap paths `in_space_placebo()` already computed (no synthetic-control refits): +- **`test_sharp_null(effect, gamma=0.1)`** forms the modified RMSPE ratio `RMSPE^f_j = sqrt(mean_post((α̂_jt − f(t))²) / pre_denom_j)` for every unit (Eq 12) and the permutation p-value `p^f = (1 + #{converged placebos j : RMSPE^f_j ≥ RMSPE^f_1}) / (n_ref + 1)` (Eq 13 at `φ=0`, `v=(1,…,1)` — the equal-weights benchmark). `effect` is a scalar (a constant-in-time effect) or a length-`n_post` array (an arbitrary post-period path `f(t)` — e.g. an intervention cost path or a theory-predicted shape). At `f≡0` this is identically the in-space `placebo_p_value` (Eq 5 = Eq 13), held bit-for-bit by reusing each unit's floored pre-period denominator persisted from the placebo run (the pre window is `f`-free, so the denominator is grid-invariant; the floor scale is **per-unit** `max|Z1_j|`). +- **`confidence_set(family="constant"|"linear", gamma=0.1, bounds=None, n_grid=200)`** inverts that test over a one-parameter family: `"constant"` → `f(t)=c` (Eq 15), a confidence **interval** for a constant effect (Eq 16); `"linear"` → `f(t)=c̃·(t−T0)` with the 1-based post-period index (Eq 17), a confidence **set** over the slope `c̃` (Eq 18). Membership is the paper's **strict** `p^c > γ` (Eq 14 — see the boundary Note). With `bounds=None` the set is recovered **EXACTLY**: `p^c` is a piecewise-constant step function (each placebo's indicator flips only at the real roots of `A_j(c)·D_1 = A_1(c)·D_j`, a quadratic in `c`), so the placebo breakpoints partition the line and `p` is evaluated once per induced open interval AND at each breakpoint — where a tie under `≥` can lift `p` above γ, so an isolated accepted singleton (a tangent / co-located root) is captured. The accepted set is the union of accepted intervals/points, with **no centering or monotonicity assumption** (a poor-pre-fit treated unit can have its accepted region in the tails, not around the point estimate). `bounds=(lo,hi)` instead scans a fixed grid (grid-limited membership). The summary is on `effect_confidence_set` (`status ∈ {"ran","empty","unbounded"}`) and the full grid on `get_confidence_set_df()`. **Fail-closed:** `γ < 1/(J+1)` ⇒ `p^c > γ` for every `c` ⇒ `"unbounded"` (`±inf` endpoints + warning — the discrete-granularity point, fn 8); a treated unit lacking the best pre-fit can give a one-sided unbounded edge; if no interval or breakpoint is accepted the set is `"empty"` (NaN endpoints); a non-contiguous accepted region (disjoint components / an isolated singleton) reports the `[lower, upper]` hull with `contiguous=False` + a warning; and `< 2` donors / a non-converged treated fit / an unpickled result (no placebo reference set) raise `ValueError`. **Scope:** sensitivity weights (`φ≠0`, Eqs 7–9), the general test-statistic menu `θ¹–θ⁵` (Eq 19), one-sided (§7's signed-`t` statistic), and the multiple-outcome / multiple-treated extensions (§6) are **deferred** (flagged in the paper review checklist). + **Notes / deviations:** - **Note:** The standardization divisor `divisor = sqrt(apply(cbind(X0,X1), 1, var))` (per-predictor SD over donors+treated, ddof=1) and the inner/outer optimizer are **not specified in ADH 2010** (which defers these numerics to Abadie & Gardeazabal 2003 App. B / the `Synth` software). The divisor is pinned from the R `Synth::synth` source; `solution.v` lives in this scaled predictor space, so the deterministic R-parity test feeds `custom_v` in the same scaled space. - **Note:** The outer objective minimizes the pre-period outcome MSPE over **all** pre periods, whereas R `Synth` uses a `time.optimize.ssr` window (1960–1969 in the Basque example). The nested `V` therefore differs from R by an efficiency-only choice (the paper notes inferential validity holds for *any* `V`), so end-to-end nested parity is a tolerance band, not equality. @@ -2013,6 +2017,10 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d - **Note (placebo failure handling):** a placebo is **excluded from both the numerator and the denominator** of the rank (never penalized into it) and tallied in `n_failed` when its fit is not a valid optimum — EITHER its **inner Frank-Wolfe weight solve** did not converge (a truncated `W` is unusable) OR its **outer `V` search** did not converge (an under-optimized `V` fits the pre-period worse, shrinking the RMSPE ratio and biasing the p-value anti-conservatively, so it must not silently enter the rank). The reported p-value uses the **effective** count `rank / (n_placebos + 1)`, where `n_placebos` is the number of placebos that entered the reference set. Failed donors still appear in `get_placebo_df()` (`status="failed"`, NaN metrics), so once a reference set is produced the table is the full treated + every-donor unit set (`n_donors + 1` rows). In the fail-closed cases the placebo loop does not run and only the treated row is returned: `J < 2` → `placebo_p_value` is NaN with a warning (no placebo distribution; `J == 2` warns the distribution is coarse), and a treated fit whose own **inner OR outer** search did not converge also fails closed (ranking a truncated / under-optimized treated statistic would not be a valid permutation). **Caveat:** each placebo refit inherits the original fit's `optimizer_options` / `n_starts`, so valid inference requires settings adequate for the outer `V` search to converge to a comparable-quality synthetic (production defaults do; cheap settings under-optimize placebo `V` and those placebos are dropped as failed — raise `n_starts` on `in_space_placebo()` or re-fit with a larger `optimizer_options['maxiter']`). - **Note (RMSPE-ratio floor):** the reported `rmspe_ratio = sqrt(MSPE_post / MSPE_pre)` floors the pre-period MSPE denominator at a scale-aware `1e-8 · max(|pre-outcomes|, 1)²` (before the square root) so a (near-)perfect pre-fit (`pre-MSPE → 0`) yields a large-but-FINITE ratio rather than `inf`/`nan` (which would corrupt the rank). Ties (`ratio_j ≥ treated_ratio`) are counted, making the p-value conservative. Mirrors the `_fit_tol` poor-fit guard. - **Note (placebo p-value is non-analytical):** `placebo_p_value` is deliberately a SEPARATE field from `p_value` (which stays NaN) — it is a permutation p-value with no SE / t-stat, so it does not flow through `safe_inference`. `is_significant` likewise stays bound to the (NaN) `p_value`, NOT `placebo_p_value`; a tool gating on `is_significant` will see `False` even when `placebo_p_value` is small. The reporting stack surfaces the placebo p-value through `estimator_native_diagnostics`, never the analytical headline. +- **Note (confidence set is non-analytical — `conf_int` stays NaN):** the Firpo & Possebom test-inversion `effect_confidence_set` is a permutation set at level `1−γ`, kept DELIBERATELY SEPARATE from the analytical `conf_int` (which stays `(NaN, NaN)` — classic SCM has no Wald interval). It is parametrized by `γ` (not the estimator's `alpha`, and granular in `1/(J+1)`), can be a *set* rather than an interval (the linear family) and can be unbounded or non-contiguous, so it cannot be coerced into the `(lo, hi)` `conf_int` tuple without breaking the `safe_inference` NaN-consistency contract. `se`/`t_stat`/`p_value`/`conf_int`/`is_significant` all stay at their NaN state; the set is surfaced only via `confidence_set()` / `get_confidence_set_df()` / `effect_confidence_set` (and `estimator_native_diagnostics`) — mirroring how `placebo_p_value` is kept off `p_value`. +- **Note (test-inversion boundary convention — strict `p^f > γ`):** Firpo & Possebom's inequalities are non-uniform at `p = γ` — the RMSPE-based tests reject at `p < γ` (Eqs 5/9/13), the general-statistic test rejects at `p ≤ γ` (Eq 19), and the confidence set is the **strict** `p^f > γ` (Eq 14), so Eq 14's set is NOT the exact complement of the Eq 13 rejection region (they differ at `p^f = γ`). Because the permutation p-value is discrete (a multiple of `1/(J+1)`), `p = γ` is reachable, so this implementation pins Eq 14's strict `p^f > γ` for set membership (a `p = γ` value is **excluded**) and documents it (matches the `firpo-possebom-2018-review.md` §4.2 boundary note). +- **Note (test-inversion set construction is an implementation choice):** Firpo & Possebom §4.2 gives the set definitions (Eqs 14/16/18) but does NOT prescribe how to enumerate the set. The default is **exact piecewise-constant breakpoint inversion** — `p^c` is constant between the real roots of the placebo-vs-treated comparison quadratics, so evaluating `p` per induced interval and at each breakpoint recovers the set exactly (no grid resolution / shape assumption); the optional fixed `bounds=` grid is the grid-limited alternative. Either is OUR choice (the paper leaves it unspecified) — a documented deviation. (`n_grid` controls only the returned inspection table, not membership, when `bounds=None`.) +- **Note (test-inversion validation — no R anchor):** R `Synth` has no test-inversion confidence-set function, and the authors' Code Ocean replication capsule was not consulted (paper-sourced only). The confidence sets are validated by (a) **self-consistency** — `test_sharp_null(0)` equals the in-space `placebo_p_value` exactly (transitively R-anchored via the Basque placebo parity), (b) a **numpy oracle** re-implementing Eqs 12–14 on hand-built gap paths (incl. the strict `p = γ` boundary and the per-unit floor), and (c) a **coverage simulation** (a constant-effect DGP; the `1−γ` set covers the truth at ≈ `1−γ`). - **Note (in-time placebo windowing — TRUNCATE):** ADH 2015 §4 says to re-estimate the in-time placebo "with the same predictors lagged accordingly." Because `diff_diff`'s predictor specs reference **absolute** periods, the in-time placebo re-cuts them by TRUNCATION: pre-period-outcome predictors become the pre-`t_f` outcomes, and covariate / special-predictor windows are intersected with the pre-`t_f` window; a window lying ENTIRELY in the held-out region `[t_f, T0)` is **dropped** (surfaced in the `n_dropped_specs` column + an aggregated warning), and `custom_v` is subset in lockstep with the surviving specs. For an outcome-predictor fit (the R-anchorable case) TRUNCATE is identical to ADH's "lag" — both equal a manual `Synth::synth` re-run with `time.optimize.ssr` cut at `t_f`. The held-out window never enters the fit (the placebo's `all_periods` is the pre-fake + post-fake span; the true post-treatment periods are excluded entirely), so there is no "peeking." This concrete convention is NOT spelled out in ADH 2015 (which gives only the qualitative "lag accordingly"). - **Note (in-time placebo requires ≥2 pre-fake periods):** the in-time placebo treats a date with fewer than 2 pre-fake periods as `status="infeasible"` (the default sweep starts at the 3rd pre-period). This is DELIBERATELY stricter than the base estimator's `T0 ≥ 1` allowance (which permits a single-pre-period fit but warns that nested-`V` selection is unreliable): an auto-swept placebo date with a single pre-fake period is a trivially-matchable, non-credible pre-fit, so it is dropped rather than surfaced as a `ran` placebo (mirrors `SyntheticDiD.in_time_placebo`'s `i ≥ 2` rule). A date whose surviving `custom_v` has zero mass after truncation is likewise infeasible (not a convergence failure). - **Note (leave-one-out weight floor):** ADH 2015 §4 leave-one-out omits "each donor that received positive weight." This implementation drops each donor with **reportable** weight — above the `1e-6` interpretability floor (`synthetic_control._MIN_REPORT_WEIGHT`), i.e. exactly the donors in `donor_weights` — rather than every strictly-positive weight. A donor with `0 < w ≤ 1e-6` is numerical dust whose removal moves the ATT by ~its weight (its `delta_att` would be ~0, an uninformative row), and the floor keeps the LOO table aligned with the reported donor support. The drop-set is **frozen at fit time** on the fit snapshot (`weighted_donor_ids`), so `leave_one_out()` is immune to post-fit mutation of the presentation-level `donor_weights` dict. @@ -2031,6 +2039,7 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d - [x] No analytical SE (NaN inference); in-space placebo permutation inference (`in_space_placebo()`, `rank/(n_placebos+1)`) with the real treated unit excluded from every placebo pool, effective-count denominator, and a scale-aware RMSPE-ratio floor. - [x] Leave-one-out donor robustness (`leave_one_out()`, ADH 2015 §4): per-drop ATT / `delta_att` table + overlay gaps; fail-closed. - [x] In-time (backdating) placebo (`in_time_placebo()`, ADH 2015 §4): TRUNCATE windowing (drop held-out-window predictors + lockstep `custom_v` subset), feasible-date sweep, fail-closed. +- [x] Confidence sets by test inversion (`test_sharp_null()` + `confidence_set()`, Firpo & Possebom 2018 §4): sharp-null `RMSPE^f` re-ranking of the in-space placebo gaps (Eqs 12–13) + constant/linear one-parameter sets (Eqs 14/16/18) with the strict `p^f > γ` boundary, EXACT piecewise-constant breakpoint inversion (no shape assumption; isolated/disjoint/unbounded sets handled), and fail-closed unbounded/empty/non-contiguous handling. *Deferred:* sensitivity weights (φ≠0), the general-θ menu (Eq 19), one-sided (§7), multiple-outcome/treated (§6). - [ ] *Deferred (ADH 2015):* regression-weight `W^reg` extrapolation diagnostic, sparse-SC subset search (see `TODO.md`). - [x] Predictor-leakage, absorbing-suffix/no-anticipation, empty-window, duplicate-label, and inner-non-convergence validation gates. diff --git a/docs/methodology/papers/firpo-possebom-2018-review.md b/docs/methodology/papers/firpo-possebom-2018-review.md index f2fae1f0..cd34181c 100644 --- a/docs/methodology/papers/firpo-possebom-2018-review.md +++ b/docs/methodology/papers/firpo-possebom-2018-review.md @@ -5,13 +5,13 @@ **PDF reviewed:** https://doi.org/10.1515/jci-2016-0026 (published *Journal of Causal Inference* version, open access; received 15 Nov 2016, revised 6 Aug 2018, accepted 11 Aug 2018, 26 pp). Per the project's PDFs-never-committed convention the local PDF is kept outside the repository; the published J. Causal Inference version (DOI 10.1515/jci-2016-0026) is the authoritative source. All equation, section, and footnote numbers below are pinned to that version. **Review date:** 2026-06-01 -> Scope note: this paper extends the **permutation / placebo inference** procedure of Abadie, Diamond & Hainmueller (the SCM benchmark) in two ways — (1) a **sensitivity analysis** that parametrically re-weights the placebo p-value away from the equal-weights benchmark, and (2) testing **any sharp null hypothesis** (not only "no effect whatsoever") via a modified RMSPE statistic, which it **inverts to construct confidence sets** for the treatment-effect path. It also generalizes to arbitrary test statistics, multiple outcomes (familywise error control), and multiple treated units (a pooled effect). This review is the **Step-1 fidelity artifact** for a forthcoming SCM **confidence-set / CI-by-test-inversion** implementation (PR-B) layered on the existing `SyntheticControl` estimator; the sensitivity-analysis and multiple-outcome / multiple-treated extensions are documented here but flagged **deferred**. The estimator itself (donor weights `W`, predictor importance `V`) is taken as given from ADH 2010/2015 — already implemented as `SyntheticControl` — and is recapped only as the paper frames it. Nothing here is sourced from outside this paper. +> Scope note: this paper extends the **permutation / placebo inference** procedure of Abadie, Diamond & Hainmueller (the SCM benchmark) in two ways — (1) a **sensitivity analysis** that parametrically re-weights the placebo p-value away from the equal-weights benchmark, and (2) testing **any sharp null hypothesis** (not only "no effect whatsoever") via a modified RMSPE statistic, which it **inverts to construct confidence sets** for the treatment-effect path. It also generalizes to arbitrary test statistics, multiple outcomes (familywise error control), and multiple treated units (a pooled effect). This review is the **Step-1 fidelity artifact** for the SCM **confidence-set / CI-by-test-inversion** implementation (PR-B, **shipped** — `SyntheticControlResults.test_sharp_null()` / `confidence_set()`) layered on the existing `SyntheticControl` estimator; the sensitivity-analysis and multiple-outcome / multiple-treated extensions are documented here but flagged **deferred**. The estimator itself (donor weights `W`, predictor importance `V`) is taken as given from ADH 2010/2015 — already implemented as `SyntheticControl` — and is recapped only as the paper frames it. Nothing here is sourced from outside this paper. --- ## Methodology Registry Entry -*Formatted to match docs/methodology/REGISTRY.md. This documents an **inference procedure on the existing `SyntheticControl` estimator**, not a new estimator — the `## SyntheticControl` heading mirrors `abadie-2021-review.md`. The REGISTRY implementation contract (`docs/methodology/REGISTRY.md` §SyntheticControl) is unchanged by this docs-only PR-A; PR-B will add the confidence-set methodology subsection and flip the relevant checklist items.* +*Formatted to match docs/methodology/REGISTRY.md. This documents an **inference procedure on the existing `SyntheticControl` estimator**, not a new estimator — the `## SyntheticControl` heading mirrors `abadie-2021-review.md`. The REGISTRY implementation contract (`docs/methodology/REGISTRY.md` §SyntheticControl) was unchanged by the docs-only PR-A; PR-B (shipped) added the confidence-set methodology subsection there and flipped the relevant checklist items below.* ## SyntheticControl @@ -132,10 +132,10 @@ A re-analysis of ETA terrorism on Basque Country GDP per capita (Abadie & Gardea - Built on the authors' `Synth` package (R / MATLAB / Stata) for the underlying SCM fit. **Requirements checklist** (features this paper adds beyond ADH 2010/2015; **PR-B** = the planned next implementation target, **deferred** = later): -- [ ] (PR-B) Sharp-null `RMSPE^f` test (Eqs. 12–13) reusing the in-space placebo permutation — subtract the hypothesized `f(t)` from the post-period gaps and re-rank. -- [ ] (PR-B) Confidence **interval** for a constant-in-time effect (Eqs. 15–16) by test inversion over a `c`-grid. -- [ ] (PR-B) Confidence **set** for a linear-in-time effect (Eqs. 17–18) by test inversion over a `c̃`-grid. -- [ ] (PR-B) Benchmark `(φ = 0, v = (1,…,1))` p-value (reuse `in_space_placebo`'s RMSPE-ratio) + a one-sided variant (Section 7). +- [x] (PR-B) Sharp-null `RMSPE^f` test (Eqs. 12–13) reusing the in-space placebo permutation — subtract the hypothesized `f(t)` from the post-period gaps and re-rank. **Shipped:** `SyntheticControlResults.test_sharp_null(effect, gamma=...)`. +- [x] (PR-B) Confidence **interval** for a constant-in-time effect (Eqs. 15–16) by test inversion over a `c`-grid. **Shipped:** `confidence_set(family="constant")`. +- [x] (PR-B) Confidence **set** for a linear-in-time effect (Eqs. 17–18) by test inversion over a `c̃`-grid. **Shipped:** `confidence_set(family="linear")`. +- [x] (PR-B) Benchmark `(φ = 0, v = (1,…,1))` p-value (reuse `in_space_placebo`'s RMSPE-ratio): shipped — `test_sharp_null(0)` is identically `placebo_p_value`. **One-sided variant (Section 7): still `[ ]` deferred** — §7 uses the signed-`t` statistic `θ³` from the deferred general-`θ` menu (Eq. 19), so it ships with that menu, not here. - [ ] (deferred) Sensitivity-analysis parametric weights `π_(j)(φ, v)` (Eqs. 7–9) + worst/best-case `φ̲`/`φ̄` robustness curve (Section 3). - [ ] (deferred) General test-statistic menu `θ¹`–`θ⁵` (Eq. 19, Section 5). - [ ] (deferred) Multiple-outcome FWER control (Eqs. 23–24) and multiple-treated-unit pooled confidence sets (Eqs. 25–26, Section 6). diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py index a0a78fcd..8501c426 100644 --- a/tests/test_diagnostic_report.py +++ b/tests/test_diagnostic_report.py @@ -2111,6 +2111,23 @@ def test_scm_native_surfaces_in_time_placebo_after_optin_run(self, scm_fit): block = native["in_time_placebo"] assert block["status"] == "ran" and block["n_dates"] >= 1 + def test_scm_native_confidence_set_not_run_stub(self, scm_fit): + # Firpo-Possebom test-inversion confidence set is opt-in, like the placebos. + res, _ = scm_fit + native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"] + assert native["confidence_set"]["status"] == "not_run" + + def test_scm_native_surfaces_confidence_set_after_optin_run(self, scm_fit): + res, _ = scm_fit + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res.confidence_set(family="constant", gamma=0.34) # J=4 -> gamma > 1/(J+1) + native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"] + block = native["confidence_set"] + assert block["status"] in ("ran", "empty", "unbounded") + assert block["family"] == "constant" and block["parameter"] == "c" + assert block["gamma"] == pytest.approx(0.34) + def test_scm_does_not_call_honest_did(self, scm_fit): """HonestDiD sensitivity should NOT run on SCM (fit-based / native path).""" res, _ = scm_fit diff --git a/tests/test_methodology_synthetic_control.py b/tests/test_methodology_synthetic_control.py index 02eeb494..00c2a9ff 100644 --- a/tests/test_methodology_synthetic_control.py +++ b/tests/test_methodology_synthetic_control.py @@ -38,6 +38,15 @@ SyntheticControlResults, synthetic_control, ) +from diff_diff.synthetic_control import ( + _constant_f_post, + _floored_pre_mspe, + _invert_sharp_null, + _linear_f_post, + _rmspe_f_ratio, + _rmspe_ratio, + _sharp_null_pvalue, +) from tests.conftest import assert_nan_inference DATA_DIR = Path(__file__).parent / "data" @@ -3013,3 +3022,519 @@ def spy(*a, **k): inner_min_decrease=1e-3, ) assert loo_att == pytest.approx(fresh.att, abs=1e-7) + + +# =========================================================================== +# Confidence sets by test inversion (Firpo & Possebom 2018, Section 4) +# =========================================================================== +# +# Two opt-in SyntheticControlResults methods built ON TOP of the in-space placebo: +# test_sharp_null(effect) tests H_0^f: alpha_1t = f(t) by re-ranking the stored +# placebo gaps (Eqs 12-13, phi=0, v=(1,...,1)); confidence_set(family=...) inverts +# that test over a one-parameter family (Eqs 14/16/18, strict p^f > gamma). No SCM +# refits. The benchmark f==0 case is identically the existing placebo_p_value +# (Eq 5 = Eq 13 at f==0). No R anchor (Synth has no test inversion): validated by +# self-consistency to the placebo p-value, a numpy oracle, and a coverage MC. + + +def _gp(pre_vals, post_vals, pre_periods, post_periods): + """Build a {period: gap} path from pre/post value lists.""" + d = {p: float(v) for p, v in zip(pre_periods, pre_vals)} + d.update({p: float(v) for p, v in zip(post_periods, post_vals)}) + return d + + +# Hand-built scenario for the helper-level oracle tests: the treated unit has the +# best pre-fit and a constant +2 post effect; 4 placebos have worse pre-fit + scattered +# post gaps (so for large |c| the treated becomes the most-deviant unit -> bounded set). +_ORACLE_PRE = [0, 1, 2] +_ORACLE_POST = [3, 4, 5] +_ORACLE_TREATED = _gp([0.08, -0.06, 0.05], [2.0, 2.0, 2.0], _ORACLE_PRE, _ORACLE_POST) +_ORACLE_PLACEBOS = { + "a": _gp([0.3, -0.3, 0.3], [0.6, -0.4, 0.2], _ORACLE_PRE, _ORACLE_POST), + "b": _gp([-0.3, 0.3, 0.2], [-0.5, 0.4, 0.3], _ORACLE_PRE, _ORACLE_POST), + "c": _gp([0.2, 0.3, -0.3], [0.3, 0.5, -0.4], _ORACLE_PRE, _ORACLE_POST), + "d": _gp([-0.2, 0.3, 0.3], [0.4, -0.3, 0.5], _ORACLE_PRE, _ORACLE_POST), +} + + +def _oracle_pre_denoms(scale=1.0): + units = {0: _ORACLE_TREATED, **_ORACLE_PLACEBOS} + return { + u: _floored_pre_mspe(np.array([gp[p] for p in _ORACLE_PRE]), scale) + for u, gp in units.items() + } + + +def test_sharp_null_pvalue_matches_independent_oracle(): + pre_denoms = _oracle_pre_denoms() + for c in (0.0, 1.0, 2.0, 3.5, -1.0): + f = _constant_f_post(c, len(_ORACLE_POST)) + p, r1, n_ref = _sharp_null_pvalue( + _ORACLE_TREATED, _ORACLE_PLACEBOS, _ORACLE_POST, f, pre_denoms, 0 + ) + # Independent re-implementation of Eqs 12-13. + resid_t = np.array([_ORACLE_TREATED[q] for q in _ORACLE_POST]) - f + r1_o = float(np.sqrt(np.mean(resid_t**2) / pre_denoms[0])) + rj = [] + for u, gp in _ORACLE_PLACEBOS.items(): + resid = np.array([gp[q] for q in _ORACLE_POST]) - f + rj.append(float(np.sqrt(np.mean(resid**2) / pre_denoms[u]))) + p_o = (1 + sum(1 for r in rj if r >= r1_o)) / (len(rj) + 1) + assert n_ref == 4 + assert r1 == pytest.approx(r1_o) + assert p == pytest.approx(p_o), (c, p, p_o) + + +def test_sharp_null_zero_equals_rmspe_ratio_helper(): + # Eq 13 at f==0 reduces to the ADH RMSPE-ratio statistic (Eq 4). + scale = 1.0 + pre_denoms = _oracle_pre_denoms(scale) + f0 = _constant_f_post(0.0, len(_ORACLE_POST)) + _, r1, _ = _sharp_null_pvalue( + _ORACLE_TREATED, _ORACLE_PLACEBOS, _ORACLE_POST, f0, pre_denoms, 0 + ) + pre = np.array([_ORACLE_TREATED[p] for p in _ORACLE_PRE]) + post = np.array([_ORACLE_TREATED[p] for p in _ORACLE_POST]) + assert r1 == pytest.approx(_rmspe_ratio(pre, post, scale)) + + +def test_floored_pre_denominator_is_per_unit_not_global(): + # M1: the RMSPE floor scale is PER-UNIT max|Z1|. For a near-perfect pre-fit (the + # floor bites), a wrong GLOBAL scale would change the denominator and break the + # f==0 == placebo_p_value anchor. Assert the per-unit denom == the _rmspe_ratio + # denom and differs from a global-scale denom. + pre = np.array([1e-9, -1e-9, 5e-10]) # near-perfect pre-fit -> the floor dominates + post = np.array([2.0, 2.0, 2.0]) + scale_unit = 10.0 + denom_unit = _floored_pre_mspe(pre, scale_unit) # 1e-8 * 10**2 = 1e-6 + denom_global = _floored_pre_mspe(pre, 1.0) # 1e-8 + assert denom_unit == pytest.approx(1e-6) + assert not np.isclose(denom_unit, denom_global) + gp = _gp(list(pre), list(post), _ORACLE_PRE, _ORACLE_POST) + f0 = _constant_f_post(0.0, 3) + assert _rmspe_f_ratio(gp, _ORACLE_POST, f0, denom_unit) == pytest.approx( + _rmspe_ratio(pre, post, scale_unit) + ) + # The wrong global denom yields a materially different statistic. + assert not np.isclose( + _rmspe_f_ratio(gp, _ORACLE_POST, f0, denom_unit), + _rmspe_f_ratio(gp, _ORACLE_POST, f0, denom_global), + ) + + +def test_invert_constant_set_brackets_true_effect(): + pre_denoms = _oracle_pre_denoms() + res = _invert_sharp_null( + _ORACLE_TREATED, _ORACLE_PLACEBOS, _ORACLE_POST, pre_denoms, 0, "constant", 0.25, n_grid=120 + ) + assert res["status"] == "ran" + assert res["point_estimate"] == pytest.approx(2.0) # center = att = mean post gap + assert res["lower"] <= 2.0 <= res["upper"] + assert res["contiguous"] + + +def test_invert_strict_boundary_excludes_p_equals_gamma(): + # Eq 14 membership is STRICT (p^f > gamma). Use fixed wide bounds so the grid spans + # rejected points too; with 4 placebos gamma=0.4 (=2/5) is attainable, so some grid + # points have p == gamma and MUST be excluded. + pre_denoms = _oracle_pre_denoms() + res = _invert_sharp_null( + _ORACLE_TREATED, + _ORACLE_PLACEBOS, + _ORACLE_POST, + pre_denoms, + 0, + "constant", + 0.4, + bounds=(-6.0, 10.0), + n_grid=401, + ) + grid = res["grid"] + assert all(in_set == (p > 0.4) for _, p, in_set in grid) # strict operator + at_gamma = [in_set for _, p, in_set in grid if np.isclose(p, 0.4)] + assert at_gamma, "expected the grid to include a p == gamma point" + assert not any(at_gamma) # every p == gamma point excluded (strict) + + +def test_invert_unbounded_when_gamma_below_granularity(): + # p^f >= 1/(J+1) always (the treated ranks itself); gamma below that -> nothing is + # rejected -> the set is all of R (Firpo & Possebom fn 8). + pre_denoms = _oracle_pre_denoms() + res = _invert_sharp_null( + _ORACLE_TREATED, _ORACLE_PLACEBOS, _ORACLE_POST, pre_denoms, 0, "constant", 0.1, n_grid=10 + ) + assert res["status"] == "unbounded" + assert res["lower"] == -np.inf and res["upper"] == np.inf + + +def test_invert_empty_set_when_family_cannot_fit(): + # A constant treated effect cannot be matched by the LINEAR family; with a near- + # perfect pre-fit (tiny denom) the treated stays the most-deviant unit at every + # slope, so p == 1/(J+1) everywhere -> rejected at gamma > 1/(J+1) -> empty. + treated = _gp([1e-9, -1e-9, 1e-9], [2.0, 2.0, 2.0], _ORACLE_PRE, _ORACLE_POST) + units = {0: treated, **_ORACLE_PLACEBOS} + pre_denoms = { + u: _floored_pre_mspe(np.array([gp[p] for p in _ORACLE_PRE]), 1.0) for u, gp in units.items() + } + res = _invert_sharp_null( + treated, _ORACLE_PLACEBOS, _ORACLE_POST, pre_denoms, 0, "linear", 0.25, n_grid=60 + ) + assert res["status"] == "empty" + assert np.isnan(res["lower"]) and np.isnan(res["upper"]) + + +def test_invert_accepts_tails_when_center_is_rejected(): + # Regression for the centered-bracket bug (reviewer M1): when the treated unit has a + # WORSE pre-fit than the placebos, its accepted region is in the TAILS, not around the + # point estimate, and the central band is REJECTED. The exact breakpoint inversion must + # return the unbounded, non-contiguous (two-tail) set — NOT "empty" (the old bug). + # Treated post gaps [0, 2] with pre-MSPE 100 (poor fit); 4 placebos post [1, 1] pre-MSPE 1. + pre, post = [0, 1], [2, 3] + treated = {0: 10.0, 1: -10.0, 2: 0.0, 3: 2.0} # pre-MSPE=100 -> D=100; post gaps [0, 2] + placebos = { + f"p{k}": {0: 1.0, 1: -1.0, 2: 1.0, 3: 1.0} for k in range(4) + } # pre-MSPE=1; post [1,1] + pden = { + u: _floored_pre_mspe(np.array([gp[p] for p in pre]), 1.0) + for u, gp in {0: treated, **placebos}.items() + } + assert pden[0] == pytest.approx(100.0) and pden["p0"] == pytest.approx(1.0) + res = _invert_sharp_null(treated, placebos, post, pden, 0, "constant", 0.25, n_grid=50) + assert res["status"] == "unbounded" # was wrongly "empty" under the centered bracket + assert res["lower"] == -np.inf and res["upper"] == np.inf + assert res["contiguous"] is False # central rejected band -> two disjoint accepted tails + # The point estimate (att = 1) is itself rejected; far-out values are accepted. + p_center, _, _ = _sharp_null_pvalue(treated, placebos, post, _constant_f_post(1.0, 2), pden, 0) + p_tail, _, _ = _sharp_null_pvalue(treated, placebos, post, _constant_f_post(50.0, 2), pden, 0) + assert p_center <= 0.25 < p_tail + # Under a user-bounded grid the same scenario is grid-limited "ran" but still flagged + # non-contiguous (accepted at both edges, rejected through the middle). + res_b = _invert_sharp_null( + treated, placebos, post, pden, 0, "constant", 0.25, bounds=(-50.0, 50.0), n_grid=201 + ) + assert res_b["status"] == "ran" and res_b["contiguous"] is False + + +def test_invert_includes_accepted_breakpoint_singleton(): + # Reviewer round-2 (M1/DT1): strict p>gamma membership + tie-counting >= means a placebo + # that EXACTLY ties the treated at a (tangent) breakpoint can push p above gamma THERE + # while both neighbouring open intervals are rejected -> an isolated accepted singleton + # the exact inversion must include (NOT report "empty"). Construct a placebo whose RMSPE + # ratio touches the treated's only at c=0; 3 others stay strictly below. + pre, post = [0, 1], [2, 3] + treated = {0: 1.0, 1: -1.0, 2: 1.0, 3: -1.0} # post [1,-1], pre-MSPE 1 -> D=1 + placebos = {"tie": {0: 2.0, 1: -2.0, 2: 2.0, 3: -2.0}} # post [2,-2], pre-MSPE 4: ties at c=0 + for k in range(3): + placebos[f"lo{k}"] = {0: 2.0, 1: -2.0, 2: 0.5, 3: -0.5} # always strictly below treated + pden = { + u: _floored_pre_mspe(np.array([gp[p] for p in pre]), 1.0) + for u, gp in {0: treated, **placebos}.items() + } + # At c=0 the tie places p at 2/5; just off c=0 only the treated ranks (p=1/5). + p_at0, _, _ = _sharp_null_pvalue(treated, placebos, post, _constant_f_post(0.0, 2), pden, 0) + p_near, _, _ = _sharp_null_pvalue(treated, placebos, post, _constant_f_post(0.5, 2), pden, 0) + assert p_at0 == pytest.approx(0.4) and p_near == pytest.approx(0.2) + res = _invert_sharp_null(treated, placebos, post, pden, 0, "constant", 0.25, n_grid=20) + assert res["status"] == "ran" # NOT "empty" -- the singleton is included + assert res["lower"] == pytest.approx(0.0) and res["upper"] == pytest.approx(0.0) + # the returned inspection grid reflects the non-empty singleton (one accepted row, not []) + assert len(res["grid"]) == 1 + g_param, g_p, g_in = res["grid"][0] + assert g_param == pytest.approx(0.0) and g_p > 0.25 and g_in is True + + +def test_linear_f_post_is_one_based_and_czero_equals_constant_zero(): + assert np.allclose(_linear_f_post(1.0, 4), [1.0, 2.0, 3.0, 4.0]) # (t - T0), 1-based + pre_denoms = _oracle_pre_denoms() + p_lin0, _, _ = _sharp_null_pvalue( + _ORACLE_TREATED, _ORACLE_PLACEBOS, _ORACLE_POST, _linear_f_post(0.0, 3), pre_denoms, 0 + ) + p_con0, _, _ = _sharp_null_pvalue( + _ORACLE_TREATED, _ORACLE_PLACEBOS, _ORACLE_POST, _constant_f_post(0.0, 3), pre_denoms, 0 + ) + assert p_lin0 == pytest.approx(p_con0) + + +def test_invert_monotone_in_gamma(): + # A larger gamma rejects more -> a narrower (or equal) confidence set. + pre_denoms = _oracle_pre_denoms() + + def width(g): + r = _invert_sharp_null( + _ORACLE_TREATED, + _ORACLE_PLACEBOS, + _ORACLE_POST, + pre_denoms, + 0, + "constant", + g, + n_grid=120, + ) + return (r["upper"] - r["lower"]) if r["status"] == "ran" else None + + w_lo, w_hi = width(0.25), width(0.45) + assert w_lo is not None and w_hi is not None + assert w_lo >= w_hi - 1e-9 + + +# --- end-to-end (real fit): custom V skips the outer search for speed/determinism --- + + +def _fit_with_placebos(n_donors=6, T=10, T0=6, effect=3.0, seed=0, run_placebo=True): + df, years, t0 = _make_panel(n_donors=n_donors, T=T, T0=T0, effect=effect, seed=seed) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = SyntheticControl(v_method="custom", custom_v=np.ones(T0), seed=seed).fit( + df, "y", "treated", "unit", "year", post_periods=years[T0:], treated_unit="treated" + ) + if run_placebo: + res.in_space_placebo() + return res, years, t0 + + +def _exact_combo_fit(effect=3.0, T=10, T0=6, n_donors=5): + """Deterministic panel where the treated is an EXACT convex combo of two donors. + + The donors carry distinct sinusoidal idiosyncrasies, so no single donor is a convex + combination of the others (placebos fit poorly -> larger pre-denominators), while the + treated reproduces 0.5*d0 + 0.5*d1 (near-perfect pre-fit -> the smallest denominator). + The treated is therefore uniquely the most-deviant unit in the tails, so the constant + confidence set is BOUNDED ("ran") -- the end-to-end analogue of the helper oracle. + """ + years = list(range(2000, 2000 + T)) + t = np.arange(T, dtype=float) + donors = { + j: 10.0 + 2.0 * j + (0.3 + 0.1 * j) * t + (0.5 + 0.3 * j) * np.sin(t) + for j in range(n_donors) + } + treated = (0.5 * donors[0] + 0.5 * donors[1]).copy() + treated[T0:] += effect + rows = [] + for j in range(n_donors): + for i in range(T): + rows.append({"unit": f"d{j}", "year": years[i], "y": float(donors[j][i]), "treated": 0}) + for i in range(T): + rows.append( + {"unit": "treated", "year": years[i], "y": float(treated[i]), "treated": int(i >= T0)} + ) + df = pd.DataFrame(rows) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = SyntheticControl(v_method="custom", custom_v=np.ones(T0), seed=0).fit( + df, "y", "treated", "unit", "year", post_periods=years[T0:], treated_unit="treated" + ) + res.in_space_placebo() + return res + + +def test_test_sharp_null_zero_equals_placebo_p_value_end_to_end(): + res, _, _ = _fit_with_placebos() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + s0 = res.test_sharp_null(0.0) + assert s0["p_value"] == pytest.approx(res.placebo_p_value) + assert s0["rmspe_f_treated"] == pytest.approx(res.rmspe_ratio) + assert s0["n_placebos"] == res.n_placebos + + +def test_confidence_set_constant_contains_att_and_excludes_zero(): + res = _exact_combo_fit(effect=3.0) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + grid = res.confidence_set(family="constant", gamma=0.25) + ecs = res.effect_confidence_set + assert ecs["status"] == "ran" + assert ecs["lower"] <= res.att <= ecs["upper"] # point estimate inside the set + assert not (ecs["lower"] <= 0.0 <= ecs["upper"]) # a real +3 effect -> 0 excluded + assert list(grid.columns) == ["param", "p_value", "in_set"] + assert ecs["boundary"] == "strict" and ecs["parameter"] == "c" + + +def test_test_sharp_null_array_path_matches_scalar_and_validates(): + res, _, _ = _fit_with_placebos() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + s_scalar = res.test_sharp_null(3.0) + s_array = res.test_sharp_null(np.array([3.0, 3.0, 3.0, 3.0])) + assert s_array["p_value"] == pytest.approx(s_scalar["p_value"]) + for bad in ( + np.array([1.0, 2.0]), # wrong length + np.array([[1.0, 2.0, 3.0, 4.0]]), # 2-D + np.array([np.nan, 1.0, 1.0, 1.0]), # non-finite + ): + with pytest.raises(ValueError): + res.test_sharp_null(bad) + + +def test_confidence_set_conf_int_stays_nan(): + res, _, _ = _fit_with_placebos() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res.confidence_set(family="constant", gamma=0.25) + # The analytical fields stay NaN: this is a SEPARATE permutation object. + assert np.isnan(res.se) and np.isnan(res.p_value) and np.isnan(res.t_stat) + assert np.isnan(res.conf_int[0]) and np.isnan(res.conf_int[1]) + assert res.effect_confidence_set is not None + + +def test_confidence_set_to_dict_flattened_and_summary_renders(): + res = _exact_combo_fit(effect=3.0) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res.confidence_set(family="constant", gamma=0.25) + d = res.to_dict() + assert d["effect_ci_status"] == "ran" + assert d["effect_ci_family"] == "constant" and d["effect_ci_parameter"] == "c" + assert np.isnan(d["conf_int_lower"]) # analytical interval still NaN + assert np.isfinite(d["effect_ci_lower"]) and np.isfinite(d["effect_ci_upper"]) + row = res.to_dataframe() # stays a single row of scalars + assert len(row) == 1 and "effect_ci_lower" in row.columns + assert "Confidence set by test inversion" in res.summary() + + +def test_confidence_set_linear_runs_and_sets_field(): + res, _, _ = _fit_with_placebos() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + grid = res.confidence_set(family="linear", gamma=0.25) + ecs = res.effect_confidence_set + assert ecs["family"] == "linear" and ecs["parameter"] == "c_tilde" + assert ecs["status"] in ("ran", "empty", "unbounded") + assert list(grid.columns) == ["param", "p_value", "in_set"] + + +def test_confidence_set_lazy_runs_in_space_placebo(): + res, _, _ = _fit_with_placebos(run_placebo=False) + assert res._placebo_gaps is None # placebo NOT run yet + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res.confidence_set(family="constant", gamma=0.25) + assert res._placebo_gaps is not None # lazily built + assert res.effect_confidence_set is not None + + +def test_confidence_set_n_starts_ignored_with_warning_when_reference_exists(): + res, _, _ = _fit_with_placebos(run_placebo=True) # reference already built + with pytest.warns(UserWarning, match="n_starts is ignored"): + res.confidence_set(family="constant", gamma=0.25, n_starts=3) + + +def test_get_confidence_set_df_requires_run(): + res, _, _ = _fit_with_placebos() + with pytest.raises(ValueError, match="No confidence set"): + res.get_confidence_set_df() + + +def test_in_space_placebo_rerun_invalidates_confidence_set(): + # CI-review P1: a confidence set is computed against the CURRENT placebo reference set, + # so an explicit in_space_placebo() rebuild (which _require_placebo_reference even + # suggests, via n_starts) must INVALIDATE the cached set rather than report a stale one. + res = _exact_combo_fit(effect=3.0) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res.confidence_set(family="constant", gamma=0.25) + assert res.effect_confidence_set is not None + native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"] + assert native["confidence_set"]["status"] == "ran" + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res.in_space_placebo(n_starts=2) # rebuild the reference set + assert res.effect_confidence_set is None + with pytest.raises(ValueError, match="No confidence set"): + res.get_confidence_set_df() + native2 = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"] + assert native2["confidence_set"]["status"] == "not_run" + + +def test_confidence_set_too_few_donors_raises(): + # One donor -> in_space_placebo cannot form a reference set -> CI / test raise. + df, years, T0 = _make_panel(n_donors=1, T=10, T0=6) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = SyntheticControl(v_method="custom", custom_v=np.ones(T0), seed=0).fit( + df, "y", "treated", "unit", "year", post_periods=years[T0:], treated_unit="treated" + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + with pytest.raises(ValueError, match="reference set"): + res.confidence_set(family="constant", gamma=0.25) + with pytest.raises(ValueError, match="reference set"): + res.test_sharp_null(0.0) + + +def test_confidence_set_unpickled_raises(): + res, _, _ = _fit_with_placebos() + restored = pickle.loads(pickle.dumps(res)) # snapshot + placebo gaps dropped + with pytest.raises(ValueError): + restored.confidence_set(family="constant", gamma=0.25) + + +@pytest.mark.parametrize( + "kwargs", + [ + {"family": "quadratic"}, + {"gamma": 0.0}, + {"gamma": 1.0}, + {"n_grid": 1}, + {"bounds": (1.0, 1.0)}, + {"bounds": (2.0, 1.0)}, + {"bounds": 5.0}, # scalar -> ValueError, not a bare TypeError from len() + {"bounds": (1.0,)}, # wrong length + {"bounds": (np.inf, 1.0)}, # non-finite + ], +) +def test_confidence_set_input_validation(kwargs): + res, _, _ = _fit_with_placebos() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + with pytest.raises(ValueError): + res.confidence_set(**kwargs) + + +@pytest.mark.parametrize("bad", [0.0, 1.0, -0.1, 1.5]) +def test_test_sharp_null_gamma_validation(bad): + res, _, _ = _fit_with_placebos() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + with pytest.raises(ValueError): + res.test_sharp_null(0.0, gamma=bad) + + +@pytest.mark.slow +def test_confidence_set_coverage_simulation(): + # Behavioral coverage check: under a constant-effect DGP the (1 - gamma) confidence + # set should cover the true effect at roughly 1 - gamma. A looser inner tolerance keeps + # the refits converging cleanly; reps with ANY dropped placebo (n_failed > 0) are + # EXCLUDED from the coverage count so dropped placebos cannot bias it (M5), and we + # assert that the large majority of reps are clean (the settings are adequate). + # J = 9 -> attainable p in multiples of 1/10, gamma = 0.1. + gamma = 0.1 + c_true = 2.0 + reps = 100 + clean = 0 + covered = 0 + for s in range(reps): + df, years, T0 = _make_panel(n_donors=9, T=10, T0=6, effect=c_true, seed=1000 + s) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = SyntheticControl( + v_method="custom", custom_v=np.ones(T0), seed=s, inner_min_decrease=1e-3 + ).fit( + df, "y", "treated", "unit", "year", post_periods=years[T0:], treated_unit="treated" + ) + res.in_space_placebo() + if res.n_failed != 0: + continue # exclude biased reps from the coverage count (M5) + clean += 1 + res.confidence_set(family="constant", gamma=gamma) + ecs = res.effect_confidence_set + if ecs["status"] == "unbounded": + covered += 1 # an unbounded set trivially covers the truth + elif ecs["status"] == "ran" and ecs["lower"] <= c_true <= ecs["upper"]: + covered += 1 + assert clean >= 0.8 * reps, f"only {clean}/{reps} reps converged cleanly" + coverage = covered / clean + # Permutation inference is finite-sample valid under exchangeability; allow a wide + # band (the convex-combo treated is not perfectly exchangeable with single donors). + assert coverage >= 0.70, f"coverage {coverage:.3f} too low (target ~{1 - gamma})"