SyntheticControl: confidence sets by test inversion (Firpo-Possebom 2018, PR-B)#527
igerber wants to merge 2 commits
…018, PR-B)

Add two opt-in SyntheticControlResults methods that re-rank the in-space placebo gaps into a confidence set for the treatment-effect path (no synthetic-control refits):

- test_sharp_null(effect, gamma): test H_0: alpha_1t = f(t) (Eqs 12-13, phi=0, v=1); test_sharp_null(0) is identically placebo_p_value.
- confidence_set(family="constant"|"linear", gamma): invert that test (Eqs 14/16/18, strict p^f > gamma). p^param is piecewise-constant, so the set is recovered EXACTLY via the placebo breakpoints (real roots of the pairwise RMSPE-equality quadratics) -- no shape/centering assumption, so accepted tails, disjoint components, empty and unbounded sets are all handled.

effect_confidence_set summary + get_confidence_set_df() grid; analytical conf_int/se/t_stat/p_value stay NaN. Persist per-unit floored pre-denominators in in_space_placebo so the f=0 anchor holds bit-for-bit. Wire an opt-in confidence_set block into _scm_native (numeric endpoints only for a bounded set; JSON-safe). Fail-closed for unbounded (gamma<1/(J+1) or treated-not-best-pre-fit) / empty / non-contiguous / <2 donors / non-converged / unpickled.

Docs: REGISTRY methodology block + checklist + Notes (boundary/grid/non-analytical/no-R-anchor); review checklist flip; LLM guides; api rst; CHANGELOG; doc-deps.

Tests: numpy oracle (Eqs 12-14 incl strict p=gamma boundary + per-unit floor), the f=0 self-consistency anchor, a center-rejected/tails-accepted regression, invariants, fail-closed paths, and a coverage simulation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
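The sharp-null test and a grid-based inversion described in the commit message can be sketched as follows. This is a minimal NumPy sketch, not the code in this PR. It assumes the statistic is the post/pre mean-squared-gap ratio with `f(t)` subtracted from every unit's post-period gap (one reading of Eqs 12-13 with phi=0). The names `rmspe_ratio`, `sharp_null_p_value`, `grid_confidence_set`, the `floor` value, and the toy `gaps` matrix are all hypothetical.

```python
import numpy as np


def rmspe_ratio(gaps, n_pre, effect=0.0, floor=1e-12):
    """Post/pre mean-squared-gap ratio per unit under H_0: alpha_1t = f(t).

    gaps   : (units, periods) array of outcome-minus-synthetic gaps, treated unit in row 0
    n_pre  : number of pre-treatment periods (the first n_pre columns)
    effect : scalar c (constant family) or a vector f(t) over the post periods
    floor  : hypothetical lower bound on the pre-period denominator
    """
    gaps = np.asarray(gaps, dtype=float)
    pre, post = gaps[:, :n_pre], gaps[:, n_pre:]
    denom = np.maximum((pre ** 2).mean(axis=1), floor)
    numer = ((post - effect) ** 2).mean(axis=1)
    return numer / denom


def sharp_null_p_value(gaps, n_pre, effect=0.0):
    """Share of units whose statistic is at least the treated unit's (treated included)."""
    stat = rmspe_ratio(gaps, n_pre, effect)
    return float(np.mean(stat >= stat[0]))


def grid_confidence_set(gaps, n_pre, gamma, grid):
    """Grid points c that are NOT rejected, using the strict rule p > gamma."""
    grid = np.asarray(grid, dtype=float)
    keep = [sharp_null_p_value(gaps, n_pre, c) > gamma for c in grid]
    return grid[np.array(keep, dtype=bool)]


# Toy data: 1 treated unit (row 0) + 3 donors, 2 pre and 2 post periods.
gaps = np.array([
    [0.1, -0.1, 2.0, 2.2],
    [0.5, -0.5, 0.3, -0.4],
    [0.4, 0.6, -0.2, 0.5],
    [-0.3, 0.2, 0.1, -0.6],
])
grid = np.linspace(-10.0, 10.0, 401)

p_zero = sharp_null_p_value(gaps, n_pre=2, effect=0.0)   # the ordinary placebo p-value
bounded = grid_confidence_set(gaps, n_pre=2, gamma=0.25, grid=grid)
everything = grid_confidence_set(gaps, n_pre=2, gamma=0.20, grid=grid)
```

With J = 3 donors the smallest attainable p-value is 1/(J+1) = 0.25, so `gamma=0.20` can never reject and `everything` covers the whole grid. At `gamma=0.25` the strict comparison rejects every point where only the treated unit attains the maximum, which is the `p = gamma` boundary case the tests call out.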
CI codex (gpt-5.5) findings on PR #527:

- P1: confidence_set() caches effect_confidence_set / _confidence_set_df against the CURRENT in-space placebo reference set, but a later explicit in_space_placebo() rebuild (which _require_placebo_reference suggests via n_starts) overwrote the reference without invalidating the cache -> a stale set could be reported by summary()/to_dict()/_scm_native. Now clear both at the start of in_space_placebo() (after the snapshot check) so every rebuild drops the stale cache.
- P2: add a regression test (confidence_set -> in_space_placebo(n_starts=) -> assert effect_confidence_set is None, get_confidence_set_df() raises, DR status "not_run").
- P3: update the Firpo-Possebom review intro from "forthcoming PR-B" to shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
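The P1 fix follows a common pattern: drop every derived cache before the reference it was computed from is replaced. A minimal sketch of that pattern is below. The class `PlaceboCacheDemo` is a hypothetical stand-in and not the library's `SyntheticControlResults`. Only the attribute and method names quoted in the commit message are borrowed, and their bodies here are placeholders.

```python
class PlaceboCacheDemo:
    """Hypothetical stand-in that only illustrates the invalidation order."""

    def __init__(self):
        self._placebo_reference = None      # what confidence_set() ranks against
        self.effect_confidence_set = None   # cached summary
        self._confidence_set_df = None      # cached grid

    def in_space_placebo(self, n_starts=1):
        # Clear derived caches FIRST, so a rebuild can never leave a stale set behind.
        self.effect_confidence_set = None
        self._confidence_set_df = None
        # Placeholder for the real placebo fit.
        self._placebo_reference = {"n_starts": n_starts}
        return self._placebo_reference

    def confidence_set(self, gamma=0.1):
        if self._placebo_reference is None:
            raise ValueError("run in_space_placebo() first")
        # Placeholder result, tagged with the reference it was built from.
        self.effect_confidence_set = {
            "gamma": gamma,
            "n_starts": self._placebo_reference["n_starts"],
        }
        self._confidence_set_df = [{"effect": 0.0, "accepted": True}]
        return self.effect_confidence_set

    def get_confidence_set_df(self):
        if self._confidence_set_df is None:
            raise ValueError("confidence_set() has not been run on the current reference")
        return self._confidence_set_df


demo = PlaceboCacheDemo()
demo.in_space_placebo()
demo.confidence_set(gamma=0.1)
demo.in_space_placebo(n_starts=5)   # rebuild: the cached set is dropped, not reused
```

The P2 regression test exercises the same sequence: after the rebuild, the cached summary is `None` and the grid accessor raises.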
🔁 AI review rerun (requested by @igerber)

Head SHA:

✅ Looks good

Executive summary

Methodology

Finding 1
Severity: P3 informational

Finding 2
Severity: P3 informational

Code Quality

No findings. The previous stale-cache issue is addressed at

Performance

No findings. The exact breakpoint inversion is documented in

Maintainability

No findings. Public surfaces are documented in

Tech Debt

No untracked blocking tech debt found. Deferred Firpo-Possebom scope is documented in the Registry/paper review, not silently left ambiguous.

Security

No security findings in the changed code/docs.

Documentation/Tests

Finding 3
Severity: P3 informational
Summary
- `docs/methodology/papers/firpo-possebom-2018-review.md`, merged in docs: Firpo & Possebom (2018) paper review — SCM CI by test inversion (PR-A) #524). Gives classic `SyntheticControl` the uncertainty quantification it lacked, a confidence set for the treatment-effect path by test inversion, without changing its always-NaN analytical inference contract.
- `SyntheticControlResults` methods, both a pure re-ranking of the in-space placebo gaps (no synthetic-control refits):
  - `test_sharp_null(effect, gamma=0.1)`: test `H_0: α_1t = f(t)` (Eqs 12–13, φ=0); `test_sharp_null(0)` is bit-for-bit `placebo_p_value`.
  - `confidence_set(family="constant"|"linear", gamma=0.1, bounds=None, n_grid=200)`: invert that test (Eqs 14/16/18, strict `p^f > γ`). With `bounds=None` the set is recovered exactly: `p^c` is piecewise-constant, so the placebo-comparison quadratic breakpoints partition the line; `p` is evaluated per interval and at each breakpoint (a tie under `≥` can spike `p`, so isolated singletons are captured). There is no centering/monotonicity assumption, so accepted tails / disjoint components / unbounded / empty are all handled.
- `effect_confidence_set` summary + `get_confidence_set_df()`; analytical `conf_int`/`se`/`t_stat`/`p_value` stay NaN (a permutation set at level `1−γ`, γ-granular in `1/(J+1)`, possibly a set/unbounded/non-contiguous; kept separate exactly as `placebo_p_value` is kept off `p_value`). Surfaced opt-in under `estimator_native_diagnostics.confidence_set`.
- `γ < 1/(J+1)` or treated-not-best-pre-fit → `unbounded`; nothing accepted → `empty`; disjoint/singleton → `contiguous=False` + warning; `<2` donors / non-converged treated / unpickled → `ValueError`.

Methodology references
- `docs/methodology/papers/firpo-possebom-2018-review.md` (PR-A, docs: Firpo & Possebom (2018) paper review — SCM CI by test inversion (PR-A) #524); new `## SyntheticControl` methodology block + 4 `**Note:**` labels in `docs/methodology/REGISTRY.md` (boundary convention, exact-breakpoint set construction, non-analytical `conf_int`, no-R-anchor validation).
- `p^f > γ` boundary per Eq 14 (documented; the discrete p-value makes `p=γ` reachable).
- Deferred (flagged in the review checklist): sensitivity weights (φ≠0), the general-θ menu (Eq 19), one-sided (§7), multiple-outcome/treated (§6).

Validation
- `tests/test_methodology_synthetic_control.py` (numpy oracle for Eqs 12–14 incl. the strict `p=γ` boundary + the per-unit floor; the `test_sharp_null(0)==placebo_p_value` self-consistency anchor incl. a near-perfect-pre-fit floor-biting case; center-rejected/tails-accepted and isolated-breakpoint-singleton regressions; invariants; fail-closed; input validation; a `@slow` coverage simulation) + `tests/test_diagnostic_report.py` (DR surfacing).
- No R anchor: R `Synth` has no test inversion; validated by self-consistency (transitively R-anchored via the Basque placebo parity), the numpy oracle, and the coverage MC.

Security / privacy
🤖 Generated with Claude Code
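For reviewers who want to see why the constant-family set can be recovered exactly, here is a minimal NumPy sketch of the breakpoint idea. It is not the PR's implementation. It assumes the statistic is the post/pre mean-squared-gap ratio with the constant `c` subtracted from every unit's post-period gap, so that equality between the treated unit and each placebo is a quadratic in `c`. The function names, the `floor` value, and the toy `gaps` matrix are hypothetical. The sketch probes only the open intervals between breakpoints: it skips the tie-aware evaluation at the breakpoints themselves (so isolated singletons are not captured) and does not report whether endpoints are included.

```python
import numpy as np


def p_value(gaps, n_pre, c, floor=1e-12):
    """Sharp-null p-value for a constant effect c (treated unit in row 0)."""
    gaps = np.asarray(gaps, dtype=float)
    denom = np.maximum((gaps[:, :n_pre] ** 2).mean(axis=1), floor)
    stat = ((gaps[:, n_pre:] - c) ** 2).mean(axis=1) / denom
    return float(np.mean(stat >= stat[0]))


def breakpoints(gaps, n_pre, floor=1e-12):
    """Real roots of stat_j(c) = stat_0(c) for every placebo j, sorted and de-duplicated."""
    gaps = np.asarray(gaps, dtype=float)
    post = gaps[:, n_pre:]
    n = post.shape[1]
    d = np.maximum((gaps[:, :n_pre] ** 2).mean(axis=1), floor)
    s1, s2 = post.sum(axis=1), (post ** 2).sum(axis=1)
    roots = []
    for j in range(1, gaps.shape[0]):
        # (s2_j - 2 c s1_j + n c^2)/d_j = (s2_0 - 2 c s1_0 + n c^2)/d_0
        a = n * (1.0 / d[j] - 1.0 / d[0])
        b = -2.0 * (s1[j] / d[j] - s1[0] / d[0])
        k = s2[j] / d[j] - s2[0] / d[0]
        r = np.roots([a, b, k])          # leading zeros are stripped by np.roots
        roots.extend(r[np.isreal(r)].real)
    return np.unique(roots)


def exact_constant_set(gaps, n_pre, gamma):
    """Accepted intervals (lo, hi); p is constant between consecutive breakpoints."""
    edges = np.concatenate(([-np.inf], breakpoints(gaps, n_pre), [np.inf]))
    accepted = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if np.isinf(lo) and np.isinf(hi):
            probe = 0.0
        elif np.isinf(lo):
            probe = hi - 1.0
        elif np.isinf(hi):
            probe = lo + 1.0
        else:
            probe = 0.5 * (lo + hi)
        if p_value(gaps, n_pre, probe) > gamma:      # strict rule
            if accepted and accepted[-1][1] == lo:
                # A tie at the shared breakpoint can only raise p, so merging is safe.
                accepted[-1] = (accepted[-1][0], float(hi))
            else:
                accepted.append((float(lo), float(hi)))
    return accepted


gaps = np.array([
    [0.1, -0.1, 2.0, 2.2],
    [0.5, -0.5, 0.3, -0.4],
    [0.4, 0.6, -0.2, 0.5],
    [-0.3, 0.2, 0.1, -0.6],
])
bounded = exact_constant_set(gaps, n_pre=2, gamma=0.25)
unbounded = exact_constant_set(gaps, n_pre=2, gamma=0.20)   # gamma < 1/(J+1)
```

In this toy example the treated unit has the best pre-period fit, so its statistic dominates for large `|c|` and the `gamma=0.25` set is a single bounded interval around the observed post-period gaps. With `gamma=0.20`, below `1/(J+1) = 0.25`, nothing can be rejected and the result is the whole line.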