
Commit 25c364f

igerber and claude committed
utils: enforce all-or-nothing NaN inference contract on degenerate bootstrap
CI review (R6) caught a new P0 in my R5 wild_bootstrap_se() fix: the degenerate-bootstrap branches violated the all-or-nothing NaN contract from feedback_bootstrap_nan_on_invalid_contract:

- n_valid == 0 returned p_value = 1.0 with se = NaN (split inference).
- valid_coefs.size == 1 returned a finite percentile CI from a single draw alongside se = NaN.
- t_stat_original was always finite (analytical), so it surfaced alongside a NaN bootstrap se whenever the bootstrap was degenerate.

Fix: when n_valid < 2 OR valid_coefs.size < 2, NaN-out the entire inference quadruple (se, p_value, ci_lower, ci_upper) AND the surfaced t_stat_original. The analytical t-stat from step 1 is still computed for diagnostic use inside the helper, but it is not propagated to the user-facing result on a degenerate bootstrap. This prevents the estimator wrapper from emitting an analytical t-stat alongside NaN bootstrap fields, which would mix inference families on the same coefficient.

New regression tests in tests/test_wild_bootstrap.py::TestWildBootstrapDegenerateAllNaN:

- test_degenerate_n_valid_zero_returns_all_nan: monkeypatches solve_ols so every bootstrap draw has a singular vcov; asserts ALL five user-surface fields are NaN.
- test_degenerate_single_valid_draw_returns_all_nan: forces exactly one valid draw (n_valid == 1); asserts ALL five fields are NaN (no percentile CI from a single-point sample).

Neither branch was exercised by the existing analytical-design tests, which is why the R5 fix passed while the R6 reviewer caught the contract violation through code inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
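The gating rule in the Fix paragraph reduces to one check made before any inference field is populated. Below is a minimal sketch of that all-or-nothing contract, not the diff_diff implementation: `bootstrap_inference` and `InferenceSurface` are hypothetical names, and the inputs are assumed to be the per-draw bootstrap t-statistics and coefficients that wild_bootstrap_se() collects.

```python
import math
from typing import NamedTuple

import numpy as np


class InferenceSurface(NamedTuple):
    """Hypothetical container for the five user-facing inference fields."""

    se: float
    p_value: float
    t_stat: float
    ci_lower: float
    ci_upper: float


def bootstrap_inference(
    bootstrap_t_stats: np.ndarray,
    bootstrap_coefs: np.ndarray,
    t_stat_original: float,
    alpha: float = 0.05,
) -> InferenceSurface:
    """Hypothetical helper: every field is populated, or every field is NaN."""
    finite_t = bootstrap_t_stats[np.isfinite(bootstrap_t_stats)]
    valid_coefs = bootstrap_coefs[np.isfinite(bootstrap_coefs)]

    # Degenerate bootstrap: fewer than two finite t-stats or coefficients.
    # NaN-out the whole surface, including the analytical t-stat.
    if finite_t.size < 2 or valid_coefs.size < 2:
        nan = float("nan")
        return InferenceSurface(nan, nan, nan, nan, nan)

    n_valid = finite_t.size
    # Two-sided bootstrap p-value, floored at 1/(n_valid + 1) to avoid an exact zero.
    p_value = float(np.mean(np.abs(finite_t) >= abs(t_stat_original)))
    p_value = max(p_value, 1.0 / (n_valid + 1))

    return InferenceSurface(
        se=float(np.std(valid_coefs, ddof=1)),
        p_value=p_value,
        t_stat=t_stat_original,
        ci_lower=float(np.percentile(valid_coefs, alpha / 2 * 100)),
        ci_upper=float(np.percentile(valid_coefs, (1 - alpha / 2) * 100)),
    )


# Usage: a healthy bootstrap vs. one where a single t-stat survived.
rng = np.random.default_rng(0)
healthy = bootstrap_inference(rng.normal(size=999), rng.normal(size=999), t_stat_original=2.0)
degenerate = bootstrap_inference(np.array([1.3, np.nan]), np.array([0.4, 0.5]), t_stat_original=2.0)
assert all(math.isfinite(v) for v in healthy)
assert all(math.isnan(v) for v in degenerate)
```

Returning all five fields from a single branch, rather than NaN-ing them one at a time, is what keeps them from drifting apart; the NumPy reductions are the same ones used in the diff.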
1 parent 7f5df65 commit 25c364f

2 files changed

Lines changed: 281 additions & 234 deletions

diff_diff/utils.py

Lines changed: 27 additions & 19 deletions
```diff
@@ -624,40 +624,48 @@ def wild_bootstrap_se(
         if se_star > 0 and np.isfinite(beta_star[coefficient_index]):
             bootstrap_t_stats[b] = (beta_star[coefficient_index] - null_hypothesis) / se_star
 
-    # Step 4: Compute bootstrap p-value from VALID (finite) draws only
+    # Step 4: Compute bootstrap inference from VALID (finite) draws only.
+    #
+    # All-or-nothing NaN contract (per feedback_bootstrap_nan_on_invalid_contract):
+    # when bootstrap output is degenerate (fewer than 2 finite t-stats or
+    # 2 finite coefs), return NaN across the full inference surface (se,
+    # p_value, both CI endpoints, AND the surfaced t_stat_original). The
+    # original analytical t_stat is still computed in step 1 for diagnostic
+    # use but is NOT propagated to the user-facing result when bootstrap
+    # is degenerate — surfacing it alongside NaN se/p/CI would mix
+    # analytical and bootstrap inference families on the same coefficient.
     finite_mask = np.isfinite(bootstrap_t_stats)
     n_valid = int(finite_mask.sum())
-    if n_valid == 0:
-        # All bootstrap draws were singular; fall back to a conservative
-        # p-value of 1.0 rather than silently returning a misleading value.
-        p_value = 1.0
-    else:
-        p_value = float(np.mean(np.abs(bootstrap_t_stats[finite_mask]) >= np.abs(t_stat_original)))
-        # Ensure p-value is at least 1/(n_valid+1) to avoid exact zero.
-        p_value = float(max(p_value, 1 / (n_valid + 1)))
-
-    # Step 5: Compute bootstrap SE and confidence interval from valid draws
-    # only (use nan-safe reductions, mirroring the p-value filtering above).
     valid_coefs = bootstrap_coefs[np.isfinite(bootstrap_coefs)]
-    if valid_coefs.size >= 2:
-        se_bootstrap = float(np.std(valid_coefs, ddof=1))
-    else:
-        se_bootstrap = float("nan")
 
-    # Percentile confidence interval from bootstrap distribution
     lower_percentile = alpha / 2 * 100
     upper_percentile = (1 - alpha / 2) * 100
-    if valid_coefs.size >= 1:
+
+    if n_valid >= 2 and valid_coefs.size >= 2:
+        p_value = float(np.mean(np.abs(bootstrap_t_stats[finite_mask]) >= np.abs(t_stat_original)))
+        # Ensure p-value is at least 1/(n_valid+1) to avoid exact zero.
+        p_value = float(max(p_value, 1 / (n_valid + 1)))
+        se_bootstrap = float(np.std(valid_coefs, ddof=1))
         ci_lower = float(np.percentile(valid_coefs, lower_percentile))
         ci_upper = float(np.percentile(valid_coefs, upper_percentile))
+        surfaced_t_stat = t_stat_original
     else:
+        # Degenerate bootstrap (insufficient valid draws): NaN-out the
+        # entire inference tuple. Downstream consumers (estimator-level
+        # `_run_wild_bootstrap_inference`) map these fields directly onto
+        # the result object; this guarantees the (se, t_stat, p_value, ci)
+        # quadruple moves together rather than reporting analytical t_stat
+        # with NaN se.
+        p_value = float("nan")
+        se_bootstrap = float("nan")
         ci_lower = float("nan")
         ci_upper = float("nan")
+        surfaced_t_stat = float("nan")
 
     return WildBootstrapResults(
         se=se_bootstrap,
         p_value=p_value,
-        t_stat_original=t_stat_original,
+        t_stat_original=surfaced_t_stat,
         ci_lower=ci_lower,
         ci_upper=ci_upper,
         n_clusters=n_clusters,
```
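The post-fix gate requires at least two finite draws before any inference field is computed. Plain NumPy shows why the removed `if valid_coefs.size >= 1:` branch was a problem: with one surviving draw, the percentile interval collapses to a single point, while the ddof=1 standard deviation is undefined. That is exactly the "finite CI alongside se = NaN" split described in the commit message. This is a minimal illustration only; the draw value 0.42 is made up, and nothing here calls diff_diff.

```python
import math
import warnings

import numpy as np

alpha = 0.05
# Hypothetical: only one bootstrap coefficient survived the finite filter.
single_draw = np.array([0.42])

# Percentile "CI" from one draw: both tails return the draw itself, giving a
# zero-width interval that looks finite but carries no sampling information.
ci_lower = float(np.percentile(single_draw, alpha / 2 * 100))
ci_upper = float(np.percentile(single_draw, (1 - alpha / 2) * 100))

# The sample SD (ddof=1) divides by n - 1 = 0, so NumPy returns nan and emits
# RuntimeWarnings; they are silenced here for the illustration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    se = float(np.std(single_draw, ddof=1))

print(ci_lower, ci_upper, se)  # 0.42 0.42 nan
assert ci_lower == ci_upper and math.isnan(se)
```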
