Skip to content

Commit 7f5df65

Browse files
igerberclaude
andcommitted
utils: fix wild_bootstrap NaN propagation on rank-deficient designs
CI review (R5) identified a P1 bug in wild_bootstrap_se() that was newly reachable via the TWFE HC2/HC2-BM full-dummy path: Before this fix, wild_bootstrap_se built each draw's pseudo-outcome as `y_star = X @ beta_restricted`. When solve_ols dropped a rank- deficient nuisance column (e.g. a time-invariant covariate collinear with the unit FE on the full-dummy design), beta_restricted contained NaN on the dropped slot, and X @ beta_restricted propagated NaN through every observation. The ATT was analytically identified but the bootstrap crashed because y_star was all-NaN. Pre-PR this was unreachable on TWFE (the within-transform absorbed time-invariant covariates before they entered X), but the new full- dummy HC2/HC2-BM branch keeps unit/time dummies explicit alongside covariates, exposing the bug. Two fixes in wild_bootstrap_se (diff_diff/utils.py): 1. Use solve_ols(return_fitted=True) to get NaN-safe fitted values from the kept columns; build y_star = fitted_restricted + residuals_restricted * obs_weights instead of X @ beta_restricted. fitted_restricted is computed from the kept columns by solve_ols, so dropped nuisance NaN doesn't propagate. 2. Replace bootstrap_t_stats[b] = 0.0 fallback for singular draws with np.nan + a finite_mask filter at the p-value step. Setting t* = 0 biased the p-value downward (|0| < |t_original| counts as non-rejection, but those draws are invalid, not non-rejections). The same nan-safe filter applies to bootstrap_coefs for the SE and percentile CI. New regression test `test_twfe_hc2_wild_bootstrap_survives_rank_deficient_full_dummy` fits TwoWayFixedEffects(vcov_type='hc2', inference='wild_bootstrap', covariates=['x_invariant']) on a panel where x_invariant is time- invariant (collinear with unit FE on the full-dummy design); asserts finite ATT, SE, p-value, and CI. Pre-fix this test crashed with all-NaN y_star. No regression in the existing 53 wild_bootstrap tests across test_wild_bootstrap, test_methodology_did, test_methodology_twfe, test_conley_vcov, test_estimators_vcov_type, test_business_report, test_replicate_weight_expansion, test_survey. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent baceb22 commit 7f5df65

2 files changed

Lines changed: 156 additions & 48 deletions

File tree

diff_diff/utils.py

Lines changed: 103 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -572,16 +572,30 @@ def wild_bootstrap_se(
572572

573573
# Fit restricted model (but we need to drop the column for the restricted coef)
574574
# Actually, for WCR bootstrap we keep all columns but impose the null via residuals
575-
# Re-estimate with the restricted dependent variable
576-
beta_restricted, residuals_restricted, _ = _solve_ols_linalg(X, y_restricted, return_vcov=False)
575+
# Re-estimate with the restricted dependent variable.
576+
#
577+
# Use return_fitted=True so we get NaN-safe fitted values from the kept
578+
# columns when solve_ols drops rank-deficient nuisance columns. Without
579+
# this, building y_star via `X @ beta_restricted` would propagate NaN
580+
# through every observation whenever a nuisance column was dropped
581+
# (e.g. always-treated unit dummy collinear with treated*post on the
582+
# full-dummy TWFE HC2/HC2-BM path), poisoning the entire bootstrap loop
583+
# despite the ATT being analytically identified.
584+
beta_restricted, residuals_restricted, fitted_restricted, _ = _solve_ols_linalg(
585+
X, y_restricted, return_vcov=False, return_fitted=True
586+
)
577587

578588
# Create cluster-to-observation mapping for efficiency
579589
cluster_map = {c: np.where(cluster_ids == c)[0] for c in unique_clusters}
580590
cluster_indices = [cluster_map[c] for c in unique_clusters]
581591

582592
# Step 3: Bootstrap loop
583-
bootstrap_t_stats = np.zeros(n_bootstrap)
584-
bootstrap_coefs = np.zeros(n_bootstrap)
593+
# Use NaN for invalid draws (singular bootstrap SE) and filter at the
594+
# p-value step, rather than coercing to t*=0 which biases the p-value
595+
# toward small values (since |0| < |t_original| counts as "non-rejection"
596+
# only when the original t is large).
597+
bootstrap_t_stats = np.full(n_bootstrap, np.nan)
598+
bootstrap_coefs = np.full(n_bootstrap, np.nan)
585599

586600
for b in range(n_bootstrap):
587601
# Generate cluster-level weights
@@ -592,8 +606,10 @@ def wild_bootstrap_se(
592606
for g, indices in enumerate(cluster_indices):
593607
obs_weights[indices] = cluster_weights[g]
594608

595-
# Construct bootstrap sample: y* = X @ beta_restricted + e_restricted * weights
596-
y_star = np.dot(X, beta_restricted) + residuals_restricted * obs_weights
609+
# Construct bootstrap sample: y* = fitted_restricted + e_restricted * weights
610+
# (fitted_restricted comes from solve_ols's kept-columns reconstruction,
611+
# so it's NaN-safe even when beta_restricted has NaN on dropped columns)
612+
y_star = fitted_restricted + residuals_restricted * obs_weights
597613

598614
# Estimate bootstrap coefficients with cluster-robust SE
599615
beta_star, residuals_star, vcov_star = _solve_ols_linalg(
@@ -603,28 +619,40 @@ def wild_bootstrap_se(
603619
assert vcov_star is not None
604620
se_star = np.sqrt(vcov_star[coefficient_index, coefficient_index])
605621

606-
# Compute bootstrap t-statistic (under null hypothesis)
607-
if se_star > 0:
622+
# Compute bootstrap t-statistic (under null hypothesis); invalid
623+
# draws (singular SE) leave the NaN sentinel for filtering below.
624+
if se_star > 0 and np.isfinite(beta_star[coefficient_index]):
608625
bootstrap_t_stats[b] = (beta_star[coefficient_index] - null_hypothesis) / se_star
609-
else:
610-
bootstrap_t_stats[b] = 0.0
611-
612-
# Step 4: Compute bootstrap p-value
613-
# P-value is proportion of |t*| >= |t_original|
614-
p_value = np.mean(np.abs(bootstrap_t_stats) >= np.abs(t_stat_original))
615626

616-
# Ensure p-value is at least 1/(n_bootstrap+1) to avoid exact zero
617-
p_value = float(max(float(p_value), 1 / (n_bootstrap + 1)))
618-
619-
# Step 5: Compute bootstrap SE and confidence interval
620-
# SE from standard deviation of bootstrap coefficient distribution
621-
se_bootstrap = float(np.std(bootstrap_coefs, ddof=1))
627+
# Step 4: Compute bootstrap p-value from VALID (finite) draws only
628+
finite_mask = np.isfinite(bootstrap_t_stats)
629+
n_valid = int(finite_mask.sum())
630+
if n_valid == 0:
631+
# All bootstrap draws were singular; fall back to a conservative
632+
# p-value of 1.0 rather than silently returning a misleading value.
633+
p_value = 1.0
634+
else:
635+
p_value = float(np.mean(np.abs(bootstrap_t_stats[finite_mask]) >= np.abs(t_stat_original)))
636+
# Ensure p-value is at least 1/(n_valid+1) to avoid exact zero.
637+
p_value = float(max(p_value, 1 / (n_valid + 1)))
638+
639+
# Step 5: Compute bootstrap SE and confidence interval from valid draws
640+
# only (use nan-safe reductions, mirroring the p-value filtering above).
641+
valid_coefs = bootstrap_coefs[np.isfinite(bootstrap_coefs)]
642+
if valid_coefs.size >= 2:
643+
se_bootstrap = float(np.std(valid_coefs, ddof=1))
644+
else:
645+
se_bootstrap = float("nan")
622646

623647
# Percentile confidence interval from bootstrap distribution
624648
lower_percentile = alpha / 2 * 100
625649
upper_percentile = (1 - alpha / 2) * 100
626-
ci_lower = float(np.percentile(bootstrap_coefs, lower_percentile))
627-
ci_upper = float(np.percentile(bootstrap_coefs, upper_percentile))
650+
if valid_coefs.size >= 1:
651+
ci_lower = float(np.percentile(valid_coefs, lower_percentile))
652+
ci_upper = float(np.percentile(valid_coefs, upper_percentile))
653+
else:
654+
ci_lower = float("nan")
655+
ci_upper = float("nan")
628656

629657
return WildBootstrapResults(
630658
se=se_bootstrap,
@@ -823,7 +851,11 @@ def check_parallel_trends_robust(
823851

824852
# Compute outcome changes
825853
treated_changes, control_changes = _compute_outcome_changes(
826-
pre_data, outcome, time, treatment_group, unit,
854+
pre_data,
855+
outcome,
856+
time,
857+
treatment_group,
858+
unit,
827859
caller_label="check_parallel_trends_robust",
828860
)
829861

@@ -1026,7 +1058,11 @@ def equivalence_test_trends(
10261058

10271059
# Compute outcome changes
10281060
treated_changes, control_changes = _compute_outcome_changes(
1029-
pre_data, outcome, time, treatment_group, unit,
1061+
pre_data,
1062+
outcome,
1063+
time,
1064+
treatment_group,
1065+
unit,
10301066
caller_label="equivalence_test_trends",
10311067
)
10321068

@@ -1367,15 +1403,9 @@ def _sc_weight_fw(
13671403
"""
13681404
Y_c = np.ascontiguousarray(Y, dtype=np.float64)
13691405
init_c = (
1370-
np.ascontiguousarray(init_weights, dtype=np.float64)
1371-
if init_weights is not None
1372-
else None
1373-
)
1374-
rw_c = (
1375-
np.ascontiguousarray(reg_weights, dtype=np.float64)
1376-
if reg_weights is not None
1377-
else None
1406+
np.ascontiguousarray(init_weights, dtype=np.float64) if init_weights is not None else None
13781407
)
1408+
rw_c = np.ascontiguousarray(reg_weights, dtype=np.float64) if reg_weights is not None else None
13791409

13801410
if rw_c is not None:
13811411
# Validate reg_weights shape at the dispatcher so Rust and NumPy
@@ -1396,26 +1426,53 @@ def _sc_weight_fw(
13961426
if reg_weights is not None:
13971427
if return_convergence:
13981428
weights, converged = _rust_sc_weight_fw_weighted_with_convergence(
1399-
Y_c, zeta, intercept, init_c, min_decrease, max_iter, rw_c,
1429+
Y_c,
1430+
zeta,
1431+
intercept,
1432+
init_c,
1433+
min_decrease,
1434+
max_iter,
1435+
rw_c,
14001436
)
14011437
return np.asarray(weights), converged
14021438
return np.asarray(
14031439
_rust_sc_weight_fw_weighted(
1404-
Y_c, zeta, intercept, init_c, min_decrease, max_iter, rw_c,
1440+
Y_c,
1441+
zeta,
1442+
intercept,
1443+
init_c,
1444+
min_decrease,
1445+
max_iter,
1446+
rw_c,
14051447
)
14061448
)
14071449
if return_convergence:
14081450
weights, converged = _rust_sc_weight_fw_with_convergence(
1409-
Y_c, zeta, intercept, init_c, min_decrease, max_iter,
1451+
Y_c,
1452+
zeta,
1453+
intercept,
1454+
init_c,
1455+
min_decrease,
1456+
max_iter,
14101457
)
14111458
return np.asarray(weights), converged
14121459
return np.asarray(
14131460
_rust_sc_weight_fw(
1414-
Y_c, zeta, intercept, init_c, min_decrease, max_iter,
1461+
Y_c,
1462+
zeta,
1463+
intercept,
1464+
init_c,
1465+
min_decrease,
1466+
max_iter,
14151467
)
14161468
)
14171469
return _sc_weight_fw_numpy(
1418-
Y, zeta, intercept, init_weights, min_decrease, max_iter,
1470+
Y,
1471+
zeta,
1472+
intercept,
1473+
init_weights,
1474+
min_decrease,
1475+
max_iter,
14191476
return_convergence=return_convergence,
14201477
reg_weights=reg_weights,
14211478
)
@@ -1910,8 +1967,7 @@ def compute_sdid_unit_weights_survey(
19101967

19111968
if rw_control.shape != (n_control,):
19121969
raise ValueError(
1913-
f"rw_control shape {rw_control.shape} does not match expected "
1914-
f"({n_control},)"
1970+
f"rw_control shape {rw_control.shape} does not match expected " f"({n_control},)"
19151971
)
19161972

19171973
if n_control == 0:
@@ -1924,10 +1980,12 @@ def compute_sdid_unit_weights_survey(
19241980
# Build the column-scaled Y matrix: each control column j is multiplied by
19251981
# rw_control[j], so A·ω in the loss equals Σ_j rw_j·ω_j·Y_j,pre.
19261982
rw = np.ascontiguousarray(rw_control, dtype=np.float64)
1927-
Y_scaled = np.column_stack([
1928-
Y_pre_control * rw[np.newaxis, :],
1929-
Y_pre_treated_mean.reshape(-1, 1),
1930-
])
1983+
Y_scaled = np.column_stack(
1984+
[
1985+
Y_pre_control * rw[np.newaxis, :],
1986+
Y_pre_treated_mean.reshape(-1, 1),
1987+
]
1988+
)
19311989

19321990
if return_convergence:
19331991
omega, conv1 = _sc_weight_fw(
@@ -2031,8 +2089,7 @@ def compute_time_weights_survey(
20312089

20322090
if rw_control.shape != (n_control,):
20332091
raise ValueError(
2034-
f"rw_control shape {rw_control.shape} does not match expected "
2035-
f"({n_control},)"
2092+
f"rw_control shape {rw_control.shape} does not match expected " f"({n_control},)"
20362093
)
20372094

20382095
if Y_post_control.shape[0] == 0:
@@ -2058,9 +2115,7 @@ def compute_time_weights_survey(
20582115
# does not re-center on the row-scaled matrix.
20592116
rw_sum = float(np.sum(rw_control))
20602117
if intercept and rw_sum > 0:
2061-
col_weighted_means = (
2062-
(Y_time * rw_control[:, np.newaxis]).sum(axis=0) / rw_sum
2063-
)
2118+
col_weighted_means = (Y_time * rw_control[:, np.newaxis]).sum(axis=0) / rw_sum
20642119
Y_time = Y_time - col_weighted_means[np.newaxis, :]
20652120

20662121
# Row-scale by sqrt(rw): after weighted centering (if any), each

tests/test_estimators_vcov_type.py

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,8 @@
1414

1515
from __future__ import annotations
1616

17+
import warnings
18+
1719
import numpy as np
1820
import pandas as pd
1921
import pytest
@@ -825,6 +827,57 @@ def test_twfe_hc2_explicit_no_auto_cluster_analytical(self):
825827
# No auto-cluster on explicit one-way hc2 + analytical.
826828
assert res.cluster_name is None
827829

830+
def test_twfe_hc2_wild_bootstrap_survives_rank_deficient_full_dummy(self):
831+
"""TWFE(vcov_type='hc2', inference='wild_bootstrap') stays finite when
832+
the full-dummy design has a rank-deficient nuisance column.
833+
834+
Regression for a P1 bug in `wild_bootstrap_se()`: it previously built
835+
`y_star = X @ beta_restricted`, which propagates NaN through every
836+
observation whenever solve_ols dropped a nuisance column (e.g. a
837+
time-invariant covariate collinear with the unit FE). The ATT was
838+
analytically identified, but the bootstrap crashed because every
839+
`y_star` was all-NaN. Reachable on the new TWFE HC2 full-dummy path
840+
(the within-transform path absorbed time-invariant covariates so
841+
the issue was hidden pre-PR).
842+
843+
Fix: `wild_bootstrap_se()` now uses solve_ols's kept-columns
844+
`fitted_restricted` instead of `X @ beta_restricted`, so dropped
845+
nuisance columns no longer poison `y_star`.
846+
"""
847+
data = _make_did_panel(n_units=20).copy()
848+
# x_invariant is time-invariant (only varies across units),
849+
# so it's collinear with the unit fixed effect on the
850+
# full-dummy design and gets dropped by solve_ols.
851+
rng = np.random.default_rng(99)
852+
unit_to_x = {u: rng.normal() for u in data["unit"].unique()}
853+
data["x_invariant"] = data["unit"].map(unit_to_x).astype(float)
854+
with warnings.catch_warnings():
855+
# The expected rank-deficient column drop emits a UserWarning;
856+
# we accept it as part of the documented full-dummy path.
857+
warnings.simplefilter("ignore", UserWarning)
858+
res = TwoWayFixedEffects(
859+
vcov_type="hc2",
860+
inference="wild_bootstrap",
861+
n_bootstrap=50,
862+
seed=1,
863+
).fit(
864+
data,
865+
outcome="y",
866+
treatment="treated",
867+
time="time",
868+
unit="unit",
869+
covariates=["x_invariant"],
870+
)
871+
# ATT remains identified despite the dropped nuisance column.
872+
assert np.isfinite(res.att), "ATT should remain finite despite rank deficiency"
873+
assert np.isfinite(res.se), (
874+
"Bootstrap SE should be finite — if NaN, wild_bootstrap_se's "
875+
"y_star construction is propagating NaN from beta_restricted."
876+
)
877+
assert res.se > 0
878+
assert np.isfinite(res.p_value)
879+
assert np.isfinite(res.conf_int[0]) and np.isfinite(res.conf_int[1])
880+
828881
def test_twfe_hc2_wild_bootstrap_keeps_auto_cluster(self):
829882
"""Wild-bootstrap inference on TWFE(vcov_type='hc2') must keep the
830883
unit auto-cluster (bootstrap resampling uses the cluster structure).

0 commit comments

Comments
 (0)