Commit 026a031

Merge pull request #44 from igerber/claude/plan-v1-1-1-release-8iyGQ
2 parents 760c111 + c26e17b commit 026a031

11 files changed

Lines changed: 164 additions & 56 deletions

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -5,6 +5,21 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.1.1] - 2026-01-06
+
+### Fixed
+- **SyntheticDiD bootstrap error handling**: Bootstrap now raises clear `ValueError` when all iterations fail, instead of silently returning SE=0.0. Added warnings for edge cases (single successful iteration, high failure rate).
+
+- **Diagnostics module error handling**: Improved error messages in `permutation_test()` and `leave_one_out_test()` with actionable guidance. Added warnings when significant iterations fail. Enhanced `run_all_placebo_tests()` to return structured error info including error type.
+
+### Changed
+- **Code deduplication**: Extracted wild bootstrap inference logic to shared `_run_wild_bootstrap_inference()` method in `DifferenceInDifferences` base class, used by both `DifferenceInDifferences` and `TwoWayFixedEffects`.
+
+- **Type hints**: Added missing type hints to nested functions:
+- `compute_trend()` in `utils.py`
+- `neg_log_likelihood()` and `gradient()` in `staggered.py`
+- `format_label()` in `prep.py`
+
 ## [1.1.0] - 2026-01-05
 
 ### Added

TODO.md

Lines changed: 1 addition & 8 deletions
@@ -14,8 +14,6 @@ Current limitations that may affect users:
 |-------|----------|----------|-------|
 | MultiPeriodDiD wild bootstrap not supported | `estimators.py:1068-1074` | Low | Edge case |
 | `predict()` raises NotImplementedError | `estimators.py:532-554` | Low | Rarely needed |
-| SyntheticDiD bootstrap can fail silently | `estimators.py:1580-1654` | Medium | Needs better error handling |
-| Diagnostics module error handling | `diagnostics.py:782-885` | Medium | Improve robustness |
 
 ---
 
@@ -27,7 +25,6 @@ Consolidation opportunities for cleaner maintenance:
 
 | Duplicate Code | Locations | Notes |
 |---------------|-----------|-------|
-| Wild bootstrap inference block | `estimators.py:278-296`, `estimators.py:725-748` | Extract to shared method |
 | Within-transformation logic | `estimators.py:217-232`, `estimators.py:787-833`, `bacon.py:567-642` | Extract to utils.py |
 | Linear regression helper | `staggered.py:205-240`, `estimators.py:366-408` | Consider consolidation |
 
@@ -117,8 +114,4 @@ No major performance issues identified. Potential future optimizations:
 
 ## Type Hints
 
-Missing type hints in internal functions:
-
-- `utils.py:593` - `compute_trend()` nested function
-- `staggered.py:173, 180` - Nested functions in `_logistic_regression()`
-- `prep.py:604` - `format_label()` nested function
+All previously identified missing type hints have been addressed in v1.1.1.

diff_diff/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@
 plot_sensitivity,
 )
 
-__version__ = "1.1.0"
+__version__ = "1.1.1"
 __all__ = [
 # Estimators
 "DifferenceInDifferences",

diff_diff/diagnostics.py

Lines changed: 59 additions & 8 deletions
@@ -625,11 +625,30 @@ def permutation_test(
 # Handle edge cases where fitting fails
 permuted_effects[i] = np.nan
 
-# Remove any NaN values
+# Remove any NaN values and track failure rate
 valid_effects = permuted_effects[~np.isnan(permuted_effects)]
+n_failed = n_permutations - len(valid_effects)
 
 if len(valid_effects) == 0:
-raise RuntimeError("All permutations failed - check your data")
+raise RuntimeError(
+f"All {n_permutations} permutations failed. This typically occurs when:\n"
+f" - Treatment/control groups are too small for valid permutation\n"
+f" - Data contains collinearity or singular matrices after permutation\n"
+f" - There are too few observations per time period\n"
+f"Consider checking data quality with validate_did_data() from diff_diff.prep."
+)
+
+# Warn if significant number of permutations failed
+if n_failed > 0:
+failure_rate = n_failed / n_permutations
+if failure_rate > 0.1:
+import warnings
+warnings.warn(
+f"{n_failed}/{n_permutations} permutations failed ({failure_rate:.1%}). "
+f"Results based on {len(valid_effects)} successful permutations.",
+UserWarning,
+stacklevel=2
+)
 
 # Compute p-value: proportion of |permuted| >= |original|
 p_value = np.mean(np.abs(valid_effects) >= np.abs(original_att))
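The permutation-test change above drops failed permutations (stored as NaN), raises if none succeeded, and warns when more than 10% failed. A minimal standalone sketch of that bookkeeping follows; `permutation_p_value` is a hypothetical helper written for illustration, not part of diff_diff's API, and it assumes failed draws have already been recorded as NaN.

```python
import warnings

import numpy as np


def permutation_p_value(
    original_att: float,
    permuted_effects: np.ndarray,
    warn_threshold: float = 0.1,
) -> float:
    """Two-sided permutation p-value that tolerates failed permutations (NaN)."""
    permuted_effects = np.asarray(permuted_effects, dtype=float)
    n_permutations = len(permuted_effects)

    # Drop failed draws and count them, as the diff does
    valid_effects = permuted_effects[~np.isnan(permuted_effects)]
    n_failed = n_permutations - len(valid_effects)

    if len(valid_effects) == 0:
        raise RuntimeError(f"All {n_permutations} permutations failed.")

    # Warn only above the threshold (10% in the diff)
    failure_rate = n_failed / n_permutations
    if failure_rate > warn_threshold:
        warnings.warn(
            f"{n_failed}/{n_permutations} permutations failed ({failure_rate:.1%}). "
            f"Results based on {len(valid_effects)} successful permutations.",
            UserWarning,
            stacklevel=2,
        )

    # Proportion of |permuted| >= |original| among successful permutations only
    return float(np.mean(np.abs(valid_effects) >= np.abs(original_att)))


p = permutation_p_value(1.0, np.array([0.5, -2.0, np.nan, 1.0]))
# p == 2/3, and a UserWarning is issued because 1 of 4 permutations (25%) failed
```

Because the denominator is the number of successful permutations, a high failure rate quietly shrinks the reference distribution; the warning is what makes that visible to the caller.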
@@ -736,11 +755,30 @@ def leave_one_out_test(
 # Skip units that cause fitting issues
 loo_effects[u] = np.nan
 
-# Remove NaN values for statistics
+# Remove NaN values for statistics and track failures
 valid_effects = [v for v in loo_effects.values() if not np.isnan(v)]
+n_total = len(loo_effects)
+n_failed = n_total - len(valid_effects)
 
 if len(valid_effects) == 0:
-raise RuntimeError("All leave-one-out estimates failed")
+raise RuntimeError(
+f"All {n_total} leave-one-out estimates failed. This typically occurs when:\n"
+f" - Removing any single treated unit causes model fitting to fail\n"
+f" - Very few treated units (need at least 2 for LOO)\n"
+f" - Data has collinearity issues that manifest when units are removed\n"
+f"Consider checking data quality and ensuring sufficient treated units."
+)
+
+# Warn if significant number of LOO iterations failed
+if n_failed > 0:
+import warnings
+failed_units = [u for u, v in loo_effects.items() if np.isnan(v)]
+warnings.warn(
+f"{n_failed}/{n_total} leave-one-out estimates failed for units: {failed_units}. "
+f"Results based on {len(valid_effects)} successful iterations.",
+UserWarning,
+stacklevel=2
+)
 
 # Statistics of LOO distribution
 mean_effect = np.mean(valid_effects)
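The leave-one-out change follows the same pattern but keys results by unit, so the warning can name the units whose refit failed, and it warns on any failure rather than above a threshold. A hedged sketch using a hypothetical `summarize_loo` helper (not a diff_diff function), assuming a `{unit: estimate}` dict with NaN marking failed refits:

```python
import warnings
from typing import Dict, Hashable, List, Tuple

import numpy as np


def summarize_loo(loo_effects: Dict[Hashable, float]) -> Tuple[float, List[Hashable]]:
    """Mean leave-one-out estimate plus the units whose refit failed (NaN)."""
    valid_effects = [v for v in loo_effects.values() if not np.isnan(v)]
    n_total = len(loo_effects)
    n_failed = n_total - len(valid_effects)

    if not valid_effects:
        raise RuntimeError(f"All {n_total} leave-one-out estimates failed.")

    failed_units = [u for u, v in loo_effects.items() if np.isnan(v)]
    if n_failed > 0:
        # Any failure warns, and the message lists the offending units
        warnings.warn(
            f"{n_failed}/{n_total} leave-one-out estimates failed for units: "
            f"{failed_units}. Results based on {len(valid_effects)} successful iterations.",
            UserWarning,
            stacklevel=2,
        )
    return float(np.mean(valid_effects)), failed_units
```

For example, `summarize_loo({"A": 1.0, "B": float("nan"), "C": 3.0})` returns `(2.0, ["B"])` and emits one warning naming unit `B`.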
@@ -838,8 +876,13 @@ def run_all_placebo_tests(
 )
 results[f"fake_timing_{period}"] = test_result
 except Exception as e:
-# Store error info
-results[f"fake_timing_{period}"] = {"error": str(e)}
+# Store structured error info for debugging
+results[f"fake_timing_{period}"] = {
+"error": str(e),
+"error_type": type(e).__name__,
+"test_type": "fake_timing",
+"period": period
+}
 
 # Permutation test
 try:
@@ -856,7 +899,11 @@ def run_all_placebo_tests(
 )
 results["permutation"] = perm_result
 except Exception as e:
-results["permutation"] = {"error": str(e)}
+results["permutation"] = {
+"error": str(e),
+"error_type": type(e).__name__,
+"test_type": "permutation"
+}
 
 # Leave-one-out test
 try:
@@ -871,6 +918,10 @@ def run_all_placebo_tests(
 )
 results["leave_one_out"] = loo_result
 except Exception as e:
-results["leave_one_out"] = {"error": str(e)}
+results["leave_one_out"] = {
+"error": str(e),
+"error_type": type(e).__name__,
+"test_type": "leave_one_out"
+}
 
 return results
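`run_all_placebo_tests()` now stores a dict with `error`, `error_type`, and `test_type` (plus `period` for fake-timing tests) whenever a test raises, so callers can triage failures programmatically instead of parsing message strings. The diff writes the try/except inline for each test; the sketch below factors the same record layout into a hypothetical `run_placebo_safely` helper and adds an illustrative `failed_tests` filter. Neither function exists in diff_diff.

```python
from typing import Any, Callable, Dict, Optional


def run_placebo_safely(
    test_type: str,
    test_fn: Callable[[], Any],
    extra: Optional[Dict[str, Any]] = None,
) -> Any:
    """Run one placebo test, returning a structured error record if it raises."""
    try:
        return test_fn()
    except Exception as e:  # broad on purpose: one failing test must not abort the batch
        record: Dict[str, Any] = {
            "error": str(e),
            "error_type": type(e).__name__,
            "test_type": test_type,
        }
        record.update(extra or {})  # e.g. {"period": ...} for fake-timing tests
        return record


def failed_tests(results: Dict[str, Any]) -> Dict[str, str]:
    """Map the name of every failed test to its exception class name."""
    return {
        name: res["error_type"]
        for name, res in results.items()
        if isinstance(res, dict) and "error" in res
    }


def _permutation_that_fails() -> Any:
    raise RuntimeError("All permutations failed.")


results = {
    "permutation": run_placebo_safely("permutation", _permutation_that_fails),
    "leave_one_out": run_placebo_safely("leave_one_out", lambda: "ok"),
}
print(failed_tests(results))  # {'permutation': 'RuntimeError'}
```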

diff_diff/estimators.py

Lines changed: 53 additions & 15 deletions
@@ -19,6 +19,7 @@
 
 from diff_diff.results import DiDResults, MultiPeriodDiDResults, PeriodEffect
 from diff_diff.utils import (
+WildBootstrapResults,
 compute_confidence_interval,
 compute_p_value,
 compute_robust_se,
@@ -279,22 +280,9 @@ def fit(
 if self.inference == "wild_bootstrap" and self.cluster is not None:
 # Wild cluster bootstrap for few-cluster inference
 cluster_ids = data[self.cluster].values
-bootstrap_results = wild_bootstrap_se(
-X, y, residuals, cluster_ids,
-coefficient_index=att_idx,
-n_bootstrap=self.n_bootstrap,
-weight_type=self.bootstrap_weights,
-alpha=self.alpha,
-seed=self.seed,
-return_distribution=False
+se, p_value, conf_int, t_stat, vcov, _ = self._run_wild_bootstrap_inference(
+X, y, residuals, cluster_ids, att_idx
 )
-self._bootstrap_results = bootstrap_results
-se = bootstrap_results.se
-p_value = bootstrap_results.p_value
-conf_int = (bootstrap_results.ci_lower, bootstrap_results.ci_upper)
-t_stat = bootstrap_results.t_stat_original
-# Also compute vcov for storage (using cluster-robust for consistency)
-vcov = compute_robust_se(X, residuals, cluster_ids)
 elif self.cluster is not None:
 cluster_ids = data[self.cluster].values
 vcov = compute_robust_se(X, residuals, cluster_ids)
@@ -408,6 +396,56 @@ def _fit_ols(self, X: np.ndarray, y: np.ndarray) -> Tuple[np.ndarray, np.ndarray
 
 return coefficients, residuals, fitted, r_squared
 
+def _run_wild_bootstrap_inference(
+self,
+X: np.ndarray,
+y: np.ndarray,
+residuals: np.ndarray,
+cluster_ids: np.ndarray,
+coefficient_index: int,
+) -> Tuple[float, float, Tuple[float, float], float, np.ndarray, WildBootstrapResults]:
+"""
+Run wild cluster bootstrap inference.
+
+Parameters
+----------
+X : np.ndarray
+Design matrix.
+y : np.ndarray
+Outcome vector.
+residuals : np.ndarray
+OLS residuals.
+cluster_ids : np.ndarray
+Cluster identifiers for each observation.
+coefficient_index : int
+Index of the coefficient to compute inference for.
+
+Returns
+-------
+tuple
+(se, p_value, conf_int, t_stat, vcov, bootstrap_results)
+"""
+bootstrap_results = wild_bootstrap_se(
+X, y, residuals, cluster_ids,
+coefficient_index=coefficient_index,
+n_bootstrap=self.n_bootstrap,
+weight_type=self.bootstrap_weights,
+alpha=self.alpha,
+seed=self.seed,
+return_distribution=False
+)
+self._bootstrap_results = bootstrap_results
+
+se = bootstrap_results.se
+p_value = bootstrap_results.p_value
+conf_int = (bootstrap_results.ci_lower, bootstrap_results.ci_upper)
+t_stat = bootstrap_results.t_stat_original
+
+# Also compute vcov for storage (using cluster-robust for consistency)
+vcov = compute_robust_se(X, residuals, cluster_ids)
+
+return se, p_value, conf_int, t_stat, vcov, bootstrap_results
+
 def _parse_formula(
 self, formula: str, data: pd.DataFrame
 ) -> Tuple[str, str, str, Optional[List[str]]]:
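Both `DifferenceInDifferences.fit()` and `TwoWayFixedEffects.fit()` now delegate to `_run_wild_bootstrap_inference()`, which wraps `wild_bootstrap_se()` and also computes a cluster-robust vcov for storage. For readers unfamiliar with the technique, here is a simplified, unrestricted wild cluster bootstrap with Rademacher weights. It is an assumption-laden sketch, not the implementation of `wild_bootstrap_se()`, whose internals (weight types, restricted residuals, p-values, CIs) are not shown in this diff; `wild_cluster_bootstrap_se` is a hypothetical name.

```python
from typing import Optional, Tuple

import numpy as np


def wild_cluster_bootstrap_se(
    X: np.ndarray,
    y: np.ndarray,
    cluster_ids: np.ndarray,
    coefficient_index: int,
    n_bootstrap: int = 999,
    seed: Optional[int] = None,
) -> Tuple[float, np.ndarray]:
    """Bootstrap SE of one OLS coefficient, flipping residual signs by cluster."""
    rng = np.random.default_rng(seed)

    # Original OLS fit
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    residuals = y - fitted

    # Map each observation to a cluster position 0..G-1
    clusters, cluster_pos = np.unique(cluster_ids, return_inverse=True)

    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # One Rademacher sign (+1 or -1) per cluster, shared by all its observations
        signs = rng.choice([-1.0, 1.0], size=len(clusters))
        y_star = fitted + residuals * signs[cluster_pos]
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        draws[b] = beta_star[coefficient_index]

    return float(np.std(draws, ddof=1)), draws


# Example: 8 clusters of 20 observations, treatment assigned at the cluster level
rng = np.random.default_rng(0)
cluster_ids = np.repeat(np.arange(8), 20)
treated = (cluster_ids < 4).astype(float)
X = np.column_stack([np.ones(len(cluster_ids)), treated])
y = 1.0 + 0.5 * treated + rng.normal(size=8)[cluster_ids] + rng.normal(size=len(cluster_ids))
se, draws = wild_cluster_bootstrap_se(X, y, cluster_ids, coefficient_index=1, n_bootstrap=199, seed=1)
```

Because every observation in a cluster shares one sign, the draws preserve within-cluster residual correlation, which is what makes the approach usable with few clusters. With G clusters, Rademacher weights can only produce 2^G distinct draws, one reason implementations offer other weight distributions (the diff's `weight_type=self.bootstrap_weights` suggests diff_diff does as well).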

diff_diff/prep.py

Lines changed: 1 addition & 1 deletion
@@ -601,7 +601,7 @@ def summarize_did_data(
 if len(time_vals) == 2:
 pre_val, post_val = time_vals[0], time_vals[1]
 
-def format_label(x):
+def format_label(x: tuple) -> str:
 treatment_label = 'Treated' if x[0] == 1 else 'Control'
 time_label = 'Post' if x[1] == post_val else 'Pre'
 return f"{treatment_label} - {time_label}"
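`format_label()` receives a `(treatment, time)` pair, for example a key of a two-column groupby, and reads `post_val` from the enclosing `summarize_did_data()`, which is presumably why it is a nested function. Below is a self-contained reproduction of its body, with a hypothetical `make_format_label` factory standing in for the enclosing scope; it uses a more specific `Tuple[...]` hint than the plain `tuple` in the commit.

```python
from typing import Callable, Hashable, Tuple


def make_format_label(post_val: Hashable) -> Callable[[Tuple[int, Hashable]], str]:
    """Build format_label with post_val captured, as summarize_did_data() does."""

    def format_label(x: Tuple[int, Hashable]) -> str:
        treatment_label = 'Treated' if x[0] == 1 else 'Control'
        time_label = 'Post' if x[1] == post_val else 'Pre'
        return f"{treatment_label} - {time_label}"

    return format_label


format_label = make_format_label(post_val=2020)
keys = [(0, 2019), (0, 2020), (1, 2019), (1, 2020)]
labels = [format_label(k) for k in keys]
# labels == ['Control - Pre', 'Control - Post', 'Treated - Pre', 'Treated - Post']
```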

diff_diff/staggered.py

Lines changed: 2 additions & 2 deletions
@@ -169,14 +169,14 @@ def _logistic_regression(
 # Add intercept
 X_with_intercept = np.column_stack([np.ones(n), X])
 
-def neg_log_likelihood(beta):
+def neg_log_likelihood(beta: np.ndarray) -> float:
 z = X_with_intercept @ beta
 # Clip to prevent overflow
 z = np.clip(z, -500, 500)
 log_lik = np.sum(y * z - np.log(1 + np.exp(z)))
 return -log_lik
 
-def gradient(beta):
+def gradient(beta: np.ndarray) -> np.ndarray:
 z = X_with_intercept @ beta
 z = np.clip(z, -500, 500)
 probs = 1 / (1 + np.exp(-z))
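The new hints in `_logistic_regression()` document that `neg_log_likelihood` maps a coefficient vector to a scalar and `gradient` maps it to a vector of the same length. The sketch below reproduces the visible lines of both closures inside a hypothetical `make_logistic_objective` factory. The diff is cut off after `probs` is computed, so the gradient's last line, `X^T (p - y)`, is the standard derivative of this negative log-likelihood and is an assumption about the original; the central finite-difference check at the bottom compares it against the objective.

```python
from typing import Callable, Tuple

import numpy as np


def make_logistic_objective(
    X: np.ndarray, y: np.ndarray
) -> Tuple[Callable[[np.ndarray], float], Callable[[np.ndarray], np.ndarray]]:
    """Return (neg_log_likelihood, gradient) closures over an intercept-augmented X."""
    n = X.shape[0]
    # Add intercept
    X_with_intercept = np.column_stack([np.ones(n), X])

    def neg_log_likelihood(beta: np.ndarray) -> float:
        z = X_with_intercept @ beta
        # Clip to prevent overflow in np.exp
        z = np.clip(z, -500, 500)
        log_lik = np.sum(y * z - np.log(1 + np.exp(z)))
        return float(-log_lik)

    def gradient(beta: np.ndarray) -> np.ndarray:
        z = X_with_intercept @ beta
        z = np.clip(z, -500, 500)
        probs = 1 / (1 + np.exp(-z))
        # Assumed completion: d(-log_lik)/d(beta) = X^T (p - y)
        return X_with_intercept.T @ (probs - y)

    return neg_log_likelihood, gradient


# Central finite-difference check of the analytic gradient
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (rng.random(50) < 0.5).astype(float)
nll, grad = make_logistic_objective(X, y)
beta = np.array([0.1, -0.2, 0.3])
eps = 1e-6
fd_grad = np.array([(nll(beta + eps * e) - nll(beta - eps * e)) / (2 * eps) for e in np.eye(3)])
max_abs_diff = float(np.max(np.abs(fd_grad - grad(beta))))
```

An objective/gradient pair of this shape is what gradient-based optimizers such as `scipy.optimize.minimize(..., jac=...)` expect; whether diff_diff passes it to that routine is not visible in this diff.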

diff_diff/synthetic_did.py

Lines changed: 28 additions & 4 deletions
@@ -503,10 +503,33 @@ def _bootstrap_se(
 
 bootstrap_estimates = np.array(bootstrap_estimates)
 
-# Warn if too many bootstrap iterations failed
+# Check bootstrap success rate and handle failures appropriately
 n_successful = len(bootstrap_estimates)
 failure_rate = 1 - (n_successful / self.n_bootstrap)
-if failure_rate > 0.05:
+
+if n_successful == 0:
+raise ValueError(
+f"All {self.n_bootstrap} bootstrap iterations failed. "
+f"This typically occurs when:\n"
+f" - Sample size is too small for reliable resampling\n"
+f" - Weight matrices are singular or near-singular\n"
+f" - Insufficient pre-treatment periods for weight estimation\n"
+f" - Too few control units relative to treated units\n"
+f"Consider using n_bootstrap=0 to disable bootstrap inference "
+f"and rely on placebo-based standard errors, or increase "
+f"the regularization parameters (lambda_reg, zeta)."
+)
+elif n_successful == 1:
+warnings.warn(
+f"Only 1/{self.n_bootstrap} bootstrap iteration succeeded. "
+f"Standard error cannot be computed reliably (requires at least 2). "
+f"Returning SE=0.0. Consider the suggestions above for improving "
+f"bootstrap convergence.",
+UserWarning,
+stacklevel=2,
+)
+se = 0.0
+elif failure_rate > 0.05:
 warnings.warn(
 f"Only {n_successful}/{self.n_bootstrap} bootstrap iterations succeeded "
 f"({failure_rate:.1%} failure rate). Standard errors may be unreliable. "
@@ -515,8 +538,9 @@ def _bootstrap_se(
 UserWarning,
 stacklevel=2,
 )
-
-se = np.std(bootstrap_estimates, ddof=1) if len(bootstrap_estimates) > 1 else 0.0
+se = np.std(bootstrap_estimates, ddof=1)
+else:
+se = np.std(bootstrap_estimates, ddof=1)
 
 return se, bootstrap_estimates
 
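After this change `_bootstrap_se()` distinguishes four cases: no successful draws (raise `ValueError`), exactly one (warn and return SE=0.0, since a sample standard deviation needs two draws), more than 5% failures (warn, then compute the SE), and the normal case. The last two branches in the diff compute the same `np.std(..., ddof=1)`, so the sketch below, a hypothetical standalone `bootstrap_se` function rather than the SyntheticDiD method itself, folds them into a single return.

```python
import warnings
from typing import Sequence

import numpy as np


def bootstrap_se(
    bootstrap_estimates: Sequence[float],
    n_bootstrap: int,
    max_failure_rate: float = 0.05,
) -> float:
    """Bootstrap SE from the successful draws, mirroring the branches in the diff."""
    estimates = np.asarray(bootstrap_estimates, dtype=float)
    n_successful = len(estimates)
    failure_rate = 1 - (n_successful / n_bootstrap)

    if n_successful == 0:
        raise ValueError(f"All {n_bootstrap} bootstrap iterations failed.")
    if n_successful == 1:
        # np.std(..., ddof=1) of a single value would be NaN, so fall back to 0.0
        warnings.warn(
            f"Only 1/{n_bootstrap} bootstrap iteration succeeded. Returning SE=0.0.",
            UserWarning,
            stacklevel=2,
        )
        return 0.0
    if failure_rate > max_failure_rate:
        warnings.warn(
            f"Only {n_successful}/{n_bootstrap} bootstrap iterations succeeded "
            f"({failure_rate:.1%} failure rate). Standard errors may be unreliable.",
            UserWarning,
            stacklevel=2,
        )
    # Same computation whether or not the failure-rate warning fired
    return float(np.std(estimates, ddof=1))
```

For instance, `bootstrap_se([1.0, 3.0], n_bootstrap=2)` returns `sqrt(2)` with no warning, while the same draws with `n_bootstrap=10` (80% failures) return the same value but warn.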
diff_diff/twfe.py

Lines changed: 2 additions & 15 deletions
@@ -17,7 +17,6 @@
 compute_confidence_interval,
 compute_p_value,
 compute_robust_se,
-wild_bootstrap_se,
 )
 
 
@@ -132,21 +131,9 @@ def fit(  # type: ignore[override]
 cluster_ids = data[cluster_var].values
 if self.inference == "wild_bootstrap":
 # Wild cluster bootstrap for few-cluster inference
-bootstrap_results = wild_bootstrap_se(
-X, y, residuals, cluster_ids,
-coefficient_index=att_idx,
-n_bootstrap=self.n_bootstrap,
-weight_type=self.bootstrap_weights,
-alpha=self.alpha,
-seed=self.seed,
-return_distribution=False
+se, p_value, conf_int, t_stat, vcov, _ = self._run_wild_bootstrap_inference(
+X, y, residuals, cluster_ids, att_idx
 )
-self._bootstrap_results = bootstrap_results
-se = bootstrap_results.se
-p_value = bootstrap_results.p_value
-conf_int = (bootstrap_results.ci_lower, bootstrap_results.ci_upper)
-t_stat = bootstrap_results.t_stat_original
-vcov = compute_robust_se(X, residuals, cluster_ids)
 else:
 # Standard cluster-robust SE
 vcov = compute_robust_se(X, residuals, cluster_ids)

diff_diff/utils.py

Lines changed: 1 addition & 1 deletion
@@ -590,7 +590,7 @@ def check_parallel_trends(
 control_data = pre_data[pre_data[treatment_group] == 0]
 
 # Simple linear regression for trends
-def compute_trend(group_data):
+def compute_trend(group_data: pd.DataFrame) -> Tuple[float, float]:
 time_values = group_data[time].values
 outcome_values = group_data[outcome].values
 
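`check_parallel_trends()` fits a simple linear trend to each group's pre-treatment data through the nested `compute_trend()`, now annotated as returning `Tuple[float, float]`. The diff shows only the signature and the extraction of `time_values` and `outcome_values`; the sketch below assumes the pair is the OLS slope and its standard error, and takes the two arrays directly so it runs without a DataFrame. Treat both the body and that interpretation of the return value as hypothetical.

```python
from typing import Tuple

import numpy as np


def compute_trend(time_values: np.ndarray, outcome_values: np.ndarray) -> Tuple[float, float]:
    """OLS slope of outcome on time and its standard error (assumed meaning of the pair)."""
    t = np.asarray(time_values, dtype=float)
    yv = np.asarray(outcome_values, dtype=float)
    n = len(t)
    t_centered = t - t.mean()
    sxx = float(np.sum(t_centered ** 2))
    if n < 3 or sxx == 0.0:
        raise ValueError("Need at least 3 observations spanning 2+ distinct time values")

    slope = float(np.sum(t_centered * (yv - yv.mean())) / sxx)
    intercept = yv.mean() - slope * t.mean()
    residuals = yv - (intercept + slope * t)
    # Residual variance with n - 2 degrees of freedom (intercept and slope)
    sigma2 = float(np.sum(residuals ** 2)) / (n - 2)
    return slope, float(np.sqrt(sigma2 / sxx))


slope, se = compute_trend([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])  # (2.0, 0.0): an exact line
```

Under this reading, comparing the treated and control slopes (and their SEs) gives a rough pre-trend check.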