Skip to content

Commit 7ecffd2

Browse files
igerberclaude
andcommitted
Move design-term uniqueness check before the regression (fail-fast)
validate_design_term_names ran just before coef_dict construction — i.e. AFTER the OLS fit — in DifferenceInDifferences and MultiPeriodDiD, so a duplicate assembled term name (e.g. an MPD fixed-effect dummy colliding with a structural period_{p} key) drove a wasted rank-deficient fit and emitted a misleading multicollinearity warning before the intended ValueError (local review P3). Move the check to immediately after var_names is fully assembled and before the regression call in both estimators (TwoWayFixedEffects already checked pre-fit). var_names is not mutated between assembly and coef_dict, so this is behavior- preserving except that the ValueError now fires fast and warning-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ef4ea27 commit 7ecffd2

1 file changed

Lines changed: 14 additions & 6 deletions

File tree

diff_diff/estimators.py

Lines changed: 14 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -510,6 +510,12 @@ def fit(
510510
X = np.column_stack([X, dummies[col].values.astype(float)])
511511
var_names.append(col)
512512

513+
# Reject any duplicate in the FINAL term list (e.g. a fixed-effect dummy
514+
# colliding with a structural term) BEFORE the regression — so the fit is
515+
# not wasted and no misleading multicollinearity warning is emitted ahead
516+
# of the intended ValueError.
517+
validate_design_term_names(var_names, estimator="DifferenceInDifferences")
518+
513519
# Extract ATT index (coefficient on interaction term)
514520
att_idx = 3 # Index of interaction term
515521
att_var_name = f"{treatment}:{time}"
@@ -686,7 +692,6 @@ def _refit_did_absorb(w_r):
686692
n_control = n_control_raw
687693

688694
# Create coefficient dictionary
689-
validate_design_term_names(var_names, estimator="DifferenceInDifferences")
690695
coef_dict = {name: coef for name, coef in zip(var_names, coefficients)}
691696

692697
# Determine inference method and bootstrap info
@@ -1680,6 +1685,12 @@ def fit( # type: ignore[override]
16801685
X = np.column_stack([X, dummies[col].values.astype(float)])
16811686
var_names.append(col)
16821687

1688+
# Reject any duplicate in the FINAL term list (e.g. a fixed-effect dummy
1689+
# colliding with a structural period_{p} key) BEFORE the regression — so
1690+
# the fit is not wasted and no misleading multicollinearity warning is
1691+
# emitted ahead of the intended ValueError.
1692+
validate_design_term_names(var_names, estimator="MultiPeriodDiD")
1693+
16831694
# Fit OLS using unified backend
16841695
# Pass cluster_ids to solve_ols for proper vcov computation
16851696
# This handles rank-deficient matrices by returning NaN for dropped columns
@@ -2038,11 +2049,8 @@ def _refit_mp_absorb(w_r):
20382049
n_treated = n_treated_raw
20392050
n_control = n_control_raw
20402051

2041-
# Backstop: reject any duplicate in the FINAL term list (e.g. a
2042-
# fixed-effect dummy colliding with a structural `period_{p}` key)
2043-
# before it silently overwrites a coefficient in the dict below.
2044-
validate_design_term_names(var_names, estimator="MultiPeriodDiD")
2045-
# Create coefficient dictionary
2052+
# Create coefficient dictionary (var_names uniqueness already enforced
2053+
# before the fit above).
20462054
coef_dict = {name: coef for name, coef in zip(var_names, coefficients)}
20472055

20482056
# Store results

0 commit comments

Comments
 (0)