You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
12
12
13
13
### Added / Changed
14
14
- **EfficientDiD `vcov_type` threading + Results metadata harmonization (Phase 1b interstitial #4, permanently narrow).** `EfficientDiD(vcov_type=...)` now accepts `{"hc1"}` only (default). Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` are REJECTED at `__init__` / `set_params` with methodology-rooted messages — EfficientDiD uses influence-function-based variance per Chen-Sant'Anna-Xie (2025) achieving the semiparametric efficiency bound; the per-unit EIF aggregation has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined. `cluster=` (Liang-Zeger CR1 on cluster-aggregated EIF) and `survey_design=` (TSL on combined IF) paths are unchanged. **BC break on `EfficientDiDResults`:** the `cluster` field renamed to `cluster_name`; new `n_clusters` + `vcov_type` fields added; `to_dict()` method added (mirrors TripleDifferenceResults). `DiagnosticReport._pt_hausman` updated to read the renamed `cluster_name` field for the Hausman pretest replay (`diff_diff/diagnostic_report.py:2444`). `EfficientDiD.set_params(vcov_type=bad)` raises immediately rather than deferring to `fit()` — intentional eager-validation pattern matching EfficientDiD's existing handling of `pt_assumption`/`control_group` etc, diverging from `ImputationDiD`/`TripleDifference`/`CallawaySantAnna` (which use sklearn mutate-then-validate-at-use). Survey-PSU bootstrap path returns NaN SE when fewer than 2 independent PSUs are available (was ≈0 SE from BLAS roundoff). New summary block: `Variance estimator: <label>` line rendered after the survey block when not under bootstrap; suppressed under bootstrap (replaced with `Inference method: bootstrap` + `Bootstrap replications: <n>`). Default `cluster=None` (no survey) renders "HC1 heteroskedasticity-robust" — methodologically correct because the per-unit EIF SE `sqrt(mean(EIF²)/n)` is HC1-style (no Liang-Zeger G/(G-1) finite-sample correction); diverges from `ImputationDiD` which auto-clusters at unit per BJS Theorem 3.
15
+
- **TwoStageDiD `vcov_type` threading + Results metadata (Phase 1b interstitial #5, final, permanently narrow).** `TwoStageDiD(vcov_type=...)` now accepts `{"hc1"}` only (default), completing the Phase 1b initiative across all 8 standalone estimators. Analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` are REJECTED at `__init__` / `fit()` with methodology-rooted messages: TwoStageDiD's variance is the Gardner (2022) two-stage GMM cluster-sandwich whose meat is the per-cluster GMM-corrected score `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}`, which folds first-stage FE estimation uncertainty into the score — there is no single hat matrix spanning both stages on which HC2 leverage or Bell-McCaffrey Satterthwaite DOF can be defined, and the Gardner correction has not been derived for the leverage-corrected/homoskedastic meat (no reference implementation; mirrors the SpilloverDiD `classical` rejection). `cluster=` and `survey_design=` paths are numerically unchanged (bit-identical for healthy fits). **`TwoStageDiDResults` additions (no rename, no BC break):** new `vcov_type` / `cluster_name` / `n_clusters` fields + `to_dict()` method. `summary()` renders a `Variance estimator: <label>` line after the survey block (suppressed under bootstrap — `Inference method: bootstrap` + `Bootstrap replications: <n>` shown instead — and under any survey design). Default `cluster=None` renders `"CR1 cluster-robust at <unit>, G=<n_units>"` because the Gardner sandwich auto-clusters at the unit column (did2s no-FSA convention — the `CR1` label carries no `(n-1)/(n-p)` factor, matching R `did2s`; same convention as ImputationDiD's Theorem 3 variance). Defensive `n_clusters<2` NaN guard added to the multiplier-bootstrap path (was ≈0 SE from BLAS roundoff) plus a survey-PSU `n_psu<2` parity guard. `cluster=` with a replicate-weight survey design now raises `NotImplementedError` (replicate-refit variance ignores `cluster=`). `vcov_type='conley'` deferred to a TODO follow-up row.
15
16
16
17
### Fixed
17
18
- **Bertanha-Imbens 2014 citation correction (16 sites across 5 files).** A verification spike confirmed the citation across `diff_diff/linalg.py` (×8), `diff_diff/conley.py` (×1), `diff_diff/guides/llms-full.txt` (×2), `docs/methodology/REGISTRY.md` (×4), and `docs/api/spillover.rst` (×1) was incorrect — NBER w20773 *External Validity in Fuzzy Regression Discontinuity Designs* (JBES 2020, 38(3):593-612) by Bertanha & Imbens covers fuzzy RD external validity, NOT weighted spatial-HAC under sampling weights. Replaced across all 16 sites with the open-problem framing: "weighted spatial-HAC under probability sampling is an open methodological question; no canonical extension of Conley (1999) exists for the combination." At the four `REGISTRY.md` sites the replacement is wrapped in the canonical `**Note (open methodological question):**` label per CLAUDE.md "Documenting Deviations (AI Review Compatibility)". REGISTRY ConleySpatialHAC section gains a new `**Note (deferral status, 2026-05-26):**` splitting the boundary into three parts: **Shipped** — SpilloverDiD + Conley + survey via Wave E.1/E.2/E.3 (PR #468/#474/#482), TwoStageDiD + Conley + survey via Wave E.3 parity (PR #485). **Deferred (generic linalg surface, any `weight_type`)** — DiD/MPD/TWFE/LinearRegression generic path + Conley + `survey_design=`; `LinearRegression` / `compute_robust_vcov` Conley + `weights=` rejected for `pweight`, `aweight`, AND `fweight` (weighted Conley is not implemented on the generic linalg surface). **Open methodological question (subset)** — the `pweight` / `survey_design` portion of the deferral additionally lacks a canonical methodological extension of Conley (1999) for weighted spatial-HAC under probability sampling. **No source-code logic changes:** verified via diff-in-diff pytest output before and after the citation strip (175 passed + 14 warnings, bit-identical pass set on `tests/test_conley_vcov.py`). **Historical CHANGELOG entries (pre-[Unreleased]) intentionally retain the Bertanha-Imbens 2014 attribution** as accurate records of what was claimed at the time of each release; the [Unreleased] entry above supersedes those rationales going forward.
Copy file name to clipboardExpand all lines: TODO.md
+2-1Lines changed: 2 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -104,7 +104,7 @@ Deferred items from PR reviews that were not addressed before merge.
104
104
| PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. |`tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R`| PR-C follow-up | Low |
105
105
106
106
107
-
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `TwoStageDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Four interstitial PRs (post-PR-3/8) addressed the IF-based estimators separately, each permanently narrow to `{"hc1"}`**: (a) `CallawaySantAnna` per Callaway & Sant'Anna (2021) Theorem 2 (also fixed CS's bare-`cluster=` silent no-op); (b) `TripleDifference` per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition; (c) `ImputationDiD` per Borusyak-Jaravel-Spiess (2024) Theorem 3 on per-unit IF aggregation (also added defensive `n_clusters<2`/`n_psu<2` NaN guard on the bootstrap path + `cluster=` + replicate-weights `NotImplementedError`); (d) `EfficientDiD` per Chen-Sant'Anna-Xie (2025) EIF aggregation achieving the semiparametric efficiency bound (also renamed `EfficientDiDResults.cluster` → `cluster_name`, added `n_clusters`/`vcov_type` fields + `to_dict()`, added defensive survey-PSU n<2 NaN guard, eager set_params validation diverging from sibling IF-based estimators). Analytical-sandwich families don't compose with IF-based variance for any of the four. This row tracks the remaining 1 (`TwoStageDiD` is sandwich-class with GMM-corrected meat). | `diff_diff/two_stage.py` | Phase 1b | Medium |
107
+
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `TwoStageDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Four interstitial PRs (post-PR-3/8) addressed the IF-based estimators separately, each permanently narrow to `{"hc1"}`**: (a) `CallawaySantAnna` per Callaway & Sant'Anna (2021) Theorem 2 (also fixed CS's bare-`cluster=` silent no-op); (b) `TripleDifference` per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition; (c) `ImputationDiD` per Borusyak-Jaravel-Spiess (2024) Theorem 3 on per-unit IF aggregation (also added defensive `n_clusters<2`/`n_psu<2` NaN guard on the bootstrap path + `cluster=` + replicate-weights `NotImplementedError`); (d) `EfficientDiD` per Chen-Sant'Anna-Xie (2025) EIF aggregation achieving the semiparametric efficiency bound (also renamed `EfficientDiDResults.cluster` → `cluster_name`, added `n_clusters`/`vcov_type` fields + `to_dict()`, added defensive survey-PSU n<2 NaN guard, eager set_params validation diverging from sibling IF-based estimators). Analytical-sandwich families don't compose with IF-based variance for any of the four. PR 8/8 (Phase 1b interstitial #5, final) added `TwoStageDiD` per the Gardner (2022) two-stage GMM cluster-sandwich — sandwich-class, but the GMM-corrected meat `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}` admits the same permanently-narrow `{"hc1"}` contract (added `vcov_type`/`cluster_name`/`n_clusters` + `to_dict()`; bootstrap `n_clusters<2`/`n_psu<2` NaN guard; `cluster=`+replicate `NotImplementedError`). **Initiative COMPLETE — all 8 standalone estimators threaded;** remaining work is the per-estimator `conley` follow-ups below. | `diff_diff/two_stage.py` | Phase 1b | Done |
108
108
| Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. |`diff_diff/sun_abraham.py`| follow-up | Medium |
109
109
| Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. |`diff_diff/stacked_did.py`| follow-up | Medium |
110
110
| Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. |`diff_diff/wooldridge.py`| follow-up | Medium |
@@ -113,6 +113,7 @@ Deferred items from PR reviews that were not addressed before merge.
113
113
| Extend `TripleDifference` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for the 3-pairwise-DiD influence-function decomposition (Conley 1999 spatial kernel × `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` aggregation); no reference implementation exists today. Phase 1b interstitial #2 PR rejected this at `__init__` with a deferral pointer here. |`diff_diff/triple_diff.py`| follow-up | Low |
114
114
| Extend `ImputationDiD` with `vcov_type="conley"` — would require deriving a spatial-HAC composition with the Theorem 3 per-unit IF aggregation (Conley 1999 spatial kernel × `sigma_sq = (cluster_psi_sums**2).sum()` reduction); no reference implementation exists today. Phase 1b interstitial #3 PR rejected this at `__init__` with a deferral pointer here. |`diff_diff/imputation.py`| follow-up | Low |
115
115
| Extend `EfficientDiD` with `vcov_type="conley"` — would require deriving a spatial-HAC composition with the per-unit EIF aggregation (Conley 1999 spatial kernel × `_compute_se_from_eif` reduction); no reference implementation exists today. Phase 1b interstitial #4 PR rejected this at `__init__` with a deferral pointer here. |`diff_diff/efficient_did.py`| follow-up | Low |
116
+
| Extend `TwoStageDiD` with `vcov_type="conley"` — thread a spatial-HAC composition into the GMM sandwich meat (`_compute_gmm_variance`); the Conley machinery already exists in the sibling SpilloverDiD `_compute_gmm_corrected_meat` (same module) and could be adapted to TwoStageDiD's per-cluster GMM score `S_g = gamma_hat' c_g - X'_{2g} eps_{2g}`, but two-stage GMM × Conley has no reference implementation. Phase 1b interstitial #5 PR rejected this at `__init__`/`fit()` with a deferral pointer here. |`diff_diff/two_stage.py`| follow-up | Low |
116
117
| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)`. Both APIs are first-class today (the bare-cluster path synthesizes a minimal SurveyDesign internally), but having two equivalent paths to express the same intent creates redundant surface. Mirrors a similar question for ImputationDiD / EfficientDiD / TwoStageDiD if those estimators ever face the same review. |`diff_diff/staggered.py`| follow-up | Low |
117
118
| Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. |`diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov`| follow-up | Low |
118
119
<!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
0 commit comments