igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 3 additions & 0 deletions
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
Lines changed: 6 additions & 7 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 3 additions & 2 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
+
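The tolerance arithmetic in the entry above can be reproduced with the standard library alone. The following is a minimal sketch, not the project's code: the function name `k1_pretest_power` and the demo values `gamma=1.0`, `sigma=1.0` are hypothetical, and `statistics.NormalDist` stands in for `scipy.stats.norm`. It evaluates the K=1 closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` and shows how small the power gap from R's hardcoded `1.96` threshold is relative to `atol=1e-4`.

```python
import sys
from statistics import NormalDist

_STD = NormalDist()  # standard normal; stand-in for scipy.stats.norm


def k1_pretest_power(gamma, sigma, z=None):
    """Closed-form rejection probability P(|beta_hat / sigma| > z)
    for a single pre-period coefficient beta_hat ~ N(gamma, sigma^2).

    Hypothetical helper for illustration only.
    """
    if z is None:
        z = _STD.inv_cdf(0.975)  # exact two-sided 5% critical value
    shift = gamma / sigma
    return 1.0 - _STD.cdf(z - shift) + _STD.cdf(-z - shift)


# Threshold gap: R hardcodes 1.96, Python uses the exact quantile.
z_exact = _STD.inv_cdf(0.975)
dz = 1.96 - z_exact

# Default uniroot tolerance in R: .Machine$double.eps^0.25.
uniroot_tol = sys.float_info.epsilon ** 0.25

# Power gap induced by the threshold difference alone (demo values).
power_gap = abs(
    k1_pretest_power(1.0, 1.0, z=1.96) - k1_pretest_power(1.0, 1.0)
)

print(f"dz          = {dz:.3e}")
print(f"uniroot tol = {uniroot_tol:.3e}")
print(f"power gap   = {power_gap:.3e}")
```

At `gamma = 0` the function returns the nominal size of the pretest (0.05), which is a quick sanity check on the sign conventions.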
 ## [3.4.0] - 2026-05-19
 
 ### Added
 
@@ -80,7 +80,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
 |------|--------|-------------|--------|-------------|
 | BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** | 2026-05-16 |
 | HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
-| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** (R parity pending) | 2026-05-18 |
+| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** | 2026-05-19 |
 | PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** | — |
 | PlaceboTests | `diagnostics.py` | (no canonical reference) | **In Progress** | — |
 
@@ -1047,14 +1047,15 @@ and covariate-adjusted specifications.)
 | Module | `pretrends.py` |
 | Primary Reference | Roth (2022), *Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends*, AER:I 4(3), 305-322 |
 | R Reference | `pretrends` package |
-| Status | **Complete** (R parity pending) |
-| Last Review | 2026-05-18 |
+| Status | **Complete** |
+| Last Review | 2026-05-19 |
 
 **Documentation in place:**
 - REGISTRY.md section: `## PreTrendsPower` — NIS-framed audit per Roth (2022) Section II.A-B with full equation blocks for both NIS and Wald forms; paper-supported alternative + γ-unit MDV + full-Σ_22 routing all locked.
 - Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17 via PR #463).
 - Implementation: `tests/test_pretrends.py` (67 tests — point-estimator, MDV, power curve, sensitivity, plus the PR-A R18 silent-failure regression and the PR-B custom-weight persistence regression) + event-study coverage in `tests/test_pretrends_event_study.py` (27 tests).
-- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes, 30-40 tests covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API).
+- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7; PR-C 2026-05-19 activated `TestPretrendsParityR` with 4 concrete tests) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API, R parity at commit `122731d082`).
+- R parity goldens: `benchmarks/data/r_pretrends_golden.json` generated by `benchmarks/R/generate_pretrends_golden.R` against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0); 4 fixtures (regular K=3, irregular K=3 `[-5,-3,-1]`, anticipation-shifted K=4, K=1 closed form) × NIS power + γ_p MDV at `atol=1e-4`.
 
 **Verified Components:**
 - [x] NIS box probability implemented via `scipy.stats.multivariate_normal.cdf` (Roth Section II.A-B primary form)
@@ -1067,9 +1068,7 @@ and covariate-adjusted specifications.)
 - [x] `PreTrendsPowerResults` persists fitted `violation_weights` + `pretest_form` + `nis_box_probability`; `power_at(M)` works for all four violation types on fresh fits
 - [x] Helper API (`compute_pretrends_power`, `compute_mdv`) accepts `violation_weights` and `pretest_form`; closes the PR-A R18 helper/class API gap
 - [x] Summary, `to_dict`, `to_dataframe` dispatch on `pretest_form` (NIS prints box probability; Wald prints noncentrality)
-
-**Outstanding for promotion to fully Complete:**
-- R parity fixture against the `pretrends` R package at a **pinned revision** (deferred to PR-C). The generator script `benchmarks/R/generate_pretrends_golden.R` is committed in PR-B with a placeholder commit reference; PR-C will install the package, generate the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate `TestPretrendsParityR` (currently skips when goldens missing), and record the audited R-package revision. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` Gaps section remain provisional.
+- [x] R `pretrends` parity at commit `122731d082` (PR-C, 2026-05-19) — 4 fixtures × NIS power + γ_p MDV at `atol=1e-4`; `tests/test_methodology_pretrends.py::TestPretrendsParityR` active
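
The skip-when-missing and `atol=1e-4` comparison pattern behind a golden-file parity check can be sketched as below. This is an illustration under stated assumptions, not the contents of `TestPretrendsParityR`: the JSON layout (`fixtures` with `sigma`, `gamma`, `r_power`), the helper names, and the demo values are all hypothetical. The "golden" values here are synthetic stand-ins computed with a hardcoded `z = 1.96`, not real R output, and the K=1 closed form stands in for the Python estimator.

```python
import json
import tempfile
from pathlib import Path
from statistics import NormalDist

ATOL = 1e-4
_STD = NormalDist()


def k1_power(gamma, sigma, z):
    """K=1 closed form: 1 - Phi(z - gamma/sigma) + Phi(-z - gamma/sigma)."""
    shift = gamma / sigma
    return 1.0 - _STD.cdf(z - shift) + _STD.cdf(-z - shift)


def load_goldens(path):
    """Return parsed goldens, or None when the file is absent so the
    caller can skip instead of failing."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text())


def max_abs_gap(goldens):
    """Largest absolute gap between the Python-side value and the stored
    reference across every fixture and gamma (hypothetical schema)."""
    z_py = _STD.inv_cdf(0.975)
    gaps = []
    for fixture in goldens["fixtures"]:
        for gamma, ref in zip(fixture["gamma"], fixture["r_power"]):
            gaps.append(abs(k1_power(gamma, fixture["sigma"], z_py) - ref))
    return max(gaps)


with tempfile.TemporaryDirectory() as tmp:
    golden_path = Path(tmp) / "golden.json"
    missing = load_goldens(golden_path)  # None: a real test would skip here

    # Synthetic stand-in for reference output, using the hardcoded 1.96.
    gammas = [0.0, 0.5, 1.0, 2.0]
    synthetic = {
        "fixtures": [
            {
                "sigma": 1.0,
                "gamma": gammas,
                "r_power": [k1_power(g, 1.0, 1.96) for g in gammas],
            }
        ]
    }
    golden_path.write_text(json.dumps(synthetic))
    gap = max_abs_gap(load_goldens(golden_path))

parity_ok = gap <= ATOL
print(f"max gap = {gap:.3e}, within atol: {parity_ok}")
```

Returning `None` for a missing file keeps the skip decision in the caller, which is one way to let a test class stay committed before its goldens exist.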
 
 ---
 
 
@@ -94,8 +94,9 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
-| PreTrendsPower R parity goldens (PR-C): pin the R `pretrends` package commit/release, run `benchmarks/R/generate_pretrends_golden.R` (committed in PR-B), commit the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate the `TestPretrendsParityR` class in `tests/test_methodology_pretrends.py` (currently skips when goldens missing), and flip the METHODOLOGY_REVIEW.md `PreTrendsPower` row from `**Complete** (R parity pending)` → `**Complete**`. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` remain provisional. | `benchmarks/R/generate_pretrends_golden.R`, `benchmarks/data/r_pretrends_golden.json` (new), `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-C (PreTrendsPower R parity) | Low |
-<!-- The remaining four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV) were all resolved in PR-B 2026-05-18 — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
+<!-- The PreTrendsPower R parity row (PR-C, 2026-05-19) and the four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV; resolved in PR-B 2026-05-18) are all closed — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
+| PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four fixtures (regular K=3, irregular K=3, anticipation-shifted K=4, K=1), but R `pretrends` has no anticipation parameter, so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirrored in the SA branch) is not locked against R. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert that the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this fixture would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
+
 
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
 | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |