igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 6 additions & 0 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 4 additions & 1 deletion
diff --git a/benchmarks/R/generate_did_had_golden.R b/benchmarks/R/generate_did_had_golden.R
Lines changed: 274 additions & 0 deletions
diff --git a/benchmarks/R/requirements.R b/benchmarks/R/requirements.R
Lines changed: 3 additions & 0 deletions
diff --git a/benchmarks/data/did_had_golden.json b/benchmarks/data/did_had_golden.json
Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- **HAD `trends_lin=True` linear-trend detrending mode** on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)` (paper Eq. 17 / Eq. 18 / page 32 joint-Stute homogeneity-with-trends). Per-group linear-trend slope estimated as `Y[g, F-1] - Y[g, F-2]` and applied as `(t - base) × slope` adjustment to per-event-time outcome evolutions. Requires F ≥ 3 (panel must contain F-2). The "consumed" placebo at our event-time `e=-2` is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (`survey_design` / `survey` / `weights`): raises `NotImplementedError` per `feedback_per_method_survey_element_contract` (weighted slope estimator not derived from paper; tracked in TODO.md as a follow-up). Bit-exact backcompat for `trends_lin=False` (default). Patch-level (additive keyword-only kwarg).
+- **HAD R-package end-to-end parity test** vs `DIDHAD` v2.0.0 (`Credible-Answers/did_had`). New parity fixture `benchmarks/data/did_had_golden.json` generated by `benchmarks/R/generate_did_had_golden.R` covers 3 paper-derived synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1)) × 5 method combinations (overall; event-study with placebos; and that event-study run with yatchew, with trends_lin, and with both). Python parity test `tests/test_did_had_parity.py` asserts point estimate / SE / CI bounds at `atol=1e-8` and Yatchew T-stat at `atol=1e-10` after a documented `× G/(G-1)` finite-sample convention shift. Two intentional convention deviations from R, documented in `docs/methodology/REGISTRY.md`: (a) we report the bias-corrected point estimate (modern CCF 2018 convention; R's `Estimate` column reports the conventional estimate with the bias-corrected CI separately, so our `att` matches R's CI midpoint); (b) Yatchew uses paper Appendix E's literal (1/G) variance-denominator convention while R uses base-R `var()`'s (1/(N-1)) sample-variance convention, with N = G groups here (parity holds at `atol=1e-10` after the `× G/(G-1)` shift). Yatchew on placebos with R's mean-independence null (`order=0`) is not yet exposed in our `yatchew_hr_test` (we currently only support the linearity null) and is skipped in the parity test; tracked as TODO follow-up.
+
 ## [3.3.1] - 2026-04-25
 
 ### Changed
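
The `× G/(G-1)` shift mentioned in the Yatchew parity entry is the usual ratio between a 1/(G-1) sample variance and a 1/G population variance. A minimal sketch of that identity using only the Python standard library; the `values` list is made-up illustration data, not fixture output, and the snippet does not call `yatchew_hr_test` or show how the factor enters the T-statistic:

```python
import statistics

# Hypothetical per-group values; any list with at least two entries works.
values = [0.2, 0.5, 0.9, 1.4, 2.0]
G = len(values)

var_pop = statistics.pvariance(values)    # divides by G (the 1/G convention)
var_sample = statistics.variance(values)  # divides by G - 1 (what base-R var() computes)

# The two conventions differ by exactly the factor G / (G - 1).
shifted = var_pop * G / (G - 1)
print(abs(shifted - var_sample) < 1e-12)
```

For G = 200, as in the fixture DGPs, the factor is 200/199, which is far larger than the `atol=1e-10` tolerance, so the shift has to be applied explicitly before comparing.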
 
@@ -101,7 +101,10 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium |
 | `HeterogeneousAdoptionDiD` survey-design API consolidation, **next minor bump**: drop the deprecated `survey=` and `weights=` kwargs on all 8 HAD surfaces (`HeterogeneousAdoptionDiD.fit`, `did_had_pretest_workflow`, `qug_test`, `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); only `survey_design=` remains. Also fold the legacy back-end `weights=` paths (e.g. `_aggregate_unit_weights` ad-hoc routing) into the unified `_resolve_survey_for_fit`-driven path. The `_make_trivial_resolved` underscore alias on `survey.py` stays (one-line, harmless). DeprecationWarning ships in this PR; the removal PR is ~50 LoC of cleanup. | `diff_diff/had.py`, `diff_diff/had_pretests.py` | next minor bump | Medium |
 | `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low |
-| `HeterogeneousAdoptionDiD` Eq 18 linear-trend detrending (Pierce-Schott style): the joint-Stute infrastructure shipped in the Phase 3 follow-up supports pre-trends (mean-indep) and post-homogeneity (linearity) nulls. The Pierce-Schott application (paper Section 5.2) uses a LINEAR-TREND detrending of pre-period outcomes before the joint CvM — `Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor)*(Y_{g,t_anchor} - Y_{g,t_anchor-1})` — reaching p=0.51 on US-China tariff data. Extends `joint_pretrends_test` with a detrending mode or a separate Eq 18-specific helper. Deferred to Phase 4 replication harness (where the published p=0.51 serves as the parity anchor). | `diff_diff/had_pretests.py::joint_pretrends_test` | Phase 4 | Medium |
+| `HeterogeneousAdoptionDiD` Eq 17 / Eq 18 linear-trend detrending: SHIPPED in PR #389 (Phase 4 R-parity, 2026-04). Exposed as `trends_lin: bool = False` keyword-only kwarg on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)`. Pierce-Schott published-number parity (paper p=0.51 / p=0.40) deferred indefinitely (LBD-restricted analysis panel); replaced by end-to-end R-package parity at `tests/test_did_had_parity.py`. | `diff_diff/had_pretests.py::joint_pretrends_test`, `diff_diff/had.py` | Phase 4 (shipped) | Done |
+| `HeterogeneousAdoptionDiD` `trends_lin × survey_design` follow-up: per-group linear-trend slope under survey weighting (weighted slope estimator? per-PSU slope?) is not derived from the paper. PR #389 raises `NotImplementedError` on the combination across all 3 trends_lin surfaces. If user demand emerges, derive the weighted variant and lift the gate. | `diff_diff/had.py::HeterogeneousAdoptionDiD.fit`, `diff_diff/had_pretests.py::joint_pretrends_test`, `diff_diff/had_pretests.py::joint_homogeneity_test` | follow-up | Low |
+| `HeterogeneousAdoptionDiD` `yatchew_hr_test(null="mean_independence")` mode: R `YatchewTest::yatchew_test(order=0)` fits `Y ~ 1` (intercept-only baseline) and tests mean-independence of Y from D; R's `DIDHAD::did_had(yatchew=TRUE)` uses this on placebo rows ("non-parametric pre-trends test"). Our `yatchew_hr_test` always fits `Y ~ D` (linearity null) — no `null=` parameter exposed. Adding the mean-independence mode would (a) give practitioners a more conventional pre-trends test surface, and (b) close the PR #389 R-parity feature gap on the placebo-Yatchew rows (currently skipped in `tests/test_did_had_parity.py::TestYatchewParity` because the two tests are not the same statistic). | `diff_diff/had_pretests.py::yatchew_hr_test` | follow-up | Medium |
+| `HeterogeneousAdoptionDiD` Stute family Stata-bridge parity: PR #389 R-parity covers the full HAD fit + Yatchew surfaces but skips Stute family (`stute_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) because no R `Stutetest` package exists publicly (chaisemartinPackages publishes only the Stata `stute_test` module; the paper cites a 2024c R Stutetest module that is not on GitHub or CRAN). Stata-bridge parity would add `benchmarks/stata/generate_stute_golden.do` + a Stata installation requirement. Low priority unless user demand emerges. | `benchmarks/stata/`, `tests/test_stute_test_parity.py` | follow-up | Low |
 | `HeterogeneousAdoptionDiD` Phase 3 Stute performance: Appendix D vectorized matrix form replaces the per-iteration OLS refit with a single precomputed `M = I - X(X'X)^{-1}X'` applied to `eps * eta`. Functionally identical, ~2x faster. Shipped literal-refit form in Phase 3 to match paper text and keep reviewer surface small. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low |
 | `HeterogeneousAdoptionDiD` Phase 3 R-parity: Phase 3 ships coverage-rate validation on synthetic DGPs (not tight point parity against `chaisemartin::stute_test` / `yatchew_test`). Tight numerical parity requires aligning bootstrap seed semantics and `B` across numpy/R and is deferred. | `tests/test_had_pretests.py` | Phase 3 | Low |
 | `HeterogeneousAdoptionDiD` Phase 3 nprobust bandwidth for Stute: some Stute variants on continuous regressors use nprobust-style optimal bandwidth selection. Phase 3 uses OLS residuals from a 2-parameter linear fit (no bandwidth selection). nprobust integration is a future enhancement; not in paper scope. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low |
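
The detrending formula quoted in the removed Eq 18 row, `Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor)*(Y_{g,t_anchor} - Y_{g,t_anchor-1})`, can be checked by hand on a noise-free group. The sketch below is an illustration only, not the library's implementation: the helper name `detrended_evolution` is hypothetical, the anchor is assumed to be `F - 1`, and the toy outcomes follow the fixture script's DGP with the noise term dropped.

```python
def detrended_evolution(y_by_period, t, F):
    """Linear-trend-detrended outcome evolution for one group.

    y_by_period maps period -> outcome. The anchor period is assumed
    to be F - 1, the last pre-treatment period.
    """
    base = F - 1
    if (base - 1) not in y_by_period:
        raise ValueError("linear-trend detrending needs period F - 2 (F >= 3)")
    # Per-group slope: the F-2 -> F-1 outcome change.
    slope = y_by_period[base] - y_by_period[base - 1]
    return y_by_period[t] - y_by_period[base] - (t - base) * slope


# Noise-free toy group: unit effect 1.0, unit trend 0.1, dose 0.5, F = 4, T = 5.
F, dose, trend, unit_fe = 4, 0.5, 0.1, 1.0
y = {
    t: unit_fe + trend * (t - 1) + (dose + dose**2) * (t >= F)
    for t in range(1, 6)
}

for t in (1, 2, 4, 5):
    print(t, round(detrended_evolution(y, t, F), 10))
```

In this toy case the post-period evolutions recover `dose + dose**2` and the pre-period evolutions vanish up to floating-point error. The value at `t = F - 2` is zero by construction, because that evolution is the slope itself, which is why the placebo at event time `e = -2` carries no information under `trends_lin`.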
 
@@ -0,0 +1,274 @@
+# Generate cross-language end-to-end parity fixture for HAD Phase 4
+# (PR #389 R-parity vs `Credible-Answers/did_had`).
+#
+# Purpose: validate Python `HeterogeneousAdoptionDiD.fit()` (overall,
+# event-study, placebo, yatchew, trends_lin) against R `DIDHAD::did_had()`
+# on shared input, to tight tolerance. The R package is the methodology source
+# of truth (the de Chaisemartin team wrote it); matching it within
+# `atol=1e-8` on point/SE/CI and `atol=1e-10` on closed-form Yatchew
+# T-stats is a strictly stronger correctness signal than reproducing the
+# paper's published Pierce-Schott numbers (which depend on an
+# LBD-restricted analysis panel).
+#
+# Usage:
+#   Rscript benchmarks/R/generate_did_had_golden.R
+#
+# Output:
+#   benchmarks/data/did_had_golden.json
+#
+# Phase 4 of HeterogeneousAdoptionDiD (de Chaisemartin et al. 2025).
+# Python test loader: tests/test_did_had_parity.py.
+#
+# Pin: DIDHAD >= 2.0.0 (2.0.0 is CRAN current as of 2026-04; the version used is recorded in the JSON metadata). YatchewTest >= 1.1.0.
+
+library(jsonlite)
+library(DIDHAD)
+library(YatchewTest)
+
+stopifnot(packageVersion("DIDHAD") >= "2.0.0")
+stopifnot(packageVersion("YatchewTest") >= "1.1.0")
+
+# -------------------------------------------------------------------------
+# Panel builder: 5-period panel with F=4 (treatment onset at t=4).
+# Pre-periods: 1, 2, 3 (D=0). Post-periods: 4, 5 (D=fixed positive dose).
+# Y[g, t] = unit_fe[g] + trend[g] * (t - 1) + (dose[g] + dose[g]^2) * (t >= F) + noise
+# -------------------------------------------------------------------------
+
+build_panel <- function(G, F_treat, T_periods, dose_draws, seed,
+                        unit_trend_sd = 0.05, noise_sd = 0.5) {
+  set.seed(seed)
+  n <- G * T_periods
+  unit_fe <- rnorm(G, mean = 0, sd = 1.0)
+  unit_trend <- rnorm(G, mean = 0.1, sd = unit_trend_sd)
+  noise <- rnorm(n, mean = 0, sd = noise_sd)
+
+  rows <- vector("list", n)
+  k <- 1
+  for (g in seq_len(G)) {
+    for (t in seq_len(T_periods)) {
+      treated <- as.numeric(t >= F_treat)
+      y <- unit_fe[g] + unit_trend[g] * (t - 1) +
+           (dose_draws[g] + dose_draws[g]^2) * treated +
+           noise[k]
+      d_obs <- if (treated == 1) dose_draws[g] else 0.0
+      # Use short column names (g, t, d, y) matching DIDHAD's tutorial
+      # convention. The package has a data-masking issue when column
+      # names alias the formal parameter names (e.g., column "time" with
+      # `time = "time"` resolves to the column values inside dplyr's
+      # `.data[[get("time")]]` lookup), so avoid that overlap upstream.
+      rows[[k]] <- data.frame(
+        g = g,
+        t = t,
+        y = y,
+        d = d_obs,
+        stringsAsFactors = FALSE
+      )
+      k <- k + 1
+    }
+  }
+  do.call(rbind, rows)
+}
+
+# DGP 1: D ~ Uniform(0, 1).
+dgp_uniform <- function(G = 200, F_treat = 4, T_periods = 5, seed = 20260426) {
+  set.seed(seed * 2L + 1L)
+  d <- runif(G, min = 0.0, max = 1.0)
+  list(
+    name = "uniform_G200_F4_T5",
+    panel = build_panel(G, F_treat, T_periods, d, seed = seed),
+    G = G, F_treat = F_treat, T_periods = T_periods,
+    dose_distribution = "Uniform(0, 1)",
+    seed = seed
+  )
+}
+
+# DGP 2: D ~ Beta(2, 2). Symmetric, bell-shaped on [0, 1].
+dgp_beta22 <- function(G = 200, F_treat = 4, T_periods = 5, seed = 20260426) {
+  set.seed(seed * 2L + 2L)
+  d <- rbeta(G, shape1 = 2, shape2 = 2)
+  list(
+    name = "beta22_G200_F4_T5",
+    panel = build_panel(G, F_treat, T_periods, d, seed = seed),
+    G = G, F_treat = F_treat, T_periods = T_periods,
+    dose_distribution = "Beta(2, 2)",
+    seed = seed
+  )
+}
+
+# DGP 3: D ~ Beta(0.5, 1). Mass concentrated near 0 (density 0.5/sqrt(d),
+# unbounded as d -> 0); approximates the empirical Pierce-Schott NTR-gap
+# distribution, where many industries have small tariff gaps.
+dgp_boundary <- function(G = 200, F_treat = 4, T_periods = 5, seed = 20260426) {
+  set.seed(seed * 2L + 3L)
+  d <- rbeta(G, shape1 = 0.5, shape2 = 1.0)
+  list(
+    name = "boundary_G200_F4_T5",
+    panel = build_panel(G, F_treat, T_periods, d, seed = seed),
+    G = G, F_treat = F_treat, T_periods = T_periods,
+    dose_distribution = "Beta(0.5, 1)",
+    seed = seed
+  )
+}
+
+# -------------------------------------------------------------------------
+# Run did_had with given options and extract the standardized result
+# matrix. The R package returns a `did_had` S3 object whose `results`
+# element has `resmat` (effects + placebos) and optionally `yatchew_test`.
+# -------------------------------------------------------------------------
+
+run_did_had <- function(panel, effects = 1, placebo = 0,
+                       trends_lin = FALSE, yatchew = FALSE) {
+  # graph_off=TRUE suppresses the auto-print of the event-study plot.
+  fit <- did_had(
+    df = panel,
+    outcome = "y",
+    group = "g",
+    time = "t",
+    treatment = "d",
+    effects = effects,
+    placebo = placebo,
+    trends_lin = trends_lin,
+    yatchew = yatchew,
+    graph_off = TRUE
+  )
+  res <- fit$results
+  resmat <- res$resmat
+  out <- list(
+    n_effects_actual = res$res.effects,
+    n_placebo_actual = res$res.placebo,
+    rownames = rownames(resmat),
+    estimate = unname(resmat[, "Estimate"]),
+    se = unname(resmat[, "SE"]),
+    ci_lo = unname(resmat[, "LB.CI"]),
+    ci_hi = unname(resmat[, "UB.CI"]),
+    n_per_horizon = unname(as.integer(resmat[, "N"])),
+    bw_per_horizon = unname(resmat[, "BW"]),
+    n_within_bw = unname(as.integer(resmat[, "N.BW"])),
+    qug_t = unname(resmat[, "T"]),
+    qug_p = unname(resmat[, "p.val"]),
+    event_id = unname(as.integer(resmat[, "ID"]))
+  )
+  if (yatchew) {
+    yt <- res$yatchew_test
+    out$yatchew_t <- unname(yt[, "T_hr"])
+    out$yatchew_p <- unname(yt[, "p-value"])
+    out$yatchew_n <- unname(as.integer(yt[, "N"]))
+    # Capture sigma2 components for diagnostic comparison; the column
+    # names contain unicode (sigma², σ²). Use positional indexing.
+    out$yatchew_sigma2_lin <- unname(yt[, 1])
+    out$yatchew_sigma2_diff <- unname(yt[, 2])
+  }
+  out
+}
+
+# -------------------------------------------------------------------------
+# Build the DGP × method-combo fixture grid.
+# -------------------------------------------------------------------------
+
+dgp_builders <- list(
+  uniform = dgp_uniform,
+  beta22 = dgp_beta22,
+  boundary = dgp_boundary
+)
+
+# Per-DGP method matrix. Each combo runs did_had with the named flags
+# and stores the resulting standardized resmat dict alongside the input
+# panel arrays. Python parity test loops over combos and asserts.
+#
+# Why effects=2/placebo=2: F=4 with T=5 leaves 2 post-period horizons
+# (t=4, 5) and 2 pre-period placebos (t=2, 1) without trends_lin. R
+# auto-truncates if requested > feasible. Under trends_lin, the
+# F-2 -> F-1 evolution is consumed by the slope estimator and R reduces
+# max placebo by 1 (so only placebo at t=1 survives).
+combos <- list(
+  list(name = "overall_e1", effects = 1, placebo = 0,
+       trends_lin = FALSE, yatchew = FALSE),
+  list(name = "event_e2_p2", effects = 2, placebo = 2,
+       trends_lin = FALSE, yatchew = FALSE),
+  list(name = "event_e2_p2_yatchew", effects = 2, placebo = 2,
+       trends_lin = FALSE, yatchew = TRUE),
+  list(name = "event_e2_p2_trendslin", effects = 2, placebo = 2,
+       trends_lin = TRUE, yatchew = FALSE),
+  list(name = "event_e2_p2_yatchew_trendslin", effects = 2, placebo = 2,
+       trends_lin = TRUE, yatchew = TRUE)
+)
+
+fixtures <- list()
+for (dgp_name in names(dgp_builders)) {
+  dgp <- dgp_builders[[dgp_name]]()
+  panel <- dgp$panel
+  combo_results <- list()
+  for (combo in combos) {
+    res <- run_did_had(
+      panel = panel,
+      effects = combo$effects,
+      placebo = combo$placebo,
+      trends_lin = combo$trends_lin,
+      yatchew = combo$yatchew
+    )
+    combo_results[[combo$name]] <- list(
+      effects = combo$effects,
+      placebo = combo$placebo,
+      trends_lin = combo$trends_lin,
+      yatchew = combo$yatchew,
+      result = res
+    )
+  }
+  fixtures[[dgp$name]] <- list(
+    name = dgp$name,
+    G = dgp$G,
+    F = dgp$F_treat,
+    T = dgp$T_periods,
+    dose_distribution = dgp$dose_distribution,
+    seed = dgp$seed,
+    panel = list(
+      g = panel$g,
+      t = panel$t,
+      y = panel$y,
+      d = panel$d
+    ),
+    combos = combo_results
+  )
+}
+
+# -------------------------------------------------------------------------
+# Serialize
+# -------------------------------------------------------------------------
+
+out <- list(
+  metadata = list(
+    description = paste(
+      "DIDHAD::did_had end-to-end parity fixture for HAD Phase 4",
+      "(PR #389 R-parity).",
+      sep = " "
+    ),
+    didhad_version = as.character(packageVersion("DIDHAD")),
+    yatchewtest_version = as.character(packageVersion("YatchewTest")),
+    nprobust_version = as.character(packageVersion("nprobust")),
+    r_version = as.character(getRversion()),
+    n_dgps = length(fixtures),
+    n_combos_per_dgp = length(combos),
+    point_atol = 1e-8,
+    se_atol = 1e-8,
+    ci_atol = 1e-8,
+    yatchew_atol = 1e-10,
+    qug_atol = 1e-12,
+    notes = paste(
+      "Three synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1) approximation",
+      "of the empirical Pierce-Schott NTR-gap distribution). Each DGP runs",
+      "5 method combos covering overall, event-study, placebo, yatchew,",
+      "and trends_lin variants. Tolerances per the Phase 4 plan.",
+      sep = " "
+    )
+  ),
+  fixtures = fixtures
+)
+
+out_dir <- "benchmarks/data"
+if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)
+out_path <- file.path(out_dir, "did_had_golden.json")
+write_json(out, path = out_path, digits = 17, auto_unbox = TRUE, null = "null")
+message(sprintf(
+  "Wrote %d DGP fixtures (each with %d combos) to %s",
+  length(fixtures), length(combos), out_path
+))
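
One consequence of `auto_unbox = TRUE` in the `write_json` call above: jsonlite serializes length-one vectors as JSON scalars, so a single-row result such as `overall_e1` would reach Python as bare floats while the multi-row combos arrive as lists. A minimal loader-side sketch under that assumption; the helper names `as_list` and `max_abs_diff` are hypothetical (they are not taken from `tests/test_did_had_parity.py`), and the JSON snippet is a hand-written stand-in with made-up numbers:

```python
import json


def as_list(value):
    """Wrap a JSON scalar in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


def max_abs_diff(expected, actual):
    """Largest absolute elementwise gap between two equal-length sequences."""
    if len(expected) != len(actual):
        raise ValueError("length mismatch")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


# Stand-in with the same nesting as one fixture entry; values are invented.
raw = json.loads("""
{"combos": {"overall_e1": {"result": {"estimate": 1.25, "se": 0.1}},
            "event_e2_p2": {"result": {"estimate": [1.25, 1.3, 0.01, -0.02],
                                       "se": [0.1, 0.1, 0.1, 0.1]}}}}
""")

overall = as_list(raw["combos"]["overall_e1"]["result"]["estimate"])
event = as_list(raw["combos"]["event_e2_p2"]["result"]["estimate"])

# Hypothetical Python-side estimate compared against the R value at atol=1e-8.
python_overall = [1.25 + 5e-10]
print(max_abs_diff(overall, python_overall) <= 1e-8)
```

Normalizing at load time keeps the per-combo assertions uniform, so the comparison code does not need a special case for the one-row `overall_e1` result.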
@@ -13,6 +13,9 @@ required_packages <- c(
   "triplediff",    # Ortiz-Villavicencio & Sant'Anna (2025) triple difference
   "survey",        # Lumley (2004) complex survey analysis
   "estimatr",      # Blair et al. (2019) weighted robust / IV SE (HAD mass-point parity)
+  "DIDHAD",        # de Chaisemartin et al. (2025) HAD estimator (HAD Phase 4 R-parity)
+  "YatchewTest",   # Yatchew (1997) linearity test (HAD yatchew R-parity)
+  "nprobust",      # Calonico-Cattaneo-Farrell local-linear (DIDHAD dependency)
 
   # Utilities
   "jsonlite",      # JSON output for Python interop