
Commit aa64e26

igerber and claude committed
PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits
Step-1 fidelity artifact for the PowerAnalysis methodology validation (the paper-review PR of the 2-PR sequence; source validation, code reconciliation, and tests are deferred to PR-B). No estimator logic, formula, or test changes.

New source audits under docs/methodology/papers/ (sourced only from the papers):
- bloom-1995-review.md: the MDE-multiplier framing, i.e. MDE = M*SE with M from the NORMAL distribution (Bloom builds 1.65 + 0.84 = 2.49 from z-quantiles, p.548-549); the cross-sectional impact-estimator SE sigma*sqrt((1-R^2)/(n*T*(1-T))) (Eq. 1/2); the T(1-T) allocation factor (optimal at 50/50); and one- and two-sided multipliers. It documents that Bloom explicitly excludes clustering/multi-site design effects (Note 1).
- burlig-preonas-woerman-2020-review.md: the serial-correlation-robust (SCR) panel-DD variance (Eq. 2, verbatim: three covariance terms psi^B/psi^A/psi^X over m pre- and r post-periods, with psi^X entering negatively), the McKenzie special case (Eq. 3), and the increase-MDE condition (Eq. 4). Eq. 1 uses the t-distribution; pcpanel is the panel parity reference.

The audits surfaced discrepancies between the authoritative PowerAnalysis surfaces and the source material. Per the agreed approach, these are DOCUMENTED as under review now (not yet fixed; reconciliation is deferred to PR-B and tracked in TODO.md):
- REGISTRY.md ## PowerAnalysis: an umbrella **Note:** enumerating the four discrepancies (t-vs-normal-z multiplier; SE R^2/cluster-m terms; missing T(1-T) allocation factor in the displayed sample-size formula; the panel (1+(T-1)rho)/T factor is an equicorrelated/Moulton design effect, NOT Burlig SCR, i.e. an attribution overclaim).
- REGISTRY.md R-equivalents table: annotate the PowerAnalysis row as under review (the analytical path is normal-based, so pwr.t.test is not the faithful parity target; the panel parity reference is Stata pcpanel). This resolves the cross-reference inconsistency the audits introduced.
- power.py: docstring notes on PowerAnalysis and simulate_power flagging the panel attribution and the normal-vs-t approximation as under review; the class docstring's panel-variance display is corrected from a self-canceling factor to the implemented (1/N_treated + 1/N_control) * (1+(T-1)rho)/T (docstring-only; no logic change).
- references.rst: clarify that the analytical panel path uses an equicorrelated approximation, not Burlig's SCR formula.
- TODO.md: tracker row (Methodology/Correctness) for the PR-B reconciliation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent d59039e commit aa64e26

6 files changed

Lines changed: 367 additions & 4 deletions


TODO.md

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ Deferred items from PR reviews that were not addressed before merge.

 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
+| PowerAnalysis: REGISTRY `## PowerAnalysis` equation block + analytical panel-path attribution need reconciliation against the source audits: (1) MDE multiplier t vs normal-z (Bloom uses z, Burlig Eq. 1 uses t, code uses z); (2) SE `1/sqrt(1-R^2)` + cluster-size `m` terms vs code's `2*sigma^2*(1/n_T+1/n_C)` (no R^2); (3) sample-size `T(1-T)` allocation factor; (4) panel `(1+(T-1)*rho)/T` is equicorrelated/Moulton, NOT Burlig SCR (Eq. 2); re-attribute or implement. Documented as under-review Notes in REGISTRY/power.py/references.rst by the paper-review PR. See `docs/methodology/papers/bloom-1995-review.md`, `burlig-preonas-woerman-2020-review.md`. | `power.py`, `docs/methodology/REGISTRY.md`, `docs/references.rst` | follow-up (PR-B) | Medium |
 | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
 | dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |

diff_diff/power.py

Lines changed: 11 additions & 2 deletions
@@ -1224,11 +1224,17 @@ class PowerAnalysis:
             + 1/n_control_post + 1/n_control_pre)

 For panel DiD with T periods:
-Var(ATT) = sigma^2 * (1/(N_treated * T) + 1/(N_control * T))
-    * (1 + (T-1)*rho) / (1 + (T-1)*rho)
+Var(ATT) = sigma^2 * (1/N_treated + 1/N_control)
+    * (1 + (T-1)*rho) / T

 Where rho is the intra-cluster correlation coefficient.

+These analytical formulas are under methodology review (2026-05): the panel variance uses an
+equicorrelated/Moulton ``(1 + (T-1)*rho)/T`` design effect, which is **not** Burlig et al.'s
+serial-correlation-robust (SCR) variance, and the critical values use the normal (z)
+approximation (Bloom) rather than Burlig's t-distribution. See ``docs/methodology/REGISTRY.md``
+``## PowerAnalysis`` and the source audits under ``docs/methodology/papers/``.
+
 References
 ----------
 Bloom, H. S. (1995). "Minimum Detectable Effects."
@@ -1901,6 +1907,9 @@ def simulate_power(
 3. Repeat n_simulations times
 4. Power = fraction of simulations where p-value < alpha

+The analytical reference formulas this Monte Carlo path complements are under methodology
+review; see ``docs/methodology/REGISTRY.md`` ``## PowerAnalysis``.
+
 References
 ----------
 Burlig, F., Preonas, L., & Woerman, M. (2020). "Panel Data and Experimental Design."
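The panel-variance docstring correction in this hunk can be checked numerically. The sketch below evaluates the two displayed formulas exactly as written above; it does not import or reproduce `power.py`, and the function names and inputs are hypothetical. The old display multiplies by `(1 + (T-1)*rho) / (1 + (T-1)*rho)`, which is always 1, so `rho` had no effect. The corrected display matches the old one at `rho = 0` and rises to `sigma^2 * (1/N_treated + 1/N_control)` at `rho = 1`, where additional periods add no information.

```python
def old_display_var(sigma, n_treated, n_control, T, rho):
    """Panel Var(ATT) exactly as displayed before this commit."""
    base = sigma**2 * (1 / (n_treated * T) + 1 / (n_control * T))
    return base * (1 + (T - 1) * rho) / (1 + (T - 1) * rho)  # this factor is identically 1


def new_display_var(sigma, n_treated, n_control, T, rho):
    """Panel Var(ATT) as displayed after this commit (equicorrelated/Moulton factor)."""
    design_effect = (1 + (T - 1) * rho) / T
    return sigma**2 * (1 / n_treated + 1 / n_control) * design_effect


# Hypothetical inputs: 50 treated and 50 control units observed over T = 4 periods.
for rho in (0.0, 0.5, 1.0):
    old = old_display_var(1.0, 50, 50, 4, rho)
    new = new_display_var(1.0, 50, 50, 4, rho)
    print(f"rho={rho}: old display {old:.4f}, corrected display {new:.4f}")
```

Note that the design effect here is the equicorrelated factor the hunk describes, not Burlig et al.'s SCR variance.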

docs/methodology/REGISTRY.md

Lines changed: 16 additions & 1 deletion
@@ -3204,6 +3204,21 @@ Violation types:

 *Estimator equation (as implemented):*

+- **Note:** The formulas in this "Estimator equation" block are under active methodology review
+(2026-05; source audits at `docs/methodology/papers/bloom-1995-review.md` and
+`docs/methodology/papers/burlig-preonas-woerman-2020-review.md`, against Bloom (1995) and Burlig,
+Preonas & Woerman (2020)). The audits identified discrepancies with `diff_diff/power.py` to be
+reconciled in a follow-up PR (tracked in `TODO.md`): (1) the MDE multiplier is written with the
+t-distribution, but the analytical path uses the normal (z) approximation following Bloom (Burlig
+Eq. 1, by contrast, uses t); (2) the SE expression's `1/sqrt(1-R^2)` and cluster-size `m` terms are
+not what the code implements (its `basic_did` variance is `2*sigma^2*(1/n_T+1/n_C)`, with no R^2
+term); (3) the sample-size formula below omits the `T(1-T)` allocation factor that the code applies
+(via `treat_frac*(1-treat_frac)`); and (4) the panel `(1+(T-1)*rho)/T` factor is an
+equicorrelated/Moulton design effect, **not** Burlig's serial-correlation-robust (SCR) variance
+(their Eq. 2), so the Burlig attribution for the analytical panel path is an overclaim pending
+re-attribution or implementation. These notes document the known state; this block is not yet a
+corrected contract.
+
 Minimum detectable effect (MDE):
 ```
 MDE = (t_{α/2} + t_{1-κ}) × SE(τ̂)
@@ -3346,7 +3361,7 @@ should be a deliberate user choice.
 | BaconDecomposition | bacondecomp | `bacon()` |
 | HonestDiD | HonestDiD | `createSensitivityResults()` |
 | PreTrendsPower | pretrends | `pretrends()` |
-| PowerAnalysis | pwr / DeclareDesign | `pwr.t.test()` / simulation |
+| PowerAnalysis | pwr / DeclareDesign / pcpanel | `pwr.t.test()` / simulation; **under review** (see `## PowerAnalysis` Note: the analytical path is normal-based, so `pwr.t.test` is not the faithful parity target; the panel parity reference is Stata `pcpanel`) |

 ---

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# Paper Review: "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs"

**Authors:** Howard S. Bloom
**Citation:** Bloom, H. S. (1995). Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs. *Evaluation Review*, 19(5), 547-556.
**DOI:** https://doi.org/10.1177/0193841X9501900504
**Source reviewed:** *Evaluation Review* 19(5), 547-556 (10 pages). The PDF was reviewed externally and is **not** committed to the repository (the `/papers/` working directory is gitignored). To reproduce, download the published article via the DOI above, or use the open scan reviewed here: `https://bpb-us-e2.wpmucdn.com/sites.uci.edu/dist/1/1159/files/2021/03/Bloom-MDES-Eval-Rev-1995-Bloom.pdf`. Page numbers below refer to the journal pagination (547-556).
**Review date:** 2026-05-31

---

## Methodology Registry Entry

**Status: proposed/confirming source text for the `## PowerAnalysis` REGISTRY entry; this file is a
non-authoritative source audit.** The current `docs/methodology/REGISTRY.md` `## PowerAnalysis` block
remains the sole authoritative methodology contract. This review establishes what Bloom (1995)
*actually* states so that a follow-up audit PR (PR-B) can reconcile the REGISTRY equation block and
`diff_diff/power.py` against it. The registry-candidate text ends just before
`## Implementation Notes`; everything below that boundary is audit notes and is **not** normative.

Scope note: Bloom (1995) is the primary source for **the minimum-detectable-effect (MDE)
multiplier framing** and for **the cross-sectional (two-group) impact-estimator standard error**. It
is *not* a difference-in-differences paper, and it **explicitly excludes** clustering / multi-site
design effects (Note 1, p.555); those rest on other sources (serial correlation: Burlig, Preonas &
Woerman 2020, see [burlig-preonas-woerman-2020-review.md](burlig-preonas-woerman-2020-review.md);
survey design effect: Kish 1965, already in REGISTRY).

## PowerAnalysis

**Primary source:** [Bloom, H. S. (1995). Minimum Detectable Effects. *Evaluation Review*, 19(5), 547-556.](https://doi.org/10.1177/0193841X9501900504)

**Key implementation requirements:**

*Definitions (p.547):*
- The **minimum detectable effect (MDE)** is "the smallest effect that, if true, has an X% chance of
producing an impact estimate that is statistically significant at the Y level," where X = statistical
power and Y = significance level (p.547).
- The MDE is measured **in the original units of the impact** (e.g. dollars), explicitly *not*
standardized like Cohen's (1977) effect size (p.547-548).

*MDE multiplier, derived from the NORMAL distribution (p.548-549):*

Bloom constructs the multiplier from "two normal (bell-shaped) sampling distributions" (p.548). For a
one-sided test at the .05 level with 80% power he derives:

```
critical value (one-sided .05): 1.65 SE above zero               = z_{0.95}
power shift (80%):              0.84 SE above the critical value = z_{0.80}
MDE = (1.65 + 0.84) * SE = 2.49 * SE   (p.549)
```

The general rule (p.549-550): "For any significance level, power value, and one- or two-sided
hypothesis test, the minimum detectable effect can be computed as a multiple of the standard error of
the impact estimate." In standard-normal-quantile form (every multiplier below reproduces Bloom's
stated values exactly from `z` quantiles, confirming the normal, not t, basis):

```
MDE         = M * SE(impact)
M_one_sided = z_{1-alpha}   + z_{power}
M_two_sided = z_{1-alpha/2} + z_{power}
```

*Table 1 multipliers explicitly stated in the text* (p.549-551), one-sided test at the .05 level:

| Power | Multiplier M | Check (z_{0.95} + z_{power}) |
|-------|--------------|------------------------------|
| 90% | 2.93 | 1.645 + 1.282 = 2.927 |
| 80% | 2.49 | 1.645 + 0.842 = 2.487 |
| 70% | 2.17 | 1.645 + 0.524 = 2.169 |

One-sided .10 at 80% power: M = 2.12 (p.552) = z_{0.90} + z_{0.80} = 1.282 + 0.842 = 2.124.
Table 1 has a top panel (one-sided) and a bottom panel (two-sided); the columns are significance
levels and the rows are power (p.550).
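Because each multiplier is a sum of two standard-normal quantiles, the values stated in the text can be reproduced with the Python standard library. This is a minimal sketch, not `power.py`'s `_get_critical_values`; the helper name `mde_multiplier` is hypothetical, and `statistics.NormalDist().inv_cdf` stands in for `scipy.stats.norm.ppf` (both return standard-normal quantiles).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def mde_multiplier(alpha, power, two_sided=False):
    """Bloom-style MDE multiplier M = z_{1-alpha(/2)} + z_{power} (normal, not t)."""
    tail = alpha / 2 if two_sided else alpha
    return _STD_NORMAL.inv_cdf(1 - tail) + _STD_NORMAL.inv_cdf(power)


# One-sided values stated in Bloom's text: Table 1 at the .05 level, plus .10 at 80% power (p.552).
stated = {(0.05, 0.90): 2.93, (0.05, 0.80): 2.49, (0.05, 0.70): 2.17, (0.10, 0.80): 2.12}
for (alpha, power), m_bloom in stated.items():
    m = mde_multiplier(alpha, power)
    print(f"alpha={alpha:.2f} power={power:.0%}: M={m:.3f} (Bloom: {m_bloom})")
```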

*Standard error of the impact estimator (p.551):*

Bloom gives Equation (1) for a **continuous** outcome from a regression-adjusted treatment/control
difference of means (simple random sample), and Equation (2) for a **binary** outcome. The typeset
equations are image-only in the available scan; their algebraic form is fixed unambiguously by Bloom's
explicit textual description (p.551) and Note 8 (p.556). With:
- `sigma` = standard deviation of the continuous outcome
- `Pi` = proportion with outcome value 1 (binary case)
- `T` = fraction of the sample randomly assigned to treatment
- `n` = total study-sample size
- `R^2` = explanatory power of the impact regression

```
Eq (1) continuous: SE_c = sigma * sqrt( (1 - R^2) / ( n * T * (1 - T) ) )
Eq (2) binary:     SE_b = sqrt( Pi*(1 - Pi) * (1 - R^2) / ( n * T * (1 - T) ) )
```

Bloom states that Equation (1) "increases as heterogeneity sigma increases, decreases as R^2 increases,
decreases as n increases" (p.551), and (Note 8, p.556) that the MDE "increases in inverse proportion to
the square root of T(1-T)". Together these pin the `(1-R^2)` numerator and the `n*T*(1-T)` denominator.
Equation (2) "differs from Equation (1) only in that the population variance is expressed as Pi(1-Pi) …
instead of sigma^2" (p.551). Writing `n_T = nT` and `n_C = n(1-T)` gives the equivalent two-group form
`Var = sigma^2 (1-R^2) (1/n_T + 1/n_C)`.
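Equations (1) and (2) as reconstructed above can be sanity-checked in a few lines. This is a minimal sketch that assumes the reconstruction is correct (the typeset equations are image-only); the function names and inputs are hypothetical and do not correspond to anything in `power.py`. It checks the two-group rewrite with `n_T = nT`, `n_C = n(1-T)`, and the `Pi(1-Pi)`-for-`sigma^2` substitution.

```python
import math


def se_continuous(sigma, n, T, r2=0.0):
    """Eq (1) as reconstructed: sigma * sqrt((1 - R^2) / (n * T * (1 - T)))."""
    return sigma * math.sqrt((1 - r2) / (n * T * (1 - T)))


def se_binary(pi, n, T, r2=0.0):
    """Eq (2) as reconstructed: Pi * (1 - Pi) takes the place of sigma^2."""
    return math.sqrt(pi * (1 - pi) * (1 - r2) / (n * T * (1 - T)))


def se_two_group(sigma, n_t, n_c, r2=0.0):
    """Equivalent two-group form: sqrt(sigma^2 * (1 - R^2) * (1/n_T + 1/n_C))."""
    return math.sqrt(sigma**2 * (1 - r2) * (1 / n_t + 1 / n_c))


n, T, sigma, r2 = 1000, 0.6, 2.0, 0.3
# Splitting total n into n_T = n*T and n_C = n*(1 - T) leaves the SE unchanged.
print(se_continuous(sigma, n, T, r2), se_two_group(sigma, n * T, n * (1 - T), r2))
# A binary outcome is Eq (1) with sigma = sqrt(Pi * (1 - Pi)).
print(se_binary(0.3, n, T), se_continuous(math.sqrt(0.3 * 0.7), n, T))
```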

*Treatment/control allocation (p.553-554, Table 2):*
- Statistical power is **maximized at a 50/50** treatment/control mix (p.553).
- MDE rises *slowly* away from 50/50: 60/40 → 1.02x, 70/30 → 1.09x the 50/50 MDE (p.554, Note 9);
by Note 8 the MDE scales as `1/sqrt(T(1-T))`, which is symmetric in `T ↔ 1-T` (Note 7).
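Because the MDE scales as `1/sqrt(T(1-T))` when n, sigma, and R^2 are held fixed, the Note 9 ratios follow directly. A minimal sketch; the function name is hypothetical:

```python
import math


def mde_ratio_vs_balanced(T):
    """MDE at treatment share T relative to the 50/50 MDE, via the 1/sqrt(T(1-T)) scaling."""
    return math.sqrt((0.5 * 0.5) / (T * (1 - T)))


# 0.3 and 0.7 give the same ratio because T(1-T) is symmetric in T and 1-T.
for T in (0.5, 0.6, 0.7, 0.3):
    print(f"T={T:.1f}: MDE ratio = {mde_ratio_vs_balanced(T):.3f}")
```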

*One-sided vs two-sided (p.554-555):*
- Bloom argues program evaluation should use a **one-sided** test (decision-oriented), which has a
smaller MDE / higher power than two-sided (p.554-555), but provides multipliers for **both** (Table 1).
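For a fixed SE, the cost of switching to a two-sided test is the ratio of the two multipliers. A minimal sketch at the .05 level and 80% power; it uses the standard library's `statistics.NormalDist` for the normal quantiles, which is an illustrative choice rather than what `power.py` calls:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard-normal quantile function

alpha, power = 0.05, 0.80
m_one_sided = z(1 - alpha) + z(power)      # Bloom's recommended test
m_two_sided = z(1 - alpha / 2) + z(power)  # same alpha split across both tails
print(f"one-sided M = {m_one_sided:.2f}, two-sided M = {m_two_sided:.2f}")
print(f"two-sided MDE is {m_two_sided / m_one_sided - 1:.0%} larger for the same SE")
```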

*Paper-derived requirements checklist:*
- [ ] MDE computed as `M * SE` with `M` from the **normal** distribution (one- and two-sided).
- [ ] Cross-sectional SE supports the `(1-R^2)` covariate-adjustment factor and the `T(1-T)` allocation factor.
- [ ] Binary-outcome variance available as `Pi(1-Pi)` in place of `sigma^2`.
- [ ] Power maximized at 50/50; MDE robust to moderate allocation imbalance.
- [ ] One-sided and two-sided multipliers both supported.
- [ ] Clustering / multi-site design effects are **out of scope** for Bloom's Eq (1)-(2) (Note 1).

---

## Implementation Notes (audit notes; NOT registry-candidate)

These observations map Bloom (1995) to `diff_diff/power.py` and the current REGISTRY block. They are
flagged here for **PR-B** to reconcile (fix vs. document); this review does not change code or REGISTRY.

- **D1 (t vs z) resolves in the code's favor.** Bloom's multiplier is built entirely from the normal
distribution (p.548-549). `PowerAnalysis._get_critical_values` (`power.py`) uses
`stats.norm.ppf`, which is **faithful to Bloom**. The REGISTRY PowerAnalysis block writes the
multiplier as `(t_{alpha/2} + t_{1-kappa})`; per the primary source this should be `z` (normal), not
`t`. PR-B candidate: correct the REGISTRY notation to `z`, or document the normal approximation
explicitly.
- **R-parity reference implication.** Because the analytical path uses the normal approximation (per
Bloom), `pwr::pwr.t.test()` (noncentral-t) is **not** the right parity reference for it; a
normal-based reference (`pwr::pwr.norm.test` / `pwr.2p2n.test`, or a hand-derived closed form) is.
This bears on the deferred PR-B R-parity fixture choice.
- **Bloom's SE is cross-sectional, not DiD.** Eq (1) `Var = sigma^2(1-R^2)(1/n_T+1/n_C)` is a
single-measurement two-group estimator. The code's `basic_did` branch
(`_compute_variance`, `power.py`) uses `sigma^2(1/n_T+1/n_T+1/n_C+1/n_C) = 2 sigma^2(1/n_T+1/n_C)`,
the **DiD analog** (two independent time points, hence the factor of 2), with **no `R^2` term**. So
Bloom underpins the MDE *multiplier* and the cross-sectional SE; the DiD variance itself is a separate
(DiD-specific) quantity. PR-B candidate: reconcile the REGISTRY SE formula against the code. The
REGISTRY PowerAnalysis block currently shows `sigma*sqrt(1/n_T+1/n_C)*sqrt(1+rho(m-1))/sqrt(1-R^2)`,
in which the `R^2` term appears **inverted** relative to Bloom (Bloom multiplies by `sqrt(1-R^2)`;
the REGISTRY divides by it). The code's `basic_did` variance uses the group-count form
`2*sigma^2*(1/n_T+1/n_C)` rather than Bloom's total-`n` x `T(1-T)` parameterization (allocation still
enters, implicitly via the `n_T`/`n_C` counts here and explicitly as `f(1-f)` in `_compute_required_n`
in `power.py`), and it omits the `R^2` factor entirely.
- **Allocation factor is faithful.** Bloom's `T(1-T)` (Note 8) appears in the code's required-N
formula as `f(1-f)` (`_compute_required_n`, `power.py`); the REGISTRY PowerAnalysis block's
sample-size formula `n = 2(...)^2 sigma^2 / MDE^2` **omits** it. PR-B candidate: add the allocation
factor to the REGISTRY formula.
- **Binary outcomes.** Bloom's Eq (2) (`Pi(1-Pi)`) is supported in the library only by the user
passing `sigma = sqrt(Pi(1-Pi))`; there is no dedicated binary path. This is a reasonable
simplification, worth a one-line note for PR-B.
- **Out of scope for Bloom.** `deff` (Kish survey design effect) and the panel `rho` serial-correlation
factor are **not** Bloom: Note 1 (p.555) explicitly says Eq (1)-(2) "do not account for the design
effect … in multisite experiments" and points to other sources. See the Burlig review for the panel
variance; Kish (1965) is already cited at the REGISTRY Kish DEFF note for `deff`.
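The variance expressions contrasted in these notes, and the role of the allocation factor, can be compared side by side. This sketch transcribes the formulas exactly as quoted above; it does not call or reproduce `diff_diff/power.py`, and every function name is hypothetical. `required_total_n` solves Bloom's cross-sectional `MDE = M * SE` for total n. Two further assumptions apply only to the final comparison: the elided `(...)` in the REGISTRY sample-size formula is taken to be the multiplier `M`, and its `n` is read as a per-group count.

```python
def var_bloom_cross_section(sigma, n_t, n_c, r2=0.0):
    """Bloom Eq (1), two-group form: sigma^2 * (1 - R^2) * (1/n_T + 1/n_C)."""
    return sigma**2 * (1 - r2) * (1 / n_t + 1 / n_c)


def var_basic_did_as_described(sigma, n_t, n_c):
    """basic_did variance as described in these notes: 2 * sigma^2 * (1/n_T + 1/n_C), no R^2."""
    return 2 * sigma**2 * (1 / n_t + 1 / n_c)


def var_registry_as_written(sigma, n_t, n_c, r2=0.0, rho=0.0, m=1):
    """Square of the REGISTRY SE as quoted above; note the division by (1 - R^2)."""
    return sigma**2 * (1 / n_t + 1 / n_c) * (1 + rho * (m - 1)) / (1 - r2)


def required_total_n(mde, sigma, multiplier, treat_frac=0.5, r2=0.0):
    """Total n solving MDE = M * SE with Bloom's SE; includes the T(1-T) allocation factor."""
    return multiplier**2 * sigma**2 * (1 - r2) / (mde**2 * treat_frac * (1 - treat_frac))


sigma, n_t, n_c, r2 = 1.0, 100, 100, 0.5
print(var_bloom_cross_section(sigma, n_t, n_c, r2))  # Bloom: multiplied by (1 - R^2)
print(var_registry_as_written(sigma, n_t, n_c, r2))  # REGISTRY as written: divided by (1 - R^2)
print(var_basic_did_as_described(sigma, n_t, n_c))   # twice the R^2 = 0 cross-sectional variance

M, mde = 2.49, 0.25  # Bloom's one-sided .05 / 80% multiplier; illustrative MDE in outcome units
per_group_registry = 2 * M**2 * sigma**2 / mde**2  # assumed reading of n = 2(...)^2 sigma^2 / MDE^2
print(required_total_n(mde, sigma, M), 2 * per_group_registry)  # agree only at a 50/50 split
print(required_total_n(mde, sigma, M, treat_frac=0.7))          # imbalance raises the required n
```

Under these assumptions, the REGISTRY formula silently encodes a balanced design, which is what adding the allocation factor would make explicit.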

## Gaps and Uncertainties

1. **Typeset Eq (1)/(2) and Table 1 are image-only in the available scan.** The algebraic forms above
are reconstructed from Bloom's explicit prose (p.551) and Notes 8/9 (p.556). The multiplier rule
reproduces every numeric multiplier Bloom states (2.93/2.49/2.17/2.12), and the `1/sqrt(T(1-T))`
scaling reproduces the Note 9 ratios (1.02x, 1.09x). If PR-B needs the exact typeset coefficients of
the full Table 1 grid (all power × significance × one/two-sided cells), obtain a text-layer copy of
the article; the values are nonetheless fully determined by `M = z_{1-alpha(/2)} + z_{power}`.
2. **Numeric SE verification is not possible from this scan.** Bloom's worked Examples 1-3 (p.552-553)
report SEs ($560, 2.8 scale points, 0.040), but the example *parameters* are image-only, so the SE
formula could not be re-derived numerically from the paper. The form is standard and textually
pinned; this is a transcription limitation, not a methodological doubt.
3. **DiD applicability.** Bloom does not treat panel/DiD designs; the library's DiD variance and its
serial-correlation handling must be validated against the Burlig review, not this one.
