| 1 | +# Paper Review: Minimum Detectable Effects — A Simple Way to Report the Statistical Power of Experimental Designs |
| 2 | + |
| 3 | +**Authors:** Howard S. Bloom |
| 4 | +**Citation:** Bloom, H. S. (1995). Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs. *Evaluation Review*, 19(5), 547-556. |
| 5 | +**DOI:** https://doi.org/10.1177/0193841X9501900504 |
| 6 | +**Source reviewed:** *Evaluation Review* 19(5), 547-556 (10 pages). PDF was reviewed externally and is **not** committed to the repository (the `/papers/` working directory is gitignored). Reproduce by downloading the published article via the DOI above, or the open scan used here at `https://bpb-us-e2.wpmucdn.com/sites.uci.edu/dist/1/1159/files/2021/03/Bloom-MDES-Eval-Rev-1995-Bloom.pdf`. Page numbers below refer to the journal pagination (547-556). |
| 7 | +**Review date:** 2026-05-31 |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Methodology Registry Entry |
| 12 | + |
| 13 | +**Status: proposed/confirming source text for the `## PowerAnalysis` REGISTRY entry; this file is a |
| 14 | +non-authoritative source audit.** The current `docs/methodology/REGISTRY.md` `## PowerAnalysis` block |
| 15 | +remains the sole authoritative methodology contract. This review establishes what Bloom (1995) |
| 16 | +*actually* states so that a follow-up audit PR (PR-B) can reconcile the REGISTRY equation block and |
| 17 | +`diff_diff/power.py` against it. The registry-candidate text ends just before `## Implementation |
| 18 | +Notes`; everything below that boundary is audit notes and is **not** normative. |
| 19 | + |
| 20 | +Scope note: Bloom (1995) is the primary source for **the minimum-detectable-effect (MDE) |
| 21 | +multiplier framing** and for **the cross-sectional (two-group) impact-estimator standard error**. It |
| 22 | +is *not* a difference-in-differences paper, and it **explicitly excludes** clustering / multi-site |
| 23 | +design effects (Note 1, p.555) — those rest on other sources (serial correlation: Burlig, Preonas & |
| 24 | +Woerman 2020, see [burlig-preonas-woerman-2020-review.md](burlig-preonas-woerman-2020-review.md); |
| 25 | +survey design effect: Kish 1965, already in REGISTRY). |
| 26 | + |
| 27 | +## PowerAnalysis |
| 28 | + |
| 29 | +**Primary source:** [Bloom, H. S. (1995). Minimum Detectable Effects. *Evaluation Review*, 19(5), 547-556.](https://doi.org/10.1177/0193841X9501900504) |
| 30 | + |
| 31 | +**Key implementation requirements:** |
| 32 | + |
| 33 | +*Definitions (p.547):* |
| 34 | +- The **minimum detectable effect (MDE)** is "the smallest effect that, if true, has an X% chance of |
| 35 | + producing an impact estimate that is statistically significant at the Y level," where X = statistical |
| 36 | + power and Y = significance level (p.547). |
| 37 | +- The MDE is measured **in the original units of the impact** (e.g. dollars), explicitly *not* |
| 38 | + standardized like Cohen's (1977) effect size (p.547-548). |
| 39 | + |
| 40 | +*MDE multiplier — derived from the NORMAL distribution (p.548-549):* |
| 41 | + |
| 42 | +Bloom constructs the multiplier from "two normal (bell-shaped) sampling distributions" (p.548). For a |
| 43 | +one-sided test at the .05 level with 80% power he derives: |
| 44 | + |
| 45 | +``` |
| 46 | +critical value (one-sided .05): 1.65 standard errors above zero = z_{0.95} |
| 47 | +power shift (80%): 0.84 standard errors above crit. = z_{0.80} |
| 48 | +MDE = (1.65 + 0.84) * SE = 2.49 * SE (p.549) |
| 49 | +``` |
| 50 | + |
| 51 | +The general rule (p.549-550): "For any significance level, power value, and one- or two-sided |
| 52 | +hypothesis test, the minimum detectable effect can be computed as a multiple of the standard error of |
| 53 | +the impact estimate." In standard-normal-quantile form (all multipliers below reproduce Bloom's |
| 54 | +stated values exactly using `z` quantiles, confirming the normal — not t — basis): |
| 55 | + |
| 56 | +``` |
| 57 | +MDE = M * SE(impact) |
| 58 | +M_one_sided = z_{1-alpha} + z_{power} |
| 59 | +M_two_sided = z_{1-alpha/2} + z_{power} |
| 60 | +``` |
| 61 | + |
| 62 | +*Table 1 multipliers explicitly stated in the text* (p.549-551), one-sided test at the .05 level: |
| 63 | + |
| 64 | +| Power | Multiplier M | Check (z_{0.95} + z_{power}) | |
| 65 | +|-------|--------------|------------------------------| |
| 66 | +| 90% | 2.93 | 1.645 + 1.282 = 2.927 | |
| 67 | +| 80% | 2.49 | 1.645 + 0.842 = 2.487 | |
| 68 | +| 70% | 2.17 | 1.645 + 0.524 = 2.169 | |
| 69 | + |
| 70 | +One-sided .10 at 80% power: M = 2.12 (p.552) = z_{0.90} + z_{0.80} = 1.282 + 0.842 = 2.124. |
| 71 | +Table 1 has a top panel (one-sided) and a bottom panel (two-sided); the columns are significance |
| 72 | +levels and the rows are power (p.550). |
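The multiplier rule above can be checked numerically. The following is a minimal sketch, not the code in `diff_diff/power.py`: the function name `mde_multiplier` and its signature are hypothetical, and it assumes the standard-normal quantile form `M = z_{1-alpha(/2)} + z_{power}` reconstructed in this review.

```python
from statistics import NormalDist


def mde_multiplier(alpha: float, power: float, two_sided: bool = False) -> float:
    """Normal-based MDE multiplier M = z_{1-alpha(/2)} + z_{power}.

    Hypothetical helper for illustration; not part of diff_diff.
    """
    # A two-sided test splits alpha across both tails.
    tail = alpha / 2 if two_sided else alpha
    z = NormalDist().inv_cdf  # standard-normal quantile function
    return z(1 - tail) + z(power)


# One-sided .05 multipliers quoted from Bloom's text: 2.93, 2.49, 2.17.
for power in (0.90, 0.80, 0.70):
    print(power, round(mde_multiplier(0.05, power), 2))
```

Rounded to two decimals, the sketch reproduces the four multipliers quoted from the text (2.93, 2.49, 2.17, and 2.12 for one-sided .10 at 80% power), which is the basis for reading Bloom's multiplier as normal rather than t.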
| 73 | + |
| 74 | +*Standard error of the impact estimator (p.551):* |
| 75 | + |
| 76 | +Bloom gives Equation (1) for a **continuous** outcome from a regression-adjusted treatment/control |
| 77 | +difference of means (simple random sample), and Equation (2) for a **binary** outcome. The typeset |
| 78 | +equations are image-only in the available scan; their algebraic form is fixed unambiguously by Bloom's |
| 79 | +explicit textual description (p.551) and Note 8 (p.556). With: |
| 80 | +- `sigma` = standard deviation of the continuous outcome |
| 81 | +- `Pi` = proportion with outcome value 1 (binary case) |
| 82 | +- `T` = fraction of the sample randomly assigned to treatment |
| 83 | +- `n` = total study-sample size |
| 84 | +- `R^2` = explanatory power of the impact regression |
| 85 | + |
| 86 | +``` |
| 87 | +Eq (1) continuous: SE_c = sigma * sqrt( (1 - R^2) / ( n * T * (1 - T) ) ) |
| 88 | +Eq (2) binary: SE_b = sqrt( Pi*(1 - Pi) * (1 - R^2) / ( n * T * (1 - T) ) ) |
| 89 | +``` |
| 90 | + |
| 91 | +Bloom states Equation (1) "increases as heterogeneity sigma increases, decreases as R^2 increases, |
| 92 | +decreases as n increases" (p.551) and (Note 8, p.556) the MDE "increases in inverse proportion to the |
| 93 | +square root of T(1-T)" — pinning the `(1-R^2)` numerator and the `n*T*(1-T)` denominator. Equation (2) |
| 94 | +"differs from Equation (1) only in that the population variance is expressed as Pi(1-Pi) … instead of |
| 95 | +sigma^2" (p.551). Writing `n_T = nT`, `n_C = n(1-T)` gives the equivalent two-group form |
| 96 | +`Var = sigma^2 (1-R^2) (1/n_T + 1/n_C)`. |
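The reconstructed equations and the two-group equivalence can be sketched as follows. The function names are hypothetical and the inputs are made-up illustration values, not the parameters of Bloom's worked examples (which are image-only in the scan).

```python
import math


def se_continuous(sigma: float, n: int, T: float, r2: float = 0.0) -> float:
    """Eq (1) as reconstructed here: sigma * sqrt((1 - R^2) / (n T (1 - T)))."""
    return sigma * math.sqrt((1.0 - r2) / (n * T * (1.0 - T)))


def se_binary(pi: float, n: int, T: float, r2: float = 0.0) -> float:
    """Eq (2) as reconstructed here: Pi(1 - Pi) replaces sigma^2."""
    return math.sqrt(pi * (1.0 - pi) * (1.0 - r2) / (n * T * (1.0 - T)))


def se_two_group(sigma: float, n_t: float, n_c: float, r2: float = 0.0) -> float:
    """Equivalent two-group form: sqrt(sigma^2 (1 - R^2) (1/n_T + 1/n_C))."""
    return math.sqrt(sigma**2 * (1.0 - r2) * (1.0 / n_t + 1.0 / n_c))


# Hypothetical inputs: n = 400, 60/40 allocation, R^2 = 0.2.
n, T, r2 = 400, 0.6, 0.2
print(se_continuous(10.0, n, T, r2))
print(se_two_group(10.0, n * T, n * (1 - T), r2))  # same value, up to float rounding
```

The binary form is the continuous form evaluated at `sigma = sqrt(Pi(1 - Pi))`, so a single continuous-outcome routine covers both cases under this reconstruction.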
| 97 | + |
| 98 | +*Treatment/control allocation (p.553-554, Table 2):* |
| 99 | +- Statistical power is **maximized at a 50/50** treatment/control mix (p.553). |
| 100 | +- MDE rises *slowly* away from 50/50: 60/40 → 1.02x, 70/30 → 1.09x the 50/50 MDE (p.554, Note 9); |
| 101 | + by Note 8 the MDE scales as `1/sqrt(T(1-T))`, which is symmetric in `T ↔ 1-T` (Note 7). |
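The allocation ratios above follow directly from the `1/sqrt(T(1-T))` scaling. A minimal sketch, with the hypothetical helper name `allocation_penalty`, expresses the MDE at allocation `T` relative to the 50/50 MDE:

```python
import math


def allocation_penalty(T: float) -> float:
    """MDE at treatment fraction T divided by the MDE at T = 0.5.

    Uses the 1/sqrt(T(1-T)) scaling; hypothetical helper for illustration.
    """
    return math.sqrt(0.25 / (T * (1.0 - T)))


for T in (0.5, 0.6, 0.7):
    print(T, round(allocation_penalty(T), 2))
```

Rounded to two decimals this gives 1.0, 1.02, and 1.09, matching the ratios quoted from Bloom, and the same values result for `T` and `1 - T`.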
| 102 | + |
| 103 | +*One-sided vs two-sided (p.554-555):* |
| 104 | +- Bloom argues program evaluation should use a **one-sided** test (decision-oriented), which has a |
| 105 | + smaller MDE / higher power than two-sided (p.554-555), but provides multipliers for **both** (Table 1). |
| 106 | + |
| 107 | +*Paper-derived requirements checklist:* |
| 108 | +- [ ] MDE computed as `M * SE` with `M` from the **normal** distribution (one- and two-sided). |
| 109 | +- [ ] Cross-sectional SE supports the `(1-R^2)` covariate-adjustment factor and the `T(1-T)` allocation factor. |
| 110 | +- [ ] Binary-outcome variance available as `Pi(1-Pi)` in place of `sigma^2`. |
| 111 | +- [ ] Power maximized at 50/50; MDE robust to moderate allocation imbalance. |
| 112 | +- [ ] One-sided and two-sided multipliers both supported. |
| 113 | +- [ ] Clustering / multi-site design effects are **out of scope** for Bloom's Eq (1)-(2) (Note 1). |
| 114 | + |
| 115 | +--- |
| 116 | + |
| 117 | +## Implementation Notes (audit notes — NOT registry-candidate) |
| 118 | + |
| 119 | +These observations map Bloom (1995) to `diff_diff/power.py` and the current REGISTRY block. They are |
| 120 | +flagged here for **PR-B** to reconcile (fix-vs-document); this review does not change code or REGISTRY. |
| 121 | + |
| 122 | +- **D1 (t vs z) resolves in the code's favor.** Bloom's multiplier is built entirely from the normal |
| 123 | + distribution (p.548-549). `PowerAnalysis._get_critical_values` (`power.py`) uses |
| 124 | + `stats.norm.ppf` — **faithful to Bloom**. The REGISTRY block writes the multiplier as |
| 125 | +  `(t_{alpha/2} + t_{1-kappa})`; per the primary source this should be `z` |
| 126 | + (normal), not `t`. PR-B candidate: correct the REGISTRY notation to `z`, or document the |
| 127 | + normal-approximation explicitly. |
| 128 | +- **R-parity reference implication.** Because the analytical path uses the normal approximation (per |
| 129 | + Bloom), `pwr::pwr.t.test()` (noncentral-t) is **not** the right parity reference for it; a |
| 130 | +  normal-based reference (`pwr::pwr.norm.test` / `pwr::pwr.2p2n.test`, or a hand-derived closed form) is. |
| 131 | + This bears on the deferred PR-B R-parity fixture choice. |
| 132 | +- **Bloom's SE is cross-sectional, not DiD.** Eq (1) `Var = sigma^2(1-R^2)(1/n_T+1/n_C)` is a |
| 133 | + single-measurement two-group estimator. The code's `basic_did` branch |
| 134 | + (`_compute_variance`, `power.py`) uses `sigma^2(1/n_T+1/n_T+1/n_C+1/n_C) = 2 sigma^2(1/n_T+1/n_C)` |
| 135 | + — the **DiD analog** (two independent time points, factor of 2), with **no `R^2` term**. So Bloom |
| 136 | + underpins the MDE *multiplier* and the cross-sectional SE; the DiD variance itself is a separate |
| 137 | + (DiD-specific) quantity. PR-B candidate: reconcile the REGISTRY SE formula (which currently shows |
| 138 | +  `sigma*sqrt(1/n_T+1/n_C)*sqrt(1+rho(m-1))/sqrt(1-R^2)`) against the code; note that the |
| 139 | + `R^2` term appears **inverted** there relative to Bloom (Bloom multiplies by `sqrt(1-R^2)`; the |
| 140 | + REGISTRY divides by it). The code's `basic_did` variance uses the group-count form |
| 141 | + `2*sigma^2*(1/n_T+1/n_C)` — allocation still enters, implicitly via the `n_T`/`n_C` counts here and |
| 142 | + explicitly as `f(1-f)` in `_compute_required_n` (`power.py`) — rather than Bloom's |
| 143 | + total-`n` x `T(1-T)` parameterization, and it omits the `R^2` factor entirely. |
| 144 | +- **Allocation factor is faithful.** Bloom's `T(1-T)` (Note 8) appears in the code's required-N |
| 145 | + formula as `f(1-f)` (`_compute_required_n`, `power.py`); the REGISTRY sample-size formula |
| 146 | +  `n = 2(...)^2 sigma^2 / MDE^2` **omits** it. PR-B candidate: add the allocation factor to |
| 147 | + the REGISTRY formula. |
| 148 | +- **Binary outcomes.** Bloom's Eq (2) (`Pi(1-Pi)`) is supported in the library only by the user |
| 149 | + passing `sigma = sqrt(Pi(1-Pi))`; there is no dedicated binary path. Reasonable simplification; |
| 150 | + worth a one-line note for PR-B. |
| 151 | +- **Out-of-scope for Bloom.** `deff` (Kish survey design effect) and the panel `rho` serial-correlation |
| 152 | + factor are **not** Bloom — Note 1 (p.555) explicitly says Eq (1)-(2) "do not account for the design |
| 153 | + effect … in multisite experiments" and points to other sources. See the Burlig review for the panel |
| 154 | +  variance; Kish (1965) is already cited for `deff` in the REGISTRY's Kish DEFF note. |
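Two of the discrepancies listed above can be made concrete with a short sketch. It transcribes the formulas exactly as quoted in these notes and is not the code in `diff_diff/power.py`; all function names are hypothetical, and the inputs are made-up illustration values.

```python
import math


def cross_sectional_var(sigma: float, n_t: float, n_c: float, r2: float = 0.0) -> float:
    """Bloom Eq (1), two-group form: sigma^2 (1 - R^2) (1/n_T + 1/n_C)."""
    return sigma**2 * (1.0 - r2) * (1.0 / n_t + 1.0 / n_c)


def basic_did_var(sigma: float, n_t: float, n_c: float) -> float:
    """DiD analog as quoted in these notes: 2 sigma^2 (1/n_T + 1/n_C), no R^2 term."""
    return 2.0 * sigma**2 * (1.0 / n_t + 1.0 / n_c)


def r2_factor_bloom(r2: float) -> float:
    """Bloom multiplies the SE by sqrt(1 - R^2), so covariates shrink it."""
    return math.sqrt(1.0 - r2)


def r2_factor_registry(r2: float) -> float:
    """The REGISTRY formula as quoted divides by sqrt(1 - R^2), which inflates the SE."""
    return 1.0 / math.sqrt(1.0 - r2)


# With R^2 = 0 the DiD variance is exactly twice the cross-sectional variance.
print(basic_did_var(1.0, 200, 200) / cross_sectional_var(1.0, 200, 200))
# With a hypothetical R^2 = 0.36 the two R^2 factors move in opposite directions.
print(r2_factor_bloom(0.36), r2_factor_registry(0.36))
```

Under these formulas a DiD SE is `sqrt(2)` times the cross-sectional SE at equal `sigma` and group sizes. The two `R^2` conventions coincide only at `R^2 = 0`, so the inversion changes results for any covariate-adjusted design.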
| 155 | + |
| 156 | +## Gaps and Uncertainties |
| 157 | + |
| 158 | +1. **Typeset Eq (1)/(2) and Table 1 are image-only in the available scan.** The algebraic forms above |
| 159 | + are reconstructed from Bloom's explicit prose (p.551) + Notes 8/9 (p.556) and verified to reproduce |
| 160 | + every numeric multiplier Bloom states (2.93/2.49/2.17/2.12). If PR-B needs the exact typeset |
| 161 | + coefficients of the full Table 1 grid (all power × significance × one/two-sided cells), obtain a |
| 162 | + text-layer copy of the article; the values are nonetheless fully determined by `M = z_{1-alpha(/2)} + z_{power}`. |
| 163 | +2. **Numeric SE verification not possible from this scan.** Bloom's worked Examples 1-3 (p.552-553) |
| 164 | + report SEs ($560, 2.8 scale points, 0.040) but the example *parameters* are image-only, so the SE |
| 165 | + formula could not be re-derived numerically from the paper. The form is standard and textually |
| 166 | + pinned; this is a transcription limitation, not a methodological doubt. |
| 167 | +3. **DiD applicability.** Bloom does not treat panel/DiD designs; the library's DiD variance and its |
| 168 | + serial-correlation handling must be validated against the Burlig review, not this one. |