feat: add hierarchical FDR correction for dose-response data #116
base: main
b06003f to
1ede068
Compare
Design note: Why min p-value instead of Simes?
The initial implementation (1ede068) used Simes' method for Stage 1 p-value aggregation, following Yekutieli (2008) hierarchical FDR. However, testing on real dose-response data (LINCS, 4 plates, 58 compounds × 6 doses) revealed a problem: Simes penalizes compounds for having inactive low doses, which is the expected biological behavior in dose-response data. For example, compound
The compound has a strong signal at high dose, but Simes dilutes it with the inactive low doses. Min p-value is more appropriate: a compound passes Stage 1 if ANY dose is active. This matches the biological question: "Does this compound have a phenotype at any tested dose?" Results on LINCS data:
Min-p provides 88% power gain over flat BH while correctly handling the dose-response structure.
When would Simes be appropriate?
Simes would be better when you expect most or all group members to be active (e.g., testing replicates of the same condition). For dose-response, where only high doses are expected to be active, min-p is the right choice. If there's future demand, we could add a
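To make the dilution argument concrete, here is a minimal sketch comparing the two aggregation rules on made-up p-values (these numbers are illustrative only, not taken from the LINCS run). The function names `simes_pvalue` and `min_pvalue` are used here for illustration and are not guaranteed to match the signatures in this PR.

```python
def simes_pvalue(pvalues):
    """Simes combination: min over ranks i of n * p_(i) / i, capped at 1."""
    n = len(pvalues)
    ranked = sorted(pvalues)
    return min(1.0, min(n * p / rank for rank, p in enumerate(ranked, start=1)))


def min_pvalue(pvalues):
    """Min-p aggregation as described above: the smallest p-value in the group."""
    return min(pvalues)


# Hypothetical compound: five inactive low doses, one strongly active high dose.
dose_response = [0.8, 0.7, 0.6, 0.5, 0.4, 0.001]

# Hypothetical group where every member is moderately active (e.g., replicates).
replicates = [0.01, 0.012, 0.015, 0.02, 0.011, 0.013]

for name, group in [("dose_response", dose_response), ("replicates", replicates)]:
    print(name, "simes:", simes_pvalue(group), "min:", min_pvalue(group))
```

With a single active dose, Simes reduces to the number of doses times the smallest p-value, so the group-level p-value grows with every inactive dose added. When all members are active, the later ranks pull the Simes value well below that bound, which is the replicate scenario mentioned above.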
The logic seems okay to me, but (and I consulted with @alxndrkalinin) we probably don't want to add a bunch of arguments to a single function: let's do composition.
At a general level, this means:
- Split p value calculation from statistical correction
- Isolate hierarchical statistical correction into its own function (ideally in a new file).
I'm a bit torn as to how much to modify the original mean_average_precision function, but I think it's worth isolating its two main steps, before and after the p-value is calculated, to avoid code duplication.
Then we would have composition:
- One small function (e.g., `get_map_pvalue`) that covers the p-value calculation (the first section of mean_average_precision)
- One function with hierarchical FDR correction
- Refactor the function mean_average_precision into `get_map_pvalue` and `multipletests`, to retain backwards compatibility.
- Potentially another function that wraps `get_map_pvalue` and either `multipletests` or `hierarchical_fdr`, if you want the convenience of map+hierarchical fdr in one.
This minimises repetition (as it is only the call to multipletests), while maximising modularity in case we have to add a different statistical correction in the future. It should be relatively simple, the code would remain modular enough, and we wouldn't start accumulating a bunch of flags and arguments on the main functions.
- Tests run on my side so far.
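A rough sketch of the composition idea, not the actual refactor: the p-value step and the correction step are separate functions, and the wrapper takes the correction as a callable. Everything here is hypothetical (`score_pvalues` just passes through precomputed pairs in place of the real mAP p-value step, and `bh_correct` is a hand-rolled Benjamini-Hochberg stand-in for `multipletests`) so that the example runs with the standard library alone.

```python
def score_pvalues(rows):
    """Hypothetical stand-in for the p-value step: rows are (label, p_value) pairs."""
    return [(label, float(p)) for label, p in rows]


def bh_correct(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def bonferroni_correct(pvalues):
    """A different correction, to show that the wrapper does not care which one it gets."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def score_and_correct(rows, correct=bh_correct, alpha=0.05):
    """Wrapper: compute p-values once, then apply whichever correction was passed in."""
    scored = score_pvalues(rows)
    corrected = correct([p for _, p in scored])
    return [
        {"label": label, "p_value": p, "corrected_p_value": q, "significant": q <= alpha}
        for (label, p), q in zip(scored, corrected)
    ]


rows = [("a", 0.001), ("b", 0.03), ("c", 0.5)]
flat_bh = score_and_correct(rows)
bonferroni = score_and_correct(rows, correct=bonferroni_correct)
```

Swapping the correction is then a one-argument change at the call site, with no new flags on the scoring function.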
Implements two-stage hierarchical FDR (Yekutieli 2008) to reduce over-correction when testing related hypotheses (e.g., multiple doses of the same compound).
- Add `hierarchical_by` parameter to `mean_average_precision()`
- Stage 1: Aggregate p-values by group using Simes' method, apply BH
- Stage 2: For significant groups, apply BH within each group
- Add `simes_pvalue()` function for p-value combination
- Fix `silent_thread_map` bug (handle `leave` kwarg)
- Add comprehensive tests for hierarchical FDR
Closes #115
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simes' method penalizes compounds for having inactive low doses, which is the expected biological behavior in dose-response data. Min p-value is more appropriate: a compound passes Stage 1 if ANY dose shows activity.
- Replace simes_pvalue() aggregation with simple min()
- Remove unused simes_pvalue function and tests
- Update docstrings to reflect the change
The silent_thread_map leave kwarg issue will be properly fixed in the fix/silent-thread-map-leave-kwarg branch. This PR should be rebased after that fix is merged.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
baff4b1 to
eea7f76
Compare
Split mean_average_precision into modular components per PR review:
- get_map_pvalue(): compute mAP scores and p-values
- apply_fdr_correction(): standard BH correction
- apply_hierarchical_fdr(): two-stage hierarchical correction
This enables composition without accumulating flags on the main function.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tion Consistent naming pattern with apply_fdr_correction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Thanks for the excellent suggestions! I've made all these changes (using Claude) but have not reviewed them carefully myself. I'll tag you again when I'm ready.
Alright, ready for you @afermg
Summary
Implements two-stage hierarchical FDR to reduce over-correction when testing related hypotheses (e.g., multiple doses of the same compound).
`hierarchical_by` parameter to `mean_average_precision()`
Usage
When `hierarchical_by` is specified, the result includes additional columns:
- `stage1_p_value`: Group-level p-value (minimum p-value in group)
- `stage1_corrected_p_value`: BH-corrected Stage 1 p-value
- `stage1_significant`: Whether the group passed Stage 1
Why min p-value instead of Simes?
For dose-response data, low doses are expected to be inactive. Simes' method penalizes compounds for having inactive low doses, which is biologically normal. Min p-value is more appropriate: a compound passes Stage 1 if ANY dose shows activity.
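As a minimal sketch of the two-stage procedure described in this summary (min p-value per group, BH across groups, then BH within each group that passes), assuming plain Python lists rather than the DataFrame the library works with: `two_stage_fdr` and `bh_adjust` are hypothetical names for illustration, and treating every test in a non-passing group as not significant is an assumption of this sketch, not a statement about the PR's exact output.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def two_stage_fdr(pvalues, groups, alpha=0.05):
    """Return one significance flag per test (hypothetical helper)."""
    members = defaultdict(list)
    for i, group in enumerate(groups):
        members[group].append(i)

    # Stage 1: one p-value per group (the minimum), BH-corrected across groups.
    names = list(members)
    stage1_p = [min(pvalues[i] for i in members[name]) for name in names]
    stage1_adj = bh_adjust(stage1_p)

    # Stage 2: BH within each group that passed Stage 1.
    # Assumption: tests in groups that fail Stage 1 are reported as not significant.
    significant = [False] * len(pvalues)
    for name, adj in zip(names, stage1_adj):
        if adj > alpha:
            continue
        idx = members[name]
        within = bh_adjust([pvalues[i] for i in idx])
        for i, q in zip(idx, within):
            significant[i] = q <= alpha
    return significant


# Made-up p-values: compound A is active only at its highest dose, B is inactive.
pvals = [0.9, 0.5, 0.001, 0.6, 0.7, 0.8]
compound = ["A", "A", "A", "B", "B", "B"]
flags = two_stage_fdr(pvals, compound)
```

Here compound A passes Stage 1 on the strength of its single active dose, and Stage 2 then flags only that dose.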
Example on LINCS data (4 plates, 58 compounds × 6 doses):
Why hierarchical FDR matters
With 1000 compounds × 5 doses = 5000 tests, standard BH treats each as independent. But doses of the same compound test the same underlying hypothesis. Hierarchical FDR:
Test plan
Context for Broadies: See https://github.com/broadinstitute/cpg0037-oasis-broad-U2OS-data/issues/9#issuecomment-3624961526 for a real-world example of its utility
Closes #115
🤖 Generated with Claude Code