Fix control matching and pooling fallback reporting#4
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Added standard spike-in scale-factor TSV outputs (no --dump_scale_factors gating) and optional debug scope reference output. Commit: 5151d32. |
|
Commit c2db7f8: expand peak QC to all callers. PEAK_QC now runs on all peak/consensus streams, adds caller-aware peak QC tables in 03_peak_calling/07_qc_tables (peak_counts, peak_frip_scores, peak_reproducibility, consensus_peak_counts), and makes peak heatmap inputs per caller with caller-tagged filenames. Also updates MultiQC section labels/docs and enables SEACR summit extraction when SEACR is not primary. |
|
Commit 019dd4a: align spec-facing peak outputs to biological sample IDs. Peak-calling artifacts (bedgraph/bigwig/peaks), heatmaps, QC tables, and control fallback/normalisation reports now use {group}_{condition}_rep{replicate} (no _T#). Updated heatmap ID handling, peak caller prefixes, and tests to match new naming. |
|
Commit 069fc0e: add --consensus_grouping all support (maps to all-samples consensus). Updated flowswitch, schema, docs, and tests; consensus outputs documented under 03_peak_calling/05_consensus_peaks/all/all. |
|
Commit 061930c: add CI tests for control validation, scale-factor outputs, multi-caller QC, and sample-ID naming. New missing-control samplesheet + failfast/override tests; epic2 missing-control skip test checks fallback TSV; spike-in test checks normalisation_factors/all.tsv; new multi-caller QC test checks caller_id values and no _T in sample IDs; CI tag matrix updated. |
|
Commit 981eedb: allow --consensus_grouping all validation; use fallback pooled controls (status= fallback_other_condition, action=used) and record missing-control skips with full columns; --allow_cross_condition_controls now warns (including zero-control groups) instead of failing; samplesheet.valid.csv id now uses group_condition_rep (no _T); updated/added tests incl. no-controls samplesheet, epic2 fallback+skip, and output expectations without _T. Tests not run per instructions. |
|
Commit 27005d3: surface samplesheet warnings via samplesheet.warnings.txt and log them in INPUT_CHECK; adjust FRiP join to preserve multi-caller peaks; move ch_scope_reference init to workflow scope; keep caller_ids handling in cutandrun workflow. Tests: full pytest-workflow via Slurm job 12911989 PASS (323 passed, 102 warnings) in /scratch/users/dhusmann/nextflow-work/runs/pytest_full/20260105_061940. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27005d391d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed #4 (comment) in commit e8e2b6f: pair_targets_controls now preserves targets when controls are missing (no silent drops). SEACR/MACS2 now log a warning and run without control for those samples when --use_control is true but control group is absent. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8e2b6f893
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed #4 (comment) in commit 4583100: updated missing-control warnings to explicitly flag possible typos and document behavior (epic2/span skipped; SEACR/MACS2 run without control). |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4583100108
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed #4 (comment) in commit 79880d9: group peak-summit beds by sample_id before joining with bigwigs, then expand to preserve all caller-specific peak sets (no caller dropped during heatmap prep). |
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep them coming! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Pushed commit 50d0df8 fixing SPAN + peak QC issues in dev0. Summary of changes:
These resolve: SPAN NoClassDefFoundError, SPAN .peak missing, bedtools intersect chrY mismatch, bedtools -sorted order error, and intersect input/output name collision. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50d0df8cce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| MAPPED_READS=\$(grep -m 1 'mapped (' ${flagstat} | awk '{print \$1}') | ||
| FRIP_SCORE=\$(awk -v a="\$READS_IN_PEAKS" -v b="\$MAPPED_READS" 'BEGIN {OFS="\\t"} {if (b > 0) {print a / b} else {print 0}}') |
There was a problem hiding this comment.
Fix FRiP score computation to actually output a value
The new FRIP_SCORE=$(awk …) call runs awk with no input stream, so the default rule never executes and FRIP_SCORE ends up empty for every sample. That means both *_frip_score.txt and the MultiQC table rows are blank even when mapped reads are present. This happens unconditionally because the awk command isn’t fed any input; use a BEGIN block that prints, or pipe a dummy line into awk.
Useful? React with 👍 / 👎.
|
Pushed commit 5cc7de0 with peak QC repro fixes + config parse fix. Changes:
Resolves:
|
|
Pushed commit 3f05d4a with fixes for late-stage failures. Changes:
Resolves:
|
|
@codex review |
|
Addressed Codex review note on FRiP: updated PEAK_FRIP awk to print in BEGIN block (no input stream needed), so FRIP_SCORE is no longer empty. Commit: 4cbb959. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f05d4ac99
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| PEAK_QC( | ||
| ch_peaks_primary, | ||
| ch_peaks_with_ids_primary, | ||
| ch_consensus_peaks_primary, | ||
| ch_consensus_peaks_unfilt_primary, | ||
| ch_peaks_all, | ||
| ch_peaks_with_ids_all, |
There was a problem hiding this comment.
Ensure per-caller peak QC uses unique sample IDs
Switching PEAK_QC to consume ch_peaks_all means each biological sample now contributes multiple peak sets (one per caller), but the QC modules still derive output prefixes and MultiQC sample names from meta.id alone (see modules/local/peak_frip.nf and modules/local/peak_counts.nf, which default to ${meta.id} for output filenames). Because meta.id remains the sample ID (caller is only in meta.caller), per-caller QC files share the same basename and MultiQC will collapse or overwrite them, leaving only one caller’s FRiP/peak-count metrics per sample. Consider augmenting meta.id with the caller (e.g., ${sample_id}_${caller}) before invoking PEAK_QC, or update the QC modules to incorporate meta.caller in their prefixes.
Useful? React with 👍 / 👎.
|
Pushed commit adef1f0: fixes callers initialization and PREPARE_PEAKCALLING inputs. Changes:
Outcome: DEEPTOOLS_BAMCOVERAGE now emits for all samples; SEACR should schedule once peak prep completes. |
|
Pushed commit f345f8e: avoid MultiQC filename collisions for peak reproducibility. Change:
|
|
Success! @codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f345f8e225
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def peaks_list = row[2] | ||
| def bigwig = row[1] | ||
| peaks_list.collect { peak_meta, bed -> | ||
| def sample_id = peak_meta.sample_id ?: peak_meta.id | ||
| def heatmap_meta = peak_meta + [id: "${sample_id}_${peak_meta.caller}"] |
There was a problem hiding this comment.
Fix tuple destructuring in peaks_list.collect
In this block peaks_list is a Groovy List of [meta, bed] tuples from groupTuple, but List.collect passes each element as a single argument. With peaks_list.collect { peak_meta, bed -> ... }, peak_meta becomes the whole list and bed is null (or an index), so peak_meta.sample_id will throw a MissingPropertyException at runtime. This will break the heatmap path whenever run_deeptools_heatmaps is enabled. Explicitly destructure the tuple (e.g. def (peak_meta, bed) = it) before accessing sample_id.
Useful? React with 👍 / 👎.
Summary\n- Enforce condition-matched controls in samplesheet validation (fail-fast by default; optional cross-condition override).\n- Prevent silent drops when pooled controls are missing; skip control-required callers with warnings and recorded reasons.\n- Always emit control pooling report with status/action fields; update docs accordingly.\n\n## Testing\n- Not run (spec-only changes).