## Problem
The Nextflow work directory and the published output directory accumulate large uncompressed TSV files. The main contributors are the probability matrices from `scvi_predict` (one per query×ref combination) and the F1/confusion TSVs from `classify_all` and `predict_seurat`.
## Proposed Fix
Compress TSV outputs at write time in the relevant scripts, and update module `output:` patterns and `publishDir` patterns to match `*.tsv.gz`. Most downstream readers (pandas, R) handle gzipped TSVs transparently.
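As a minimal sketch of the write-time change (file and column names here are illustrative, not taken from the actual scripts): pandas infers gzip compression from a `.tsv.gz` suffix on both write and read, so appending `.gz` to the output path is usually the entire code change.

```python
import pandas as pd

# Illustrative probability matrix; real matrices come from scvi_predict.
df = pd.DataFrame({"cell_id": ["c1", "c2"], "prob": [0.9, 0.1]})

# pandas infers gzip compression from the .tsv.gz extension,
# so no explicit compression= argument is required.
df.to_csv("probs.rf.prob.df.tsv.gz", sep="\t", index=False)

# Reading back is equally transparent; downstream readers need no change.
round_trip = pd.read_csv("probs.rf.prob.df.tsv.gz", sep="\t")
assert round_trip.equals(df)
```

The same holds in R: `readr::read_tsv()` and `read.table()` (via a `gzfile()` connection) both accept gzipped inputs, so `predict_seurat.R` consumers are unaffected.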
## Affected modules / scripts
- `scvi_predict` / `bin/predict_scvi.py` — probability TSVs (`.rf.prob.df.tsv`, `.knn.prob.df.tsv`)
- `classify_all` / `bin/classify_all.py` — F1 summary + confusion matrix TSVs
- `predict_seurat` / `bin/predict_seurat.R` — prediction score TSVs
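On the module side, the change is limited to the glob patterns. A hypothetical sketch (process body and emit names are assumptions, not copied from the pipeline):

```nextflow
process scvi_predict {
    // Only publish the compressed TSVs.
    publishDir params.outdir, mode: 'copy', pattern: '*.tsv.gz'

    output:
    path "*.rf.prob.df.tsv.gz",  emit: rf_probs
    path "*.knn.prob.df.tsv.gz", emit: knn_probs

    script:
    """
    predict_scvi.py --out-prefix ${task.process}
    """
}
```

`classify_all` and `predict_seurat` need the analogous one-line pattern updates.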
## Expected outcome
Significant reduction in disk usage in both the Nextflow work directory and the published output directory, with no change in downstream behavior.