Description
Hi,
I'm training the random forest model in PTATO with the provided 'training_vcfs' and have run into some issues with the training workflow (ptato-train.nf). I would appreciate your guidance.
I couldn't find detailed instructions for editing the configuration file and running the training workflow, so I'm not sure I have set it up correctly. Below are the command and configuration file I used:
Command:
nextflow run /home/ug2263/software/PTATO/ptato-train.nf \
-c /home/ug2263/software/PTATO/configs/run_model_demo_test.config \
--out_dir /home/ug2263/data/LY_OME/01_processing/PTATO_model/model_demo_test \
-process.memory '800 GB' \
-process.cpus 64 \
-process.maxForks 10 \
-process.queueSize 100 \
-resume
Configuration File (run_model_demo_test.config):
includeConfig "${projectDir}/configs/process.config"
includeConfig "${projectDir}/configs/nextflow.config"
includeConfig "${projectDir}/configs/resources.config"
params {
    run {
        snvs = true
        QC = false
        svs = false
        indels = false
        cnvs = false
    }
    // TRAINING
    train {
        version = '2.0.0'
    }
    pta_vcfs_dir = '/home/ug2263/data/LY_OME/Training/training_vcfs/TP'
    nopta_vcfs_dir = '/home/ug2263/data/LY_OME/Training/training_vcfs/FP'
    // END TRAINING
    // TESTING
    input_vcfs_dir = '/home/ug2263/data/LY_OME/Training/training_vcfs/TP'
    bams_dir = ''
    // END TESTING
    out_dir = ''
    bulk_names = [
        ['IBFM26', 'IBFM26_shared_filtered'],
        ['PMCCB15', 'PMCCB15_shared_filtered'],
        ['PMCAHH1-FANCCKO', 'PMCAHH1-FANCCKO_shared_filtered'],
        ['IBFM35', 'IBFM35_shared_filtered'],
        ['PB10268', 'PB10268_shared_filtered']
    ]
    snvs {
        rf_rds = ""
    }
    indels {
        rf_rds = ''
        excludeindellist = "${projectDir}/resources/hg38/indels/excludeindellist/PTA_Indel_ExcludeIndellist_normNoGTrenamed.vcf.gz"
    }
    optional {
        germline_vcfs_dir = ''
        callableloci_dir = ''
        autosomal_callable_dir = ''
        walker_vcfs_dir = ''
        short_variants {
            somatic_vcfs_dir = ''
            phased_vcfs_dir = ''
            ab_tables_dir = ''
            context_beds_dir = ''
            features_beds_dir = ''
        }
        snvs {
            rf_tables_dir = ''
            ptato_vcfs_dir = ''
        }
        indels {
            rf_tables_dir = ''
            ptato_vcfs_dir = ''
        }
        qc {
            wgs_metrics_dir = ''
            alignment_summary_metrics_dir = ''
        }
        svs {
            gridss_driver_vcfs_dir = ''
            gridss_unfiltered_vcfs_dir = ''
            gripss_somatic_filtered_vcfs_dir = ''
            gripss_filtered_files_dir = ''
            integrated_sv_files_dir = ''
        }
        cnvs {
            cobalt_ratio_tsv_dir = ''
            cobalt_filtered_readcounts_dir = ''
            baf_filtered_files_dir = ''
        }
    }
}
However, the run failed with the following errors:
Error 1:
N E X T F L O W ~ version 25.04.6
Launching `/home/ug2263/software/PTATO/ptato-train.nf` [silly_church] DSL2 - revision: d5f55b4f34
WARN: Include with `params()` is deprecated -- pass params as a workflow or process input instead
Cannot find a component with name 'extractInputVcfGzFromDir' in module: /home/ug2263/software/PTATO/NextflowModules/Utils/getFilesFromDir.nf
Did you mean any of these?
extractInputVcfFromDir
-- Check script '/home/ug2263/software/PTATO/ptato-train.nf' at line: 8 or see '.nextflow.log' file for more details
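Error 1 means that the include at line 8 of ptato-train.nf asks for a component, extractInputVcfGzFromDir, that getFilesFromDir.nf does not define. Nextflow appears to stop at the first missing name, so other requested names could also be absent. A rough way to check all of them at once is to compare the two files directly. The Python sketch below does this with regular expressions; it is a heuristic, not a Nextflow parser, and assumes components are defined as `def name(`, `process name {` or `workflow name {` and imported as `include { a; b as c } from '...'`. The script name check_includes.py and its function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Heuristic patterns (assumption: standard DSL2 include/definition syntax).
INCLUDE_RE = re.compile(r"include\s*\{([^}]*)\}\s*from\s*['\"]([^'\"]+)['\"]")
DEFINE_RE = re.compile(r"^\s*(?:def|process|workflow)\s+(\w+)\s*[({]", re.MULTILINE)


def included_names(script_text, module_hint):
    """Names imported by any include whose path contains module_hint."""
    names = []
    for body, path in INCLUDE_RE.findall(script_text):
        if module_hint not in path:
            continue
        # Items are separated by ';' or newlines; 'name as alias' keeps 'name'.
        for item in re.split(r"[;\n]", body):
            item = item.strip()
            if item:
                names.append(item.split()[0])
    return names


def defined_names(module_text):
    """Names defined in the module via def/process/workflow."""
    return set(DEFINE_RE.findall(module_text))


def missing_components(script_text, module_text, module_hint):
    """Included names that the module does not appear to define."""
    defined = defined_names(module_text)
    return [n for n in included_names(script_text, module_hint) if n not in defined]


if __name__ == "__main__" and len(sys.argv) == 3:
    script, module = Path(sys.argv[1]), Path(sys.argv[2])
    hint = module.stem  # e.g. 'getFilesFromDir'
    for name in missing_components(script.read_text(), module.read_text(), hint):
        print("not defined in module:", name)
```

Usage: python check_includes.py /home/ug2263/software/PTATO/ptato-train.nf /home/ug2263/software/PTATO/NextflowModules/Utils/getFilesFromDir.nf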
After revising the function 'extractInputVcfFromDir' in ptato.nf to address the error above, I encountered another issue:
Error 2:
No .vcf(.gz) files found in [/home/ug2263/data/LY_OME/Training/training_vcfs/FP/*/*.{vcf,vcf.gz}].
Upon further investigation, I believe this error originates from 'short_variants.nf' rather than 'getFilesFromDir.nf'. I'm quite confused about this part and would appreciate any solutions or suggestions you might have.
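The path in Error 2 shows the pattern that was searched: FP/*/*.{vcf,vcf.gz}. In Nextflow globs a single * does not cross directory boundaries, so this pattern only matches VCFs exactly one subdirectory below FP (for example FP/<subdir>/<sample>.vcf.gz); VCFs placed directly in FP are not found. The Python sketch below lists which files such a pattern would pick up and which VCFs sit at the top level instead. It emulates the glob with pathlib rather than calling Nextflow, skips hidden entries (Nextflow ignores hidden files by default), and the script and function names are hypothetical. Which subdirectory names PTATO expects, for instance whether they must match the donor names in bulk_names, is an assumption to confirm against the PTATO documentation.

```python
import sys
from pathlib import Path


def find_vcfs_one_level_down(base_dir):
    """Emulate the glob <base_dir>/*/*.{vcf,vcf.gz}.

    Only files exactly one subdirectory below base_dir match.
    Hidden entries (names starting with '.') are skipped.
    """
    base = Path(base_dir)
    matches = []
    for subdir in sorted(base.iterdir()):
        if subdir.name.startswith(".") or not subdir.is_dir():
            continue
        for f in sorted(subdir.iterdir()):
            if f.name.startswith(".") or not f.is_file():
                continue
            if f.name.endswith(".vcf") or f.name.endswith(".vcf.gz"):
                matches.append(f)
    return matches


def top_level_vcfs(base_dir):
    """VCFs sitting directly in base_dir, which the pattern above ignores."""
    return sorted(
        f for f in Path(base_dir).iterdir()
        if f.is_file() and (f.name.endswith(".vcf") or f.name.endswith(".vcf.gz"))
    )


if __name__ == "__main__":
    for d in sys.argv[1:]:
        found = find_vcfs_one_level_down(d)
        stray = top_level_vcfs(d)
        print(f"{d}: {len(found)} matching VCF(s), {len(stray)} VCF(s) directly in the directory")
        for f in found:
            print("  match:", f)
        for f in stray:
            print("  not matched (not inside a subdirectory):", f)
```

Usage: python check_vcf_layout.py /home/ug2263/data/LY_OME/Training/training_vcfs/TP /home/ug2263/data/LY_OME/Training/training_vcfs/FP. If FP reports zero matches but several top-level VCFs, the files are most likely not in the subdirectory layout the workflow's pattern expects.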
Thank you in advance for your help!
Best regards,
Gexin