Goal:
Implementing Eddy's Bakeoff project with unified DBs for consensus species detection
The consensus will be determined using a custom script that combines the 3 tools' species table calls. Here's the idea (from Todd):
- 2 out of 3 tools detecting presence of species
- Same criteria for parent species if looking at strain calls
- Ignore combining the abundance information for now and only focus on presence/absence.
- Should we include the relative abundance from all 3 tools in our final table or will that be more confusing?
What do we do with multiple samples? Should they be combined into a single output?
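As a starting point for the custom script, here is a minimal sketch of the 2-out-of-3 presence/absence rule for a single sample. Everything in it is an assumption for illustration: the function names, the placeholder tool names `tool_b` and `tool_c` (only sylph is named in this issue), and especially the way a strain call is collapsed to its parent species by keeping the first two words of the name. A real implementation would more likely map strains to species through taxonomy IDs.

```python
from collections import defaultdict


def parent_species(taxon: str) -> str:
    """Collapse a strain-level name to its parent species.

    ASSUMPTION: names are 'Genus species [strain ...]', so the first two
    whitespace-separated tokens are taken as the species name.
    """
    return " ".join(taxon.split()[:2])


def consensus_species(calls_by_tool: dict, min_tools: int = 2) -> list:
    """Return species reported by at least `min_tools` tools.

    `calls_by_tool` maps a tool name to the list of taxa it detected.
    Abundance is ignored on purpose; only presence/absence is used.
    """
    votes = defaultdict(set)  # species -> set of tools that detected it
    for tool, taxa in calls_by_tool.items():
        for taxon in taxa:
            # A set means several strains of one species count as one vote per tool.
            votes[parent_species(taxon)].add(tool)
    return sorted(sp for sp, tools in votes.items() if len(tools) >= min_tools)


# Hypothetical calls for one sample; tool_b and tool_c are placeholders.
example_calls = {
    "sylph": ["Escherichia coli", "Bacteroides fragilis"],
    "tool_b": ["Escherichia coli K-12", "Klebsiella pneumoniae"],
    "tool_c": ["Bacteroides fragilis", "Escherichia coli"],
}

print(consensus_species(example_calls))
# ['Bacteroides fragilis', 'Escherichia coli']
```

Keeping the per-species set of voting tools (rather than a bare count) would also make it easy to report which tools agreed, if we decide to add per-tool columns to the final table later.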
Plan
Prashant laying the ground-work with:
- Picking these 3 tools: Download the nf-core modules using `nf-core modules install <module>` on branch: `add_taxprofilerssylph_i73` at commit: `f6c45ef`
- Direct the pipeline to Eddy's bakeoff DB path:
  `unified_db_base_dir = "/home/Users/pacbio_bakeoff/data/ref_db/refseq03032025" // path to Eddy's unified databases (Ensemble analysis: species detection)`
  `test-modules/sylph-test.nf` (same as subissue here: test sylph with the unified DB #108)

Continue from here on a new branch with something like `ensemble_species_detection_i107`
- `nextflow.config`. Can start with this unfinished script: `test-modules/ensemble-test.nf`
- `analysis_type = species_detection`, `mode = comprehensive` or something to turn on ensemble mode and activate the 3 tools in the `species_detection` branch. Or use a `fast` mode to run sylph only here?