
Distill the 3 species detection tables into a single output file with a python/pola.rs script #115

@ppreshant

Description


Goals

Plan

  1. Read in each table and merge them into one using a common key (e.g. taxonomy_id)
    • Do we first need to find or create this key in all 3 tables?
    • Check how Eddy standardized her outputs for comparison.
  2. Keep only the necessary columns and sort rows by confidence of identification (or by abundance?)
  3. Split into a high-confidence table (detected by at least 2 of the 3 tools) and a low-confidence table, and write both out

Understand formats

Understand the output formats

  • ganon2:
    • results.rep: plain report of the run, used to generate the tree-like reports
    • results.tre: tree-like report with cumulative abundances per taxonomic rank (can be re-generated with ganon report)
  • kraken2: mock9.kraken2.report.txt, similar to ganon2's .tre with a hierarchy of ranks?
  • Understanding the taxpasta output: how do I match the taxonomy_id to the species/taxon name?
    • If you want to learn how to use taxpasta to add taxonomic names (rather than IDs) to your profiles, see here. Need to supply NCBI (or other) taxonomy files (.dmp)

Information to retain

  • Species name
  • Confidence metric:
    • Adjusted_ANI (sylph)
  • abundance estimate

References

  • Tool choice: Python with polars, which is fast (Rust-based). This is also a quick way to learn the library, and Copilot/Seqera can help generate a base script and module
  • (NO, skip for now) Is it relevant to use taxpasta here to standardize or merge the 3 tool outputs?
    • Does it support all 3 of our tools? Not sylph yet 😞. Brought up the polars library in their repo here
    • Consider whether their minimal 2-column output format (taxonomy_id and count) is good enough for us
  • Example outputs of the 3 tools for the test datasets (mock9 and mock20) are in these work dirs; store them somewhere or link them here for reference?
[80/06b239] ORCHESTRATE_SOMATEM:SOMATEM:SPECIES_DETECTION:SYLPH_PROFILE (mock9)   [100%] 2 of 2 ✔
[38/2e20cc] ORCHESTRATE_SOMATEM:SOMATEM:SPECIES_DETECTION:GANON_CLASSIFY (mock9)  [100%] 2 of 2 ✔
[0e/f5823f] ORCHESTRATE_SOMATEM:SOMATEM:SPECIES_DETECTION:KRAKEN2_KRAKEN2 (mock9) [100%] 2 of 2
