[BUG] independent i-evalue-sel on large dataset #82

@iwangtoknow

Describe the bug
I noticed that a specific hit in the HMMER output file is present neither in best_solution nor in the rejected list.

To Reproduce
Steps to reproduce the behavior:

  1. macsyfinder --db-type gembase -w 0 --sequence-db proteins.fasta --models CONJScan/Plasmids all -o CONJScan/result
  2. The input file is large. $ ls -lh proteins.fasta reports: 1.8G Dec 1 15:20 proteins.fasta
  3. Proteins are named following the pattern >plasmidID_proteinID

Expected behavior

All proteins that fulfill the GA score cutoff of their profile should be reported either in best_solution or in the rejected list.
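
The expectation above can be written as a simple set check. This is only a sketch: the function name and the protein IDs are hypothetical, and reading the IDs from the HMMER output and the MacSyFinder result files is left out because their exact formats are not described in this report.

```python
def unreported_hits(ga_passing_ids, best_solution_ids, rejected_ids):
    """Return IDs that passed the GA cutoff but appear in neither result list."""
    reported = set(best_solution_ids) | set(rejected_ids)
    return sorted(set(ga_passing_ids) - reported)


# Hypothetical IDs following the >plasmidID_proteinID naming pattern.
ga_passing = ["pA_001", "pA_002", "pB_010"]
best_solution = ["pA_001"]
rejected = ["pA_002"]

missing = unreported_hits(ga_passing, best_solution, rejected)
print(missing)
```

If the expected behavior held, the returned list would be empty. Here pB_010 stands in for the kind of hit this report is about.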

Please complete the following information:

OS:

  • Linux
  • Windows
  • Mac

MacSyFinder Version:

2.1.4. I am testing 2.1.6, but I think the behavior should be the same.

Additional context
The i-evalue-sel parameter (default=0.001) and the coverage-profile parameter are applied independently of the GA cutoff. With one large gembase file, the E-values in the HMMER results increase, so proteins whose score is barely above the GA cutoff are not picked up by MSF. These two filters should be disabled when a GA score is available in the HMM profile and it is not GA 0 0.
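
To illustrate the effect, here is a minimal sketch. It assumes the usual HMMER convention that an E-value is the P-value of the bit score multiplied by the number of target sequences. The function names, the ga_overrides_evalue switch, the example scores, and the use of <= for the comparison are hypothetical and are not MacSyFinder's actual code. The coverage-profile filter is omitted for brevity.

```python
I_EVALUE_SEL = 0.001  # default value quoted in this issue


def i_evalue(p_value, n_targets):
    # Assumption: E-value = P-value of the score * number of target sequences.
    return p_value * n_targets


def is_selected(score, ievalue, ga_cutoff=None, ga_overrides_evalue=False):
    """Decide whether a hit is kept (hypothetical filter logic)."""
    # A missing GA line or "GA 0 0" is treated as "no GA cutoff available".
    has_ga = ga_cutoff is not None and ga_cutoff > 0
    if has_ga and score < ga_cutoff:
        return False
    if has_ga and ga_overrides_evalue:
        # Proposed behavior: trust the GA cutoff and skip the i-evalue filter.
        return True
    # Behavior as described in this report: i-evalue filter is always applied.
    return ievalue <= I_EVALUE_SEL


# Same hit (same bit score, same P-value) searched against two database sizes.
p, score, ga = 1e-8, 25.0, 24.0
small_db = i_evalue(p, 10_000)      # about 1e-4, below 0.001
large_db = i_evalue(p, 5_000_000)   # about 0.05, above 0.001

print(is_selected(score, small_db, ga))
print(is_selected(score, large_db, ga))
print(is_selected(score, large_db, ga, ga_overrides_evalue=True))
```

The same hit passes the filter in the small database and is dropped in the large one, although its score is above the GA cutoff in both cases. With the proposed switch, it is kept regardless of database size.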
