Memory usage is becoming an ongoing issue, especially with the newer databases and the self-classification step during DB construction. One recent strategy is the sharded hash table implementation in Kun-Peng:
https://github.com/eric9n/Kun-peng
https://www.biorxiv.org/content/10.1101/2024.12.19.629356v1
Initial tests looked promising, and the unique syncmer assignment also slightly improved the MEDI classifications. This issue tracks adding sharded hashing to the workflow.
Open Steps