[feature] Evaluate sharded hashing through kun-peng #38

Memory usage is becoming an ongoing issue, especially with the newer databases and the self-classification step during DB construction. One recent strategy is sharded hashing, as implemented in Kun-Peng:

https://github.com/eric9n/Kun-peng<br>
> https://www.biorxiv.org/content/10.1101/2024.12.19.629356v1

Some initial tests looked good and the unique syncmer assignment also improved the MEDI classifications a bit. This issue tracks the addition of sharded hashing into the workflow.
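For readers unfamiliar with the idea, here is a minimal sketch of sharded hashing in general. It is not Kun-Peng's actual implementation: the function names (`stable_hash`, `build_shards`, `classify_sharded`), the use of BLAKE2 as the hash, and the toy string keys standing in for minimizers are all assumptions made for illustration. The point is only that keys are partitioned by hash value, so a lookup pass needs just one shard in memory at a time.

```python
from collections import defaultdict
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_shards(table: dict[str, int], n_shards: int) -> list[dict[str, int]]:
    """Split one big key -> taxon table into n_shards smaller tables."""
    shards: list[dict[str, int]] = [dict() for _ in range(n_shards)]
    for key, taxon in table.items():
        shards[stable_hash(key) % n_shards][key] = taxon
    return shards


def classify_sharded(queries: list[str], shards: list[dict[str, int]]) -> dict[str, int]:
    """Look up queries shard by shard; only one shard is touched at a time."""
    n_shards = len(shards)
    # Route every query to the shard that would hold it.
    buckets: dict[int, list[str]] = defaultdict(list)
    for query in queries:
        buckets[stable_hash(query) % n_shards].append(query)

    hits: dict[str, int] = {}
    # In a real setting each shard would be loaded from disk here and
    # released before the next one is read.
    for shard_id, shard in enumerate(shards):
        for query in buckets.get(shard_id, []):
            if query in shard:
                hits[query] = shard[query]
    return hits
```

With this layout, peak memory is bounded by the largest shard rather than the full table, at the cost of an extra pass to route queries.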

## Open Steps

- [ ] change download scripts to also download the decoys previously contained in the Kraken2 standard DB (bacteria, archaea, virus, plasmid, vectors)
- [ ] figure out how to lay out the database to make it work with Kun-Peng
- [ ] benchmark DB construction (slower with Kun-Peng)
- [ ] decide on the level of fragmentation (shard size)
- [ ] benchmark the classification speed with the sharded hash
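As a starting point for the shard-size decision, the arithmetic can be sketched as below. The helper name `shards_needed` and the sizes in the usage note are hypothetical and not measured values from MEDI or Kun-Peng. The sketch assumes the table splits evenly across shards and ignores any per-shard overhead.

```python
def shards_needed(table_bytes: int, shard_budget_bytes: int) -> int:
    """Smallest number of equal shards so that each fits the memory budget."""
    if table_bytes < 0 or shard_budget_bytes <= 0:
        raise ValueError("sizes must be non-negative and the budget positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return max(1, -(-table_bytes // shard_budget_bytes))


GIB = 2**30
```

For example, a hypothetical 100 GiB hash table with a 4 GiB per-shard budget gives `shards_needed(100 * GIB, 4 * GIB) == 25`. Smaller shards lower peak memory but mean more files to read, which is what the benchmarks above would need to quantify.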
