Modularize `klustering` code and check `split_tetra` #5

https://github.com/poyuliu/KTU2/blob/a3a52350d811d32aaf204a7d32e99f47c52c14af/R/KTU2.R#L112-L124
I would try to modularize the code as much as possible. In this case, for instance, I would restrict `klustering` to accept a single input type, either `tetra.table`s or FASTA files, rather than both. This keeps the code cleaner by removing redundant branches, and it makes later debugging easier.
My suggestion would be to let it accept only `tetra.table`s.
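A minimal sketch of what a `tetra.table`-only entry point could check, assuming the accepted input is the two-element list (`tetra.table`, `output.seq`) already handled by the first branch of the linked lines, with one `tetra.table` column per sequence. `check_tetra_input()` is a hypothetical helper, not part of KTU2; `klustering` would call it first and fail early on anything else (for example a FASTA path).

```r
# Hypothetical helper (not part of KTU2): accept only the k-mer output list.
check_tetra_input <- function(repseq) {
  ok <- is.list(repseq) &&
    identical(names(repseq), c("tetra.table", "output.seq"))
  if (!ok) {
    stop("klustering() expects list(tetra.table, output.seq); ",
         "compute the k-mer table first.", call. = FALSE)
  }
  tetra.table <- repseq$tetra.table
  # Assumption: one column per sequence, so columns and sequences must line up.
  if (is.null(dim(tetra.table)) ||
      ncol(tetra.table) != length(repseq$output.seq)) {
    stop("tetra.table columns and output.seq do not match.", call. = FALSE)
  }
  invisible(repseq)
}
```

Keeping the sequence-to-table conversion in a separate step means `klustering` has one code path to test and debug.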

Along the same lines, I noticed that `split_tetra` computes the tetra.table and then splits the sequences based on it. To modularize the code, I would move the lines that compute the tetra tables out of the function:
https://github.com/poyuliu/KTU2/blob/a3a52350d811d32aaf204a7d32e99f47c52c14af/R/KTU2.R#L516
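A minimal base-R sketch of that k-mer step as its own function, assuming `tetra.table` has one row per 4-mer and one column per sequence (consistent with `asv.id <- colnames(tetra.table)` in the linked code). `tetra_table()` is hypothetical and is not a drop-in replacement for KTU2's `tetra.freq`: it ignores `pscore`, file input and `cores`.

```r
# Hypothetical stand-alone k-mer step (not KTU2's tetra.freq): relative
# tetranucleotide frequencies, one row per 4-mer, one column per sequence.
tetra_table <- function(seqs) {
  bases <- c("A", "C", "G", "T")
  # All 256 possible 4-mers in a fixed order, so columns are comparable.
  grid <- expand.grid(bases, bases, bases, bases, stringsAsFactors = FALSE)
  kmers <- apply(grid, 1, paste, collapse = "")
  freqs <- vapply(seqs, function(s) {
    s <- toupper(s)
    n <- nchar(s)
    if (n < 4) return(numeric(length(kmers)))  # too short for any 4-mer
    starts <- seq_len(n - 3)
    words <- substring(s, starts, starts + 3)
    # 4-mers containing N or other symbols become NA and are not counted.
    tab <- table(factor(words, levels = kmers))
    as.numeric(tab) / max(sum(tab), 1)
  }, numeric(length(kmers)), USE.NAMES = FALSE)
  rownames(freqs) <- kmers
  colnames(freqs) <- names(seqs)
  freqs
}
```

`split_tetra` would then receive the table as an argument instead of recomputing it, and the same table could also be passed to `klustering`.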
Other points regarding `split_tetra`:
1. It returns an error when `length(tetra.table) < 2`. This can happen with small datasets when no split occurs. Please try to reproduce the error.
2. It is time-consuming given how simple the calculations are. I don't know which step is the bottleneck. Other k-mer-based clustering algorithms might be faster, for instance: https://doi.org/10.1093/nar/gky315.
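For point 1, one option is an early return before the split step. A minimal sketch, assuming the failing case is exactly `length(tetra.table) < 2` and that returning the input as a single unsplit group is acceptable; `split_or_passthrough()` and `split_fun` are hypothetical names, with `split_fun` standing in for whatever splitting logic `split_tetra` applies.

```r
# Hypothetical wrapper (not part of KTU2): skip the split when there is
# nothing to split, instead of letting the split step error out.
split_or_passthrough <- function(tetra.table, split_fun) {
  if (length(tetra.table) < 2) {
    message("tetra.table has fewer than 2 elements; returning it unsplit")
    return(list(tetra.table))
  }
  split_fun(tetra.table)
}
```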

The branching in question (`R/KTU2.R`, L112-L124):

```r
# Branch 1: input is already the k-mer output list(tetra.table, output.seq)
if (is.list(repseq) && identical(names(repseq), c("tetra.table", "output.seq"))) {
  message("loading k-mer frequency table...")
  tetra.table <- repseq[[1]]
  asv.id <- colnames(tetra.table)
  species <- repseq[[2]]
} else {
  # Branch 2: raw sequences (DNAStringSet or FASTA path); k-mer table computed here
  message("k-mer frequency calling...")
  tetra.table <- tetra.freq(repseq, pscore = as.logical(pscore),
                            file = as.logical(seqfromfile), cores = cores)
  asv.id <- colnames(tetra.table)
  if (methods::is(repseq, "DNAStringSet")) {
    species <- as.character(repseq)
  } else {
    species <- as.character(Biostrings::readDNAStringSet(filepath = repseq, use.names = TRUE))
  }
}
```
