Software Usage
We recommend running KaMRaT within an apptainer (formerly singularity) container:
apptainer exec -B /bind_src:/bind_des kamrat <CMD> [options] /path/from/{bind_des}/to/input/kmer/table
# <CMD> can be one of index, filter, mask, merge, score, query
# replace "apptainer" with "singularity" if KaMRaT was built with singularity
The -B option binds disk partitions to the apptainer image; see the apptainer helper for details:
apptainer exec -h
If built from source, KaMRaT can be run with:
/path/to/KaMRaT/kamrat/bin/in/app/directory <CMD> [options] /path/to/input/kmer/table
# <CMD> can be one of index, filter, mask, merge, score, query
The following sections assume that KaMRaT is run within apptainer.
For the two alternative situations:
- to run KaMRaT within a singularity container, simply replace the keyword apptainer with singularity;
- to run KaMRaT after building from source, replace the leading apptainer exec -B /bind_src:/bind_des with the path to the KaMRaT binary file (in the app/ folder).
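The three situations differ only in the command prefix, so the choice can be kept in one place. The following is a minimal sketch, not part of KaMRaT: the RUNTIME and KAMRAT variable names are hypothetical, and the paths are the placeholders used above.

```shell
# RUNTIME is a hypothetical switch: "apptainer", "singularity", or "source".
RUNTIME=apptainer

case "$RUNTIME" in
    apptainer|singularity)
        # Container run: same command line, only the leading keyword changes.
        KAMRAT="$RUNTIME exec -B /bind_src:/bind_des kamrat" ;;
    source)
        # Built from source: placeholder path to the binary, as written above.
        KAMRAT="/path/to/KaMRaT/kamrat/bin/in/app/directory" ;;
    *)
        echo "unknown runtime: $RUNTIME" >&2 ;;
esac

# Print the resulting command template instead of running it.
echo "$KAMRAT <CMD> [options] /path/to/input/kmer/table"
```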
KaMRaT's top-level helper is accessible by typing one of these commands:
apptainer exec kamrat
apptainer exec kamrat -h
apptainer exec kamrat -help
Helpers for each KaMRaT module are accessible via one of these commands:
apptainer exec kamrat <CMD>
apptainer exec kamrat <CMD> -h
apptainer exec kamrat <CMD> -help
# <CMD> can be one of index, filter, mask, merge, score, query
[USAGE] kamrat index -intab STR -outdir STR [-klen INT -unstrand -nfbase INT]
[OPTION] -h, -help Print the helper
-intab STR Input table for index, mandatory
-outdir STR Output index directory, mandatory
-klen INT k-mer length, mandatory if features are k-mer
if present, indexation will be switched to k-mer mode
-unstrand Unstranded mode, indexation with canonical k-mers
if present, indexation will be switched to k-mer mode
-nfbase INT Base for calculating normalization factor, not compatible with -nffile STR
if not provided, input counts will not be normalized
-nffile STR File for loading normalization factor, not compatible with -nfbase INT
a tab-separated row of normalization factors, same order as table header
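Before indexing with -nffile, it can help to check that the normalization-factor row matches the table header. The sketch below is only an illustration: the file names and values are made up, and it assumes that the first header column names the features, so one factor is expected for each remaining column.

```shell
# Toy input table (hypothetical values): a header row, then one feature per row.
printf 'feature\tS1\tS2\tS3\n'  > toy_table.tsv
printf 'AAAC\t5\t7\t0\n'       >> toy_table.tsv

# One tab-separated row of normalization factors, in header order (made-up values).
printf '1.0\t0.8\t1.2\n' > toy_nf.tsv

# Count the sample columns in the header and the factors in the row.
n_samples=$(head -n 1 toy_table.tsv | awk -F'\t' '{ print NF - 1 }')
n_factors=$(head -n 1 toy_nf.tsv | awk -F'\t' '{ print NF }')

if [ "$n_samples" -eq "$n_factors" ]; then
    echo "OK: $n_factors factors for $n_samples samples"
else
    echo "Mismatch: $n_factors factors for $n_samples samples" >&2
fi
```

If the check passes, the two files would be given to kamrat index through -intab and -nffile respectively.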
[USAGE] kamrat filter -idxdir STR -design STR [-upmin INT1:INT2 -downmax INT1:INT2 -reverse -outfmt STR -outpath STR -counts STR]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-design STR Path to filter design file, a table of two columns, mandatory
the first column indicates sample names
the second column should be either UP or DOWN (capital letters)
samples with UP will be considered as up-regulated samples
samples with DOWN will be considered as down-regulated samples
samples not given will be neutral (not considered for filter)
samples can also be all UP or all DOWN
-upmin INT1:INT2 Up feature lower bound, [1:1, meaning no filter]
output features counting >= INT1 in >= INT2 UP-samples
-downmax INT1:INT2 Down feature upper bound [inf:1, meaning no filter]
output features counting <= INT1 in >= INT2 DOWN-samples
-reverse Reverse filter, to remove eligible features [false]
-outfmt STR Output format, STR can be tab, fa, or bin [tab]
tab will output the final count table, set by default
fa will output a fasta file containing sequences without counts
bin will output a binary file, to be taken by the -with option of other modules
-outpath STR Path to results after filter
if not provided, output to screen
-counts STR STR can be int or float [int]
int will round the count values to nearest integers
float will output the values in decimals
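To make the -upmin and -downmax rules concrete, the sketch below re-creates them with awk on a toy table. It is an illustration of the rule as described in the helper, not KaMRaT's implementation. The file names, the tab-separated design layout, and the assumption that both criteria must hold together are ours.

```shell
# Toy count table (made-up values): feature column, then one column per sample.
printf 'feature\tS1\tS2\tS3\tS4\n'  > toy_counts.tsv
printf 'AAAC\t5\t7\t0\t0\n'        >> toy_counts.tsv
printf 'AACG\t5\t0\t0\t9\n'        >> toy_counts.tsv
printf 'ACGT\t0\t0\t3\t4\n'        >> toy_counts.tsv

# Design file: sample name, then UP or DOWN (tab-separated here, an assumption).
printf 'S1\tUP\nS2\tUP\nS3\tDOWN\nS4\tDOWN\n' > toy_design.tsv

# Emulate "-upmin 5:2 -downmax 0:2":
# keep features counting >= 5 in >= 2 UP-samples and <= 0 in >= 2 DOWN-samples.
awk -F'\t' -v upc=5 -v upn=2 -v dnc=0 -v dnn=2 '
    NR == FNR { cond[$1] = $2; next }                                # design file
    FNR == 1  { for (i = 2; i <= NF; i++) col[i] = cond[$i]; next }  # table header
    {
        up = 0; down = 0
        for (i = 2; i <= NF; i++) {
            if (col[i] == "UP"   && $i >= upc) up++
            if (col[i] == "DOWN" && $i <= dnc) down++
        }
        if (up >= upn && down >= dnn) print $1
    }
' toy_design.tsv toy_counts.tsv > toy_kept.txt

cat toy_kept.txt   # prints AAAC
```

Only AAAC passes: AACG reaches the count of 5 in a single UP-sample, and ACGT in none.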
[USAGE] kamrat mask -idxdir STR [-seq2sel STR -seq2sup STR] [-outfmt STR -outpath STR -counts STR]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-seq2sel STR Sequence fasta file to select, mandatory if -seq2sup not provided.
-seq2sup STR Sequence fastq file to suppress, mandatory if -seq2sel not provided.
-outfmt STR Output format, STR can be tab or bin [tab]
tab will output the final count table
bin will output a binary file, to be taken by the -with option of other modules
-outpath STR Path to extension results
if not provided, output to screen
-counts STR STR can be int or float [int], only works if -outfmt tab
int will round the count values to nearest integers
float will output the values in decimals
[USAGE] kamrat merge -idxdir STR -overlap MAX-MIN [-with STR1[:STR2] -interv STR[:FLOAT] -min-nbkmer INT -outfmt STR -outpath STR -counts STR1[:STR2]]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-overlap MAX-MIN Overlap range for extension, by default: from (k-1) to ⌊k/2⌋
MIN and MAX are integers, MIN <= MAX < k-mer length
-with STR1[:STR2] File indicating k-mers to be extended (STR1) and rep-mode (STR2)
if not provided, all indexed k-mers are used for extension
in the file STR1, a supplementary column of rep-value can be provided
STR2 can be one of {min, minabs, max, maxabs} [min]
-interv STR[:FLOAT] Intervention method for extension [pearson:0.20]
can be one of {none, pearson, spearman, mac}
the threshold may follow a ':' symbol
-min-nbkmer INT Minimal length of extended contigs [0]
-outfmt STR Output format, STR can be tab, fa, or bin [tab]
tab will output the final count table
fa will output a fasta file containing sequences without counts
bin will output a binary file, to be taken by the -with option of other modules
-outpath STR Path to extension results
if not provided, output to screen
-counts STR1[:STR2] How to compute contig counts from k-mer counts, only works if -outfmt tab
STR1 can be rep, mean, or median [rep]
rep uses the representative k-mer count for the contig (see rep-mode in -with)
mean computes mean counts among all composite k-mers for each sample
median computes median counts among all composite k-mers for each sample
STR2 can be int or float [int]
int will round the count values to nearest integers
float will output the values in decimals
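The default -overlap range follows directly from the k-mer length. As a small worked example, assuming a hypothetical k of 31:

```shell
k=31                        # hypothetical k-mer length
max_overlap=$((k - 1))      # upper bound: k-1
min_overlap=$((k / 2))      # lower bound: floor(k/2), via integer division
echo "-overlap ${max_overlap}-${min_overlap}"   # prints -overlap 30-15
```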
Three intervention methods are available:
- pearson: Pearson distance, i.e., 0.5 * [1 - pearson.correlation(x, y)]
- spearman: Spearman distance, i.e., 0.5 * [1 - spearman.correlation(x, y)]
- mac: mean absolute contrast, as described in [Nguyen, H. T., et al., 2021]
The threshold on these distances can be set within [0, 1], where 0 is the strictest case and 1 is the most permissive (equivalent to none).
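As a worked example of the Pearson distance formula, the sketch below computes 0.5 * [1 - pearson.correlation(x, y)] for two made-up count vectors with awk. It only illustrates the formula and is not taken from KaMRaT's code.

```shell
# Two toy count vectors across four samples (made-up values).
x="1 2 3 4"
y="1 3 2 4"

dist=$(awk -v xs="$x" -v ys="$y" 'BEGIN {
    n = split(xs, a, " "); split(ys, b, " ")
    for (i = 1; i <= n; i++) {
        sx += a[i]; sy += b[i]
        sxx += a[i] * a[i]; syy += b[i] * b[i]; sxy += a[i] * b[i]
    }
    # Pearson correlation from the accumulated sums
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    # Pearson distance as defined above
    printf "%.2f", 0.5 * (1 - r)
}')
echo "pearson distance = $dist"   # prints pearson distance = 0.10
```

Here the correlation is 0.8, so the distance is 0.10, which lies below the default threshold of 0.20.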
KaMRaT score: score features by classification performance, statistical significance, correlation, or variability
[USAGE] kamrat score -idxdir STR -scoreby STR -design STR [-with STR1[:STR2] -seltop NUM -outfmt STR -outpath STR -counts STR]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-scoreby STR Scoring method, mandatory, can be one of:
classification (binary sample labels given by design file)
ttest.padj adjusted p-value of t-test between conditions
ttest.pi π-value of t-test between conditions
snr signal-to-noise ratio between conditions
lr:nfold accuracy by logistic regression classifier
classification (binary or multiple sample labels given by design file)
dids DIDS score
bayes:nfold accuracy by naive Bayes classifier
correlation evaluation (continuous sample labels given by design file)
pearson Pearson correlation with the continuous sample condition
spearman Spearman correlation with the continuous sample condition
unsupervised evaluation (no design file required)
sd standard deviation
rsd1 standard deviation adjusted by mean
rsd2 standard deviation adjusted by min
rsd3 standard deviation adjusted by median
entropy entropy of sample counts + 1
-design STR Path to file indicating sample-condition design, mandatory unless using sd, rsd1, rsd2, rsd3, entropy
without header line, each row can be either:
sample name, sample condition
sample name, sample condition, sample batch (only for lrc, nbc, and svm)
-with STR1[:STR2] File indicating features to score (STR1) and counting mode (STR2)
if not provided, all indexed features are used for scoring
STR2 can be one of [rep, mean, median]
-seltop NUM Select top scored features
if NUM > 1, number of top features to select (should be integer)
if 0 < NUM <= 1, ratio of top features to select
if absent or NUM <= 0, output all features
-outfmt STR Output format, STR can be tab, fa, or bin [tab]
tab will output the final count table
fa will output a fasta file containing sequences without counts
bin will output a binary file, to be taken by the -with option of other modules
-outpath STR Path to scoring result
if not provided, output to screen
-counts STR STR can be int or float [int], only works if -outfmt tab
int will round the count values to nearest integers
float will output the values in decimals
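The three cases of -seltop can be summarized in a short function. This is a sketch of the rule as the helper states it: the function name seltop_count is hypothetical, and rounding the ratio down and capping at the total are assumptions, not documented behavior.

```shell
# seltop_count NUM TOTAL: number of features kept under the -seltop rules.
seltop_count() {
    awk -v num="$1" -v total="$2" 'BEGIN {
        if (num > 1)       n = int(num)          # absolute number of features
        else if (num > 0)  n = int(num * total)  # ratio; rounding down is an assumption
        else               n = total             # NUM <= 0: output all features
        if (n > total) n = total                 # cap at the total (assumption)
        print n
    }'
}

seltop_count 50 200     # prints 50
seltop_count 0.25 200   # prints 50
seltop_count 0 200      # prints 200
```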
[NOTE] For scoring methods lrc, nbc, and svm, a univariate CV fold number (nfold) can be provided
if nfold = 0, leave-one-out cross-validation
if nfold = 1, without cross-validation, training and testing on the whole dataset
if nfold > 1, n-fold cross-validation
For t-test scoring methods, a transformation log2(x + 1) is applied to sample counts
For SVM scoring, sample counts standardization is applied feature by feature
For a detailed description of the scoring methods, please refer to the supplementary document of our article.
KaMRaT score has an alias, KaMRaT rank, which shares the same usage as described above. Please prefer the name "score" over "rank": the alias exists only to ensure compatibility with previous projects and may be deprecated in a future release.
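The notes above can be illustrated with two small snippets. The helper name describe_nfold is hypothetical and the counts are made up; the snippets restate the documented rules and are not KaMRaT code.

```shell
# Describe the cross-validation scheme implied by an nfold value.
describe_nfold() {
    case "$1" in
        0) echo "leave-one-out cross-validation" ;;
        1) echo "no cross-validation (train and test on the whole dataset)" ;;
        *) echo "$1-fold cross-validation" ;;   # assumes an integer > 1
    esac
}
describe_nfold 0   # prints leave-one-out cross-validation
describe_nfold 5   # prints 5-fold cross-validation

# log2(x + 1) transformation of toy counts, as applied for the t-test methods.
echo "0 1 3 7" | awk '{
    for (i = 1; i <= NF; i++)
        printf "%.2f%s", log($i + 1) / log(2), (i < NF ? " " : "\n")
}'   # prints 0.00 1.00 2.00 3.00
```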
[USAGE] kamrat query -idxdir STR -fasta STR -toquery STR [-withabsent -outpath STR -counts STR]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-fasta STR Sequence fasta file, mandatory
-toquery STR Query method, mandatory, can be one of:
mean mean count among all composite k-mers for each sample
median median count among all composite k-mers for each sample
-withabsent Output also absent queries (count vector all 0) [default: false]
-outpath STR Path to extension results
if not provided, output to screen
-counts STR STR can be int or float [int], only works if -outfmt tab
int will round the count values to nearest integers
float will output the values in decimals
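To show what the mean and median query methods compute, the sketch below splits a toy sequence into its composite k-mers and summarizes their counts for one sample. It is an illustration only: the file name, sequence, and counts are made up, and strandedness and absent k-mers are not handled.

```shell
# Toy k-mer counts for a single sample (made-up values): k-mer, count.
printf 'ACGT\t4\nCGTA\t10\nGTAC\t1\n' > toy_kmer_counts.tsv

awk -F'\t' -v seq=ACGTAC -v k=4 '
    { count[$1] = $2 }                            # load the k-mer counts
    END {
        n = length(seq) - k + 1                   # number of composite k-mers
        for (i = 1; i <= n; i++) {
            v[i] = count[substr(seq, i, k)] + 0   # count of the i-th k-mer
            sum += v[i]
        }
        # insertion sort, so the median can be read from the middle
        for (i = 2; i <= n; i++) {
            x = v[i]
            for (j = i - 1; j >= 1 && v[j] > x; j--) v[j + 1] = v[j]
            v[j + 1] = x
        }
        med = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
        printf "mean=%.2f median=%.2f\n", sum / n, med
    }
' toy_kmer_counts.tsv > toy_query.txt

cat toy_query.txt   # prints mean=5.00 median=4.00
```

The sequence ACGTAC yields the k-mers ACGT, CGTA, and GTAC with counts 4, 10, and 1, so the median is less affected by the single high count than the mean.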