
Software Usage

Haoliang Xue edited this page Mar 15, 2025 · 2 revisions

1. General Usage

1.1 Usage within container

We recommend using KaMRaT within an apptainer (formerly singularity) container:

apptainer exec -B /bind_src:/bind_des kamrat <CMD> [options] /path/from/{bind_des}/to/input/kmer/table 
    # <CMD> can be one of index, filter, mask, merge, score, query
    # replace "apptainer" with "singularity" when KaMRaT is built with singularity

The -B option binds host paths into the apptainer container; see the apptainer helper for details:

apptainer exec -h

1.2 Usage after building from source

If built from source, KaMRaT can be run by:

/path/to/KaMRaT/kamrat/bin/in/app/directory <CMD> [options] /path/to/input/kmer/table 
    # <CMD> can be one of index, filter, mask, merge, score, query

2. Usage by Operations

The following sections assume that KaMRaT is run within an apptainer container.

For the two alternative situations:

  • to run KaMRaT within a singularity container, simply replace the keyword apptainer with singularity;
  • to run KaMRaT after building from source, replace the leading apptainer exec -B /bind_src:/bind_des with the path to the KaMRaT binary file (in the app/ folder).

KaMRaT helper

KaMRaT's top-level helper is accessible by typing one of these commands:

apptainer exec kamrat
apptainer exec kamrat -h
apptainer exec kamrat -help

The helper of each KaMRaT module is accessible via one of these commands:

apptainer exec kamrat <CMD>
apptainer exec kamrat <CMD> -h
apptainer exec kamrat <CMD> -help
    # <CMD> can be one of index, filter, mask, merge, score, query

KaMRaT index: index feature count table on disk

[USAGE]    kamrat index -intab STR -outdir STR [-klen INT -unstrand -nfbase INT -nffile STR]

[OPTION]   -h, -help      Print the helper
           -intab STR     Input table for index, mandatory
           -outdir STR    Output index directory, mandatory
           -klen INT      k-mer length, mandatory if features are k-mers
                              if present, indexation will be switched to k-mer mode
           -unstrand      Unstranded mode, indexation with canonical k-mers
                              if present, indexation will be switched to k-mer mode
           -nfbase INT    Base for calculating normalization factor, not compatible with -nffile STR
                              if not provided, input counts will not be normalized
           -nffile STR    File for loading normalization factor, not compatible with -nfbase INT
                              a tab-separated row of normalization factors, same order as table header

KaMRaT filter: filter feature by expression level

[USAGE]    kamrat filter -idxdir STR -design STR [-upmin INT1:INT2 -downmax INT1:INT2 -reverse -outfmt STR -outpath STR -counts STR]

[OPTION]    -h,-help              Print the helper
            -idxdir STR           Indexing folder by KaMRaT index, mandatory
            -design STR           Path to filter design file, a table of two columns, mandatory
                                      the first column indicates sample names
                                      the second column should be either UP or DOWN (capital letters)
                                          samples with UP will be considered as up-regulated samples
                                          samples with DOWN will be considered as down-regulated samples
                                          samples not given will be neutral (not considered for filter)
                                          samples can also be all UP or all DOWN
            -upmin INT1:INT2      Up feature lower bound, [1:1, meaning no filter]
                                      output features counting >= INT1 in >= INT2 UP-samples
            -downmax INT1:INT2    Down feature upper bound, [inf:1, meaning no filter]
                                      output features counting <= INT1 in >= INT2 DOWN-samples
            -reverse              Reverse filter, to remove eligible features [false]
            -outfmt STR           Output format, STR can be tab, fa, or bin [tab]
                                      tab will output the final count table, set by default
                                      fa will output a fasta file containing sequences without counts
                                      bin will output a binary file, to be taken by the -with option of other modules
            -outpath STR          Path to results after filter
                                      if not provided, output to screen
            -counts STR           STR can be int or float [int]
                                      int will round the count values to nearest integers
                                      float will output the values in decimals
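
The -upmin and -downmax criteria can be read as two per-feature counting rules. The Python sketch below only illustrates that reading and is not KaMRaT's implementation. The function name passes_filter, the sample names, and the choices to require both criteria at once and to skip a criterion when the design contains no sample of that kind are our assumptions.

```python
def passes_filter(counts, design, upmin=None, downmax=None):
    """Return True if a feature satisfies the given -upmin / -downmax rules.

    counts:  dict mapping sample name -> count of the feature
    design:  dict mapping sample name -> "UP" or "DOWN"; samples absent
             from the design are neutral and ignored
    upmin:   (INT1, INT2) pair, or None to skip the criterion
    downmax: (INT1, INT2) pair, or None to skip the criterion
    """
    up = [counts[s] for s, cond in design.items() if cond == "UP"]
    down = [counts[s] for s, cond in design.items() if cond == "DOWN"]
    ok = True
    if upmin is not None and up:  # assumption: skipped if no UP sample
        min_count, min_samples = upmin
        ok = ok and sum(x >= min_count for x in up) >= min_samples
    if downmax is not None and down:  # assumption: skipped if no DOWN sample
        max_count, min_samples = downmax
        ok = ok and sum(x <= max_count for x in down) >= min_samples
    return ok


# Hypothetical design and counts; s5 is not in the design, hence neutral.
design = {"s1": "UP", "s2": "UP", "s3": "DOWN", "s4": "DOWN"}
counts = {"s1": 12, "s2": 3, "s3": 0, "s4": 1, "s5": 40}

keep = passes_filter(counts, design, upmin=(5, 1), downmax=(1, 2))
strict = passes_filter(counts, design, upmin=(5, 2), downmax=(1, 2))
```

With -reverse, the selection would be inverted, i.e. the features for which this test is true would be removed instead of kept.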

KaMRaT mask: mask k-mers from matrix

[USAGE]    kamrat mask -idxdir STR [-seq2sel STR -seq2sup STR] [-outfmt STR -outpath STR -counts STR]

[OPTION]    -h,-help         Print the helper
            -idxdir STR      Indexing folder by KaMRaT index, mandatory
            -seq2sel STR     Sequence fasta file to select, mandatory if -seq2sup not provided.
            -seq2sup STR     Sequence fasta file to suppress, mandatory if -seq2sel not provided.
            -outfmt STR      Output format, STR can be tab or bin [tab]
                                 tab will output the final count table
                                 bin will output a binary file, to be taken by the -with option of other modules
            -outpath STR     Path to masking results
                                 if not provided, output to screen
            -counts STR      STR can be int or float [int], only works if -outfmt tab
                                 int will round the count values to nearest integers
                                 float will output the values in decimals

KaMRaT merge: extend k-mers into contigs

[USAGE]    kamrat merge -idxdir STR -overlap MAX-MIN [-with STR1[:STR2] -interv STR[:FLOAT] -min-nbkmer INT -outfmt STR -outpath STR -counts STR1[:STR2]]

[OPTION]    -h,-help               Print the helper
            -idxdir STR            Indexing folder by KaMRaT index, mandatory
            -overlap MAX-MIN       Overlap range for extension, by default: from (k-1) to ⌊k/2⌋
                                       MIN and MAX are integers, MIN <= MAX < k-mer length
            -with STR1[:STR2]      File indicating k-mers to be extended (STR1) and rep-mode (STR2)
                                       if not provided, all indexed k-mers are used for extension
                                       in the file STR1, a supplementary column of rep-value can be provided
                                       STR2 can be one of {min, minabs, max, maxabs} [min]
            -interv STR[:FLOAT]    Intervention method for extension [pearson:0.20]
                                       can be one of {none, pearson, spearman, mac}
                                       the threshold may follow a ':' symbol
            -min-nbkmer INT        Minimal length of extended contigs [0]
            -outfmt STR            Output format, STR can be tab, fa, or bin [tab]
                                       tab will output the final count table
                                       fa will output a fasta file containing sequences without counts
                                       bin will output a binary file, to be taken by the -with option of other modules
            -outpath STR           Path to extension results
                                       if not provided, output to screen
            -counts STR1[:STR2]    How to compute contig counts from k-mer counts, only works if -outfmt tab
                                       STR1 can be rep, mean, or median [rep]
                                           rep uses the representative k-mer count for the contig (see rep-mode in -with)
                                           mean computes mean counts among all composite k-mers for each sample
                                           median computes median counts among all composite k-mers for each sample
                                       STR2 can be int or float [int]
                                           int will round the count values to nearest integers
                                           float will output the values in decimals
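
To make the three -counts modes concrete, here is a small Python sketch of how a contig count vector could be derived from the counts of its composite k-mers. It is an illustration under assumptions, not KaMRaT's code: the function name contig_counts and the example values are hypothetical, and we assume that the rep-mode picks the k-mer whose rep-value is minimal (min), minimal in absolute value (minabs), maximal (max), or maximal in absolute value (maxabs).

```python
from statistics import mean, median


def contig_counts(kmer_counts, mode="rep", rep_values=None, rep_mode="min"):
    """Compute one count per sample for a contig.

    kmer_counts: list of per-sample count lists, one list per composite k-mer
    mode:        "rep", "mean", or "median"
    rep_values:  one rep-value per k-mer, required when mode is "rep"
    rep_mode:    "min", "minabs", "max", or "maxabs"
    """
    if mode == "rep":
        # Each key turns the rep-mode into a "smallest key wins" ordering.
        keys = {
            "min": lambda i: rep_values[i],
            "minabs": lambda i: abs(rep_values[i]),
            "max": lambda i: -rep_values[i],
            "maxabs": lambda i: -abs(rep_values[i]),
        }
        rep_index = min(range(len(kmer_counts)), key=keys[rep_mode])
        return list(kmer_counts[rep_index])
    if mode not in ("mean", "median"):
        raise ValueError("mode must be rep, mean, or median")
    aggregate = mean if mode == "mean" else median
    # zip(*...) regroups the counts sample by sample.
    return [aggregate(column) for column in zip(*kmer_counts)]


# Three composite k-mers measured in three samples (hypothetical values).
kmer_counts = [[10, 0, 4], [12, 2, 4], [20, 1, 7]]
rep_values = [0.03, 0.001, 0.2]  # e.g. a score loaded from the -with file

by_rep = contig_counts(kmer_counts, "rep", rep_values, "min")
by_mean = contig_counts(kmer_counts, "mean")
by_median = contig_counts(kmer_counts, "median")
```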

Three intervention methods are available:

  • pearson: Pearson distance, i.e., 0.5 * [1 - pearson.correlation(x, y)]
  • spearman: Spearman distance, i.e., 0.5 * [1 - spearman.correlation(x, y)]
  • mac: mean absolute contrast, as described in [Nguyen, H. T., et al., 2021]

The threshold on these distances takes a value in [0, 1], where 0 is the strictest setting and 1 is the most permissive one (equivalent to none).
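
The Pearson and Spearman distances defined above can be sketched in plain Python as follows. This is only a worked illustration of the two formulas. The function names are ours, the tie handling by average ranks is an assumption, and the rule in may_extend (an extension is accepted when the distance does not exceed the threshold) is our reading of the threshold description rather than a documented behaviour. Constant count vectors have no defined correlation and are not handled here.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_distance(x, y):
    return 0.5 * (1.0 - pearson_correlation(x, y))


def spearman_distance(x, y):
    return 0.5 * (1.0 - pearson_correlation(average_ranks(x), average_ranks(y)))


def may_extend(distance, threshold):
    # Assumption: accept when the distance stays within the threshold.
    return distance <= threshold


same_trend = pearson_distance([1, 2, 3, 4], [2, 4, 6, 8])
opposite_trend = pearson_distance([1, 2, 3, 4], [8, 6, 4, 2])
monotone = spearman_distance([1, 2, 3, 4], [1, 4, 9, 100])
```

Under this reading, a distance of 0 corresponds to perfectly correlated count vectors and a distance of 1 to perfectly anti-correlated ones, which is why a threshold of 1 accepts every extension.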

KaMRaT score: score features by classification performance, statistical significance, correlation, or variability

[USAGE]    kamrat score -idxdir STR -scoreby STR -design STR [-with STR1[:STR2] -seltop NUM -outfmt STR -outpath STR -counts STR]

[OPTION]    -h,-help             Print the helper
            -idxdir STR          Indexing folder by KaMRaT index, mandatory
            -scoreby STR         Scoring method, mandatory, can be one of: 
                                     classification (binary sample labels given by design file)
                                         ttest.padj      adjusted p-value of t-test between conditions
                                         ttest.pi        π-value of t-test between conditions
                                         snr             signal-to-noise ratio between conditions
                                         lr:nfold        accuracy by logistic regression classifier
                                     classification (binary or multiple sample labels given by design file)
                                         dids            DIDS score
                                         bayes:nfold     accuracy by naive Bayes classifier
                                     correlation evaluation (continuous sample labels given by design file)
                                         pearson         Pearson correlation with the continuous sample condition
                                         spearman        Spearman correlation with the continuous sample condition
                                     unsupervised evaluation (no design file required)
                                         sd              standard deviation
                                         rsd1            standard deviation adjusted by mean
                                         rsd2            standard deviation adjusted by min
                                         rsd3            standard deviation adjusted by median
                                         entropy         entropy of sample counts + 1
            -design STR          Path to file indicating sample-condition design, mandatory unless using sd, rsd1, rsd2, rsd3, entropy
                                     without header line, each row can be either: 
                                         sample name, sample condition
                                         sample name, sample condition, sample batch (only for lrc, nbc, and svm)
            -with STR1[:STR2]    File indicating features to score (STR1) and counting mode (STR2)
                                     if not provided, all indexed features are used for scoring
                                     STR2 can be one of [rep, mean, median]
            -seltop NUM          Select top scored features
                                     if NUM > 1, number of top features to select (should be integer)
                                     if 0 < NUM <= 1, ratio of top features to select
                                     if absent or NUM <= 0, output all features
            -outfmt STR          Output format, STR can be tab, fa, or bin [tab]
                                     tab will output the final count table
                                     fa will output a fasta file containing sequences without counts
                                     bin will output a binary file, to be taken by the -with option of other modules
            -outpath STR         Path to scoring result
                                     if not provided, output to screen
            -counts STR          STR can be int or float [int], only works if -outfmt tab
                                     int will round the count values to nearest integers
                                     float will output the values in decimals
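
The three cases of -seltop can be summarised by the following Python sketch. The function name n_selected is hypothetical, and both the rounding of the ratio case and the capping at the total number of features are our assumptions; the helper text does not specify them.

```python
def n_selected(num, n_features):
    """Number of top features kept for a given -seltop value.

    num:        the NUM given to -seltop, or None if the option is absent
    n_features: total number of scored features
    """
    if num is None or num <= 0:
        return n_features  # absent or NUM <= 0: output all features
    if num <= 1:
        # 0 < NUM <= 1: ratio of top features (rounding is an assumption)
        return round(num * n_features)
    if num != int(num):
        raise ValueError("NUM > 1 should be an integer")
    # NUM > 1: number of top features (capping is an assumption)
    return min(int(num), n_features)


top_ratio = n_selected(0.1, 2000)
top_number = n_selected(50, 2000)
everything = n_selected(0, 2000)
```

Note that NUM = 1 falls in the ratio case and therefore selects all features, not a single one.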

[NOTE]      For scoring methods lrc, nbc, and svm, a univariate CV fold number (nfold) can be provided
                if nfold = 0, leave-one-out cross-validation
                if nfold = 1, without cross-validation, training and testing on the whole dataset
                if nfold > 1, n-fold cross-validation
            For t-test scoring methods, a transformation log2(x + 1) is applied to sample counts
            For SVM scoring, sample counts standardization is applied feature by feature
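
As an illustration of the note above, the Python sketch below enumerates train/test sample indices for the three nfold cases and applies the log2(x + 1) transformation used by the t-test methods. The function names are hypothetical, and the way samples are assigned to folds (every nfold-th sample) is an assumption made for the example only; KaMRaT may partition samples differently.

```python
import math


def cv_splits(n_samples, nfold):
    """Return a list of (train_indices, test_indices) pairs."""
    if nfold < 0 or nfold > n_samples:
        raise ValueError("nfold must be between 0 and the number of samples")
    indices = list(range(n_samples))
    if nfold == 1:
        # No cross-validation: train and test on the whole dataset.
        return [(indices, indices)]
    if nfold == 0:
        # Leave-one-out is n-fold cross-validation with n = n_samples.
        nfold = n_samples
    splits = []
    for fold in range(nfold):
        test = indices[fold::nfold]  # assumption: interleaved fold assignment
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        splits.append((train, test))
    return splits


def log2_transform(counts):
    """Transformation applied to sample counts before the t-test."""
    return [math.log2(c + 1) for c in counts]


leave_one_out = cv_splits(4, 0)
no_cv = cv_splits(4, 1)
three_fold = cv_splits(6, 3)
transformed = log2_transform([0, 1, 7])
```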

For a detailed description of some scoring methods, please refer to the supplementary document of our article.

KaMRaT score has an alias, KaMRaT rank, which shares the usage described above. Please prefer the name "score" over "rank". The alias only ensures compatibility with previous projects and may be deprecated in a future release.

KaMRaT query: query sequences

[USAGE]    kamrat query -idxdir STR -fasta STR -toquery STR [-withabsent -outpath STR -counts STR]

[OPTION]    -h,-help         Print the helper
            -idxdir STR      Indexing folder by KaMRaT index, mandatory
            -fasta STR       Sequence fasta file, mandatory
            -toquery STR     Query method, mandatory, can be one of:
                                 mean        mean count among all composite k-mers for each sample
                                 median      median count among all composite k-mers for each sample
            -withabsent      Output also absent queries (count vector all 0) [default: false]
            -outpath STR     Path to query results
                                 if not provided, output to screen
            -counts STR      STR can be int or float [int], only works if -outfmt tab
                                 int will round the count values to nearest integers
                                 float will output the values in decimals
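
The two -toquery methods can be pictured with the following Python sketch, which decomposes a query sequence into its composite k-mers and summarises their counts sample by sample. It is a simplified illustration rather than KaMRaT's implementation: the function name query_counts and the toy index are hypothetical, the lookup is stranded (no canonical k-mers), and treating k-mers absent from the index as zero counts is our assumption.

```python
from statistics import mean, median


def query_counts(sequence, k, index, n_samples, method="mean"):
    """Summarise the counts of all composite k-mers of a query sequence.

    index:     dict mapping k-mer -> list of counts, one per sample
    n_samples: number of samples in the index
    method:    "mean" or "median"
    """
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    # Assumption: a k-mer missing from the index counts as 0 in every sample.
    rows = [index.get(kmer, [0] * n_samples) for kmer in kmers]
    aggregate = {"mean": mean, "median": median}[method]
    return [aggregate(column) for column in zip(*rows)]


# Toy index with k = 3 and two samples (hypothetical values).
index = {"ACG": [4, 0], "CGT": [6, 2], "GTA": [8, 1]}

by_mean = query_counts("ACGTA", 3, index, 2, "mean")
by_median = query_counts("ACGTA", 3, index, 2, "median")
partly_absent = query_counts("ACGTT", 3, index, 2, "median")
```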