KamScanRust

KamScanRust performs fast, multi-threaded statistical tests over large k-mer, contig, or other feature count matrices. It streams the input and runs a differential analysis between two conditions (Welch's t-test, Wilcoxon rank-sum, or ANCOVA). Adapted from https://github.com/Transipedia/KamScan.

🛠️ Installation

Prerequisites

Rust 1.70 or later (Install Rust)
Git (optional)

Steps

# Clone the repository
git clone https://github.com/Transipedia/KamScanRust.git
cd KamScanRust

# Build in release mode (Cargo writes the binary under target/release/)
cargo build --release

Usage

Basic Command

kamscan [OPTIONS] --design <DESIGN>

Options

-t, --tsv Input counts file, space- or tab-separated. Use '-' or omit to read from stdin
-d, --design Design file: sample_name condition [covariate]
-o, --output Output file [default: stdout]
-p, --pseudo Pseudocount added before log2 transformation [default: 1.0]
-m, --min-count <MIN_COUNT> Skip features where ALL relevant samples have count <= this value [default: 0.0]
-s, --max-pvalue <MAX_PVALUE> Only report features with p-value (or adj p-value when --bh) <= this threshold [default: 0.05]
--test Statistical test to use [default: ttest]
   Possible values:
   - ttest: Welch's two-sample t-test on log2-transformed counts (default)
   - wilcoxon: Wilcoxon rank-sum test (Mann-Whitney U) on log2-transformed counts
   - ancova: ANCOVA: remove linear covariate effect before testing group difference. Requires a numeric third column in the design file
--no-header <HEADER_FILE> Counts file has no header row; instead, read column names (one per line) from HEADER_FILE. The first name is the feature-ID column; remaining names are sample names matched against the design file in order
-j, --threads Number of worker threads [default: logical CPU count]
--chunk-factor <CHUNK_FACTOR> Chunk size multiplier: each chunk holds chunk_factor × 64 KiB of raw input text. Increase for files with very long lines. [default: 16 → 1 MiB chunks]
--norm-counts <NORM_COUNTS> Optional file of pre-computed column sums for CPM normalization. Format: sample_name TAB count_sum (one line per sample, any order)
--norm-scale <NORM_SCALE> Normalization factor for the CPM normalization [default: 1000000]
--bh Apply Benjamini-Hochberg FDR correction (two-pass run). Incompatible with reading the counts file from stdin
--output-mode <OUTPUT_MODE> What to output for significant features. 'stats': means, logFC, test statistic, p-value (default). 'raw': the original input line verbatim (useful for large count matrices)
```
   Possible values:
   - stats: Output statistics columns: means, logFC, test stat, p-value (default)
   - raw:   Output the original input line verbatim for each significant feature
```
-h, --help Print help (see a summary with '-h')
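
The default `ttest` option is described above as Welch's two-sample t-test on log2-transformed counts, with an optional CPM normalization (`--norm-counts`, `--norm-scale`) and a pseudocount (`--pseudo`). The following is a minimal sketch of that per-feature computation, not the code from `src/`. The function names `transform`, `mean_var`, and `welch_t` are hypothetical, and the order of operations (scale, add the pseudocount, then take log2) is an assumption based on the option descriptions. The p-value step is omitted because it needs the Student t distribution.

```rust
/// Log2-transform one count. If a column sum is given, the count is first
/// scaled to `count / col_sum * norm_scale` (CPM when norm_scale is 1e6).
/// Assumption: the pseudocount is added after scaling and before log2.
fn transform(count: f64, col_sum: Option<f64>, norm_scale: f64, pseudo: f64) -> f64 {
    let scaled = match col_sum {
        Some(sum) if sum > 0.0 => count / sum * norm_scale,
        _ => count,
    };
    (scaled + pseudo).log2()
}

/// Sample mean and unbiased variance (n - 1 denominator).
fn mean_var(x: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
/// groups of already transformed values. Returns `None` when a group has
/// fewer than two values or when both variances are zero.
fn welch_t(a: &[f64], b: &[f64]) -> Option<(f64, f64)> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    let (mean_a, var_a) = mean_var(a);
    let (mean_b, var_b) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    // Squared standard error contributed by each group.
    let (sa, sb) = (var_a / na, var_b / nb);
    let se2 = sa + sb;
    if se2 == 0.0 {
        return None;
    }
    let t = (mean_a - mean_b) / se2.sqrt();
    let df = se2 * se2 / (sa * sa / (na - 1.0) + sb * sb / (nb - 1.0));
    Some((t, df))
}
```

With this sketch, each row of the counts matrix would be split into the two condition groups named in the design file, each value passed through `transform`, and the two resulting slices handed to `welch_t`.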
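
The `--bh` option is described as a two-pass run that cannot read the counts file from stdin. That is consistent with how the Benjamini-Hochberg procedure works: every p-value must be known before any of them can be adjusted. The sketch below shows the standard adjustment on a slice of p-values. The function name `bh_adjust` is hypothetical and this is an illustration of the procedure, not the tool's actual implementation.

```rust
/// Benjamini-Hochberg adjusted p-values, returned in the input order.
fn bh_adjust(pvalues: &[f64]) -> Vec<f64> {
    let m = pvalues.len();
    // Indices of the p-values sorted in ascending order.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by(|&i, &j| pvalues[i].total_cmp(&pvalues[j]));

    let mut adjusted = vec![0.0; m];
    let mut running_min = 1.0_f64;
    // Walk from the largest p-value to the smallest so that the adjusted
    // values stay monotone and never exceed 1.
    for (rank0, &idx) in order.iter().enumerate().rev() {
        let rank = (rank0 + 1) as f64;
        running_min = running_min.min(pvalues[idx] * m as f64 / rank);
        adjusted[idx] = running_min;
    }
    adjusted
}
```

A feature would then be reported when its adjusted value is at or below the `--max-pvalue` threshold.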
