Simple BSA (Bulk Segregant Analysis)

Simple BSA is a forward-genetics pipeline for mapping EMS-induced point mutations in plant populations. Rather than analysing genes in isolation, it uses allele-frequency shifts across the whole genome to pinpoint the genomic interval, and ultimately the causal gene, responsible for a mutant phenotype. The pipeline runs in five stages:
- Alignment: Paired-end or single-end FASTQ reads from the mutant and wild-type bulks are aligned to the reference genome using BWA-MEM.
- Variant Calling: Coordinate-sorted, deduplicated BAM files are processed through GATK HaplotypeCaller to call SNPs genome-wide.
- Annotation: Variants are functionally annotated with SnpEff to identify missense, nonsense, and splice-site changes in coding regions.
- Mapping: Allele-ratio values (mutant alt / total reads) are plotted as a loess-smoothed Manhattan plot. A sharp peak indicates the chromosomal region linked to the mutant phenotype.
- Candidate Output: Coding SNPs within the peak region are ranked and reported as candidate causal mutations.
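The allele ratio used in the Mapping step can be illustrated with a few lines of shell. This is a minimal sketch of the arithmetic only (alternate-allele reads in the mutant bulk divided by total reads at that site); the function name `allele_ratio`, its argument order, and the read counts are illustrative assumptions, not part of Simple BSA itself.

```shell
# Sketch only: allele ratio = mutant alt reads / total reads at one SNP.
# allele_ratio is a hypothetical helper, not a Simple BSA command.
allele_ratio() {
    alt_reads="$1"
    total_reads="$2"
    awk -v alt="$alt_reads" -v total="$total_reads" 'BEGIN {
        if (total <= 0) { print "NA"; exit }   # no coverage: ratio undefined
        printf "%.3f\n", alt / total
    }'
}

allele_ratio 19 20   # made-up counts for a SNP close to the causal mutation
allele_ratio 11 20   # made-up counts for a SNP far from it
```

In a recessive screen, SNPs linked to the causal mutation are expected to approach a ratio of 1 in the mutant bulk, which is what produces the peak in the smoothed plot.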
Inputs:
- FASTQ files from a bulked-segregant experiment: one mutant bulk and one wild-type bulk, renamed according to the naming convention below.
- Species selection: choose from the supported species list; the reference genome and known-SNP VCF are downloaded automatically on first use.
Reviewing results:
- Review the HTML report for an interactive summary of allele-ratio plots, candidate SNPs, and run statistics.
- Open <LINE>.Rplot.loess.1.pdf and <LINE>.Rplot.loess.3.pdf for publication-ready allele-ratio plots.
- The METHODS_FOR_PUBLICATION.txt file contains a complete, ready-to-paste methods paragraph with all tool versions and citations.
Installation:
wget https://github.com/euchrogene/EG_tools/raw/refs/heads/main/EG_tools
sudo chmod 755 EG_tools
sudo mv EG_tools /usr/bin
sudo EG_tools install -r https://github.com/euchrogene/SIMPLE_Bulked_Segregant_Analysis.git -d SIMPLE_Bulked_Segregant_Analysis -e Simple_BSA_v.1.0 -m "EMS point-mutation mapping pipeline using the SIMPLE bulk segregant analysis tool."
EG_tools
Simple_BSA_v.1.0
Uninstall:
sudo EG_tools uninstall -t Simple_BSA_v.1.0 -i managene7/simple_bsa:v.1.0
============================================================================
EuchroGene Simple BSA Pipeline v.1.0 - Forward Genetics Mutation Mapping
============================================================================
DESCRIPTION:
Automated bulk segregant analysis pipeline for mapping EMS point mutations
in plant genomes. Generates candidate gene lists, allele-ratio plots, and
a publication-ready HTML report from raw FASTQ input.
USAGE:
Simple_BSA_v.1.0 -fastq_dir <DIR> [OPTIONS]
FASTQ NAMING CONVENTION:
  Mutant bulk : <LINE_NAME>.mut.R1.fastq[.gz]   (+ R2 for paired-end)
  Wild-type   : <LINE_NAME>.wt.R1.fastq[.gz]    (+ R2 for paired-end)
  Bare form   : mut.R1.fastq / wt.R1.fastq      (LINE_NAME defaults to EMS)
Example — screen named "gl1":
gl1.mut.R1.fastq.gz gl1.mut.R2.fastq.gz
gl1.wt.R1.fastq.gz gl1.wt.R2.fastq.gz
REQUIRED:
  -fastq_dir <DIR>        Directory containing renamed FASTQ files
OPTIONS:
  -species <NAME>         Reference species key (default: Arabidopsis_thaliana)
  -line_name <NAME>       Output filename prefix, letters/underscores only
                          (default: EMS)
  -mutation_type <TYPE>   recessive | dominant (default: recessive)
  -threads <N>            CPU threads for BWA and SAMtools (default: 32)
  -min_coverage <N>       Minimum read depth to report a SNP (default: 5)
  -max_memory <SIZE>      JVM heap for GATK/SnpEff, e.g. 8g, 16g, 32g
                          (default: 32g; does not apply to BWA)
  -reuse <BOOL>           Resume an interrupted run: true | false (default: false)
  -cache <DIR>            Reference genome cache directory
                          (default: /opt/simple_cache)
  -out <DIR>              Override the output folder name
SUPPORTED SPECIES:
  Arabidopsis_thaliana       TAIR10 / TAIR11   (135 Mb)
  Oryza_sativa_Japonica      MSU7              (374 Mb)
  Oryza_sativa_Indica        BGI               (426 Mb)
  Zea_mays                   B73 v4            (2.3 Gb)
  Solanum_lycopersicum       ITAG3.2           (828 Mb)
  Medicago_truncatula        Mt4.0             (390 Mb)
  Lotus_japonicus            Lj3.0             (472 Mb)
  Sorghum_bicolor            v3.1              (738 Mb)
  Brachypodium_distachyon    v3.1              (272 Mb)
  Glycine_max                Wm82.a4           (978 Mb)
  Phaseolus_vulgaris         v2.1              (521 Mb)
EXAMPLES:
  # Arabidopsis recessive EMS screen, paired-end reads
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
      -species Arabidopsis_thaliana -line_name gl1 -mutation_type recessive

  # Rice dominant mutation, 8 threads, 16 GB RAM
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
      -species Oryza_sativa_Japonica -line_name rl1 \
      -mutation_type dominant -threads 8 -max_memory 16g

  # Maize screen on a high-memory server (64 threads, 128 GB)
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
      -species Zea_mays -line_name bz1 \
      -threads 64 -max_memory 128g

  # Resume an interrupted run
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
      -species Arabidopsis_thaliana -line_name gl1 -reuse true
OUTPUT FILES:
<LINE_NAME>_Simple_BSA_v.1.0_Results/
├── <LINE>.allSNPs.txt All EMS SNPs with allele counts
├── <LINE>.candidates.txt Filtered coding candidate mutations
├── <LINE>.Rplot.loess.1.pdf Allele-ratio Manhattan plot (threshold > 0.1)
├── <LINE>.Rplot.loess.3.pdf Allele-ratio Manhattan plot (threshold > 0.3)
├── <LINE>_SNP_SUMMARY.csv Machine-readable SNP table
├── <LINE>_ANALYSIS_REPORT.html Interactive HTML analysis report
├── METHODS_FOR_PUBLICATION.txt Ready-to-paste methods section
└── EMS_RUN_SPEC.json Full run specification for reproducibility
SUPPORT:
Bugs / Questions: bioinformatics@euchrogene.com
============================================================================
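Because the pipeline identifies the two bulks purely by filename, it can help to check the FASTQ directory against the naming convention before launching a run. The following is a minimal sketch under stated assumptions: `is_bsa_fastq` is a hypothetical helper (not part of Simple BSA or EG_tools), `./fastq` is just the directory used in the examples above, and accepting `.gz` for the bare form is an assumption, since the help text only shows it uncompressed.

```shell
# Sketch only: is_bsa_fastq is a hypothetical helper, not a Simple BSA command.
# Returns 0 when a filename matches the documented convention, 1 otherwise.
is_bsa_fastq() {
    case "$(basename "$1")" in
        *.mut.R[12].fastq | *.mut.R[12].fastq.gz) return 0 ;;
        *.wt.R[12].fastq  | *.wt.R[12].fastq.gz)  return 0 ;;
        # Bare form (LINE_NAME defaults to EMS); .gz here is an assumption.
        mut.R[12].fastq | mut.R[12].fastq.gz)     return 0 ;;
        wt.R[12].fastq  | wt.R[12].fastq.gz)      return 0 ;;
        *) return 1 ;;
    esac
}

# Report every file in ./fastq whose name does not follow the convention.
for f in ./fastq/*; do
    [ -e "$f" ] || continue          # directory empty or missing
    is_bsa_fastq "$f" || echo "unexpected name: $f"
done
```

If the loop prints nothing, every file in the directory follows the documented pattern; it does not verify that both bulks or both mates of a pair are present.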
If you use this pipeline in published research, please cite:
Wachsman G, Modliszewski JL, Valdes M, Benfey PN (2017) A SIMPLE Pipeline for Mapping Point Mutations. Plant Physiology 174(3):1307–1313. doi:10.1104/pp.17.00415
The METHODS_FOR_PUBLICATION.txt file generated at the end of each run contains a complete methods paragraph formatted for journal submission, including all tool versions and references.