euchrogene/SIMPLE_Bulked_Segregant_Analysis

This is for EuchroGene Members.

Simple BSA (Bulk Segregant Analysis)

Simple BSA is a forward-genetics pipeline for mapping EMS-induced point mutations in plant populations. Rather than analysing genes in isolation, Simple BSA leverages allele-frequency shifts across the whole genome to pinpoint the genomic interval, and ultimately the causal gene, responsible for a mutant phenotype.

How It Works:

  1. Alignment: Paired-end or single-end FASTQ reads from the mutant and wild-type bulks are aligned to the reference genome using BWA-MEM.
  2. Variant Calling: Coordinate-sorted, deduplicated BAM files are processed through GATK HaplotypeCaller to call SNPs genome-wide.
  3. Annotation: Variants are functionally annotated with SnpEff to identify missense, nonsense, and splice-site changes in coding regions.
  4. Mapping: Allele-ratio values (mutant alt / total reads) are plotted as a loess-smoothed Manhattan plot. A sharp peak indicates the chromosomal region linked to the mutant phenotype.
  5. Candidate Output: Coding SNPs within the peak region are ranked and reported as candidate causal mutations.
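
The allele ratio from step 4 (mutant alt reads / total reads) can be illustrated with a small awk sketch. It assumes a hypothetical tab-separated table with the columns chromosome, position, reference-read count, and alternate-read count for the mutant bulk; this layout and the function name `allele_ratio` are assumptions for illustration, not the pipeline's actual file format or code.

```shell
#!/usr/bin/env bash
# allele_ratio: read "chrom<TAB>pos<TAB>ref_count<TAB>alt_count" lines on stdin
# (a hypothetical layout) and print chrom, pos, and alt / (ref + alt).
# Sites with zero total depth are skipped to avoid division by zero.
allele_ratio() {
  awk -F'\t' '
    ($3 + $4) > 0 { printf "%s\t%s\t%.3f\n", $1, $2, $4 / ($3 + $4) }
  '
}

# Example input: two SNPs on chromosome 1 with different alt-read support.
printf '1\t1000\t12\t11\n1\t2000\t1\t24\n' | allele_ratio
```

In a recessive screen, SNPs linked to the causal mutation are expected to show ratios close to 1 in the mutant bulk, while unlinked SNPs stay near the population average; the smoothed plot makes that contrast visible along each chromosome.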

Required Inputs:

  1. FASTQ files from a bulked-segregant experiment — one mutant bulk and one wild-type bulk, renamed according to the naming convention below.
  2. Species selection — choose from the supported species list; the reference genome and known-SNP VCF are downloaded automatically on first use.

Post-Analysis:

  • Review the HTML report for an interactive summary of allele-ratio plots, candidate SNPs, and run statistics.
  • Open <LINE>.Rplot.loess.1.pdf and <LINE>.Rplot.loess.3.pdf for publication-ready allele-ratio plots.
  • The METHODS_FOR_PUBLICATION.txt file contains a complete, ready-to-paste methods paragraph with all tool versions and citations.

Installation

0. Install EG_tools   (skip if already installed)

wget https://github.com/euchrogene/EG_tools/raw/refs/heads/main/EG_tools
sudo chmod 755 EG_tools
sudo mv EG_tools /usr/bin

1. Install Simple BSA

sudo EG_tools install -r https://github.com/euchrogene/SIMPLE_Bulked_Segregant_Analysis.git -d SIMPLE_Bulked_Segregant_Analysis -e Simple_BSA_v.1.0 -m "EMS point-mutation mapping pipeline using the SIMPLE bulk segregant analysis tool."

2. Display installed software

EG_tools

3. Show help contents

Simple_BSA_v.1.0

4. Uninstall

sudo EG_tools uninstall -t Simple_BSA_v.1.0 -i managene7/simple_bsa:v.1.0

Help Contents:

============================================================================
EuchroGene Simple BSA Pipeline v.1.0 - Forward Genetics Mutation Mapping
============================================================================

DESCRIPTION:
  Automated bulk segregant analysis pipeline for mapping EMS point mutations
  in plant genomes. Generates candidate gene lists, allele-ratio plots, and
  a publication-ready HTML report from raw FASTQ input.

USAGE:
  Simple_BSA_v.1.0 -fastq_dir <DIR> [OPTIONS]

FASTQ NAMING CONVENTION:
  Mutant bulk :  <LINE_NAME>.mut.R1.fastq[.gz]   (+ R2 for paired-end)
  Wild-type   :  <LINE_NAME>.wt.R1.fastq[.gz]    (+ R2 for paired-end)
  Bare form   :  mut.R1.fastq / wt.R1.fastq      (LINE_NAME defaults to EMS)

  Example — screen named "gl1":
    gl1.mut.R1.fastq.gz   gl1.mut.R2.fastq.gz
    gl1.wt.R1.fastq.gz    gl1.wt.R2.fastq.gz

REQUIRED:
  -fastq_dir <DIR>      Directory containing renamed FASTQ files

OPTIONS:
  -species   <NAME>     Reference species key (default: Arabidopsis_thaliana)
  -line_name <NAME>     Output filename prefix, letters/underscores only
                        (default: EMS)
  -mutation_type <TYPE> recessive | dominant (default: recessive)
  -threads   <N>        CPU threads for BWA and SAMtools (default: 32)
  -min_coverage <N>     Minimum read depth to report a SNP (default: 5)
  -max_memory <SIZE>    JVM heap for GATK/SnpEff, e.g. 8g, 16g, 32g
                        (default: 32g — does not apply to BWA)
  -reuse     <BOOL>     Resume an interrupted run: true | false (default: false)
  -cache     <DIR>      Reference genome cache directory
                        (default: /opt/simple_cache)
  -out       <DIR>      Override the output folder name

SUPPORTED SPECIES:
  Arabidopsis_thaliana       TAIR10 / TAIR11  (135 Mb)
  Oryza_sativa_Japonica      MSU7             (374 Mb)
  Oryza_sativa_Indica        BGI              (426 Mb)
  Zea_mays                   B73 v4           (2.3 Gb)
  Solanum_lycopersicum       ITAG3.2          (828 Mb)
  Medicago_truncatula        Mt4.0            (390 Mb)
  Lotus_japonicus            Lj3.0            (472 Mb)
  Sorghum_bicolor            v3.1             (738 Mb)
  Brachypodium_distachyon    v3.1             (272 Mb)
  Glycine_max                Wm82.a4          (978 Mb)
  Phaseolus_vulgaris         v2.1             (521 Mb)

EXAMPLES:

  # Arabidopsis recessive EMS screen, paired-end reads
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
    -species Arabidopsis_thaliana -line_name gl1 -mutation_type recessive

  # Rice dominant mutation, 8 threads, 16 GB RAM
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
    -species Oryza_sativa_Japonica -line_name rl1 \
    -mutation_type dominant -threads 8 -max_memory 16g

  # Maize screen on a high-memory server (64 threads, 128 GB)
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
    -species Zea_mays -line_name bz1 \
    -threads 64 -max_memory 128g

  # Resume an interrupted run
  Simple_BSA_v.1.0 -fastq_dir ./fastq \
    -species Arabidopsis_thaliana -line_name gl1 -reuse true

OUTPUT FILES:

  <LINE_NAME>_Simple_BSA_v.1.0_Results/
  ├── <LINE>.allSNPs.txt              All EMS SNPs with allele counts
  ├── <LINE>.candidates.txt           Filtered coding candidate mutations
  ├── <LINE>.Rplot.loess.1.pdf        Allele-ratio Manhattan plot (threshold > 0.1)
  ├── <LINE>.Rplot.loess.3.pdf        Allele-ratio Manhattan plot (threshold > 0.3)
  ├── <LINE>_SNP_SUMMARY.csv          Machine-readable SNP table
  ├── <LINE>_ANALYSIS_REPORT.html     Interactive HTML analysis report
  ├── METHODS_FOR_PUBLICATION.txt     Ready-to-paste methods section
  └── EMS_RUN_SPEC.json               Full run specification for reproducibility

SUPPORT:
  Bugs / Questions: bioinformatics@euchrogene.com

============================================================================
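
Before starting a run, the FASTQ naming convention from the help text can be checked with a short shell sketch. The function `check_fastq_names` is a hypothetical helper, not part of Simple BSA; it only tests that an R1 file exists for both the mutant and the wild-type bulk, plain or gzipped, and treats R2 files as optional.

```shell
#!/usr/bin/env bash
# check_fastq_names DIR LINE_NAME
# Hypothetical helper: verify that DIR holds an R1 FASTQ file for both bulks,
# named <LINE_NAME>.mut.R1.fastq[.gz] and <LINE_NAME>.wt.R1.fastq[.gz].
# Prints one "missing: ..." line per absent bulk and returns 1 if any is absent.
check_fastq_names() {
  local dir=$1 line=$2 bulk ext found status=0
  for bulk in mut wt; do
    found=no
    for ext in fastq fastq.gz; do
      if [ -e "$dir/$line.$bulk.R1.$ext" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "missing: $line.$bulk.R1.fastq[.gz]"
      status=1
    fi
  done
  return "$status"
}
```

For the "gl1" example, `check_fastq_names ./fastq gl1` would print nothing and return 0 when both bulks are present under the expected names.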

Citation

If you use this pipeline in published research, please cite:

Wachsman G, Modliszewski JL, Valdes M, Benfey PN (2017) A SIMPLE Pipeline for Mapping Point Mutations. Plant Physiology 174(3):1307–1313. doi:10.1104/pp.17.00415

The METHODS_FOR_PUBLICATION.txt file generated at the end of each run contains a complete methods paragraph formatted for journal submission, including all tool versions and references.

About

This is a Bulked Segregant Analysis tool for plant species.