GitHub - WilliamLautertD/Lab-code

QC pipelines

deepTools: tools for exploring deep-sequencing data

Correlation analysis of BAM files

Mapping pipelines

Automated FASTQ to CNV workflow

An end-to-end Nextflow workflow now lives in main.nf. It automates:

Raw-read FastQC
fastp trimming
Trimmed-read FastQC
BWA-MEM mapping with read groups
Duplicate marking and filtered BAM generation with samtools, with alignment statistics from samtools flagstat
MultiQC summary
CNV calling with CNVkit, GATK CNV, or both
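To make the per-sample steps listed above concrete, here is a dry-run sketch that only prints a plausible command chain for one paired-end sample. It uses standard fastqc, fastp, bwa and samtools options, but it is not extracted from main.nf. The file names, thread count, the PL:ILLUMINA read-group field, and the MAPQ 30 / 0x904 filter are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) one sample's QC -> trim -> map ->
# duplicate-marking chain. File names, threads, PL:ILLUMINA and the filter
# (MAPQ >= 30, -F 0x904) are hypothetical, not values from main.nf.
set -euo pipefail

print_sample_commands() {
  local sample="$1" r1="$2" r2="$3" ref="$4" threads="${5:-4}"
  # Read group: ID and SM tie every read to its sample (literal \t separators).
  local rg="@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA"
  local t1="${sample}_R1.trim.fq.gz" t2="${sample}_R2.trim.fq.gz"

  echo "fastqc ${r1} ${r2}"
  echo "fastp -i ${r1} -I ${r2} -o ${t1} -O ${t2} -h ${sample}.fastp.html -j ${sample}.fastp.json"
  echo "fastqc ${t1} ${t2}"
  # fixmate -m adds the mate tags that markdup needs; then sort by coordinate.
  echo "bwa mem -t ${threads} -R '${rg}' ${ref} ${t1} ${t2} | samtools fixmate -m - - | samtools sort -o ${sample}.sorted.bam -"
  # markdup flags duplicates (0x400) but keeps them in the file.
  echo "samtools markdup ${sample}.sorted.bam ${sample}.markdup.bam"
  # Drops unmapped, secondary and supplementary reads; duplicates stay (marked).
  echo "samtools view -b -q 30 -F 0x904 -o ${sample}.filtered.bam ${sample}.markdup.bam"
  echo "samtools flagstat ${sample}.filtered.bam > ${sample}.flagstat.txt"
}
```

For example, `print_sample_commands S1 S1_R1.fq.gz S1_R2.fq.gz ref.fa 8` prints the seven commands for sample S1. Printing instead of executing keeps the sketch safe to run anywhere and easy to compare against what the Nextflow processes actually do.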

Files

config/samples.tsv - one row per sample, with FASTQ paths and CNV reference group.
nextflow.config - reference genome, targets, output directory, executors, resources, and CNV settings.
envs/qc_mapping_cnv.yaml - Conda environment for Nextflow and the required tools.
main.nf - automated pipeline definition.

Quick start

Edit config/samples.tsv and nextflow.config, then run:

conda env create -f envs/qc_mapping_cnv.yaml
conda activate qc_mapping_cnv
nextflow run main.nf -profile conda -resume

For an HPC run using the built-in SLURM profile:

nextflow run main.nf -profile slurm -resume

CNV modes

Set params.cnv_method in nextflow.config:

cnvkit - build CNVkit references from samples marked normal, control, or reference, then call copy-number segments (.called.cns files).
gatk - preprocess target intervals, collect read counts, build a panel of normals, denoise, segment, and call copy-number states (.called.seg files).
both - run CNVkit and GATK CNV on the same duplicate-marked BAMs.
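Nextflow also lets any params value be overridden on the command line with a double-dash option, so a single run can switch modes without editing nextflow.config. The helper below is a hypothetical sketch (cnv_command is not part of the repository) that rejects anything other than the three documented modes before printing the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a CNV mode, then print the Nextflow command.
# "--cnv_method X" overrides params.cnv_method for this run only.
set -euo pipefail

cnv_command() {
  local method="$1" profile="${2:-conda}"
  case "$method" in
    cnvkit|gatk|both) ;;  # the three modes documented for params.cnv_method
    *)
      echo "unknown cnv_method: ${method} (expected cnvkit, gatk, or both)" >&2
      return 1
      ;;
  esac
  echo "nextflow run main.nf -profile ${profile} -resume --cnv_method ${method}"
}
```

For example, `cnv_command both slurm` prints `nextflow run main.nf -profile slurm -resume --cnv_method both`; a typo such as `cnv_command gatkk` fails early instead of starting a run.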

Samples with cnv_role set to normal, control, or reference are used to build the CNV references. Samples with any other role, such as case or treated, are called against the reference built for their cnv_reference_group.
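Because every case sample needs at least one reference sample in its group, a quick pre-flight check on the sample sheet can catch a mislabeled group before a long run. This is a minimal sketch, assuming config/samples.tsv is tab-separated with a header row containing columns named sample, cnv_role and cnv_reference_group; the sample column name is an assumption, and the function name check_cnv_groups is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: every non-reference sample's cnv_reference_group must contain at
# least one normal/control/reference sample. Assumes a tab-separated sheet
# with a header naming "sample", "cnv_role" and "cnv_reference_group";
# other columns (e.g. FASTQ paths) are ignored. Exits non-zero on problems.
set -euo pipefail

check_cnv_groups() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample" in col) || !("cnv_role" in col) || !("cnv_reference_group" in col)) {
        print "sample sheet: missing a required column" > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    NF == 0 { next }  # ignore blank lines
    {
      role = tolower($(col["cnv_role"]))
      grp = $(col["cnv_reference_group"])
      if (role == "normal" || role == "control" || role == "reference") refs[grp]++
      else { n++; name[n] = $(col["sample"]); group[n] = grp }
    }
    END {
      if (bad) exit 1
      for (i = 1; i <= n; i++)
        if (!(group[i] in refs)) {
          printf "%s: no normal/control/reference sample in group %s\n", name[i], group[i] > "/dev/stderr"
          bad = 1
        }
      exit bad
    }' "$1"
}
```

Usage: `check_cnv_groups config/samples.tsv && nextflow run main.nf -profile conda -resume`. Looking columns up by header name means the check does not depend on column order.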

Generic ChIP-seq Nextflow workflow

For ChIP-seq, a separate generic Nextflow workflow is available in chipseq_main.nf with configuration in chipseq_nextflow.config.

Why this is generic

Input FASTQ files are fully controlled by config/chipseq_samples.tsv (no fixed filename pattern is required).
Sample names can be any value in the sample column.
The control_sample column pairs each ChIP sample with the exact Input sample ID for downstream bamCompare output.
Paths and run parameters are set in chipseq_nextflow.config, including the reference genome, mapping filters, bigWig options, and bamCompare settings (ratio mode, duplicates ignored, no scale-factor normalization).
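Since bamCompare output depends on every control_sample value matching an existing sample ID exactly, it can help to resolve the pairs before launching. The sketch below is hypothetical (pair_chip_inputs is not in the repository) and assumes config/chipseq_samples.tsv is tab-separated with a header naming sample and control_sample, with control_sample left empty (or NA) for Input rows.

```shell
#!/usr/bin/env bash
# Sketch: print "chip<TAB>input" pairs from the ChIP-seq sample sheet and fail
# if a control_sample does not match any sample ID. Assumes a tab-separated
# sheet whose header names "sample" and "control_sample"; an empty or "NA"
# control_sample (assumed convention) marks a row with no pairing.
set -euo pipefail

pair_chip_inputs() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample" in col) || !("control_sample" in col)) {
        print "sample sheet: missing sample or control_sample column" > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    NF == 0 { next }
    {
      s = $(col["sample"]); c = $(col["control_sample"])
      seen[s] = 1
      if (c != "" && c != "NA") { n++; chip[n] = s; ctrl[n] = c }
    }
    END {
      if (bad) exit 1
      # Checked at the end so an Input row may appear after its ChIP row.
      for (i = 1; i <= n; i++) {
        if (!(ctrl[i] in seen)) {
          printf "%s: control %s is not a sample in the sheet\n", chip[i], ctrl[i] > "/dev/stderr"
          bad = 1
        } else printf "%s\t%s\n", chip[i], ctrl[i]
      }
      exit bad
    }' "$1"
}
```

Running `pair_chip_inputs config/chipseq_samples.tsv` lists one ChIP/Input pair per line, which is an easy way to eyeball the pairing that bamCompare will use.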

Files

chipseq_main.nf - ChIP-seq QC, trimming, mapping/filtering, optional bigWig generation, optional ChIP/Input bamCompare, and MultiQC.
chipseq_nextflow.config - ChIP-seq pipeline parameters and runtime profiles.
config/chipseq_samples.tsv - sample sheet template; edit file paths, sample IDs, and control_sample pairing.
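The bamCompare settings named above (ratio mode, duplicates ignored, no scale-factor normalization) correspond to real deepTools options: --operation ratio, --ignoreDuplicates and --scaleFactorsMethod None. The sketch below prints one such command per ChIP/Input pair read from stdin. The "<id>.filtered.bam" naming, the output name, bin size and thread count are hypothetical and not taken from chipseq_main.nf.

```shell
#!/usr/bin/env bash
# Sketch: print one deepTools bamCompare command per "chip_id<TAB>input_id"
# line on stdin. BAM/bigWig naming, bin size and threads are hypothetical.
set -euo pipefail

bamcompare_commands() {
  local bin_size="${1:-10}" threads="${2:-4}" chip="" input=""
  # "|| [[ -n ... ]]" also handles a final line without a trailing newline.
  while IFS=$'\t' read -r chip input || [[ -n "$chip" ]]; do
    if [[ -z "$chip" || -z "$input" ]]; then continue; fi
    # ChIP / Input ratio, duplicates ignored, no scale-factor normalization.
    echo "bamCompare -b1 ${chip}.filtered.bam -b2 ${input}.filtered.bam" \
         "-o ${chip}_vs_${input}.ratio.bw --outFileFormat bigwig" \
         "--operation ratio --scaleFactorsMethod None --ignoreDuplicates" \
         "--binSize ${bin_size} -p ${threads}"
    chip=""; input=""
  done
}
```

For example, `printf 'H3K27ac_rep1\tInput_rep1\n' | bamcompare_commands 50 8` prints a single bamCompare command with a 50 bp bin size and 8 threads.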

Quick start

nextflow run chipseq_main.nf -c chipseq_nextflow.config -profile conda -resume

For SLURM:

nextflow run chipseq_main.nf -c chipseq_nextflow.config -profile slurm -resume

Copy Number Variation (CNV) Analysis Pipelines

GATK & CNVkit Workflows for Targeted and Whole-Exome Sequencing

Overview

This repository provides reproducible, HPC-ready workflows for copy number variation (CNV) analysis using two independent pipelines:

GATK CNV Workflow — Best-practice CNV calling using the Broad Institute's Genome Analysis Toolkit (GATK).
CNVkit Workflow — Coverage-based CNV detection using CNVkit for targeted and hybrid capture sequencing.

Each workflow includes:

Ready-to-run SLURM batch scripts for HPC clusters
Step-by-step setup and execution guides
Notes on parameters, expected outputs, and biological interpretation

Reproducibility

The scripts are modular and can be customized per project. Each step includes:

Input and output definitions
Environment setup instructions
Optional parameters for advanced tuning

To rerun or adapt a workflow:

Update paths in the scripts (BAM, REF, TARGETS, etc.)
Submit each job to the HPC queue using sbatch
Review logs and resulting CNV tables/plots
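When the steps must run in order, each sbatch submission can wait for the previous job with Slurm's --dependency=afterok:<jobid>, and --parsable makes sbatch print just the job ID (optionally followed by ";cluster"). The helper below is a hypothetical sketch, not part of the repository; the step script names you pass in are up to you.

```shell
#!/usr/bin/env bash
# Sketch: submit step scripts in order; each job starts only if the previous
# one finished successfully (afterok). Prints "script<TAB>jobid" per step.
set -euo pipefail

submit_chain() {
  local prev="" jid="" script=""
  for script in "$@"; do
    if [[ -z "$prev" ]]; then
      jid="$(sbatch --parsable "$script")"
    else
      jid="$(sbatch --parsable --dependency=afterok:"$prev" "$script")"
    fi
    jid="${jid%%;*}"  # --parsable may print "jobid;cluster"; keep the ID
    printf '%s\t%s\n' "$script" "$jid"
    prev="$jid"
  done
}
```

Usage, with hypothetical script names: `submit_chain 01_collect_counts.sh 02_denoise.sh 03_segment.sh`. If a step fails, the later jobs stay pending with an unsatisfiable dependency, so check the logs before resubmitting.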

Duplicate and fusion genes

Fused genes are inspected manually using supplementary alignments, read pairs whose mates map to different chromosomes, and read pairs whose mates map to the same chromosome but farther apart than expected.
The results are compared with the Normal sample.
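The three read classes used for manual inspection can be pulled out of SAM text with standard FLAG bits: 0x800 (supplementary), RNEXT naming another chromosome, and a large absolute TLEN for same-chromosome pairs. This awk sketch assumes SAM input (for example from samtools view) and uses a hypothetical 1000 bp cutoff for "farther apart than expected"; in practice the cutoff would come from the library's insert-size distribution. The function name classify_sv_reads is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: label reads that may support a fusion/rearrangement, from SAM text
# on stdin. Output: "<class><TAB><read name>". The default 1000 bp distance
# cutoff is a hypothetical stand-in for "farther apart than expected".
set -euo pipefail

classify_sv_reads() {
  local max_dist="${1:-1000}"
  awk -F'\t' -v max="$max_dist" '
    /^@/ { next }                                    # skip SAM header lines
    {
      flag = $2; rname = $3; rnext = $7; tlen = $9
      if (int(flag / 4) % 2 == 1) next               # 0x4: read unmapped
      if (int(flag / 256) % 2 == 1) next             # 0x100: secondary
      if (int(flag / 2048) % 2 == 1) { print "supplementary\t" $1; next }  # 0x800
      if (flag % 2 == 0 || int(flag / 8) % 2 == 1) next  # unpaired or mate unmapped
      if (rnext != "=" && rnext != "*" && rnext != rname) {
        print "interchromosomal\t" $1; next
      }
      if (tlen < 0) tlen = -tlen
      if (tlen > max) print "distant_same_chrom\t" $1
    }'
}
```

Usage: `samtools view tumor.bam chr5 | classify_sv_reads 1000 > tumor_sv_reads.tsv`, then run the same command on the Normal BAM and compare which read names and loci appear only in the tumor.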

Repository Structure

Folder Organization

CNVkit: Contains scripts and tools for copy number variation analysis using CNVkit.
GATK_CNV: Includes files related to the Genome Analysis Toolkit for copy number variations.
Mapping: Houses the mapping files and scripts used for aligning sequencing data.
QC: Contains quality control metrics and reports for the datasets.
ChIP-Seq_Chromatin_analysis: Includes analysis scripts and data related to ChIP-Seq experiments.
Duplication_fusion_genes: Contains files related to the analysis of gene duplications and fusions.
deeptools: Houses scripts and tools used for deep-sequencing data analysis with deepTools.
bcftools: Contains tools for variant calling and manipulating VCF files.
fastq: Houses FASTQ files of raw sequencing data.

Technology Stack

Shell: 84.9%
Jupyter Notebook: 15.1%

Citation

Relevant tools to cite:

GATK CNV – Benjamin et al., Nature Genetics (2013)
CNVkit – Talevich et al., PLOS Computational Biology (2016)
