GitHub - apinto17/Breast-Cancer-Variant-Discovery: A variant calling pipeline used to find indels and SNPs in the JIMT1 breast cancer cell line

Intro

This repository contains a full variant calling workflow that starts from FASTQ files, which can be downloaded here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1172971

These FASTQ files contain raw sequencing reads from a breast cancer cell line called JIMT1. The purpose of this pipeline is to identify variants between a reference genome and JIMT1, and to find the genes that are most heavily mutated. I use a gene annotation file (GTF) called gencode.gtf.gz, which can be found in this folder.
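As an illustration of the "most heavily mutated genes" idea, here is a minimal sketch that counts variants falling inside each gene. It is not taken from variants.py; the function names parse_gtf_genes and count_variants_per_gene are hypothetical, and it assumes the GTF attribute column carries a gene_name field and that the VCF and GTF use the same chromosome naming.

```python
import re
from collections import Counter


def parse_gtf_genes(lines):
    """Return {chrom: [(start, end, gene_name), ...]} from GTF 'gene' records."""
    genes = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only whole-gene features
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None:
            continue  # assumption: genes without a gene_name are ignored
        # GTF coordinates are 1-based and inclusive on both ends.
        genes.setdefault(fields[0], []).append(
            (int(fields[3]), int(fields[4]), match.group(1))
        )
    return genes


def count_variants_per_gene(vcf_lines, genes):
    """Count VCF records whose POS lies inside each gene interval."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # VCF POS is also 1-based
        # Linear scan per variant: simple, but slow for a full genome.
        for start, end, name in genes.get(chrom, []):
            if start <= pos <= end:
                counts[name] += 1
    return counts
```

With real data, both functions could be fed lines from gzip.open("gencode.gtf.gz", "rt") and gzip.open("output.vcf.gz", "rt"), and counts.most_common(10) would then list the ten genes with the most variants. For a genome-scale run, an interval index would be a better choice than the linear scan used here.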

To Run

IMPORTANT: pipeline.sh will probably take around 24-48 hours to run, and running it is not necessary to see the results of the analysis. It is meant mainly to document how I produced output.vcf.gz, including all the QC and alignment steps leading up to it.

To run variants.py, which contains the post-variant-calling analysis, do the following:

Install dependencies from requirements.txt with pip install -r requirements.txt
Make sure you have Python 3.12 or greater (older versions might work too, but this is best for reproducibility)
Run python variants.py

This runs all of the analysis shown in Report.pdf.
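For a quick sanity check of the call set that does not depend on variants.py, the sketch below tallies SNPs and indels directly from VCF records. The helper names classify_allele and count_variant_types are hypothetical, and the classification rule (single-base REF and ALT is a SNP, differing lengths is an indel) is a common simplification rather than necessarily the rule used in Report.pdf.

```python
from collections import Counter


def classify_allele(ref, alt):
    """Classify one REF/ALT pair as 'SNP', 'indel', or 'other'."""
    # Symbolic (<DEL>), spanning-deletion (*), and missing (.) alleles
    # carry no literal sequence, so they are not sized here.
    if alt in ("*", ".") or alt.startswith("<"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions such as AC>GT


def count_variant_types(vcf_lines):
    """Count variant types over all ALT alleles in an iterable of VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        # Multi-allelic sites list several ALT alleles separated by commas.
        for alt in fields[4].split(","):
            counts[classify_allele(ref, alt)] += 1
    return counts
```

To apply this to the repository's call set, the lines could come from gzip.open("output.vcf.gz", "rt") or from the uncompressed output.vcf. Note that this counts ALT alleles, so a multi-allelic site contributes more than once.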

Folders and files

QC
README.md
Report.pdf
Screenshot 2025-06-09 at 7.49.23 PM.png
Screenshot 2025-06-09 at 7.52.39 PM.png
gencode.gtf.gz
genomic_variant_regions.png
genomic_variant_regions_gt_1_allele.png
output.vcf
output.vcf.gz
output.vcf.gz.tbi
pipeline.sh
requirements.txt
variants.py
variants_per_gene.png
