This repository contains scripts and resources used to identify extrachromosomal DNA (ecDNA) in the MMTV-Cre E2F5-flox mouse model of breast cancer. The analysis leverages the nf-core/circdna pipeline and downstream tools including AmpliconArchitect, CycleViz, and AmpliconReconstructor.
circdna_e2f5/
├── fetchngs_results/ # Metadata, samplesheet, and raw data
│ ├── fastq/
│ ├── metadata/
│ └── samplesheet/
├── mosek_license_dir/ # MOSEK license required for AmpliconArchitect
│ └── mosek.lic
├── results/ # Output from pipeline (e.g., sv_view images)
│ └── ampliconsuite/
├── data_repo/ # Reference data required by AmpliconArchitect (for mm10)
├── run_circdna.sh # Script to run nf-core/circdna
├── run_fetchngs.sh # Script to fetch raw fastq files via nf-core/fetchngs
├── icer.config # Custom config for running on MSU HPCC
├── ids.csv # List of SRA accessions (e.g., SRX experiment IDs)
├── LICENSE
├── README.md

- Nextflow >= 22.10.1
- nf-core/fetchngs and nf-core/circdna
- Singularity (or Docker)
- MOSEK license for AmpliconArchitect
- Reference genome repository for AmpliconArchitect (see below)
- Access to an HPC cluster with SLURM (configured in icer.config)
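The requirements above can be checked before submitting any jobs. The following is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper that only tests whether each named command is on `PATH` and does not check versions.

```shell
# Hypothetical helper (not in this repo): report which tools are on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    local missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call: Nextflow, one container engine, and SLURM's sbatch.
# check_tools nextflow singularity sbatch
```

If Docker is used instead of Singularity, replace `singularity` with `docker` in the example call.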
Clone this repo:
git clone https://github.com/johnvusich/circdna_e2f5.git
cd circdna_e2f5

AmpliconArchitect requires a MOSEK license. Academic users can request a free license as follows:
- Visit the MOSEK license request page.
- Fill out the form using your academic email address.
- Once approved, download the license file (typically named mosek.lic).
- Place the file at the following path in your local setup:
circdna_e2f5/mosek_license_dir/mosek.lic

Ensure that the directory containing the license file is passed to the --mosek_license_dir parameter in the pipeline script.
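A quick way to confirm the license is in place before launching the pipeline is sketched below. `check_mosek_license` is a hypothetical helper, not part of this repository; it only checks that a readable `mosek.lic` exists in the given directory and does not validate the license itself.

```shell
# Hypothetical helper (not in this repo): confirm mosek.lic is readable.
# $1: license directory, i.e. the value given to --mosek_license_dir
check_mosek_license() {
    local dir="$1"
    if [ -r "$dir/mosek.lic" ]; then
        echo "ok: $dir/mosek.lic"
    else
        echo "missing or unreadable: $dir/mosek.lic" >&2
        return 1
    fi
}

# Example call from the directory that contains the clone:
# check_mosek_license "circdna_e2f5/mosek_license_dir"
```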
AmpliconArchitect requires a structured reference data repository to function. In this analysis, the repository is set up at:
$SCRATCH/circdna_e2f5/data_repo
To set this up exactly as used in the pipeline, run the following script:
bash setup_data_repo.sh

This script will:
- Create the expected data_repo directory
- Download the mm10.tar.gz reference bundle
- Unpack it with proper permissions
- Set the AA_DATA_REPO environment variable (used by the pipeline)
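For readers who want to see what those four steps might look like, here is a rough sketch. It is not the contents of `setup_data_repo.sh`: the function name, the permission policy, and the download URL argument are all assumptions, and the URL must be supplied by the user.

```shell
# Hypothetical sketch of the four steps; NOT the repository's setup_data_repo.sh.
# $1: destination directory for the data repo
# $2: URL of the mm10.tar.gz bundle (assumption: supplied by the user)
setup_data_repo() {
    local dest="$1" url="$2"
    # Step 1: create the data repo directory.
    mkdir -p "$dest" || return 1
    # Step 2: download the bundle unless it is already present.
    if [ ! -f "$dest/mm10.tar.gz" ]; then
        curl -fL -o "$dest/mm10.tar.gz" "$url" || return 1
    fi
    # Step 3: unpack next to the archive, producing $dest/mm10/.
    tar -xzf "$dest/mm10.tar.gz" -C "$dest" || return 1
    # Assumed permission policy: owner can write, everyone can read and traverse.
    chmod -R u+rwX,go+rX "$dest/mm10" || return 1
    # Step 4: point AmpliconArchitect at the repo.
    export AA_DATA_REPO="$dest"
}

# Example call (URL deliberately left for the user to fill in):
# setup_data_repo "$SCRATCH/circdna_e2f5/data_repo" "$MM10_BUNDLE_URL"
```

The sketch is written as a function because a variable exported by a script run with `bash script.sh` does not persist in the calling shell; defining the function in the current shell (for example with `source`) and calling it there keeps `AA_DATA_REPO` set for later commands.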
circdna_e2f5/data_repo
├── coverage.stats
├── mm10
│ ├── annotations
│ │ ├── gencode.vM10.basic.annotation_genes.gff
│ │ └── mm10GenomicSuperDup.tab
│ ├── cancer
│ │ ├── oncogene_list.txt
│ │ └── oncogenes
│ │ ├── AC_oncogene_set_mm10.gff
│ │ └── mm10_consensus_oncogenes_list_from_hg19.gff
│ ├── dummy_ploidy.vcf
│ ├── file_list.txt
│ ├── file_sources.txt
│ ├── last_updated.txt
│ ├── mm10-blacklist.v2.bed
│ ├── mm10_centromere.bed
│ ├── mm10_cnvkit_filtered_ref.cnn
│ ├── mm10_conserved_gain5.bed
│ ├── mm10_conserved_gain5_onco_subtract.bed
│ ├── mm10.fa
│ ├── mm10.fa.fai
│ ├── mm10.Hardison.Excludable.full.bed
│ ├── mm10_k35.mappability.bedgraph
│ ├── mm10_merged_centromeres_conserved_sorted.bed
│ ├── mm10_noAlt.fa.fai
│ └── onco_bed.bed
├── mm10.tar.gz

Make sure data_repo is accessible to the pipeline and its container environment. When running AmpliconArchitect manually, set AA_DATA_REPO to the path of data_repo.
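To confirm that the unpacked repository matches the listing above, a spot check along these lines may help. `check_data_repo` is a hypothetical helper, not part of this repository, and the handful of paths it tests is an arbitrary selection from the listing rather than the full set AmpliconArchitect needs.

```shell
# Hypothetical helper (not in this repo): spot-check the data repo layout.
# $1: data repo directory. Prints each missing path; returns 1 if any is missing.
check_data_repo() {
    local repo="$1" status=0 f
    for f in mm10/mm10.fa mm10/mm10.fa.fai mm10/file_list.txt \
             mm10/annotations mm10/mm10_cnvkit_filtered_ref.cnn; do
        if [ ! -e "$repo/$f" ]; then
            echo "missing: $repo/$f"
            status=1
        fi
    done
    return "$status"
}

# Example call:
# check_data_repo "$AA_DATA_REPO"
```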
Use run_fetchngs.sh to download data from SRA using the IDs listed in ids.csv.
sbatch run_fetchngs.sh

After the job finishes, edit samplesheet.csv and multiqc_config.yml in fetchngs_results/samplesheet/ as needed.
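Before submitting the fetch job, it can be worth checking that ids.csv contains one accession per line and nothing else. The sketch below is an illustration only: `invalid_ids` is a hypothetical helper, and its pattern is an approximation of common SRA, ENA, DDBJ, and GEO accession prefixes, not the validator used by nf-core/fetchngs.

```shell
# Hypothetical helper (not in this repo): list lines that do not look like
# database accessions. The pattern is an approximation (assumption).
# $1: ID file. Prints offending lines and returns 1 if any are found.
invalid_ids() {
    [ -r "$1" ] || return 2
    if grep -Evx '((SR|ER|DR)[APRSX]|SAM(N|EA|D)|PRJ(NA|EB|DB)|GS[EM])[0-9]+' "$1"; then
        return 1
    fi
    return 0
}

# Example call:
# invalid_ids ids.csv && sbatch run_fetchngs.sh
```

Blank lines and header rows are reported as invalid, which is intentional here since the file is expected to hold only IDs.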
Use run_circdna.sh to launch the circular DNA analysis pipeline with AmpliconArchitect.
sbatch run_circdna.sh

This script uses icer.config for cluster-specific settings on the MSU HPCC.
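For orientation, the sketch below prints one possible nf-core/circdna invocation for the directory layout described in this README. It is an assumption about what such a launch could look like, not the contents of run_circdna.sh: `circdna_cmd` is a hypothetical helper, and the option values (Singularity profile, mm10 genome and reference build, samplesheet location) are inferred from the rest of this document and should be checked against the pipeline's parameter documentation for the version in use.

```shell
# Hypothetical helper (not in this repo): print, but do not run, a possible
# nf-core/circdna command. The real run_circdna.sh may use different options.
# $1: project directory containing icer.config, data_repo/, mosek_license_dir/
circdna_cmd() {
    local proj="$1"
    cat <<EOF
nextflow run nf-core/circdna \\
  -profile singularity \\
  -c $proj/icer.config \\
  --input $proj/fetchngs_results/samplesheet/samplesheet.csv \\
  --outdir $proj/results \\
  --genome mm10 \\
  --circle_identifier ampliconarchitect \\
  --reference_build mm10 \\
  --aa_data_repo $proj/data_repo \\
  --mosek_license_dir $proj/mosek_license_dir
EOF
}

# Example call: inspect the command before placing it in a SLURM script.
# circdna_cmd "$SCRATCH/circdna_e2f5"
```

Printing the command rather than running it keeps the sketch safe to try on a login node; paths containing spaces would need quoting before the output is reused.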
- CycleViz and AmpliconReconstructor can be run using output files from AmpliconArchitect.
- Visual outputs (e.g., .png files) are stored in results/ampliconsuite/ampliconarchitect/sv_view.
Example structural variant views of predicted circular DNA generated by AmpliconArchitect can be found in: results/ampliconsuite/ampliconarchitect/sv_view/
If you use this code, please cite:
MIT License – see the LICENSE file for details.