This repository contains a reproducible Nextflow (DSL2) pipeline that uses a convolutional neural network (CNN) to predict CTCF occupancy from SMF/motif data, then prepares the inputs for loop-extrusion simulations that validate those predictions. It keeps key region fields, writes an occupancy track, and produces barrier lists plus a paramdict for the simulations. The whole pipeline runs in a Singularity/Apptainer container for consistent, portable results.
- Nextflow ≥ 24.10
- Singularity/Apptainer (tested with Singularity 3.8.x)
Required files:
- Reference genome FASTA (e.g., mm10.fa) and index in a readable location
- CTCF peaks BED/CSV/TSV (first 3 columns = chrom,start,end)
- Model weights file (see workflow/files/model_weights)
- CTCF PFM file (e.g., workflow/files/MA0139.1.pfm or .smooth.pfm)
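Before launching a run, it can help to confirm that a peaks file really has chrom, start, end in its first three columns. The snippet below is a minimal sketch of such a check, not part of the pipeline: `read_peaks` is a hypothetical helper, and the delimiter detection and header handling are assumptions about how your file is laid out.

```python
def read_peaks(lines):
    """Return (chrom, start, end) tuples from BED/CSV/TSV-style lines.

    Hypothetical helper for pre-flight checks; the pipeline's own
    parser may behave differently.
    """
    peaks = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and common BED comment/track lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Assumption: a comma anywhere means CSV; otherwise split on whitespace.
        parts = line.split(",") if "," in line else line.split()
        fields = [p.strip() for p in parts]
        if len(fields) < 3:
            raise ValueError(
                f"line {line_no}: expected at least 3 columns, got {len(fields)}"
            )
        chrom, start, end = fields[0], fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            if not peaks:
                # Non-numeric coordinates before any peak: assume a header row.
                continue
            raise ValueError(f"line {line_no}: start/end must be integers")
        start, end = int(start), int(end)
        if start >= end:
            raise ValueError(f"line {line_no}: start must be less than end")
        peaks.append((chrom, start, end))
    return peaks


if __name__ == "__main__":
    # Replace with your own peaks file.
    with open("/path/to/CTCF_peaks.bed") as handle:
        print(f"{len(read_peaks(handle))} peaks look well-formed")
```

Extra columns (name, score, strand) are ignored, since only the first three are required.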
- Clone the repo

      git clone https://github.com/Fudenberg-Research-Group/OccupancyInputCTCF.git
      cd OccupancyInputCTCF

- Build the Singularity/Apptainer image (use --remote if you don't have local root privileges)

      singularity build --remote occufold.sif singularity.def

- Run the pipeline (predicted occupancy mode). Note: replace /path/to/CTCF_peaks.bed with your own peaks file. If you don't have peaks, you can drop the --peaks line.

      nextflow run main.nf -profile singularity \
        --region "chr1:10_000_000-11_500_590" \
        --peaks /path/to/CTCF_peaks.bed \
        --outdir results -resume
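The `--region` value in the command above uses underscores as digit separators. As a rough illustration of how such a string maps to coordinates, here is a small sketch; `parse_region` is a hypothetical helper, and the pipeline's own parsing may accept other forms.

```python
import re

# chrom:start-end, where start/end are digits with optional "_" separators.
REGION_RE = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>[0-9][0-9_]*)-(?P<end>[0-9][0-9_]*)$"
)


def parse_region(region):
    """Split 'chr1:10_000_000-11_500_590' into (chrom, start, end)."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    # Drop the underscore separators before converting to integers.
    start = int(match.group("start").replace("_", ""))
    end = int(match.group("end").replace("_", ""))
    if start >= end:
        raise ValueError(f"region start must be less than end: {region!r}")
    return match.group("chrom"), start, end


if __name__ == "__main__":
    chrom, start, end = parse_region("chr1:10_000_000-11_500_590")
    print(chrom, start, end, end - start)
```

Checking the string this way before a run catches typos such as a missing colon or swapped coordinates early.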
Outputs
- Step 1: results/step1/REGION.csv (region table; e.g., chrom, start, end, mid, strand, ...)
- Step 2: results/step2/REGION.occupancy.csv (Step-1 columns plus model outputs; e.g., Accessible, Bound, Nucleosome.occupied)
- Step 3: results/step3/REGION.occupancy.refined_occupancy.csv, barriers.csv, ctcf_lists.csv, paramdict.json
- Step 4: results/step4/REGION.1d_sims
- Step 5: results/step5/Chip.png, Hi-C.png
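To confirm that a run produced everything listed above, a short check like the following may be useful. It is only a sketch: `expected_outputs` and `missing_outputs` are hypothetical helpers, the exact text the pipeline substitutes for REGION in file names is not specified here and must be supplied by you, and the Step 3 and Step 5 files without an explicit directory are assumed to sit in results/step3 and results/step5.

```python
from pathlib import Path


def expected_outputs(outdir, region_label):
    """List the output paths described above for one region.

    region_label is whatever string the pipeline uses in place of REGION
    (an assumption; check an actual results directory for the exact form).
    """
    out = Path(outdir)
    return [
        out / "step1" / f"{region_label}.csv",
        out / "step2" / f"{region_label}.occupancy.csv",
        out / "step3" / f"{region_label}.occupancy.refined_occupancy.csv",
        # Assumed to live alongside the refined occupancy table.
        out / "step3" / "barriers.csv",
        out / "step3" / "ctcf_lists.csv",
        out / "step3" / "paramdict.json",
        out / "step4" / f"{region_label}.1d_sims",
        out / "step5" / "Chip.png",
        out / "step5" / "Hi-C.png",
    ]


def missing_outputs(outdir, region_label):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(outdir, region_label) if not p.exists()]


if __name__ == "__main__":
    # "REGION" is a placeholder; substitute the label used in your run.
    for path in missing_outputs("results", "REGION"):
        print(f"missing: {path}")
```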
