UBS-seq (ultrafast bisulfite sequencing) analysis pipeline for samples YSL-5 and YSL-6, with reads mapped to the human reference genome.
This project converts an existing Jupyter notebook-based UBS-seq pipeline into a standalone, reproducible bash script. The pipeline processes paired-end sequencing data through trimming, alignment, duplicate handling, and C-to-T conversion table generation.
- trim -- Adapter trimming and quality filtering (cutadapt)
- map -- Bisulfite-aware (3-nucleotide) alignment to the human genome (hisat-3n)
- mark_duplicates -- PCR duplicate marking (GATK)
- dedup -- Duplicate removal (samtools)
- conv_unconv3n -- C-to-T conversion/unconversion tables (hisat-3n-table)
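
The steps above run in a fixed order. The following is a minimal sketch of how that order might be represented in bash, assuming that resuming simply skips every step before the requested one. The `STEPS` array and the `steps_from` function are hypothetical and are not taken from `pipeline.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: fixed step order plus "resume from" selection.
# Step names match the README; STEPS and steps_from are illustrative only.
set -euo pipefail

STEPS=(trim map mark_duplicates dedup conv_unconv3n)

# Print the steps to run, one per line, starting at step $1
# (default: the first step). Returns non-zero for an unknown step name.
steps_from() {
    local start=${1:-${STEPS[0]}} found=0 step
    for step in "${STEPS[@]}"; do
        [[ $step == "$start" ]] && found=1
        (( found )) && echo "$step"
    done
    (( found ))
}
```

For example, `steps_from map` prints `map`, `mark_duplicates`, `dedup`, and `conv_unconv3n`, while an unknown name prints nothing and fails, so a typo cannot silently run the wrong steps.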

| File | Description |
|---|---|
| `pipeline.sh` | Main pipeline script |
| `test_results.sh` | Verification script that compares outputs against the original notebook results |
| `PIPELINE.md` | Original task specification and requirements |
| `PLAN.md` | Detailed pipeline documentation, prerequisites, and usage |
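
Comparing gzipped outputs byte for byte can report false mismatches, because gzip headers may embed the original file name and a timestamp. One way a verification script could sidestep this is to compare decompressed content. The sketch below assumes that approach; the `same_gz_content` function is hypothetical and not taken from `test_results.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare two .gz files by decompressed content.
# gzip headers can differ (name, mtime) even when the tables are identical.
set -euo pipefail

# Returns 0 if both files decompress to identical bytes, non-zero otherwise.
same_gz_content() {
    local a=$1 b=$2
    cmp -s <(gzip -dc "$a") <(gzip -dc "$b")
}
```

Usage might look like `same_gz_content conv_unconv3n/YSL-5.tsv.gz original/YSL-5.tsv.gz`, where the `original/` directory is a placeholder for wherever the notebook outputs are stored.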

    ./pipeline.sh                      # Run the full pipeline
    ./pipeline.sh --start-from map     # Resume from a specific step
    ./pipeline.sh --sample YSL-5       # Run a single sample

See `PLAN.md` for full usage details, prerequisites, and reference file setup.
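
The two options shown above could be parsed with a plain `while`/`case` loop. The sketch below is one possible approach and may not match the real `pipeline.sh`; the `parse_args` function and the `START_FROM` and `SAMPLES` variables are hypothetical. It assumes both samples run by default and that `--start-from` accepts only the five documented step names.

```bash
#!/usr/bin/env bash
# Hypothetical argument parser for --start-from and --sample.
# Defaults (both samples, start at trim) are assumptions for illustration.
set -euo pipefail

parse_args() {
    START_FROM=trim
    SAMPLES=(YSL-5 YSL-6)
    while (( $# > 0 )); do
        case $1 in
            --start-from) START_FROM=${2:?--start-from needs a step name}; shift 2 ;;
            --sample)     SAMPLES=("${2:?--sample needs a sample name}"); shift 2 ;;
            *) echo "unknown argument: $1" >&2; return 2 ;;
        esac
    done
    # Reject step names the pipeline does not define.
    case $START_FROM in
        trim|map|mark_duplicates|dedup|conv_unconv3n) ;;
        *) echo "unknown step: $START_FROM" >&2; return 2 ;;
    esac
}
```

Validating the step name up front means a misspelled `--start-from` fails immediately instead of partway through a long run.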

The final output for downstream analysis is written to `conv_unconv3n/`:
- `{sample}.tsv.gz` -- Raw hisat-3n-table output
- `{sample}.tmp.tsv.gz` -- Processed table with sample name, depth, and conversion ratios
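
The processed table adds a sample name, depth, and a conversion ratio to each raw row. Here is a minimal awk sketch of that derivation. It assumes the raw table has a header line and seven tab-separated columns (`ref`, `pos`, `strand`, `convertedBaseQualities`, `convertedBaseCount`, `unconvertedBaseQualities`, `unconvertedBaseCount`). It also assumes depth is converted plus unconverted counts and that the ratio is converted / depth. The `add_depth_ratio` function and the output column names are hypothetical; the real pipeline may compute a different ratio, such as the unconverted fraction.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive sample/depth/ratio columns from raw
# hisat-3n-table rows. ASSUMED input columns ($5 = converted count,
# $7 = unconverted count) and ASSUMED ratio = converted / depth.
set -euo pipefail

# Usage: add_depth_ratio SAMPLE < raw.tsv > processed.tsv
add_depth_ratio() {
    awk -v sample="$1" 'BEGIN { FS = OFS = "\t" }
        NR == 1 {
            print "sample", "ref", "pos", "strand", "converted", "unconverted", "depth", "conversion_ratio"
            next
        }
        {
            depth = $5 + $7
            ratio = (depth > 0) ? ($5 / depth) : 0
            print sample, $1, $2, $3, $5, $7, depth, sprintf("%.4f", ratio)
        }'
}
```

On gzipped files, this could be used as `gzip -dc YSL-5.tsv.gz | add_depth_ratio YSL-5 | gzip -c > YSL-5.tmp.tsv.gz`.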