prairie-guy/UBS-seq_260308_chenyou

UBS-seq_260308_chenyou

UBS-seq (Ultrafast Bisulfite Sequencing) analysis pipeline for samples YSL-5 and YSL-6, with reads mapped to the human reference genome.

Overview

This project converts an existing Jupyter notebook-based UBS-seq pipeline into a standalone, reproducible bash script. The pipeline processes paired-end sequencing data through trimming, alignment, duplicate handling, and C-to-T conversion table generation.

Pipeline Steps

  1. trim -- Adapter trimming and quality filtering (cutadapt)
  2. map -- Bisulfite-aware alignment to human genome (hisat-3n)
  3. mark_duplicates -- PCR duplicate marking (GATK)
  4. dedup -- Duplicate removal (samtools)
  5. conv_unconv3n -- C-to-T conversion/unconversion tables (hisat-3n-table)
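The step order above, together with the option to resume from a later step, can be sketched roughly as follows. This is a minimal illustration and not the contents of pipeline.sh: only the step names come from this README, while `STEPS` and `steps_from` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the step names are from the README; the variable and
# function names, and the resume logic itself, are assumptions.

STEPS="trim map mark_duplicates dedup conv_unconv3n"

# Print the steps to run, in order, beginning at the requested step.
# With no argument, start at the first step. Returns non-zero if the
# requested step name is not one of the known steps.
steps_from() {
    local start found step
    start="${1:-trim}"
    found=0
    for step in $STEPS; do
        if [ "$step" = "$start" ]; then found=1; fi
        if [ "$found" -eq 1 ]; then echo "$step"; fi
    done
    [ "$found" -eq 1 ]
}
```

Calling `steps_from map` would list map, mark_duplicates, dedup, and conv_unconv3n, one per line, so a driver loop could skip steps whose outputs already exist.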

Files

File             Description
pipeline.sh      Main pipeline script
test_results.sh  Verification script to compare outputs against original notebook results
PIPELINE.md      Original task specification and requirements
PLAN.md          Detailed pipeline documentation, prerequisites, and usage

Quick Start

./pipeline.sh                        # Run full pipeline
./pipeline.sh --start-from map       # Resume from a specific step
./pipeline.sh --sample YSL-5         # Run a single sample

See PLAN.md for full usage details, prerequisites, and reference file setup.
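For readers curious how flags like those in Quick Start might be handled, here is one possible sketch. Only the flag names `--start-from` and `--sample` and the two sample names come from this README; the variable names, defaults, and error handling are assumptions, and PLAN.md remains the reference for actual behavior.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option handling; not taken from pipeline.sh.

START_FROM="trim"        # assumed default: begin at the first step
SAMPLES="YSL-5 YSL-6"    # assumed default: both samples named in the README

# Parse the command-line flags into START_FROM and SAMPLES.
# Returns 2 on an unknown option or a flag with no value.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --start-from)
                if [ "$#" -lt 2 ]; then echo "missing value for $1" >&2; return 2; fi
                START_FROM="$2"
                shift 2
                ;;
            --sample)
                if [ "$#" -lt 2 ]; then echo "missing value for $1" >&2; return 2; fi
                SAMPLES="$2"
                shift 2
                ;;
            *)
                echo "unknown option: $1" >&2
                return 2
                ;;
        esac
    done
}
```

After `parse_args --start-from map --sample YSL-5`, `START_FROM` holds `map` and `SAMPLES` holds only `YSL-5`.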

Output

Final output for downstream analysis is in conv_unconv3n/:

  • {sample}.tsv.gz -- Raw hisat-3n-table output
  • {sample}.tmp.tsv.gz -- Processed table with sample name, depth, and conversion ratios
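As an illustration of how a processed table could be derived from the raw one, the sketch below adds a sample name, depth, and a conversion ratio with awk. It assumes the usual hisat-3n-table layout (a header line, then ref, pos, strand, convertedBaseQualities, convertedBaseCount, unconvertedBaseQualities, unconvertedBaseCount) and defines the ratio as converted / (converted + unconverted). The function name, the output column names, and that ratio definition are assumptions and may differ from what pipeline.sh writes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: column positions assume the standard hisat-3n-table
# output; the output layout is illustrative, not the pipeline's actual format.

# Read a hisat-3n-table TSV (with header) on stdin and write one row per site:
# sample, ref, pos, strand, converted count, unconverted count, depth, ratio.
# Sites with zero depth are skipped to avoid dividing by zero.
add_depth_and_ratio() {
    local sample
    sample="$1"
    awk -F'\t' -v OFS='\t' -v sample="$sample" '
        NR == 1 {
            print "sample", "ref", "pos", "strand", "converted", "unconverted", "depth", "conv_ratio"
            next
        }
        {
            conv = $5
            unconv = $7
            depth = conv + unconv
            if (depth > 0) {
                printf "%s\t%s\t%s\t%s\t%d\t%d\t%d\t%.4f\n", sample, $1, $2, $3, conv, unconv, depth, conv / depth
            }
        }'
}
```

A possible use is `gzip -dc conv_unconv3n/YSL-5.tsv.gz | add_depth_and_ratio YSL-5 | head`, which previews the derived columns without writing any files.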
