
ejrsimr/transcriptome2star


transcriptome2star

Convert a FASTA file of a transcriptome into a FASTA and GTF suitable for a STAR database. Particularly useful for single-cell analysis when no genome assembly is available.

A very simple, very short script that takes a FASTA file as input and writes two outputs: a new FASTA file in which ".ref" is appended to each sequence name, and a GTF file that describes each sequence as a single-exon transcript spanning its entire length, retaining the original transcript name. These outputs are suitable for creating a STAR database, as required by many single-cell and RNA-seq analysis pipelines.
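The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual script. It assumes that the sequence name is the first whitespace-delimited word of each header, that the GTF sequence column must match the suffixed FASTA name so STAR can pair the two files, and that the original name is used for both gene_id and transcript_id. The function names and the "transcriptome2star" source column are hypothetical.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file.

    Assumption: the sequence name is the first whitespace-delimited word of the
    header line; any description after it is dropped.
    """
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def convert(fasta_in, gtf_out, fasta_out, suffix=".ref"):
    """Hypothetical re-implementation of the FASTA -> FASTA + GTF conversion."""
    with open(gtf_out, "w") as gtf, open(fasta_out, "w") as fa:
        for name, seq in read_fasta(fasta_in):
            contig = name + suffix  # sequence name written to the new FASTA
            fa.write(f">{contig}\n{seq}\n")
            # One exon per sequence, 1-based and inclusive (1..len(seq)), + strand.
            # The GTF seqname matches the FASTA name; gene_id and transcript_id
            # keep the original, unsuffixed name (assumption).
            attrs = f'gene_id "{name}"; transcript_id "{name}";'
            fields = [contig, "transcriptome2star", "exon", "1", str(len(seq)),
                      ".", "+", ".", attrs]
            gtf.write("\t".join(fields) + "\n")
```

For example, convert("transcripts.fa", "transcripts.ref.gtf", "transcripts.ref.fa") follows the same argument order as the usage line. Only exon lines are written here because STAR's defaults read exon features and use transcript_id and gene_id as their parent tags; some downstream pipelines may also expect transcript or gene lines.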

Most single-cell pipelines prefer non-redundant sequences, so I usually select the isoform with the longest CDS for each gene before creating a reference.
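Choosing one isoform per gene could look like the sketch below. It assumes that CDS lengths have already been predicted (an ORF-prediction tool such as TransDecoder is one option) and that each transcript's gene can be derived from its name. The Trinity-style naming in the example (gene ID = transcript ID without the trailing "_i<N>") is an assumption, not something this repository prescribes, and the function name is hypothetical.

```python
def longest_cds_per_gene(cds_lengths, gene_of):
    """Return the set of transcript IDs with the longest CDS in each gene.

    cds_lengths: dict mapping transcript ID -> predicted CDS length (nt)
    gene_of:     function mapping a transcript ID to its gene ID
    Ties keep whichever transcript was seen first.
    """
    best = {}  # gene ID -> (transcript ID, CDS length)
    for tx, length in cds_lengths.items():
        gene = gene_of(tx)
        if gene not in best or length > best[gene][1]:
            best[gene] = (tx, length)
    return {tx for tx, _ in best.values()}


# Hypothetical example with Trinity-style names: strip the "_i<N>" isoform suffix.
cds = {
    "TRINITY_DN1_c0_g1_i1": 300,
    "TRINITY_DN1_c0_g1_i2": 450,
    "TRINITY_DN2_c0_g1_i1": 120,
}
keep = longest_cds_per_gene(cds, lambda tx: tx.rsplit("_i", 1)[0])
print(sorted(keep))  # ['TRINITY_DN1_c0_g1_i2', 'TRINITY_DN2_c0_g1_i1']
```

The kept IDs can then be used to subset the transcriptome FASTA before running the conversion.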

Originally written for Benham-Pyle 2021, PMID: 34475533.

Usage: python transcriptome2star.py <fasta_in> <gtf_out> <fasta_out>

Example STAR indexing command:
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir star_index --genomeFastaFiles fasta.ref.fa --sjdbGTFfile fasta.ref.gtf --sjdbOverhang 99 --genomeSAindexNbases 11
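A transcriptome reference is much smaller than a typical genome and is split into many short sequences, which is presumably why the example command lowers --genomeSAindexNbases from STAR's default of 14. The STAR manual gives scaling rules for small genomes and for references with many (>5,000) sequences; the sketch below applies them to a list of sequence lengths. The function name is hypothetical, rounding down to an integer is an assumption, and --genomeChrBinNbits is a STAR manual recommendation that does not appear in the example command.

```python
import math


def star_index_params(seq_lengths, read_length=100):
    """Suggest STAR genomeGenerate values from the reference's sequence lengths.

    Scaling rules recommended in the STAR manual:
      --genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1)
      --genomeChrBinNbits   = min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength)))
      --sjdbOverhang        = ReadLength - 1
    Results are rounded down to integers (an assumption made here).
    """
    genome_length = sum(seq_lengths)
    n_refs = len(seq_lengths)
    sa_nbases = min(14, math.floor(math.log2(genome_length) / 2 - 1))
    chr_bin_nbits = min(18, math.floor(math.log2(max(genome_length / n_refs, read_length))))
    return {
        "--genomeSAindexNbases": sa_nbases,
        "--genomeChrBinNbits": chr_bin_nbits,
        "--sjdbOverhang": read_length - 1,
    }


# Illustrative input only: 50,000 transcripts of 1,500 nt each (75 Mb total).
print(star_index_params([1500] * 50_000, read_length=100))
# {'--genomeSAindexNbases': 12, '--genomeChrBinNbits': 10, '--sjdbOverhang': 99}
```

The sequence lengths would come from the ".ref" FASTA. With 100 nt reads, ReadLength - 1 gives the --sjdbOverhang 99 used in the example command.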

FAQ

Question: Why does this 100-line script need its own repository?

Answer: I have been asked for this script or output from it dozens of times over the last decade and shared it ad hoc. When it was suggested that I make a repository, I wondered why I hadn't made it years ago. While this solution is simple, it is often non-obvious.
