transcriptome2star

Convert a FASTA file of a transcriptome into a FASTA and GTF suitable for a STAR database. Particularly useful for single-cell analysis when no genome assembly is available.

A very simple, very short script that takes a FASTA file as input and outputs two files: a new FASTA file with a ".ref" suffix added to each sequence name, and a GTF with a single-exon transcript covering the entire length of each sequence, retaining the original transcript name. This output is suitable for creating a STAR database, as required by many single-cell and RNA-seq analysis pipelines.
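
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in transcriptome2star.py: the function names `read_fasta` and `to_star_records`, the GTF source column value, the "+" strand, and the use of the transcript name as `gene_id` are all assumptions made for the example.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def to_star_records(name, seq, suffix=".ref"):
    """Return (fasta_record, gtf_lines) for one transcript.

    The reference sequence is named name + suffix; the GTF keeps the
    original transcript name and spans positions 1..len(seq).
    """
    ref = name + suffix
    fasta_record = f">{ref}\n{seq}\n"
    # Assumption: gene_id is set to the transcript name.
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    gtf_lines = [
        "\t".join([ref, "transcriptome2star", feature,
                   "1", str(len(seq)), ".", "+", ".", attrs])
        for feature in ("transcript", "exon")
    ]
    return fasta_record, gtf_lines
```

Each input sequence becomes its own "chromosome" in the STAR reference, so reads that map anywhere on it are assigned to the single exon that spans it.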

Most single-cell pipelines prefer non-redundant sequences, so I usually select the isoform with the longest CDS for each gene before creating a reference.
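
One way to do that selection is sketched below. It is not part of this repository and assumes you already have a gene ID, transcript ID, and CDS length for each isoform (for example from an ORF predictor or an annotation table); the function name `longest_isoform_per_gene` and the input layout are hypothetical.

```python
def longest_isoform_per_gene(isoforms):
    """Pick the transcript with the longest CDS for each gene.

    isoforms: iterable of (gene_id, transcript_id, cds_length) tuples.
    Returns a dict mapping gene_id -> transcript_id. On ties, the
    first isoform seen is kept.
    """
    best = {}
    for gene_id, transcript_id, cds_length in isoforms:
        current = best.get(gene_id)
        if current is None or cds_length > current[1]:
            best[gene_id] = (transcript_id, cds_length)
    return {gene: tx for gene, (tx, _) in best.items()}
```

The selected transcript IDs can then be used to filter the input FASTA before running the script.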

Originally written for Benham-Pyle 2021, PMID: 34475533.

Usage: python transcriptome2star.py <fasta_in> <gtf_out> <fasta_out>

Example STAR indexing command:
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir star_index --genomeFastaFiles fasta.ref.fa --sjdbGTFfile fasta.ref.gtf --sjdbOverhang 99 --genomeSAindexNbases 11

FAQs

Question: Why does this 100-line script need its own repository?

Answer: I have been asked for this script or output from it dozens of times over the last decade and shared it ad hoc. When it was suggested that I make a repository, I wondered why I hadn't made it years ago. While this solution is simple, it is often non-obvious.
