🧬 FastaUtils - FASTA Processing Utilities

FastaUtils is a lightweight collection of command-line utilities for manipulating protein FASTA files, packaged as a Docker image.
It includes tools for cleaning FASTA headers, extracting descriptions, filtering sequences by ID, and validating amino acid content.

All tools are available inside the Docker image:

bioinfoufsc/fastautils

🐳 Running the Tools

To display the general help (which shows all available commands):

docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils

📘 Available Tools and Examples

The examples below assume your input files are located in ${PWD}/example, which is mounted into the container at /example.
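If you prefer to drive the container from a Python pipeline, the command pattern above can be wrapped in a small helper. This is a minimal sketch, not part of FastaUtils. The names `docker_cmd` and `run_tool` are hypothetical, and it assumes Docker is installed and the image is available locally. The host folder is resolved to an absolute path because Docker's `-v` bind-mount syntax expects one (a bare name is treated as a named volume), which is also why the commands here use `${PWD}`.

```python
import subprocess
from pathlib import Path

IMAGE = "bioinfoufsc/fastautils"


def docker_cmd(tool_args, host_dir="example", container_dir="/example"):
    """Build the argument list for `docker run` (hypothetical helper).

    host_dir is resolved to an absolute path and bind-mounted at
    container_dir, mirroring `-v ${PWD}/example:/example`.
    """
    host = Path(host_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{host}:{container_dir}", IMAGE, *tool_args]


def run_tool(tool_args, host_dir="example"):
    """Run one FastaUtils tool; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(docker_cmd(tool_args, host_dir), check=True)
```

For example, `run_tool(["clear_fasta.py", "-i", "/example/input.fasta", "-o", "/example/input_clean.fasta"])` corresponds to the clear_fasta.py command shown for that tool. Paths passed to the tool itself must use the container-side prefix `/example`, not the host path.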


🧹 1. clear_fasta.py

Description:
Removes descriptions from FASTA headers, keeping only IDs and sequences.

Usage:

docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils clear_fasta.py -i /example/input.fasta -o /example/input_clean.fasta

Example Input: /example/input.fasta
Example Output: /example/input_clean.fasta
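Conceptually, clearing a header means keeping only the identifier that precedes the first whitespace. The sketch below illustrates that idea in plain Python. It is not the source of clear_fasta.py: the rule "ID = first whitespace-delimited token of the header" is an assumption based on the common FASTA convention, and joining multi-line sequences onto one line is a choice of this sketch that the real tool may not share. The function names and the sample records are hypothetical.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clear_fasta(text):
    """Return FASTA text whose headers keep only the ID (assumed: first token)."""
    records = []
    for header, seq in read_fasta(text):
        parts = header.split(maxsplit=1)
        seq_id = parts[0] if parts else ""
        records.append(f">{seq_id}\n{seq}\n")
    return "".join(records)


# Hypothetical input: descriptions are dropped, sequence lines are joined
example = """>P12345 Hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>Q67890 Another hypothetical protein
MSTNPKPQRK
"""
print(clear_fasta(example), end="")
```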


🧾 2. extract_description.py

Description:
Extracts IDs and descriptions from a protein FASTA file and optionally creates a clean FASTA containing only IDs and sequences.

Usage:

docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils extract_description.py -i /example/proteins.fasta -o /example/proteins.tsv -cf /example/proteins_clean.fasta

Example Input: /example/proteins.fasta
Example Output:

  • /example/proteins.tsv → Table with ID and description
  • /example/proteins_clean.fasta → Clean FASTA (optional)
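The extraction step can be pictured as splitting each header at its first whitespace into an ID and a description. This minimal sketch is hypothetical and does not reproduce extract_description.py. It assumes that splitting rule, writes a TSV with an `id`/`description` header row (the real tool's column layout is not documented here), and builds the clean FASTA only when requested, mirroring the optional `-cf` output.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split a header into (ID, description) at the first whitespace (assumed rule)."""
    parts = header.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


def extract_description(text, with_clean_fasta=False):
    """Return (tsv_text, clean_fasta_text or None)."""
    rows = ["id\tdescription"]  # header row is an assumption of this sketch
    clean = []
    for header, seq in read_fasta(text):
        seq_id, description = split_header(header)
        # Replace tabs so every row keeps exactly two TSV columns
        description = description.replace("\t", " ")
        rows.append(f"{seq_id}\t{description}")
        if with_clean_fasta:
            clean.append(f">{seq_id}\n{seq}\n")
    tsv = "\n".join(rows) + "\n"
    return tsv, ("".join(clean) if with_clean_fasta else None)
```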

πŸ” 3. filter_fasta_by_ids.py

Description:
Filters a FASTA file to keep only sequences whose IDs match those provided in a file or directly as parameters.

Option 1 – Using a file with IDs:

docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils filter_fasta_by_ids.py -i /example/proteins.fasta -if /example/ids.txt -o /example/filtered_by_file.fasta

Option 2 – Providing IDs directly:

docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils filter_fasta_by_ids.py -i /example/proteins.fasta -il "P12345,Q67890;A1B2C3" -o /example/filtered_by_list.fasta

Example Input:

  • /example/proteins.fasta
  • /example/ids.txt (optional)

Example Output: /example/filtered_by_file.fasta or /example/filtered_by_list.fasta
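Filtering amounts to a set-membership test on each record's ID. This is a minimal sketch under stated assumptions, not the code of filter_fasta_by_ids.py. IDs are compared against the first whitespace-delimited token of each header, and full headers are kept in the output. The ID list may be separated by commas, semicolons, or whitespace: the `-il` example above mixes `,` and `;`, so both are accepted here, but the tool's exact separator rules are an assumption. Function names are hypothetical. Storing the wanted IDs in a set gives average O(1) lookups, so the cost grows linearly with the size of the FASTA file.

```python
import re


def read_fasta(text):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_ids(raw):
    """Collect IDs separated by commas, semicolons, or whitespace (assumed separators).

    Works for both an inline list and the contents of a one-ID-per-line file.
    """
    return {token for token in re.split(r"[,;\s]+", raw) if token}


def filter_fasta_by_ids(text, wanted):
    """Keep records whose ID (assumed: first header token) is in `wanted`, in input order."""
    kept = []
    for header, seq in read_fasta(text):
        parts = header.split(maxsplit=1)
        if parts and parts[0] in wanted:
            kept.append(f">{header}\n{seq}\n")
    return "".join(kept)
```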


✅ 4. validate_fasta.py

Description:
Validates a protein FASTA file by checking that each sequence contains only the 20 standard amino acid letters (ACDEFGHIKLMNPQRSTVWY), writing valid and invalid proteins to separate files.

Usage:

docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils validate_fasta.py -i /example/input.fasta --valid /example/proteins_valid.fasta --invalid /example/proteins_invalid.fasta

Example Input: /example/input.fasta
Example Outputs:

  • /example/proteins_valid.fasta → Proteins with only valid amino acids
  • /example/proteins_invalid.fasta → Proteins with invalid amino acids
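The validation rule is a subset check against the 20 standard one-letter amino acid codes. The sketch below is illustrative only: the function names are hypothetical, and it is not the tool's source. Following the description literally, it treats lowercase letters, ambiguity codes such as `X` or `B`, and stop symbols (`*`) as invalid. Whether validate_fasta.py handles these cases the same way is an assumption. An empty sequence passes the check here, because the empty set is a subset of any set.

```python
VALID_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def read_fasta(text):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_validity(text):
    """Return (valid_fasta, invalid_fasta) using a strict, case-sensitive check."""
    valid, invalid = [], []
    for header, seq in read_fasta(text):
        record = f">{header}\n{seq}\n"
        if set(seq) <= VALID_AA:
            valid.append(record)
        else:
            invalid.append(record)
    return "".join(valid), "".join(invalid)


def invalid_chars(seq):
    """List the offending characters of a sequence, sorted, for reporting."""
    return sorted(set(seq) - VALID_AA)
```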

📂 Example Directory Structure

example/
├── input.fasta
├── proteins.fasta
└── ids.txt

You can mount this folder using:

-v ${PWD}/example:/example

💬 Support and Citation

If you have any questions or encounter issues, please contact:
📧 renato.simoes@ifsc.edu.br

FastaUtils is part of the FastProtein project:
👉 https://github.com/labioinfoufsc/FastProtein

If this utility has been useful in your research or contributed in any way to your work, please cite the repository and the following publication:

Moreira RS, Benetti Filho V, Maia GA, Soratto TAT, Kawagoe EK, Russi BC, Miletti LC, Wagner G.
FastProtein - an automated software for in silico proteomic analysis.
PeerJ 12:e18309 (2024).
https://doi.org/10.7717/peerj.18309

👉 Repository: https://github.com/labioinfoufsc/FastaUtils


Developed by:
Laboratório de Bioinformática - Universidade Federal de Santa Catarina (UFSC)
