FastaUtils is a lightweight collection of command-line utilities for manipulating protein FASTA files using Docker.
It includes tools for cleaning FASTA headers, extracting descriptions, filtering by IDs, and validating amino acids.
All tools are available inside the Docker image:
bioinfoufsc/fastautils
To display the general help (which shows all available commands):
docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils
Below are examples of how to use each tool, assuming your input files are located in ${PWD}/example.
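When several runs need to be scripted from Python, the same docker run invocation can be assembled programmatically. The following is a minimal sketch that assumes Python 3 and a local Docker installation. The helper names fastautils_command and run_fastautils are hypothetical and are not part of FastaUtils. Path(host_dir).resolve() plays the role of ${PWD}/example, and every path passed to a tool must use the container side of the mount (/example/...).

```python
import subprocess
from pathlib import Path

IMAGE = "bioinfoufsc/fastautils"  # image name taken from this README


def fastautils_command(tool_args, host_dir, container_dir="/example"):
    """Build a 'docker run' argument list that bind-mounts host_dir at container_dir.

    tool_args: the script and its options, e.g. ["clear_fasta.py", "-i", ...].
    Hypothetical helper; paths inside tool_args must be container paths.
    """
    mount = f"{Path(host_dir).resolve()}:{container_dir}"
    return ["docker", "run", "--rm", "-v", mount, IMAGE, *tool_args]


def run_fastautils(tool_args, host_dir="example"):
    """Run the container; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(fastautils_command(tool_args, host_dir), check=True)
```

For example, run_fastautils([]) reproduces the general-help command above, and run_fastautils(["clear_fasta.py", "-i", "/example/input.fasta", "-o", "/example/input_clean.fasta"]) reproduces the first usage example.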
clear_fasta.py
Description:
Removes descriptions from FASTA headers, keeping only IDs and sequences.
Usage:
docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils clear_fasta.py -i /example/input.fasta -o /example/input_clean.fasta
Example Input: /example/input.fasta
Example Output: /example/input_clean.fasta
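The header cleanup can be illustrated with a short Python sketch. This is not the source of clear_fasta.py. It assumes that the ID is the first whitespace-delimited token after ">", and the function names clear_fasta and clear_fasta_file are hypothetical.

```python
from typing import Iterable, Iterator


def clear_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with every header reduced to '>ID'.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            tokens = line[1:].split(maxsplit=1)
            yield ">" + (tokens[0] if tokens else "")
        else:
            yield line  # sequence lines pass through unchanged


def clear_fasta_file(in_path: str, out_path: str) -> None:
    """Hypothetical file-level wrapper, analogous to the -i / -o options."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in clear_fasta(src):
            dst.write(line + "\n")
```

For example, the header ">sp|P12345|TEST_HUMAN Test protein OS=Homo sapiens" becomes ">sp|P12345|TEST_HUMAN", and the sequence lines below it are unchanged.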
extract_description.py
Description:
Extracts IDs and descriptions from a protein FASTA file and optionally creates a clean FASTA containing only IDs and sequences.
Usage:
docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils extract_description.py -i /example/proteins.fasta -o /example/proteins.tsv -cf /example/proteins_clean.fasta
Example Input: /example/proteins.fasta
Example Outputs:
/example/proteins.tsv → table of IDs and descriptions
/example/proteins_clean.fasta → clean FASTA (optional; path set with -cf)
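The following minimal Python sketch approximates what this tool produces. It is a re-implementation under assumptions, not the actual extract_description.py. It assumes that the ID is the first whitespace-delimited token of the header and the description is the rest. The TSV column names "ID" and "Description" are hypothetical, because the real file may use different names or no header row. The function names are also hypothetical.

```python
import csv
from typing import Iterable, List, Optional, Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split '>ID description' into (ID, description); description may be ''."""
    parts = header[1:].strip().split(maxsplit=1)
    if not parts:
        return "", ""
    return parts[0], (parts[1] if len(parts) > 1 else "")


def extract_description(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (rows, clean): (ID, description) pairs and the FASTA with '>ID' headers."""
    rows, clean = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            seq_id, desc = split_header(line)
            rows.append((seq_id, desc))
            clean.append(">" + seq_id)
        else:
            clean.append(line)
    return rows, clean


def write_outputs(in_path: str, tsv_path: str, clean_path: Optional[str] = None) -> None:
    with open(in_path) as src:
        rows, clean = extract_description(src)
    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["ID", "Description"])  # hypothetical column names
        writer.writerows(rows)
    if clean_path:  # mirrors the optional -cf output
        with open(clean_path, "w") as out:
            out.write("\n".join(clean) + "\n")
```

Returning the rows and the clean lines from a single pass means the input is read only once, even when both outputs are requested.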
filter_fasta_by_ids.py
Description:
Filters a FASTA file to keep only sequences whose IDs match those provided in a file or directly as parameters.
Option 1 - Using a file with IDs (-if):
docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils filter_fasta_by_ids.py -i /example/proteins.fasta -if /example/ids.txt -o /example/filtered_by_file.fasta
Option 2 - Providing IDs directly (-il):
docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils filter_fasta_by_ids.py -i /example/proteins.fasta -il "P12345,Q67890;A1B2C3" -o /example/filtered_by_list.fasta
Example Input:
/example/proteins.fasta
/example/ids.txt (optional; used only by Option 1)
Example Output: /example/filtered_by_file.fasta or /example/filtered_by_list.fasta
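The filtering logic can be sketched in Python as follows. This is not the actual filter_fasta_by_ids.py. Because the -il example above mixes commas and a semicolon, the sketch accepts commas, semicolons, and whitespace as separators. That is an assumption about the real parser, and the sketch also assumes that the ID file uses the same separators or one ID per line. Matched records keep their full header. All function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set


def parse_id_list(text: str) -> Set[str]:
    """Split an -il style string on commas, semicolons or whitespace (assumed separators)."""
    return {tok for tok in re.split(r"[,;\s]+", text) if tok}


def read_id_file(path: str) -> Set[str]:
    """Read IDs from a file, assuming one ID per line or the same separators as above."""
    with open(path) as fh:
        return parse_id_list(fh.read())


def filter_fasta(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield complete records (header plus sequence lines) whose ID is in wanted."""
    keep = False  # lines before the first header are dropped
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            tokens = line[1:].split(maxsplit=1)
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            yield line
```

The IDs are stored in a set, so each header lookup takes constant time no matter how many IDs are requested.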
validate_fasta.py
Description:
Validates a protein FASTA file by checking that each sequence contains only the 20 standard amino acid codes (ACDEFGHIKLMNPQRSTVWY), and writes valid and invalid sequences to separate files.
Usage:
docker run --rm -v ${PWD}/example:/example bioinfoufsc/fastautils validate_fasta.py -i /example/input.fasta --valid /example/proteins_valid.fasta --invalid /example/proteins_invalid.fasta
Example Input: /example/input.fasta
Example Outputs:
/example/proteins_valid.fasta → proteins containing only valid amino acids
/example/proteins_invalid.fasta → proteins containing at least one invalid character
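The following is a minimal Python sketch of this validation step. It is not the source of validate_fasta.py. It assumes a strict check, so lowercase letters, "X", and the stop symbol "*" are all treated as invalid. It also assumes that multi-line sequences are joined before checking. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

VALID = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids listed above

Record = Tuple[str, str]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_valid(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Partition records into (valid, invalid); strict, case-sensitive check (assumed)."""
    valid, invalid = [], []
    for header, seq in read_records(lines):
        (valid if set(seq) <= VALID else invalid).append((header, seq))
    return valid, invalid


def write_fasta(records: Iterable[Record], path: str) -> None:
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f"{header}\n{seq}\n")
```

The subset test set(seq) <= VALID checks each distinct character only once. If the real tool accepts lowercase input, calling seq.upper() before the check would match that behaviour.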
Example folder structure:
example/
├── input.fasta
├── proteins.fasta
└── ids.txt
You can mount this folder using:
-v ${PWD}/example:/example
If you have any questions or encounter issues, please contact:
📧 renato.simoes@ifsc.edu.br
FastaUtils is part of the FastProtein project:
https://github.com/labioinfoufsc/FastProtein
If this utility has been useful in your research or contributed in any way to your work, please cite the repository and the following publication:
Moreira RS, Benetti Filho V, Maia GA, Soratto TAT, Kawagoe EK, Russi BC, Miletti LC, Wagner G.
FastProtein–an automated software for in silico proteomic analysis.
PeerJ 12:e18309 (2024).
https://doi.org/10.7717/peerj.18309
Repository: https://github.com/bioinfoufsc/FastaUtils
Developed by:
Laboratório de Bioinformática - Universidade Federal de Santa Catarina (UFSC)