This repository contains homework scripts developed in the Python course at the Bioinformatics Institute during the 2023-2024 academic year. Code with examples in the showcases.py notebook is marked with *.
This script provides tools for working with biological data.
RNASequence/DNASequence/AminoAcidSequence classes *
Assists in working with DNA, RNA, and amino acid sequencing data.
filter_fastq function
Filters FASTQ files based on GC content, sequence length, and quality threshold.
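The three criteria can be illustrated with a minimal sketch. The helper names, default bounds, and the Phred+33 quality encoding below are assumptions made for illustration and may differ from the actual implementation in this repository.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a sequence (0 for an empty sequence)."""
    if not seq:
        return 0.0
    gc = sum(base in "GCgc" for base in seq)
    return 100 * gc / len(seq)


def mean_quality(quality: str) -> float:
    """Return the mean Phred score, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


def passes_filters(
    seq: str,
    quality: str,
    gc_bounds: tuple = (0, 100),          # hypothetical default: keep any GC%
    length_bounds: tuple = (0, 2**32),    # hypothetical default: keep any length
    quality_threshold: float = 0,         # hypothetical default: keep any quality
) -> bool:
    """Check one read against GC content, length, and mean quality."""
    return (
        gc_bounds[0] <= gc_content(seq) <= gc_bounds[1]
        and length_bounds[0] <= len(seq) <= length_bounds[1]
        and mean_quality(quality) >= quality_threshold
    )
```

A read is kept only if all three checks pass, so tightening any single bound can only reduce the number of reads in the output.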
run_genscan function *
Runs the Genscan prediction tool on DNA sequences and extracts the predicted peptide sequences along with intron and exon information.
telegram_logger decorator
Sends messages and log files from script runs to a Telegram chat as notifications. The implementation is based on the Telegram Bot API.
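The general idea of a notifying decorator can be sketched as follows. This is a simplified illustration, not the code from this repository: the names `notify_on_finish` and `send_telegram_message`, the `TG_API_TOKEN` environment variable, and the message texts are all assumptions, and the sketch sends text only (no log files). It uses the `sendMessage` method of the Telegram Bot API.

```python
import functools
import os
import time
import urllib.parse
import urllib.request


def send_telegram_message(chat_id, text):
    """Send a text message through the Bot API sendMessage method."""
    token = os.environ["TG_API_TOKEN"]  # assumed variable name for the bot token
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=data) as response:
        response.read()


def notify_on_finish(chat_id, send=send_telegram_message):
    """Report the outcome of the decorated function to a chat.

    The `send` parameter is a hypothetical hook that makes the
    decorator testable without network access.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
            except Exception as error:
                # Report the failure, then re-raise so callers still see it.
                send(chat_id, f"{func.__name__} failed: {type(error).__name__}: {error}")
                raise
            elapsed = time.monotonic() - start
            send(chat_id, f"{func.__name__} finished in {elapsed:.2f} s")
            return result
        return wrapper
    return decorator
```

Passing the sender as a parameter keeps the decorator logic separate from the network call, which is convenient when trying it out locally.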
convert_multiline_fasta_to_oneline function
Converts a multi-line FASTA file containing any number of DNA/RNA/protein sequences into one-line FASTA format.
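The conversion can be sketched as joining all sequence lines that follow a header into one line. The function names and signatures below are hypothetical and only illustrate the approach; they are not the repository's actual interface.

```python
from typing import Iterable, Iterator


def join_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and each sequence as a single line."""
    seq_parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_parts:
                # A new header closes the previous record's sequence.
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        else:
            seq_parts.append(line)
    if seq_parts:
        yield "".join(seq_parts)  # flush the last record


def multiline_to_oneline(input_fasta: str, output_fasta: str) -> None:
    """Write a one-line-per-sequence copy of a FASTA file."""
    with open(input_fasta) as src, open(output_fasta, "w") as dst:
        for line in join_fasta_lines(src):
            dst.write(line + "\n")
```

Because the input is processed line by line, only one sequence is held in memory at a time.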
OpenFasta context manager *
Opens FASTA files, like the built-in open function. Returns separate FASTA records, each including an ID, description, and sequence.
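A context manager of this kind could look roughly like the sketch below. The class names `FastaReader` and `FastaRecord` are hypothetical, and the sketch assumes that the ID is the first whitespace-delimited token of the header and the rest of the header is the description. The actual OpenFasta interface may differ.

```python
from dataclasses import dataclass


@dataclass
class FastaRecord:
    id: str
    description: str
    seq: str


class FastaReader:
    """Open a FASTA file and iterate over its records."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def __enter__(self):
        self.handle = open(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.handle.close()
        return False  # do not suppress exceptions

    def __iter__(self):
        header, seq_parts = None, []
        for line in self.handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield self._make_record(header, seq_parts)
                header, seq_parts = line[1:], []
            elif header is not None:
                seq_parts.append(line)
        if header is not None:
            yield self._make_record(header, seq_parts)

    @staticmethod
    def _make_record(header, seq_parts):
        # Assumption: ID is the first token, the remainder is the description.
        seq_id, _, description = header.partition(" ")
        return FastaRecord(seq_id, description, "".join(seq_parts))
```

With this sketch, records are read inside a `with FastaReader(path) as reader:` block by looping over `reader`, and the file is closed when the block exits.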
RandomForestClassifierCustom class *
A custom random forest classifier that supports parallelization for faster execution.
Contains tests to verify the functionality of the code in bioseq.py and bio_files_processor.py.
Demonstrates examples of using functions and classes from other files in the repository.