36 commits
f981ca9 Create file for dna_rna_constants (zmitserbio, Oct 11, 2023)
f468451 Create file for dna_rna_tools (zmitserbio, Oct 11, 2023)
8dabc17 Create file for protein_tools (zmitserbio, Oct 11, 2023)
12dfcc1 Create file for protein_constants (zmitserbio, Oct 11, 2023)
6fc8f0a Create file for fastq_filtration_tool (zmitserbio, Oct 11, 2023)
4ace5e0 Create file for fastq_constants (zmitserbio, Oct 11, 2023)
6e7d0b3 Add description of dna_rna_tools toolkit in README.md (zmitserbio, Oct 11, 2023)
d6d024f Add description of protein_tools toolkit in README.md (zmitserbio, Oct 11, 2023)
c24d68d Add description of fastq_filtration toolkit in README.md (zmitserbio, Oct 11, 2023)
474cdfd Add revised code to dna_rna_tools.py (zmitserbio, Oct 11, 2023)
3ee2705 Add constants to dna_rna_constants.py (zmitserbio, Oct 11, 2023)
4fecad5 Remove blank line at end of file in dna_rna_tools.py (zmitserbio, Oct 11, 2023)
cefbd5f Add constant to fastq_constants.py (zmitserbio, Oct 11, 2023)
3632c7f Add subfunctions for run_fastq_filtration to fastq_filtration_tools.py (zmitserbio, Oct 11, 2023)
b98fe39 Add constants to protein_constants.py (zmitserbio, Oct 11, 2023)
407c777 Add revised code to protein_tools.py (zmitserbio, Oct 11, 2023)
027829b Create main function and subfunctions in beginner_bioinf_tools.py (zmitserbio, Oct 11, 2023)
ebff383 Add check_fastq_file to beginner_bioinf_tools.py (zmitserbio, Oct 15, 2023)
c38c33e Add convert_fastq_to_dict to beginner_bioinf_tools.py (zmitserbio, Oct 15, 2023)
b6d592d Add function for directory creation to beginner_bioinf_tools.py (zmitserbio, Oct 15, 2023)
6496d34 Add save_fastq_dict_to_file to beginner_bioinf_tools.py (zmitserbio, Oct 15, 2023)
6c4cb94 Correct run_beginner_bioinf_tools in beginner_bioinf_tools.py (zmitserbio, Oct 15, 2023)
9bb9e0f Update fastq toolkit description in README.md (zmitserbio, Oct 15, 2023)
513f26c Create bio_files_processor.py (zmitserbio, Oct 15, 2023)
da63a81 Add convert_multiple_fasta_to_oneline to bio_files_processor.py (zmitserbio, Oct 15, 2023)
cebc7f2 Add hw14 code to beginner_bioinf_tools.py (zmitserbio, Feb 25, 2024)
db1a0e2 Delete modules directory (zmitserbio, Feb 25, 2024)
c2c915d Delete README.md (zmitserbio, Feb 25, 2024)
dc12272 Upload requirements.txt (zmitserbio, Feb 25, 2024)
743a44b Upload beginner_bioinf_tools.py script (zmitserbio, May 1, 2024)
fb57738 Upload bio_files_processor.py script (zmitserbio, May 1, 2024)
0c0bc63 Upload showcases notebook (zmitserbio, May 1, 2024)
590d9e6 Upload data files (zmitserbio, May 1, 2024)
4e93466 Move example_fastq.fastq to data/example_fastq.fastq (zmitserbio, May 1, 2024)
2fc8723 Move example_fasta.fasta to data/example_fasta.fasta (zmitserbio, May 1, 2024)
f286608 Create very short README.md (zmitserbio, May 1, 2024)
5 changes: 3 additions & 2 deletions README.md
@@ -1,2 +1,3 @@
-# BI_toolkit
-Homework for Python course in the Bioinformatic Institute
+# Homework repository for 2023-2024 Informatics Bioinstitute professional retraining program
+
+Here, several homework assignments concerning bioinformatic data processing are collected. Please see Showcases.ipynb for some examples.

It would still be better to add a bit here about what has been implemented :) What data is processed, and what kind of processing is done. In other words, after reading the README, a person should have some idea of whether this repository is useful to them or not.


It is still worth saying more here about what is in the repository, what data you work with, and what functionality it offers. After looking through the README, a person should understand whether they want to dig further into the repository or not :)

224 changes: 224 additions & 0 deletions Showcases.ipynb

Since the notebook replaces the README in your repository, it would be better to add some explanations: here we test the basic operations for DNA, RNA and proteins, here we check that an error is raised, and so on. Ideally, someone who does not really understand what the code is about should be able to follow everything at a quick glance.

@@ -0,0 +1,224 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 11,
"id": "4b3c7110",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from bio_files_processor import OpenFasta\n",
"from beginner_bioinf_tools import DNASequence, RNASequence, AminoAcidSequence, filter_fastq"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "cce834fa",
"metadata": {},
"outputs": [],
"source": [
"path_to_fasta = os.path.join('data', 'example_fasta.fasta')\n",
"path_to_fastq = os.path.join('data', 'example_fastq.fastq')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b17d1431",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'UACG'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str(DNASequence('ATGC').transcribe())"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f0550631",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'GuAACcaU'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str(RNASequence('AugGUUaC').reverse_complement())"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "cf32f6b5",
"metadata": {},
"outputs": [
{
"ename": "InvalidInputError",
"evalue": "Cannot complement: incorrect input sequence. Only nucleotides (in both cases) are supported!",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mInvalidInputError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[15], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[38;5;28mstr\u001b[39m(DNASequence(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mATGY\u001b[39m\u001b[38;5;124m'\u001b[39m)\u001b[38;5;241m.\u001b[39mcomplement())\n",
"File \u001b[1;32m~\\1ib\\python\\hw18\\beginner_bioinf_tools.py:148\u001b[0m, in \u001b[0;36mDNASequence.__init__\u001b[1;34m(self, sequence)\u001b[0m\n\u001b[0;32m 146\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msequence \u001b[38;5;241m=\u001b[39m sequence\n\u001b[0;32m 147\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mis_correct_alphabet():\n\u001b[1;32m--> 148\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m InvalidInputError(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mCannot complement: incorrect input sequence. Only nucleotides (in both cases) are supported!\u001b[39m\u001b[38;5;124m'\u001b[39m)\n",
"\u001b[1;31mInvalidInputError\u001b[0m: Cannot complement: incorrect input sequence. Only nucleotides (in both cases) are supported!"
]
}
],
"source": [
"str(DNASequence('ATGY').complement())"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "e6ec1bdd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<beginner_bioinf_tools.DNASequence at 0x1b7f1878290>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AminoAcidSequence('KMGf').convert_to_gene()  # returns a DNASequence object"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "06999e39",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'AAAATGGGGttc'"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str(AminoAcidSequence('KMGf').convert_to_gene())"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "562bb696",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'LYS---MET---GLY---phe'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AminoAcidSequence('KMGf').recode_3letter_to_1letter('---')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "4b6286fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'H': 30.77,\n",
" 'S': 23.08,\n",
" 'F': 15.38,\n",
" 'K': 7.69,\n",
" 'M': 7.69,\n",
" 'G': 7.69,\n",
" 'L': 7.69}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AminoAcidSequence('HKSHMGFFHSHSL').info_amino_acid_percentage()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "697f02c5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SeqRecord(seq=Seq('TATAGCTACTACACCTTCATGTGATATAACTTCAAGCAATTTTTCATTTAACAT...CTC'), id='SRX079804:1:SRR292678:1:1101:391832:391832', name='SRX079804:1:SRR292678:1:1101:391832:391832', description='SRX079804:1:SRR292678:1:1101:391832:391832 2:N:0:1 BH:ok', dbxrefs=[])]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filter_fastq(path_to_fastq, gc_bounds=30, length_bounds=(60,70), quality_threshold=35)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
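The notebook shows results but not how they are computed, and the implementation in `beginner_bioinf_tools.py` is not part of this diff. As a rough, non-authoritative illustration, the case-preserving RNA reverse complement and the amino-acid percentage table shown above could be computed as follows. The function names `reverse_complement_rna` and `amino_acid_percentage` are hypothetical, and `ValueError` stands in for the module's own `InvalidInputError`.

```python
from collections import Counter

# Hypothetical translation table: each base maps to its complement in the same case.
_RNA_COMPLEMENT = str.maketrans("AUGCaugc", "UACGuacg")


def reverse_complement_rna(sequence: str) -> str:
    """Return the reverse complement of an RNA string, preserving letter case."""
    if set(sequence) - set("AUGCaugc"):
        # Stand-in for InvalidInputError: reject anything that is not an RNA base.
        raise ValueError("Only RNA nucleotides (A, U, G, C in either case) are supported")
    # Complement every base, then reverse the result.
    return sequence.translate(_RNA_COMPLEMENT)[::-1]


def amino_acid_percentage(sequence: str) -> dict:
    """Return {residue: percentage rounded to 2 decimals}, most frequent first."""
    counts = Counter(sequence)
    total = len(sequence)
    # most_common() keeps first-seen order among residues with equal counts.
    return {aa: round(n / total * 100, 2) for aa, n in counts.most_common()}


print(reverse_complement_rna("AugGUUaC"))  # GuAACcaU
print(amino_acid_percentage("HKSHMGFFHSHSL"))
```

For the inputs used in the notebook, this sketch reproduces the outputs recorded there, including the ordering of residues with equal counts (K, M, G, L).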