<span style="color: red">**Warning**:</span> the tool is under active development.
**PROBEst** is a tool designed for generating nucleotide probes with specified properties, leveraging advanced algorithms and AI-driven techniques to ensure high-quality results. The tool is particularly useful for researchers and bioinformaticians who require probes with tailored universality and specificity for applications such as PCR, hybridization, and sequencing. By integrating a wrapped evolutionary algorithm, PROBEst optimizes probe generation through iterative refinement, ensuring that the final probes meet stringent biological and computational criteria.
At the core of PROBEst is an AI-enhanced workflow that combines Primer3 for initial oligonucleotide generation, BLASTn for specificity and universality checks, and a mutation module for probe optimization. The tool allows users to input target sequences, select reference files for universality and specificity validation, and customize layouts for probe design. The evolutionary algorithm iteratively refines the probes by introducing mutations and evaluating their performance, ensuring that the final output is both specific to the target and universally applicable across related sequences.
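The iterative refinement described above can be illustrated with a minimal, self-contained sketch. This is not PROBEst's actual code: `mutate` and `evolve` are hypothetical stand-ins, and the scoring function is a toy proxy for the BLASTn universality/specificity checks.

```python
import random

BASES = "ACGT"

def mutate(probe, rate=0.1, rng=None):
    """Return a copy of the probe with random point substitutions."""
    rng = rng or random.Random()
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in probe
    )

def evolve(initial, score, generations=10, keep=5):
    """Mutate candidates each generation and keep the best scorers.

    `score` stands in for the BLASTn-based fitness: higher means more
    universal across targets and more specific against off-targets.
    Parents compete with their mutants, so the best score never drops.
    """
    rng = random.Random(42)  # fixed seed for reproducibility
    population = list(initial)
    for _ in range(generations):
        population += [mutate(p, rng=rng) for p in population]
        population = sorted(set(population), key=score, reverse=True)[:keep]
    return population

if __name__ == "__main__":
    # Toy fitness: GC content as a crude proxy for a real fitness function.
    gc = lambda p: (p.count("G") + p.count("C")) / len(p)
    print(evolve(["ATATATATAT"], score=gc, generations=20)[0])
```

In the real pipeline the fitness evaluation is the expensive step (BLASTn searches against both databases), which is why the candidate pool is kept small between generations.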
`pipeline.py` relies on pre-prepared BLASTn databases. To create the required `true_base`, `false_base`, and `contig_table`, you can use the following script:
```bash
bash scripts/generator/prep_db.sh \
  -n {DATABASE_NAME} \
  -c {CONTIG_TABLE} \
  -t {TMP_DIR} \
  [FASTA]
```

#### Arguments:

- `-n DATABASE_NAME`: Name of the output BLAST database (required).
- `-c CONTIG_TABLE`: Output file to store contig names and their corresponding sequence headers (required).
- `-t TMP_DIR`: Temporary directory for intermediate files (optional, defaults to `./.tmp`).
- `FASTA`: List of input FASTA files (gzipped or uncompressed).

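For downstream scripting it can help to read the contig table back in. A minimal sketch, assuming the two-column TSV layout (contig name, sequence header) described above; `load_contig_table` is a hypothetical helper, not part of PROBEst.

```python
import csv
from io import StringIO

def load_contig_table(handle):
    """Map contig names to their sequence headers.

    Assumes a two-column TSV (contig name, sequence header);
    extra columns, if any, are ignored.
    """
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}

if __name__ == "__main__":
    example = "contig_1\tsample genome, strain K-12\n"
    print(load_contig_table(StringIO(example)))  # {'contig_1': 'sample genome, strain K-12'}
```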
### Generation

PROBEst can be run using the following command:

```bash
python pipeline.py \
  -i {INPUT} \
  -tb {TRUE_BASE} \
  -fb [FALSE_BASE ...] \
  -c {CONTIG_TABLE} \
  -o {OUTPUT}
```

The **BLASTn databases** and **contig table** are produced by `prep_db.sh` (see above).

#### Key arguments:

- `-i INPUT`: Input FASTA file for probe generation.
- `-tb TRUE_BASE`: Path to the BLASTn database used for primer adjustment.
- `-fb FALSE_BASE`: One or more paths to BLASTn databases used for non-specificity testing.
- `-c CONTIG_TABLE`: `.tsv` table with BLAST database information.
- `-o OUTPUT`: Output path for results.
- `-t THREADS`: Number of threads to use.
- `-a ALGORITHM`: Algorithm for probe generation (`FISH` or `primer`).

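When scripting batch runs, the invocation above can be assembled programmatically. A sketch with placeholder paths; it assumes, per the usage string, that `-fb` accepts one or more database paths after a single flag.

```python
def pipeline_cmd(input_fasta, true_base, false_bases, contig_table, output,
                 threads=4, algorithm="FISH"):
    """Build the pipeline.py argument vector (paths here are placeholders)."""
    return [
        "python", "pipeline.py",
        "-i", str(input_fasta),
        "-tb", str(true_base),
        "-fb", *map(str, false_bases),  # usage: -fb [FALSE_BASE ...]
        "-c", str(contig_table),
        "-o", str(output),
        "-t", str(threads),
        "-a", algorithm,
    ]

if __name__ == "__main__":
    cmd = pipeline_cmd("target.fa", "db/true", ["db/off1", "db/off2"],
                       "contigs.tsv", "results/")
    print(" ".join(cmd))
```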
For a full list of arguments, run:
```bash
python pipeline.py --help
```
For parameter selection, a grid search is implemented. Specify parameters in a JSON file (see, for example, `data/test/general/param_grid_light.json`) and run:
```bash
python test_parameters.py \
  -p {JSON}
```
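A parameter grid maps each parameter name to a list of candidate values, and the grid search evaluates every combination. The keys below are hypothetical; see `data/test/general/param_grid_light.json` for the actual parameter names used by `test_parameters.py`.

```python
import itertools
import json

# Hypothetical grid; the real parameter names live in param_grid_light.json.
grid = json.loads('{"mutation_rate": [0.05, 0.1], "population_size": [10, 50]}')

# One dict per combination, i.e. the Cartesian product of all value lists.
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
for combo in combos:
    print(combo)
```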
# Algorithm
## Algorithm Steps
0. **Prepare BLASTn databases**
1. **Select File for Probe Generation** (`INPUT`)
2. **Select Files for Universality Check** (`TRUE_BASE`)
3. **Select Files for Specificity Check** (`FALSE_BASE`)
4. **Select Layouts and Run Wrapped Evolutionary Algorithm** (`pipeline.py`)
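
The universality (step 2) and specificity (step 3) gates can be illustrated with a toy stand-in that uses exact substring matching in place of BLASTn alignment; `passes_checks` is hypothetical, not PROBEst code.

```python
def passes_checks(probe, true_seqs, false_seqs):
    """Toy gate: substring matching stands in for BLASTn hits.

    A probe passes when it occurs in every target sequence (universality,
    TRUE_BASE) and in no off-target sequence (specificity, FALSE_BASE).
    """
    universal = all(probe in seq for seq in true_seqs)
    specific = not any(probe in seq for seq in false_seqs)
    return universal and specific

if __name__ == "__main__":
    targets = ["AAGGCCTTAC", "TTAAGGCCTT"]
    off_targets = ["GGGGCCCCAA"]
    print(passes_checks("AAGGCCTT", targets, off_targets))  # True
    print(passes_checks("GGCC", targets, off_targets))      # False (hits the off-target)
```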
author = {Alexandr Serdiukov and Vitaliy Dragvelis and Daniil Smutin and Amir Taldaev and Sergey Muravyov},
title = {Efficient and Verified Extraction of the Research Data Using LLM},
journal = {Preprints},
year = {2025},
doi = {10.20944/preprints202511.2140.v1}
}
```
**Plain text:**
Serdiukov, A., Dragvelis, V., Smutin, D., Taldaev, A., & Muravyov, S. (2025). Efficient and Verified Extraction of the Research Data Using LLM. *Preprints*. https://doi.org/10.20944/preprints202511.2140.v1
# License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
600
+
601
+
# Contribution
We welcome contributions from the community!

Please read the [Contribution Guidelines](CONTRIBUTING.md) for more details.
# Wiki
The tool has its own [Wiki](https://github.com/CTLab-ITMO/PROBEst/wiki) pages with detailed information on usage cases, data descriptions, and other necessary information.