An application that uses LLMs to summarize published literature on supplied genes into specified fields, in order to automate the curation of gene annotations.
- Input: Gene identifier (locus or name).
- Paper Retrieval: Query PubMed Central to obtain relevant research articles.
- Section Extraction: Extract the abstract, results, and discussion sections from each paper.
- Initial LLM Summarization: Submit each section to multiple large language models (`mistral-nemo:12b`, `llama3:8b`, `gemma3:12b`) to generate independent summaries.
- Consensus Generation: Reconcile the outputs from the three LLMs using a tiebreaker model (`phi4:14b`) to produce a unified section summary.
- Gene-Level Aggregation: Combine the consensus summaries across all papers using `gemma3:12b` to generate a final, aggregated gene annotation.
- Output: Structured gene annotation for the input gene.
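
The summarization, consensus, and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names, the prompts, and the `generate(model, prompt)` callable are all hypothetical, and the model call is left abstract because the description does not say which backend serves the models. Only the model tags are taken from the list above. Paper retrieval and section extraction are assumed to have already produced one dict per paper, mapping section name to text.

```python
from typing import Callable, Dict, List

# Model tags come from the pipeline description above.
SUMMARIZER_MODELS = ["mistral-nemo:12b", "llama3:8b", "gemma3:12b"]
TIEBREAKER_MODEL = "phi4:14b"
AGGREGATOR_MODEL = "gemma3:12b"
SECTIONS = ("abstract", "results", "discussion")

# Hypothetical signature for the model call: (model_tag, prompt) -> text.
Generate = Callable[[str, str], str]


def summarize_section(section_text: str, generate: Generate) -> List[str]:
    """Ask each summarizer model for an independent summary of one section."""
    prompt = f"Summarize this section of a paper:\n\n{section_text}"
    return [generate(model, prompt) for model in SUMMARIZER_MODELS]


def consensus(summaries: List[str], generate: Generate) -> str:
    """Have the tiebreaker model reconcile the independent summaries."""
    numbered = "\n\n".join(
        f"Summary {i}:\n{s}" for i, s in enumerate(summaries, start=1)
    )
    prompt = f"Reconcile these summaries into one:\n\n{numbered}"
    return generate(TIEBREAKER_MODEL, prompt)


def annotate_gene(
    gene: str, papers: List[Dict[str, str]], generate: Generate
) -> str:
    """Run summarization, consensus, and aggregation for one gene."""
    section_summaries = []
    for paper in papers:
        for name in SECTIONS:
            text = paper.get(name)
            if not text:
                continue  # skip sections that could not be extracted
            independent = summarize_section(text, generate)
            section_summaries.append(consensus(independent, generate))
    combined = "\n\n".join(section_summaries)
    prompt = (
        f"Combine these summaries into an annotation for gene {gene}:\n\n"
        f"{combined}"
    )
    return generate(AGGREGATOR_MODEL, prompt)
```

Passing the model call in as a parameter keeps the sketch independent of any particular serving backend and lets the control flow be exercised with a stub function. Under this sketch, each extracted section costs four model calls (three summaries plus one consensus), followed by a single aggregation call per gene.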