DugBot

DugBot is an advanced search assistant that helps you find and understand biomedical data by answering your questions in natural language. It uses a powerful technique called Retrieval Augmented Generation (RAG) to ensure the answers are not only conversational but also grounded in factual, verifiable information from reliable knowledge sources.

Data behind DugBot

DugBot can answer questions about BDC studies and the variables used in those studies. Currently, the databases that back DugBot contain 200+ studies.

How DugBot Works

DugBot uses two retrieval mechanisms to find your answer.

  1. Vector-Based Retrieval: This method is best for general questions about studies and their findings. DugBot has processed abstracts of several biomedical studies and generated potential questions a researcher might ask. When you ask a question, it finds the most similar pre-generated questions and uses the linked studies to form a comprehensive answer.
  2. Knowledge Graph Retrieval: This approach is ideal when you want to understand how biomedical concepts (such as diseases, genes, chemical substances, and phenotypic features) relate to study variables. DugBot extracts the key concepts from your query (e.g., "Proliferation") and uses a vast knowledge graph to find connected variables and the studies they appear in. This provides a detailed, context-rich response.
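
To make the two mechanisms above more concrete, here is a minimal Python sketch. It is not DugBot's implementation: the study identifiers, questions, concepts, and variable names are hypothetical toy data, the function names are invented for illustration, and a simple bag-of-words cosine similarity stands in for whatever embedding model DugBot actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: pre-generated questions mapped to study identifiers.
QUESTION_INDEX = {
    "Which studies investigate COPD in smokers?": "study-A",
    "What cohorts measure blood pressure over time?": "study-B",
    "Which studies collect sleep apnea data?": "study-C",
}

# Hypothetical toy knowledge graph: concept -> variables -> studies.
CONCEPT_TO_VARIABLES = {
    "heart": ["heart_rate", "mi_history"],
    "copd": ["fev1"],
}
VARIABLE_TO_STUDIES = {
    "heart_rate": ["study-B"],
    "mi_history": ["study-B", "study-C"],
    "fev1": ["study-A"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Assumption: a bag-of-words count vector stands in for a real embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_retrieve(query, top_k=2):
    """Return the studies linked to the top_k most similar stored questions."""
    q = embed(query)
    ranked = sorted(
        QUESTION_INDEX.items(),
        key=lambda item: cosine(q, embed(item[0])),
        reverse=True,
    )
    return [study for _, study in ranked[:top_k]]


def graph_retrieve(query):
    """Map each concept found in the query to its variables and their studies."""
    results = {}
    for token in tokenize(query):
        for variable in CONCEPT_TO_VARIABLES.get(token, []):
            results[variable] = VARIABLE_TO_STUDIES.get(variable, [])
    return results
```

In this sketch, vector_retrieve("What studies are available for COPD?", top_k=1) selects the study linked to the closest stored question, while graph_retrieve("Find variables related to heart") walks from the concept "heart" to its variables and then to the studies that contain them. In a RAG setup, the retrieved studies and variables would then be passed to a language model as context for the final answer.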

Example Queries

  1. What studies are available for COPD?
  2. What are the main findings of the SHARe study?
  3. Find variables related to the heart and heart attacks.

A Reliable and Improving System

DugBot is designed for accuracy and is continuously improving.

  • Accuracy: By grounding responses in a curated knowledge base, DugBot tries to avoid the "hallucinations" or factually incorrect statements sometimes generated by large language models.
  • Transparency and Evaluation: Every search process is traced, which helps developers quickly find and fix any issues. The system is also constantly evaluated using user feedback and automated assessment tools to ensure the quality of its responses.

Other information

For general information outside the scope of BDC studies and variables, including directions for accessing data, please consult BDCBOT.