Code for paper: Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA
Medical question answering (QA) is a reasoning-intensive task that remains challenging for large language models (LLMs) due to hallucinations and outdated domain knowledge. Retrieval-Augmented Generation (RAG) provides a promising post-training solution by leveraging external knowledge. However, existing medical RAG systems suffer from two key limitations: (1) a lack of modeling for human-like reasoning behaviors during information retrieval, and (2) reliance on suboptimal medical corpora, which often results in the retrieval of irrelevant or noisy snippets. To overcome these challenges, we propose Discuss-RAG, a plug-and-play module designed to enhance medical QA RAG systems through collaborative agent-based reasoning. Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content. Additionally, a decision-making agent evaluates the retrieved snippets before their final integration. Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG, improving answer accuracy by up to 16.67% on BioASQ and 12.20% on PubMedQA.
- Clone the repo:

git clone https://github.com/LLM-VLM-GSL/Discuss-RAG.git
- Create a Python environment and install the required libraries by running:

conda env create -f environment.yml
conda activate DISCUSS-RAG
- Download the medical QA benchmarks: MMLU-Med, MedQA-US, BioASQ, and PubMedQA.
Using our module is straightforward. The script main_discuss_rag.py contains all the necessary functions for performing medical QA as well as calculating accuracy. For example, to evaluate on MMLU-Med, simply run the following commands to start inference and compute the accuracy:
cd Discuss-RAG
python main_discuss_rag.py

Note: In our paper, we evaluate all benchmarks using only Textbooks as the corpus. For retrieval, we employ MedCPT, and for the LLM, we conduct experiments exclusively with GPT-3.5 (i.e., gpt-3.5-turbo-0125).
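For reference, the paper's evaluation setup corresponds to a configuration like the following. The option names are hypothetical; check main_discuss_rag.py for the actual argument names:

```python
# Hypothetical settings mirroring the paper's evaluation setup;
# the actual option names in main_discuss_rag.py may differ.
CONFIG = {
    "corpus": "Textbooks",        # only corpus used in the paper
    "retriever": "MedCPT",        # retrieval model
    "llm": "gpt-3.5-turbo-0125",  # GPT-3.5, the only LLM evaluated
    "dataset": "MMLU-Med",        # or MedQA-US, BioASQ, PubMedQA
}
```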
Additionally, all necessary prompts are included in src/template_discuss.py. For further implementation details, please refer to our manuscript.
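Prompt templates of this kind are typically plain Python format strings filled in per agent turn. A hypothetical example in that style (the template name and placeholders below are illustrative, not taken from src/template_discuss.py):

```python
# Hypothetical prompt template in the style of src/template_discuss.py;
# the actual template names and placeholder fields may differ.
EXPERT_DISCUSS_TEMPLATE = (
    "You are a {role}. Given the question below, share your reasoning "
    "in one short paragraph.\n\nQuestion: {question}"
)

prompt = EXPERT_DISCUSS_TEMPLATE.format(
    role="cardiologist",
    question="Which drug class is first-line for chronic heart failure?",
)
```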
Our work is built upon and inspired by MedRAG and MDAgents. We sincerely thank the authors of these projects for their valuable contributions to the research community.
If you find this repository useful in your research, please cite our work:
@article{dong2025talk,
title={Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA},
author={Dong, Xuanzhao and Zhu, Wenhui and Wang, Hao and Chen, Xiwen and Qiu, Peijie and Yin, Rui and Su, Yi and Wang, Yalin},
journal={arXiv preprint arXiv:2504.21252},
year={2025}
}