🧠🌳Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering

🔍 What is RT-RAG?

RT-RAG systematically decomposes complex multi-hop questions into explicit binary reasoning trees. It leverages structured entity analysis and consensus-based tree selection to ensure e decomposition, clearly separating core queries, known entities, and unknown targets.

Once the tree is built, a bottom-up traversal strategy is used to iteratively rewrite and refine sub-questions. This process efficiently collects high-quality evidence while mitigating error propagation through recursive reasoning.

⚙️ 1. Environment Setup

✅ Install Dependencies

pip install -r requirements.txt

⚡️ (Optional) Serve Qwen2.5-14B-Instruct via vLLM

To serve Qwen2.5-14B-Instruct locally using vLLM with OpenAI-compatible API:

First, install vLLM:

pip install vllm

Then, start the server:

vllm serve Qwen/Qwen2.5-14B-Instruct \
  --dtype auto \
  --api-key your-api-key

Replace your-api-key with a secure token. This key must match what you configure in config.py.

📝 Tip: For more details, see vLLM OpenAI-Compatible Server Docs

📦 2. Model Downloads

You can download models manually or use Hugging Face CLI:

🔍 Reranker Model

BAAI/bge-reranker-base

huggingface-cli download BAAI/bge-reranker-base

🧠 Language Model (Qwen2.5-14B-Instruct)

Qwen/Qwen2.5-14B-Instruct

huggingface-cli download Qwen/Qwen2.5-14B-Instruct

Make sure to login if authentication is required:

huggingface-cli login

🛠️ 3. Data Preparation

The preprocessed corpus is already in the raw folder.
Evaluation and retrieval data are from LongBench.

✏️ 4. Configure `main/build_dense_index/config.py`

Update your configuration for embedding/index building:

Parameter	Description
`raw_path`	Path to folder containing preprocessed JSON
`save_path`	Where to store FAISS index & metadata
`dataset_name`	Filename without `.json`
`chunk_size`	Max words per chunk (e.g., 200)
`min_sentence`	Min sentences per chunk (e.g., 2)
`overlap`	Overlapping sentences between chunks (e.g., 2)
`base_url`	API endpoint (e.g., `http://localhost:8000/v1`)
`api_key`	Your API key used with the embedding service

🧱 5. Build the Dense Index

Once main/build_dense_index/config.py is ready, build your FAISS index with:

python build_dense_index/dense_build_index.py

🧪 6. Run on the Full Dataset

After the dense index is successfully built:

Configure runtime parameters in:
```
main/config.py
```
Make sure the dataset path, retrieval settings, API credentials, and output paths are correct and aligned with the built index.
Run the full dataset through the system:
```
python main/load_data.py
```

This step runs the entire dataset through the RT-RAG pipeline: it performs retrieval, reranking, tree generation, and LLM querying.

📊 7. Evaluate the Results

Once inference on the full dataset is complete, you can evaluate the generated answers using:

python main/evaulate.py /path/to/result.txt

Replace /path/to/result.txt with the actual path to the output file generated by main/load_data.py.

This script will compute metrics on the dataset.

📈 RT-RAG Performance

The table below summarizes RT-RAG's performance across three benchmark datasets using two different backbone models:

Model	Dataset	F1	EM
GPT-4o-mini	MuSiQue	54.42	41.50
	2WikiMQA	75.08	63.00
	HotpotQA	65.26	52.50
	Average	64.92	52.33
Qwen2.5-14B	MuSiQue	50.04	39.00
	2WikiMQA	73.69	64.00
	HotpotQA	66.24	51.00
	Average	63.32	51.33

RT-RAG consistently outperforms all baselines across diverse multi-hop QA datasets.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
assets		assets
main		main
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠🌳Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering

🔍 What is RT-RAG?

⚙️ 1. Environment Setup

✅ Install Dependencies

⚡️ (Optional) Serve Qwen2.5-14B-Instruct via vLLM

📦 2. Model Downloads

🔍 Reranker Model

🧠 Language Model (Qwen2.5-14B-Instruct)

🛠️ 3. Data Preparation

✏️ 4. Configure `main/build_dense_index/config.py`

🧱 5. Build the Dense Index

🧪 6. Run on the Full Dataset

📊 7. Evaluate the Results

📈 RT-RAG Performance

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠🌳Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering

🔍 What is RT-RAG?

⚙️ 1. Environment Setup

✅ Install Dependencies

⚡️ (Optional) Serve Qwen2.5-14B-Instruct via vLLM

📦 2. Model Downloads

🔍 Reranker Model

🧠 Language Model (Qwen2.5-14B-Instruct)

🛠️ 3. Data Preparation

✏️ 4. Configure main/build_dense_index/config.py

🧱 5. Build the Dense Index

🧪 6. Run on the Full Dataset

📊 7. Evaluate the Results

📈 RT-RAG Performance

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

✏️ 4. Configure `main/build_dense_index/config.py`

Packages