Getting Started

Setting up Environment

Use python=3.10
pip install packages from requirements.txt

Running the framework

Pre-Run Instructions

Set / export the parent folder of src to PYTHONPATH.
Update the src_dir and data_dir of src.utils.filepaths accordingly (should point to your src folder and your data folder accordingly.
Set / export the OPENAI_API_KEY variable

Run on ConFinQA Dataset

Run src.runme
```
python path/to/src/runme.py
```
To score the dataset (after running the runme.py file to generate the output files required for scoring), run src.utils.scoring
```
python path/to/src/utils/scoring.py
```

Run chatbot

Run src.app.cli
```
python path/to/src/app/cli.py record_file_name
```
where record_file_name is the filename of the record (e.g. Single_UNP/2014/page_35.pdf-3)

How it works

Agentic Pipeline

Agents

1️⃣ Clarify

Detects ambiguity and reformulates vague user questions into precise, self-contained forms using conversation history.

2️⃣ Route

Implements a “short-circuit” mechanism to directly answer questions when sufficient context is already available, reducing latency.

3️⃣ Decompose & Retrieve

Breaks complex queries into subproblems and runs parallel retrieval agents to gather supporting facts.

4️⃣ Aggregate

Generates a deterministic Python function to compute the final answer, ensuring reproducibility and auditability.

5️⃣ Planner

A planner agent monitors outputs and injects immediate feedback for retries if needed, ending the pipeline when a direct or aggregated answer is produced.

Results

Metrics Used

The first metric used is the 'correctness' of answer - be it via direct matching or tolerance-based evaluation.
The second metric used modulates the 'Quality of Reasoning' with 'Efficiency of Reasoning'. 'Quality of Reasoning' is defined to be the number of agents used in the pipeline that had successfully completed their tasks (therefore it is a binary number of either 0 or 1). 'Efficiency of Reasoning' is defined to be the number of attempts each agent takes to complete its task. Therefore, the afore-mentioned can be formalized as such: $$\text{SQ\_METRIC} = \frac{\sum_{i=1}^{N} q_i}{\sum_{i=1}^{N} a_i}$$

where:

$q_i \in {0, 1}$ is the quality indicator of agent $i$ (1 if successful, 0 otherwise).
$a_i \in \mathbb{N}^+$ is the number of attempts made by agent $i$.
$N$ is the total number of agents in the pipeline.

Results

The pipeline was run on the top 8 records of the ConvFinQA dataset to get the agent outputs, before being scored by the scorer. The scores are as follows:

Agent/Metric	Score
ClarifierAgent_score	0.862069
DirectQAAgent_score	0.890805
DecomposerAgent_score	0.947368
FreeAgent_score	0.645833
AggregatorAgent_score	0.312500
`SQ_METRIC`	0.583908
Correctness	0.689655

Some notes

This system was developed on mac using pycharm, compatibility with Windows systems cannot be guaranteed
If I had missed out any packages in requirements.txt, please let me know :)
The system might get stuck in "reflection" loops, especially at the aggregation stage. If that happens, just re-run the system. An agent_retry_limit has already been set (defaults to 5).
Each 'agent' file can be run independently, I developed it as such to test out the pipeline incrementally
Currently, 'gpt-5-mini' is being used as the LLM for all agents. It works fairly well

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
diagrams		diagrams
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting Started

Setting up Environment

Running the framework

Pre-Run Instructions

Run on ConFinQA Dataset

Run chatbot

How it works

Agentic Pipeline

Agents

1️⃣ Clarify

2️⃣ Route

3️⃣ Decompose & Retrieve

4️⃣ Aggregate

5️⃣ Planner

Results

Metrics Used

Results

Some notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Getting Started

Setting up Environment

Running the framework

Pre-Run Instructions

Run on ConFinQA Dataset

Run chatbot

How it works

Agentic Pipeline

Agents

1️⃣ Clarify

2️⃣ Route

3️⃣ Decompose & Retrieve

4️⃣ Aggregate

5️⃣ Planner

Results

Metrics Used

Results

Some notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages