Document Intelligence RAG Platform

A full-stack Retrieval-Augmented Generation (RAG) application for uploading PDFs, indexing document content, chatting with source-grounded answers, and evaluating retrieval quality across chunking strategies.

The project is built to demonstrate more than a basic PDF chatbot. It shows how document parsing, chunking, embedding, retrieval, reranking, citation, and evaluation decisions affect answer quality.

Highlights

Upload one or more PDF documents simultaneously.
Parse structured PDF content with Docling.
Generate embeddings with BAAI/bge-m3.
Store and retrieve chunks with Qdrant vector database.
Chat with uploaded documents through a React interface.
Return answers with document name, page number, section title, chunk text, similarity score, and reranking metadata.
Evaluate retrieval quality using hard-coded benchmark questions and expected answers.
Compare chunking strategies and top-k settings using measurable scores.
Use a document-aware retrieval policy for default strategy selection.
Optionally rerank retrieved chunks before the LLM call.
Run the full stack with Docker Compose.

Tech Stack

Layer	Technology
Frontend	React, TypeScript, Vite
Backend	FastAPI, Python
PDF parsing	Docling
Embeddings	BGE-M3
Vector database	Qdrant
Retrieval	Dense vector search with metadata filtering
Reranking	SentenceTransformers CrossEncoder
LLM provider	OpenAI-compatible API, Groq-compatible configuration, extractive fallback
Evaluation	Custom retrieval and RAG evaluation scripts
Runtime	Docker Compose

System Architecture

flowchart LR
    U[User] --> FE[React Frontend]
    FE --> API[FastAPI Backend]

    API --> UP[PDF Upload]
    UP --> PARSE[Docling Parser]
    PARSE --> PROFILE[Document Profiler]
    PARSE --> CHUNK[Chunking Strategies]

    CHUNK --> EMB[BGE-M3 Embeddings]
    EMB --> QDRANT[Qdrant Vector DB]

    FE --> CHAT[Chat Request]
    CHAT --> API
    API --> RETRIEVE[Vector Retrieval]
    RETRIEVE --> QDRANT
    RETRIEVE --> RERANK[Optional CrossEncoder Reranking]
    RERANK --> LLM[LLM / Extractive Provider]
    LLM --> ANSWER[Answer + Sources]
    ANSWER --> FE

    API --> EVAL[Evaluation Workflow]
    EVAL --> SCORES[Strategy Metrics]
    SCORES --> FE

Core Workflow

Upload PDF
→ Parse document
→ Profile document structure
→ Generate multiple chunking variants
→ Embed chunks
→ Store chunks in Qdrant
→ Select retrieval policy
→ Retrieve relevant chunks for each question
→ Optionally rerank retrieved chunks
→ Generate source-grounded answer
→ Display answer, sources, and evaluation metrics

Running the Project

Option 1: One-command local start

Use this when running the application locally on Windows.

.\start-local.ps1

The script starts the Docker Compose stack, waits for services, and opens the frontend automatically.

Stop everything with:

.\stop-local.ps1

Option 2: Direct Docker Compose

docker compose up --build

Open:

Service	URL
Frontend	http://localhost:5173
Backend API docs	http://localhost:8000/docs
Qdrant dashboard	http://localhost:6333/dashboard

Stop the stack:

docker compose down

Run in background:

docker compose up -d --build

View logs:

docker compose logs -f

Environment Configuration

Create the backend environment file:

copy backend\.env.example backend\.env

Example backend configuration:

APP_NAME=Document Intelligence RAG Platform

QDRANT_URL=http://qdrant:6333
QDRANT_COLLECTION_NAME=document_chunks

EMBEDDING_MODEL_NAME=BAAI/bge-m3
EMBEDDING_BATCH_SIZE=8
NORMALIZE_EMBEDDINGS=true

OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
OPENAI_COMPATIBLE_API_KEY=your_api_key_here
OPENAI_COMPATIBLE_MODEL=llama-3.1-8b-instant

ENABLE_RERANKING=true
RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
RERANKER_CANDIDATE_LIMIT=12
RERANKER_FINAL_TOP_K=3

API Overview

Endpoint	Purpose
`GET /health`	Backend health check
`POST /api/documents/upload`	Upload, parse, chunk, embed, and index PDFs
`GET /api/documents`	List indexed documents
`POST /api/search`	Run retrieval against indexed chunks
`POST /api/chat`	Generate an answer from retrieved context
`POST /api/evaluation/run`	Run retrieval evaluation across strategies

Example chat request:

{
  "question": "What does Mean Opinion Score measure in TTS evaluation?",
  "document_id": "<document_id>",
  "strategy": "auto",
  "provider": "compatible",
  "limit": 3,
  "rerank": true,
  "rerank_candidate_limit": 12
}

Retrieval Design

The platform does not use a single fixed retrieval setup for every PDF. During ingestion, each document is parsed, profiled, and indexed using multiple chunking strategies.

Strategy	Purpose	Best suited for
`section_aware`	Preserves heading-based sections	Structured reports, papers, documentation
`recursive`	Splits text by size and overlap	General prose and weakly structured PDFs
`table_preserving`	Keeps table-like content together	Metric-heavy or tabular documents
`parent_child`	Retrieves focused child chunks with broader parent context	Dense technical documents and long explanations

The default retrieval policy is selected from document structure signals such as page count, heading density, table density, and chunk statistics. Full evaluation is available as an explicit diagnostic workflow rather than blocking upload or chat.

Reranking

The application includes optional second-stage reranking.

Qdrant retrieves candidate chunks
→ CrossEncoder scores question-chunk pairs
→ chunks are sorted by rerank score
→ final top-k chunks are sent to the LLM

This improves precision when dense vector search returns semantically related chunks that are not the best direct evidence for the question. The UI shows both retrieval similarity and reranking metadata:

vector similarity score
rerank score
original retrieval rank
final source order

Rerank scores are raw model relevance scores. They are not normalized percentages; higher score means stronger relevance.

Evaluation

The project includes a hard-coded evaluation set with expected answers and source expectations. Evaluation is used to compare retrieval configurations and justify strategy selection.

Component	Configuration
Evaluation mode	Full RAG evaluation
Evaluated strategies	`section_aware`, `recursive`, `parent_child`, `table_preserving`
Top-k values	`3`, `5`
Benchmark questions	10
Total evaluation calls	80 `/api/chat` calls
Metrics	Overall, retrieval, groundedness, citation accuracy, latency
Best current configuration	`section_aware`, `top_k=3`

RAG Evaluation Results

Strategy	Top-k	Overall	Retrieval	Groundedness	Citation	Latency ms
`section_aware`	3	0.7265	0.80	0.85	0.80	706.47
`section_aware`	5	0.7145	0.80	0.90	0.80	1351.36
`parent_child`	5	0.7080	0.80	0.95	0.80	8534.32
`parent_child`	3	0.6544	0.60	1.00	0.60	6139.30
`recursive`	3	0.6192	0.80	0.6215	0.80	9478.93
`table_preserving`	3	0.6007	0.80	0.5733	0.80	9471.08
`table_preserving`	5	0.5998	0.80	0.6422	0.80	14004.92
`recursive`	5	0.5833	0.80	0.5667	0.80	13795.27

Selected Retrieval Configuration

Decision	Selected value	Reason
Default strategy	`section_aware`	Best overall benchmark result
Default top-k	`3`	Best quality-latency trade-off
Table-heavy fallback	`table_preserving`	Preserves tabular evidence
Weak-structure fallback	`recursive`	Robust when headings are limited
Long-document fallback	`parent_child`	Provides broader context when needed
Full evaluation	On demand	Avoids slowing normal upload and chat

The result shows that the highest groundedness score is not always the best production choice. parent_child improves context coverage, but its latency is much higher. section_aware + top_k=3 gives the best balance between accuracy, citations, and responsiveness.

Why the Chunking Strategy Matters

Chunking directly controls what evidence the retriever can find. Poor chunking can split relevant context, lose section boundaries, or separate table rows from their headers. This project evaluates chunking strategies instead of assuming one strategy is universally correct.

The selected default, section_aware, works well for structured PDFs because it preserves topical boundaries and improves source-level citations. The evaluation results show that increasing top_k from 3 to 5 does not automatically improve the final system; it can increase latency and introduce less focused context.

Source Attribution

Each answer includes retrieved evidence with:

document name
page number
section title
chunk index
retrieval strategy
similarity score
rerank score when reranking is enabled
original retrieval rank
source text preview

This makes each answer inspectable and helps users verify whether the response is grounded in the uploaded document.

Repository Structure

document-intelligence-rag-platform/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── schemas/
│   │   ├── services/
│   │   └── main.py
│   ├── evaluation/
│   ├── data/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── .env.example
├── frontend/
│   ├── src/
│   ├── Dockerfile
│   ├── package.json
│   └── .env.example
├── docker-compose.yml
├── start-local.ps1
├── stop-local.ps1
└── README.md

Validation Commands

Backend syntax check:

cd backend
python -m py_compile app\main.py
python -m py_compile app\api\documents.py
python -m py_compile app\api\chat.py
python -m py_compile app\api\evaluation.py

Frontend build:

cd frontend
npm run build

Docker validation:

docker compose up --build
docker compose ps
docker compose logs -f

Evaluation run:

cd backend
python -m evaluation.run_eval

Limitations

Evaluation uses a fixed benchmark set and custom scoring.
Chat history is kept in the frontend session and is not persisted.
Follow-up question rewriting is not yet implemented.
Authentication and user-level document isolation are not included.
Very large PDFs may require background processing in a production deployment.
Streaming responses are not implemented yet.

Future Improvements

Add streaming answer generation.
Add hybrid dense and keyword retrieval.
Add LLM-as-judge evaluation for groundedness and answer correctness.
Add automatic evaluation-question generation per uploaded document.
Add persistent chat sessions.
Add user authentication and document ownership.
Add deployment configuration for a public URL.
Add CI checks for backend, frontend, and Docker builds.

Demo Flow

Start the platform with ./start-local.ps1 or docker compose up --build.
Upload one or more PDFs.
Show the detected document profile and selected retrieval policy.
Ask a document question.
Expand the source cards and show page/section citations.
Enable reranking and compare source ordering.
Run retrieval evaluation.
Explain why section_aware + top_k=3 is the default configuration.
Open FastAPI docs and Qdrant dashboard to show the backend infrastructure.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
backend		backend
docs/screenshots		docs/screenshots
frontend		frontend
frontend_old		frontend_old
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
start-local.ps1		start-local.ps1
stop-local.ps1		stop-local.ps1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Document Intelligence RAG Platform

Highlights

Tech Stack

System Architecture

Core Workflow

Running the Project

Option 1: One-command local start

Option 2: Direct Docker Compose

Environment Configuration

API Overview

Retrieval Design

Reranking

Evaluation

RAG Evaluation Results

Selected Retrieval Configuration

Why the Chunking Strategy Matters

Source Attribution

Repository Structure

Validation Commands

Limitations

Future Improvements

Demo Flow

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Document Intelligence RAG Platform

Highlights

Tech Stack

System Architecture

Core Workflow

Running the Project

Option 1: One-command local start

Option 2: Direct Docker Compose

Environment Configuration

API Overview

Retrieval Design

Reranking

Evaluation

RAG Evaluation Results

Selected Retrieval Configuration

Why the Chunking Strategy Matters

Source Attribution

Repository Structure

Validation Commands

Limitations

Future Improvements

Demo Flow

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages