a web app that lets you upload documents and ask questions about them. it uses retrieval-augmented generation (RAG) to find relevant passages and generate answers with source citations.
- upload PDF, DOCX, TXT or MD files through the browser
- documents are chunked, embedded, and stored in a vector database
- questions are answered using hybrid search (dense + BM25) and a cross-encoder re-ranker
- answers include citations pointing back to the source document and page
- documents can be deleted from the store at any time
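the hybrid search step above can be sketched in a few lines. this is a minimal illustration, not the project's actual retrieval.py: it assumes dense and BM25 scores are min-max normalized before blending, and the names `min_max` and `hybrid_rank` are hypothetical. the project may use a different fusion method. the cross-encoder re-ranking that follows is not shown.

```python
def min_max(scores):
    """Scale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(dense, sparse, alpha=0.5):
    """Blend dense and sparse scores; alpha=1.0 is dense only, 0.0 is sparse only."""
    dense_n = min_max(dense) if dense else {}
    sparse_n = min_max(sparse) if sparse else {}
    ids = set(dense_n) | set(sparse_n)
    # A chunk missing from one result list contributes 0 for that side.
    blended = {
        i: alpha * dense_n.get(i, 0.0) + (1 - alpha) * sparse_n.get(i, 0.0)
        for i in ids
    }
    # Highest blended score first; ties are broken by chunk id for stability.
    return sorted(blended, key=lambda i: (-blended[i], i))


dense = {"c1": 0.82, "c2": 0.40, "c3": 0.10}  # e.g. cosine similarities
sparse = {"c2": 7.5, "c3": 1.0, "c4": 3.0}    # e.g. BM25 scores
print(hybrid_rank(dense, sparse, alpha=0.5))  # ['c2', 'c1', 'c4', 'c3']
```

normalizing first matters because cosine similarities and BM25 scores live on different scales, so blending the raw values would let BM25 dominate.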
| layer | tool |
|---|---|
| LLM | OpenAI GPT (via openai SDK) |
| embeddings | sentence-transformers — all-MiniLM-L6-v2 |
| vector database | ChromaDB (persistent, local) |
| sparse search | BM25 via rank-bm25 |
| re-ranking | cross-encoder ms-marco-MiniLM-L-6-v2 |
| PDF parsing | pypdf + pymupdf + tesseract (OCR fallback for scanned PDFs) |
| backend | FastAPI |
| frontend | plain HTML / CSS / JS (no framework) |
    rag-qa/
    ├── app/
    │   ├── main.py              # FastAPI app factory
    │   ├── models.py            # pydantic request/response schemas
    │   ├── dependencies.py      # shared pipeline singleton
    │   └── routes/
    │       ├── documents.py     # upload, list, delete endpoints
    │       ├── qa.py            # ask endpoint
    │       └── health.py        # health check endpoint
    ├── static/
    │   └── index.html           # single-page web UI
    ├── config.py                # all tuneable settings
    ├── ingest.py                # document parsing and chunking
    ├── vectorstore.py           # ChromaDB wrapper
    ├── retrieval.py             # hybrid search + re-ranking
    ├── rag.py                   # main RAG pipeline
    ├── server.py                # entry point
    └── requirements.txt
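as a rough idea of what a "shared pipeline singleton" in dependencies.py could look like, here is a minimal sketch. the `RAGPipeline` class and `get_pipeline` function are hypothetical stand-ins, not the project's real code. the point is only that the expensive objects (embedding model, cross-encoder, ChromaDB client) are built once and reused by every request.

```python
from functools import lru_cache


class RAGPipeline:
    """Stand-in for the real pipeline in rag.py (hypothetical interface)."""

    def __init__(self):
        # The real pipeline would load the embedding model and the
        # cross-encoder and open the ChromaDB collection here.
        self.ready = True


@lru_cache(maxsize=1)
def get_pipeline() -> RAGPipeline:
    """Build the pipeline on the first call and return the same object afterwards."""
    return RAGPipeline()


# In a route module this could be injected with FastAPI's Depends, e.g.
#   def ask(pipeline: RAGPipeline = Depends(get_pipeline)): ...
print(get_pipeline() is get_pipeline())  # True
```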
1. install dependencies

       pip install -r requirements.txt

   tesseract is also required for OCR on scanned PDFs. install it with:

       # macOS
       brew install tesseract
       # ubuntu / debian
       sudo apt install tesseract-ocr

2. add your OpenAI API key

       cp .env.example .env
       # open .env and set OPENAI_API_KEY=sk-...

3. start the server

       uvicorn server:app --host 0.0.0.0 --port 8000

4. open the app

   go to http://localhost:8000 in your browser. upload a document from the sidebar, then type a question.
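if the server starts but uploads or questions fail, a quick check of the two external requirements from the steps above can help. this is an optional sketch and not part of the project. `read_dotenv` and `preflight` are hypothetical helper names, and the .env parsing is deliberately simplistic (plain `KEY=value` lines only).

```python
import os
import shutil


def read_dotenv(path=".env"):
    """Read simple KEY=value lines from a .env file, if it exists."""
    values = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    return values


def preflight(env):
    """Return a list of setup problems found (empty if none)."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (see step 2)")
    if shutil.which("tesseract") is None:
        problems.append("tesseract not found on PATH; OCR fallback will not work")
    return problems


if __name__ == "__main__":
    # Values from the process environment take precedence over .env.
    merged = {**read_dotenv(), **os.environ}
    for problem in preflight(merged):
        print("warning:", problem)
```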
all settings are in config.py. the most useful ones:
| setting | default | description |
|---|---|---|
| `openai_model` | `gpt-4o-mini` | which GPT model to use |
| `chunk_size` | 512 | tokens per chunk |
| `chunk_overlap` | 64 | token overlap between chunks |
| `top_k_rerank` | 5 | number of chunks passed to the LLM |
| `hybrid_alpha` | 0.5 | blend between dense (1.0) and sparse (0.0) search |
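to make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunking driven by the defaults in the table above. the `Settings` dataclass and `chunk_tokens` function are hypothetical. the real config.py and ingest.py may be structured differently, and a plain list of placeholder strings stands in for real tokenizer output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the settings table.
    openai_model: str = "gpt-4o-mini"
    chunk_size: int = 512
    chunk_overlap: int = 64
    top_k_rerank: int = 5
    hybrid_alpha: float = 0.5


def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


settings = Settings()
tokens = [f"t{i}" for i in range(1000)]  # stand-in for real tokenizer output
chunks = chunk_tokens(tokens, settings.chunk_size, settings.chunk_overlap)
print([len(c) for c in chunks])  # [512, 512, 104]
```

with the defaults, each new chunk starts 448 tokens after the previous one, so the last 64 tokens of one chunk are repeated at the start of the next. a larger overlap reduces the chance that an answer is split across a chunk boundary, at the cost of storing more chunks.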
possible future improvements:

- user authentication so multiple users can have separate document stores
- streaming responses so the answer appears word by word instead of all at once
- support for URLs and web pages as input sources
- multi-language support for non-english documents
- a document preview panel that highlights the cited passages
- conversation history so follow-up questions have context
- evaluation metrics to measure retrieval and answer quality
- docker setup for easier deployment