A retrieval-augmented generation (RAG) pipeline for querying research papers. Ask questions and get answers with citations pointing to the exact source passage. It runs fully locally using Ollama and Llama 3.2.
Instead of asking an LLM a question and hoping it knows the answer from training data, this system:
- Splits your PDFs into overlapping text chunks
- Converts each chunk into a vector embedding that captures its meaning
- Stores those embeddings locally in ChromaDB
- At query time, embeds your question and retrieves the most semantically similar chunks
- Passes those chunks to Llama 3.2 as context
- Returns an answer grounded only in the retrieved text, along with source citations
This means the model answers from the uploaded documents only, not from its general training knowledge.
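A minimal sketch of the query path using the stack listed below (the `papers` collection name and the top-k value are assumptions; query.py is the actual implementation):

```python
# Query-path sketch (illustrative; see query.py for the real implementation)
import chromadb
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

# Use the same embedding model as at ingest time, so query vectors are comparable
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="llama3.2", request_timeout=120.0)

# Open the persisted collection ("papers" is an assumed name)
collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("papers")
index = VectorStoreIndex.from_vector_store(ChromaVectorStore(chroma_collection=collection))

# Retrieve the top-k most similar chunks and answer from them alone
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What dataset does GraphMetaMat train on?")
print(response)
for node in response.source_nodes:  # citations back to the source passages
    print(node.node.metadata.get("file_name"), node.score)
```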
| Component | Tool |
|---|---|
| LLM | Llama 3.2 3B via Ollama |
| Embeddings | BAAI/bge-small-en-v1.5 via HuggingFace |
| Vector store | ChromaDB (persistent, on-disk) |
| Orchestration | LlamaIndex |
| Interface | Streamlit |
Requirements: Python 3.9+, Ollama installed (ollama.com)
```bash
# Clone the repo
git clone https://github.com/aeesh/paper-qa-system.git
cd paper-qa-system

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate    # Mac/Linux
venv\Scripts\activate       # Windows

# Install dependencies
pip install llama-index llama-index-vector-stores-chroma llama-index-embeddings-huggingface llama-index-llms-ollama chromadb streamlit pymupdf

# Pull the local model
ollama pull llama3.2
```

Step 1 — Add the PDFs
Drop the PDF papers into the papers/ folder.
Step 2 — Ingest (run once)
```bash
# In a separate terminal, start Ollama
ollama serve

# Back in your main terminal
python ingest.py
```

This reads your PDFs, chunks them, generates embeddings, and stores everything in chroma_db/.
You only need to run this again if you add new papers.
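In outline, ingest.py does something like the following (chunk sizes and the collection name are assumptions; the repo file is authoritative, and it uses pymupdf for PDF parsing, while this sketch uses LlamaIndex's default reader for brevity):

```python
# Ingest sketch (illustrative; ingest.py is the real implementation)
import chromadb
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Read every PDF in papers/ and split into overlapping chunks (sizes are assumed)
documents = SimpleDirectoryReader("papers").load_data()
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)

# Persist embeddings to chroma_db/ so ingest only has to run once
collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("papers")
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=collection)
)
VectorStoreIndex.from_documents(
    documents, transformations=[splitter], storage_context=storage_context
)
```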
Step 3 — Launch the interface
```bash
streamlit run app.py
```

A browser window opens. Type your question, click "Get Answer", and the answer appears with its source citations.
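A stripped-down version of what app.py wires together (`build_query_engine` is a hypothetical helper standing in for the query-path setup sketched earlier):

```python
# Minimal Streamlit front end, sketched; app.py is the real version
import streamlit as st

from query import build_query_engine  # hypothetical helper; see the query-path sketch above

@st.cache_resource  # build the engine once per session, not on every rerun
def load_engine():
    return build_query_engine()

st.title("Paper Q&A")
question = st.text_input("Ask a question about the papers")

if st.button("Get Answer") and question:
    with st.spinner("Retrieving and generating..."):
        response = load_engine().query(question)
    st.write(str(response))
    for node in response.source_nodes:  # show the retrieved passages as citations
        with st.expander(node.node.metadata.get("file_name", "source")):
            st.write(node.node.get_text())
```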
Or query from terminal
```bash
python query.py
```

The system was evaluated on 34 domain-specific questions across 5 research papers in materials science and AI: GraphMetaMat, DiffuMeta, two high-entropy wolframite oxide papers, and a Quantum ESPRESSO tutorial.
| Setting | Automated Accuracy |
|---|---|
| top_k = 3 (retrieve 3 chunks) | 44.1% (15/34) |
| top_k = 5 (retrieve 5 chunks) | 50.0% (17/34) |
The automated scorer checks for key numbers, acronyms, and phrases from the expected answer. Manual review of the full results puts real accuracy slightly higher, since the scorer misses paraphrased correct answers.
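For illustration, such a scorer can be approximated as below (the regex and the pass threshold are assumptions; evaluate.py holds the actual logic):

```python
# Illustrative keyword-match scorer; evaluate.py is the real implementation
import re

def keyword_score(answer: str, expected: str, threshold: float = 0.5) -> bool:
    """Pass if enough key tokens (numbers, acronyms, longer words) from the
    expected answer appear in the generated answer."""
    keys = set(re.findall(r"\d+(?:\.\d+)?|[A-Z]{2,}|\w{6,}", expected))
    if not keys:  # no extractable key tokens: fall back to substring match
        return expected.lower() in answer.lower()
    hits = sum(1 for k in keys if k.lower() in answer.lower())
    return hits / len(keys) >= threshold
```

A scorer like this is deliberately strict about surface forms, which is why paraphrased but correct answers get marked wrong.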
Main failure modes:
- Retrieval misses: For questions about specific methods or numerical details, the relevant passage sometimes isn't in the top-k chunks retrieved. Increasing k helps but doesn't fully solve it.
- Cross-paper ambiguity: Two papers cover similar wolframite materials with different measurements. The model sometimes retrieves from the wrong one.
- Model size: Llama 3.2 3B will sometimes hallucinate rather than say "I don't know." A larger model would significantly improve factual accuracy on these technical questions.
Full evaluation results are in eval_results.json. Questions and expected answers are in eval_dataset.json.
```
paper-qa-system/
├── papers/             # PDF files go here
├── ingest.py           # Read PDFs, chunk, embed, store in ChromaDB
├── query.py            # Single question from terminal
├── evaluate.py         # Run full eval dataset, compute accuracy
├── app.py              # Streamlit web interface
├── eval_dataset.json   # Evaluation questions with expected answers
├── eval_results.json   # Evaluation results
├── util.py             # Helper functions
└── .gitignore
```
Future improvements:
- Semantic chunking instead of fixed-size splitting — keeping sentences and paragraphs intact would improve retrieval precision
- Reranking — after retrieving top-k chunks, a cross-encoder reranker could reorder them by relevance before passing them to the LLM (see the sketch after this list)
- Larger model — swapping Llama 3.2 3B for a 13B or 70B model (given sufficient hardware) would substantially improve factual accuracy on technical questions
- Hybrid search — combining dense vector search with keyword (BM25) search would help for questions about specific numbers or named methods
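As a sketch of the reranking idea using LlamaIndex's built-in cross-encoder postprocessor (requires sentence-transformers; the model name and top_n are example choices, and `index` is the VectorStoreIndex from the query-path sketch above):

```python
from llama_index.core.postprocessor import SentenceTransformerRerank

# Over-retrieve with dense search, then let a cross-encoder pick the best chunks
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",  # example cross-encoder
    top_n=3,                                       # chunks actually sent to the LLM
)
query_engine = index.as_query_engine(
    similarity_top_k=10,            # cast a wider net before reranking
    node_postprocessors=[rerank],
)
```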