Upload your documents. Ask questions. Get answers grounded in your own content.
A full RAG pipeline built from scratch. No managed vector database, no third-party retrieval service. Local embeddings, FAISS, and a clean chat interface on top.
flowchart TD
A[User uploads .txt / .md files] --> B[Document Loader]
B --> C[Chunker\n500 char chunks, 50 char overlap]
C --> D[Embedder\nall-MiniLM-L6-v2 local model]
D --> E[FAISS Index\nsaved per session UUID]
F[User asks a question] --> G[Query Embedder]
G --> H[FAISS Similarity Search\nTop-K retrieval]
E --> H
H --> I[Prompt Builder]
I --> J[LLM Generation]
J --> K[Answer returned to chat UI]
style A fill:#1e293b,color:#f8fafc,stroke:#334155
style F fill:#1e293b,color:#f8fafc,stroke:#334155
style E fill:#0f172a,color:#f8fafc,stroke:#6366f1
style K fill:#0f172a,color:#f8fafc,stroke:#22c55e
Each upload creates an isolated session with its own FAISS index. Sessions are UUID-based with path traversal protection built in.
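As an illustration of the session isolation described above, here is a minimal sketch of how a session ID could be mapped to its own directory while rejecting path traversal. The names `UPLOADS_ROOT` and `session_dir` are hypothetical and not taken from the project code; the directory simply mirrors the `data/uploads/` folder in the project layout.

```python
import uuid
from pathlib import Path

# Hypothetical root for per-session indexes (assumed, mirrors data/uploads/).
UPLOADS_ROOT = Path("data/uploads")


def session_dir(session_id: str, root: Path = UPLOADS_ROOT) -> Path:
    """Return the directory for a session, or raise ValueError if the ID is unsafe."""
    try:
        # Strict parse: strings such as "../../etc/passwd" are rejected here.
        parsed = uuid.UUID(session_id)
    except (ValueError, AttributeError, TypeError):
        raise ValueError("session_id must be a valid UUID")

    base = root.resolve()
    # Build the path from the canonical form of the UUID, never from raw input.
    candidate = (base / str(parsed)).resolve()

    # Defence in depth: the result must be a direct child of the uploads root.
    if candidate.parent != base:
        raise ValueError("session path escapes the uploads directory")
    return candidate
```

Parsing the ID before touching the filesystem means only 32 hex digits can ever reach the path, and the parent check catches anything the parse might miss.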
| Layer | Technology |
|---|---|
| Backend | Python, FastAPI |
| Vector Search | FAISS (flat L2 index) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2, runs locally) |
| Frontend | React, Vite, Tailwind CSS |
| Validation | Pydantic |
| Session Management | UUID-based, localStorage |
grounded/
├── backend/
│   ├── app/
│   │   ├── ingestion/     # loader, chunker, embedder, indexer
│   │   ├── retrieval/     # FAISS retriever
│   │   ├── generation/    # prompt builder, LLM call
│   │   ├── config.py      # all settings in one place
│   │   └── main.py        # FastAPI routes
│   └── data/
│       ├── raw/           # source documents
│       ├── processed/     # global vector store
│       └── uploads/       # per-session indexes
└── frontend/
    └── src/
        ├── components/    # ChatWindow, UploadPage, InputBox
        └── services/      # API calls
Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload
Frontend
cd frontend
npm install
npm run dev
Backend runs on http://localhost:8000. Frontend on http://localhost:5173.
POST /upload
Content-Type: multipart/form-data
session_id: string (UUID)
files: .txt or .md files (max 10MB each, max 20 files per upload)
POST /chat
Content-Type: application/json
{
  "question": "your question here",
  "session_id": "your-session-uuid"
}
GET /health
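For reference, a chat request matching the JSON body documented above could be built with the Python standard library as sketched below. The helper names `build_chat_request` and `ask` are hypothetical. The response schema is not documented here, so the sketch returns whatever JSON the server sends back without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup notes


def build_chat_request(question: str, session_id: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request with a JSON body."""
    payload = json.dumps({"question": question, "session_id": session_id})
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, session_id: str):
    """Send the request and return the parsed JSON response as-is."""
    request = build_chat_request(question, session_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("What does the report conclude?", session_id)` requires the backend to be running and the session to have documents uploaded already.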
All settings live in backend/app/config.py:
| Setting | Default | What it controls |
|---|---|---|
| CHUNK_SIZE | 500 | Characters per chunk |
| CHUNK_OVERLAP | 50 | Characters of overlap between consecutive chunks |
| EMBEDDING_MODEL_NAME | all-MiniLM-L6-v2 | Local embedding model |
| TOP_K | 5 | Chunks retrieved per query |
| MAX_FILE_SIZE_BYTES | 10MB | Per-file upload limit |
| MAX_FILES_PER_UPLOAD | 20 | Files per upload |
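To show how the upload limits could be enforced, here is a minimal validation sketch. The function `validate_upload` and the constant `ALLOWED_EXTENSIONS` are hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption; the actual checks in the backend may differ.

```python
from pathlib import Path
from typing import List, Tuple

MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB
MAX_FILES_PER_UPLOAD = 20
ALLOWED_EXTENSIONS = {".txt", ".md"}    # hypothetical name for the allowed types


def validate_upload(files: List[Tuple[str, int]]) -> List[str]:
    """Check (filename, size_in_bytes) pairs; return error messages, empty if valid."""
    errors = []
    if len(files) > MAX_FILES_PER_UPLOAD:
        errors.append(f"too many files: {len(files)} > {MAX_FILES_PER_UPLOAD}")
    for name, size in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: only .txt and .md files are accepted")
        if size > MAX_FILE_SIZE_BYTES:
            errors.append(f"{name}: exceeds the per-file size limit")
    return errors
```

Returning a list of messages rather than raising on the first problem lets the UI report every rejected file in one response.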
Chunking strategy matters more than most people expect. Fixed character chunking with overlap is simple and works well for most plain text. The real tradeoff is chunk size: too small and each chunk loses the context needed to answer a question, too large and retrieval gets noisy.
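Fixed character chunking with overlap can be sketched in a few lines. This is a simplified illustration using the default values of 500 and 50, not necessarily the chunker in `ingestion/`; the function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, the window advances 450 characters at a time, so the last 50 characters of each chunk are repeated at the start of the next. A sentence that straddles a boundary is therefore more likely to appear whole in at least one chunk.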
Local embeddings with MiniLM keep latency low and cost at zero. For this scale it works well. At production scale with millions of chunks you would move to approximate nearest neighbour search and a proper vector database like Pinecone or Weaviate.
FAISS flat L2 search is exact: it compares the query against every stored vector, so there is no approximation error. It is the right choice here. At scale you would switch to IVF or HNSW indexes depending on your latency and accuracy tradeoffs.
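Conceptually, a flat L2 index does nothing more than an exhaustive distance comparison. The sketch below reproduces that idea in plain Python rather than calling FAISS, so the function names are illustrative only. Like FAISS flat L2 indexes, it ranks by squared L2 distance, which gives the same ordering as the true distance.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors: Sequence[Sequence[float]],
                   query: Sequence[float],
                   top_k: int = 5) -> List[Tuple[float, int]]:
    """Return the top_k nearest vectors as (squared_distance, index), closest first."""
    # Every stored vector is scored, which is why the result is exact
    # and why cost grows linearly with the number of chunks.
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(top_k, scored)
```

Because every query touches every vector, this approach is fine for per-session indexes of a few thousand chunks. At larger scale, IVF and HNSW avoid that linear scan by giving up exactness.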
Built by Abdullah Khalid