AI-powered Python programming Q&A grounded in 50,000 real Stack Overflow answers
Python Q&A Assistant is a production-ready RAG (Retrieval-Augmented Generation) system that answers Python programming questions using real Stack Overflow data as its knowledge base. Instead of relying on an LLM's pre-trained knowledge, every answer is grounded in retrieved community-verified solutions, with source citations included in the response.
Built as part of the Analytics Vidhya AI Engineer Assessment.
Deployed URL: https://xlm5bw51-5001.inc1.devtunnels.ms/
Dashboard showing status, document count, LLM model, embedding model, and live Q&A interface
User Query
│
▼
POST /ask (FastAPI — async)
│
├─► Embed query ──► all-MiniLM-L6-v2
│ │
│ ▼
│ ChromaDB vector store
│ (50,000 SO documents)
│ │
│ ▼
│ Top-K retrieval (K=5)
│ with cosine similarity
│
├─► Build prompt with retrieved context + citations
│
▼
Groq API — llama-3.1-8b-instant
│
▼
JSON response
{ answer, sources[ ], latency_ms, model }
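The retrieval and prompt-building steps in the diagram can be sketched in plain Python. This is a minimal illustration, not the project's code: the toy 3-dimensional vectors stand in for the 384-dimensional all-MiniLM-L6-v2 embeddings, a list scan stands in for the ChromaDB query, and the names `cosine_similarity`, `top_k`, `build_prompt`, and `DOCS` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=5):
    """Return the k (score, document) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, d["embedding"]), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, hits):
    """Number each retrieved document so the answer can cite it as [n]."""
    context = "\n\n".join(
        f"[{i}] {doc['title']} (relevance {score:.2f})\n{doc['text']}"
        for i, (score, doc) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy corpus with hand-made embeddings (assumption: illustration only).
DOCS = [
    {"title": "Merge two dicts", "text": "Use {**a, **b}.", "embedding": [1.0, 0.0, 0.0]},
    {"title": "Reverse a list", "text": "Use lst[::-1].", "embedding": [0.0, 1.0, 0.0]},
    {"title": "Read a CSV", "text": "Use pandas.read_csv.", "embedding": [0.0, 0.0, 1.0]},
]

hits = top_k([0.9, 0.1, 0.0], DOCS, k=2)
print(build_prompt("How do I merge two dictionaries?", hits))
```

Numbering the sources in the prompt is one simple way to let the model refer back to specific documents, which the API can then map to the `sources` array in the response.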
Design decisions:
- ChromaDB over Pinecone: persistent local vector store, no cloud dependency for development, trivially swappable for production
- all-MiniLM-L6-v2: 384-dim embeddings, 5× faster than text-embedding-ada-002 at comparable retrieval quality for technical Q&A
- Groq / llama-3.1-8b-instant: sub-second inference with a free API tier; drop-in replaceable with any OpenAI-compatible endpoint
- Async FastAPI: non-blocking I/O throughout; Groq SDK and ChromaDB queries run without blocking the event loop
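One common way to keep a blocking library call from stalling an async handler is to push it onto a worker thread with `asyncio.to_thread`. The sketch below shows that general pattern only; whether `app/rag.py` does it this way is an assumption, and `blocking_retrieve` is a hypothetical stand-in for a synchronous vector-store query.

```python
import asyncio
import time


def blocking_retrieve(question):
    """Hypothetical stand-in for a synchronous vector-store query."""
    time.sleep(0.2)  # simulate I/O or CPU work that would block the event loop
    return [f"doc for: {question}"]


async def retrieve(question):
    # Run the blocking call in a worker thread so the event loop stays free
    # to serve other requests in the meantime.
    return await asyncio.to_thread(blocking_retrieve, question)


async def main():
    start = time.perf_counter()
    # Three retrievals run concurrently instead of one after another.
    results = await asyncio.gather(*(retrieve(f"q{i}") for i in range(3)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

Because the three calls overlap, the total time should be close to a single call rather than the sum of all three.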
python-qa-assistant/
├── app/
│ ├── main.py # FastAPI app, lifespan, routers
│ ├── rag.py # RAG pipeline (embed → retrieve → generate)
│ ├── ingest.py # ChromaDB ingestion logic
│ └── config.py # Pydantic settings from .env
├── notebooks/
│ └── test_queries.ipynb # 10 test queries with responses & observations
├── scripts/
│ └── ingest.py # CLI: download SO data → build vector store
├── tests/
│ ├── test_api.py # Unit tests (health, /ask schema, validation)
│ └── test_rag.py # Integration tests (live retrieval + generation)
├── docs/
│ └── screenshots/ # Dashboard and test result screenshots
├── .env.example # Required environment variables (template)
├── .gitignore
├── conftest.py # Pytest fixtures (test client, mock embedder)
├── docker-compose.yml
├── Dockerfile
├── pytest.ini
├── requirements.txt
└── README.md
- Python 3.11+
- A free Groq API key
- The Stack Overflow Python Questions dataset from Kaggle
git clone https://github.com/Harshitraiii2005/Python-Q-A-Assistant
cd Python-Q-A-Assistant
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Open .env and set your GROQ_API_KEY
Download Questions.csv and Answers.csv from Kaggle, place them in ./data/, then run:
python scripts/ingest.py --data-dir data --out data/python_qa_sample.csv
This builds the ChromaDB vector store at ./chroma_db/ (~2–3 minutes for 50k documents).
uvicorn app.main:app --reload --port 8000
Interactive docs at http://localhost:8000/docs
cp .env.example .env # set GROQ_API_KEY
docker compose up --build # mounts ./data and persists chroma_db
The docker-compose includes a health check (GET /health) with auto-restart on failure.
GET /health returns service status, document count, and model configuration.
{
"status": "ready",
"documents": 50000,
"llm_model": "llama-3.1-8b-instant",
"embedding_model": "all-MiniLM-L6-v2",
"top_k": 5
}
POST /ask accepts a Python question and returns a grounded answer with Stack Overflow citations.
Request
{ "question": "How do I merge two dictionaries in Python 3?" }
Response
{
"question": "How do I merge two dictionaries in Python 3?",
"answer": "In Python 3.9+ you can use the merge operator `|`:\n\n```python\nmerged = dict_a | dict_b\n```\n\nFor earlier versions, use `{**dict_a, **dict_b}` or `dict_a.update(dict_b)`.",
"sources": [
{
"title": "How to merge two dictionaries in a single expression",
"so_id": "38987",
"score": 4521,
"relevance": 0.94,
"url": "https://stackoverflow.com/a/38987"
}
],
"latency_ms": 681,
"model": "llama-3.1-8b-instant"
}
GET /docs serves the OpenAPI interactive documentation (Swagger UI).
GET / is the root info endpoint; it returns the version and available routes.
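The /ask request and response shapes shown in this section can be exercised from Python with the standard library alone. This is a sketch under stated assumptions: the helper names `build_ask_request`, `summarize`, and `ask` are hypothetical, and the base URL assumes the server from Quick Start is running locally on port 8000.

```python
import json
import urllib.request


def build_ask_request(base_url, question):
    """Build a POST /ask request carrying the question as a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response):
    """Format the answer followed by one line per cited source."""
    lines = [response["answer"], "", "Sources:"]
    for src in response["sources"]:
        lines.append(f"- {src['title']} ({src['url']}, relevance {src['relevance']:.2f})")
    lines.append(f"Answered by {response['model']} in {response['latency_ms']} ms")
    return "\n".join(lines)


def ask(base_url, question, timeout=30):
    """Send the question to a running server and return the parsed JSON response."""
    request = build_ask_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage, with the server running (not executed here):
#   reply = ask("http://localhost:8000", "How do I merge two dictionaries in Python 3?")
#   print(summarize(reply))
```

Keeping request construction separate from the network call makes the client easy to check without a live server.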
Ten diverse queries were tested, covering core Python topics, edge cases, and off-topic inputs. Full responses and observations are in notebooks/test_queries.ipynb.
Off-topic query — system correctly returns low-relevance results and flags uncertainty
Ambiguous query — system retrieves general performance optimization docs
Observations summary:
| # | Query type | Latency | Quality | Notes |
|---|---|---|---|---|
| 1 | CSV / pandas | 681ms | ✅ High | Correct read_csv with parameters |
| 2 | List reverse | 590ms | ✅ High | Shows [::-1], reverse(), and reversed() |
| 3 | Exceptions | 720ms | ✅ High | Covers try/except/finally with examples |
| 4 | @staticmethod vs @classmethod | 840ms | ✅ High | Clear distinction with code examples |
| 5 | Async/await | 910ms | ✅ High | Accurate, cites relevant SO threads |
| 6 | Decorators | 780ms | ✅ High | Wraps explanation with functools.wraps |
| 7 | Dict key lookup | 560ms | ✅ High | Correctly recommends in over .get() |
| 8 | List comprehension | 640ms | ✅ High | Multiple examples with filtering |
| 9 | Off-topic (France) | 430ms | ✅ Handled | Low-relevance docs returned with uncertainty flagged correctly |
| 10 | Ambiguous (slow code) | 870ms | ✅ Handled | Useful profiling advice with relevant SO citations |
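For a quick aggregate view of the latency column, the ten values from the table can be summarized with the standard library. The figures are copied from the table; the summary itself is only descriptive arithmetic over a small sample.

```python
import statistics

# Latencies in milliseconds for queries 1-10, in table order.
latencies_ms = [681, 590, 720, 840, 910, 780, 560, 640, 430, 870]

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)

print(f"mean={mean_ms:.1f} ms, median={median_ms:.1f} ms, max={max(latencies_ms)} ms")
# → mean=702.1 ms, median=700.5 ms, max=910 ms
```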
| Layer | Current | At Scale |
|---|---|---|
| API workers | Single Uvicorn process | Multiple workers via gunicorn -w 4 -k uvicorn.workers.UvicornWorker behind Nginx |
| Embeddings | Per-request inference | Cache frequent query embeddings in Redis (TTL 1h); batch similar queries |
| Vector DB | Local ChromaDB | Migrate to Pinecone or Qdrant Cloud — managed, horizontally scalable |
| LLM calls | Synchronous Groq SDK | Async SDK with connection pooling; add request queue for burst traffic |
| Response cache | None | Redis semantic cache — deduplicate near-identical queries before hitting LLM |
| Infra | Single container | Kubernetes HPA (auto-scale on CPU/RPS) or AWS ECS Fargate |
| Cost control | Free Groq tier | Prompt caching for system prompt; response caching for repeated queries |
| Observability | None | OpenTelemetry traces, Prometheus metrics, Grafana dashboards |
Estimated throughput at scale: for 100 concurrent users, roughly 4 Uvicorn workers plus a Redis cache (cutting LLM calls by about 40% on repeat queries) should keep P95 latency under 2 s.
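The embedding cache in the table can be illustrated without Redis. The sketch below uses an in-memory dictionary with a one-hour expiry as a stand-in; `TTLCache`, `normalize`, `cached_embed`, and `fake_embed` are hypothetical names, and a real deployment would swap the dictionary for Redis calls.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def normalize(question):
    """Collapse case and whitespace so trivially different queries share a key."""
    return " ".join(question.lower().split())


def cached_embed(question, cache, embed_fn):
    """Return a cached embedding when available, otherwise compute and store it."""
    key = normalize(question)
    vector = cache.get(key)
    if vector is None:
        vector = embed_fn(question)
        cache.set(key, vector)
    return vector


calls = []


def fake_embed(text):
    """Hypothetical embedder that records how often it is called."""
    calls.append(text)
    return [float(len(text))]


cache = TTLCache(ttl_seconds=3600)
cached_embed("How do I reverse a list?", cache, fake_embed)
cached_embed("how do I  reverse a list?", cache, fake_embed)  # differs only in case and spacing
print(len(calls))  # → 1
```

Normalizing the key only catches trivially different queries; the semantic cache in the table would instead compare embeddings against a similarity threshold.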
# Unit tests only (no server required)
pytest tests/ -v -m "not integration"
# Full suite including integration tests (requires running server)
uvicorn app.main:app &
pytest tests/ -v
# With coverage report
pytest tests/ --cov=app --cov-report=term-missing
- Push to GitHub
- New Web Service → connect this repo
- Set environment variables: GROQ_API_KEY, MAX_DOCUMENTS, TOP_K
- Build command: pip install -r requirements.txt
- Start command: uvicorn app.main:app --host 0.0.0.0 --port $PORT
Note: Render's free tier has 512MB RAM. Set MAX_DOCUMENTS=10000 for the free deployment; use a paid instance for the full 50k corpus.
For Hugging Face Spaces, use the Docker SDK option and point it to the Dockerfile in this repo. Set secrets via the Spaces UI.
See .env.example for all variables. Required:
| Variable | Description | Default |
|---|---|---|
| GROQ_API_KEY | Your Groq API key | (none) |
| LLM_MODEL | Groq model ID | llama-3.1-8b-instant |
| EMBEDDING_MODEL | Sentence transformer model | all-MiniLM-L6-v2 |
| CHROMA_PERSIST_DIR | ChromaDB storage path | ./chroma_db |
| DATA_PATH | Processed CSV path | ./data/python_qa_sample.csv |
| MAX_DOCUMENTS | Documents to index | 50000 |
| TOP_K | Chunks retrieved per query | 5 |
| PORT | API port | 8000 |
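The variables and defaults above could be loaded along these lines. The project layout says `app/config.py` uses Pydantic settings; this sketch deliberately uses only the standard library as a stand-in, so the `Settings` dataclass and `load_settings` function are hypothetical and not the project's actual classes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Configuration values with the defaults listed in the table."""

    groq_api_key: str
    llm_model: str = "llama-3.1-8b-instant"
    embedding_model: str = "all-MiniLM-L6-v2"
    chroma_persist_dir: str = "./chroma_db"
    data_path: str = "./data/python_qa_sample.csv"
    max_documents: int = 50000
    top_k: int = 5
    port: int = 8000


def load_settings(env=None):
    """Build Settings from an environment mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    if not env.get("GROQ_API_KEY"):
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        llm_model=env.get("LLM_MODEL", Settings.llm_model),
        embedding_model=env.get("EMBEDDING_MODEL", Settings.embedding_model),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", Settings.chroma_persist_dir),
        data_path=env.get("DATA_PATH", Settings.data_path),
        # Environment values arrive as strings, so numeric fields are converted.
        max_documents=int(env.get("MAX_DOCUMENTS", Settings.max_documents)),
        top_k=int(env.get("TOP_K", Settings.top_k)),
        port=int(env.get("PORT", Settings.port)),
    )


# Example: override the corpus size, as suggested for a memory-constrained host.
settings = load_settings({"GROQ_API_KEY": "test-key", "MAX_DOCUMENTS": "10000"})
print(settings.max_documents, settings.top_k)  # → 10000 5
```

Failing fast on a missing GROQ_API_KEY surfaces a misconfigured deployment at startup rather than on the first /ask request.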
| Component | Technology | Why |
|---|---|---|
| API framework | FastAPI | Async, auto-docs, Pydantic validation |
| Vector store | ChromaDB | Persistent local DB, no infra overhead |
| Embeddings | all-MiniLM-L6-v2 | Fast, lightweight, strong semantic similarity |
| LLM | Groq / llama-3.1-8b-instant | Free tier, <1s inference, OpenAI-compatible |
| Data source | Stack Overflow (50k Q&A pairs) | Community-verified, domain-specific |
| Containerisation | Docker + docker-compose | One-command reproducible setup |
| Testing | pytest + pytest-asyncio | Unit + integration coverage |
MIT — see LICENSE.







