Python Q&A Assistant

AI-powered Python programming Q&A grounded in 50,000 real Stack Overflow answers

Live Demo · Quick Start · API Reference · Test Results · Scaling

Overview

Python Q&A Assistant is a production-ready RAG (Retrieval-Augmented Generation) system that answers Python programming questions using real Stack Overflow data as its knowledge base. Instead of relying on an LLM's pre-trained knowledge, every answer is grounded in retrieved community-verified solutions, with source citations included in the response.

Built as part of the Analytics Vidhya AI Engineer Assessment.

Live Demo

Deployed URL: https://xlm5bw51-5001.inc1.devtunnels.ms/

Dashboard showing status, document count, LLM model, embedding model, and live Q&A interface

Architecture

User Query
    │
    ▼
POST /ask  (FastAPI — async)
    │
    ├─► Embed query ──► all-MiniLM-L6-v2
    │                        │
    │                        ▼
    │              ChromaDB vector store
    │              (50,000 SO documents)
    │                        │
    │                        ▼
    │              Top-K retrieval (K=5)
    │              with cosine similarity
    │
    ├─► Build prompt with retrieved context + citations
    │
    ▼
Groq API — llama-3.1-8b-instant
    │
    ▼
JSON response
{ answer, sources[ ], latency_ms, model }
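The retrieval and prompt-building steps in the diagram can be sketched in plain Python. This is an illustrative brute-force stand-in for what ChromaDB does with its index, not the code in app/rag.py. The toy 3-dimensional vectors replace real 384-dim all-MiniLM-L6-v2 embeddings, and `Doc`, `top_k`, `build_prompt` and the prompt wording are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    so_id: str
    title: str
    text: str
    vector: list[float]  # toy 3-dim here; 384-dim with all-MiniLM-L6-v2


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], docs: list[Doc], k: int = 5) -> list[tuple[Doc, float]]:
    """Score every document and keep the k most similar (brute force, no ANN index)."""
    scored = [(doc, cosine(query_vec, doc.vector)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: list[tuple[Doc, float]]) -> str:
    """Number each retrieved answer so the model can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] {doc.title} (https://stackoverflow.com/a/{doc.so_id})\n{doc.text}"
        for i, (doc, _score) in enumerate(hits, start=1)
    )
    return (
        "Answer the Python question using only the Stack Overflow context below. "
        "Cite sources by number. If the context is not relevant, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    Doc("1", "Merge two dicts", "Use a | b in Python 3.9+", [1.0, 0.0, 0.0]),
    Doc("2", "Reverse a list", "Use lst[::-1]", [0.0, 1.0, 0.0]),
    Doc("3", "Dict union operator", "dict | dict returns a new dict", [0.9, 0.1, 0.0]),
]
hits = top_k([1.0, 0.05, 0.0], docs, k=2)
print([doc.so_id for doc, _ in hits])  # ['1', '3']
```

Numbering the context blocks gives the model stable labels to cite, and each label maps back to an `so_id` that can populate the `sources` array of the response.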

Design decisions:

ChromaDB over Pinecone — persistent local vector store, no cloud dependency for development, trivially swappable for production
all-MiniLM-L6-v2 — 384-dim embeddings, 5× faster than text-embedding-ada-002 at comparable retrieval quality for technical Q&A
Groq / llama-3.1-8b-instant — sub-second inference with a free API tier; drop-in replaceable with any OpenAI-compatible endpoint
Async FastAPI — non-blocking I/O throughout; Groq SDK and ChromaDB queries run without blocking the event loop
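The Scaling table lists the current Groq SDK calls as synchronous, and the ChromaDB client and sentence-transformers encoding are blocking calls as well. A minimal sketch, assuming a hypothetical `blocking_llm_call` in place of the real SDK call, of how `asyncio.to_thread` keeps such work off the event loop so that concurrent requests overlap:

```python
import asyncio
import time


def blocking_llm_call(question: str) -> str:
    """Hypothetical stand-in for a synchronous SDK call such as a Groq completion."""
    time.sleep(0.2)  # simulate network latency while holding the thread
    return f"answer to: {question}"


async def ask(question: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in the default
    # thread pool, so the event loop keeps serving other requests meanwhile.
    return await asyncio.to_thread(blocking_llm_call, question)


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    answers = await asyncio.gather(*(ask(f"q{i}") for i in range(5)))
    return list(answers), time.perf_counter() - start


answers, elapsed = asyncio.run(main())
# The five 0.2 s calls overlap, so elapsed is close to 0.2 s rather than 1 s.
print(answers[0], f"{elapsed:.2f}s")
```

FastAPI gets a similar effect automatically for endpoints declared with plain `def`, which it runs in a thread pool; switching to an async LLM client, as the Scaling table suggests, avoids the thread hop for that call.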

Project Structure

python-qa-assistant/
├── app/
│   ├── main.py          # FastAPI app, lifespan, routers
│   ├── rag.py           # RAG pipeline (embed → retrieve → generate)
│   ├── ingest.py        # ChromaDB ingestion logic
│   └── config.py        # Pydantic settings from .env
├── notebooks/
│   └── test_queries.ipynb   # 10 test queries with responses & observations
├── scripts/
│   └── ingest.py        # CLI: download SO data → build vector store
├── tests/
│   ├── test_api.py      # Unit tests (health, /ask schema, validation)
│   └── test_rag.py      # Integration tests (live retrieval + generation)
├── docs/
│   └── screenshots/     # Dashboard and test result screenshots
├── .env.example         # Required environment variables (template)
├── .gitignore
├── conftest.py          # Pytest fixtures (test client, mock embedder)
├── docker-compose.yml
├── Dockerfile
├── pytest.ini
├── requirements.txt
└── README.md

Quick Start

Prerequisites

Python 3.11+
A free Groq API key
The Stack Overflow Python Questions dataset from Kaggle

1. Clone and install

git clone https://github.com/Harshitraiii2005/Python-Q-A-Assistant
cd Python-Q-A-Assistant
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Open .env and set your GROQ_API_KEY

3. Ingest the dataset

Download Questions.csv and Answers.csv from Kaggle, place them in ./data/, then run:

python scripts/ingest.py --data-dir data --out data/python_qa_sample.csv

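This writes the processed CSV to data/python_qa_sample.csv and builds the ChromaDB vector store at ./chroma_db/ (roughly 2–3 minutes for 50k documents; embedding time varies with hardware).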
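A minimal standard-library sketch of the question/answer join the ingest step has to perform. It assumes the Kaggle files contain the columns named in the docstring and that each question is paired with its highest-scoring answer; `best_answers`, the document fields and the sort-then-cap handling of `MAX_DOCUMENTS` are illustrative assumptions, not the actual logic of scripts/ingest.py. The fields mirror the `sources` objects returned by `/ask`.

```python
import csv
import io
from typing import Iterable


def best_answers(questions_csv: Iterable[str], answers_csv: Iterable[str],
                 max_docs: int = 50_000) -> list[dict]:
    """Pair each question with its highest-scoring answer.

    Assumed columns: Questions(Id, Title, Body), Answers(Id, ParentId, Score, Body).
    """
    best: dict[str, dict] = {}  # question Id -> answer row with the top Score so far
    for row in csv.DictReader(answers_csv):
        qid = row["ParentId"]
        if qid not in best or int(row["Score"]) > int(best[qid]["Score"]):
            best[qid] = row

    docs = []
    for q in csv.DictReader(questions_csv):
        ans = best.get(q["Id"])
        if ans is None:
            continue  # unanswered questions add nothing worth retrieving
        docs.append({
            "so_id": ans["Id"],
            "title": q["Title"],
            "text": f"Q: {q['Title']}\n{q['Body']}\n\nA: {ans['Body']}",
            "score": int(ans["Score"]),
            "url": f"https://stackoverflow.com/a/{ans['Id']}",
        })
    # Keep the highest-voted pairs when capping at max_docs.
    docs.sort(key=lambda d: d["score"], reverse=True)
    return docs[:max_docs]


questions = io.StringIO("Id,Title,Body\n10,Reverse a list?,<p>How?</p>\n11,Unanswered,<p>?</p>\n")
answers = io.StringIO("Id,ParentId,Score,Body\n100,10,3,<p>lst[::-1]</p>\n101,10,9,<p>reversed(lst)</p>\n")
docs = best_answers(questions, answers)
print(docs[0]["so_id"], docs[0]["url"])  # 101 https://stackoverflow.com/a/101
```

For the real files, pass `open("data/Answers.csv", newline="")` and the questions file opened the same way; `newline=""` lets the csv module handle multi-line HTML bodies inside quoted fields.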

4. Run the API

uvicorn app.main:app --reload --port 8000

Interactive docs at http://localhost:8000/docs

5. Docker (recommended)

cp .env.example .env          # set GROQ_API_KEY
docker compose up --build     # mounts ./data and persists chroma_db

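The docker-compose file includes a health check (GET /health) and a restart policy. Note that Docker restarts a container when its process exits; a failing health check only marks the container as unhealthy unless an orchestrator or watchdog acts on that status.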

API Reference

`GET /health`

Returns service status, document count, and model configuration.

{
  "status": "ready",
  "documents": 50000,
  "llm_model": "llama-3.1-8b-instant",
  "embedding_model": "all-MiniLM-L6-v2",
  "top_k": 5
}

`POST /ask`

Accepts a Python question and returns a grounded answer with Stack Overflow citations.

Request

{ "question": "How do I merge two dictionaries in Python 3?" }

Response

{
  "question": "How do I merge two dictionaries in Python 3?",
  "answer": "In Python 3.9+ you can use the merge operator `|`:\n\n```python\nmerged = dict_a | dict_b\n```\n\nFor earlier versions, use `{**dict_a, **dict_b}` or `dict_a.update(dict_b)`.",
  "sources": [
    {
      "title": "How to merge two dictionaries in a single expression",
      "so_id": "38987",
      "score": 4521,
      "relevance": 0.94,
      "url": "https://stackoverflow.com/a/38987"
    }
  ],
  "latency_ms": 681,
  "model": "llama-3.1-8b-instant"
}
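A small standard-library client for this endpoint, shown as a sketch: `build_ask_request`, `format_citations` and `ask` are hypothetical helper names, and the base URL assumes the local uvicorn command from Quick Start. The field names come from the response schema above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: local uvicorn from Quick Start


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /ask request with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_citations(payload: dict) -> list[str]:
    """Turn the `sources` array of a /ask response into readable citation lines."""
    return [
        f"{s['title']} (score {s['score']}, relevance {s['relevance']:.2f}): {s['url']}"
        for s in payload.get("sources", [])
    ]


def ask(question: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the question to a running server and return the decoded JSON."""
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires the API to be running):
#   result = ask("How do I merge two dictionaries in Python 3?")
#   print(result["answer"]); print("\n".join(format_citations(result)))
```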

`GET /docs`

OpenAPI interactive documentation (Swagger UI).

`GET /`

Root info endpoint — returns version and available routes.

Test Results

Ten queries were tested, covering core Python topics plus off-topic and ambiguous edge cases. Full responses and observations are in notebooks/test_queries.ipynb.

Query 1 — Basic data structures

Query 2 — List operations

Query 3 — Error handling

Query 4 — OOP

Query 5 — Async/await

Query 6 — Decorators

Query 7 — Performance

Query 8 — Advanced pattern

Query 9 — Edge case: off-topic

Off-topic query — system correctly returns low-relevance results and flags uncertainty

Query 10 — Edge case: ambiguous

Ambiguous query — system retrieves general performance optimization docs

Observations summary:

#	Query type	Latency	Quality	Notes
1	CSV / pandas	681ms	✅ High	Correct `read_csv` with parameters
2	List reverse	590ms	✅ High	Shows `[::-1]`, `reverse()`, and `reversed()`
3	Exceptions	720ms	✅ High	Covers `try/except/finally` with examples
4	@staticmethod vs @classmethod	840ms	✅ High	Clear distinction with code examples
5	Async/await	910ms	✅ High	Accurate, cites relevant SO threads
6	Decorators	780ms	✅ High	Explanation includes `functools.wraps` usage
7	Dict key lookup	560ms	✅ High	Recommends `in` for membership checks rather than `.get()`
8	List comprehension	640ms	✅ High	Multiple examples with filtering
9	Off-topic (France)	430ms	✅ Handled	Low-relevance docs returned with uncertainty flagged correctly
10	Ambiguous (slow code)	870ms	✅ Handled	Useful profiling advice with relevant SO citations
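The latency column can be summarised with the standard library. With only ten samples a P95 would not be meaningful, so this worked check reports mean, median and range for the values copied from the observations table.

```python
import statistics

# Latencies (ms) for queries 1-10, copied from the observations table.
latencies_ms = [681, 590, 720, 840, 910, 780, 560, 640, 430, 870]

mean_ms = statistics.mean(latencies_ms)      # 7021 / 10
median_ms = statistics.median(latencies_ms)  # average of the two middle values, 681 and 720
print(f"mean={mean_ms:.1f} ms  median={median_ms:.1f} ms  "
      f"min={min(latencies_ms)} ms  max={max(latencies_ms)} ms")
# mean=702.1 ms  median=700.5 ms  min=430 ms  max=910 ms
```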

Scaling to 100+ Concurrent Users

Layer	Current	At Scale
API workers	Single Uvicorn process	Multiple workers via `gunicorn -w 4 -k uvicorn.workers.UvicornWorker` behind Nginx
Embeddings	Per-request inference	Cache frequent query embeddings in Redis (TTL 1h); batch similar queries
Vector DB	Local ChromaDB	Migrate to Pinecone or Qdrant Cloud — managed, horizontally scalable
LLM calls	Synchronous Groq SDK	Async SDK with connection pooling; add request queue for burst traffic
Response cache	None	Redis semantic cache — deduplicate near-identical queries before hitting LLM
Infra	Single container	Kubernetes HPA (auto-scale on CPU/RPS) or AWS ECS Fargate
Cost control	Free Groq tier	Prompt caching for system prompt; response caching for repeated queries
Observability	None	OpenTelemetry traces, Prometheus metrics, Grafana dashboards

Estimated target at scale (a projection, not a measured result): 100 concurrent users served by ~4 Uvicorn workers, with a Redis cache expected to cut LLM calls by ~40% for repeated queries, should keep P95 latency under 2 s.
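A minimal sketch of the response cache proposed above, using an in-memory dictionary as a stand-in for Redis (where the equivalent write is `SET key value EX 3600`). The one-hour TTL comes from the table; `TTLResponseCache`, `answer_with_cache` and the lowercase/whitespace normalisation are hypothetical. This is exact-match caching on a normalised key; the semantic cache in the table would instead compare query embeddings against a similarity threshold.

```python
import hashlib
import time
from typing import Callable, Optional


class TTLResponseCache:
    """In-memory stand-in for a Redis response cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(question: str) -> str:
        # Normalise case and whitespace so trivially different phrasings share a key.
        normalised = " ".join(question.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get(self, question: str) -> Optional[str]:
        k = self.key(question)
        entry = self._store.get(k)
        if entry is None:
            return None
        expires_at, answer = entry
        if self.clock() >= expires_at:
            del self._store[k]  # expired: drop it and report a miss
            return None
        return answer

    def set(self, question: str, answer: str) -> None:
        self._store[self.key(question)] = (self.clock() + self.ttl, answer)


def answer_with_cache(cache: TTLResponseCache, question: str,
                      generate: Callable[[str], str]) -> str:
    """Call the expensive RAG pipeline only on a cache miss."""
    cached = cache.get(question)
    if cached is not None:
        return cached
    answer = generate(question)
    cache.set(question, answer)
    return answer
```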

Running Tests

# Unit tests only (no server required)
pytest tests/ -v -m "not integration"

# Full suite including integration tests (requires running server)
uvicorn app.main:app &
pytest tests/ -v

# With coverage report
pytest tests/ --cov=app --cov-report=term-missing

Deployment

Render (recommended free tier)

Push to GitHub
New Web Service → connect this repo
Set environment variables: GROQ_API_KEY, MAX_DOCUMENTS, TOP_K
Build command: pip install -r requirements.txt
Start command: uvicorn app.main:app --host 0.0.0.0 --port $PORT

Note: Render's free tier has 512MB RAM. Set MAX_DOCUMENTS=10000 for the free deployment; use a paid instance for the full 50k corpus.

Hugging Face Spaces

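Use the Docker SDK option and point it to the Dockerfile in this repo. Set secrets via the Spaces UI. Docker Spaces route traffic to port 7860 by default, so either start uvicorn on 7860 or set `app_port` in the Space's README metadata to the port the container listens on.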

Environment Variables

See .env.example for all variables. Only `GROQ_API_KEY` is required; the others fall back to the defaults below:

Variable	Description	Default
`GROQ_API_KEY`	Your Groq API key	—
`LLM_MODEL`	Groq model ID	`llama-3.1-8b-instant`
`EMBEDDING_MODEL`	Sentence transformer model	`all-MiniLM-L6-v2`
`CHROMA_PERSIST_DIR`	ChromaDB storage path	`./chroma_db`
`DATA_PATH`	Processed CSV path	`./data/python_qa_sample.csv`
`MAX_DOCUMENTS`	Documents to index	`50000`
`TOP_K`	Chunks retrieved per query	`5`
`PORT`	API port	`8000`
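A standard-library sketch of loading these variables with the defaults from the table above. It is a stand-in for the Pydantic settings class in app/config.py, not that code: `Settings` and `from_env` are hypothetical names, and unlike Pydantic it reads only the process environment, not the .env file.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stdlib stand-in; defaults mirror the environment-variable table."""
    groq_api_key: str
    llm_model: str = "llama-3.1-8b-instant"
    embedding_model: str = "all-MiniLM-L6-v2"
    chroma_persist_dir: str = "./chroma_db"
    data_path: str = "./data/python_qa_sample.csv"
    max_documents: int = 50_000
    top_k: int = 5
    port: int = 8000

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        values = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())  # e.g. TOP_K -> top_k
            if raw is None:
                continue  # fall back to the dataclass default
            values[f.name] = int(raw) if f.type is int else raw
        if "groq_api_key" not in values:
            raise RuntimeError("GROQ_API_KEY must be set (see .env.example)")
        return cls(**values)
```

For the Render free tier, setting `MAX_DOCUMENTS=10000` in the environment makes `Settings.from_env().max_documents` return 10000 while every other value keeps its default.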

Tech Stack

Component	Technology	Why
API framework	FastAPI	Async, auto-docs, Pydantic validation
Vector store	ChromaDB	Persistent local DB, no infra overhead
Embeddings	all-MiniLM-L6-v2	Fast, lightweight, strong semantic similarity
LLM	Groq / llama-3.1-8b-instant	Free tier, <1s inference, OpenAI-compatible
Data source	Stack Overflow (50k Q&A pairs)	Community-verified, domain-specific
Containerisation	Docker + docker-compose	One-command reproducible setup
Testing	pytest + pytest-asyncio	Unit + integration coverage

License

MIT — see LICENSE.

Built by Harshit Rai · Analytics Vidhya AI Engineer Assessment · June 2026
