This project is a question-answering demo over 100K Amazon product reviews, using a simple retrieval‑augmented generation setup (vector search, BM25, and reranking) with a Gradio UI.
- Builds a FAISS index over 100K Amazon reviews
- Runs hybrid retrieval (vectors + BM25) with a cross‑encoder reranker
- Uses an OpenAI chat model to answer questions based only on retrieved reviews
- Exposes a small Gradio app to ask questions about the reviews
Install dependencies (Python 3.9+ recommended):
pip install langchain-core langchain-community langchain-openai \
sentence-transformers faiss-cpu rank-bm25 gradio datasets tqdm python-dotenvPrepare data and build the index:
python -m src.ingest
python -m src.build_indexStart the Gradio app:
python -m src.app_gradioThen open the URL printed in the terminal and ask questions like “What do customers say about battery life?”.