A production-ready Retrieval-Augmented Generation (RAG) system for analyzing customer feedback with hybrid search (BM25 + FAISS) and sentiment analysis.
This project implements a RAG system for analyzing 50,000+ customer reviews in Turkish. The system demonstrates the performance benefits of query-specific analysis using RAG compared to full dataset analysis.
- Hybrid RAG Pipeline: Combines BM25 (sparse) and FAISS (dense) retrieval with RRF fusion
- Query-Specific Analysis: Analyze relevant reviews for a specific topic in ~0.8 seconds
- Full Dataset Analysis: Process all 50,000 reviews in ~25 minutes
- Sentiment Analysis: Turkish sentiment classification using BERT-based models
- Visualization: Generate business insights with charts and heatmaps
- Hidden Risk Detection: Identify customers who give high scores but express negative sentiment
- Clean and prepare customer review data
- Build BM25 (sparse) and FAISS (dense) indexes
- Hybrid RAG pipeline with RRF fusion
- RAG (Query-Specific): Analyze relevant reviews for a query (~1 second)
- Non-RAG (Full Analysis): Process all 50,000 reviews (~25 minutes)
- Result: RAG is ~1500x faster for query-specific analysis
- Turkish sentiment analysis using BERT models
- Sentiment scores and classification
- Correlation between customer scores and sentiment
- Sentiment distribution charts
- Timeline trends
- Most complained topics
- Score-based sentiment distribution
- Topic × Score correlation heatmap
- Hidden risks and strengths detection
rag3/
├── data/ # Data files
├── index/ # Generated indexes (BM25 + FAISS)
├── outputs/ # Analysis outputs and charts
├── eval/ # Evaluation results
├── src/ # Source code modules
│ ├── analyze_full_dataset.py
│ ├── visualization.py
│ └── utils.py
├── ingest_clean.py # Data ingestion
├── build_index.py # Index building
├── query_rag.py # RAG query
├── query_baseline.py # Baseline query
├── evaluate.py # RAG vs Baseline evaluation
├── benchmark.py # RAG vs Non-RAG benchmark
├── requirements.txt # Dependencies
└── README.md # This file
- Python 3.8+
- 8GB+ RAM (16GB recommended)
- Optional: GPU for faster inference
- Clone the repository
  git clone https://github.com/onurrtosunn/credit_risk_and_rag_case.git
  cd rag_case_study
- Install dependencies
  pip install -r requirements.txt
- Prepare data
  python ingest_clean.py --input musteriyorumlari.xlsx --out-dir data
- Build indexes
  python build_index.py --input-parquet data/clean.parquet --out-dir index
- Run a query
  python query_rag.py --query "kredi" --index-dir index --limit 1000
Clean and prepare the raw Excel data:
python ingest_clean.py \
--input musteriyorumlari.xlsx \
--out-dir data \
--out-name clean

Output: data/clean.parquet and data/clean.csv
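The actual cleaning rules live in ingest_clean.py; as a rough sketch of what such a step typically does (the feedback column name is an assumption):

```python
import pandas as pd

def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize review text and drop unusable rows (illustrative rules)."""
    df = df.copy()
    # Collapse runs of whitespace/newlines and strip surrounding spaces.
    df["feedback"] = (
        df["feedback"].astype(str).str.replace(r"\s+", " ", regex=True).str.strip()
    )
    # Drop empty reviews and exact duplicates.
    df = df[df["feedback"].str.len() > 0].drop_duplicates(subset=["feedback"])
    return df.reset_index(drop=True)

raw = pd.DataFrame({"feedback": ["Kredi  başvurum\nhızlı sonuçlandı ", "",
                                 "Kredi  başvurum\nhızlı sonuçlandı "]})
cleaned = clean_reviews(raw)
print(len(cleaned))  # 1 — empty row dropped, duplicate collapsed
```

The cleaned frame would then be written out with `df.to_parquet(...)` and `df.to_csv(...)`.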
Build BM25 and FAISS indexes:
python build_index.py \
--input-parquet data/clean.parquet \
--out-dir index \
--model-name intfloat/multilingual-e5-small \
--batch-size 128

Optional: Pre-compute sentiment for faster queries:
# Step 1: Full dataset analysis
python src/analyze_full_dataset.py \
--input data/clean.parquet \
--output outputs/full_analysis_results.parquet
# Step 2: Build index (sentiment scores will be included)
python build_index.py --input-parquet data/clean.parquet --out-dir index

Analyze reviews for a specific topic:
python query_rag.py \
--query "kredi" \
--index-dir index \
--limit 1000 \
--min_sim 0.7 \
--out_csv outputs/rag_kredi.csv

Query using only BM25:
python query_baseline.py \
--query "kredi" \
--index-dir index \
--limit 1000 \
--out_csv outputs/baseline_kredi.csv

Analyze all 50,000 reviews:
python src/analyze_full_dataset.py \
--input data/clean.parquet \
--output outputs/full_analysis_results.parquet \
--batch_size 256

Compare RAG and Baseline approaches:
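The --batch_size flag bounds how many reviews are fed to the sentiment model per forward pass. A minimal batching helper (illustrative, not the script's actual code):

```python
from typing import Iterator, List

def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

reviews = [f"review {i}" for i in range(1000)]
batches = list(batched(reviews, 256))
print(len(batches), len(batches[-1]))  # 4 batches; the last holds the 232 leftovers
```

Larger batches amortize model overhead (especially on GPU) at the cost of memory.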
python evaluate.py \
--queries "kredi,araç,takım" \
--index-dir index \
--out_dir eval \
--limit 1000 \
--min_sim 0.7

Output:
- Comparison metrics
- Query-specific visualizations in eval/charts/
- Summary metrics in eval/summary_metrics.csv
Measure performance difference:
python benchmark.py \
--queries "kredi" "araç" "takım" \
--index-dir index \
--input-data data/clean.parquet \
--output-dir outputs/benchmark

Output:
- RAG query times (~0.8 seconds)
- Non-RAG full analysis time (~25 minutes)
- Speedup analysis
Generate visualizations for full dataset:
python src/visualization.py \
--input outputs/full_analysis_results.parquet \
--output-dir outputs/charts

Output:
- sentiment_distribution.png: Overall sentiment distribution
- timeline_trends.png: Timeline trends
- topic_frequency.png: Most complained topics
- sentiment_by_score.png: Sentiment by score
- correlation_heatmap.png: Topic × Score correlation
- hidden_risks_strengths.png: Hidden risks and strengths
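As an illustration of how one of these charts could be produced (the counts below are made up; the real script aggregates them from the parquet results):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt

# Hypothetical aggregate counts per sentiment class.
counts = {"positive": 28000, "neutral": 9000, "negative": 13000}

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(counts.keys(), counts.values(), color=["#2e7d32", "#9e9e9e", "#c62828"])
ax.set_title("Sentiment distribution")
ax.set_ylabel("Review count")
fig.tight_layout()
fig.savefig("sentiment_distribution.png")  # visualization.py writes into --output-dir
```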
The system follows a modular architecture with clear separation between data preparation, indexing, query processing, and analysis.
Raw Data (Excel) → Data Cleaning → Cleaned Data (Parquet)
↓
Index Building
├── BM25 (Sparse)
└── FAISS (Dense)
↓
Query Processing
├── RAG Pipeline (Query-Specific) →
└── Non-RAG Pipeline (Full Dataset)
↓
Evaluation & Visualization
- Reads Excel/CSV data
- Cleans and normalizes text
- Converts to Parquet format
- Output: data/clean.parquet
- BM25 (Sparse): Turkish tokenization, lemmatization, BM25Okapi index
- FAISS (Dense): Embedding model (intfloat/multilingual-e5-small), HNSW index (Inner Product)
- Metadata: id, score, title, feedback, timestamp, pre-computed sentiment (optional)
- Output: index/ directory
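On the dense side, once embeddings are L2-normalized, inner product equals cosine similarity, and the HNSW index approximates the exact search shown below. A brute-force NumPy stand-in for the FAISS lookup (384 dimensions matches e5-small's output; the vectors are random for illustration):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    """Exact inner-product search; FAISS HNSW approximates the same ranking."""
    # Normalize so inner product == cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 384))                # stand-ins for passage embeddings
query = docs[42] + 0.01 * rng.normal(size=384)   # near-duplicate of doc 42
idx, sims = top_k(query, docs, k=3)
print(idx[0])  # 42 — the closest document
```

Note that E5 models expect "query: " / "passage: " prefixes on the input text before encoding, which is what the --no-e5-prefix flag toggles.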
- Query encoding with embedding model
- BM25 retrieval (lexical matching)
- FAISS retrieval (semantic matching)
- RRF fusion (hybrid)
- Optional reranking with cross-encoder
- Sentiment analysis on top-N results
- Aggregation and metrics
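The fusion step in this pipeline can be sketched in a few lines: RRF needs only the rank positions from each retriever, no score calibration (doc IDs here are illustrative):

```python
from collections import defaultdict

def rrf_fuse(rankings, k: int = 60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # lexical ranking
faiss_hits = ["d1", "d9", "d3"]  # semantic ranking
print(rrf_fuse([bm25_hits, faiss_hits]))  # documents found by both lists rise to the top
```

Because only ranks are used, BM25 scores and cosine similarities never need to be put on a common scale, which is why RRF is a robust default for hybrid retrieval.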
- Full dataset scan (all 50,000 reviews)
- Sentiment analysis on all reviews
- Aggregation and metrics
- RAG vs Baseline (BM25-only) comparison
- Relevance metrics
- Sentiment correlation
- Business insights
- Automatic visualization generation
- Measures RAG query times
- Measures Non-RAG full analysis time
- Calculates speedup and savings
- Sentiment distribution charts
- Timeline trends
- Topic frequency
- Score-based sentiment
- Topic × Score correlation heatmap
- Hidden risks and strengths detection
- Embedding Model: intfloat/multilingual-e5-small (multilingual)
- Sparse Retrieval: BM25 (rank-bm25)
- Dense Retrieval: FAISS (HNSW)
- Fusion: RRF (Reciprocal Rank Fusion, k=60)
- Reranker (optional): BAAI/bge-reranker-v2-m3
- Sentiment Analysis: savasy/bert-base-turkish-sentiment-cased
- Lemmatization: zeyrek (Turkish)
- Visualization: matplotlib, seaborn
- Data Processing: pandas, numpy, pyarrow
- Hybrid RAG (BM25 + FAISS): Combines lexical (BM25) and semantic (FAISS) matching for better coverage
- RRF Fusion: Simple and effective fusion method that balances both retrieval results
- Pre-computed Sentiment: Sentiment scores can be pre-computed and stored in index for faster queries
- HNSW Index: Fast approximate nearest neighbor search for large datasets
| Scenario | Time | Description |
|---|---|---|
| RAG (Query-Specific) | ~0.8 seconds | Analyze relevant reviews for a query |
| Non-RAG (Full Analysis) | ~25 minutes | Process all 50,000 reviews |
- Sentiment Analysis: High correlation between customer scores and sentiment analysis
- Hidden Risks: Customers who give high scores (4-5) but express negative sentiment
- Hidden Strengths: Customers who give low scores (1-2) but mention positive aspects
- Topic Insights: Correlation heatmap shows which topics produce strong negative/positive sentiment for each score
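A minimal pandas sketch of the hidden-risk and hidden-strength rules (the column names and the 0-1 sentiment scale are assumptions, not the project's actual schema):

```python
import pandas as pd

reviews = pd.DataFrame({
    "score": [5, 4, 2, 5, 1],
    "sentiment_score": [0.15, 0.80, 0.10, 0.20, 0.85],  # 0 = negative, 1 = positive
})

# Hidden risks: high star rating but clearly negative text.
hidden_risks = reviews[(reviews["score"] >= 4) & (reviews["sentiment_score"] < 0.3)]
# Hidden strengths: low star rating but clearly positive text.
hidden_strengths = reviews[(reviews["score"] <= 2) & (reviews["sentiment_score"] > 0.7)]

print(len(hidden_risks), len(hidden_strengths))  # 2 1
```

The thresholds (0.3 / 0.7) are illustrative; in practice they would be tuned against labeled examples of score-sentiment mismatch.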
Some examples of the visualizations and outputs generated by the system:
Example terminal output showing RAG query results:
Sentiment distribution for the "kredi" (credit) query:
The system generates various business insights including:
- Topic Frequency: Most discussed topics in customer reviews
- Sentiment by Score: Correlation between customer scores and sentiment analysis
- Correlation Heatmap: Topic × Score correlation analysis
- Hidden Risks & Strengths: Identification of customers with score-sentiment mismatches
All visualizations are automatically generated in the eval/charts/ directory when running the evaluation script.
build_index.py:
- --model-name: Embedding model (default: intfloat/multilingual-e5-small)
- --batch-size: Embedding batch size (default: 128)
- --no-e5-prefix: Disable E5 prefix
query_rag.py:
- --k_lex: BM25 candidate count (default: 0 = all)
- --k_vec: FAISS candidate count (default: 0 = all)
- --limit: Final top-N after fusion (default: 0 = all)
- --min_sim: Minimum semantic similarity threshold (default: 0.5)
- --use_reranker: Use cross-encoder reranker
- --max_sentiment: Max results for sentiment analysis (default: 500)
- Sentiment Pre-computation: If the full analysis is run before index building, sentiment scores are automatically included in the index for faster queries.
- GPU Support: If a GPU is available, sentiment analysis and embedding operations run on it.
- Quantization: Dynamic quantization is used on CPU (2-4x speedup).
- Large Artifacts: Index files and analysis results are large; they can be regenerated with the provided scripts. See .gitignore for files that should not be committed.
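The quantization note corresponds to PyTorch dynamic quantization, which can be applied roughly as follows (shown on a toy model here, not the actual sentiment BERT):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
# Convert Linear weights to int8 at load time; activations stay in float.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(qmodel(x).shape)  # same interface and output shape as the float model
```

Dynamic quantization mainly speeds up the large matrix multiplications inside Linear layers, which dominate BERT inference on CPU.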
# 1. Data preparation
python ingest_clean.py --input musteriyorumlari.xlsx --out-dir data
# 2. Full dataset analysis (optional, for pre-computed sentiment)
python src/analyze_full_dataset.py --input data/clean.parquet --output outputs/full_analysis_results.parquet
# 3. Index building
python build_index.py --input-parquet data/clean.parquet --out-dir index
# 4. Evaluation
python evaluate.py --queries "zaman,araç,takım" --index-dir index --out_dir eval
