AI-powered insurance fraud detection using ML, computer vision, graph analytics, and LLMs.
ClaimLens analyzes insurance claims to detect fraud using multiple AI techniques:
- ML Engine: CatBoost classifier with 145 features
- Computer Vision: Document forgery detection (PAN, Aadhaar, licenses, etc.)
- Graph Analytics: Fraud network detection using Neo4j
- LLM: Natural language explanations powered by Groq
The system can process claims in English and Hinglish (Hindi-English mix).
Requirements:
- Python 3.10+
- Docker (optional, for Neo4j)
- Groq API key (free at console.groq.com)
Setup:
git clone https://github.com/pranaya-mathur/ClaimLens_App.git
cd ClaimLens_App
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Add your GROQ_API_KEY to .env
Run:
# Start backend
uvicorn api.main:app --reload
# Start frontend (new terminal)
streamlit run frontend/streamlit_app.py
# Optional: Neo4j for graph features
docker-compose up neo4j -d
Visit http://localhost:8501 for the UI or http://localhost:8000/docs for API docs.
The system processes claims through four engines:
- ML Engine extracts 145 features (text embeddings, behavioral patterns, document flags) and runs CatBoost classification
- CV Engine verifies document authenticity using ResNet50 + Error Level Analysis
- Graph Engine checks for fraud networks (shared documents, suspicious connections)
- LLM Engine aggregates results and generates explanations
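To make the "shared documents" idea from the Graph Engine concrete, here is a minimal in-memory sketch. It is not the Neo4j implementation: it uses a plain union-find over a hypothetical `claim_documents` mapping (claim ID to the set of document IDs it references), and the function name `find_fraud_rings` is an assumption made for illustration.

```python
from collections import defaultdict


def find_fraud_rings(claim_documents, min_size=2):
    """Group claims linked, directly or transitively, by a shared document.

    claim_documents is a hypothetical mapping of claim ID -> set of document IDs.
    """
    parent = {claim: claim for claim in claim_documents}

    def find(claim):
        # Walk up to the root, halving the path along the way.
        while parent[claim] != claim:
            parent[claim] = parent[parent[claim]]
            claim = parent[claim]
        return claim

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index claims by each document they reference.
    claims_by_document = defaultdict(list)
    for claim, documents in claim_documents.items():
        for document in documents:
            claims_by_document[document].append(claim)

    # Claims that share a document end up in the same component.
    for linked_claims in claims_by_document.values():
        for other in linked_claims[1:]:
            union(linked_claims[0], other)

    components = defaultdict(set)
    for claim in claim_documents:
        components[find(claim)].add(claim)

    # Keep only groups large enough to be worth a look.
    return sorted(sorted(group) for group in components.values() if len(group) >= min_size)


claims = {
    "CLM001": {"PAN-A", "DL-1"},
    "CLM002": {"PAN-A"},
    "CLM003": {"DL-1", "PAN-B"},
    "CLM004": {"PAN-C"},
}
rings = find_fraud_rings(claims)
print(rings)
```

Here CLM001 and CLM002 share a PAN, and CLM001 and CLM003 share a license, so all three land in one group while CLM004 stays out. A graph database does the same kind of traversal at scale and adds richer relationship types.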
All results are combined using semantic aggregation to produce a final verdict.
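The exact aggregation logic is not described here, so the following is only a rough sketch of how per-engine results might be folded into one verdict. It swaps in a plain weighted average for the semantic aggregation; the `EngineResult` class, the weights, the verdict labels, and the 0.5 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EngineResult:
    name: str      # hypothetical engine label, e.g. "ml", "cv", "graph"
    risk: float    # 0.0 (looks clean) to 1.0 (looks fraudulent)
    weight: float  # assumed relative trust in this engine


def aggregate(results, threshold=0.5):
    """Return (score, verdict) from a weighted average of engine risks."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        raise ValueError("at least one engine must have a non-zero weight")
    score = sum(r.risk * r.weight for r in results) / total_weight
    # Verdict labels are placeholders, not the system's real output values.
    verdict = "REVIEW" if score >= threshold else "APPROVE"
    return round(score, 3), verdict


results = [
    EngineResult("ml", risk=0.8, weight=0.5),
    EngineResult("cv", risk=0.2, weight=0.3),
    EngineResult("graph", risk=0.6, weight=0.2),
]
score, verdict = aggregate(results)
print(score, verdict)
```

A weighted average is easy to reason about, but it cannot express rules such as "a forged document overrides everything else", which is one reason a real system might prefer a richer aggregation step.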
Score a claim:
POST /api/ml/score
{
"claim_id": "CLM001",
"narrative": "Car accident on highway",
"claim_amount": 50000,
"product": "motor",
...
}
Verify a document:
POST /api/documents/verify-pan
(upload PAN card image)
Complete analysis:
POST /api/unified/analyze-complete
(runs all engines)
See the docs/ folder for detailed API documentation.
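As a rough illustration of calling the scoring endpoint from Python, the sketch below builds the POST request with the standard library only. The base URL assumes the default uvicorn address from the setup steps, the helper name `build_score_request` is hypothetical, and the payload contains only the fields shown above; the remaining fields and the response shape are not specified here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: default uvicorn address


def build_score_request(claim, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/ml/score."""
    body = json.dumps(claim).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/ml/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


claim = {
    "claim_id": "CLM001",
    "narrative": "Car accident on highway",
    "claim_amount": 50000,
    "product": "motor",
    # Further fields are elided in the example above; add them per the API docs.
}
request = build_score_request(claim)

# Sending needs the backend to be running, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the payload easy to inspect and test without a live server.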
Features:
- Multi-modal fraud detection
- Document forgery detection with OCR
- Fraud network visualization
- AI-generated explanations (technical & customer-friendly)
- Real-time processing (<2s per claim with caching)
- Hinglish support
Project structure:
ClaimLens_App/
├── api/                # FastAPI backend
│   ├── routes/         # API endpoints
│   └── schemas/        # Pydantic models
├── src/                # Core engines
│   ├── ml_engine/      # CatBoost fraud scoring
│   ├── cv_engine/      # Document verification
│   ├── llm_engine/     # LLM explanations
│   └── fraud_engine/   # Graph analytics
├── frontend/           # Streamlit UI
├── models/             # Trained models
├── docs/               # Documentation
└── tests/              # Test suite
Key environment variables (see .env.example):
GROQ_API_KEY=your_key_here
NEO4J_URI=bolt://localhost:7687
NEO4J_PASSWORD=your_password
ENABLE_LLM_EXPLANATIONS=true
ML_THRESHOLD=0.5
Based on testing with synthetic claims:
- ML inference: ~80ms
- Document analysis: 1-2s
- LLM explanations: 2-3s
- End-to-end (cached): <500ms
Rate limited to 100 requests/min by default.
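Clients that send many claims may want to pace themselves to stay under that limit. The sketch below is one possible client-side approach, a sliding-window throttle. It says nothing about how the server enforces its limit, and the class name `SlidingWindowThrottle` and its parameters are assumptions made for illustration.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_calls within any rolling window of period seconds."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        now = self._clock()
        # Drop timestamps that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


throttle = SlidingWindowThrottle(max_calls=100, period=60.0)
throttle.wait()  # returns immediately; call once before each API request
```

Calling `throttle.wait()` before every request keeps a single client within the default budget; several clients sharing one limit would need shared state instead.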
Known issues:
- Neo4j sometimes needs a restart on first launch
- Large PDFs (>10MB) may time out
- Hinglish embeddings need mixed English-Hindi text to work well
- Generic document detector accuracy varies by document quality
See Issues for more.
Roadmap:
- Add batch processing endpoint
- Support PDF documents
- Add more languages (Hindi, Tamil, Telugu)
- Improve graph visualizations
- Add model retraining pipeline
See full roadmap in CHANGELOG.md.
PRs welcome! Please:
- Fork the repo
- Create a feature branch
- Add tests for new features
- Submit a PR
See CONTRIBUTING.md for guidelines.
MIT - see LICENSE
Pranay Mathur
- GitHub: @pranaya-mathur
- Email: pranaya.mathur@yahoo.com
Built with help from Groq, LangChain, Neo4j, and CatBoost.