ClaimLens AI

AI-powered insurance fraud detection using ML, computer vision, graph analytics, and LLMs.

What is this?

ClaimLens analyzes insurance claims to detect fraud using multiple AI techniques:

ML Engine: CatBoost classifier with 145 features
Computer Vision: Document forgery detection (PAN, Aadhaar, licenses, etc.)
Graph Analytics: Fraud network detection using Neo4j
LLM: Natural language explanations powered by Groq

The system can process claims in English and Hinglish (Hindi-English mix).

Quick Start

Requirements:

Python 3.10+
Docker (optional, for Neo4j)
Groq API key (free at console.groq.com)

Setup:

git clone https://github.com/pranaya-mathur/ClaimLens_App.git
cd ClaimLens_App

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

cp .env.example .env
# Add your GROQ_API_KEY to .env

Run:

# Start backend
uvicorn api.main:app --reload

# Start frontend (new terminal)
streamlit run frontend/streamlit_app.py

# Optional: Neo4j for graph features
docker-compose up -d neo4j

Visit http://localhost:8501 for the UI or http://localhost:8000/docs for API docs.

How it Works

The system processes claims through four engines:

ML Engine extracts 145 features (text embeddings, behavioral patterns, document flags) and runs CatBoost classification
CV Engine verifies document authenticity using ResNet50 + Error Level Analysis
Graph Engine checks for fraud networks (shared documents, suspicious connections)
LLM Engine aggregates results and generates explanations

All results are combined using semantic aggregation to produce a final verdict.
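
As a rough illustration of combining per-engine outputs into one verdict, the sketch below uses a plain weighted average. This is not the project's semantic aggregation, whose details are not described in this README; the engine weights, the `combine_scores` helper, and the verdict labels are all hypothetical. Only the 0.5 cut-off echoes the `ML_THRESHOLD` value shown under Configuration.

```python
from typing import Dict

# Hypothetical weights for illustration only; the real system's
# aggregation logic is not documented in this README.
ENGINE_WEIGHTS: Dict[str, float] = {"ml": 0.4, "cv": 0.3, "graph": 0.3}


def combine_scores(scores: Dict[str, float], threshold: float = 0.5) -> dict:
    """Weighted average of per-engine fraud scores in [0, 1].

    Engines missing from `scores` (for example, graph when Neo4j is
    not running) are skipped and the remaining weights are renormalized.
    """
    used = {name: w for name, w in ENGINE_WEIGHTS.items() if name in scores}
    if not used:
        raise ValueError("no engine scores supplied")
    total_weight = sum(used.values())
    combined = sum(scores[name] * w for name, w in used.items()) / total_weight
    # Verdict labels are placeholders, not the project's actual output values.
    verdict = "review" if combined >= threshold else "pass"
    return {"score": round(combined, 4), "verdict": verdict}


result = combine_scores({"ml": 0.8, "cv": 0.6})
print(result)
```

Renormalizing over the engines that actually reported keeps the score comparable when an optional engine such as the graph engine is unavailable.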

API Examples

Score a claim:

POST /api/ml/score
{
  "claim_id": "CLM001",
  "narrative": "Car accident on highway",
  "claim_amount": 50000,
  "product": "motor",
  ...
}
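
A minimal client-side sketch of the request above, using only the Python standard library. It assumes the backend is running at the default uvicorn address from Quick Start; `build_score_request` is a hypothetical helper, and the fields elided by `...` in the example are not filled in here. The response format is not documented in this README, so the sketch stops at building the request.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default uvicorn address from Quick Start


def build_score_request(claim: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /api/ml/score."""
    body = json.dumps(claim).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/ml/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Only the fields shown in the example; the elided ones are left out.
claim = {
    "claim_id": "CLM001",
    "narrative": "Car accident on highway",
    "claim_amount": 50000,
    "product": "motor",
}
request = build_score_request(claim)

# To send it against a running backend:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```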

Verify a document:

POST /api/documents/verify-pan
(upload PAN card image)
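
The upload could be sketched as a multipart/form-data request built with the standard library. The form field name (`file`) is an assumption, as is the `build_pan_upload` helper; check the interactive API docs at http://localhost:8000/docs for the field the endpoint actually expects.

```python
import mimetypes
import urllib.request
import uuid


def build_pan_upload(image_bytes: bytes, filename: str,
                     base_url: str = "http://localhost:8000",
                     field_name: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for verify-pan.

    `field_name` is a guess, not taken from the project's API schema.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/documents/verify-pan",
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for a real image read with open(path, "rb").
upload = build_pan_upload(b"\x89PNG placeholder", "pan_card.png")
```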

Complete analysis:

POST /api/unified/analyze-complete
(runs all engines)

See the docs/ folder in the repository for detailed API documentation.

Features

Multi-modal fraud detection
Document forgery detection with OCR
Fraud network visualization
AI-generated explanations (technical & customer-friendly)
Real-time processing (<2s per claim with caching)
Hinglish support

Project Structure

ClaimLens_App/
├── api/              # FastAPI backend
│   ├── routes/       # API endpoints
│   └── schemas/      # Pydantic models
├── src/              # Core engines
│   ├── ml_engine/    # CatBoost fraud scoring
│   ├── cv_engine/    # Document verification
│   ├── llm_engine/   # LLM explanations
│   └── fraud_engine/ # Graph analytics
├── frontend/         # Streamlit UI
├── models/           # Trained models
├── docs/             # Documentation
└── tests/            # Test suite

Configuration

Key environment variables (see .env.example):

GROQ_API_KEY=your_key_here
NEO4J_URI=bolt://localhost:7687
NEO4J_PASSWORD=your_password

ENABLE_LLM_EXPLANATIONS=true
ML_THRESHOLD=0.5
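
One way these variables might be read in Python is sketched below. The `Settings` class and `load_settings` helper are hypothetical and not taken from the project's code; the fallback values mirror the examples above, and the 0 to 1 range check on `ML_THRESHOLD` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    neo4j_uri: str
    neo4j_password: str
    enable_llm_explanations: bool
    ml_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse the variables listed above from a mapping such as os.environ."""
    threshold = float(env.get("ML_THRESHOLD", "0.5"))
    # Assumption: the threshold is a probability cut-off in [0, 1].
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("ML_THRESHOLD must be between 0 and 1")
    llm_flag = env.get("ENABLE_LLM_EXPLANATIONS", "true").strip().lower()
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY", ""),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_password=env.get("NEO4J_PASSWORD", ""),
        enable_llm_explanations=llm_flag in {"1", "true", "yes"},
        ml_threshold=threshold,
    )


settings = load_settings({"ML_THRESHOLD": "0.7", "ENABLE_LLM_EXPLANATIONS": "false"})
print(settings.ml_threshold, settings.enable_llm_explanations)
```

Passing the environment in as a mapping keeps the parser easy to test without touching the real process environment.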

Performance

Based on testing with synthetic claims:

ML inference: ~80ms
Document analysis: 1-2s
LLM explanations: 2-3s
End-to-end (cached): <500ms

Rate limited to 100 requests/min by default.
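
A client that sends many claims may want to stay under this limit on its own side. The sketch below is a simple sliding-window limiter; it is a client-side illustration only, and says nothing about how the server enforces its limit. The class name and its parameters are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window` seconds."""

    def __init__(self, max_calls: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.window - (now - self._stamps[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if one is allowed right now."""
        if self.wait_time() > 0.0:
            return False
        self._stamps.append(self.clock())
        return True
```

Before each request, call `try_acquire()`; if it returns False, sleep for `wait_time()` seconds and try again. The injectable `clock` exists so the limiter can be tested without real delays.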

Known Issues

Neo4j sometimes needs a restart on first launch
Large PDFs (>10MB) may time out
Hinglish embeddings need mixed English-Hindi text to work well
Generic document detector accuracy varies by document quality

See the repository's Issues page for more.

TODO

Add batch processing endpoint
Support PDF documents
Add more languages (Hindi, Tamil, Telugu)
Improve graph visualizations
Add model retraining pipeline

See full roadmap in CHANGELOG.md.

Contributing

PRs welcome! Please:

Fork the repo
Create a feature branch
Add tests for new features
Submit a PR

See CONTRIBUTING.md for guidelines.

License

MIT - see LICENSE

Contact

Pranay Mathur

GitHub: @pranaya-mathur
Email: pranaya.mathur@yahoo.com

Built with help from Groq, LangChain, Neo4j, and CatBoost.
