Insurance Premium Prediction

This project predicts insurance premiums based on a user's profile using a machine learning pipeline. The project includes data ingestion, preprocessing, model training, and prediction serving through a Flask API. The application also logs predictions and experiment details using MLflow.


Project Structure

├── application.py              # Flask server for prediction
├── requirements.txt            # Python dependencies
├── Dockerfile                  # For containerizing the app
├── model/                      # Stores base dataset and experimental/trained models
├── src/                        # Source code
│   ├── components/             # Data ingestion, transformation, and training modules
│   │   ├── data_ingestion.py   # Loads data and applies initial feature engineering
│   │   ├── data_transformation.py # Handles preprocessing and feature transformation
│   │   └── model_training.py   # Trains and evaluates regression models
│   ├── pipeline/               # Pipeline logic for inference
│   │   └── predict_pipeline.py # Handles model loading and prediction
│   ├── exception.py            # Custom exception handling module
│   ├── logger.py               # Handles logging across the application
│   └── utils.py                # Common utility functions
└── artifacts/                  # Stores trained models and preprocessor objects

How to Run

1. Install Requirements

pip install -r requirements.txt

2. Run the Flask Server

python application.py

The server will start at http://127.0.0.1:5000/


Docker Support

To build and run the application inside a Docker container:

1. Build Docker Image

docker build -t insurance_premium .

2. Run Docker Container

docker run -p 8000:5000 insurance_premium

Then access the server at http://localhost:8000/


Pipeline Overview

pipeline/predict_pipeline.py

  • Loads trained model.pkl and proprocessor.pkl from the artifacts/ folder.
  • Transforms incoming data and performs predictions.
  • Logs inputs, model parameters, and output using MLflow.

components/data_ingestion.py

  • Reads and processes the raw dataset.
  • Handles missing dates, applies log transform to income, removes duplicates, and drops sparse columns.
  • Splits data into train/test sets and stores them in artifacts/.
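A condensed sketch of those ingestion steps is shown below. The column names (`income`, `policy_start_date`) and the `ingest` function are assumptions for illustration; the real module also drops sparse columns and writes the splits to `artifacts/`.

```python
import numpy as np
import pandas as pd


def ingest(df: pd.DataFrame, test_frac: float = 0.2):
    """Clean the raw data and split it into train/test sets."""
    df = df.drop_duplicates().copy()              # remove duplicate rows
    df["income"] = np.log1p(df["income"])         # log transform to compress the income scale
    df = df.dropna(subset=["policy_start_date"])  # hypothetical date column: drop missing dates
    test = df.sample(frac=test_frac, random_state=42)
    train = df.drop(test.index)
    return train, test
```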

components/data_transformation.py

  • Applies imputation to numeric and categorical columns.
  • Label encodes categorical features.
  • Uses ColumnTransformer and Pipeline for structured preprocessing.
  • Saves the preprocessor object as proprocessor.pkl.

components/model_training.py

  • Trains multiple regression models (Random Forest, XGBoost, LightGBM, etc.).
  • Evaluates models using R2 score.
  • Saves the best model as model.pkl.
  • Logs training details using MLflow.
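The train-evaluate-select loop can be approximated as follows. The `select_best_model` helper is hypothetical, and the example below uses only plain scikit-learn regressors; the actual project also trains XGBoost and LightGBM models and logs the results to MLflow.

```python
from sklearn.metrics import r2_score


def select_best_model(models, X_train, y_train, X_test, y_test):
    """Fit each candidate and keep the one with the highest R2 on held-out data."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best = max(scores, key=scores.get)
    return best, models[best], scores
```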

MLflow Integration

All inference and training metadata are tracked as MLflow experiments, which makes it easy to compare performance across runs.



