AishwaryaGade02/User-Expanse-Forecasting-using-XGBoost


📈 User Expense Forecasting with PySpark and XGBoost

This project demonstrates an end-to-end pipeline for forecasting user expenses as a time series. It uses PySpark for scalable preprocessing and feature engineering, and XGBoost to model daily spending aggregated from transaction-level financial data.


🧠 Problem Statement

Predict future daily user expense patterns using transaction data, card and user metadata, and derived temporal and lag-based features.


🧰 Tech Stack

  • Language: Python
  • Distributed Processing: Apache Spark (PySpark)
  • Data Source: Kaggle Fraud Dataset via kagglehub
  • Modeling: XGBoost Regressor
  • Visualization: matplotlib
  • Libraries: pandas, scikit-learn, numpy, seaborn

🔄 Workflow

  1. Data Acquisition

    • Download Kaggle dataset using kagglehub
    • Load and merge transactions, user, and card data
  2. Data Preprocessing (PySpark)

    • Handle missing values
    • Remove fraudulent transactions
    • Clean currency fields ($)
    • Derive date-based and cyclic features (sin/cos)
    • Compute lag-based predictors (lag_1, lag_7, etc.)
  3. Feature Engineering

    • Group by date to get daily total spending
    • Add year, day, and weekday indicators
    • Standardize features using StandardScaler
  4. Modeling

    • Use TimeSeriesSplit with a gap for realistic validation
    • Train XGBoost regressor across hyperparameter grid
    • Select the best model by lowest average RMSE across folds
  5. Evaluation

    • Plot predicted vs actual values for both train and test sets
    • Report RMSE for model performance
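
The preprocessing and feature engineering steps above can be sketched roughly as follows. This is a minimal illustration in pandas rather than PySpark, so that it runs on a small in-memory table; it is not the project's actual code. The column names `date`, `amount`, and `is_fraud`, the function name `build_daily_features`, and the choice of the weekday as the cyclic feature are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def build_daily_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Turn raw transactions into a daily table with cyclic and lag features.

    Assumed (hypothetical) columns:
      'date'     - transaction timestamp
      'amount'   - currency string such as '$1,234.50'
      'is_fraud' - boolean fraud flag
    """
    tx = tx.copy()
    # Strip '$' and thousands separators, then cast to float.
    tx["amount"] = (
        tx["amount"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    # Drop the time of day so that grouping is per calendar day.
    tx["date"] = pd.to_datetime(tx["date"]).dt.normalize()
    # Keep only non-fraudulent rows.
    tx = tx[~tx["is_fraud"].astype(bool)]

    # Daily total spending; days with no transactions are filled with 0
    # so that a shift of k rows really means "k days earlier".
    daily = (
        tx.groupby("date")["amount"].sum()
        .asfreq("D", fill_value=0.0)
        .rename_axis("date")
        .rename("total_spend")
        .reset_index()
    )

    # Calendar indicators.
    daily["year"] = daily["date"].dt.year
    daily["day"] = daily["date"].dt.day
    daily["weekday"] = daily["date"].dt.dayofweek  # Monday = 0

    # Cyclic encoding: Sunday (6) ends up next to Monday (0).
    angle = 2 * np.pi * daily["weekday"] / 7
    daily["weekday_sin"] = np.sin(angle)
    daily["weekday_cos"] = np.cos(angle)

    # Lag features: spending 1 and 7 days earlier.
    for k in (1, 7):
        daily[f"lag_{k}"] = daily["total_spend"].shift(k)

    # Rows without a full lag history cannot be used for training.
    return daily.dropna().reset_index(drop=True)
```

Filling missing days with zero before shifting is a design choice of this sketch: without it, `lag_7` would silently refer to "seven rows back" instead of "seven days back" whenever a day has no transactions.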

📊 Visual Outputs

  • Time Series Plot of daily spending
  • Actual vs Predicted Plots for train and test sets
  • Visualizations of lag and seasonality features

✅ Model Highlights

  • Robust time-aware validation using TimeSeriesSplit
  • Realistic financial behavior modeled using lagged features and seasonality
  • Final model trained with optimal max_depth, learning_rate, and subsample
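
One way the time-aware validation and hyperparameter selection could look is sketched below. It assumes NumPy arrays `X` and `y` already sorted by date; the helper names `make_xgb` and `select_by_cv_rmse`, the fold count, the gap of 7 rows, and `n_estimators=200` are illustrative choices, not values taken from the project. The `gap` argument of `TimeSeriesSplit` leaves that many rows unused between each training window and its test window, which helps keep lagged features from leaking test information into training.

```python
from itertools import product

import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def make_xgb(**params):
    """Build an XGBoost regressor (imported lazily; the search below is model-agnostic)."""
    from xgboost import XGBRegressor

    return XGBRegressor(n_estimators=200, random_state=0, **params)


def select_by_cv_rmse(make_model, grid, X, y, n_splits=5, gap=7):
    """Return (best_params, best_rmse) by lowest mean RMSE over time-ordered folds."""
    splitter = TimeSeriesSplit(n_splits=n_splits, gap=gap)
    keys = list(grid)
    best_params, best_rmse = None, float("inf")

    # Try every combination in the hyperparameter grid.
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        fold_rmse = []
        # Each fold trains on the past and tests on a later window.
        for train_idx, test_idx in splitter.split(X):
            model = make_model(**params)
            model.fit(X[train_idx], y[train_idx])
            err = y[test_idx] - model.predict(X[test_idx])
            fold_rmse.append(float(np.sqrt(np.mean(err ** 2))))
        mean_rmse = float(np.mean(fold_rmse))
        if mean_rmse < best_rmse:
            best_params, best_rmse = params, mean_rmse

    return best_params, best_rmse
```

A call might look like `select_by_cv_rmse(make_xgb, {"max_depth": [3, 5], "learning_rate": [0.05, 0.1], "subsample": [0.8, 1.0]}, X, y)`, after which a final model would be refit on the full training range with the returned parameters.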

About

Project that looks at past spending patterns to predict future expenses, helping users plan their budgets more effectively and make smarter financial decisions.
