Skip to content

ash-datapro/churn-pred

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Post-Sales Customer Churn (R + Shiny + Plumber)

App demo

A production-style, end-to-end churn analytics project:

  • Frontend: Dark-themed Shiny dashboard with filters, KPIs, insights, and a live prediction form.
  • Backend: plumber REST API serving a tidymodels gradient boosting model.
  • Data: PostgreSQL with reproducible load/feature engineering scripts.
  • Reports: Confusion matrix, ROC curve, and a markdown report for the trained model.

Table of Contents


Architecture

PostgreSQL  ←  data/load-data.R
      ↑
      │                   ┌──────────────────────────────┐
      │                   │        Shiny Frontend        │
      │                   │  KPIs, filters, insights,    │
      │                   │  predictions (dark theme)    │
      │                   └──────────────┬───────────────┘
      │                                  │ HTTP (JSON)
      │                           /predict (plumber)
      │                                  │
      └──────────  backend/api/train-model.R  ──►  model.rds
  • Modeling: tidymodels workflow with xgboost tuning, F1-optimized threshold, saved as model.rds.
  • API: plumber exposes /health and /predict, loading model.rds.
  • UI: Shiny dashboard (dark, compact), filterable, with insights plots and a guided prediction form.

Features

  • Interactive dashboard: KPIs, churn mix, tenure/charges visuals, contract vs churn, “quick insights” chips, and a downloadable filtered CSV.
  • Live predictions: Enter customer attributes → get class & probability from the API.
  • Reproducible training: Train script generates reports (reports/) and the serialized model.
  • Database-backed: Load Telco churn CSV into Postgres and query from the app.

Screenshots

  • Dashboard (dark mode), filters & KPIs Demo

Getting Started

Prerequisites

  • R ≥ 4.3 with packages listed in frontend/R/requirements.txt and backend/api/requirements.txt (if you maintain one there).
  • PostgreSQL ≥ 13
  • Optional: Docker / docker-compose (a docker-compose.yml is included but local run works fine).

R style in this repo uses = for assignment.

Repository Layout

backend/
  api/
    main.R            # plumber API (serves /health, /predict)
    train-model.R     # tidymodels + xgboost training
    model.rds         # trained model artifact
  reports/            # confusion matrix, ROC, report.md

data/
  load-data.R         # create schema & load Telco CSV into Postgres
  Telco-Customer-Churn.csv

frontend/
  app.R               # Shiny app entry
  R/
    about.R           # About tab (cards)
    insights.R        # Correlation / feature-importance
    metrics.R         # KPI value boxes
    plots.R           # Plot helpers + dark theming
    prediction.R      # Prediction form + API call
    retrain.R         # (optional) hooks to retrain
    sidebar.R         # Filters module (selectize + slider)
    utils.R           # helpers

postgres/
  create_schema.sql   # optional schema helper

media/
  demo.gif

Environment

Create a local .Renviron (or export in shell) with your DB settings:

export DB_USER="user"
export DB_PASSWORD="password"
export DB_HOST="localhost"   # or 'postgres' if running under docker-compose
export DB_PORT="5432"
export DB_NAME="churn_db"

In RStudio, you can also use Tools → Global Options → Environment or put these in ~/.Renviron.

1) Load Data

# from repo root or the data/ folder
source("data/load-data.R")
# This reads Telco-Customer-Churn.csv and populates the 'customers' table.

2) Train Model

# from backend/api/
setwd("backend/api")
source("train-model.R")
# Outputs:
# - backend/api/model.rds
# - backend/reports/{confusion_matrix.png, roc_curve.png, classification_report.txt, model_report.md}

3) Run the API

# from backend/api/
setwd("backend/api")
library(plumber)
pr = plumb("main.R")
pr$run(host = "0.0.0.0", port = 8000)
# Swagger UI: http://127.0.0.1:8000/__docs__/

Sanity test:

curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "tenure": 15,
    "monthly_charges": 70,
    "contract": "Month-to-month",
    "gender": "Female",
    "senior_citizen": 0,
    "partner": "No",
    "dependents": "No",
    "internet_service": "Fiber optic",
    "paperless_billing": "Yes",
    "payment_method": "Electronic check",
    "services_list": ["Multiple Lines","Streaming TV"]
  }'

4) Run the Shiny App

# from frontend/
setwd("frontend")
shiny::runApp("app.R", launch.browser = TRUE)

API

Base: http://127.0.0.1:8000

GET /health

  • 200{ "status": "ok" }

POST /predict

Body (JSON):

{
  "tenure": 12,
  "monthly_charges": 70,
  "contract": "Month-to-month",
  "gender": "Female",
  "senior_citizen": 0,
  "partner": "Yes",
  "dependents": "No",
  "internet_service": "DSL",
  "paperless_billing": "Yes",
  "payment_method": "Electronic check",
  "services_list": ["Streaming TV", "Online Security"]
}

Response (JSON):

{
  "prediction": "No",
  "probability": 0.1673
}

The API auto-derives engineered fields (e.g., total/avg charges, tenure buckets, service flags) to match the training pipeline.


Usage Notes & Tips

  • Filters: Multi-select dropdowns accept empty selection = “All”. Tenure range slider updates the dataset reactively.
  • Dark theme: Custom CSS fixes for selectize ensure selected chips remain visible.
  • Model threshold: Training step chooses a probability threshold that maximizes F1 on the test split; the API uses 0.50 by default (you can expose the tuned threshold if desired).

Troubleshooting

  • “Could not resolve host: backend” in Shiny prediction Use the full URL http://127.0.0.1:8000/predict in prediction_server(..., api_url = ...) when running locally. Use http://backend:8000/predict only inside docker-compose networks.

  • Swagger “Invalid JSON body” Click Try it out → paste a valid JSON object in the request body (don’t post empty).

  • “Could not derive probability from model prediction” Ensure the trained workflow is saved as model.rds and predictions support type="prob". This repo uses tidymodels (.pred_1) and includes a robust get_positive_proba() in main.R.

  • Connection refused on port 5432 Make sure Postgres is running and credentials/host (localhost vs postgres) match how you launched the DB.


Roadmap

  • Add segmented lift/ICE plots for top features.
  • Expose tuned threshold from training into API response.
  • Optional authentication for the API.
  • Dockerized one-click stack (compose: db + API + Shiny).

Contributing

PRs and issues are welcome. Keep styles consistent with the repo (e.g., R assignments with =). If you add a feature in the UI, please include a short GIF and update this README.


License

See LICENSE in the repository root.


Built With

  • R, Shiny, bslib (darkly)
  • plumber
  • tidymodels (recipes, workflows, tune, yardstick), xgboost
  • dplyr, ggplot2, plotly, DT
  • PostgreSQL