SharjeelJalil/customer-pricing-intelligence

Customer Pricing Intelligence System

A two-stage system that infers customer personas from trip geography using OpenStreetMap data, then predicts willingness-to-pay using RFM lifecycle scores, bidding behavior, and persona segments. Outputs per-customer pricing multipliers (1.00x to 1.15x) for differentiated fare pricing.

Industry: Ride-Hailing and Logistics (Pakistan)
Role: Analytics lead for customer intelligence | Designed both the persona inference and pricing propensity pipelines
Tools: Python, OSMnx, GeoPandas, Scikit-learn, Statsmodels
Scale: Scored across the active rider base in Karachi


System Architecture

Problem

The platform used flat pricing: every customer in the same zone paid the same fare for the same trip. But customers have fundamentally different willingness-to-pay. A working professional commuting daily from DHA doesn't care about a 10% fare increase. A student going to university twice a week will switch to public transport.

Flat pricing leaves money on the table with high-value customers and simultaneously risks losing price-sensitive ones. The platform needed a way to identify who would tolerate premium pricing, so differentiated multipliers could be applied without driving churn in sensitive segments.

The challenge was that the platform had no direct income data, no survey data on price sensitivity, and no customer demographic profiles. Everything had to be inferred from trip behavior.

Approach

Stage 1: Persona Inference from Trip Geography

The first insight was that where someone travels reveals who they are. A customer whose trips consistently end near universities is probably a student. One whose trips end near office buildings is probably a working professional.

The system downloads all building footprints from OpenStreetMap for Karachi, filters to commercially relevant categories, and for each customer's trip history finds the nearest POI to their pickup/dropoff coordinates.
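The nearest-POI step can be illustrated with a small pure-Python sketch. It is not the original code, which relies on OSMnx and GeoPandas; the function names, the `max_km` cutoff, and the sample coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_poi(lat, lon, pois, max_km=0.5):
    """Return the closest POI within max_km, or None if nothing is close enough.

    max_km is a hypothetical cutoff so that trips far from any mapped
    building produce no persona signal instead of a misleading match.
    """
    best, best_dist = None, float("inf")
    for poi in pois:
        dist = haversine_km(lat, lon, poi["lat"], poi["lon"])
        if dist < best_dist:
            best, best_dist = poi, dist
    return best if best_dist <= max_km else None

# Made-up POIs and dropoff point, for illustration only
pois = [
    {"building": "university", "lat": 24.9400, "lon": 67.1200},
    {"building": "office", "lat": 24.8600, "lon": 67.0100},
]
match = nearest_poi(24.9405, 67.1203, pois)
```

A linear scan is fine for a sketch; at city scale a spatial index (for example the one GeoPandas provides) would be the more practical choice.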

Building type to persona mapping:

| OSM Building Type | Inferred Persona |
| --- | --- |
| school, college, university | Student |
| commercial, industrial, office | Working Professional |
| hospital | Health Professional |
| government | Government Employee |
| mall | Self Employed / Shop Owner |
| stadium, sports_centre | Athlete |

Each customer's primary persona is determined by their most frequently visited POI type. This replaces expensive survey-based segmentation with automated, trip-behavior-driven classification.
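A minimal sketch of the mapping and the primary-persona rule is below. The dictionary follows the table above; the function name `primary_persona` and the tie-breaking behaviour (first type encountered wins) are assumptions, since the original code is not shown here.

```python
from collections import Counter

# Building type to persona, as listed in the mapping table
BUILDING_TO_PERSONA = {
    "school": "Student",
    "college": "Student",
    "university": "Student",
    "commercial": "Working Professional",
    "industrial": "Working Professional",
    "office": "Working Professional",
    "hospital": "Health Professional",
    "government": "Government Employee",
    "mall": "Self Employed / Shop Owner",
    "stadium": "Athlete",
    "sports_centre": "Athlete",
}

def primary_persona(building_types):
    """Map a customer's most frequently visited mapped POI type to a persona.

    Building types outside the mapping are ignored. Returns None when the
    customer has no trips near a mapped building type.
    """
    mapped = [b for b in building_types if b in BUILDING_TO_PERSONA]
    if not mapped:
        return None
    top_type, _count = Counter(mapped).most_common(1)[0]
    return BUILDING_TO_PERSONA[top_type]

# Hypothetical trip history: nearest building type per trip endpoint
trips = ["office", "university", "office", "house", "mall"]
persona = primary_persona(trips)
```

Here `persona` is "Working Professional" because "office" is the most frequent mapped type; a history containing only unmapped types such as "house" yields `None`.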

Stage 2: Willingness-to-Pay Prediction

The persona segment becomes one input to a broader propensity model. The full feature set combines:

RFM lifecycle scores: Recency (days since last trip), Frequency (trip count), and Monetary (total spend), each scored 1-4 by quartile. The three scores are read together as a code, so 444 means the top score on all three dimensions. Customers are classified into lifecycle segments: Best Customers (444), Loyal, Need Attention, About to Sleep, Lost.
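The quartile scoring can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: the field names are hypothetical, recency is assumed to be reverse-scored (fewer days since the last trip gives a higher score), and only the 444 code is interpreted because the rules for the other segments are not given here.

```python
from statistics import quantiles

def quartile_score(value, cut_points, higher_is_better=True):
    """Score a value 1-4 according to the quartile it falls into."""
    rank = sum(value > cut for cut in cut_points)  # 0..3
    return rank + 1 if higher_is_better else 4 - rank

def rfm_codes(customers):
    """Return {customer_id: 'RFM'} where each digit is a 1-4 quartile score."""
    fields = ("recency_days", "frequency", "monetary")
    cuts = {f: quantiles([c[f] for c in customers], n=4) for f in fields}
    codes = {}
    for c in customers:
        # Assumption: a recent trip (low recency_days) earns a high R score
        r = quartile_score(c["recency_days"], cuts["recency_days"], higher_is_better=False)
        f = quartile_score(c["frequency"], cuts["frequency"])
        m = quartile_score(c["monetary"], cuts["monetary"])
        codes[c["id"]] = f"{r}{f}{m}"
    return codes

# Synthetic customers: id 8 is the most recent, most frequent, highest spender
customers = [
    {"id": i, "recency_days": 10 * (9 - i), "frequency": i, "monetary": 100 * i}
    for i in range(1, 9)
]
codes = rfm_codes(customers)
best_customers = [cid for cid, code in codes.items() if code == "444"]
```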

Bidding behavior: The platform had a bid-based pricing feature. The ratio of high-bid rides to low-bid rides directly reveals price sensitivity. Customers who consistently bid above suggested fare are signaling willingness-to-pay.

POI persona: Students are excluded from the willing-to-pay label regardless of other features, because they have high frequency but low monetary tolerance.

Phone price proxy: The second-hand market value of the customer's phone model (captured from the app) serves as an income proxy. In Pakistan, where direct income data is unavailable, phone value correlates with spending power.

Business-Rule Label

The willingness-to-pay label is defined by business rules combining multiple signals:

  • Frequency score >= 3 AND Monetary score == 4
  • Average trips in last 3 months > 8
  • Average trip distance > 5 km
  • Not a Student persona
  • Accepts more high-bid rides than low-bid rides
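As a rough sketch, the rules above can be written as a single predicate. The assumption that all five conditions are combined with AND is mine, as are the class and field names; the original labeling code may combine the signals differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomerFeatures:
    # Hypothetical field names, chosen to mirror the rules in the list above
    frequency_score: int            # RFM frequency score, 1-4
    monetary_score: int             # RFM monetary score, 1-4
    avg_trips_last_3_months: float
    avg_trip_distance_km: float
    persona: Optional[str]          # None when no persona could be inferred
    high_bid_rides: int
    low_bid_rides: int

def is_willing_to_pay(c: CustomerFeatures) -> bool:
    """Rule-based willingness-to-pay label (assumes every rule must hold)."""
    return (
        c.frequency_score >= 3
        and c.monetary_score == 4
        and c.avg_trips_last_3_months > 8
        and c.avg_trip_distance_km > 5
        and c.persona != "Student"
        and c.high_bid_rides > c.low_bid_rides
    )

# Two synthetic customers that differ only in persona
commuter = CustomerFeatures(4, 4, 12.0, 7.5, "Working Professional", 9, 3)
student = CustomerFeatures(4, 4, 12.0, 7.5, "Student", 9, 3)
labels = (is_willing_to_pay(commuter), is_willing_to_pay(student))
```

The second customer shows the persona exclusion at work: identical behaviour, but the Student persona alone keeps the label negative.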

This rule-based label was then used as the target for an MLP neural network with two hidden layers (50 and 25 units). The network learns the pattern from all available features, which makes it possible to score customers who lack a complete bidding history.

Pricing Multiplier Output

The model outputs a probability of willingness-to-pay, mapped to pricing bands:

| Propensity Band | Multiplier | Application |
| --- | --- | --- |
| Very Low (0-20%) | 1.00x | Base fare, protect retention |
| Low (20-40%) | 1.05x | Slight premium |
| Medium (40-60%) | 1.08x | Moderate premium |
| High (60-80%) | 1.10x | Comfortable premium |
| Very High (80-100%) | 1.15x | Maximum premium |

The multiplier feeds directly into the pricing engine: a customer scored at 1.15x sees a fare 15% higher than base for the same trip.
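The band lookup can be sketched with a sorted list of edges. The band names and multipliers come from the table above; how a probability that sits exactly on an edge is treated is not stated, so this sketch assumes it falls into the higher band. The function names are hypothetical.

```python
from bisect import bisect_right

BAND_EDGES = [0.20, 0.40, 0.60, 0.80]  # upper edges of the first four bands
BANDS = [
    ("Very Low", 1.00),
    ("Low", 1.05),
    ("Medium", 1.08),
    ("High", 1.10),
    ("Very High", 1.15),
]

def pricing_band(probability):
    """Map a willingness-to-pay probability in [0, 1] to (band, multiplier)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a value exactly on an edge (e.g. 0.20) goes to the higher band
    return BANDS[bisect_right(BAND_EDGES, probability)]

def personalised_fare(base_fare, probability):
    """Apply the band multiplier to a base fare, rounded to 2 decimals."""
    _band, multiplier = pricing_band(probability)
    return round(base_fare * multiplier, 2)

band, multiplier = pricing_band(0.85)
fare = personalised_fare(500.0, 0.85)  # base fare of 500 is a made-up value
```

With a base fare of 500, a customer scored at 0.85 lands in the Very High band and is quoted 575, which is the 15% premium described above.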

What Was Real vs What Is Reconstructed

| Layer | Status |
| --- | --- |
| POI persona inference logic (OSM download, nearest-POI matching, persona mapping) | Real, from original code |
| RFM segmentation (quartile scoring, lifecycle segments) | Real, from original code |
| Willingness-to-pay labeling (business rules combining RFM + bidding + persona) | Real, from original code |
| MLP neural network training and probability band scoring | Real, from original code |
| Pricing multiplier mapping (1.00x to 1.15x) | Real, from original code |
| Phone price as income proxy | Real, from original code |
| Modular code organization | Reconstructed, originals were separate scripts |
| Sample data, tests, diagrams | Reconstructed, added for portfolio |

Limitations and What I Would Test Next

Known limitations:

  • OSM building data coverage varies by area. Dense commercial areas have good coverage; residential areas are sparse. Customers who primarily travel to residential areas get no useful persona signal.
  • The persona mapping is coarse. A customer who goes to both offices and hospitals is assigned whichever they visit more, losing the multi-persona signal.
  • The willingness-to-pay label is rule-based, not measured from actual price experiments. The rules encode domain assumptions about what constitutes price tolerance.
  • The phone price proxy is noisy: some high-income customers use older phones, and some low-income customers buy expensive phones on installment plans.
  • The pricing multipliers (1.00x to 1.15x) were set by business judgment, not optimized through demand elasticity estimation.

What I would test next:

  • Price experiment: randomly assign different multipliers and measure demand elasticity per segment to calibrate optimal multipliers.
  • Multi-persona features: instead of one primary persona, use a vector of persona weights (e.g. 60% Working Professional, 30% Student, 10% Athlete).
  • Dynamic multipliers: adjust multipliers by time of day and demand level, not just customer segment.
  • Trip-level scoring instead of customer-level: a working professional commuting during rush hour may tolerate more than the same person taking a weekend leisure trip.
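The multi-persona idea from the list above could look like the sketch below. It is a proposal that has not been built, so the function name and the normalisation (share of persona-tagged trips) are assumptions.

```python
from collections import Counter

def persona_weights(trip_personas):
    """Return each persona's share of a customer's persona-tagged trips."""
    counts = Counter(p for p in trip_personas if p is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {persona: n / total for persona, n in counts.items()}

# Hypothetical history: 6 office trips, 3 campus trips, 1 stadium trip
history = ["Working Professional"] * 6 + ["Student"] * 3 + ["Athlete"]
weights = persona_weights(history)
```

This reproduces the 60% / 30% / 10% split used as the example, and the resulting weights could replace the single primary persona as model features.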

Key Learnings

  1. Geography reveals demographics. In markets without direct income or occupation data, trip endpoints mapped to OSM building types provide a useful proxy for customer segmentation. The approach relies only on free OpenStreetMap data and existing trip records, and it scales to any city with adequate OSM coverage.

  2. Bidding behavior is the strongest signal. Among all features, the ratio of high-bid to low-bid rides was the most predictive of willingness-to-pay. When customers directly reveal their price sensitivity through their actions, that dominates any inferred signal.

  3. Students must be treated differently. Students have high frequency (multiple daily trips to campus) and decent monetary value, but they are extremely price-sensitive. Without the persona exclusion, the model would have flagged students as premium customers and driven churn in one of the platform's largest segments.

  4. Phone price works as an income proxy in emerging markets. The second-hand value of a customer's device model correlates with spending capacity. This is a feature that wouldn't work in developed markets (where most people have recent phones) but is effective in markets with wide device value ranges.

Repository Structure

README.md
src/
  poi_persona_model.py          # Original OSM-based persona inference (preserved)
  propensity_to_pay_model.py    # Original willingness-to-pay pipeline (preserved)
  persona_inference.py           # Modular persona inference functions
  pricing_propensity.py          # Modular RFM, labeling, and multiplier functions
  __init__.py
tests/
  test_pricing.py                # Persona mapping, RFM, and multiplier tests
sample_data/
  customer_features_sample.csv   # Synthetic customer features (400 customers)
  pois_sample.csv                # Synthetic POI data (80 points)
notebooks/
  methodology_walkthrough.ipynb
config/
  model_config.yaml
diagrams/
  architecture.png
requirements.txt
.gitignore

How to Run

pip install -r requirements.txt
pytest tests/ -v

Note: The POI model requires OSMnx and a network connection to download OpenStreetMap data. The pricing propensity model can run independently on the provided sample data.
