---
layout: page
title: Getting Started
description: "Where to begin with Burning Cost depends on where you're coming from. Three entry paths: pricing actuary, data scientist, and technical team lead."
permalink: /getting-started/
---

Most people coming to Burning Cost have one of three starting points. Pick the path closest to yours — each one recommends three to five libraries that solve problems you will encounter first, in an order that makes sense.

Want to see all our libraries working together? The complete motor pricing pipeline walks through a full production workflow — data with a known DGP, GBM training, SHAP factor extraction, GLM distillation, conformal prediction intervals, fairness audit, drift monitoring, and governance pack — in a single runnable script.

Quick install

The six libraries most teams reach for first. Copy the command that matches your setup.

<div class="gs-install-card">
  <div class="gs-install-card-name"><a href="https://github.com/burning-cost/shap-relativities" target="_blank">shap-relativities</a></div>
  <div class="gs-install-cmds">
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>pip install shap-relativities</div>
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>uv add shap-relativities</div>
  </div>
</div>

<div class="gs-install-card">
  <div class="gs-install-card-name"><a href="https://github.com/burning-cost/insurance-cv" target="_blank">insurance-cv</a></div>
  <div class="gs-install-cmds">
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>pip install insurance-cv</div>
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>uv add insurance-cv</div>
  </div>
</div>

<div class="gs-install-card">
  <div class="gs-install-card-name"><a href="https://github.com/burning-cost/insurance-fairness" target="_blank">insurance-fairness</a></div>
  <div class="gs-install-cmds">
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>pip install insurance-fairness</div>
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>uv add insurance-fairness</div>
  </div>
</div>

<div class="gs-install-card">
  <div class="gs-install-card-name"><a href="https://github.com/burning-cost/insurance-monitoring" target="_blank">insurance-monitoring</a></div>
  <div class="gs-install-cmds">
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>pip install insurance-monitoring</div>
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>uv add insurance-monitoring</div>
  </div>
</div>

<div class="gs-install-card">
  <div class="gs-install-card-name"><a href="https://github.com/burning-cost/insurance-distributional" target="_blank">insurance-distributional</a></div>
  <div class="gs-install-cmds">
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>pip install insurance-distributional</div>
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>uv add insurance-distributional</div>
  </div>
</div>

<div class="gs-install-card">
  <div class="gs-install-card-name"><a href="https://github.com/burning-cost/insurance-glm-tools" target="_blank">insurance-glm-tools</a></div>
  <div class="gs-install-cmds">
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>pip install insurance-glm-tools</div>
    <div class="gs-install-cmd"><span class="cmd-prefix">$ </span>uv add insurance-glm-tools</div>
  </div>
</div>

Your first 5 minutes

A minimal working example using shap-relativities. Fits a CatBoost model on synthetic motor data and extracts a multiplicative factor table — the same output format as exp(β) from a GLM. No data download required.

    import numpy as np
    from catboost import CatBoostRegressor
    from shap_relativities import SHAPRelativities

    # Synthetic motor portfolio: vehicle_age, driver_age, annual_mileage
    rng = np.random.default_rng(42)
    X = np.column_stack([
        rng.integers(0, 15, 2000),        # vehicle_age
        rng.integers(18, 80, 2000),       # driver_age
        rng.integers(5000, 30000, 2000),  # annual_mileage
    ])
    y = 0.08 + 0.005 * X[:, 0] - 0.001 * X[:, 1] + rng.exponential(0.02, 2000)

    model = CatBoostRegressor(iterations=200, verbose=0)
    model.fit(X, y)

    sr = SHAPRelativities(model, feature_names=["vehicle_age", "driver_age", "annual_mileage"])
    factors = sr.fit_transform(X)

    print(factors["vehicle_age"].head())
    # level  relativity  ci_lower  ci_upper
    #     0       1.000     0.981     1.019
    #     1       1.043     1.028     1.058
    print(f"Reconstruction R² = {sr.reconstruction_r2:.4f}")

The factors dict maps each feature name to a DataFrame with one row per level. The relativity column holds the multiplicative factor, in the same structure as GLM output from Emblem or Radar. The reconstruction_r2 attribute tells you how much of the model's variance the factor table explains; above 0.95 is production-usable.
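To make the link between an additive model explanation and a multiplicative factor table concrete, here is a minimal sketch. It assumes you already hold one log-scale contribution per policy for a single feature (for example, a SHAP value from a log-link model). The helper name `relativities_from_contributions` and its arguments are hypothetical, and this is not necessarily how shap-relativities builds its tables or its confidence intervals.

```python
import numpy as np

def relativities_from_contributions(levels, contributions, exposure=None, base_level=None):
    """Turn additive log-scale contributions into multiplicative relativities.

    Hypothetical helper for illustration; not part of shap-relativities.
    """
    levels = np.asarray(levels)
    contributions = np.asarray(contributions, dtype=float)
    if exposure is None:
        exposure = np.ones_like(contributions)
    exposure = np.asarray(exposure, dtype=float)

    unique_levels = np.unique(levels).tolist()
    if base_level is None:
        base_level = unique_levels[0]

    # Exposure-weighted mean contribution per level, still on the log scale.
    mean_contribution = {
        level: np.average(contributions[levels == level], weights=exposure[levels == level])
        for level in unique_levels
    }

    # Exponentiate relative to the base level, so the base level gets 1.0.
    base = mean_contribution[base_level]
    return {level: float(np.exp(value - base)) for level, value in mean_contribution.items()}

# Toy contributions for a three-level feature (made-up numbers).
levels = np.array([0, 0, 1, 1, 2, 2])
contributions = np.array([-0.10, -0.10, 0.00, 0.00, 0.10, 0.10])
table = relativities_from_contributions(levels, contributions)
print(table)
```

Subtracting the base level before exponentiating is what makes the result read like exp(β) from a GLM: the base level sits at 1.0 and every other level is a multiplier against it.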

Path 1

Pricing actuary moving to Python

You know the techniques: GLMs, factor tables, A/E monitoring, credibility. What you need are Python equivalents that produce output in the same formats you already use, and that handle the insurance-specific details (IBNR buffers, exposure weighting, renewal cohort structure) that generic ML libraries ignore.

shap-relativities: Turns a CatBoost GBM into a multiplicative factor table, in the same output format as exp(β) from Emblem or Radar, with confidence intervals and exposure weighting. This is where most actuaries start.
insurance-cv: Walk-forward cross-validation with IBNR buffers. Stops future experience from leaking into training folds. Produces Poisson and Gamma deviance scores, not just RMSE.
insurance-monitoring: Three-layer post-deployment monitoring: exposure-weighted PSI/CSI, segmented A/E ratios with IBNR adjustment, and a Gini z-test that tells you whether to recalibrate or refit.
insurance-governance: PRA SS1/23-aligned model validation reports. Bootstrap Gini CI, Poisson A/E CI, double-lift charts, renewal cohort test. HTML/JSON output for model risk committees.
insurance-credibility: Bühlmann-Straub credibility in Python. Practical for capping thin segments, stabilising NCD factors, and blending a new model with an incumbent rate.
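For readers who know Bühlmann-Straub from the textbook rather than from code, the calculation can be sketched in a few lines of NumPy. This uses the standard nonparametric estimators on a rectangular segments-by-periods array. The function name, its inputs and the toy numbers are assumptions for illustration; it is not the insurance-credibility API.

```python
import numpy as np

def buhlmann_straub(rates, weights):
    """Credibility factors and blended rates for a (segments x periods) array.

    Hypothetical helper for illustration; not the insurance-credibility API.
    """
    rates = np.asarray(rates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_segments, n_periods = rates.shape

    segment_weight = weights.sum(axis=1)
    total_weight = segment_weight.sum()
    segment_mean = (weights * rates).sum(axis=1) / segment_weight
    grand_mean = (segment_weight * segment_mean).sum() / total_weight

    # Expected process variance: within-segment volatility.
    epv = (weights * (rates - segment_mean[:, None]) ** 2).sum() / (n_segments * (n_periods - 1))
    # Variance of hypothetical means: between-segment spread, floored at zero.
    vhm = ((segment_weight * (segment_mean - grand_mean) ** 2).sum() - (n_segments - 1) * epv) / (
        total_weight - (segment_weight ** 2).sum() / total_weight
    )
    vhm = max(vhm, 0.0)

    if vhm == 0.0:
        # No detectable between-segment signal: fall back to the grand mean.
        z = np.zeros(n_segments)
        collective = grand_mean
    else:
        z = segment_weight / (segment_weight + epv / vhm)
        collective = (z * segment_mean).sum() / z.sum()  # credibility-weighted mean
    return z, z * segment_mean + (1.0 - z) * collective

# Toy claim frequencies and exposures: one large, one thin, one medium segment.
rates = [[0.10, 0.12, 0.11], [0.20, 0.05, 0.30], [0.08, 0.09, 0.07]]
weights = [[1000, 1100, 1200], [20, 25, 30], [500, 520, 480]]
z, estimates = buhlmann_straub(rates, weights)
print(np.round(z, 3), np.round(estimates, 4))
```

The thin segment receives the lowest credibility factor, so its volatile raw experience is pulled most of the way towards the collective mean.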

Your next 3 notebooks

shap-relativities — CatBoost relativities vs GLM vs true DGP on synthetic UK motor data. The factor table output format matches Emblem and Radar imports.
insurance-cv — Random CV vs temporal CV vs true out-of-time holdout. Shows how much your current CV approach is flattering your model.
insurance-distill — GBM-to-GLM distillation, surrogate factor tables for rating engines. The notebook to read before your next "how do we get it into Radar?" conversation.
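The Poisson deviance mentioned for insurance-cv is worth seeing once in code, because it explains why RMSE is a poor score for claim counts. The sketch below computes an exposure-weighted mean Poisson deviance directly in NumPy. The function name and the toy data are hypothetical, and insurance-cv may compute or normalise its score differently.

```python
import numpy as np

def mean_poisson_deviance(claims, exposure, predicted_frequency):
    """Poisson deviance per unit of exposure (lower is better).

    Hypothetical helper for illustration; not the insurance-cv API.
    """
    claims = np.asarray(claims, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    expected = np.asarray(predicted_frequency, dtype=float) * exposure  # expected counts

    # The claims * log(claims / expected) term is defined as 0 when claims == 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(claims > 0, claims * np.log(claims / expected), 0.0)
    return float(2.0 * np.sum(log_term - (claims - expected)) / np.sum(exposure))

# Toy portfolio (made-up numbers): claim counts and earned exposure in years.
claims = np.array([0, 1, 0, 2, 0, 1])
exposure = np.array([1.0, 0.5, 1.0, 1.0, 0.25, 0.75])

flat = np.full(6, claims.sum() / exposure.sum())      # portfolio-average frequency
sharper = np.array([0.2, 1.5, 0.3, 1.8, 0.4, 1.2])    # a model that separates risks

flat_deviance = mean_poisson_deviance(claims, exposure, flat)
sharper_deviance = mean_poisson_deviance(claims, exposure, sharper)
print(f"flat: {flat_deviance:.3f}, sharper: {sharper_deviance:.3f}")
```

Unlike squared error, the deviance respects the fact that most policies have zero claims and that a policy with a quarter of a year on risk should count for less than a full-year policy.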

Path 2

Data scientist joining an insurance pricing team

You have the ML fundamentals. What you are missing is the insurance context: why you cannot use k-fold CV, what IBNR means for your validation, how proxy discrimination is tested under FCA guidance, and why your coefficient estimates might be confounded. These libraries encode that context.

insurance-datasets: Synthetic UK motor portfolio data with a known data-generating process. Use it to validate your methods before touching real data.
insurance-cv: Temporally correct cross-validation. Random folds are wrong for insurance data. This explains why and fixes it.
insurance-fairness: Proxy discrimination auditing aligned with FCA Consumer Duty and Equality Act 2010. Quantifies indirect discrimination risk from rating variables correlated with protected characteristics.
insurance-causal: Double machine learning for deconfounding rating factors. If your rating variables correlate with distribution channel or policyholder behaviour, standard GLM coefficients are biased.
shap-relativities: Produces outputs that the actuarial side of the team will recognise. Bridge between a model that lives in Python and a committee that wants a factor table.
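If temporal cross-validation with an IBNR buffer is new to you, the idea fits in a short generator. Each fold tests on one period and trains only on periods that end at least `buffer` periods earlier, so that recent, under-developed claims experience never reaches the training set. The function name, the integer period index and the parameter names are assumptions for this sketch; insurance-cv has its own interface.

```python
import numpy as np

def walk_forward_splits(periods, n_folds, buffer):
    """Yield (train_idx, test_idx) pairs with a gap of `buffer` periods.

    Hypothetical helper for illustration; not the insurance-cv API.
    `periods` is an integer period index per row (e.g. months since start).
    """
    periods = np.asarray(periods)
    test_periods = np.unique(periods)[-n_folds:]
    for test_period in test_periods:
        # Train strictly before the buffer window that precedes the test period.
        train_idx = np.flatnonzero(periods < test_period - buffer)
        test_idx = np.flatnonzero(periods == test_period)
        if train_idx.size:
            yield train_idx, test_idx

# Twelve months of toy data, three policies per month.
periods = np.repeat(np.arange(12), 3)
splits = list(walk_forward_splits(periods, n_folds=3, buffer=2))
for train_idx, test_idx in splits:
    print(
        f"train months 0-{periods[train_idx].max()}, "
        f"test month {periods[test_idx][0]}, "
        f"{train_idx.size} train rows"
    )
```

A random k-fold split would mix months freely, which lets the model see claims that were reported after the period it is being tested on.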

Your next 3 notebooks

insurance-conformal — Tweedie conformal intervals vs bootstrap on 50k motor policies. Distribution-free coverage guarantees that don't require a correctly specified model.
insurance-causal — DML causal effect vs naive Poisson GLM on confounded data. Shows how much bias your current elasticity estimates contain.
insurance-fairness — Proxy discrimination audit aligned to FCA Consumer Duty. Runs the Lindholm correction and produces the evidence pack the FCA expects to see.
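The coverage guarantee behind conformal prediction comes from a very small amount of machinery. Below is a minimal split-conformal sketch using absolute residuals as the nonconformity score. That score is a simplification chosen for illustration: the notebook above uses Tweedie-based scores, and the function name and toy data here are hypothetical rather than taken from insurance-conformal.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_actual, new_pred, alpha=0.1):
    """Symmetric split-conformal interval from absolute residuals.

    Hypothetical helper for illustration; not the insurance-conformal API.
    """
    residuals = np.abs(np.asarray(cal_actual, dtype=float) - np.asarray(cal_pred, dtype=float))
    scores = np.sort(residuals)
    n = scores.size
    # Finite-sample correction: use the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]
    new_pred = np.asarray(new_pred, dtype=float)
    return new_pred - q, new_pred + q

rng = np.random.default_rng(0)
mean = rng.uniform(100.0, 500.0, 6000)         # stand-in for model predictions
actual = mean + rng.normal(0.0, 50.0, 6000)    # toy outcomes, not real claims

# First 1,000 rows calibrate the interval width; the rest check coverage.
lower, upper = split_conformal_interval(mean[:1000], actual[:1000], mean[1000:], alpha=0.1)
coverage = np.mean((actual[1000:] >= lower) & (actual[1000:] <= upper))
print(f"Empirical coverage: {coverage:.3f}")
```

The guarantee only needs the calibration and new data to be exchangeable; it does not need the underlying model to be correctly specified. A constant-width interval like this one is crude for skewed claims data, which is the motivation for scores that scale with the predicted mean.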

Path 3

Technical pricing team lead evaluating what to adopt

You need to know what is production-ready, what the regulatory exposure is, and how to move models through sign-off without creating a maintenance burden. These libraries have actuarial tests, clear scope, and outputs a pricing committee or auditor can follow.

insurance-deploy: Champion/challenger framework with shadow mode, SHA-256 deterministic routing, SQLite quote log, and a bootstrap likelihood ratio test to declare a winner. ICOBS 6B.2 audit trail included.
insurance-governance: PRA SS1/23 model validation reports in HTML and JSON. Covers the tests a model risk function will ask for, in a format they can file.
insurance-monitoring: Post-deployment drift monitoring with a clear decision rule: recalibrate vs. refit. Reduces the judgement calls that stall model review cycles.
insurance-fairness: Proxy discrimination audit that produces an evidence pack for Consumer Duty and FCA supervisory review. Quantifies risk before sign-off, not after a regulatory question.
insurance-conformal: Distribution-free prediction intervals with finite-sample coverage guarantees. Relevant wherever a model needs a principled uncertainty bound for Solvency II or internal capital.
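Deterministic routing, as listed for insurance-deploy above, means the same policy always lands in the same arm without storing an assignment table. One common way to get that property is to hash the policy identifier and compare the result to the challenger share, sketched below with the standard library. The function name, the salt and the ID format are hypothetical, and insurance-deploy may implement its routing differently.

```python
import hashlib

def assign_arm(policy_id: str, challenger_share: float = 0.1, salt: str = "experiment-1") -> str:
    """Route a policy to 'champion' or 'challenger' based on a SHA-256 hash.

    Hypothetical helper for illustration; not the insurance-deploy API.
    """
    digest = hashlib.sha256(f"{salt}:{policy_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "challenger" if bucket < challenger_share else "champion"

# Made-up policy IDs; the split should be close to the requested 10%.
arms = [assign_arm(f"POL-{i:06d}") for i in range(10_000)]
share = arms.count("challenger") / len(arms)
print(f"Challenger share: {share:.3f}")
```

Because the assignment is a pure function of the ID and the salt, an auditor can recompute which model priced any historical quote, and changing the salt reshuffles the population for a new experiment.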

Your next 3 notebooks

insurance-governance — PRA SS1/23 validation workflow, model risk tiering, HTML report output. The format a model risk committee or PRA supervisor can act on.
insurance-monitoring — Exposure-weighted PSI/CSI, segmented A/E ratios, Gini drift z-test. Gives you a defensible decision rule for recalibrate vs refit.
insurance-deploy — Shadow mode, quote logging, bootstrap likelihood ratio test, ICOBS 6B.2 audit trail. End-to-end champion/challenger with a clear winner declaration method.
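An exposure-weighted PSI, as used in the monitoring notebook, differs from the usual PSI only in what is counted: each bin's share is a share of exposure rather than of rows. A minimal sketch follows, with bins taken from quantiles of the reference period. The function and argument names are hypothetical and the binning choice is an assumption; insurance-monitoring may bin and smooth differently.

```python
import numpy as np

def exposure_weighted_psi(reference, current, reference_exposure, current_exposure,
                          n_bins=10, eps=1e-6):
    """Population stability index with exposure-weighted bin shares.

    Hypothetical helper for illustration; not the insurance-monitoring API.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges from the reference distribution's quantiles.
    inner_edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    ref_bin = np.searchsorted(inner_edges, reference, side="right")
    cur_bin = np.searchsorted(inner_edges, current, side="right")

    # Sum exposure per bin, then convert to shares (floored to avoid log(0)).
    ref_share = np.bincount(ref_bin, weights=reference_exposure, minlength=n_bins)
    cur_share = np.bincount(cur_bin, weights=current_exposure, minlength=n_bins)
    ref_share = np.clip(ref_share / ref_share.sum(), eps, None)
    cur_share = np.clip(cur_share / cur_share.sum(), eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 5000)     # toy feature at model build
current = rng.normal(0.5, 1.0, 5000)       # same feature after a mean shift
ref_exposure = rng.uniform(0.1, 1.0, 5000)
cur_exposure = rng.uniform(0.1, 1.0, 5000)

psi_same = exposure_weighted_psi(reference, reference, ref_exposure, ref_exposure)
psi_shift = exposure_weighted_psi(reference, current, ref_exposure, cur_exposure)
print(f"no drift: {psi_same:.4f}, shifted: {psi_shift:.4f}")
```

Weighting by exposure stops a batch of short-term or mid-term-cancelled policies from registering as drift when they contribute little earned premium.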

Worked Examples

The burning-cost-examples GitHub repo contains 47 Databricks notebooks covering the full ecosystem; the Notebooks page on this site hosts a curated subset. Each one installs its own dependencies, generates synthetic data, fits models, and benchmarks against a standard actuarial baseline. Browse the notebooks/ directory or pick from the table below, which is sorted by library name.

Library	What it shows	Notebook
bayesian-pricing	Hierarchical Bayesian vs raw experience on thin segments	view
insurance-causal	DML causal effect vs naive Poisson GLM on confounded data	view
insurance-causal-policy	SDID rate change evaluation with event study and HonestDiD	view
insurance-conformal	Tweedie conformal intervals vs bootstrap on 50k motor	view
insurance-conformal-ts	ACI/SPCI vs split conformal on non-exchangeable time series	view
insurance-covariate-shift	Importance-weighted evaluation after distribution shift	view
insurance-credibility	Bühlmann-Straub credibility vs raw experience on 30 segments	view
insurance-cv	Random CV vs temporal CV vs true OOT holdout	view
insurance-datasets	Synthetic UK motor portfolio with known DGP, parameter recovery	view
insurance-deploy	Shadow mode, quote logging, bootstrap LR test, ENBP audit	view
insurance-dispersion	DGLM vs constant-phi Gamma GLM, per-risk volatility scoring	view
insurance-distill	GBM-to-GLM distillation, surrogate factor tables for rating engines	view
insurance-distributional	Distributional GBM (TweedieGBM) vs standard point predictions	view
insurance-distributional-glm	GAMLSS vs standard Gamma GLM on heterogeneous-variance data	view
insurance-dynamics	GAS Poisson filter vs static GLM, BOCPD changepoint detection	view
insurance-fairness	Proxy discrimination audit, bias metrics, Lindholm correction	view
insurance-frequency-severity	Sarmanov copula joint freq-sev vs independence assumption	view
insurance-gam	EBM/ANAM vs Poisson GLM with planted non-linear effects	view
insurance-glm-tools	Nested GLM embeddings for 500 vehicle makes vs dummy-coded GLM	view
insurance-governance	PRA SS1/23 validation workflow, MRM risk tiering, HTML report	view
insurance-interactions	CANN/NID interaction detection vs exhaustive pairwise GLM search	view
insurance-monitoring	Exposure-weighted PSI/CSI, A/E ratios, Gini drift z-test	view
insurance-multilevel	CatBoost + REML random effects vs one-hot encoding	view
insurance-optimise	SLSQP constrained optimisation, efficient frontier, FCA audit	view
insurance-quantile	CatBoost quantile regression vs lognormal, TVaR, ILF curves	view
insurance-severity	Spliced Lognormal-GPD + DRN vs Gamma GLM, tail quantiles	view
insurance-spatial	BYM2 territory factors vs postcode grouping, Moran's I	view
insurance-survival	Cure models vs KM/Cox PH, CLV bias by cure band	view
insurance-synthetic	Vine copula generation, fidelity report, TSTR benchmarks	view
insurance-telematics	HMM latent-state features vs raw trip aggregates	view
insurance-thin-data	GLMTransfer + TabPFN vs raw GLM on thin segments	view
insurance-trend	Automated trend selection vs naive OLS, structural breaks	view
insurance-whittaker	W-H smoothing with REML lambda vs manual step smoothing	view
shap-relativities	CatBoost relativities vs GLM vs true DGP on synthetic motor	view

How to run: Import any notebook into Databricks via databricks workspace import notebooks/<name>.py /Workspace/Users/you@example.com/<name> --language PYTHON --overwrite, or drag-and-drop the .py file in the Databricks UI. Notebooks use %pip install cells and run on Databricks Free Edition serverless compute — no cluster setup needed.

Each notebook generates synthetic data inline — no external files needed. Install the relevant library and run.

Ready to go deeper?

Browse all libraries · Which library do I need? · Read the articles · View on GitHub
