---
layout: page
title: About
description: Open-source Python libraries for UK personal lines pricing actuaries. The name comes from a basic actuarial concept — burning cost is the pure loss experience rate, used for experience rating.
permalink: /about/
---

Burning Cost is a set of open-source Python libraries for UK personal lines pricing actuaries. The name comes from a basic actuarial concept: burning cost is the pure loss experience rate — actual losses on a risk as a proportion of the subject premium — used for experience rating. We build tools for the problems where Emblem, Radar, and Akur8 stop — causal inference, proxy discrimination auditing, conformal prediction, model governance.

The name is also a philosophy. Simple, direct, no mystification. That is how we think about tooling.

Built by pricing practitioners who have worked across UK personal lines motor and home books.


What we have built

34 Python libraries covering the full pricing workflow. See the full library index with pip install commands.

UK pricing teams have adopted GBMs (CatBoost is now the dominant choice for most new builds), but many are still taking GLM outputs to production because GBM outputs are not in a form that rating engines, regulators, or pricing committees can work with. The tools here close that gap: from raw data through to a signed-off rate change with an audit trail. All of it runs on Databricks.

Data & Validation

  • insurance-cv - temporal walk-forward cross-validation with IBNR buffers and sklearn-compatible scorers
  • insurance-datasets - synthetic UK motor data with a known data-generating process, for testing and teaching
  • insurance-synthetic - vine copula synthetic portfolio generation preserving multivariate dependence structure
  • insurance-conformal - distribution-free prediction intervals for insurance GBMs with finite-sample coverage guarantees
  • insurance-monitoring - exposure-weighted PSI/CSI, actual-vs-expected ratios, and Gini drift z-tests for deployed models
  • insurance-governance - structured PRA SS1/23 model validation reports and model risk management, output as HTML and JSON

Model Building

  • insurance-credibility - Buhlmann-Straub credibility in Python with mixed-model equivalence checks
  • bayesian-pricing - hierarchical Bayesian models for thin-data pricing segments using PyMC 5
  • insurance-spatial - BYM2 spatial models for postcode-level territory ratemaking, borrowing strength from neighbours
  • insurance-multilevel - CatBoost combined with REML random effects for high-cardinality categorical groups
  • insurance-trend - loss cost trend analysis with structural break detection and regime-aware projections
  • insurance-gam - interpretable GAM models: EBM tariffs, actuarial NAM, and pairwise interaction networks with exact Shapley values
  • insurance-interactions - automated GLM interaction detection using CANN, NID, and SHAP-based methods
  • insurance-distill - GBM-to-GLM distillation: fits a surrogate GLM to CatBoost predictions and exports multiplicative factor tables for Radar/Emblem rating engines
  • insurance-survival - shared frailty models for recurrent claims, cure models, competing risks, and retention modelling
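The credibility blending at the heart of insurance-credibility reduces to one formula. A minimal sketch, with the function name and signature being ours rather than the library's:

```python
def credibility_estimate(segment_mean, segment_exposure, collective_mean, k):
    """Buhlmann-Straub credibility-weighted estimate.

    k is the ratio of expected process variance to the variance of
    hypothetical means. Z approaches 1 as exposure grows, so large
    segments rely on their own experience while thin segments shrink
    toward the collective mean.
    """
    z = segment_exposure / (segment_exposure + k)
    return z * segment_mean + (1 - z) * collective_mean
```

A segment with 100 units of exposure, an observed frequency of 0.30, and k = 100 gets Z = 0.5, landing halfway between its own experience and a collective mean of 0.20.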

Interpretation

  • shap-relativities - multiplicative rating factor tables from CatBoost models via SHAP, in the same format as exp(beta) from a GLM
  • insurance-causal - causal inference via double machine learning for deconfounding rating factors; includes price elasticity estimation (CausalForestDML, DR-Learner) via insurance_causal.elasticity
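The observation shap-relativities relies on is that, under a log link, additive SHAP contributions exponentiate into multiplicative factors. A hypothetical sketch of the rebasing step (all names here are assumptions, not the library's API):

```python
import math

def relativity_table(mean_log_contrib, base_level):
    """Convert mean SHAP values on the log scale (one per factor level)
    into multiplicative relativities, rebased so the chosen base level
    gets factor 1.0 -- the same shape as exp(beta) from a log-link GLM.
    """
    base = mean_log_contrib[base_level]
    return {lvl: math.exp(v - base) for lvl, v in mean_log_contrib.items()}
```

Rebasing matters because SHAP values are centred on the model's average prediction, whereas a rating engine expects factors relative to a named base level.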

Tail Risk & Distributions

  • insurance-distributional-glm - GAMLSS for Python: model ALL distribution parameters as functions of covariates, seven families, RS algorithm
  • insurance-severity - severity modelling toolkit for UK non-life insurance: full predictive distributions per risk
  • insurance-quantile - quantile and expectile GBMs for tail risk, TVaR, and increased limit factors
  • insurance-distributional - distributional GBMs with Tweedie, Gamma, ZIP, and negative binomial objectives
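As a reference point for the tail measures insurance-quantile reports, the empirical Tail Value-at-Risk is just the mean of the worst (1 - level) share of outcomes. A minimal sketch, not the library's API:

```python
def empirical_tvar(losses, level=0.95):
    """Tail Value-at-Risk at `level`: the average of the losses at or
    beyond the empirical level-quantile. Unlike VaR it averages over
    the whole tail rather than reading off a single cut point, so it
    is sensitive to tail thickness.
    """
    xs = sorted(losses)
    tail = xs[int(level * len(xs)):]
    return sum(tail) / len(tail)
```

For losses 1 through 100, the 95% TVaR is the mean of the five largest values, 98.0, whereas the 95% VaR would read off a single order statistic.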

Commercial

  • insurance-optimise - constrained rate change optimisation with efficient frontier between loss ratio target and movement cap constraints; includes demand modelling (conversion/retention elasticity, price response curves) via insurance_optimise.demand
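The core tension insurance-optimise manages can be seen in miniature: capping per-policy movements pulls the achieved premium change away from the unconstrained loss-ratio target. A hypothetical sketch, not the library's API:

```python
def cap_movements(proposed_changes, cap):
    """Clip each policy's proposed rate change to +/- cap.

    The gap between the capped and uncapped average change is the
    price paid for limiting customer dislocation -- one point on the
    efficient frontier between loss-ratio target and movement cap.
    """
    return [max(-cap, min(cap, r)) for r in proposed_changes]
```

Sweeping the cap from loose to tight traces out the frontier: each cap value gives one achievable (target shortfall, dislocation) pair.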

Compliance & Governance

  • insurance-fairness - proxy discrimination auditing, optimal transport discrimination-free pricing, and FCA Consumer Duty documentation support
  • insurance-causal-policy - synthetic difference-in-differences for causal rate change evaluation and FCA evidence packs
  • insurance-governance - model risk management: ModelCard, ModelInventory, and GovernanceReport generation
  • insurance-deploy - champion/challenger framework with shadow mode, rollback, and full audit trail

Infrastructure

  • burning-cost - the Burning Cost CLI; orchestration for pricing model pipelines

The problem we are solving

UK pricing teams have been building GBMs for years, mostly CatBoost. The models are better than the production GLMs. But many teams are still taking the GLM to production, because the GBM outputs are not in a form that a rating engine, regulator, or pricing committee can work with.

The issue is not technical skill. It is tooling. There is no standard Python library that extracts a multiplicative relativities table from a GBM. There is no standard library that does temporally correct walk-forward cross-validation with IBNR buffers. There is no standard library that builds a constrained rate optimisation a pricing actuary can challenge. There is no standard library that generates a PRA SS1/23-compliant model validation report.

We wrote those libraries because we needed them. Then we kept going. Everything is built to run on Databricks, because that is where UK pricing teams work and where these tools are developed and tested.


Built for real portfolios

Our benchmarks use synthetic data with known parameters because that is the only way to measure bias — you need ground truth. But the libraries are designed for messy real-world data: fractional exposures from mid-term adjustments, IBNR-contaminated accident years, missing NCD values, vehicle group code changes across ABI revisions, and duplicate records from system migrations. Every API accepts exposure offsets as a first-class parameter. Every model handles missing values through CatBoost's native treatment rather than requiring imputation. If your portfolio does not look like np.random.default_rng(42), that is what these tools are built for.
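The exposure-offset convention mentioned above is the standard Poisson frequency setup: expected claims scale with time on cover. A sketch of the idea, not any particular library's API:

```python
import math

def expected_claims(exposure_years, linear_predictor):
    """Poisson frequency with a log-exposure offset:
    E[claims] = exposure * exp(eta).

    Treating exposure as an offset (coefficient fixed at 1) rather
    than as a feature keeps predictions proportional to time on cover,
    so a six-month mid-term policy contributes 0.5 policy-years
    instead of a dropped row.
    """
    return exposure_years * math.exp(linear_predictor)
```

This is why the offset belongs in the API signature: halving the exposure should exactly halve the expected claim count, with no refitting.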


Get in touch

Start a conversation on GitHub Discussions — that's where we discuss new features, answer questions, and take feedback. For everything else: pricing.frontier@gmail.com.

If you need help getting the libraries into production — adapting examples to your data schema, navigating model risk sign-off, or building a compliant audit trail — see Work with Us.