burning-cost/insert_factcheck_kb.py at main · burningcost/burning-cost · GitHub

#!/usr/bin/env python3
"""Insert fact-check KB entries for 2026-03-07 batch 2. Run from project root."""
import sqlite3
import json
from datetime import datetime, timezone

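# Absolute path to the knowledge-base SQLite database.
DB_PATH = "/home/ralph/burning-cost/knowledge/burning_cost.db"

# One UTC timestamp shared by every entry in this batch.
now = datetime.now(timezone.utc).isoformat()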

entries = [
    {
        "title": "Fact-Check: 'From CatBoost to Radar' — CRITICAL: Radar CAN execute CatBoost models via Python",
        "category": "editorial",
        "content": """Post: 2026-03-07-from-catboost-to-radar-gbm-to-glm-distillation.md

CRITICAL — requires correction before publication.

The post states: "You cannot load a fitted CatBoost model into Radar."

This is FALSE as an absolute statement. Evidence:

1. WTW announced real-time Python deployment in Radar on 17 September 2024. Radar "integrates with Python so you can deploy code, AI and machine learning from more than 8,000 Python libraries." A fitted CatBoost model CAN be called from within Radar's rating process via Python.

2. Radar has supported PMML import since Radar 2.0 (2016). CatBoost can export to ONNX; Radar supports ONNX. Additional pathways exist.

3. Radar has its own native GBM fitter (Radar 4.2, December 2018).

The commercial value of distillation to factor tables remains real: factor tables are interpretable, auditable, and the form a pricing actuary can review. But the absolute premise is wrong.

Required changes:
- Replace "You cannot load a fitted CatBoost model into Radar" with accurate framing: there is no open-source tool that automatically converts a fitted CatBoost model into the multiplicative GLM factor tables a rating engine uses natively.
- Add that Radar's Python integration (September 2024) allows calling a CatBoost model at runtime, but this bypasses the factor table structure and provides no actuary-readable relativities.
- Add a fourth workaround in the "standard workarounds" list: Radar's Python integration as a runtime option that does not produce factor tables.

This is the same class of error found in a prior batch. The September 2024 Radar announcement is public.

Source: https://www.wtwco.com/en-us/news/2024/09/wtw-adds-real-time-python-deployment-and-governance-to-radar-its-market-leading-insurance-rating""",
        "tags": json.dumps(["radar", "wtw", "catboost", "model-import", "fact-check", "critical"]),
        "source_agent": "investigator",
    },
    {
        "title": "Fact-Check: 'From CatBoost to Radar' — Akur8-WTW August 2023 integration claim unverified",
        "category": "editorial",
        "content": """Post: 2026-03-07-from-catboost-to-radar-gbm-to-glm-distillation.md

The post states: "[Akur8] has a WTW integration announced in August 2023."

NO EVIDENCE FOUND. Web searches found:
- WTW's August 2023 partnership announcement was with Manitoba Public Insurance (MPI) using Radar and Emblem — not Akur8.
- Akur8's 2023 partnership announcements include Guidewire (expanded with investment) and hyperexponential. No WTW partnership found.
- Akur8 and WTW position their products as competing platforms. A deep integration announcement would be notable and should have public press coverage.

Required change: Remove the sentence about the August 2023 Akur8-WTW integration, or verify it against a specific press release URL before publication. The surrounding claim — that Akur8 does not accept externally fitted models — is plausible and can stand independently.""",
        "tags": json.dumps(["akur8", "wtw", "fact-check", "unverified"]),
        "source_agent": "investigator",
    },
    {
        "title": "Fact-Check: 'Your NCD Threshold Advice Is Wrong' — Credibility Transformer paper ID incorrect",
        "category": "editorial",
        "content": """Post: 2026-03-07-experience-rating-ncd-bonus-malus.md

The post says: "Wüthrich's Credibility Transformer (SSRN 4726206, June 2025) embeds Bühlmann-Straub credibility inside a Transformer attention mechanism."

SSRN 4726206 is "Experience Rating in Insurance" by Wüthrich — lecture notes on experience rating broadly, not the Credibility Transformer.

The Credibility Transformer is a separate paper: arXiv:2409.16653 by Ronald Richman, Salvatore Scognamiglio, and Mario V. Wüthrich (2024), published in the European Actuarial Journal (2025). The CLS token / attention-weighted averaging mechanism embedding Bühlmann-Straub credibility is in that paper.

Required change in the v0.2 roadmap section:
Replace: "Wüthrich's Credibility Transformer (SSRN 4726206, June 2025)"
With: "Richman, Scognamiglio and Wüthrich's Credibility Transformer (arXiv:2409.16653, 2024)"

Sources:
- SSRN 4726206: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4726206
- Credibility Transformer: https://arxiv.org/html/2409.16653v1""",
        "tags": json.dumps(["ncd", "credibility", "wuthrich", "paper-id", "fact-check", "moderate"]),
        "source_agent": "investigator",
    },
    {
        "title": "Fact-Check: 'Your Pricing Model is Drifting' — Civil Liability Act named incorrectly",
        "category": "editorial",
        "content": """Post: 2026-03-07-your-pricing-model-is-drifting.md

The post states: "the Civil Liability Act 2021 came into force on 31 May 2021."

The date (31 May 2021) is correct. The Act name is wrong. The legislation is the Civil Liability Act 2018. The whiplash reform programme implementing it (The Whiplash Injury Regulations 2021) came into force on 31 May 2021.

Required change: "the Civil Liability Act 2018's whiplash reforms came into force on 31 May 2021" or "the whiplash reform programme (implementing the Civil Liability Act 2018) came into force on 31 May 2021."

Separately: arXiv 2510.04556 is cited as "(December 2025)". The arXiv ID prefix 2510 indicates an October 2025 submission (v1: 6 October 2025). December 2025 is the v2 revision date.

Change to: "(arXiv 2510.04556, October 2025)" or "(arXiv 2510.04556, 2025)".

Source: https://www.dacbeachcroft.com/en/What-we-think/Whiplash-Reforms-To-Be-Implemented-With-Effect-From-31-May-2021""",
        "tags": json.dumps(["civil-liability-act", "whiplash", "arxiv-date", "fact-check", "minor"]),
        "source_agent": "investigator",
    },
    {
        "title": "Fact-Check: 'Demand Modelling for Insurance Pricing' — MS18/1 '85% more' figure unverified for motor",
        "category": "editorial",
        "content": """Post: 2026-03-07-demand-modelling-for-insurance-pricing.md

The post says: "The FCA's MS18/1 market study found that motor insurance customers with 5+ years' tenure were paying on average around 85% more than equivalent new customers in 2018."

The MS18/1 final report documented material price-walking. The specific "85% more" figure for motor insurance is widely cited in press coverage but its precise sourcing in the MS18/1 document for motor specifically — as distinct from home insurance, where the disparity was most pronounced — cannot be confirmed from available search results. The FCA's reported harm was £1.2 billion annually across motor and home combined; the specific "85%" comparison point for motor tenure vs new business is uncertain.

Required change: Replace with verifiable language:
"The FCA's MS18/1 market study documented substantial premium disparities between long-tenure and equivalent new customers in 2018, estimating consumer harm of around £1.2 billion annually across motor and home insurance."

If the "85%" figure can be confirmed on a specific page of MS18/1.3 (FCA market study final report, September 2020), it may be reinstated with that citation.

Source: https://www.fca.org.uk/publications/market-studies/ms18-1-general-insurance-pricing-practices-market-study""",
        "tags": json.dumps(["ms18-1", "fca", "price-walking", "fact-check", "moderate"]),
        "source_agent": "investigator",
    },
    {
        "title": "Fact-Check: 2026-03-07 Batch 2 — Summary of verified claims",
        "category": "editorial",
        "content": """Batch: nine posts dated 2026-03-07. Full report at workspace/workbench/fact-check-march-07-batch2.md.

Six issues found (one CRITICAL, two moderate, three minor). Majority of claims across all nine posts are VERIFIED CORRECT.

VERIFIED CORRECT — key claims:

Academic citations (all correct):
- Chernozhukov et al. (2018) DML — The Econometrics Journal, 21(1): C1-C68
- Wager and Athey (2018) causal forests — JASA, 113(523)
- Guelman and Guillén (2014) price elasticity — Expert Systems with Applications, 41(2)
- Schelldorfer and Wüthrich CANN — SSRN 3320525 (2019)
- Tsang, Cheng and Liu NID — ICLR 2018
- Holvoet, Antonio and Henckaerts — arXiv:2310.12671 (2023)
- Lindholm, Richman, Tsanakas and Wüthrich — ASTIN Bulletin 52(1), 2022
- Lindholm and Palmquist (2024) — Annals of Actuarial Science
- arXiv:2510.04556 Gini drift test — Brauer and Menzel, submitted October 2025
- arXiv:2403.14385 DML evaluation (2024)
- arXiv:2307.16427 causal inference survey (2023)

Regulatory (all correct):
- PS21/11 GIPP effective January 2022
- EP25/2 July 2025 confirmed price-walking substantially eliminated and multi-firm reviews ongoing
- Consumer Duty (PRIN 2A) effective 31 July 2023
- TR24/2 August 2024 — fair value assessments lacked adequate granularity
- Equality Act 2010 Section 19 indirect discrimination definition

Data (all correct):
- 63% UK motor PCW switchers in 2024 (63.3% precisely)
- CPI 11.1% October 2022 (ONS confirmed)
- Citizens Advice 2022: £280 per year in high-minority postcodes; £213m total annual excess
- Civil Liability Act whiplash reform effective 31 May 2021 (Act name wrong in Post 5 — see separate entry)
- ABI NCD scale structure (10 levels, 0-65%, transitions)
- PSI thresholds 0.10/0.25 — correctly caveated as credit-scoring convention""",
        "tags": json.dumps(["fact-check", "summary", "march-2026", "verified"]),
        "source_agent": "investigator",
    },
]
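
# Optional sanity check for the entries above: a minimal sketch, not part of the
# original script. The helper name `find_entry_problems` and the REQUIRED_KEYS
# tuple are hypothetical; the key list simply mirrors the columns that the
# INSERT statement further down reads from each entry.
REQUIRED_KEYS = ("title", "category", "content", "tags", "source_agent")


def find_entry_problems(items):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for i, item in enumerate(items):
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            problems.append(f"entry {i}: missing keys {missing}")
            continue
        try:
            # Tags are stored as a JSON string, so they should decode cleanly.
            tags = json.loads(item["tags"])
        except (TypeError, ValueError):
            problems.append(f"entry {i}: tags is not valid JSON")
            continue
        if not isinstance(tags, list):
            problems.append(f"entry {i}: tags should decode to a list")
    return problems


# Possible usage before connecting to the database:
#     problems = find_entry_problems(entries)
#     if problems:
#         raise SystemExit("\n".join(problems))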

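conn = sqlite3.connect(DB_PATH, timeout=30)
try:
    # `with conn` runs the inserts as one transaction: it commits on success
    # and rolls back if any insert raises, so a failed run leaves no partial batch.
    with conn:
        for entry in entries:
            conn.execute(
                """INSERT INTO entries (title, category, content, tags, source_agent, created_at, status)
                   VALUES (?, ?, ?, ?, ?, ?, 'active')""",
                (
                    entry["title"],
                    entry["category"],
                    entry["content"],
                    entry["tags"],
                    entry["source_agent"],
                    now,
                ),
            )
            print(f"Inserted: {entry['title'][:70]}...")
finally:
    # The transaction context manager does not close the connection, so close it explicitly.
    conn.close()

print("Done. All KB entries inserted.")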