Kahzaabu

Automated fact-checking archive for the Maldives Presidency. "Kahzaabu" (ކަޒާބު) is Dhivehi for falsehood — and the street nickname for Mohamed Muizzu. The two names refer to the same person; the project treats them as synonyms.

License: Apache-2.0 · Tests: 328 passing · V2 status: Slices 0–12 done (see V2 build plan) · Trust model: read-only public web, operator actions via CLI (no in-app auth, no passwords)

⚠️ Reference implementation — not an authoritative source

Kahzaabu is a sample Hermes Agent plugin and fact-checking pipeline, built for educational and research purposes. Its output is automated analysis, not findings of fact. Do not cite kahzaabu's verdicts, Truth-O-Meter ratings, or "contradictions" as evidence in journalism, legal proceedings, academic writing, political argument, or social media. The only authoritative material is the underlying press release on presidency.gov.mv, which every fact-check links back to.

Full terms in DISCLAIMER.md. The "what this is / what this isn't" matrix lives there.

This is a research / educational project: it scrapes public press releases from presidency.gov.mv, extracts factual claims with an LLM, curates contradictions across time, verifies them against the open web, and stores the result in a queryable SQLite archive. A native Hermes Agent plugin exposes the archive to a chat agent so you can ask questions in plain English (or through Telegram / WhatsApp / Slack via the hermes gateway).

V2 layers state-of-the-art fact-checking methodology onto V1's six-stage pipeline: AVeriTeC verdict structure (Schlichtkrull et al., EMNLP 2023), RAGAR Chain-of-RAG reasoning (arXiv 2404.12065), Full Fact canonical claim matching, PolitiFact Truth-O-Meter labels, and schema.org ClaimReview JSON-LD for Google Fact Check Explorer indexing. See docs/ARCHITECTURE.md for the full citation map and docs/METHODOLOGY.md for the public-facing methodology.

This is not journalism. It is an automated pipeline that surfaces patterns. Every claim links back to the original press release on presidency.gov.mv. Read sources before drawing conclusions.

What it does

Today (May 2026), the archive holds:

Item	Count
Muizzu-era press releases (EN, 2023-11-17 onwards)	~3,099
Extracted factual claims	~8,954
Q&A sub-questions (AVeriTeC decomposition)	35,648
Canonical-claim paraphrase groups	151
Curated fact-checks (published, V2-enriched)	220
Contradiction pairs (4-way verdict)	2 CONTRADICTION + 46 NOT_CONTRADICTORY
Web-evidence rows backing fact-checks	304
2023 campaign manifesto promises (tracked)	717
EN ↔ DV translation diff rows	varies

V2 publishes each fact-check with three layered labels (see ADR 0005):

Internal category (V1, kept): LIE · MISLEADING · BROKEN_DEADLINE · CREDIT_THEFT · SHIFTING_NUMBERS · CONTRADICTION. Used by the curator's classification.
AVeriTeC verdict (V2): SUPPORTED · REFUTED · NOT_ENOUGH_EVIDENCE · CONFLICTING_EVIDENCE. Used by the agent/skill output, ClaimReview JSON-LD, and the public API.
Truth-O-Meter (V2): a 6-rung public-facing ladder — TRUE (6) · MOSTLY_TRUE (5) · HALF_TRUE (4) · MOSTLY_FALSE (3) · FALSE (2) · PANTS_ON_FIRE (1). Used by the web UI's badge colors and Google's rich-result cards.

All three derive deterministically in kahzaabu/truth_score.py from the curator's (category, confidence) pair — no second LLM call. The mapping is unit-tested as ADR 0005 ground truth (see docs/EVAL_RESULTS.md).

A separate layer — the narrative-tricks analysis — sits on top of every article-derived answer and surfaces framing techniques (hero framing, manufactured momentum, vague timeframes, etc.) even when no factual error is present.

Quick start

Path A — standalone (CLI + web)

git clone https://github.com/Sofwath/kahzaabu.git && cd kahzaabu
python3 -m venv .venv
.venv/bin/pip install -e ".[all]"       # core + web + TUI + MCP server
# or pick extras: .[web]  .[tui]  .[mcp]  — bare `-e .` gets pipeline only
export ANTHROPIC_API_KEY=sk-ant-...

.venv/bin/kahzaabu pipeline --budget 1.00   # one full cycle
.venv/bin/kahzaabu web --port 8765           # open http://127.0.0.1:8765
.venv/bin/kahzaabu tui                       # interactive TUI
.venv/bin/kahzaabu ask "What's Muizzu been doing this month?"

Path B — as a Hermes plugin (recommended)

If you have Hermes Agent installed, kahzaabu integrates natively. The plugin source lives in hermes-plugin/ inside this repo — install symlinks it into hermes' plugins dir so edits are live, no copy step.

# One-time install — plugin + skills
./scripts/install-hermes-plugin.sh     # symlinks hermes-plugin/ → ~/.hermes/hermes-agent/plugins/kahzaabu
./scripts/install-hermes-skills.sh     # symlinks skills/* → ~/.hermes/skills/

hermes kahzaabu setup        # interactive: API key, daily budget, freshness threshold
hermes kahzaabu doctor       # health check (all should be ✅)
hermes skills list           # should show kahzaabu-fact-check + kahzaabu-self-improver

# Use it — three surfaces
hermes kahzaabu status                          # archive counts + freshness
hermes kahzaabu ask "what did he promise about housing?"
hermes kahzaabu ask --continue "and the deadlines on those?"   # ↑ same session
hermes kahzaabu update --budget 0.50            # run pipeline
hermes kahzaabu web                             # start the web UI

# Inside any hermes chat session (terminal OR gateway-routed):
#   /kahzaabu what is he up to this week?
#   /kahzaabu and what about housing?           # ↑ auto-continues the session

# Wire messaging channels (Telegram, WhatsApp, Slack, Discord)
hermes gateway setup       # one-time
hermes gateway install     # install as systemd / launchd service
hermes gateway start       # now messages to your bot route to kahzaabu tools

Three things to know about the integration:

/kahzaabu slash command is available in every hermes chat — terminal, Telegram, WhatsApp, Slack, Discord. Auto-continues the most-recent session (within 24h), so follow-ups don't lose context.
hermes kahzaabu ask --continue mirrors hermes' own --continue UX for the CLI — picks up the previous session_id from the qna_sessions table.
LLM-provider inheritance: the narrative-tricks pass routes through hermes' configured provider (whatever you picked in hermes setup model). Switch hermes from Anthropic to OpenAI to OpenRouter — the secondary pass follows. (Main agentic loop still uses Anthropic — it needs multi-turn tool-use that ctx.llm.complete() doesn't yet support.)

The hermes plugin source lives at hermes-plugin/ in this repo and is symlinked into ~/.hermes/hermes-agent/plugins/kahzaabu/ by the install script. It does not vendor code — it imports the package from this dev tree. See hermes plugin section for details.

Architecture

                       ┌─────────────────────────────────┐
                       │  presidency.gov.mv (EN + DV)    │
                       └──────────────┬──────────────────┘
                                      │ scrape (incremental, 12h cycle)
                                      ▼
   ┌──────────────────────────────────────────────────────────────────────┐
   │                     SQLite (data/kahzaabu.db, WAL)                   │
   │                                                                       │
   │   articles ── claims    fact_checks ── fact_check_evidence           │
   │       │         │           │  (source_article_ids JSON array → articles.id)
   │       │                                                              │
   │       ├── article_fact_cards (per-article inspector output)          │
   │       └── dv_en_inconsistencies (translation diffs)                  │
   │                                                                       │
   │   manifesto_promises ── manifesto_evidence (cross-ref)               │
   │                                                                       │
   │   qna_sessions (multi-turn agent memory)  ── scrape_runs            │
   └──────────────────────────────────────────────────────────────────────┘
                                      │
                ┌─────────────────────┼─────────────────────┐
                ▼                     ▼                     ▼
        ┌────────────┐         ┌────────────┐        ┌────────────┐
        │  CLI       │         │  FastAPI   │        │  Hermes    │
        │  + TUI     │         │  web UI    │        │  plugin    │
        │            │         │  :8765     │        │            │
        └────────────┘         └────────────┘        └─────┬──────┘
                                                          │
                                                  ┌───────┴──────────────┐
                                                  ▼                       ▼
                                          ┌──────────────┐       ┌───────────────┐
                                          │  agent loop  │       │ hermes gateway│
                                          │ (kahzaabu_   │       │ Telegram /    │
                                          │  ask + 8     │       │ WhatsApp /    │
                                          │  tools +     │       │ Slack /       │
                                          │  web_search) │       │ Discord       │
                                          └──────────────┘       └───────────────┘

The DB is the source of truth. Every consumer is read-only over it except the pipeline, which appends.

The pipeline

kahzaabu pipeline runs the V1 six-stage core, with the V2 layers running as separate slice commands. Each stage is idempotent — re-runnable, with budgets, with cost tracking.

V1 core (every cycle):

#	Stage	What it does	LLM cost per item
1	scrape	`scraper.py` — incremental crawl of `presidency.gov.mv/news/{press_release,speech,vp_speech}` (EN + DV). HTTP only.	$0
2	extract	`extractor.py` — Sonnet reads each article, returns a list of `{type, polarity, subject_normalized, is_checkable, ...}` claim records (V2 schema, ADR 0002).	~$0.005-0.010
3	inspect	`inspector.py` — generates a per-article fact card (summary, history-check, severity, viz spec). Stored in `article_fact_cards`.	~$0.015
4	curate	`curator.py` — Sonnet sees all claims on the same topic across time and flags contradictions / broken deadlines / credit theft. Inserts `fact_checks` rows.	~$0.05/topic
5	verify	`verifier.py` — Haiku does Anthropic web_search for each fact-check; agrees/disagrees evidence saved to `fact_check_evidence`. Bounded — only the high-severity ones.	~$0.03 + $0.01/search
6	dv-compare	`dv_compare.py` — Sonnet reads paired EN+DV bodies, flags numeric / omission / softening differences. Inserts `dv_en_inconsistencies`.	~$0.08/pair

V2 enrichment slices (run on demand; one-shot backfills + periodic top-ups):

Cmd	What it does	ADR	Cost so far
`kahzaabu decompose`	`decomposer.py` — Haiku 4.5 breaks each claim into AVeriTeC Q&A pairs (`{question, answer_type, source_medium}`).	0001	$12.51 (8,954 claims)
`kahzaabu match`	`matcher.py` + `embeddings.py` — embeds every claim, groups paraphrases via cosine ≥ 0.85 + entity overlap ≥ 0.6 + Haiku tiebreaker. Provider abstraction (local / OpenAI / Voyage).	0003, 0007	$0 (local)
`kahzaabu find-contradictions`	`contradictions.py` — polarity-pair SQL shortlist + semantic-similarity filter [0.55, 0.95] + Sonnet 4.6 4-way classifier (CONTRADICTION / EVOLVING_POSITION / CONTEXT_CHANGED / NOT_CONTRADICTORY) with reasoning chain.	0004	$3.50
`kahzaabu enrich-factchecks`	`fact_check_enricher.py` — deterministic V2-label backfill: `verdict_label` + `truth_score` + `truth_score_label` + `reasoning_chain` for every fact-check.	0005	$0
`kahzaabu export-claimreview`	`claimreview.py` — schema.org ClaimReview JSON-LD generation + caching to `fact_checks.claimreview_jsonld`.	0006	$0
`kahzaabu eval`	`eval.py` — golden-set evaluation across all five LLM-call stages. Produces verified-subset + drift-detector metrics.	0008	$0
(registry)	`registry.py` — public-sector entity registry (25 Maldives entities); auto-tags `fact_check_evidence.authoritative_entity_id` when a URL is on a registered .gov.mv / .com.mv domain. Source of truth in `data/registry/maldives_public_sector.yaml`.	0011	$0
`kahzaabu reproducibility <id>`	`reproducibility.py` — emits full provenance JSON manifest for a fact-check (curation run + claims + decomposition + evidence + contradiction pair + ClaimReview + git SHA). Also exposed as `/api/reproducibility/{id}.json`.	0010	$0
`kahzaabu audit`	`audit.py` — bias/fairness markdown report with chi-squared on category×year + category×topic, verdict-label + Truth-O-Meter ladder distributions, speaker concentration, authoritative-source coverage.	0010	$0
`kahzaabu transparency-report --since`	`transparency.py` — public-facing window report: fact-checks issued, corrections, LLM spend, methodology git-log.	0010	$0

Defaults: cycle runs every 12h via launchd (scripts/com.kahzaabu.pipeline.plist). Budget cap defaults to $1.00 per cycle. Total V2-build spend: ~$16.50. Total project spend to-date: ~$75.

A separate manifesto-extract + manifesto-crossref flow extracts ~717 promises from the 2023 campaign PDF (Dhivehi, 51 MB) and cross-references each against the archive to assign a delivery status.

Data model

Editor protocol — when changing this block, derive column lists from sqlite3 data/kahzaabu.db ".schema" rather than memory. Each entry below uses the format tablename -- description followed by indented -- cols: a, b, c lines. The cols: convention is load-bearing: tests/test_readme_schema_drift.py parses it and fails if any documented column is absent from the live schema. Run ./scripts/test.sh before committing.

The interesting tables:

articles            -- PK (id, language). EN ↔ DV pairs via shared id + paired_id.
                    -- cols: title, category, body_text, body_html, published_date,
                    --       reference, scraped_at, raw_page_html
claims              -- extracted from article body_text by the LLM.
                    -- cols: article_id, language (FK), type, subject, value,
                    --       deadline, actor_credited, quote, extraction_run_id,
                    --       polarity (V2: AFFIRM/DENY/PROMISE/DENIAL_OF_PROMISE/
                    --                CLAIM_OF_FACT/NEUTRAL — see ADR 0002),
                    --       subject_normalized (V2: entity-resolved subject),
                    --       is_checkable (V2: 0=opinion/rhetoric, 1=factual)
fact_checks         -- curated contradictions / broken deadlines / etc.
                    -- cols: category, claim_date, claim, what_actually_happened,
                    --       topic, confidence, source_article_ids (JSON array
                    --       of articles.id), evidence_quotes (JSON), published,
                    --       public_summary, fingerprint (dedupe key)
fact_check_evidence -- web-search hits backing each fact-check.
                    -- cols: fact_check_id (FK), url, title, snippet, relevance
                    --       ('confirms'|'contradicts'|'context'|'unclear'|
                    --       'not_found'), summary, retrieved_at,
                    --       authoritative_entity_id (V2: ADR 0011, nullable
                    --       pointer into the public-sector registry under
                    --       data/registry/)
article_fact_cards  -- per-article inspector output.
                    -- cols: article_id, language, summary, key_claims_json,
                    --       history_check, severity, visualization_spec_json,
                    --       web_evidence_json, cost_usd, inspection_run_id,
                    --       published
dv_en_inconsistencies -- EN/DV translation diffs.
                    -- cols: en_article_id, dv_article_id (FKs), severity,
                    --       category, en_quote, dv_quote, dv_translation_to_en
manifesto_promises  -- 2023 campaign promises with delivery tracking.
                    -- cols: section, promise_text_dv, promise_text_en,
                    --       category, subject, target_value, deadline_stated,
                    --       delivery_status, delivery_evidence_json (JSON:
                    --       linked article_ids + fact_check_ids + notes),
                    --       chunk_index, published
qna_sessions        -- agentic-ask multi-turn memory.
                    -- cols: id (uuid), messages_json (full message history),
                    --       total_cost_usd, n_turns, created_at, last_used_at
constitution_articles -- parsed Constitution of the Republic of Maldives.
                    -- cols: article_no, chapter, title, body, source_version,
                    --       imported_at
scrape_runs         -- audit log of pipeline cycles.
                    -- cols: category_id, language, started_at, finished_at,
                    --       pages_scraped, articles_scraped, articles_new,
                    --       status, resume_page, error_message
web_users           -- LEGACY: kept for backwards-compat with deployed DBs.
                    -- The password-based admin workflow was removed
                    -- (web UI is read-only public; operator actions
                    -- run from the CLI). No code reads or writes this
                    -- table any more.
                    -- cols: username, password_hash, role, created_at

Article ↔ fact-check linkage is via the JSON column fact_checks.source_article_ids — a list of articles.id values. Use SQLite's json_each() to traverse it (or LIKE on the serialized form as a fallback).

Migrations are idempotent ALTER-COLUMN style in claims_db.py:init_claims_schema(). WAL mode is on; check_same_thread=False for the FastAPI threadpool.

The agentic Q&A loop

kahzaabu/qna_agentic.py:ask_agentic() is the heart of the Q&A experience. It is itself an agent loop — Sonnet calls internal tools to satisfy a question.

user question
    │
    ▼
Sonnet 4.6 + tools = [
    archive_stats, search_articles, get_article,
    search_factchecks, get_factcheck,
    search_manifesto, get_promise,
    list_recent,
    web_search (Anthropic server tool)
]
    │
    ▼
loop up to max_iterations (default 7):
    if Sonnet returns tool_use:
        execute, append result, continue
    else:
        capture final_text, break
    │
    ▼
guarantee-pass (Haiku 4.5, ~$0.01):
    if final_text quotes article text BUT lacks "🎭 Narrative tricks observed":
        ask Haiku to append the section using the catalog
    │
    ▼
return {answer, session_id, n_iterations, cost_usd, tool_trace, web_searches}

Session memory lives in the qna_sessions table. Pass the returned session_id back to continue a conversation — the loop will re-load all prior tool results and turns.

Cost per question:

Simple "how many fact-checks?" (data-only) — ~$0.025
Article-heavy ("what did he say last week?") — ~$0.05-0.10
Open-ended with web_search — ~$0.10-0.30

Daily budget cap (default $5) is enforced at the top of ask_agentic.

The narrative-tricks layer

A 16-technique catalog (hero framing, manufactured momentum, goalpost shifting, empty markers of action, vague timeframes, etc.) is appended in the system prompt with anti-over-claiming rules:

Cap of 5 items per answer
Every flag must include the verbatim quote
Ceremonial language ("expressed gratitude") is explicitly NOT a trick
Hedging language ("could be seen as", "this might imply") is forbidden

The section is enforced via a guarantee-pass: if the agent quotes article text but skips the section, a follow-up Haiku call appends it. Cost: ~$0.01 per article-touching question.

Pure-data questions (e.g. "how many fact-checks?") correctly omit the section — the guarantee-pass is gated on tool_trace containing article-content tools.

See qna_agentic.py:SYSTEM_PROMPT for the catalog and _ARTICLE_TOOLS for the gating set.

Hermes plugin

The plugin source lives at hermes-plugin/ in this repo. The install script (scripts/install-hermes-plugin.sh) symlinks it into ~/.hermes/hermes-agent/plugins/kahzaabu/ so hermes can find it. Edits in hermes-plugin/ are live — no copy/sync step.

Layout:

hermes-plugin/
├── plugin.yaml    Manifest: name, version, provides_tools, platforms
├── __init__.py    register(ctx) — entry point. Three jobs:
│                    1. Hydrate ~/.hermes/.env into os.environ
│                    2. Ensure kahzaabu is importable (self-heal .pth)
│                    3. Register 9 tools + `hermes kahzaabu` CLI
│                       + `/kahzaabu` slash command
├── tools.py       9 handler functions wrapping qna_agentic / claims_db
├── cli.py         argparse setup for `hermes kahzaabu {setup,status,…}`
├── SKILL.md       Agent-facing guidance: when to use which tool
└── README.md      Plugin-source README (design choices, bootstrap layers)

Design choices to know:

Imports, doesn't vendor. Plugin imports the canonical kahzaabu package from this dev tree. Editing code here updates the plugin immediately.
Path discovery is robust. kahzaabu_home() derives the dev tree from Path(kahzaabu.__file__).resolve().parents[1]. No hardcoded paths anywhere.
.pth self-heal. Hermes' venv has no pip, so the plugin writes ~/.hermes/hermes-agent/venv/lib/python3.11/site-packages/kahzaabu.pth on first run. If hermes ever recreates its venv, the next hermes kahzaabu * invocation rewrites it.
Tools are in-process. Unlike the previous MCP-over-stdio design, hermes calls plugin tools directly — no subprocess, ~5-10× faster per call.
update and web shell out. Both need scikit-learn / FastAPI / etc. that don't live in hermes' lean venv, so they exec <dev>/.venv/bin/kahzaabu pipeline|web. doctor checks this.

The 9 tools exposed to the agent:

Tool	What it does
`kahzaabu_stats`	Counts + freshness — call first for "recent" questions
`kahzaabu_ask`	Run the full agentic loop — preferred for any natural-language question
`kahzaabu_list_lies`	List fact-checks with filters
`kahzaabu_get_factcheck`	One fact-check + web evidence + linked source articles
`kahzaabu_manifesto`	2023 promises with delivery status
`kahzaabu_get_article`	One article with claims + linked fact-checks
`kahzaabu_recent_activity`	Last N days of articles
`kahzaabu_constitution_lookup`	BM25 search over the 301 Constitution articles
`kahzaabu_pipeline_run`	Trigger pipeline (gated by `KAHZAABU_ALLOW_PIPELINE=1`; legacy `KAHZAABU_MCP_ALLOW_PIPELINE=1` still honoured)

Three integration surfaces share one Q&A engine:

Agent tool call: hermes chat -q "..." → agent invokes kahzaabu_ask and gets back {answer, session_id, cost_usd, tool_trace, web_searches}.
CLI subcommand: hermes kahzaabu ask [--continue] [--no-web] [--session ID] "..." — direct human use.
Slash command: /kahzaabu <question> works inside any hermes session, including chats routed through the messaging gateway. Auto-continues the most-recent session.

All three call the same kahzaabu/qna_agentic.py:ask_agentic() function, so session memory, the narrative-tricks layer, daily-budget caps, and cost accounting behave identically across surfaces. Sessions persist in the qna_sessions table and survive process restarts; the --continue and slash auto-continue affordances both use claims_db.most_recent_session_id() to find the latest one within a 24h window.

LLM-provider inheritance: the secondary narrative-tricks pass calls ctx.llm.complete() when invoked from the plugin (so it follows hermes setup model), and falls back to Anthropic Haiku 4.5 when called from the standalone CLI / TUI / web. The main agentic loop always uses Anthropic Sonnet — ctx.llm.complete() doesn't yet support multi-turn tool-use.

Web UI tour

kahzaabu web --port 8765 (or hermes kahzaabu web) serves:

Page	What
`/`	Dashboard: 5 stat cards + 6 charts (categories, topics, claims/month, articles/month, manifesto-status, stacked-by-month) + freshness banner
`/browse`	Article browser with filters
`/lies`	Fact-check browser with category/severity filters
`/article/{id}`	One article + claims + linked fact-checks + fact-card chart
`/compare`	EN ↔ DV translation inconsistencies
`/compare/{id}`	Side-by-side EN/DV with the flagged region highlighted
`/manifesto`	2023 promises with delivery status
`/manifesto/{id}`	Per-promise detail + supporting articles
`/ask`	The agentic Q&A interface (sessions, web toggle, tool-trace)
`/methodology`	How the pipeline works (public-facing)
`/corrections`	Public report-a-correction form

Read-only by design. There is no /admin, no /login, no session cookie, no password anywhere in the system. Publishing a fact-check, triggering the pipeline, creating backups — all of it runs from the operator's shell via the kahzaabu CLI and inherits OS-level permissions. The web UI's only writes are: rate-limited /api/ask (Q&A budget capped per day) and /api/corrections (public form that appends to a moderation queue read by the operator in CLI). slowapi rate-limits anonymous traffic.

TUI tour

kahzaabu tui (or python -m kahzaabu.tui) is a Textual-based interactive terminal. Slash commands:

Command	What
`/ask <question>`	Multi-turn agentic ask (session preserved)
`/stats`	Archive counts + freshness
`/lies [category]`	List fact-checks
`/article <id>`	Show an article
`/refresh`	Re-query freshness
`/help`	Show all commands
`/quit`	Exit

A startup banner shows freshness; if stale, it prompts to run kahzaabu update.

Costs

Total spend to date: ~$58. Typical ongoing costs:

Activity	Per item	Per 12h cycle (typical)
Scrape (HTTP)	$0	$0
Extract claims	$0.005-0.010	~$0.05
Inspect (fact-card)	$0.015	~$0.15
Curate (cross-time)	$0.05/topic	~$0.10
Verify (web-search)	$0.03 + $0.01/hit	~$0.20
DV/EN compare	$0.08/pair	~$0.40
Total per 12h cycle		~$0.90
`/api/ask` question	$0.025 (data) → $0.30 (web)	n/a

Daily caps:

Pipeline: --budget 1.00 (CLI flag)
Q&A (per process): KAHZAABU_DAILY_BUDGET_USD=5.00
Public web Q&A (anon): hard cap returns 503 once daily spend exceeds env var

Known issues & TODOs

Known issues

Pipeline via MCP silently skips scrape stage. When the agent calls kahzaabu_pipeline_run, the scrape sub-stage runs but produces no scrape_runs entries. Direct CLI (kahzaabu pipeline) works correctly. The MCP-path bug existed in the legacy MCP server too — the native plugin version may or may not still have it; not retested.
hermes default model shows anthropic/anthropic/... in doctor. Pre-existing cosmetic bug in _hermes_provider() formatting — concatenates provider with a default that already includes the provider prefix.
launchd plist still in use. Migration to hermes cron is documented in hermes kahzaabu setup but not executed. Both can run side-by-side; once you're confident, launchctl unload ~/Library/LaunchAgents/com.kahzaabu.pipeline.plist.

Recently fixed

~~Four plugin handlers used a hallucinated schema.~~ kahzaabu_list_lies, kahzaabu_get_factcheck, kahzaabu_get_article were querying columns that don't exist (title/severity/summary) and joining a table that doesn't exist (fact_check_claims). Now rewritten against the real schema — fact-check ↔ article linkage uses the JSON source_article_ids column.

TODOs

Priority	Item
🔴 High	Public VPS deploy. Caddy + systemd templates in `scripts/`. Methodology page, robots.txt, rate-limits done. Needs: domain, server, DB sync strategy (push from laptop vs. run pipeline on server).
~~🔴 High~~ ✅ done	~~V2 Slice 12 — Reproducibility + observability.~~ Shipped — `/api/reproducibility/{id}.json` + `kahzaabu reproducibility` CLI + `kahzaabu audit` + `kahzaabu transparency-report` + `/metrics` + Grafana dashboard JSON + Dockerfile.
🟡 Medium	Grow the verified golden-set subset. 24 of 25 fixtures are verified ground truth; 1 extractor fixture (`article-32009`) deliberately left unverified pending taxonomy clarification on `deadline_promise` vs event-schedule. Hand-review more articles to broaden coverage per stage.
🟡 Medium	Viber channel. Hermes doesn't support Viber. Would require a custom `ctx.register_platform(...)` adapter — 3-5 days. Out of scope unless Maldives-market demand justifies.
~~🟡 Medium~~ ✅ done	~~Migrate guarantee-pass to `ctx.llm`.~~ Shipped: narrative-tricks pass now uses `ctx.llm.complete()` inside the plugin (anthropic fallback for non-plugin paths). Main loop still uses anthropic — needs tool-use.
~~🟡 Medium~~ ✅ done	~~Quality evals + prompt regression tests.~~ Slice 10 shipped — see `kahzaabu eval` + `docs/EVAL_RESULTS.md`.
🟡 Medium	Self-improver loop. A hermes skill at `~/.hermes/skills/kahzaabu/kahzaabu-self-improver/` already exists. Has produced `test_claims_db.py` with 17 unit tests. Pending: branch merge, additional iterations.
🟢 Low	Replace launchd with `hermes cron` (see Known issues #3).
~~🟢 Low~~ ✅ partial	~~Per-tenant LLM selection.~~ Secondary tricks pass now follows hermes' provider config. Main loop still hard-coded to Anthropic — would need a tool-use-capable host-LLM facade.
🟢 Low	Fix doctor's `anthropic/anthropic/...` cosmetic bug. Strip the provider prefix from `model.default` before formatting.
🟢 Low	Compare-presidents page. Would need historical pre-Muizzu data. Out of scope but the schema supports it.
🟢 Low	RSS/Atom feed of new fact-checks for public consumers.
🟢 Low	One pre-existing scrape-stage MCP bug investigation (see Known issues #1). Good first target for the self-improver.

Security & ethics

The corpus is already public at presidency.gov.mv. No leaks, no inside sources.
Every fact-check links back to the original press release URL.
"Report a correction" form on /corrections appends to a moderation queue; the operator reviews it with kahzaabu CLI tooling, then publishes any resulting fact-check via kahzaabu publish <id>.
The web UI is read-only: only published items (fact_checks.published = 1) ever surface. Publishing flows are CLI-only — no web-side credentials exist anywhere.
Pipeline LLM calls are budget-capped; daily Q&A spend is capped; anonymous web traffic is rate-limited (slowapi).
Subject is a sitting head of state. Treat output as automated analysis, not finished journalism — review the source article before quoting.
No mass scraping of social-media or non-official sources. Web-search-verify uses Anthropic's web_search_20250305 server tool, which respects publisher robots.txt.

Testing

./scripts/test.sh                              # full local suite — 327 tests, ~2.6s
.venv/bin/python -m unittest discover tests/   # just the unit tests
.venv/bin/python tests/system_check.py         # live web-stack integration check
.venv/bin/kahzaabu eval                        # golden-set quality eval (ADR 0008)

The unit suite is offline, no external deps, and runs in seconds. It catches:

V1 invariants: host_llm branch in agentic Q&A, JSON1 vs LIKE-fallback parity, README schema drift
V2 invariants per slice: claim-enrichment migrations, decomposer enums, embedding-provider selection, matcher cosine + entity overlap, contradiction 4-way verdict validation, truth-score deterministic mapping, ClaimReview JSON-LD shape, eval framework metrics + verified-vs-pinned semantics

CI: .github/workflows/test.yml runs the unit suite on every push and PR to main. Use ./scripts/ci-dry-run.sh to validate a fresh-worktree install before pushing.

Quality regression detection: kahzaabu eval produces verified-subset metrics (real quality) and all-fixture metrics (drift detector) for each LLM-call stage. A prompt edit that drops the verified subset below 1.000 is a real regression; the drift-detector subset surfaces LLM noise without claiming truth. See docs/EVAL_RESULTS.md.

Repository layout

kahzaabu/                   The Python package
├── __init__.py
├── cli.py                  Click-based CLI (kahzaabu <subcommand>)
├── pipeline.py             Orchestrates the 6 V1 stages
├── scraper.py              presidency.gov.mv crawler (EN + DV)
├── extractor.py            Per-article claim extraction (Sonnet) — V2 schema
├── inspector.py            Per-article fact card (Sonnet)
├── curator.py              Cross-time contradiction detector (Sonnet)
├── verifier.py             Web-search-verifier (Haiku)
├── dv_compare.py           EN/DV diff (Sonnet)
├── manifesto.py            2023 promise extractor + cross-referencer
├── qna.py                  Legacy single-shot Q&A (kept for CLI parity)
├── qna_agentic.py          The current agentic Q&A loop + narrative-tricks
├── claims_db.py            Schema, migrations, all DB helpers
├── db.py                   Connection plumbing
├── models.py               Type aliases
├── report.py               JSON/CSV export of fact_checks
├── infographics.py         Static-HTML viz generators (legacy tracker)
├── scheduler.py            launchd helper
├── tui.py                  Textual TUI
├── legacy/
│   └── mcp_server.py       [DEPRECATED] stdio MCP server — superseded by hermes-plugin/
│
│  # V2 modules (ADR-driven)
├── decomposer.py           Slice 2 — AVeriTeC Q&A decomposition (Haiku)
├── embeddings.py           Slice 3 — provider abstraction (local/OpenAI/Voyage)
├── matcher.py              Slice 3 — canonical claim matching
├── claims_enricher.py      Slice 4 prep — polarity/subject/is_checkable backfill
├── contradictions.py       Slice 4 — 4-way verdict classifier (Sonnet)
├── truth_score.py          Slice 5 — deterministic AVeriTeC + Truth-O-Meter mapping
├── fact_check_enricher.py  Slice 5 — V2-label backfill for fact_checks
├── claimreview.py          Slice 6 — schema.org ClaimReview JSON-LD generator
├── eval.py                 Slice 10 — golden-set quality evaluation framework
├── registry.py             Slice 11.5 — public-sector entity registry / trust anchor
├── reproducibility.py      Slice 12 — provenance manifest assembly (ADR 0010)
├── audit.py                Slice 12 — bias/fairness audit with chi-squared
├── transparency.py         Slice 12 — public-facing transparency report
├── metrics.py              Slice 12 — prometheus_client + @tracked_stage
│                            decorator (wraps all 8 pipeline run_*)
│
└── web/                    FastAPI app
    ├── app.py
    ├── api/                JSON endpoints (all public read-only)
    │   ├── articles.py / factchecks.py / manifesto.py
    │   ├── ask.py / freshness.py / stats.py / viz.py
    │   ├── corrections.py / inspect.py
    │   ├── claimreview.py / contradictions.py    # V2
    │   ├── constitution.py                       # 301-article BM25 search
    │   ├── reproducibility.py                    # V2 Slice 12
    ├── static/             HTML / CSS / JS (no SPA)
    │   └── contradictions.html                   # V2 — 4-way verdict browser
    ├── db_dep.py           FastAPI Depends() for DB
    └── limits.py           Rate-limiter + LRU cache for /api/ask

hermes-plugin/              Hermes plugin source (symlinked from ~/.hermes/...)
                            — see hermes-plugin/README.md
skills/                     Hermes-installable agentskills.io skills
                            — kahzaabu-fact-check (symlinked into
                            ~/.hermes/skills/)
tests/                      Unit + integration tests (197 total)
├── test_*.py               14 modules
└── golden/                 Quality-eval fixtures (5 stages, 25 fixtures,
                            24/25 verified ground truth — ADR 0008)
research/                   Historical one-shot scripts (see research/README.md)
                            — NOT imported by the package
scripts/                    test.sh, install-hermes-plugin.sh, install-hermes-skills.sh,
                            ci-dry-run.sh, run_pipeline.sh, Caddyfile, systemd unit
docs/                       Project documentation
├── ARCHITECTURE.md         Full V2 architecture with citation map
├── METHODOLOGY.md          Public-facing methodology (Slice 11)
├── MODEL_CARD.md           Per-stage LLM model card (Slice 11)
├── DATA_CARD.md            Corpus data card (Slice 11)
├── EVAL_RESULTS.md         Auto-generated quality eval report
├── V2_BUILD_PLAN.md        Slice tracker
├── TEST_REPORT.md          Latest full-stack test snapshot
└── adr/                    Architecture Decision Records (0001-0010)
.github/
├── workflows/              CI: test.yml runs unit suite on push + PR
├── ISSUE_TEMPLATE/         Bug report + feature request templates (Slice 11)
└── PULL_REQUEST_TEMPLATE.md
data/                       SQLite DB + manifesto/ (other contents gitignored)
├── registry/               Public-sector entity registry (ADR 0011)
│   ├── maldives_public_sector.yaml   ← source of truth (human-edit here)
│   └── maldives_public_sector.json   ← machine-loaded twin
├── backups/                Local-only sqlite3 dumps (gitignored)
└── reports/                `kahzaabu audit` + `transparency-report` output
                             (gitignored — regenerate locally)
Dockerfile                  One-command reproduction — python:3.11-slim
                             base, editable install with chosen embedding
                             backend (ADR 0010)
docs/observability/
└── grafana-dashboard.json  Importable dashboard (6 panels)
LICENSE                     Apache-2.0
SECURITY.md                 Vulnerability disclosure (90-day window)
CONTRIBUTING.md             Slice discipline, ADR process, test gates
CODE_OF_CONDUCT.md          Contributor Covenant 2.1

License & contributing

License: Apache-2.0. Patent grant included. Derivative works may re-license; attribution required. See ADR 0009 for the rationale.
Contributing: CONTRIBUTING.md — slice discipline, ADR process, test gates, commit format. PRs without ADRs for non-trivial changes get bounced.
Code of Conduct: CODE_OF_CONDUCT.md — Contributor Covenant 2.1. Enforcement contact: Sofwathullah.Mohamed@gmail.com.
Security: SECURITY.md — 90-day responsible-disclosure window; report to Sofwathullah.Mohamed@gmail.com with [kahzaabu-security] in the subject.
ADRs: every architectural decision is documented under docs/adr/. 0001–0010 cover V2.
Model & data cards: docs/MODEL_CARD.md and docs/DATA_CARD.md describe each LLM-call stage's prompts/biases/limitations and the corpus's coverage, gaps, and refresh cadence.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kahzaabu

⚠️ Reference implementation — not an authoritative source

Table of contents

What it does

Quick start

Path A — standalone (CLI + web)

Path B — as a Hermes plugin (recommended)

Architecture

The pipeline

Data model

The agentic Q&A loop

The narrative-tricks layer

Hermes plugin

Web UI tour

TUI tour

Costs

Known issues & TODOs

Known issues

Recently fixed

TODOs

Security & ethics

Testing

Repository layout

License & contributing

Further reading

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
.github		.github
data		data
docs		docs
hermes-plugin		hermes-plugin
hermes-stub/plugins		hermes-stub/plugins
kahzaabu		kahzaabu
research		research
scripts		scripts
skills		skills
tests		tests
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DESIGN.md		DESIGN.md
DISCLAIMER.md		DISCLAIMER.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Kahzaabu

⚠️ Reference implementation — not an authoritative source

Table of contents

What it does

Quick start

Path A — standalone (CLI + web)

Path B — as a Hermes plugin (recommended)

Architecture

The pipeline

Data model

The agentic Q&A loop

The narrative-tricks layer

Hermes plugin

Web UI tour

TUI tour

Costs

Known issues & TODOs

Known issues

Recently fixed

TODOs

Security & ethics

Testing

Repository layout

License & contributing

Further reading

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages