Skip to content

Sofwath/kahzaabu

Kahzaabu

Automated fact-checking archive for the Maldives Presidency. "Kahzaabu" (ή†ή¦ή’ή§ή„ήͺ) is Dhivehi for falsehood β€” and the street nickname for Mohamed Muizzu. The two names refer to the same person; the project treats them as synonyms.

License: Apache-2.0 Β· Tests: 328 passing Β· V2 status: Slices 0–12 done (see V2 build plan) Β· Trust model: read-only public web, operator actions via CLI (no in-app auth, no passwords)

⚠️ Reference implementation β€” not an authoritative source

Kahzaabu is a sample Hermes Agent plugin and fact-checking pipeline, built for educational and research purposes. Its output is automated analysis, not findings of fact. Do not cite kahzaabu's verdicts, Truth-O-Meter ratings, or "contradictions" as evidence in journalism, legal proceedings, academic writing, political argument, or social media. The only authoritative material is the underlying press release on presidency.gov.mv, which every fact-check links back to.

Full terms in DISCLAIMER.md. The "what this is / what this isn't" matrix lives there.

This is a research / educational project: it scrapes public press releases from presidency.gov.mv, extracts factual claims with an LLM, curates contradictions across time, verifies them against the open web, and stores the result in a queryable SQLite archive. A native Hermes Agent plugin exposes the archive to a chat agent so you can ask questions in plain English (or through Telegram / WhatsApp / Slack via the hermes gateway).

V2 layers state-of-the-art fact-checking methodology onto V1's six-stage pipeline: AVeriTeC verdict structure (Schlichtkrull et al., EMNLP 2023), RAGAR Chain-of-RAG reasoning (arXiv 2404.12065), Full Fact canonical claim matching, PolitiFact Truth-O-Meter labels, and schema.org ClaimReview JSON-LD for Google Fact Check Explorer indexing. See docs/ARCHITECTURE.md for the full citation map and docs/METHODOLOGY.md for the public-facing methodology.

This is not journalism. It is an automated pipeline that surfaces patterns. Every claim links back to the original press release on presidency.gov.mv. Read sources before drawing conclusions.


Table of contents

  1. What it does
  2. Quick start (two paths)
  3. Architecture in one diagram
  4. The pipeline, stage by stage
  5. Data model
  6. The agentic Q&A loop
  7. The narrative-tricks layer
  8. Hermes plugin: how it's wired
  9. Web UI tour
  10. TUI tour
  11. Costs
  12. Known issues & TODOs
  13. Security & ethics
  14. Testing
  15. Repository layout
  16. License & contributing
  17. Further reading

What it does

Today (May 2026), the archive holds:

Item Count
Muizzu-era press releases (EN, 2023-11-17 onwards) ~3,099
Extracted factual claims ~8,954
Q&A sub-questions (AVeriTeC decomposition) 35,648
Canonical-claim paraphrase groups 151
Curated fact-checks (published, V2-enriched) 220
Contradiction pairs (4-way verdict) 2 CONTRADICTION + 46 NOT_CONTRADICTORY
Web-evidence rows backing fact-checks 304
2023 campaign manifesto promises (tracked) 717
EN ↔ DV translation diff rows varies

V2 publishes each fact-check with three layered labels (see ADR 0005):

  1. Internal category (V1, kept): LIE Β· MISLEADING Β· BROKEN_DEADLINE Β· CREDIT_THEFT Β· SHIFTING_NUMBERS Β· CONTRADICTION. Used by the curator's classification.
  2. AVeriTeC verdict (V2): SUPPORTED Β· REFUTED Β· NOT_ENOUGH_EVIDENCE Β· CONFLICTING_EVIDENCE. Used by the agent/skill output, ClaimReview JSON-LD, and the public API.
  3. Truth-O-Meter (V2): a 6-rung public-facing ladder β€” TRUE (6) Β· MOSTLY_TRUE (5) Β· HALF_TRUE (4) Β· MOSTLY_FALSE (3) Β· FALSE (2) Β· PANTS_ON_FIRE (1). Used by the web UI's badge colors and Google's rich-result cards.

All three derive deterministically in kahzaabu/truth_score.py from the curator's (category, confidence) pair β€” no second LLM call. The mapping is unit-tested as ADR 0005 ground truth (see docs/EVAL_RESULTS.md).

A separate layer β€” the narrative-tricks analysis β€” sits on top of every article-derived answer and surfaces framing techniques (hero framing, manufactured momentum, vague timeframes, etc.) even when no factual error is present.


Quick start

Path A β€” standalone (CLI + web)

git clone https://github.com/Sofwath/kahzaabu.git && cd kahzaabu
python3 -m venv .venv
.venv/bin/pip install -e ".[all]"       # core + web + TUI + MCP server
# or pick extras: .[web]  .[tui]  .[mcp]  β€” bare `-e .` gets pipeline only
export ANTHROPIC_API_KEY=sk-ant-...

.venv/bin/kahzaabu pipeline --budget 1.00   # one full cycle
.venv/bin/kahzaabu web --port 8765           # open http://127.0.0.1:8765
.venv/bin/kahzaabu tui                       # interactive TUI
.venv/bin/kahzaabu ask "What's Muizzu been doing this month?"

Path B β€” as a Hermes plugin (recommended)

If you have Hermes Agent installed, kahzaabu integrates natively. The plugin source lives in hermes-plugin/ inside this repo β€” install symlinks it into hermes' plugins dir so edits are live, no copy step.

# One-time install β€” plugin + skills
./scripts/install-hermes-plugin.sh     # symlinks hermes-plugin/ β†’ ~/.hermes/hermes-agent/plugins/kahzaabu
./scripts/install-hermes-skills.sh     # symlinks skills/* β†’ ~/.hermes/skills/

hermes kahzaabu setup        # interactive: API key, daily budget, freshness threshold
hermes kahzaabu doctor       # health check (all should be βœ…)
hermes skills list           # should show kahzaabu-fact-check + kahzaabu-self-improver

# Use it β€” three surfaces
hermes kahzaabu status                          # archive counts + freshness
hermes kahzaabu ask "what did he promise about housing?"
hermes kahzaabu ask --continue "and the deadlines on those?"   # ↑ same session
hermes kahzaabu update --budget 0.50            # run pipeline
hermes kahzaabu web                             # start the web UI

# Inside any hermes chat session (terminal OR gateway-routed):
#   /kahzaabu what is he up to this week?
#   /kahzaabu and what about housing?           # ↑ auto-continues the session

# Wire messaging channels (Telegram, WhatsApp, Slack, Discord)
hermes gateway setup       # one-time
hermes gateway install     # install as systemd / launchd service
hermes gateway start       # now messages to your bot route to kahzaabu tools

Three things to know about the integration:

  1. /kahzaabu slash command is available in every hermes chat β€” terminal, Telegram, WhatsApp, Slack, Discord. Auto-continues the most-recent session (within 24h), so follow-ups don't lose context.
  2. hermes kahzaabu ask --continue mirrors hermes' own --continue UX for the CLI β€” picks up the previous session_id from the qna_sessions table.
  3. LLM-provider inheritance: the narrative-tricks pass routes through hermes' configured provider (whatever you picked in hermes setup model). Switch hermes from Anthropic to OpenAI to OpenRouter β€” the secondary pass follows. (Main agentic loop still uses Anthropic β€” it needs multi-turn tool-use that ctx.llm.complete() doesn't yet support.)

The hermes plugin source lives at hermes-plugin/ in this repo and is symlinked into ~/.hermes/hermes-agent/plugins/kahzaabu/ by the install script. It does not vendor code β€” it imports the package from this dev tree. See hermes plugin section for details.


Architecture

                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                       β”‚  presidency.gov.mv (EN + DV)    β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                      β”‚ scrape (incremental, 12h cycle)
                                      β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                     SQLite (data/kahzaabu.db, WAL)                   β”‚
   β”‚                                                                       β”‚
   β”‚   articles ── claims    fact_checks ── fact_check_evidence           β”‚
   β”‚       β”‚         β”‚           β”‚  (source_article_ids JSON array β†’ articles.id)
   β”‚       β”‚                                                              β”‚
   β”‚       β”œβ”€β”€ article_fact_cards (per-article inspector output)          β”‚
   β”‚       └── dv_en_inconsistencies (translation diffs)                  β”‚
   β”‚                                                                       β”‚
   β”‚   manifesto_promises ── manifesto_evidence (cross-ref)               β”‚
   β”‚                                                                       β”‚
   β”‚   qna_sessions (multi-turn agent memory)  ── scrape_runs            β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                      β”‚
                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                β–Ό                     β–Ό                     β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚  CLI       β”‚         β”‚  FastAPI   β”‚        β”‚  Hermes    β”‚
        β”‚  + TUI     β”‚         β”‚  web UI    β”‚        β”‚  plugin    β”‚
        β”‚            β”‚         β”‚  :8765     β”‚        β”‚            β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                                                          β”‚
                                                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                  β–Ό                       β–Ό
                                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                          β”‚  agent loop  β”‚       β”‚ hermes gatewayβ”‚
                                          β”‚ (kahzaabu_   β”‚       β”‚ Telegram /    β”‚
                                          β”‚  ask + 8     β”‚       β”‚ WhatsApp /    β”‚
                                          β”‚  tools +     β”‚       β”‚ Slack /       β”‚
                                          β”‚  web_search) β”‚       β”‚ Discord       β”‚
                                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The DB is the source of truth. Every consumer is read-only over it except the pipeline, which appends.


The pipeline

kahzaabu pipeline runs the V1 six-stage core, with the V2 layers running as separate slice commands. Each stage is idempotent β€” re-runnable, with budgets, with cost tracking.

V1 core (every cycle):

# Stage What it does LLM cost per item
1 scrape scraper.py β€” incremental crawl of presidency.gov.mv/news/{press_release,speech,vp_speech} (EN + DV). HTTP only. $0
2 extract extractor.py β€” Sonnet reads each article, returns a list of {type, polarity, subject_normalized, is_checkable, ...} claim records (V2 schema, ADR 0002). ~$0.005-0.010
3 inspect inspector.py β€” generates a per-article fact card (summary, history-check, severity, viz spec). Stored in article_fact_cards. ~$0.015
4 curate curator.py β€” Sonnet sees all claims on the same topic across time and flags contradictions / broken deadlines / credit theft. Inserts fact_checks rows. ~$0.05/topic
5 verify verifier.py β€” Haiku does Anthropic web_search for each fact-check; agrees/disagrees evidence saved to fact_check_evidence. Bounded β€” only the high-severity ones. ~$0.03 + $0.01/search
6 dv-compare dv_compare.py β€” Sonnet reads paired EN+DV bodies, flags numeric / omission / softening differences. Inserts dv_en_inconsistencies. ~$0.08/pair

V2 enrichment slices (run on demand; one-shot backfills + periodic top-ups):

Cmd What it does ADR Cost so far
kahzaabu decompose decomposer.py β€” Haiku 4.5 breaks each claim into AVeriTeC Q&A pairs ({question, answer_type, source_medium}). 0001 $12.51 (8,954 claims)
kahzaabu match matcher.py + embeddings.py β€” embeds every claim, groups paraphrases via cosine β‰₯ 0.85 + entity overlap β‰₯ 0.6 + Haiku tiebreaker. Provider abstraction (local / OpenAI / Voyage). 0003, 0007 $0 (local)
kahzaabu find-contradictions contradictions.py β€” polarity-pair SQL shortlist + semantic-similarity filter [0.55, 0.95] + Sonnet 4.6 4-way classifier (CONTRADICTION / EVOLVING_POSITION / CONTEXT_CHANGED / NOT_CONTRADICTORY) with reasoning chain. 0004 $3.50
kahzaabu enrich-factchecks fact_check_enricher.py β€” deterministic V2-label backfill: verdict_label + truth_score + truth_score_label + reasoning_chain for every fact-check. 0005 $0
kahzaabu export-claimreview claimreview.py β€” schema.org ClaimReview JSON-LD generation + caching to fact_checks.claimreview_jsonld. 0006 $0
kahzaabu eval eval.py β€” golden-set evaluation across all five LLM-call stages. Produces verified-subset + drift-detector metrics. 0008 $0
(registry) registry.py β€” public-sector entity registry (25 Maldives entities); auto-tags fact_check_evidence.authoritative_entity_id when a URL is on a registered .gov.mv / .com.mv domain. Source of truth in data/registry/maldives_public_sector.yaml. 0011 $0
kahzaabu reproducibility <id> reproducibility.py β€” emits full provenance JSON manifest for a fact-check (curation run + claims + decomposition + evidence + contradiction pair + ClaimReview + git SHA). Also exposed as /api/reproducibility/{id}.json. 0010 $0
kahzaabu audit audit.py β€” bias/fairness markdown report with chi-squared on categoryΓ—year + categoryΓ—topic, verdict-label + Truth-O-Meter ladder distributions, speaker concentration, authoritative-source coverage. 0010 $0
kahzaabu transparency-report --since transparency.py β€” public-facing window report: fact-checks issued, corrections, LLM spend, methodology git-log. 0010 $0

Defaults: cycle runs every 12h via launchd (scripts/com.kahzaabu.pipeline.plist). Budget cap defaults to $1.00 per cycle. Total V2-build spend: ~$16.50. Total project spend to-date: ~$75.

A separate manifesto-extract + manifesto-crossref flow extracts ~717 promises from the 2023 campaign PDF (Dhivehi, 51 MB) and cross-references each against the archive to assign a delivery status.


Data model

Editor protocol β€” when changing this block, derive column lists from sqlite3 data/kahzaabu.db ".schema" rather than memory. Each entry below uses the format tablename -- description followed by indented -- cols: a, b, c lines. The cols: convention is load-bearing: tests/test_readme_schema_drift.py parses it and fails if any documented column is absent from the live schema. Run ./scripts/test.sh before committing.

The interesting tables:

articles            -- PK (id, language). EN ↔ DV pairs via shared id + paired_id.
                    -- cols: title, category, body_text, body_html, published_date,
                    --       reference, scraped_at, raw_page_html
claims              -- extracted from article body_text by the LLM.
                    -- cols: article_id, language (FK), type, subject, value,
                    --       deadline, actor_credited, quote, extraction_run_id,
                    --       polarity (V2: AFFIRM/DENY/PROMISE/DENIAL_OF_PROMISE/
                    --                CLAIM_OF_FACT/NEUTRAL β€” see ADR 0002),
                    --       subject_normalized (V2: entity-resolved subject),
                    --       is_checkable (V2: 0=opinion/rhetoric, 1=factual)
fact_checks         -- curated contradictions / broken deadlines / etc.
                    -- cols: category, claim_date, claim, what_actually_happened,
                    --       topic, confidence, source_article_ids (JSON array
                    --       of articles.id), evidence_quotes (JSON), published,
                    --       public_summary, fingerprint (dedupe key)
fact_check_evidence -- web-search hits backing each fact-check.
                    -- cols: fact_check_id (FK), url, title, snippet, relevance
                    --       ('confirms'|'contradicts'|'context'|'unclear'|
                    --       'not_found'), summary, retrieved_at,
                    --       authoritative_entity_id (V2: ADR 0011, nullable
                    --       pointer into the public-sector registry under
                    --       data/registry/)
article_fact_cards  -- per-article inspector output.
                    -- cols: article_id, language, summary, key_claims_json,
                    --       history_check, severity, visualization_spec_json,
                    --       web_evidence_json, cost_usd, inspection_run_id,
                    --       published
dv_en_inconsistencies -- EN/DV translation diffs.
                    -- cols: en_article_id, dv_article_id (FKs), severity,
                    --       category, en_quote, dv_quote, dv_translation_to_en
manifesto_promises  -- 2023 campaign promises with delivery tracking.
                    -- cols: section, promise_text_dv, promise_text_en,
                    --       category, subject, target_value, deadline_stated,
                    --       delivery_status, delivery_evidence_json (JSON:
                    --       linked article_ids + fact_check_ids + notes),
                    --       chunk_index, published
qna_sessions        -- agentic-ask multi-turn memory.
                    -- cols: id (uuid), messages_json (full message history),
                    --       total_cost_usd, n_turns, created_at, last_used_at
constitution_articles -- parsed Constitution of the Republic of Maldives.
                    -- cols: article_no, chapter, title, body, source_version,
                    --       imported_at
scrape_runs         -- audit log of pipeline cycles.
                    -- cols: category_id, language, started_at, finished_at,
                    --       pages_scraped, articles_scraped, articles_new,
                    --       status, resume_page, error_message
web_users           -- LEGACY: kept for backwards-compat with deployed DBs.
                    -- The password-based admin workflow was removed
                    -- (web UI is read-only public; operator actions
                    -- run from the CLI). No code reads or writes this
                    -- table any more.
                    -- cols: username, password_hash, role, created_at

Article ↔ fact-check linkage is via the JSON column fact_checks.source_article_ids β€” a list of articles.id values. Use SQLite's json_each() to traverse it (or LIKE on the serialized form as a fallback).

Migrations are idempotent ALTER-COLUMN style in claims_db.py:init_claims_schema(). WAL mode is on; check_same_thread=False for the FastAPI threadpool.


The agentic Q&A loop

kahzaabu/qna_agentic.py:ask_agentic() is the heart of the Q&A experience. It is itself an agent loop β€” Sonnet calls internal tools to satisfy a question.

user question
    β”‚
    β–Ό
Sonnet 4.6 + tools = [
    archive_stats, search_articles, get_article,
    search_factchecks, get_factcheck,
    search_manifesto, get_promise,
    list_recent,
    web_search (Anthropic server tool)
]
    β”‚
    β–Ό
loop up to max_iterations (default 7):
    if Sonnet returns tool_use:
        execute, append result, continue
    else:
        capture final_text, break
    β”‚
    β–Ό
guarantee-pass (Haiku 4.5, ~$0.01):
    if final_text quotes article text BUT lacks "🎭 Narrative tricks observed":
        ask Haiku to append the section using the catalog
    β”‚
    β–Ό
return {answer, session_id, n_iterations, cost_usd, tool_trace, web_searches}

Session memory lives in the qna_sessions table. Pass the returned session_id back to continue a conversation β€” the loop will re-load all prior tool results and turns.

Cost per question:

  • Simple "how many fact-checks?" (data-only) β€” ~$0.025
  • Article-heavy ("what did he say last week?") β€” ~$0.05-0.10
  • Open-ended with web_search β€” ~$0.10-0.30

Daily budget cap (default $5) is enforced at the top of ask_agentic.


The narrative-tricks layer

A 16-technique catalog (hero framing, manufactured momentum, goalpost shifting, empty markers of action, vague timeframes, etc.) is appended in the system prompt with anti-over-claiming rules:

  • Cap of 5 items per answer
  • Every flag must include the verbatim quote
  • Ceremonial language ("expressed gratitude") is explicitly NOT a trick
  • Hedging language ("could be seen as", "this might imply") is forbidden

The section is enforced via a guarantee-pass: if the agent quotes article text but skips the section, a follow-up Haiku call appends it. Cost: ~$0.01 per article-touching question.

Pure-data questions (e.g. "how many fact-checks?") correctly omit the section β€” the guarantee-pass is gated on tool_trace containing article-content tools.

See qna_agentic.py:SYSTEM_PROMPT for the catalog and _ARTICLE_TOOLS for the gating set.


Hermes plugin

The plugin source lives at hermes-plugin/ in this repo. The install script (scripts/install-hermes-plugin.sh) symlinks it into ~/.hermes/hermes-agent/plugins/kahzaabu/ so hermes can find it. Edits in hermes-plugin/ are live β€” no copy/sync step.

Layout:

hermes-plugin/
β”œβ”€β”€ plugin.yaml    Manifest: name, version, provides_tools, platforms
β”œβ”€β”€ __init__.py    register(ctx) β€” entry point. Three jobs:
β”‚                    1. Hydrate ~/.hermes/.env into os.environ
β”‚                    2. Ensure kahzaabu is importable (self-heal .pth)
β”‚                    3. Register 9 tools + `hermes kahzaabu` CLI
β”‚                       + `/kahzaabu` slash command
β”œβ”€β”€ tools.py       9 handler functions wrapping qna_agentic / claims_db
β”œβ”€β”€ cli.py         argparse setup for `hermes kahzaabu {setup,status,…}`
β”œβ”€β”€ SKILL.md       Agent-facing guidance: when to use which tool
└── README.md      Plugin-source README (design choices, bootstrap layers)

Design choices to know:

  • Imports, doesn't vendor. Plugin imports the canonical kahzaabu package from this dev tree. Editing code here updates the plugin immediately.
  • Path discovery is robust. kahzaabu_home() derives the dev tree from Path(kahzaabu.__file__).resolve().parents[1]. No hardcoded paths anywhere.
  • .pth self-heal. Hermes' venv has no pip, so the plugin writes ~/.hermes/hermes-agent/venv/lib/python3.11/site-packages/kahzaabu.pth on first run. If hermes ever recreates its venv, the next hermes kahzaabu * invocation rewrites it.
  • Tools are in-process. Unlike the previous MCP-over-stdio design, hermes calls plugin tools directly β€” no subprocess, ~5-10Γ— faster per call.
  • update and web shell out. Both need scikit-learn / FastAPI / etc. that don't live in hermes' lean venv, so they exec <dev>/.venv/bin/kahzaabu pipeline|web. doctor checks this.

The 9 tools exposed to the agent:

Tool What it does
kahzaabu_stats Counts + freshness β€” call first for "recent" questions
kahzaabu_ask Run the full agentic loop β€” preferred for any natural-language question
kahzaabu_list_lies List fact-checks with filters
kahzaabu_get_factcheck One fact-check + web evidence + linked source articles
kahzaabu_manifesto 2023 promises with delivery status
kahzaabu_get_article One article with claims + linked fact-checks
kahzaabu_recent_activity Last N days of articles
kahzaabu_constitution_lookup BM25 search over the 301 Constitution articles
kahzaabu_pipeline_run Trigger pipeline (gated by KAHZAABU_ALLOW_PIPELINE=1; legacy KAHZAABU_MCP_ALLOW_PIPELINE=1 still honoured)

Three integration surfaces share one Q&A engine:

  • Agent tool call: hermes chat -q "..." β†’ agent invokes kahzaabu_ask and gets back {answer, session_id, cost_usd, tool_trace, web_searches}.
  • CLI subcommand: hermes kahzaabu ask [--continue] [--no-web] [--session ID] "..." β€” direct human use.
  • Slash command: /kahzaabu <question> works inside any hermes session, including chats routed through the messaging gateway. Auto-continues the most-recent session.

All three call the same kahzaabu/qna_agentic.py:ask_agentic() function, so session memory, the narrative-tricks layer, daily-budget caps, and cost accounting behave identically across surfaces. Sessions persist in the qna_sessions table and survive process restarts; the --continue and slash auto-continue affordances both use claims_db.most_recent_session_id() to find the latest one within a 24h window.

LLM-provider inheritance: the secondary narrative-tricks pass calls ctx.llm.complete() when invoked from the plugin (so it follows hermes setup model), and falls back to Anthropic Haiku 4.5 when called from the standalone CLI / TUI / web. The main agentic loop always uses Anthropic Sonnet β€” ctx.llm.complete() doesn't yet support multi-turn tool-use.


Web UI tour

kahzaabu web --port 8765 (or hermes kahzaabu web) serves:

Page What
/ Dashboard: 5 stat cards + 6 charts (categories, topics, claims/month, articles/month, manifesto-status, stacked-by-month) + freshness banner
/browse Article browser with filters
/lies Fact-check browser with category/severity filters
/article/{id} One article + claims + linked fact-checks + fact-card chart
/compare EN ↔ DV translation inconsistencies
/compare/{id} Side-by-side EN/DV with the flagged region highlighted
/manifesto 2023 promises with delivery status
/manifesto/{id} Per-promise detail + supporting articles
/ask The agentic Q&A interface (sessions, web toggle, tool-trace)
/methodology How the pipeline works (public-facing)
/corrections Public report-a-correction form

Read-only by design. There is no /admin, no /login, no session cookie, no password anywhere in the system. Publishing a fact-check, triggering the pipeline, creating backups β€” all of it runs from the operator's shell via the kahzaabu CLI and inherits OS-level permissions. The web UI's only writes are: rate-limited /api/ask (Q&A budget capped per day) and /api/corrections (public form that appends to a moderation queue read by the operator in CLI). slowapi rate-limits anonymous traffic.


TUI tour

kahzaabu tui (or python -m kahzaabu.tui) is a Textual-based interactive terminal. Slash commands:

Command What
/ask <question> Multi-turn agentic ask (session preserved)
/stats Archive counts + freshness
/lies [category] List fact-checks
/article <id> Show an article
/refresh Re-query freshness
/help Show all commands
/quit Exit

A startup banner shows freshness; if stale, it prompts to run kahzaabu update.


Costs

Total spend to date: ~$58. Typical ongoing costs:

Activity Per item Per 12h cycle (typical)
Scrape (HTTP) $0 $0
Extract claims $0.005-0.010 ~$0.05
Inspect (fact-card) $0.015 ~$0.15
Curate (cross-time) $0.05/topic ~$0.10
Verify (web-search) $0.03 + $0.01/hit ~$0.20
DV/EN compare $0.08/pair ~$0.40
Total per 12h cycle ~$0.90
/api/ask question $0.025 (data) β†’ $0.30 (web) n/a

Daily caps:

  • Pipeline: --budget 1.00 (CLI flag)
  • Q&A (per process): KAHZAABU_DAILY_BUDGET_USD=5.00
  • Public web Q&A (anon): hard cap returns 503 once daily spend exceeds env var

Known issues & TODOs

Known issues

  1. Pipeline via MCP silently skips scrape stage. When the agent calls kahzaabu_pipeline_run, the scrape sub-stage runs but produces no scrape_runs entries. Direct CLI (kahzaabu pipeline) works correctly. The MCP-path bug existed in the legacy MCP server too β€” the native plugin version may or may not still have it; not retested.
  2. hermes default model shows anthropic/anthropic/... in doctor. Pre-existing cosmetic bug in _hermes_provider() formatting β€” concatenates provider with a default that already includes the provider prefix.
  3. launchd plist still in use. Migration to hermes cron is documented in hermes kahzaabu setup but not executed. Both can run side-by-side; once you're confident, launchctl unload ~/Library/LaunchAgents/com.kahzaabu.pipeline.plist.

Recently fixed

  • Four plugin handlers used a hallucinated schema. kahzaabu_list_lies, kahzaabu_get_factcheck, kahzaabu_get_article were querying columns that don't exist (title/severity/summary) and joining a table that doesn't exist (fact_check_claims). Now rewritten against the real schema β€” fact-check ↔ article linkage uses the JSON source_article_ids column.

TODOs

Priority Item
πŸ”΄ High Public VPS deploy. Caddy + systemd templates in scripts/. Methodology page, robots.txt, rate-limits done. Needs: domain, server, DB sync strategy (push from laptop vs. run pipeline on server).
πŸ”΄ High βœ… done V2 Slice 12 β€” Reproducibility + observability. Shipped β€” /api/reproducibility/{id}.json + kahzaabu reproducibility CLI + kahzaabu audit + kahzaabu transparency-report + /metrics + Grafana dashboard JSON + Dockerfile.
🟑 Medium Grow the verified golden-set subset. 24 of 25 fixtures are verified ground truth; 1 extractor fixture (article-32009) deliberately left unverified pending taxonomy clarification on deadline_promise vs event-schedule. Hand-review more articles to broaden coverage per stage.
🟑 Medium Viber channel. Hermes doesn't support Viber. Would require a custom ctx.register_platform(...) adapter β€” 3-5 days. Out of scope unless Maldives-market demand justifies.
🟑 Medium βœ… done Migrate guarantee-pass to ctx.llm. Shipped: narrative-tricks pass now uses ctx.llm.complete() inside the plugin (anthropic fallback for non-plugin paths). Main loop still uses anthropic β€” needs tool-use.
🟑 Medium βœ… done Quality evals + prompt regression tests. Slice 10 shipped β€” see kahzaabu eval + docs/EVAL_RESULTS.md.
🟑 Medium Self-improver loop. A hermes skill at ~/.hermes/skills/kahzaabu/kahzaabu-self-improver/ already exists. Has produced test_claims_db.py with 17 unit tests. Pending: branch merge, additional iterations.
🟒 Low Replace launchd with hermes cron (see Known issues #3).
🟒 Low βœ… partial Per-tenant LLM selection. Secondary tricks pass now follows hermes' provider config. Main loop still hard-coded to Anthropic β€” would need a tool-use-capable host-LLM facade.
🟒 Low Fix doctor's anthropic/anthropic/... cosmetic bug. Strip the provider prefix from model.default before formatting.
🟒 Low Compare-presidents page. Would need historical pre-Muizzu data. Out of scope but the schema supports it.
🟒 Low RSS/Atom feed of new fact-checks for public consumers.
🟒 Low One pre-existing scrape-stage MCP bug investigation (see Known issues #1). Good first target for the self-improver.

Security & ethics

  • The corpus is already public at presidency.gov.mv. No leaks, no inside sources.
  • Every fact-check links back to the original press release URL.
  • "Report a correction" form on /corrections appends to a moderation queue; the operator reviews it with kahzaabu CLI tooling, then publishes any resulting fact-check via kahzaabu publish <id>.
  • The web UI is read-only: only published items (fact_checks.published = 1) ever surface. Publishing flows are CLI-only β€” no web-side credentials exist anywhere.
  • Pipeline LLM calls are budget-capped; daily Q&A spend is capped; anonymous web traffic is rate-limited (slowapi).
  • Subject is a sitting head of state. Treat output as automated analysis, not finished journalism β€” review the source article before quoting.
  • No mass scraping of social-media or non-official sources. Web-search-verify uses Anthropic's web_search_20250305 server tool, which respects publisher robots.txt.

Testing

./scripts/test.sh                              # full local suite β€” 327 tests, ~2.6s
.venv/bin/python -m unittest discover tests/   # just the unit tests
.venv/bin/python tests/system_check.py         # live web-stack integration check
.venv/bin/kahzaabu eval                        # golden-set quality eval (ADR 0008)

The unit suite is offline, no external deps, and runs in seconds. It catches:

  • V1 invariants: host_llm branch in agentic Q&A, JSON1 vs LIKE-fallback parity, README schema drift
  • V2 invariants per slice: claim-enrichment migrations, decomposer enums, embedding-provider selection, matcher cosine + entity overlap, contradiction 4-way verdict validation, truth-score deterministic mapping, ClaimReview JSON-LD shape, eval framework metrics + verified-vs-pinned semantics

CI: .github/workflows/test.yml runs the unit suite on every push and PR to main. Use ./scripts/ci-dry-run.sh to validate a fresh-worktree install before pushing.

Quality regression detection: kahzaabu eval produces verified-subset metrics (real quality) and all-fixture metrics (drift detector) for each LLM-call stage. A prompt edit that drops the verified subset below 1.000 is a real regression; the drift-detector subset surfaces LLM noise without claiming truth. See docs/EVAL_RESULTS.md.


Repository layout

kahzaabu/                   The Python package
β”œβ”€β”€ __init__.py
β”œβ”€β”€ cli.py                  Click-based CLI (kahzaabu <subcommand>)
β”œβ”€β”€ pipeline.py             Orchestrates the 6 V1 stages
β”œβ”€β”€ scraper.py              presidency.gov.mv crawler (EN + DV)
β”œβ”€β”€ extractor.py            Per-article claim extraction (Sonnet) β€” V2 schema
β”œβ”€β”€ inspector.py            Per-article fact card (Sonnet)
β”œβ”€β”€ curator.py              Cross-time contradiction detector (Sonnet)
β”œβ”€β”€ verifier.py             Web-search-verifier (Haiku)
β”œβ”€β”€ dv_compare.py           EN/DV diff (Sonnet)
β”œβ”€β”€ manifesto.py            2023 promise extractor + cross-referencer
β”œβ”€β”€ qna.py                  Legacy single-shot Q&A (kept for CLI parity)
β”œβ”€β”€ qna_agentic.py          The current agentic Q&A loop + narrative-tricks
β”œβ”€β”€ claims_db.py            Schema, migrations, all DB helpers
β”œβ”€β”€ db.py                   Connection plumbing
β”œβ”€β”€ models.py               Type aliases
β”œβ”€β”€ report.py               JSON/CSV export of fact_checks
β”œβ”€β”€ infographics.py         Static-HTML viz generators (legacy tracker)
β”œβ”€β”€ scheduler.py            launchd helper
β”œβ”€β”€ tui.py                  Textual TUI
β”œβ”€β”€ legacy/
β”‚   └── mcp_server.py       [DEPRECATED] stdio MCP server β€” superseded by hermes-plugin/
β”‚
β”‚  # V2 modules (ADR-driven)
β”œβ”€β”€ decomposer.py           Slice 2 β€” AVeriTeC Q&A decomposition (Haiku)
β”œβ”€β”€ embeddings.py           Slice 3 β€” provider abstraction (local/OpenAI/Voyage)
β”œβ”€β”€ matcher.py              Slice 3 β€” canonical claim matching
β”œβ”€β”€ claims_enricher.py      Slice 4 prep β€” polarity/subject/is_checkable backfill
β”œβ”€β”€ contradictions.py       Slice 4 β€” 4-way verdict classifier (Sonnet)
β”œβ”€β”€ truth_score.py          Slice 5 β€” deterministic AVeriTeC + Truth-O-Meter mapping
β”œβ”€β”€ fact_check_enricher.py  Slice 5 β€” V2-label backfill for fact_checks
β”œβ”€β”€ claimreview.py          Slice 6 β€” schema.org ClaimReview JSON-LD generator
β”œβ”€β”€ eval.py                 Slice 10 β€” golden-set quality evaluation framework
β”œβ”€β”€ registry.py             Slice 11.5 β€” public-sector entity registry / trust anchor
β”œβ”€β”€ reproducibility.py      Slice 12 β€” provenance manifest assembly (ADR 0010)
β”œβ”€β”€ audit.py                Slice 12 β€” bias/fairness audit with chi-squared
β”œβ”€β”€ transparency.py         Slice 12 β€” public-facing transparency report
β”œβ”€β”€ metrics.py              Slice 12 β€” prometheus_client + @tracked_stage
β”‚                            decorator (wraps all 8 pipeline run_*)
β”‚
└── web/                    FastAPI app
    β”œβ”€β”€ app.py
    β”œβ”€β”€ api/                JSON endpoints (all public read-only)
    β”‚   β”œβ”€β”€ articles.py / factchecks.py / manifesto.py
    β”‚   β”œβ”€β”€ ask.py / freshness.py / stats.py / viz.py
    β”‚   β”œβ”€β”€ corrections.py / inspect.py
    β”‚   β”œβ”€β”€ claimreview.py / contradictions.py    # V2
    β”‚   β”œβ”€β”€ constitution.py                       # 301-article BM25 search
    β”‚   β”œβ”€β”€ reproducibility.py                    # V2 Slice 12
    β”œβ”€β”€ static/             HTML / CSS / JS (no SPA)
    β”‚   └── contradictions.html                   # V2 β€” 4-way verdict browser
    β”œβ”€β”€ db_dep.py           FastAPI Depends() for DB
    └── limits.py           Rate-limiter + LRU cache for /api/ask

hermes-plugin/              Hermes plugin source (symlinked from ~/.hermes/...)
                            β€” see hermes-plugin/README.md
skills/                     Hermes-installable agentskills.io skills
                            β€” kahzaabu-fact-check (symlinked into
                            ~/.hermes/skills/)
tests/                      Unit + integration tests (197 total)
β”œβ”€β”€ test_*.py               14 modules
└── golden/                 Quality-eval fixtures (5 stages, 25 fixtures,
                            24/25 verified ground truth β€” ADR 0008)
research/                   Historical one-shot scripts (see research/README.md)
                            β€” NOT imported by the package
scripts/                    test.sh, install-hermes-plugin.sh, install-hermes-skills.sh,
                            ci-dry-run.sh, run_pipeline.sh, Caddyfile, systemd unit
docs/                       Project documentation
β”œβ”€β”€ ARCHITECTURE.md         Full V2 architecture with citation map
β”œβ”€β”€ METHODOLOGY.md          Public-facing methodology (Slice 11)
β”œβ”€β”€ MODEL_CARD.md           Per-stage LLM model card (Slice 11)
β”œβ”€β”€ DATA_CARD.md            Corpus data card (Slice 11)
β”œβ”€β”€ EVAL_RESULTS.md         Auto-generated quality eval report
β”œβ”€β”€ V2_BUILD_PLAN.md        Slice tracker
β”œβ”€β”€ TEST_REPORT.md          Latest full-stack test snapshot
└── adr/                    Architecture Decision Records (0001-0010)
.github/
β”œβ”€β”€ workflows/              CI: test.yml runs unit suite on push + PR
β”œβ”€β”€ ISSUE_TEMPLATE/         Bug report + feature request templates (Slice 11)
└── PULL_REQUEST_TEMPLATE.md
data/                       SQLite DB + manifesto/ (other contents gitignored)
β”œβ”€β”€ registry/               Public-sector entity registry (ADR 0011)
β”‚   β”œβ”€β”€ maldives_public_sector.yaml   ← source of truth (human-edit here)
β”‚   └── maldives_public_sector.json   ← machine-loaded twin
β”œβ”€β”€ backups/                Local-only sqlite3 dumps (gitignored)
└── reports/                `kahzaabu audit` + `transparency-report` output
                             (gitignored β€” regenerate locally)
Dockerfile                  One-command reproduction β€” python:3.11-slim
                             base, editable install with chosen embedding
                             backend (ADR 0010)
docs/observability/
└── grafana-dashboard.json  Importable dashboard (6 panels)
LICENSE                     Apache-2.0
SECURITY.md                 Vulnerability disclosure (90-day window)
CONTRIBUTING.md             Slice discipline, ADR process, test gates
CODE_OF_CONDUCT.md          Contributor Covenant 2.1

License & contributing

  • License: Apache-2.0. Patent grant included. Derivative works may re-license; attribution required. See ADR 0009 for the rationale.
  • Contributing: CONTRIBUTING.md β€” slice discipline, ADR process, test gates, commit format. PRs without ADRs for non-trivial changes get bounced.
  • Code of Conduct: CODE_OF_CONDUCT.md β€” Contributor Covenant 2.1. Enforcement contact: Sofwathullah.Mohamed@gmail.com.
  • Security: SECURITY.md β€” 90-day responsible-disclosure window; report to Sofwathullah.Mohamed@gmail.com with [kahzaabu-security] in the subject.
  • ADRs: every architectural decision is documented under docs/adr/. 0001–0010 cover V2.
  • Model & data cards: docs/MODEL_CARD.md and docs/DATA_CARD.md describe each LLM-call stage's prompts/biases/limitations and the corpus's coverage, gaps, and refresh cadence.

Further reading

If you want to change something, start with hermes kahzaabu doctor to confirm your environment is healthy, then read the relevant module β€” they're small (mostly 200-400 LOC) and prose-heavy. The pipeline is the most opinionated part; everything else is glue.

About

Reference implementation of a Hermes Agent fact-checking plugin (educational/research). Pipeline over Maldives Presidency press releases. Not an authoritative source.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors