Automated fact-checking archive for the Maldives Presidency. "Kahzaabu" (ήή¦ήή§ήήͺ) is Dhivehi for falsehood β and the street nickname for Mohamed Muizzu. The two names refer to the same person; the project treats them as synonyms.
License: Apache-2.0 Β· Tests: 328 passing Β· V2 status: Slices 0β12 done (see V2 build plan) Β· Trust model: read-only public web, operator actions via CLI (no in-app auth, no passwords)
Kahzaabu is a sample Hermes Agent plugin and fact-checking pipeline, built for educational and research purposes. Its output is automated analysis, not findings of fact. Do not cite kahzaabu's verdicts, Truth-O-Meter ratings, or "contradictions" as evidence in journalism, legal proceedings, academic writing, political argument, or social media. The only authoritative material is the underlying press release on
presidency.gov.mv, which every fact-check links back to.Full terms in DISCLAIMER.md. The "what this is / what this isn't" matrix lives there.
This is a research / educational project: it scrapes public press releases from presidency.gov.mv, extracts factual claims with an LLM, curates contradictions across time, verifies them against the open web, and stores the result in a queryable SQLite archive. A native Hermes Agent plugin exposes the archive to a chat agent so you can ask questions in plain English (or through Telegram / WhatsApp / Slack via the hermes gateway).
V2 layers state-of-the-art fact-checking methodology onto V1's six-stage pipeline: AVeriTeC verdict structure (Schlichtkrull et al., EMNLP 2023), RAGAR Chain-of-RAG reasoning (arXiv 2404.12065), Full Fact canonical claim matching, PolitiFact Truth-O-Meter labels, and schema.org ClaimReview JSON-LD for Google Fact Check Explorer indexing. See docs/ARCHITECTURE.md for the full citation map and docs/METHODOLOGY.md for the public-facing methodology.
This is not journalism. It is an automated pipeline that surfaces patterns. Every claim links back to the original press release on presidency.gov.mv. Read sources before drawing conclusions.
- What it does
- Quick start (two paths)
- Architecture in one diagram
- The pipeline, stage by stage
- Data model
- The agentic Q&A loop
- The narrative-tricks layer
- Hermes plugin: how it's wired
- Web UI tour
- TUI tour
- Costs
- Known issues & TODOs
- Security & ethics
- Testing
- Repository layout
- License & contributing
- Further reading
Today (May 2026), the archive holds:
| Item | Count |
|---|---|
| Muizzu-era press releases (EN, 2023-11-17 onwards) | ~3,099 |
| Extracted factual claims | ~8,954 |
| Q&A sub-questions (AVeriTeC decomposition) | 35,648 |
| Canonical-claim paraphrase groups | 151 |
| Curated fact-checks (published, V2-enriched) | 220 |
| Contradiction pairs (4-way verdict) | 2 CONTRADICTION + 46 NOT_CONTRADICTORY |
| Web-evidence rows backing fact-checks | 304 |
| 2023 campaign manifesto promises (tracked) | 717 |
| EN β DV translation diff rows | varies |
V2 publishes each fact-check with three layered labels (see ADR 0005):
- Internal category (V1, kept):
LIEΒ·MISLEADINGΒ·BROKEN_DEADLINEΒ·CREDIT_THEFTΒ·SHIFTING_NUMBERSΒ·CONTRADICTION. Used by the curator's classification. - AVeriTeC verdict (V2):
SUPPORTEDΒ·REFUTEDΒ·NOT_ENOUGH_EVIDENCEΒ·CONFLICTING_EVIDENCE. Used by the agent/skill output, ClaimReview JSON-LD, and the public API. - Truth-O-Meter (V2): a 6-rung public-facing ladder β
TRUE(6) Β·MOSTLY_TRUE(5) Β·HALF_TRUE(4) Β·MOSTLY_FALSE(3) Β·FALSE(2) Β·PANTS_ON_FIRE(1). Used by the web UI's badge colors and Google's rich-result cards.
All three derive deterministically in kahzaabu/truth_score.py from the curator's (category, confidence) pair β no second LLM call. The mapping is unit-tested as ADR 0005 ground truth (see docs/EVAL_RESULTS.md).
A separate layer β the narrative-tricks analysis β sits on top of every article-derived answer and surfaces framing techniques (hero framing, manufactured momentum, vague timeframes, etc.) even when no factual error is present.
git clone https://github.com/Sofwath/kahzaabu.git && cd kahzaabu
python3 -m venv .venv
.venv/bin/pip install -e ".[all]" # core + web + TUI + MCP server
# or pick extras: .[web] .[tui] .[mcp] β bare `-e .` gets pipeline only
export ANTHROPIC_API_KEY=sk-ant-...
.venv/bin/kahzaabu pipeline --budget 1.00 # one full cycle
.venv/bin/kahzaabu web --port 8765 # open http://127.0.0.1:8765
.venv/bin/kahzaabu tui # interactive TUI
.venv/bin/kahzaabu ask "What's Muizzu been doing this month?"If you have Hermes Agent installed, kahzaabu integrates natively. The plugin source lives in hermes-plugin/ inside this repo β install symlinks it into hermes' plugins dir so edits are live, no copy step.
# One-time install β plugin + skills
./scripts/install-hermes-plugin.sh # symlinks hermes-plugin/ β ~/.hermes/hermes-agent/plugins/kahzaabu
./scripts/install-hermes-skills.sh # symlinks skills/* β ~/.hermes/skills/
hermes kahzaabu setup # interactive: API key, daily budget, freshness threshold
hermes kahzaabu doctor # health check (all should be β
)
hermes skills list # should show kahzaabu-fact-check + kahzaabu-self-improver
# Use it β three surfaces
hermes kahzaabu status # archive counts + freshness
hermes kahzaabu ask "what did he promise about housing?"
hermes kahzaabu ask --continue "and the deadlines on those?" # β same session
hermes kahzaabu update --budget 0.50 # run pipeline
hermes kahzaabu web # start the web UI
# Inside any hermes chat session (terminal OR gateway-routed):
# /kahzaabu what is he up to this week?
# /kahzaabu and what about housing? # β auto-continues the session
# Wire messaging channels (Telegram, WhatsApp, Slack, Discord)
hermes gateway setup # one-time
hermes gateway install # install as systemd / launchd service
hermes gateway start # now messages to your bot route to kahzaabu toolsThree things to know about the integration:
/kahzaabuslash command is available in every hermes chat β terminal, Telegram, WhatsApp, Slack, Discord. Auto-continues the most-recent session (within 24h), so follow-ups don't lose context.hermes kahzaabu ask --continuemirrors hermes' own--continueUX for the CLI β picks up the previous session_id from the qna_sessions table.- LLM-provider inheritance: the narrative-tricks pass routes through hermes' configured provider (whatever you picked in
hermes setup model). Switch hermes from Anthropic to OpenAI to OpenRouter β the secondary pass follows. (Main agentic loop still uses Anthropic β it needs multi-turn tool-use thatctx.llm.complete()doesn't yet support.)
The hermes plugin source lives at hermes-plugin/ in this repo and is symlinked into ~/.hermes/hermes-agent/plugins/kahzaabu/ by the install script. It does not vendor code β it imports the package from this dev tree. See hermes plugin section for details.
βββββββββββββββββββββββββββββββββββ
β presidency.gov.mv (EN + DV) β
ββββββββββββββββ¬βββββββββββββββββββ
β scrape (incremental, 12h cycle)
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SQLite (data/kahzaabu.db, WAL) β
β β
β articles ββ claims fact_checks ββ fact_check_evidence β
β β β β (source_article_ids JSON array β articles.id)
β β β
β βββ article_fact_cards (per-article inspector output) β
β βββ dv_en_inconsistencies (translation diffs) β
β β
β manifesto_promises ββ manifesto_evidence (cross-ref) β
β β
β qna_sessions (multi-turn agent memory) ββ scrape_runs β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββΌββββββββββββββββββββββ
βΌ βΌ βΌ
ββββββββββββββ ββββββββββββββ ββββββββββββββ
β CLI β β FastAPI β β Hermes β
β + TUI β β web UI β β plugin β
β β β :8765 β β β
ββββββββββββββ ββββββββββββββ βββββββ¬βββββββ
β
βββββββββ΄βββββββββββββββ
βΌ βΌ
ββββββββββββββββ βββββββββββββββββ
β agent loop β β hermes gatewayβ
β (kahzaabu_ β β Telegram / β
β ask + 8 β β WhatsApp / β
β tools + β β Slack / β
β web_search) β β Discord β
ββββββββββββββββ βββββββββββββββββ
The DB is the source of truth. Every consumer is read-only over it except the pipeline, which appends.
kahzaabu pipeline runs the V1 six-stage core, with the V2 layers running as separate slice commands. Each stage is idempotent β re-runnable, with budgets, with cost tracking.
V1 core (every cycle):
| # | Stage | What it does | LLM cost per item |
|---|---|---|---|
| 1 | scrape | scraper.py β incremental crawl of presidency.gov.mv/news/{press_release,speech,vp_speech} (EN + DV). HTTP only. |
$0 |
| 2 | extract | extractor.py β Sonnet reads each article, returns a list of {type, polarity, subject_normalized, is_checkable, ...} claim records (V2 schema, ADR 0002). |
~$0.005-0.010 |
| 3 | inspect | inspector.py β generates a per-article fact card (summary, history-check, severity, viz spec). Stored in article_fact_cards. |
~$0.015 |
| 4 | curate | curator.py β Sonnet sees all claims on the same topic across time and flags contradictions / broken deadlines / credit theft. Inserts fact_checks rows. |
~$0.05/topic |
| 5 | verify | verifier.py β Haiku does Anthropic web_search for each fact-check; agrees/disagrees evidence saved to fact_check_evidence. Bounded β only the high-severity ones. |
~$0.03 + $0.01/search |
| 6 | dv-compare | dv_compare.py β Sonnet reads paired EN+DV bodies, flags numeric / omission / softening differences. Inserts dv_en_inconsistencies. |
~$0.08/pair |
V2 enrichment slices (run on demand; one-shot backfills + periodic top-ups):
| Cmd | What it does | ADR | Cost so far |
|---|---|---|---|
kahzaabu decompose |
decomposer.py β Haiku 4.5 breaks each claim into AVeriTeC Q&A pairs ({question, answer_type, source_medium}). |
0001 | $12.51 (8,954 claims) |
kahzaabu match |
matcher.py + embeddings.py β embeds every claim, groups paraphrases via cosine β₯ 0.85 + entity overlap β₯ 0.6 + Haiku tiebreaker. Provider abstraction (local / OpenAI / Voyage). |
0003, 0007 | $0 (local) |
kahzaabu find-contradictions |
contradictions.py β polarity-pair SQL shortlist + semantic-similarity filter [0.55, 0.95] + Sonnet 4.6 4-way classifier (CONTRADICTION / EVOLVING_POSITION / CONTEXT_CHANGED / NOT_CONTRADICTORY) with reasoning chain. |
0004 | $3.50 |
kahzaabu enrich-factchecks |
fact_check_enricher.py β deterministic V2-label backfill: verdict_label + truth_score + truth_score_label + reasoning_chain for every fact-check. |
0005 | $0 |
kahzaabu export-claimreview |
claimreview.py β schema.org ClaimReview JSON-LD generation + caching to fact_checks.claimreview_jsonld. |
0006 | $0 |
kahzaabu eval |
eval.py β golden-set evaluation across all five LLM-call stages. Produces verified-subset + drift-detector metrics. |
0008 | $0 |
| (registry) | registry.py β public-sector entity registry (25 Maldives entities); auto-tags fact_check_evidence.authoritative_entity_id when a URL is on a registered .gov.mv / .com.mv domain. Source of truth in data/registry/maldives_public_sector.yaml. |
0011 | $0 |
kahzaabu reproducibility <id> |
reproducibility.py β emits full provenance JSON manifest for a fact-check (curation run + claims + decomposition + evidence + contradiction pair + ClaimReview + git SHA). Also exposed as /api/reproducibility/{id}.json. |
0010 | $0 |
kahzaabu audit |
audit.py β bias/fairness markdown report with chi-squared on categoryΓyear + categoryΓtopic, verdict-label + Truth-O-Meter ladder distributions, speaker concentration, authoritative-source coverage. |
0010 | $0 |
kahzaabu transparency-report --since |
transparency.py β public-facing window report: fact-checks issued, corrections, LLM spend, methodology git-log. |
0010 | $0 |
Defaults: cycle runs every 12h via launchd (scripts/com.kahzaabu.pipeline.plist). Budget cap defaults to $1.00 per cycle. Total V2-build spend: ~$16.50. Total project spend to-date: ~$75.
A separate manifesto-extract + manifesto-crossref flow extracts ~717 promises from the 2023 campaign PDF (Dhivehi, 51 MB) and cross-references each against the archive to assign a delivery status.
Editor protocol β when changing this block, derive column lists from
sqlite3 data/kahzaabu.db ".schema"rather than memory. Each entry below uses the formattablename -- descriptionfollowed by indented-- cols: a, b, clines. Thecols:convention is load-bearing:tests/test_readme_schema_drift.pyparses it and fails if any documented column is absent from the live schema. Run./scripts/test.shbefore committing.
The interesting tables:
articles -- PK (id, language). EN β DV pairs via shared id + paired_id.
-- cols: title, category, body_text, body_html, published_date,
-- reference, scraped_at, raw_page_html
claims -- extracted from article body_text by the LLM.
-- cols: article_id, language (FK), type, subject, value,
-- deadline, actor_credited, quote, extraction_run_id,
-- polarity (V2: AFFIRM/DENY/PROMISE/DENIAL_OF_PROMISE/
-- CLAIM_OF_FACT/NEUTRAL β see ADR 0002),
-- subject_normalized (V2: entity-resolved subject),
-- is_checkable (V2: 0=opinion/rhetoric, 1=factual)
fact_checks -- curated contradictions / broken deadlines / etc.
-- cols: category, claim_date, claim, what_actually_happened,
-- topic, confidence, source_article_ids (JSON array
-- of articles.id), evidence_quotes (JSON), published,
-- public_summary, fingerprint (dedupe key)
fact_check_evidence -- web-search hits backing each fact-check.
-- cols: fact_check_id (FK), url, title, snippet, relevance
-- ('confirms'|'contradicts'|'context'|'unclear'|
-- 'not_found'), summary, retrieved_at,
-- authoritative_entity_id (V2: ADR 0011, nullable
-- pointer into the public-sector registry under
-- data/registry/)
article_fact_cards -- per-article inspector output.
-- cols: article_id, language, summary, key_claims_json,
-- history_check, severity, visualization_spec_json,
-- web_evidence_json, cost_usd, inspection_run_id,
-- published
dv_en_inconsistencies -- EN/DV translation diffs.
-- cols: en_article_id, dv_article_id (FKs), severity,
-- category, en_quote, dv_quote, dv_translation_to_en
manifesto_promises -- 2023 campaign promises with delivery tracking.
-- cols: section, promise_text_dv, promise_text_en,
-- category, subject, target_value, deadline_stated,
-- delivery_status, delivery_evidence_json (JSON:
-- linked article_ids + fact_check_ids + notes),
-- chunk_index, published
qna_sessions -- agentic-ask multi-turn memory.
-- cols: id (uuid), messages_json (full message history),
-- total_cost_usd, n_turns, created_at, last_used_at
constitution_articles -- parsed Constitution of the Republic of Maldives.
-- cols: article_no, chapter, title, body, source_version,
-- imported_at
scrape_runs -- audit log of pipeline cycles.
-- cols: category_id, language, started_at, finished_at,
-- pages_scraped, articles_scraped, articles_new,
-- status, resume_page, error_message
web_users -- LEGACY: kept for backwards-compat with deployed DBs.
-- The password-based admin workflow was removed
-- (web UI is read-only public; operator actions
-- run from the CLI). No code reads or writes this
-- table any more.
-- cols: username, password_hash, role, created_atArticle β fact-check linkage is via the JSON column fact_checks.source_article_ids β a list of articles.id values. Use SQLite's json_each() to traverse it (or LIKE on the serialized form as a fallback).
Migrations are idempotent ALTER-COLUMN style in claims_db.py:init_claims_schema(). WAL mode is on; check_same_thread=False for the FastAPI threadpool.
kahzaabu/qna_agentic.py:ask_agentic() is the heart of the Q&A experience. It is itself an agent loop β Sonnet calls internal tools to satisfy a question.
user question
β
βΌ
Sonnet 4.6 + tools = [
archive_stats, search_articles, get_article,
search_factchecks, get_factcheck,
search_manifesto, get_promise,
list_recent,
web_search (Anthropic server tool)
]
β
βΌ
loop up to max_iterations (default 7):
if Sonnet returns tool_use:
execute, append result, continue
else:
capture final_text, break
β
βΌ
guarantee-pass (Haiku 4.5, ~$0.01):
if final_text quotes article text BUT lacks "π Narrative tricks observed":
ask Haiku to append the section using the catalog
β
βΌ
return {answer, session_id, n_iterations, cost_usd, tool_trace, web_searches}
Session memory lives in the qna_sessions table. Pass the returned session_id back to continue a conversation β the loop will re-load all prior tool results and turns.
Cost per question:
- Simple "how many fact-checks?" (data-only) β ~$0.025
- Article-heavy ("what did he say last week?") β ~$0.05-0.10
- Open-ended with web_search β ~$0.10-0.30
Daily budget cap (default $5) is enforced at the top of ask_agentic.
A 16-technique catalog (hero framing, manufactured momentum, goalpost shifting, empty markers of action, vague timeframes, etc.) is appended in the system prompt with anti-over-claiming rules:
- Cap of 5 items per answer
- Every flag must include the verbatim quote
- Ceremonial language ("expressed gratitude") is explicitly NOT a trick
- Hedging language ("could be seen as", "this might imply") is forbidden
The section is enforced via a guarantee-pass: if the agent quotes article text but skips the section, a follow-up Haiku call appends it. Cost: ~$0.01 per article-touching question.
Pure-data questions (e.g. "how many fact-checks?") correctly omit the section β the guarantee-pass is gated on tool_trace containing article-content tools.
See qna_agentic.py:SYSTEM_PROMPT for the catalog and _ARTICLE_TOOLS for the gating set.
The plugin source lives at hermes-plugin/ in this repo. The install script (scripts/install-hermes-plugin.sh) symlinks it into ~/.hermes/hermes-agent/plugins/kahzaabu/ so hermes can find it. Edits in hermes-plugin/ are live β no copy/sync step.
Layout:
hermes-plugin/
βββ plugin.yaml Manifest: name, version, provides_tools, platforms
βββ __init__.py register(ctx) β entry point. Three jobs:
β 1. Hydrate ~/.hermes/.env into os.environ
β 2. Ensure kahzaabu is importable (self-heal .pth)
β 3. Register 9 tools + `hermes kahzaabu` CLI
β + `/kahzaabu` slash command
βββ tools.py 9 handler functions wrapping qna_agentic / claims_db
βββ cli.py argparse setup for `hermes kahzaabu {setup,status,β¦}`
βββ SKILL.md Agent-facing guidance: when to use which tool
βββ README.md Plugin-source README (design choices, bootstrap layers)
Design choices to know:
- Imports, doesn't vendor. Plugin imports the canonical
kahzaabupackage from this dev tree. Editing code here updates the plugin immediately. - Path discovery is robust.
kahzaabu_home()derives the dev tree fromPath(kahzaabu.__file__).resolve().parents[1]. No hardcoded paths anywhere. .pthself-heal. Hermes' venv has nopip, so the plugin writes~/.hermes/hermes-agent/venv/lib/python3.11/site-packages/kahzaabu.pthon first run. If hermes ever recreates its venv, the nexthermes kahzaabu *invocation rewrites it.- Tools are in-process. Unlike the previous MCP-over-stdio design, hermes calls plugin tools directly β no subprocess, ~5-10Γ faster per call.
updateandwebshell out. Both need scikit-learn / FastAPI / etc. that don't live in hermes' lean venv, so they exec<dev>/.venv/bin/kahzaabu pipeline|web.doctorchecks this.
The 9 tools exposed to the agent:
| Tool | What it does |
|---|---|
kahzaabu_stats |
Counts + freshness β call first for "recent" questions |
kahzaabu_ask |
Run the full agentic loop β preferred for any natural-language question |
kahzaabu_list_lies |
List fact-checks with filters |
kahzaabu_get_factcheck |
One fact-check + web evidence + linked source articles |
kahzaabu_manifesto |
2023 promises with delivery status |
kahzaabu_get_article |
One article with claims + linked fact-checks |
kahzaabu_recent_activity |
Last N days of articles |
kahzaabu_constitution_lookup |
BM25 search over the 301 Constitution articles |
kahzaabu_pipeline_run |
Trigger pipeline (gated by KAHZAABU_ALLOW_PIPELINE=1; legacy KAHZAABU_MCP_ALLOW_PIPELINE=1 still honoured) |
Three integration surfaces share one Q&A engine:
- Agent tool call:
hermes chat -q "..."β agent invokeskahzaabu_askand gets back{answer, session_id, cost_usd, tool_trace, web_searches}. - CLI subcommand:
hermes kahzaabu ask [--continue] [--no-web] [--session ID] "..."β direct human use. - Slash command:
/kahzaabu <question>works inside any hermes session, including chats routed through the messaging gateway. Auto-continues the most-recent session.
All three call the same kahzaabu/qna_agentic.py:ask_agentic() function, so session memory, the narrative-tricks layer, daily-budget caps, and cost accounting behave identically across surfaces. Sessions persist in the qna_sessions table and survive process restarts; the --continue and slash auto-continue affordances both use claims_db.most_recent_session_id() to find the latest one within a 24h window.
LLM-provider inheritance: the secondary narrative-tricks pass calls ctx.llm.complete() when invoked from the plugin (so it follows hermes setup model), and falls back to Anthropic Haiku 4.5 when called from the standalone CLI / TUI / web. The main agentic loop always uses Anthropic Sonnet β ctx.llm.complete() doesn't yet support multi-turn tool-use.
kahzaabu web --port 8765 (or hermes kahzaabu web) serves:
| Page | What |
|---|---|
/ |
Dashboard: 5 stat cards + 6 charts (categories, topics, claims/month, articles/month, manifesto-status, stacked-by-month) + freshness banner |
/browse |
Article browser with filters |
/lies |
Fact-check browser with category/severity filters |
/article/{id} |
One article + claims + linked fact-checks + fact-card chart |
/compare |
EN β DV translation inconsistencies |
/compare/{id} |
Side-by-side EN/DV with the flagged region highlighted |
/manifesto |
2023 promises with delivery status |
/manifesto/{id} |
Per-promise detail + supporting articles |
/ask |
The agentic Q&A interface (sessions, web toggle, tool-trace) |
/methodology |
How the pipeline works (public-facing) |
/corrections |
Public report-a-correction form |
Read-only by design. There is no /admin, no /login, no
session cookie, no password anywhere in the system. Publishing a
fact-check, triggering the pipeline, creating backups β all of it
runs from the operator's shell via the kahzaabu CLI and inherits
OS-level permissions. The web UI's only writes are: rate-limited
/api/ask (Q&A budget capped per day) and /api/corrections
(public form that appends to a moderation queue read by the operator
in CLI). slowapi rate-limits anonymous traffic.
kahzaabu tui (or python -m kahzaabu.tui) is a Textual-based interactive terminal. Slash commands:
| Command | What |
|---|---|
/ask <question> |
Multi-turn agentic ask (session preserved) |
/stats |
Archive counts + freshness |
/lies [category] |
List fact-checks |
/article <id> |
Show an article |
/refresh |
Re-query freshness |
/help |
Show all commands |
/quit |
Exit |
A startup banner shows freshness; if stale, it prompts to run kahzaabu update.
Total spend to date: ~$58. Typical ongoing costs:
| Activity | Per item | Per 12h cycle (typical) |
|---|---|---|
| Scrape (HTTP) | $0 | $0 |
| Extract claims | $0.005-0.010 | ~$0.05 |
| Inspect (fact-card) | $0.015 | ~$0.15 |
| Curate (cross-time) | $0.05/topic | ~$0.10 |
| Verify (web-search) | $0.03 + $0.01/hit | ~$0.20 |
| DV/EN compare | $0.08/pair | ~$0.40 |
| Total per 12h cycle | ~$0.90 | |
/api/ask question |
$0.025 (data) β $0.30 (web) | n/a |
Daily caps:
- Pipeline:
--budget 1.00(CLI flag) - Q&A (per process):
KAHZAABU_DAILY_BUDGET_USD=5.00 - Public web Q&A (anon): hard cap returns 503 once daily spend exceeds env var
- Pipeline via MCP silently skips scrape stage. When the agent calls
kahzaabu_pipeline_run, the scrape sub-stage runs but produces noscrape_runsentries. Direct CLI (kahzaabu pipeline) works correctly. The MCP-path bug existed in the legacy MCP server too β the native plugin version may or may not still have it; not retested. hermes default modelshowsanthropic/anthropic/...in doctor. Pre-existing cosmetic bug in_hermes_provider()formatting β concatenates provider with a default that already includes the provider prefix.- launchd plist still in use. Migration to
hermes cronis documented inhermes kahzaabu setupbut not executed. Both can run side-by-side; once you're confident,launchctl unload ~/Library/LaunchAgents/com.kahzaabu.pipeline.plist.
Four plugin handlers used a hallucinated schema.kahzaabu_list_lies,kahzaabu_get_factcheck,kahzaabu_get_articlewere querying columns that don't exist (title/severity/summary) and joining a table that doesn't exist (fact_check_claims). Now rewritten against the real schema β fact-check β article linkage uses the JSONsource_article_idscolumn.
| Priority | Item |
|---|---|
| π΄ High | Public VPS deploy. Caddy + systemd templates in scripts/. Methodology page, robots.txt, rate-limits done. Needs: domain, server, DB sync strategy (push from laptop vs. run pipeline on server). |
/api/reproducibility/{id}.json + kahzaabu reproducibility CLI + kahzaabu audit + kahzaabu transparency-report + /metrics + Grafana dashboard JSON + Dockerfile. |
|
| π‘ Medium | Grow the verified golden-set subset. 24 of 25 fixtures are verified ground truth; 1 extractor fixture (article-32009) deliberately left unverified pending taxonomy clarification on deadline_promise vs event-schedule. Hand-review more articles to broaden coverage per stage. |
| π‘ Medium | Viber channel. Hermes doesn't support Viber. Would require a custom ctx.register_platform(...) adapter β 3-5 days. Out of scope unless Maldives-market demand justifies. |
ctx.llm.ctx.llm.complete() inside the plugin (anthropic fallback for non-plugin paths). Main loop still uses anthropic β needs tool-use. |
|
kahzaabu eval + docs/EVAL_RESULTS.md. |
|
| π‘ Medium | Self-improver loop. A hermes skill at ~/.hermes/skills/kahzaabu/kahzaabu-self-improver/ already exists. Has produced test_claims_db.py with 17 unit tests. Pending: branch merge, additional iterations. |
| π’ Low | Replace launchd with hermes cron (see Known issues #3). |
| π’ Low | Fix doctor's anthropic/anthropic/... cosmetic bug. Strip the provider prefix from model.default before formatting. |
| π’ Low | Compare-presidents page. Would need historical pre-Muizzu data. Out of scope but the schema supports it. |
| π’ Low | RSS/Atom feed of new fact-checks for public consumers. |
| π’ Low | One pre-existing scrape-stage MCP bug investigation (see Known issues #1). Good first target for the self-improver. |
- The corpus is already public at
presidency.gov.mv. No leaks, no inside sources. - Every fact-check links back to the original press release URL.
- "Report a correction" form on
/correctionsappends to a moderation queue; the operator reviews it withkahzaabuCLI tooling, then publishes any resulting fact-check viakahzaabu publish <id>. - The web UI is read-only: only published items (
fact_checks.published = 1) ever surface. Publishing flows are CLI-only β no web-side credentials exist anywhere. - Pipeline LLM calls are budget-capped; daily Q&A spend is capped; anonymous web traffic is rate-limited (
slowapi). - Subject is a sitting head of state. Treat output as automated analysis, not finished journalism β review the source article before quoting.
- No mass scraping of social-media or non-official sources. Web-search-verify uses Anthropic's
web_search_20250305server tool, which respects publisher robots.txt.
./scripts/test.sh # full local suite β 327 tests, ~2.6s
.venv/bin/python -m unittest discover tests/ # just the unit tests
.venv/bin/python tests/system_check.py # live web-stack integration check
.venv/bin/kahzaabu eval # golden-set quality eval (ADR 0008)The unit suite is offline, no external deps, and runs in seconds. It catches:
- V1 invariants:
host_llmbranch in agentic Q&A, JSON1 vs LIKE-fallback parity, README schema drift - V2 invariants per slice: claim-enrichment migrations, decomposer enums, embedding-provider selection, matcher cosine + entity overlap, contradiction 4-way verdict validation, truth-score deterministic mapping, ClaimReview JSON-LD shape, eval framework metrics + verified-vs-pinned semantics
CI: .github/workflows/test.yml runs the unit suite on every push and PR to main. Use ./scripts/ci-dry-run.sh to validate a fresh-worktree install before pushing.
Quality regression detection: kahzaabu eval produces verified-subset metrics (real quality) and all-fixture metrics (drift detector) for each LLM-call stage. A prompt edit that drops the verified subset below 1.000 is a real regression; the drift-detector subset surfaces LLM noise without claiming truth. See docs/EVAL_RESULTS.md.
kahzaabu/ The Python package
βββ __init__.py
βββ cli.py Click-based CLI (kahzaabu <subcommand>)
βββ pipeline.py Orchestrates the 6 V1 stages
βββ scraper.py presidency.gov.mv crawler (EN + DV)
βββ extractor.py Per-article claim extraction (Sonnet) β V2 schema
βββ inspector.py Per-article fact card (Sonnet)
βββ curator.py Cross-time contradiction detector (Sonnet)
βββ verifier.py Web-search-verifier (Haiku)
βββ dv_compare.py EN/DV diff (Sonnet)
βββ manifesto.py 2023 promise extractor + cross-referencer
βββ qna.py Legacy single-shot Q&A (kept for CLI parity)
βββ qna_agentic.py The current agentic Q&A loop + narrative-tricks
βββ claims_db.py Schema, migrations, all DB helpers
βββ db.py Connection plumbing
βββ models.py Type aliases
βββ report.py JSON/CSV export of fact_checks
βββ infographics.py Static-HTML viz generators (legacy tracker)
βββ scheduler.py launchd helper
βββ tui.py Textual TUI
βββ legacy/
β βββ mcp_server.py [DEPRECATED] stdio MCP server β superseded by hermes-plugin/
β
β # V2 modules (ADR-driven)
βββ decomposer.py Slice 2 β AVeriTeC Q&A decomposition (Haiku)
βββ embeddings.py Slice 3 β provider abstraction (local/OpenAI/Voyage)
βββ matcher.py Slice 3 β canonical claim matching
βββ claims_enricher.py Slice 4 prep β polarity/subject/is_checkable backfill
βββ contradictions.py Slice 4 β 4-way verdict classifier (Sonnet)
βββ truth_score.py Slice 5 β deterministic AVeriTeC + Truth-O-Meter mapping
βββ fact_check_enricher.py Slice 5 β V2-label backfill for fact_checks
βββ claimreview.py Slice 6 β schema.org ClaimReview JSON-LD generator
βββ eval.py Slice 10 β golden-set quality evaluation framework
βββ registry.py Slice 11.5 β public-sector entity registry / trust anchor
βββ reproducibility.py Slice 12 β provenance manifest assembly (ADR 0010)
βββ audit.py Slice 12 β bias/fairness audit with chi-squared
βββ transparency.py Slice 12 β public-facing transparency report
βββ metrics.py Slice 12 β prometheus_client + @tracked_stage
β decorator (wraps all 8 pipeline run_*)
β
βββ web/ FastAPI app
βββ app.py
βββ api/ JSON endpoints (all public read-only)
β βββ articles.py / factchecks.py / manifesto.py
β βββ ask.py / freshness.py / stats.py / viz.py
β βββ corrections.py / inspect.py
β βββ claimreview.py / contradictions.py # V2
β βββ constitution.py # 301-article BM25 search
β βββ reproducibility.py # V2 Slice 12
βββ static/ HTML / CSS / JS (no SPA)
β βββ contradictions.html # V2 β 4-way verdict browser
βββ db_dep.py FastAPI Depends() for DB
βββ limits.py Rate-limiter + LRU cache for /api/ask
hermes-plugin/ Hermes plugin source (symlinked from ~/.hermes/...)
β see hermes-plugin/README.md
skills/ Hermes-installable agentskills.io skills
β kahzaabu-fact-check (symlinked into
~/.hermes/skills/)
tests/ Unit + integration tests (197 total)
βββ test_*.py 14 modules
βββ golden/ Quality-eval fixtures (5 stages, 25 fixtures,
24/25 verified ground truth β ADR 0008)
research/ Historical one-shot scripts (see research/README.md)
β NOT imported by the package
scripts/ test.sh, install-hermes-plugin.sh, install-hermes-skills.sh,
ci-dry-run.sh, run_pipeline.sh, Caddyfile, systemd unit
docs/ Project documentation
βββ ARCHITECTURE.md Full V2 architecture with citation map
βββ METHODOLOGY.md Public-facing methodology (Slice 11)
βββ MODEL_CARD.md Per-stage LLM model card (Slice 11)
βββ DATA_CARD.md Corpus data card (Slice 11)
βββ EVAL_RESULTS.md Auto-generated quality eval report
βββ V2_BUILD_PLAN.md Slice tracker
βββ TEST_REPORT.md Latest full-stack test snapshot
βββ adr/ Architecture Decision Records (0001-0010)
.github/
βββ workflows/ CI: test.yml runs unit suite on push + PR
βββ ISSUE_TEMPLATE/ Bug report + feature request templates (Slice 11)
βββ PULL_REQUEST_TEMPLATE.md
data/ SQLite DB + manifesto/ (other contents gitignored)
βββ registry/ Public-sector entity registry (ADR 0011)
β βββ maldives_public_sector.yaml β source of truth (human-edit here)
β βββ maldives_public_sector.json β machine-loaded twin
βββ backups/ Local-only sqlite3 dumps (gitignored)
βββ reports/ `kahzaabu audit` + `transparency-report` output
(gitignored β regenerate locally)
Dockerfile One-command reproduction β python:3.11-slim
base, editable install with chosen embedding
backend (ADR 0010)
docs/observability/
βββ grafana-dashboard.json Importable dashboard (6 panels)
LICENSE Apache-2.0
SECURITY.md Vulnerability disclosure (90-day window)
CONTRIBUTING.md Slice discipline, ADR process, test gates
CODE_OF_CONDUCT.md Contributor Covenant 2.1
- License: Apache-2.0. Patent grant included. Derivative works may re-license; attribution required. See ADR 0009 for the rationale.
- Contributing: CONTRIBUTING.md β slice discipline, ADR process, test gates, commit format. PRs without ADRs for non-trivial changes get bounced.
- Code of Conduct: CODE_OF_CONDUCT.md β Contributor Covenant 2.1. Enforcement contact:
Sofwathullah.Mohamed@gmail.com. - Security: SECURITY.md β 90-day responsible-disclosure window; report to
Sofwathullah.Mohamed@gmail.comwith[kahzaabu-security]in the subject. - ADRs: every architectural decision is documented under
docs/adr/. 0001β0010 cover V2. - Model & data cards:
docs/MODEL_CARD.mdanddocs/DATA_CARD.mddescribe each LLM-call stage's prompts/biases/limitations and the corpus's coverage, gaps, and refresh cadence.
docs/ARCHITECTURE.mdβ full V2 architecture with citation map (AVeriTeC, RAGAR, Full Fact, PolitiFact, ClaimReview)docs/METHODOLOGY.mdβ public-facing methodology (Slice 11)docs/paper/kahzaabu-methodology.mdβ arXiv-style paper draft v0.1 (Markdown source; pandoc-convertible)docs/EVAL_RESULTS.mdβ quality evaluation results (auto-generated bykahzaabu eval)docs/V2_BUILD_PLAN.mdβ slice tracker- The hermes plugin's own README:
~/.hermes/hermes-agent/plugins/kahzaabu/README.md - The agent's usage guide for kahzaabu:
~/.hermes/hermes-agent/plugins/kahzaabu/SKILL.md - Hermes docs: https://github.com/NousResearch/hermes-agent
If you want to change something, start with hermes kahzaabu doctor to confirm your environment is healthy, then read the relevant module β they're small (mostly 200-400 LOC) and prose-heavy. The pipeline is the most opinionated part; everything else is glue.