A ready-to-fork starter for building an AI-maintained knowledge base on the Open Knowledge Format (OKF) v0.1 (Google Cloud, 2026-06-12) — a directory of Markdown files with YAML frontmatter that humans and AI agents can author, exchange, and consume without translation.
OKF formalizes Andrej Karpathy's "LLM-wiki" pattern into a portable, vendor-neutral spec. Instead of re-retrieving raw chunks on every query (classic RAG), an agent continuously synthesizes sources into curated, cross-linked Markdown that loads straight into context.
AGENTS.md ← schema + agent operating rules (READ FIRST)
.claude/skills/okf/ ← Claude Code skill that drives ingest/query/lint/validate
raw/ ← Layer 1: immutable source materials (read-only for the agent)
wiki/ ← Layer 2: the OKF bundle (agent-maintained concepts)
index.md ← reserved: progressive-disclosure catalog
log.md ← reserved: append-only change log
tables/ datasets/ metrics/ playbooks/ references/ ← example concepts (replace with yours)
viz.html ← generated single-file graph viewer (libs inlined; air-gap)
tools/ ← init · validate · viz (air-gap) · index (BM25) · embed (Ollama) · search (hybrid RRF) · lease
vendor/ ← Cytoscape + marked (MIT), inlined into viz.html for offline use
skill/okf/ ← installable Claude Code skill (SKILL.md) — `install.sh` bundles it with the tools
install.sh ← install the skill globally (~/.claude/skills) or per-project
server/ ← okf_mcp_server.py — self-hostable MCP access layer for agents
deploy/ ← docker-compose (gitea + MCP + TLS proxy) for on-prem self-hosting
.gitea/ ci/ ← conformance CI gate (Gitea Actions / GitLab CI)
docs/ ← USAGE.md (how-to, EN/TH) + GUIDELINES.md + ENTERPRISE.md (self-host architecture)
book/ ← Thai manual (mdBook) → GitHub Pages; beginner → enterprise
research/ ← OKF best-practice report, mind map, and reference-impl findings
A full beginner→enterprise handbook, built with mdBook and deployed to GitHub Pages by
.github/workflows/book.yml (cover page, in-page TOC, language switcher, embedded example graph, PDF):
- 🇹🇭 ภาษาไทย: https://supachai-j.github.io/open-knowledge-format-starter/ — source in
book/ - 🇬🇧 English: https://supachai-j.github.io/open-knowledge-format-starter/en/ — source in
book-en/
git clone https://github.com/supachai-j/open-knowledge-format-starter.git
cd open-knowledge-format-starter
python3 tools/okf-validate.py # → ✓ CONFORMANT with OKF v0.1
python3 tools/okf-viz.py # → writes wiki/viz.html, open it in any browser
bash tools/okf-selftest.sh # → exercises the whole toolchain (10 checks)- Read AGENTS.md — governs how concepts are structured and how the agent behaves.
- Browse the bundle from wiki/index.md.
- Add knowledge: drop a source in
raw/, then run the supervised INGEST workflow. - New concepts start from tools/concept-template.md.
Full walkthrough (EN/TH): docs/USAGE.md · Authoring rules: docs/GUIDELINES.md
Package the whole capability as an installable skill so any project/session can create and operate OKF bundles — no need to be inside this repo.
./install.sh # global → ~/.claude/skills/okf (every project)
./install.sh --project # project → ./.claude/skills/okf (current repo only)
./install.sh --dir <path> # custom location
./install.sh --uninstall # removeThe installer bundles skill/okf/SKILL.md with all pure-Python tools + the vendored viewer libs into a
self-contained skill directory. Then, in any project: "init an OKF knowledge base here" → the skill runs
okf-init.py to scaffold a conformant bundle, and you ingest/query/validate/visualize from there.
This repo ships a skill at .claude/skills/okf/. Open the repo in Claude Code and just state intent:
| You say | The agent does |
|---|---|
| "ingest raw/notes.pdf into the wiki" | Extracts 5–15 claims, shows them for approval, then writes concepts + updates index.md/log.md |
| "what does the wiki say about WAU?" | Reads the index, opens concepts, answers with Concept-ID citations |
| "create an OKF concept for the Sessions table" | Scaffolds a conformant concept from the template |
| "validate the bundle" | Runs tools/okf-validate.py and reports |
Any agent that reads AGENTS.md (e.g. via CLAUDE.md/GEMINI.md) can follow the same procedures.
Run the same bundle on-prem as shared, internal knowledge for every session and agent team — no SaaS, air-gap friendly. Git is the source of truth; an internal MCP server is the access layer. Reads are instant; writes are PR-gated (audit + review + CI). Full architecture, security model, concurrency options, and deploy steps: docs/ENTERPRISE.md.
# Local / dev (stdio, no network):
python3 tools/okf-index.py build # build the BM25 search index
python3 server/okf_mcp_server.py # serve the bundle over MCP (stdio)
# Optional semantic upgrade (on-prem, no external API) — search auto-falls back to BM25 if absent:
ollama pull nomic-embed-text
python3 tools/okf-embed.py build # embeddings → wiki/.okf-embed.json
python3 tools/okf-search.py "how is WAU defined" # hybrid BM25 + semantic (RRF)
# Fully offline viewer (libs inlined, no CDN):
python3 tools/okf-viz.py # wiki/viz.html — single self-contained file
# On-prem stack (internal git + MCP + TLS/auth proxy):
cd deploy && cp .env.example .env && docker compose up -dAgents connect to one internal MCP endpoint and get okf_search, okf_get_concept,
okf_list_concepts, okf_read_index, and okf_propose_change (branch + PR, never direct to main).
For high write-throughput teams, set OKF_WRITE_MODE=lease to switch to advisory leases
(okf_acquire_lease / okf_commit_concept / okf_release_lease) — short-TTL, token-verified,
auto-expiring so a crashed agent never deadlocks a concept.
- Every non-reserved
.mdinwiki/has parseable YAML frontmatter with a non-emptytype. index.md/log.mdfollow their reserved structure.- Concept ID = path within
wiki/minus.md(e.g.tables/orders.md→tables/orders). - Consumers tolerate unknown keys, unknown
typevalues, missing optional fields, and broken links.
See research/okf-best-practice-implementation-report.md
for the sourced best-practice guide, research/okf-mindmap.json for the concept map,
and research/knowledge-catalog-findings.md for the gap analysis against
Google's official reference implementation (which shaped the conventions used here).
OKF is v0.1 (days old at scaffold time). Versioning is
<major>.<minor>; expect changes. The normative spec requires only thetypefield — most "best practices" here come from the surrounding LLM-wiki community, not the spec itself. Patterns like confidence-decay and hybrid search are optional add-ons, not part of v0.1.
MIT © 2026 Supachai-ja