Convert existing Markdown — local .md files or scraped web pages — into a
conformant Open Knowledge Format (OKF) v0.1
bundle: a folder of Markdown concept files with YAML frontmatter, per-document
image folders, bundle-relative cross-links, and a tag-grouped index.md.
The design splits cleanly into two halves:
- Python does everything deterministic — parsing, field derivation, image relocation, link rewriting, index generation, validation. No API key, no model, no network (except when scraping a URL). Pure stdlib, zero dependencies.
- An LLM fills only the semantic gaps the script can't derive: a concept's type, a one-sentence description, and topical tags. Skip this step and the tool still emits a valid bundle with sane defaults.
It ships as a Claude Code skill (LLM enrichment is handled by Claude itself), but the engine is a standalone CLI you can run anywhere.
OKF is a vendor-neutral way to represent knowledge as plain Markdown + YAML so that AI agents and humans can both read it. Google's own tooling includes a producer that walks a BigQuery dataset — but nothing that turns the Markdown you already have (docs, exports, web pages, an Obsidian vault) into a bundle. That's the gap this fills.
A flat bundle — every concept is one .md file in a single folder:
docs/okf/
├── grounding.md # a concept, with full frontmatter
├── grounding/ # that concept's images (same name, minus .md)
│ └── diagram.png
└── index.md # auto-generated, grouped by tag
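As a rough illustration of the tag-grouped index.md in the tree above, the sketch below groups concepts by tag and emits one heading per tag. The function name build_index, the "untagged" bucket, and the exact heading and link layout are assumptions made for this example, not the tool's actual output format.

```python
from collections import defaultdict


def build_index(concepts: dict[str, dict]) -> str:
    """Render a tag-grouped index (hypothetical layout).

    `concepts` maps a slug to {"title": str, "tags": [str, ...]}.
    """
    by_tag: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for slug, meta in concepts.items():
        # Concepts without tags land in an assumed "untagged" bucket.
        for tag in meta.get("tags") or ["untagged"]:
            by_tag[tag].append((meta.get("title", slug), slug))

    lines = ["# Index", ""]
    for tag in sorted(by_tag):
        lines.append(f"## {tag}")
        for title, slug in sorted(by_tag[tag]):
            # Bundle-relative link to the concept file.
            lines.append(f"- [{title}](/{slug}.md)")
        lines.append("")
    return "\n".join(lines)


concepts = {
    "grounding": {
        "title": "How AI Search Grounding Actually Works",
        "tags": ["grounding", "ai-search"],
    },
    "notes": {"title": "Notes", "tags": []},
}
index_text = build_index(concepts)
```

A concept with several tags appears once under each of them, which is what makes the tag the navigation axis in a flat folder.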
Each concept gets conformant frontmatter:
---
type: Article
title: "How AI Search Grounding Actually Works"
description: A side-by-side analysis of how three AI platforms cite web sources.
resource: https://example.com/blog/grounding/
tags: [grounding, ai-search, citations]
timestamp: 2026-06-13T14:51:03Z
---

Conformance note. OKF's prose spec only requires type. But the reference implementation's validator (OKFDocument.validate()) rejects any document missing type, title, description, or timestamp, and Google's own sample bundles carry all four. This tool validates against that stricter 4-field rule, deriving title/timestamp deterministically and falling back to the body's first sentence for description in offline mode. See references/okf-spec-v0.1.md.
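The 4-field rule can be approximated in a few lines. This is a minimal sketch, not the reference implementation's OKFDocument.validate() and not this tool's validator: read_frontmatter and missing_fields are hypothetical helpers, and the parser only handles flat key: value lines.

```python
REQUIRED_FIELDS = ("type", "title", "description", "timestamp")


def read_frontmatter(text: str) -> dict[str, str]:
    """Read top-level `key: value` pairs from a leading --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() and not line.startswith((" ", "\t", "-")):
            fields[key.strip()] = value.strip().strip('"')
    return fields


def missing_fields(text: str) -> list[str]:
    """Return the required fields that are absent or empty."""
    frontmatter = read_frontmatter(text)
    return [name for name in REQUIRED_FIELDS if not frontmatter.get(name)]


doc = """---
type: Article
title: "How AI Search Grounding Actually Works"
description: A side-by-side analysis of how three AI platforms cite web sources.
resource: https://example.com/blog/grounding/
tags: [grounding, ai-search, citations]
timestamp: 2026-06-13T14:51:03Z
---
Body text.
"""
print(missing_fields(doc))
print(missing_fields("---\ntype: Article\n---\nBody"))
```

A document that satisfies only the prose spec (type alone) passes the looser reading but fails this stricter check, which is the distinction the note describes.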
git clone https://github.com/chapter42/okf-convert.git
cp -R okf-convert ~/.claude/skills/okf-convert
cp okf-convert/commands/okf.md ~/.claude/commands/okf.md # enables /okf
Restart Claude Code, then:
/okf https://example.com/some-article # ingest one URL
/okf ./my-markdown-folder # convert a folder
No install needed beyond Python 3.10+:
python3 scripts/okf_convert.py --help
# 1. Convert a folder of Markdown into a draft bundle + gap report
python3 scripts/okf_convert.py convert --input ./my-docs --out ./docs/okf
# 2. (optional) Fill the gaps in ./docs/okf/_okf_gaps.json with an LLM,
# writing {slug: {type, description, tags}} to enrich.json — then:
python3 scripts/okf_convert.py finalize --bundle ./docs/okf --enrichment enrich.json
# ...or skip the LLM entirely (offline):
python3 scripts/okf_convert.py finalize --bundle ./docs/okf --default-type "Document"
# 3. Check conformance any time
python3 scripts/okf_convert.py validate --bundle ./docs/okf
Ingesting a single scraped page:
python3 scripts/okf_convert.py convert \
  --url "https://example.com/article" --scraped /tmp/page.md --out ./docs/okf
resource: is set to the URL automatically, and the concept is slugged from the URL path.
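To make "slugged from the URL path" concrete, here is a minimal sketch that takes the last non-empty path segment and normalizes it. The helper slug_from_url and its normalization rules (lowercasing, stripping a trailing .html or .md, hyphenating other characters, falling back to the hostname) are assumptions for illustration; the tool's actual rules may differ.

```python
import re
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Derive a filename-safe slug from a URL (hypothetical rules)."""
    parsed = urlparse(url)
    segments = [part for part in parsed.path.split("/") if part]
    # Fall back to the hostname when the path is empty.
    last = segments[-1] if segments else parsed.netloc
    # Drop a common page extension before normalizing.
    last = re.sub(r"\.(html?|md)$", "", last, flags=re.IGNORECASE)
    slug = re.sub(r"[^a-z0-9]+", "-", last.lower()).strip("-")
    return slug or "index"


print(slug_from_url("https://example.com/blog/grounding/"))  # grounding
print(slug_from_url("https://example.com/article"))          # article
```

Under these assumptions the URL in the frontmatter example earlier maps to grounding, matching the grounding.md file shown in the bundle layout.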
- Frontmatter parsing (built-in minimal YAML, no PyYAML needed) and re-emission with unknown/custom keys preserved.
- title from frontmatter → H1 → filename; timestamp from frontmatter → file mtime → now (ISO 8601 UTC); resource from frontmatter or the source URL.
- Tag extraction from frontmatter tags and inline #hashtags.
- Images relocated into a per-document folder (foo.md → foo/) with links rewritten; remote (http(s)) images left untouched.
- Inter-document links and [[wikilinks]] rewritten to bundle-relative /slug.md; broken links tolerated (as OKF requires).
- Incremental ingestion: repeated runs append without clobbering existing concepts (colliding slugs get a numeric suffix).
- index.md grouped by tag: the primary navigation axis for a flat bundle.
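Two of the deterministic steps listed above, inline hashtag extraction and wikilink rewriting, can be sketched with the standard library alone. The regular expressions, the helper names, and the choice to emit standard Markdown links are assumptions for this example rather than the tool's actual code.

```python
import re

# A hashtag must start with a letter and must not follow a word character,
# a slash, or another '#', so headings and URL fragments are skipped.
HASHTAG = re.compile(r"(?<![\w/#])#([A-Za-z][\w-]*)")
# Matches [[Target]] and [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def slugify(name: str) -> str:
    """Lowercase and hyphenate a page name (hypothetical slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def extract_hashtags(body: str) -> list[str]:
    """Collect unique inline hashtags in order of first appearance."""
    seen: list[str] = []
    for match in HASHTAG.finditer(body):
        tag = match.group(1).lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def rewrite_wikilinks(body: str) -> str:
    """Turn [[wikilinks]] into bundle-relative /slug.md Markdown links."""
    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        return f"[{label}](/{slugify(target)}.md)"

    return WIKILINK.sub(replace, body)


text = "Notes on #grounding and #AI-Search.\n## Heading\nSee [[AI Search|search]] and [[Grounding]]."
print(extract_hashtags(text))
print(rewrite_wikilinks("See [[AI Search|search]] and [[Grounding]]."))
```

The rewrite does not check whether the target file exists, which fits the tolerance for broken links mentioned in the list.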
See examples/. Google's canonical GA4 bundle is the best
ground-truth reference; fetch it with:
./examples/fetch-reference-bundle.sh
MIT © 2026 Roy Huiskes.
OKF and the referenced sample bundles are by Google (GoogleCloudPlatform/knowledge-catalog, Apache-2.0) and are not redistributed here.