okf-convert

Convert existing Markdown — local .md files or scraped web pages — into a conformant Open Knowledge Format (OKF) v0.1 bundle: a folder of Markdown concept files with YAML frontmatter, per-document image folders, bundle-relative cross-links, and a tag-grouped index.md.

The design splits cleanly into two halves:

Python does everything deterministic — parsing, field derivation, image relocation, link rewriting, index generation, validation. No API key, no model, no network (except when scraping a URL). Pure stdlib, zero dependencies.
An LLM fills only the semantic gaps the script can't derive: a concept's type, a one-sentence description, and topical tags. Skip this step and the tool still emits a valid bundle with sane defaults.

It ships as a Claude Code skill (LLM enrichment is handled by Claude itself), but the engine is a standalone CLI you can run anywhere.

Why

OKF is a vendor-neutral way to represent knowledge as plain Markdown + YAML so that AI agents and humans can both read it. Google's own tooling includes a producer that walks a BigQuery dataset — but nothing that turns the Markdown you already have (docs, exports, web pages, an Obsidian vault) into a bundle. That's the gap this fills.

What it produces

A flat bundle — every concept is one .md file in a single folder:

docs/okf/
├── grounding.md          # a concept, with full frontmatter
├── grounding/            # that concept's images (same name, minus .md)
│   └── diagram.png
└── index.md              # auto-generated, grouped by tag

Each concept gets conformant frontmatter:

---
type: Article
title: "How AI Search Grounding Actually Works"
description: A side-by-side analysis of how three AI platforms cite web sources.
resource: https://example.com/blog/grounding/
tags: [grounding, ai-search, citations]
timestamp: 2026-06-13T14:51:03Z
---
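Frontmatter like the block above can be re-emitted without PyYAML. The minimal sketch below writes the OKF fields first and keeps any custom keys after them. The field order, the quoting rule, and the names `emit_frontmatter` / `yaml_scalar` are assumptions for illustration. The quoting check is deliberately conservative and is not a full YAML implementation.

```python
import json
import re

# Assumed emission order: the OKF fields shown above first, then any
# custom keys in their original order, so nothing from the source is dropped.
OKF_ORDER = ("type", "title", "description", "resource", "tags", "timestamp")

# Conservative "safe to leave unquoted" pattern; anything else gets quoted.
SAFE_PLAIN = re.compile(r"[A-Za-z_][\w .,/:-]*")
RESERVED = {"true", "false", "yes", "no", "on", "off", "null"}


def yaml_scalar(value, in_flow: bool = False) -> str:
    text = str(value)
    needs_quotes = (
        not SAFE_PLAIN.fullmatch(text)
        or ": " in text
        or text.endswith((":", " "))
        or text.lower() in RESERVED
        or (in_flow and ("," in text or ":" in text))
    )
    # A JSON string literal is also a valid YAML double-quoted scalar.
    return json.dumps(text, ensure_ascii=False) if needs_quotes else text


def emit_frontmatter(fields: dict) -> str:
    keys = [k for k in OKF_ORDER if k in fields]
    keys += [k for k in fields if k not in OKF_ORDER]
    lines = ["---"]
    for key in keys:
        value = fields[key]
        if isinstance(value, list):
            # Lists are written in flow style, like `tags: [a, b]` above.
            items = ", ".join(yaml_scalar(v, in_flow=True) for v in value)
            lines.append(f"{key}: [{items}]")
        else:
            lines.append(f"{key}: {yaml_scalar(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Quoting more often than strictly necessary does no harm, because a double-quoted scalar reads back as the same string.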

Conformance note. OKF's prose spec only requires type. But the reference implementation's validator (OKFDocument.validate()) rejects any document missing type, title, description, or timestamp — and Google's own sample bundles carry all four. This tool validates against that stricter 4-field rule, deriving title/timestamp deterministically and falling back to the body's first sentence for description in offline mode. See references/okf-spec-v0.1.md.
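The stricter 4-field rule and the offline description fallback can be sketched as follows. The helper names (`missing_fields`, `first_sentence`) and the sentence-splitting heuristic are assumptions and do not come from the script. In particular, skipping headings and image lines before picking the "first sentence" is a guess.

```python
import re

# The four fields the README says OKFDocument.validate() insists on.
REQUIRED_FIELDS = ("type", "title", "description", "timestamp")


def missing_fields(frontmatter: dict) -> list[str]:
    """Return required fields that are absent, None, or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if frontmatter.get(field) is None or not str(frontmatter[field]).strip()
    ]


def first_sentence(body: str) -> str:
    """Offline description fallback: first sentence of the first prose paragraph."""
    for para in re.split(r"\n\s*\n", body):
        text = " ".join(para.split())
        # Skip blank paragraphs, headings, and image-only lines.
        if not text or text.startswith(("#", "![")):
            continue
        match = re.match(r"(.+?[.!?])(?:\s|$)", text)
        return match.group(1) if match else text
    return ""
```

For example, a draft with type, title, and timestamp but no description reports `["description"]`, and the fallback can then fill that gap from the body.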

Install

As a Claude Code skill (recommended)

git clone https://github.com/chapter42/okf-convert.git
cp -R okf-convert ~/.claude/skills/okf-convert
cp okf-convert/commands/okf.md ~/.claude/commands/okf.md   # enables /okf

Restart Claude Code, then:

/okf https://example.com/some-article    # ingest one URL
/okf ./my-markdown-folder                 # convert a folder

As a standalone CLI

No install needed beyond Python 3.10+:

python3 scripts/okf_convert.py --help

Usage (CLI)

# 1. Convert a folder of Markdown into a draft bundle + gap report
python3 scripts/okf_convert.py convert --input ./my-docs --out ./docs/okf

# 2. (optional) Fill the gaps in ./docs/okf/_okf_gaps.json with an LLM,
#    writing {slug: {type, description, tags}} to enrich.json — then:
python3 scripts/okf_convert.py finalize --bundle ./docs/okf --enrichment enrich.json

#    ...or skip the LLM entirely (offline):
python3 scripts/okf_convert.py finalize --bundle ./docs/okf --default-type "Document"

# 3. Check conformance any time
python3 scripts/okf_convert.py validate --bundle ./docs/okf
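The enrichment file is a JSON object keyed by slug, with `{type, description, tags}` for each concept. The sketch below shows a hypothetical merge step. It assumes that LLM values override the draft and that slugs with no enrichment fall back to a default type, mirroring `--default-type`. The name `apply_enrichment` and its precedence rules are illustrative, not the script's documented behavior.

```python
def apply_enrichment(drafts: dict, enrichment: dict,
                     default_type: str = "Document") -> dict:
    """Merge {slug: {type, description, tags}} into draft frontmatter."""
    merged = {}
    for slug, fm in drafts.items():
        fm = dict(fm)  # shallow copy so the caller's drafts stay untouched
        extra = enrichment.get(slug, {})
        # LLM type wins; otherwise keep any existing type, else the default.
        fm["type"] = extra.get("type") or fm.get("type") or default_type
        if extra.get("description"):
            fm["description"] = extra["description"]
        # Existing tags first, then suggested ones, de-duplicated in order.
        tags = list(fm.get("tags") or []) + list(extra.get("tags") or [])
        fm["tags"] = list(dict.fromkeys(tags))
        merged[slug] = fm
    return merged
```

In practice the `enrichment` argument would be whatever `json.load` returns for `enrich.json`.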

Ingesting a single scraped page:

python3 scripts/okf_convert.py convert \
  --url "https://example.com/article" --scraped /tmp/page.md --out ./docs/okf

The resource field is set to the URL automatically, and the concept's slug (its filename in the bundle) is derived from the URL path.
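The README says only that the slug comes "from the URL path". The sketch below shows one plausible rule. It assumes the slug is the last non-empty path segment, with common page extensions stripped and everything else collapsed to lowercase hyphenated ASCII. Under this rule, `https://example.com/blog/grounding/` becomes `grounding`, matching the tree shown earlier. The function name and the hostname fallback for bare domains are hypothetical.

```python
import re
from urllib.parse import unquote, urlparse


def slug_from_url(url: str) -> str:
    """Hypothetical slug rule: last non-empty path segment, lowercased, hyphenated."""
    parsed = urlparse(url)
    segments = [s for s in unquote(parsed.path).split("/") if s]
    # Fall back to the hostname for bare domains like https://example.com/
    last = segments[-1] if segments else (parsed.hostname or "")
    last = re.sub(r"\.(html?|php|aspx?)$", "", last, flags=re.IGNORECASE)
    slug = re.sub(r"[^a-z0-9]+", "-", last.lower()).strip("-")
    return slug or "index"
```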

What the engine handles deterministically

Frontmatter parsing (built-in minimal YAML — no PyYAML needed) and re-emission with unknown/custom keys preserved.
title from frontmatter → H1 → filename; timestamp from frontmatter → file mtime → now (ISO 8601 UTC); resource from frontmatter or the source URL.
Tag extraction from frontmatter tags and inline #hashtags.
Images relocated into a per-document folder (foo.md → foo/) with links rewritten; remote (http(s)) images left untouched.
Inter-document links and [[wikilinks]] rewritten to bundle-relative /slug.md; broken links tolerated (as OKF requires).
Incremental ingestion: repeated runs append without clobbering existing concepts (colliding slugs get a numeric suffix).
index.md grouped by tag — the primary navigation axis for a flat bundle.

Examples

See examples/. Google's canonical GA4 bundle is the best ground-truth reference; fetch it with:

./examples/fetch-reference-bundle.sh

License

OKF and the referenced sample bundles are by Google (GoogleCloudPlatform/knowledge-catalog, Apache-2.0) and are not redistributed here.
