okf-convert

Convert existing Markdown — local .md files or scraped web pages — into a conformant Open Knowledge Format (OKF) v0.1 bundle: a folder of Markdown concept files with YAML frontmatter, per-document image folders, bundle-relative cross-links, and a tag-grouped index.md.

The design splits cleanly into two halves:

  • Python does everything deterministic — parsing, field derivation, image relocation, link rewriting, index generation, validation. No API key, no model, no network (except when scraping a URL). Pure stdlib, zero dependencies.
  • An LLM fills only the semantic gaps the script can't derive: a concept's type, a one-sentence description, and topical tags. Skip this step and the tool still emits a valid bundle with sane defaults.

It ships as a Claude Code skill (LLM enrichment is handled by Claude itself), but the engine is a standalone CLI you can run anywhere.

Why

OKF is a vendor-neutral way to represent knowledge as plain Markdown + YAML so that AI agents and humans can both read it. Google's own tooling includes a producer that walks a BigQuery dataset — but nothing that turns the Markdown you already have (docs, exports, web pages, an Obsidian vault) into a bundle. That's the gap this fills.

What it produces

A flat bundle — every concept is one .md file in a single folder:

docs/okf/
├── grounding.md          # a concept, with full frontmatter
├── grounding/            # that concept's images (same name, minus .md)
│   └── diagram.png
└── index.md              # auto-generated, grouped by tag

Each concept gets conformant frontmatter:

---
type: Article
title: "How AI Search Grounding Actually Works"
description: A side-by-side analysis of how three AI platforms cite web sources.
resource: https://example.com/blog/grounding/
tags: [grounding, ai-search, citations]
timestamp: 2026-06-13T14:51:03Z
---

Conformance note. OKF's prose spec only requires type. But the reference implementation's validator (OKFDocument.validate()) rejects any document missing type, title, description, or timestamp — and Google's own sample bundles carry all four. This tool validates against that stricter 4-field rule, deriving title/timestamp deterministically and falling back to the body's first sentence for description in offline mode. See references/okf-spec-v0.1.md.
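As a rough illustration of the stricter four-field rule, the check can be sketched in a few lines of stdlib Python. This is not the tool's own validator or the reference `OKFDocument.validate()`; the helper names `read_frontmatter` and `missing_fields` are hypothetical, and the parser only understands flat `key: value` lines.

```python
import re

# The four fields the stricter rule requires (per the conformance note above).
REQUIRED_FIELDS = ("type", "title", "description", "timestamp")


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` block delimited by --- lines (hypothetical helper)."""
    match = re.match(r"\A---\n(.*?)\n---\n?", text, flags=re.DOTALL)
    if match is None:
        return {}
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip().strip('"')
    return fields


def missing_fields(text: str) -> list[str]:
    """Return the required fields that are absent or empty."""
    fields = read_frontmatter(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A document that carries only `type` satisfies the prose spec but would fail this check with `title`, `description`, and `timestamp` reported as missing.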

Install

As a Claude Code skill (recommended)

git clone https://github.com/chapter42/okf-convert.git
cp -R okf-convert ~/.claude/skills/okf-convert
cp okf-convert/commands/okf.md ~/.claude/commands/okf.md   # enables /okf

Restart Claude Code, then:

/okf https://example.com/some-article    # ingest one URL
/okf ./my-markdown-folder                 # convert a folder

As a standalone CLI

No install needed beyond Python 3.10+:

python3 scripts/okf_convert.py --help

Usage (CLI)

# 1. Convert a folder of Markdown into a draft bundle + gap report
python3 scripts/okf_convert.py convert --input ./my-docs --out ./docs/okf

# 2. (optional) Fill the gaps in ./docs/okf/_okf_gaps.json with an LLM,
#    writing {slug: {type, description, tags}} to enrich.json — then:
python3 scripts/okf_convert.py finalize --bundle ./docs/okf --enrichment enrich.json

#    ...or skip the LLM entirely (offline):
python3 scripts/okf_convert.py finalize --bundle ./docs/okf --default-type "Document"

# 3. Check conformance any time
python3 scripts/okf_convert.py validate --bundle ./docs/okf
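
If you produce the enrichment file yourself rather than through an LLM, a small script can build it. The sketch below assumes only the `{slug: {type, description, tags}}` shape named in step 2; the value types (strings, and a list of strings for `tags`), the `grounding` slug, and the helper name `build_enrichment` are illustrative assumptions, and the layout of `_okf_gaps.json` is not modelled here.

```python
import json

# Keys the enrichment step is described as accepting for each slug.
ALLOWED_KEYS = {"type", "description", "tags"}


def build_enrichment(entries: dict[str, dict]) -> str:
    """Serialize {slug: {type, description, tags}} to JSON (hypothetical helper)."""
    for slug, fields in entries.items():
        unknown = set(fields) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"{slug}: unexpected keys {sorted(unknown)}")
    return json.dumps(entries, indent=2, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    payload = build_enrichment(
        {
            "grounding": {
                "type": "Article",
                "description": "How three AI platforms cite web sources.",
                "tags": ["grounding", "ai-search", "citations"],
            }
        }
    )
    # Write the file that `finalize --enrichment enrich.json` reads.
    with open("enrich.json", "w", encoding="utf-8") as handle:
        handle.write(payload)
```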

Ingesting a single scraped page:

python3 scripts/okf_convert.py convert \
  --url "https://example.com/article" --scraped /tmp/page.md --out ./docs/okf

resource: is set to the URL automatically, and the concept is slugged from the URL path.
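
The exact slugging rule lives in the script, but one plausible version takes the last non-empty path segment, drops a file extension, and normalizes the rest to lowercase with hyphens. The sketch below is written under that assumption; `slug_from_url` and its `index` fallback are hypothetical names, not the tool's API.

```python
import re
from urllib.parse import urlparse


def slug_from_url(url: str, fallback: str = "index") -> str:
    """Derive a filename-safe slug from a URL path (assumed rule, not the tool's own)."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    last = segments[-1] if segments else fallback
    # Drop a trailing .html/.htm/.md extension before normalizing.
    last = re.sub(r"\.(html?|md)$", "", last, flags=re.IGNORECASE)
    slug = re.sub(r"[^a-z0-9]+", "-", last.lower()).strip("-")
    return slug or fallback
```

Under this rule, `https://example.com/blog/grounding/` would become `grounding`, matching the `grounding.md` concept in the layout shown earlier.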

What the engine handles deterministically

  • Frontmatter parsing (built-in minimal YAML — no PyYAML needed) and re-emission with unknown/custom keys preserved.
  • title from frontmatter → H1 → filename; timestamp from frontmatter → file mtime → now (ISO 8601 UTC); resource from frontmatter or the source URL.
  • Tag extraction from frontmatter tags and inline #hashtags.
  • Images relocated into a per-document folder (foo.md → foo/) with links rewritten; remote (http(s)) images left untouched.
  • Inter-document links and [[wikilinks]] rewritten to bundle-relative /slug.md; broken links tolerated (as OKF requires).
  • Incremental ingestion: repeated runs append without clobbering existing concepts (colliding slugs get a numeric suffix).
  • index.md grouped by tag — the primary navigation axis for a flat bundle.
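
To make three of these steps concrete, here is a minimal stdlib sketch of hashtag extraction, wikilink rewriting, and slug collision handling. It is not the engine's code: the regular expressions, the `-2`, `-3` suffix format, and all function names are assumptions chosen for illustration.

```python
import re

# A '#' not preceded by a word character or another '#', followed by a letter.
# Markdown headings ("# Title") are skipped because a space follows the '#'.
HASHTAG = re.compile(r"(?<![\w#])#([A-Za-z][\w-]*)")
# [[Target]] or [[Target|label]]
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def slugify(name: str) -> str:
    """Lowercase and collapse non-alphanumeric runs to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def extract_hashtags(body: str) -> list[str]:
    """Collect inline #hashtags, lowercased, in first-seen order."""
    seen: list[str] = []
    for tag in HASHTAG.findall(body):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def rewrite_wikilinks(body: str) -> str:
    """Turn [[Target|label]] into a bundle-relative Markdown link."""

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # The link is emitted even if the target concept does not exist,
        # since broken links are tolerated.
        return f"[{label}](/{slugify(target)}.md)"

    return WIKILINK.sub(replace, body)


def unique_slug(slug: str, taken: set[str]) -> str:
    """Append a numeric suffix when the slug is already in use."""
    if slug not in taken:
        return slug
    counter = 2
    while f"{slug}-{counter}" in taken:
        counter += 1
    return f"{slug}-{counter}"
```

Each helper is a pure function with no shared state, so running it twice on the same input gives the same output, which is the property that lets repeated ingestion runs stay predictable.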

Examples

See examples/. Google's canonical GA4 bundle is the best ground-truth reference; fetch it with:

./examples/fetch-reference-bundle.sh

License

MIT © 2026 Roy Huiskes.

OKF and the referenced sample bundles are by Google (GoogleCloudPlatform/knowledge-catalog, Apache-2.0) and are not redistributed here.
