Build an explicit semantic layer for knowledge-intensive documents that graph parsers can consume with precision — without modifying the original text.
Graph parsers such as GraphRAG and graphify struggle with raw documents: they guess at entity boundaries, hallucinate relationships, and produce noisy graphs full of `semantically_similar_to` edges. The root cause is simple: human-written documents are optimized for human reading, not machine parsing.
The Semantic Skeleton Method is a three-layer upstream data governance method:
| Layer | Name | What It Does |
|---|---|---|
| L1 | Semantic Skeleton | A few hundred lines of graph-native syntax: explicit `[[entities]]` and `--[:TYPED_EDGES]-->` |
| L2 | Original Document | Unmodified. The authoritative source for human reading, audit, and deep-reference fallback |
| L3 | Bridge Index | Pins every skeleton node and edge back to `original.md` §X.Y with paragraph fingerprints |
Result: downstream graph tools consume L1 directly, with no guessing. There are zero `AMBIGUOUS` edges, and the graph's node count matches the number of declared entities exactly.
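To show why explicit syntax removes the guesswork, here is a minimal sketch of reading an L1 skeleton. It assumes one edge per line in the form `[[Source]] --[:EDGE_TYPE]--> [[Target]]`; the exact grammar used by the templates may be stricter or looser, and `parse_skeleton` and `Edge` are hypothetical names for illustration only.

```python
import re
from typing import NamedTuple

# Assumed (hypothetical) line shape: [[Source]] --[:EDGE_TYPE]--> [[Target]]
EDGE_RE = re.compile(
    r"\[\[(?P<src>[^\]]+)\]\]\s*--\[:(?P<rel>[A-Z_]+)\]-->\s*\[\[(?P<dst>[^\]]+)\]\]"
)
ENTITY_RE = re.compile(r"\[\[([^\]]+)\]\]")


class Edge(NamedTuple):
    src: str
    rel: str
    dst: str


def parse_skeleton(text: str) -> tuple[set[str], list[Edge]]:
    """Collect every [[entity]] mention and every explicitly typed edge."""
    entities: set[str] = set()
    edges: list[Edge] = []
    for line in text.splitlines():
        # Every [[...]] on a line is an entity mention; no boundary guessing.
        entities.update(name.strip() for name in ENTITY_RE.findall(line))
        # Edges exist only where the author wrote a typed arrow.
        for m in EDGE_RE.finditer(line):
            edges.append(Edge(m["src"].strip(), m["rel"], m["dst"].strip()))
    return entities, edges


sample = """\
[[Policy]] --[:COVERS]--> [[Claim]]
[[Claim]] --[:REQUIRES]--> [[ProofOfLoss]]
[[Deductible]] is applied per claim.
"""
entities, edges = parse_skeleton(sample)
print(sorted(entities))  # ['Claim', 'Deductible', 'Policy', 'ProofOfLoss']
print(len(edges))        # 2
```

Each distinct `[[name]]` becomes exactly one node, which is why node count and entity count can line up one-to-one.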
✅ Technical specifications, system architecture docs, insurance policies, clinical guidelines, legal regulations — any document where you can draw the core concepts as a meaningful diagram.
❌ Pure narrative text (novels, news), short documents (< 2,000 words).
Whiteboard Litmus Test: Can you draw the document's core concepts on a whiteboard as a diagram that is more valuable than the original text? If yes, this method applies.
```bash
git clone https://github.com/phxjdocker2/semantic-skeleton.git
cd semantic-skeleton
./setup.sh
```

The install script symlinks the skill into `~/.claude/skills/semantic-skeleton/`. Once installed, use it in Claude Code:
```
/semantic-skeleton <directory>                 # Full pipeline on a document set
/semantic-skeleton <dir> --mode quality-only   # Check existing skeletons only
/semantic-skeleton <dir> --mode update         # Incremental update after original changes
```
```
semantic-skeleton/
├── SKILL.md                    # The full Claude Code skill definition (9-phase methodology)
├── setup.sh                    # One-command install script
├── README.md                   # You are here
├── LICENSE                     # MIT
│
└── templates/                  # Clean starter templates — copy into your project
    ├── vocabulary.md           # Entity / relation / namespace registry
    ├── kg-entities.md          # Entity skeleton
    ├── kg-rules.md             # Business rules & constraints skeleton
    ├── kg-flows.md             # Flow & workflow skeleton
    ├── kg-dataflow.md          # Data flow skeleton
    ├── kg-shared.md            # Global constraints & narrative blocks
    ├── kg-bridge.md            # Bridge index (L3)
    ├── catalog.md              # Document inventory
    ├── skeleton.schema.json    # JSON Schema for structure validation
    └── graphify-mapping.json   # Verb → graph edge type mapping
```
| Phase | Name | What Happens |
|---|---|---|
| 0 | Suitability Gating | KD-Score: is this document set worth skeletonizing? |
| 1 | Document Audit | Classify documents: dense / mixed / narrative |
| 2 | Vocabulary Baseline | Establish namespace and relation type registry |
| 3 | Domain Model Extraction | Scan all documents for candidate entities, merge across docs |
| 4 | Skeleton Generation | LLM-assisted extraction + human refinement — the core step |
| 5 | Bridge Construction | Pin every node and edge to original.md §X.Y |
| 6 | Quality Gate | Automated lint + LLM audit — all checks pass before downstream |
| 7 | Downstream Integration | Schema + mapping files for graphify / GraphRAG |
| 8 | Query-Time Retrieval | Three-layer fallback: skeleton graph → narrative jump → vector RAG |
| 9 | Incremental Update | Diff & Rebase when original documents change |
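Paragraph fingerprints are what let Phase 5 (Bridge Construction) and Phase 9 (Incremental Update) work together. The sketch below assumes a fingerprint is the first 12 hex characters of SHA-256 over whitespace-normalized paragraph text, and that `original.md` is already split into `{section: [paragraphs]}`; the real `kg-bridge.md` format may differ, and `fingerprint`, `BridgeEntry`, `build_bridge`, and `stale_entries` are hypothetical names.

```python
import hashlib
import re
from dataclasses import dataclass


def fingerprint(paragraph: str) -> str:
    """Hash a paragraph after collapsing whitespace, so re-wrapping is not a change."""
    normalized = re.sub(r"\s+", " ", paragraph).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class BridgeEntry:
    item: str       # skeleton node or edge, e.g. "[[Claim]]"
    section: str    # anchor in original.md, e.g. "§3.2"
    paragraph: int  # 1-based paragraph index within that section
    fp: str         # fingerprint recorded when the bridge was built


def build_bridge(pins: list[tuple[str, str, int]],
                 sections: dict[str, list[str]]) -> list[BridgeEntry]:
    """Phase 5 sketch: pin each item to a paragraph and record its fingerprint."""
    return [BridgeEntry(item, sec, idx, fingerprint(sections[sec][idx - 1]))
            for item, sec, idx in pins]


def stale_entries(bridge: list[BridgeEntry],
                  sections: dict[str, list[str]]) -> list[BridgeEntry]:
    """Phase 9 sketch: find pins whose source paragraph changed or disappeared."""
    stale = []
    for entry in bridge:
        paras = sections.get(entry.section, [])
        if (entry.paragraph > len(paras)
                or fingerprint(paras[entry.paragraph - 1]) != entry.fp):
            stale.append(entry)
    return stale


v1 = {"§3.2": ["A claim must be filed within 30 days.", "Proof of loss is required."]}
bridge = build_bridge([("[[Claim]]", "§3.2", 1), ("[[ProofOfLoss]]", "§3.2", 2)], v1)

# Paragraph 1 only gains extra whitespace; paragraph 2 changes substantively.
v2 = {"§3.2": ["A claim must be filed within  30 days.",
               "Proof of loss is required within 60 days."]}
print([e.item for e in stale_entries(bridge, v2)])  # ['[[ProofOfLoss]]']
```

Only the stale pins need re-extraction, which is what keeps an update incremental rather than a full rebuild.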
Skeleton files are designed to be consumed directly by:
- graphify: `graphify <dir> --skeleton --mode deep` (skips LLM extraction entirely)
- GraphRAG: feed skeleton files as input text; entity extraction runs on the explicit `[[links]]`
- Any graph database: use `graphify-mapping.json` to convert verb types to edge types
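For the graph-database route, the conversion amounts to a lookup from skeleton verb to edge type. This is a minimal sketch assuming `graphify-mapping.json` is a flat JSON object of `{"VERB": "edge_type"}` pairs; its actual schema is not shown here, and the mapping content and `to_graph_edges` helper are hypothetical.

```python
import json

# Hypothetical mapping content; the real graphify-mapping.json schema may differ.
MAPPING_JSON = '{"COVERS": "covers", "REQUIRES": "depends_on"}'


def to_graph_edges(triples, mapping):
    """Translate skeleton verbs to edge types; collect verbs the mapping lacks."""
    edges, unmapped = [], set()
    for src, verb, dst in triples:
        if verb in mapping:
            edges.append({"source": src, "type": mapping[verb], "target": dst})
        else:
            # Report unknown verbs instead of guessing an edge type for them.
            unmapped.add(verb)
    return edges, unmapped


mapping = json.loads(MAPPING_JSON)
triples = [
    ("Policy", "COVERS", "Claim"),
    ("Claim", "REQUIRES", "ProofOfLoss"),
    ("Claim", "TRIGGERS", "Review"),
]
edges, unmapped = to_graph_edges(triples, mapping)
print(edges[1])  # {'source': 'Claim', 'type': 'depends_on', 'target': 'ProofOfLoss'}
print(unmapped)  # {'TRIGGERS'}
```

Collecting unmapped verbs rather than dropping them silently is one reasonable design choice: a non-empty set signals that the vocabulary and the mapping file have drifted apart.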
MIT — see LICENSE for details.