Semantic Skeleton

Build an explicit semantic layer for knowledge-intensive documents that graph parsers can consume with precision — without modifying the original text.

The Problem

Graph parsers (GraphRAG, graphify) struggle with raw documents. They guess at entity boundaries, hallucinate relationships, and produce noisy graphs full of `semantically_similar_to` edges. The root cause is simple: human-written documents are optimized for human reading, not machine parsing.

The Solution

The Semantic Skeleton Method is a three-layer approach to upstream data governance:

| Layer | Name | What It Does |
|-------|------|--------------|
| L1 | Semantic Skeleton | A few hundred lines of graph-native syntax: explicit `[[entities]]` and `--[:TYPED_EDGES]-->` |
| L2 | Original Document | Unmodified. The authoritative source for human reading, audit, and deep-reference fallback |
| L3 | Bridge Index | Pins every skeleton node and edge back to `original.md §X.Y` with paragraph fingerprints |

Result: Downstream graph tools consume L1 directly. No guessing. Zero AMBIGUOUS edges. Graph node count matches entity count exactly.
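
To illustrate why explicit syntax removes the guessing, here is a minimal Python sketch that reads L1-style lines into entities and typed edges. The exact line grammar (`[[Source]] --[:TYPE]--> [[Target]]` on one line), the function name `parse_skeleton`, and the sample entities are assumptions made for illustration; the real skeleton format is defined in SKILL.md and the templates.

```python
import re

# Assumed line grammar (hypothetical): [[Source]] --[:EDGE_TYPE]--> [[Target]]
EDGE_RE = re.compile(
    r"\[\[(?P<src>[^\[\]]+)\]\]\s*--\[:(?P<rel>[A-Z_]+)\]-->\s*\[\[(?P<dst>[^\[\]]+)\]\]"
)
ENTITY_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse_skeleton(text):
    """Return (entities, edges) found in skeleton text."""
    entities = set()
    edges = []
    for line in text.splitlines():
        # Every [[...]] mention counts as an entity, whether or not it is in an edge.
        entities.update(name.strip() for name in ENTITY_RE.findall(line))
        for m in EDGE_RE.finditer(line):
            edges.append((m["src"].strip(), m["rel"], m["dst"].strip()))
    return entities, edges


# Sample content is invented for this example.
sample = """\
[[Policy]] --[:COVERS]--> [[Claim]]
[[Claim]] --[:REQUIRES]--> [[Adjuster Review]]
"""
entities, edges = parse_skeleton(sample)
```

Because entities and edge types are spelled out in the text, a parser of this kind needs only pattern matching, not inference, so the node set it produces is exactly the set of `[[...]]` names.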

When to Use

✅ Technical specifications, system architecture docs, insurance policies, clinical guidelines, legal regulations — any document where you can draw the core concepts as a meaningful diagram.

❌ Pure narrative text (novels, news) or short documents (< 2,000 words).

Whiteboard Litmus Test: Can you draw the document's core concepts on a whiteboard as a diagram that is more valuable than the original text? If yes, this method applies.

Quick Install

git clone https://github.com/phxjdocker2/semantic-skeleton.git
cd semantic-skeleton
./setup.sh

The install script symlinks the skill into ~/.claude/skills/semantic-skeleton/. Once installed, use it in Claude Code:

/semantic-skeleton <directory>              # Full pipeline on a document set
/semantic-skeleton <dir> --mode quality-only # Check existing skeletons only
/semantic-skeleton <dir> --mode update       # Incremental update after original changes

What's Included

semantic-skeleton/
├── SKILL.md               # The full Claude Code skill definition (9-phase methodology)
├── setup.sh               # One-command install script
├── README.md              # You are here
├── LICENSE                # MIT
│
└── templates/             # Clean starter templates — copy into your project
    ├── vocabulary.md      # Entity / relation / namespace registry
    ├── kg-entities.md     # Entity skeleton
    ├── kg-rules.md        # Business rules & constraints skeleton
    ├── kg-flows.md        # Flow & workflow skeleton
    ├── kg-dataflow.md     # Data flow skeleton
    ├── kg-shared.md       # Global constraints & narrative blocks
    ├── kg-bridge.md       # Bridge index (L3)
    ├── catalog.md         # Document inventory
    ├── skeleton.schema.json      # JSON Schema for structure validation
    └── graphify-mapping.json     # Verb → graph edge type mapping

The 9 Phases

| Phase | Name | What Happens |
|-------|------|--------------|
| 0 | Suitability Gating | KD-Score: is this document set worth skeletonizing? |
| 1 | Document Audit | Classify documents: dense / mixed / narrative |
| 2 | Vocabulary Baseline | Establish namespace and relation type registry |
| 3 | Domain Model Extraction | Scan all documents for candidate entities, merge across docs |
| 4 | Skeleton Generation | LLM-assisted extraction plus human refinement (the core step) |
| 5 | Bridge Construction | Pin every node and edge to `original.md §X.Y` |
| 6 | Quality Gate | Automated lint plus LLM audit; all checks must pass before downstream use |
| 7 | Downstream Integration | Schema + mapping files for graphify / GraphRAG |
| 8 | Query-Time Retrieval | Three-layer fallback: skeleton graph → narrative jump → vector RAG |
| 9 | Incremental Update | Diff & Rebase when original documents change |
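
Bridge construction (Phase 5) and incremental update (Phase 9) both rest on paragraph fingerprints. The sketch below shows one way such a check could work. It is not the skill's actual implementation: the normalization rule, the truncated SHA-256 hash, the bridge dictionary layout, and the names `fingerprint` and `stale_entries` are all assumptions, and the section contents are invented.

```python
import hashlib


def fingerprint(paragraph):
    """Hash a paragraph after collapsing whitespace and lowercasing (assumed rule)."""
    normalized = " ".join(paragraph.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def stale_entries(bridge, sections):
    """Return node names whose pinned paragraph changed or disappeared.

    bridge:   {node_name: (section_id, pinned_fingerprint)}  (hypothetical layout)
    sections: {section_id: current paragraph text}
    """
    stale = []
    for node, (section_id, pinned) in bridge.items():
        current = sections.get(section_id)
        if current is None or fingerprint(current) != pinned:
            stale.append(node)
    return sorted(stale)


# Invented example content for two sections of an original document.
original = {
    "§2.1": "A claim must be reviewed within 30 days.",
    "§2.2": "Policies cover accidental damage.",
}
bridge = {
    "Claim": ("§2.1", fingerprint(original["§2.1"])),
    "Policy": ("§2.2", fingerprint(original["§2.2"])),
}

# Simulate an edit to the original document.
revised = dict(original)
revised["§2.1"] = "A claim must be reviewed within 14 days."
```

Calling `stale_entries(bridge, revised)` flags only the nodes pinned to the edited section, which is the set an incremental update would need to revisit.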

Downstream Tools

Skeleton files are designed to be consumed directly by:

- graphify: run `graphify <dir> --skeleton --mode deep` to skip LLM extraction entirely
- GraphRAG: feed skeleton files as input text; entity extraction runs on the explicit `[[links]]`
- Any graph database: use `graphify-mapping.json` to convert verb types to edge types
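
As a rough illustration of the verb-to-edge-type conversion, the following Python sketch applies a mapping to a list of triples. The JSON shape shown here (a flat verb-to-type object), the verbs, the edge type names, and the helper `to_edge_type` are hypothetical; check the shipped `graphify-mapping.json` for the real structure.

```python
import json

# Hypothetical mapping content; the real graphify-mapping.json may be shaped differently.
MAPPING_JSON = """
{
  "covers": "COVERS",
  "requires": "DEPENDS_ON",
  "triggers": "TRIGGERS"
}
"""


def to_edge_type(verb, mapping):
    """Look up the edge type for a verb; fail loudly instead of guessing."""
    key = verb.strip().lower()
    if key not in mapping:
        raise KeyError(f"no edge type registered for verb {verb!r}")
    return mapping[key]


mapping = json.loads(MAPPING_JSON)

# Invented triples of (source, verb, target).
triples = [("Policy", "covers", "Claim"), ("Claim", "Requires", "Adjuster Review")]
edges = [(src, to_edge_type(verb, mapping), dst) for src, verb, dst in triples]
```

Raising on an unmapped verb, rather than falling back to a generic edge type, keeps the conversion in line with the no-guessing goal stated earlier.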

License

MIT — see LICENSE for details.
