
phxjdocker2/semantic-skeleton


Semantic Skeleton

Build an explicit semantic layer for knowledge-intensive documents that graph parsers can consume with precision — without modifying the original text.

License: MIT

The Problem

Graph parsers (GraphRAG, graphify) struggle with raw documents. They guess at entity boundaries, hallucinate relationships, and produce noisy graphs full of semantically_similar_to edges. The root cause is simple: human-written documents optimize for human reading, not machine parsing.

The Solution

The Semantic Skeleton Method is a three-layer upstream data governance method:

| Layer | Name | What It Does |
|-------|------|--------------|
| L1 | Semantic Skeleton | A few hundred lines of graph-native syntax: explicit [[entities]] and --[:TYPED_EDGES]--> |
| L2 | Original Document | Unmodified. The authoritative source for human reading, audit, and deep-reference fallback |
| L3 | Bridge Index | Pins every skeleton node and edge back to original.md §X.Y with paragraph fingerprints |
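
As a rough illustration of the Bridge Index idea, a paragraph fingerprint can be as simple as a truncated hash of the normalized paragraph text. The sketch below is an assumption about how such a fingerprint might work, not the format used by kg-bridge.md; the function names, the 12-character length, and the entry fields are all hypothetical.

```python
import hashlib
import re


def paragraph_fingerprint(paragraph: str, length: int = 12) -> str:
    """Hash a paragraph after collapsing whitespace and lowercasing.

    Normalizing first means re-wrapping or re-indenting the original
    does not change the fingerprint; only wording changes do.
    """
    normalized = re.sub(r"\s+", " ", paragraph).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


def bridge_entry(node: str, section: str, paragraph: str) -> dict:
    """Build one hypothetical bridge record for a skeleton node."""
    return {
        "node": node,
        "source": f"original.md §{section}",
        "fingerprint": paragraph_fingerprint(paragraph),
    }


def is_stale(entry: dict, current_paragraph: str) -> bool:
    """True when the pinned paragraph no longer matches its fingerprint."""
    return entry["fingerprint"] != paragraph_fingerprint(current_paragraph)


entry = bridge_entry("Policy", "2.1", "A policy covers\n  one or more claims.")
```

A stale entry is a signal that the skeleton node pinned to that paragraph needs review, which is the kind of check an incremental update would rely on.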

Result: Downstream graph tools consume L1 directly. No guessing. Zero AMBIGUOUS edges. Graph node count matches entity count exactly.
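
To make "consume L1 directly" concrete, here is a minimal sketch of reading skeleton lines into entities and typed edges with plain regular expressions. It assumes one edge per line in the form [[Source]] --[:TYPE]--> [[Target]]; the README only shows the entity and edge markers, so the full line layout and the sample content are assumptions.

```python
import re

# Assumed line layout: [[Source]] --[:EDGE_TYPE]--> [[Target]]
EDGE_RE = re.compile(
    r"\[\[(?P<src>[^\[\]]+)\]\]\s*--\[:(?P<rel>[A-Z_]+)\]-->\s*\[\[(?P<dst>[^\[\]]+)\]\]"
)
ENTITY_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse_skeleton(text: str):
    """Return (entities, edges) found in skeleton text.

    entities is a set of names; edges is a list of (source, type, target).
    Only one edge per line is recognized in this sketch.
    """
    entities = set()
    edges = []
    for line in text.splitlines():
        entities.update(name.strip() for name in ENTITY_RE.findall(line))
        for match in EDGE_RE.finditer(line):
            edges.append((match["src"].strip(), match["rel"], match["dst"].strip()))
    return entities, edges


# Hypothetical sample content, not taken from the templates.
SAMPLE = """\
[[Policy]] --[:COVERS]--> [[Claim]]
[[Claim]] --[:REVIEWED_BY]--> [[Adjuster]]
[[Deductible]]
"""

entities, edges = parse_skeleton(SAMPLE)
```

Because every entity and edge is explicit, the node set falls out of the text without any inference step, which is what lets a graph's node count be checked against the entity count.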

When to Use

✅ Technical specifications, system architecture docs, insurance policies, clinical guidelines, legal regulations — any document where you can draw the core concepts as a meaningful diagram.

❌ Pure narrative text (novels, news), short documents (< 2,000 words).

Whiteboard Litmus Test: Can you draw the document's core concepts on a whiteboard as a diagram that is more valuable than the original text? If yes, this method applies.

Quick Install

git clone https://github.com/phxjdocker2/semantic-skeleton.git
cd semantic-skeleton
./setup.sh

The install script symlinks the skill into ~/.claude/skills/semantic-skeleton/. Once installed, use it in Claude Code:

/semantic-skeleton <directory>              # Full pipeline on a document set
/semantic-skeleton <dir> --mode quality-only # Check existing skeletons only
/semantic-skeleton <dir> --mode update       # Incremental update after original changes

What's Included

semantic-skeleton/
├── SKILL.md               # The full Claude Code skill definition (9-phase methodology)
├── setup.sh               # One-command install script
├── README.md              # You are here
├── LICENSE                # MIT
│
└── templates/             # Clean starter templates — copy into your project
    ├── vocabulary.md      # Entity / relation / namespace registry
    ├── kg-entities.md     # Entity skeleton
    ├── kg-rules.md        # Business rules & constraints skeleton
    ├── kg-flows.md        # Flow & workflow skeleton
    ├── kg-dataflow.md     # Data flow skeleton
    ├── kg-shared.md       # Global constraints & narrative blocks
    ├── kg-bridge.md       # Bridge index (L3)
    ├── catalog.md         # Document inventory
    ├── skeleton.schema.json      # JSON Schema for structure validation
    └── graphify-mapping.json     # Verb → graph edge type mapping

The 9 Phases (plus Phase 0 Gating)

| Phase | Name | What Happens |
|-------|------|--------------|
| 0 | Suitability Gating | KD-Score: is this document set worth skeletonizing? |
| 1 | Document Audit | Classify documents: dense / mixed / narrative |
| 2 | Vocabulary Baseline | Establish namespace and relation type registry |
| 3 | Domain Model Extraction | Scan all documents for candidate entities, merge across docs |
| 4 | Skeleton Generation | LLM-assisted extraction + human refinement (the core step) |
| 5 | Bridge Construction | Pin every node and edge to original.md §X.Y |
| 6 | Quality Gate | Automated lint + LLM audit; all checks pass before downstream |
| 7 | Downstream Integration | Schema + mapping files for graphify / GraphRAG |
| 8 | Query-Time Retrieval | Three-layer fallback: skeleton graph → narrative jump → vector RAG |
| 9 | Incremental Update | Diff & Rebase when original documents change |
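
The three-layer fallback in Phase 8 can be pictured as trying retrievers in order and stopping at the first one that answers. The sketch below is one possible reading of that idea, not the skill's implementation; the retriever functions are hypothetical stubs backed by tiny in-memory dictionaries.

```python
from typing import Callable, Optional, Sequence, Tuple

Retriever = Callable[[str], Optional[str]]


def answer_with_fallback(
    query: str, layers: Sequence[Tuple[str, Retriever]]
) -> Tuple[str, Optional[str]]:
    """Try each layer in order; return (layer name, result) for the first hit."""
    for name, retrieve in layers:
        result = retrieve(query)
        if result:
            return name, result
    return "none", None


# Hypothetical stand-ins for the three layers.
def skeleton_graph(query: str) -> Optional[str]:
    graph = {"Policy": "[[Policy]] --[:COVERS]--> [[Claim]]"}
    return graph.get(query)


def narrative_jump(query: str) -> Optional[str]:
    bridge = {"Claim": "original.md §2.1"}  # pointer into the original text
    return bridge.get(query)


def vector_rag(query: str) -> Optional[str]:
    return f"nearest chunks for {query!r}"  # last resort, always answers


LAYERS = [
    ("skeleton graph", skeleton_graph),
    ("narrative jump", narrative_jump),
    ("vector RAG", vector_rag),
]
```

Returning the layer name alongside the result makes it easy to log how often queries fall through to the less precise layers.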

Downstream Tools

Skeleton files are designed to be consumed directly by:

  • graphify: graphify <dir> --skeleton --mode deep (skip LLM extraction entirely)
  • GraphRAG: Feed skeleton files as input text; entity extraction runs on explicit [[links]]
  • Any graph database: Use graphify-mapping.json to convert verb types to edge types
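
For the graph database case, a verb-to-edge-type conversion might look like the sketch below. The real schema of graphify-mapping.json is not shown here, so this assumes a flat JSON object of verb to edge type; the sample entries and function names are hypothetical.

```python
import json

# Assumed shape only: a flat {"verb": "EDGE_TYPE"} object.
MAPPING_JSON = '{"covers": "COVERS", "reviewed by": "REVIEWED_BY"}'


def load_mapping(raw: str) -> dict:
    """Parse the mapping and normalize verbs for case-insensitive lookup."""
    return {verb.strip().lower(): edge for verb, edge in json.loads(raw).items()}


def to_edge_type(verb: str, mapping: dict) -> str:
    """Convert a verb to its edge type, failing loudly on unknown verbs."""
    key = verb.strip().lower()
    if key not in mapping:
        raise KeyError(f"no edge type registered for verb {verb!r}")
    return mapping[key]


mapping = load_mapping(MAPPING_JSON)
edge_type = to_edge_type(" Covers ", mapping)
```

Raising on an unregistered verb, rather than inventing an edge type, keeps the conversion in line with the no-guessing goal.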

License

MIT — see LICENSE for details.

About

Medical, legal, finance docs, LLM Wiki — semantic skeleton first, before GraphRAG or graphify. 30%+ more nodes & communities.
