graphify is a Claude Code skill backed by a Python library. The skill orchestrates the library; the library can be used standalone.
`detect()` → `extract()` → `build_graph()` → `cluster()` → `analyze()` → `report()` → `export()`
Each stage is a single function in its own module. They communicate through plain Python dicts and NetworkX graphs - no shared state, no side effects outside `graphify-out/`.
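The stage chaining can be sketched in a few lines. Everything below is a stub, purely to show how the pieces compose through dicts and an `nx.Graph`; the real functions live in the modules listed in the table and take richer arguments.

```python
# Toy sketch of the stage chain; every body here is a placeholder.
import networkx as nx

def collect_files(root):            # detect.py: directory → filtered file list
    return ["app.py", "db.py"]

def extract(path):                  # extract.py: file path → {nodes, edges} dict
    return {"nodes": [{"id": path, "label": path}], "edges": []}

def build_graph(extractions):       # build.py: extraction dicts → nx.Graph
    G = nx.Graph()
    for ex in extractions:
        for n in ex["nodes"]:
            G.add_node(n["id"], **n)
        for e in ex["edges"]:
            G.add_edge(e["source"], e["target"], **e)
    return G

G = build_graph([extract(p) for p in collect_files(".")])
```

The later stages (`cluster`, `analyze`, `report`, `export`) follow the same shape: take the graph (plus plain dicts), return a graph or dict.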
| Module | Function | Input → Output |
|---|---|---|
| `detect.py` | `collect_files(root)` | directory → `[Path]` filtered list |
| `extract.py` | `extract(path)` | file path → `{nodes, edges}` dict |
| `build.py` | `build_graph(extractions)` | list of extraction dicts → `nx.Graph` |
| `cluster.py` | `cluster(G)` | graph → graph with `community` attr on each node |
| `analyze.py` | `analyze(G)` | graph → analysis dict (god nodes, surprises, questions) |
| `report.py` | `render_report(G, analysis)` | graph + analysis → GRAPH_REPORT.md string |
| `export.py` | `export(G, out_dir, ...)` | graph → Obsidian vault, graph.json, graph.html, graph.svg |
| `ingest.py` | `ingest(url, ...)` | URL → file saved to corpus dir |
| `cache.py` | `check_semantic_cache` / `save_semantic_cache` | files → (cached, uncached) split |
| `security.py` | validation helpers | URL / path / label → validated or raises |
| `validate.py` | `validate_extraction(data)` | extraction dict → raises on schema errors |
| `serve.py` | `start_server(graph_path)` | graph file path → MCP stdio server |
| `watch.py` | `watch(root, flag_path)` | directory → writes flag file on change |
| `benchmark.py` | `run_benchmark(graph_path)` | graph file → corpus vs. subgraph token comparison |
Every extractor returns:

```json
{
  "nodes": [
    {"id": "unique_string", "label": "human name", "source_file": "path", "source_location": "L42"}
  ],
  "edges": [
    {"source": "id_a", "target": "id_b", "relation": "calls|imports|uses|...", "confidence": "EXTRACTED|INFERRED|AMBIGUOUS"}
  ]
}
```

`validate.py` enforces this schema before `build_graph()` consumes it.
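Checking that shape needs only a few dict operations. A minimal sketch, assuming only the fields shown above - the real `validate_extraction()` in `validate.py` may enforce more:

```python
# Sketch of extraction-schema validation; raises ValueError on the first violation.
REQUIRED_NODE_KEYS = {"id", "label", "source_file", "source_location"}
REQUIRED_EDGE_KEYS = {"source", "target", "relation", "confidence"}
CONFIDENCE_LABELS = {"EXTRACTED", "INFERRED", "AMBIGUOUS"}

def validate_extraction(data: dict) -> None:
    for node in data.get("nodes", []):
        missing = REQUIRED_NODE_KEYS - node.keys()
        if missing:
            raise ValueError(f"node {node.get('id')!r} missing keys {missing}")
    for edge in data.get("edges", []):
        missing = REQUIRED_EDGE_KEYS - edge.keys()
        if missing:
            raise ValueError(f"edge missing keys {missing}")
        if edge["confidence"] not in CONFIDENCE_LABELS:
            raise ValueError(f"bad confidence label {edge['confidence']!r}")
```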
| Label | Meaning |
|---|---|
| `EXTRACTED` | Relationship is explicitly stated in the source (e.g., an import statement, a direct call) |
| `INFERRED` | Relationship is a reasonable deduction (e.g., call-graph second pass, co-occurrence in context) |
| `AMBIGUOUS` | Relationship is uncertain; flagged for human review in GRAPH_REPORT.md |
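Since `confidence` is stored as an edge attribute, pulling the edges that need human review is a one-line filter. A hypothetical helper (not part of the real API) in the style the report stage would use:

```python
# Collect AMBIGUOUS edges for a human-review section.
import networkx as nx

def ambiguous_edges(G: nx.Graph) -> list[tuple[str, str, str]]:
    return [
        (u, v, d.get("relation", "?"))
        for u, v, d in G.edges(data=True)
        if d.get("confidence") == "AMBIGUOUS"
    ]
```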
To add support for a new language:

- Add an `extract_<lang>(path: Path) -> dict` function in `extract.py` following the existing pattern (tree-sitter parse → walk nodes → collect `nodes` and `edges` → call-graph second pass for INFERRED `calls` edges).
- Register the file suffix in the `extract()` dispatch and in `collect_files()`.
- Add the suffix to `CODE_EXTENSIONS` in `detect.py` and `_WATCHED_EXTENSIONS` in `watch.py`.
- Add the tree-sitter package to `pyproject.toml` dependencies.
- Add a fixture file to `tests/fixtures/` and tests to `tests/test_languages.py`.
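The contract for a new extractor is just the return shape. A shape-only sketch: the real extractors use tree-sitter, but this one matches Python-style `import` lines with a crude regex purely so the example is self-contained, and the function name is illustrative, not the real API.

```python
# Shape-only sketch of an extract_<lang>() extractor (regex stands in for tree-sitter).
import re
from pathlib import Path

def extract_demo(path: Path) -> dict:
    module_id = path.stem
    nodes = [{"id": module_id, "label": module_id,
              "source_file": str(path), "source_location": "L1"}]
    edges = []
    for i, line in enumerate(path.read_text().splitlines(), start=1):
        m = re.match(r"\s*import\s+(\w+)", line)
        if m:
            target = m.group(1)
            nodes.append({"id": target, "label": target,
                          "source_file": str(path), "source_location": f"L{i}"})
            edges.append({"source": module_id, "target": target,
                          "relation": "imports", "confidence": "EXTRACTED"})
    return {"nodes": nodes, "edges": edges}
```

A real extractor would also do the second pass over collected definitions to emit INFERRED `calls` edges.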
All external input passes through `graphify/security.py` before use:

- URLs → `validate_url()` (http/https only) + `_NoFileRedirectHandler` (blocks `file://` redirects)
- Fetched content → `safe_fetch()` / `safe_fetch_text()` (size cap, timeout)
- Graph file paths → `validate_graph_path()` (must resolve inside `graphify-out/`)
- Node labels → `sanitize_label()` (strips control chars, caps at 256 chars, HTML-escapes)
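The label sanitization steps above can be sketched with the standard library; this follows the order described (strip control chars, cap length, HTML-escape), but the real `sanitize_label()` may differ in details.

```python
# Sketch of label sanitization: control chars out, length capped, HTML-escaped.
import html
import re

_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

def sanitize_label_sketch(label: str, max_len: int = 256) -> str:
    label = _CONTROL_CHARS.sub("", label)   # strip control characters
    label = label[:max_len]                 # cap length
    return html.escape(label)               # escape HTML metacharacters
```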
See SECURITY.md for the full threat model.
One test file per module under `tests/`. Run with:

```shell
pytest tests/ -q
```

All tests are pure unit tests - no network calls, no file system side effects outside `tmp_path`.