Emit wiki/ on code-only rebuild so report wikilinks resolve #382

Open
prkash1704 wants to merge 144 commits into safishamsi:main from prkash1704:hydrate-wiki-on-rebuild
Conversation

@prkash1704

Summary

  • watch._rebuild_code and the cluster-only CLI path regenerate GRAPH_REPORT.md + graph.json but never call to_wiki().
  • Consequence: the [[_COMMUNITY_<name>]] and [[<god-node>]] wikilinks the report emits dead-end until the user runs a separate wiki export. The AGENTS/CLAUDE.md guidance to "navigate wiki/index.md instead of raw files" resolves to nothing on incremental rebuilds — agents fall back to reading raw source, exactly what the graph is meant to avoid.
  • This PR wires to_wiki() into both rebuild paths so graphify-out/wiki/ stays in sync with the report on every tick.

to_wiki() already produces rich articles (key concepts with degree, cross-community relationships, source files, audit trail, god-node neighbors grouped by relation) — no changes needed there, just call it.
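
As a rough illustration of the change described above, the sketch below shows a rebuild step that writes the wiki alongside the report. Everything in it is a stand-in: `rebuild_outputs`, `write_report`, and `write_wiki` are hypothetical names, and the real `to_wiki()` signature is not shown in this description.

```python
from pathlib import Path
from typing import Callable

# Hypothetical writer type: takes a destination path and writes output there.
Writer = Callable[[Path], None]


def rebuild_outputs(out_dir: Path, write_report: Writer, write_wiki: Writer) -> None:
    """Regenerate the report and the wiki together so their links stay in sync."""
    out_dir.mkdir(parents=True, exist_ok=True)
    write_report(out_dir / "GRAPH_REPORT.md")
    # The step this PR adds: previously only the report (and graph.json) were refreshed.
    write_wiki(out_dir / "wiki")
```

Keeping both writes in one function is one way to ensure the two outputs cannot drift apart between rebuilds; the actual wiring in `watch._rebuild_code` may look different.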

Test plan

  • Ran _rebuild_code(Path('.')) against a real project (141 communities, 1791 nodes, 3321 edges) — graphify-out/wiki/ populated with community + god-node articles + index.md, all non-empty
  • Verified GRAPH_REPORT.md wikilinks now resolve to populated files
  • cluster-only path: not exercised end-to-end, but it mirrors the same call shape

🤖 Generated with Claude Code

- Add GitHub Actions CI workflow (Python 3.10 and 3.12)
- Add CI badge to README
- Add ARCHITECTURE.md: pipeline overview, module table, schema, how to
  add a language extractor, security summary
- Move eval reports from tests/ to worked/httpx/ and worked/mixed-corpus/
- Fix README: test count 163→212, language table (13 languages via
  tree-sitter), extract.py description, worked examples links

benchmark: 8.8x token reduction on nanoGPT + minGPT + micrograd

- Run AST extraction on 29 Python files across 3 Karpathy repos
- 177 nodes, 246 edges, 17 communities (Leiden)
- 8.8x avg token reduction vs naive full-corpus context stuffing
- Notable: micrograd cleanly splits into engine/nn communities;
  nanoGPT model vs training loop correctly separated
- Honest: stdlib import noise flagged, config isolates documented

benchmark: 71.5x token reduction on mixed corpus (code+papers+images)

Full run: nanoGPT+minGPT+micrograd + 5 research papers + 4 images
285 nodes, 340 edges, 53 communities
Average BFS query: 1,726 tokens vs 123,488 naive (71.5x)
Code-only (AST) sub-benchmark: 8.8x on 13k-word corpus
style: replace all em dashes with hyphens

fix: explain hidden .graphify/ folder in skill output and README

fix: rename .graphify/ to graphify-out/ so output is visible by default
- Replace pyvis with custom vis.js renderer: node size by degree,
  click-to-inspect panel with clickable neighbors, search box,
  community filter, physics clustering by community
- HTML graph generated by default on every run (no --html flag needed)
- Token reduction benchmark auto-runs after every /graphify on corpora >5k words
- Fix 292 edge warnings: silently skip stdlib/external edges in build.py
- Fix build() to merge extractions before building (cross-extraction edges were dropped)
- Add 5 HTML renderer tests (223 total)
- Remove unnecessary files: lib/, tests/eval_attention.py, misplaced eval reports
- Add graphify-out/ and .graphify_*.json to .gitignore
- Bump version to 0.1.4, remove pyvis dependency
- README: token reduction as top-level selling point, vis.js in tech stack,
  graph.html in output listing, correct test count and install command
Covers detect → extract → build → cluster → analyze → report → export
using existing fixtures. AST-only (no LLM calls), catches regressions
in how modules connect, not just individual module behaviour.
- Semantic extraction chunks: 12-15 → 20-25 files (fewer subagent round trips)
- Code-only corpora skip semantic dispatch entirely (AST covers it)
- Print estimated time before extraction so the wait feels intentional
…hecks, no-viz clarity

- Add --graphml to Usage table (was implemented but undocumented there)
- Remove early manifest save from --update merge step (Step 9 owns it; saving early meant failed pipelines left manifest ahead of graph)
- query/path/explain now check graph.json exists before running, with clear "run /graphify first" message
- --no-viz: clarify it skips both Obsidian vault and HTML (was contradictory)
…laude Code hooks

- confidence_score required on every edge (INFERRED: 0.4-0.9, EXTRACTED: 1.0, AMBIGUOUS: 0.1-0.3)
- semantically_similar_to edges for non-obvious cross-file conceptual links
- hyperedges for 3+ node group relationships - fixed cache and merge pipeline that was silently dropping them
- check_semantic_cache returns 4-tuple including cached_hyperedges
- extract.py: mine the "why" - module/class/function docstrings and rationale comments (# NOTE: # IMPORTANT: # HACK: # WHY: # RATIONALE: # TODO: # FIXME:) as rationale_for nodes
- skill.md: rationale_for in relation schema, doc files extract design rationale
- obsidian output opt-in (--obsidian flag) - default output is graph.html + graph.json + GRAPH_REPORT.md only
- hooks.py: post-checkout hook added alongside post-commit - graph rebuilds on branch switch
- claude install: writes .claude/settings.json PreToolUse hook on Glob/Grep - Claude checks graph before searching raw files
- README updated with all v2 features
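
The confidence ranges listed in this commit can be checked with a small validator. This is a minimal sketch, not the project's code: `confidence_score` comes from the commit message, while the `confidence` label key and the function name are assumptions made for illustration.

```python
# Ranges taken from the commit message; the "confidence" label key is an assumption.
CONFIDENCE_RANGES = {
    "EXTRACTED": (1.0, 1.0),
    "INFERRED": (0.4, 0.9),
    "AMBIGUOUS": (0.1, 0.3),
}


def has_valid_confidence(edge: dict) -> bool:
    """Return True if the edge carries a score inside the range for its label."""
    bounds = CONFIDENCE_RANGES.get(edge.get("confidence"))
    score = edge.get("confidence_score")
    if bounds is None or not isinstance(score, (int, float)):
        return False  # unknown label, or the required score is missing
    low, high = bounds
    return low <= score <= high
```

For example, `has_valid_confidence({"confidence": "INFERRED", "confidence_score": 0.7})` passes, while an `EXTRACTED` edge with a score of 0.9 does not.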
safishamsi and others added 29 commits April 10, 2026 16:07
…ort.py, bound collision loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…aceholder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…afishamsi#195: skill.md requires general-purpose subagent type for extraction dispatch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… and bump to 0.4.2

- extract.py: use str(path) for node IDs to prevent same-basename collision (safishamsi#211)
- build.py: normalize from/to edge keys before KeyError (safishamsi#216)
- export.py: guard ZeroDivisionError when graph has no edges (safishamsi#217)
- hooks.py: remove stale CODE_EXTS filter, rebuild on any changed file (safishamsi#222)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hamsi#221 into 0.4.2

- build/validate: accept NetworkX <=3.1 "links" key alongside "edges" (safishamsi#212)
- __main__: skip version check during install/uninstall, deduplicate paths (safishamsi#220)
- all file IO: explicit encoding="utf-8" to prevent crashes on Windows CJK locales (safishamsi#204)
- hooks: add newline="\n" on write to prevent CRLF shebang breakage on Windows (safishamsi#204)
- export: strip trailing .md from safe_name so "CLAUDE.md" doesn't become "CLAUDE.md.md" (safishamsi#221)
- report: add Community Hubs navigation block so Obsidian vault stays connected (safishamsi#221)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
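
The `.md` stripping fix from safishamsi#221 can be pictured as follows. This is only a sketch: `article_filename` is a hypothetical helper, and the real `safe_name` logic in export.py presumably does more sanitizing than shown here.

```python
def article_filename(label: str) -> str:
    """Build a Markdown filename without doubling an existing .md suffix."""
    # Drop one trailing ".md" (case-insensitive) before adding the extension back.
    stem = label[:-3] if label.lower().endswith(".md") else label
    return f"{stem}.md"
```

With this shape, a node labelled `CLAUDE.md` maps to `CLAUDE.md` rather than `CLAUDE.md.md`, and a label without the suffix simply gains it.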
…, safishamsi#254 and bump to 0.4.3

- extract.py: resolve relative JS/TS imports to full-path IDs (fixes 0 import edges on TS codebases) (safishamsi#256)
- extract.py: resolve relative Python imports to full-path IDs (safishamsi#256)
- watch.py: merge fresh AST with existing semantic nodes instead of overwriting (safishamsi#253)
- hooks.py: add python fallback after python3 for Windows; exit 0 if neither found (safishamsi#244)
- analyze.py: guard stale _src/_tgt hints with node membership check (safishamsi#226)
- detect.py + extract.py: add .vue and .svelte to CODE_EXTENSIONS and _DISPATCH (safishamsi#254)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- watch.py: preserve INFERRED/AMBIGUOUS edges (code<->doc) across rebuilds (safishamsi#261)
- __main__.py: fix Codex hook - use additionalContext instead of permissionDecision:allow (safishamsi#249)
- detect.py: skip common lockfiles (package-lock.json, yarn.lock, Cargo.lock etc.) (safishamsi#266)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Some MCP clients send blank lines between JSON messages. The stdio
transport tried to parse every line as JSONRPCMessage, crashing with
a Pydantic ValidationError. _filter_blank_stdin() installs an OS-level
pipe that relays stdin while silently dropping blank-only lines.

Closes safishamsi#201

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
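
One way to relay a stream through an OS-level pipe while dropping blank-only lines is sketched below. It illustrates the idea behind `_filter_blank_stdin()` but is not its actual implementation; the function name and the choice of a daemon thread are assumptions.

```python
import os
import threading


def relay_without_blank_lines(src_fd: int) -> int:
    """Return a read fd that yields src_fd's lines minus blank-only ones."""
    read_fd, write_fd = os.pipe()

    def pump() -> None:
        # Both descriptors are closed when the source reaches EOF.
        with os.fdopen(src_fd, "rb") as src, os.fdopen(write_fd, "wb") as dst:
            for line in src:
                if line.strip():  # skip empty and whitespace-only lines
                    dst.write(line)
                    dst.flush()

    threading.Thread(target=pump, daemon=True).start()
    return read_fd
```

A consumer that parses every line as a JSON message can then read from the returned descriptor instead of the original one and never sees the blank lines.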
…s, fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… path bug, .graphifyignore subfolder patterns; v0.4.10: Dart, Hermes, 6 CLI commands, PHP improvements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…NTS.md python3 fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e plugin, cache root, PHP missing edges, Windows stability, cross-file calls

- safishamsi#352: add skill-kiro.md to pyproject.toml package-data
- safishamsi#341: guard edge_betweenness at >5000 nodes; use approximate k=100 for suggest_questions on large graphs
- safishamsi#354/safishamsi#229: add Step 6b in skill.md to call to_wiki() when --wiki given (before Step 9 cleanup)
- safishamsi#356: call _install_opencode_plugin() from install --platform opencode path
- safishamsi#350: add cache_root param to extract() so subdirectory runs keep cache at ./graphify-out/cache/
- safishamsi#230: PHP class_constant_access_expression emits references_constant edges
- safishamsi#232: PHP scoped_call_expression (static method calls) emits calls edges
- safishamsi#287: os.replace fallback for Windows WinError 5; graphify update exits 1 on failure; templates use graphify update . instead of python3 -c
- safishamsi#348: cross-file call resolution for all languages via raw_calls + global label map pass in extract()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
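
The commit message for safishamsi#287 does not say what the `os.replace` fallback does. One plausible shape, shown purely as an assumption, is to fall back to copy-then-delete when the replace is refused with a permission error (which is how WinError 5 surfaces in Python). `replace_with_fallback` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path


def replace_with_fallback(tmp: Path, dest: Path) -> None:
    """Move tmp over dest, copying instead if the atomic replace is denied."""
    try:
        os.replace(tmp, dest)
    except PermissionError:
        # Assumed fallback: not atomic, but succeeds when the rename is
        # refused, as long as dest itself can still be opened for writing.
        shutil.copyfile(tmp, dest)
        tmp.unlink()
```

The trade-off is that the fallback path is no longer atomic, so a reader could briefly observe a partially written file.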
The watch._rebuild_code and `cluster-only` CLI paths regenerate
GRAPH_REPORT.md + graph.json but never call to_wiki(). Result: the
[[_COMMUNITY_...]] and [[<god-node>]] wikilinks the report emits
dead-end unless the user runs a separate wiki export, and the
AGENTS/CLAUDE.md guidance to "navigate wiki/index.md instead of raw
files" resolves to nothing on incremental rebuilds.

Wire to_wiki() into both rebuild paths so graphify-out/wiki/ stays in
sync with the report on every tick. Passes through community_labels,
cohesion, and a god-node list derived from analyze.god_nodes().
The god-node article emits [[<neighbor_label>]] for every neighbor,
but to_wiki() only writes articles for community hubs and god nodes.
Non-god neighbors have no landing page, so ~40% of god-node wikilinks
dead-end.

Track which labels will have articles and route the rest through the
community page containing that neighbor: **<label>** (in [[<community>]]).
Readers still get to a real page; agents don't chase ghosts.

Verified on a 1791-node / 141-community project: broken-link ratio
dropped from 43.8% (272 / 621) to 0% (0 / 484).
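
The routing rule described in this commit can be sketched in a few lines. The function and parameter names are hypothetical; only the two output shapes, `[[<label>]]` and `**<label>** (in [[<community>]])`, come from the commit message.

```python
def render_neighbor(label: str, community: str, labels_with_articles: set) -> str:
    """Link directly to a neighbor only when it has its own wiki article."""
    if label in labels_with_articles:
        return f"[[{label}]]"
    # No landing page for this neighbor: point at its community page instead.
    return f"**{label}** (in [[{community}]])"
```

For instance, with `labels_with_articles = {"Graph"}`, the neighbor `Graph` renders as `[[Graph]]`, while `parse_file` in a community page named `_COMMUNITY_Parsing` renders as `**parse_file** (in [[_COMMUNITY_Parsing]])`.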
@prkash1704
Author

Follow-up commit on this branch: fixed a related dead-link issue in god-node articles. _god_node_article was emitting [[<neighbor_label>]] for every connection, but to_wiki only writes articles for community hubs + god nodes, so non-god neighbors had no landing page.

Now the god-node connection list only wikilinks neighbors that have their own article; everyone else gets **<label>** (in [[<community>]]) so the reader still lands somewhere real.

Verified on a 1791-node / 141-community project: broken-link ratio dropped from 43.8% (272/621) → 0% (0/484). Happy to split into a separate PR if you'd prefer to review them independently.
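
A broken-link ratio like the one quoted above could be measured with a script along these lines. This is a sketch under stated assumptions, not the method actually used: it assumes every wiki article is a `<name>.md` file directly inside the wiki directory and that a link `[[name]]` resolves to that file.

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def broken_link_ratio(wiki_dir: Path) -> tuple[int, int]:
    """Return (broken, total) wikilink counts across all pages in wiki_dir."""
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    broken = total = 0
    for page in wiki_dir.glob("*.md"):
        for target in WIKILINK.findall(page.read_text(encoding="utf-8")):
            total += 1
            if target.strip() not in pages:
                broken += 1
    return broken, total
```

Dividing the two counts gives the ratio: 272 broken out of 621 links is about 43.8%.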
