feat: hierarchical Leiden community detection + community summaries + relevance-pruned query #347

Open
zoedecker wants to merge 4 commits into safishamsi:v4 from zoedecker:feat/hierarchical-leiden-summaries

@zoedecker

Summary

Adds Microsoft GraphRAG-inspired hierarchical clustering and community summaries to graphify. This lets broad queries use significantly fewer tokens: the query is matched against pre-computed community summaries, and only the relevant subgraphs are expanded.

  • Multi-resolution Leiden clustering at 3 levels (resolution 0.5, 1.0, 2.0)
  • Pluggable summary backends: extractive (default, no LLM), ollama (local), claude (API)
  • Relevance-pruned query in MCP server: scores communities by summary-vs-query term overlap, expands only top-K
  • New list_communities MCP tool for browsing community structure
  • Fully backward-compatible: opt-in via --hierarchical flag
  • graph.json schema unchanged for existing users (new keys are additive)

Details

Hierarchical Clustering (cluster.py)

  • New hierarchical_cluster(G, resolutions) runs Leiden (or Louvain fallback) at multiple resolution levels
  • Returns {level: {community_id: [node_ids]}} stored in graph.json under community_hierarchy
  • Existing flat community field on nodes unchanged
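
The return shape described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `detect` parameter, the `louvain_detect` helper, and the integer level keys are assumptions made for the example, and it uses networkx's Louvain implementation (the fallback path) rather than graspologic's Leiden.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Set

# A detector takes (graph, resolution) and returns an iterable of node sets.
Detector = Callable[[object, float], Iterable[Set[Hashable]]]


def louvain_detect(G, resolution: float) -> Iterable[Set[Hashable]]:
    """Hypothetical helper: Louvain via networkx, imported lazily."""
    from networkx.algorithms.community import louvain_communities

    return louvain_communities(G, resolution=resolution, seed=0)


def hierarchical_cluster(
    G,
    resolutions: Sequence[float] = (0.5, 1.0, 2.0),
    detect: Detector = louvain_detect,
) -> Dict[int, Dict[int, List[Hashable]]]:
    """Return {level: {community_id: [node_ids]}}, one level per resolution."""
    hierarchy: Dict[int, Dict[int, List[Hashable]]] = {}
    for level, resolution in enumerate(resolutions):
        # Sort members, then order communities largest-first so ids are stable.
        communities = sorted(
            (sorted(c, key=str) for c in detect(G, resolution)),
            key=lambda nodes: (-len(nodes), [str(n) for n in nodes]),
        )
        hierarchy[level] = dict(enumerate(communities))
    return hierarchy
```

Lower resolutions favor fewer, larger communities and higher resolutions favor more, smaller ones, so level 0 is the coarsest view in this sketch.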

Community Summaries (summarize.py — new file)

  • Three backends: extractive (template from top-degree nodes + edge types, zero LLM cost), ollama (local model), claude (API)
  • Stored in graph.json under community_summaries
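
To make the extractive backend concrete, here is a minimal sketch of a template built from top-degree nodes and edge types. The function name, the `(source, target, relation)` edge shape, and the wording of the template are assumptions for illustration; the actual template in `summarize.py` may differ.

```python
from collections import Counter
from typing import Iterable, Tuple

Edge = Tuple[str, str, str]  # (source, target, relation): hypothetical shape


def extractive_summary(
    nodes: Iterable[str], edges: Iterable[Edge], top_n: int = 3
) -> str:
    """Build a one-line summary from a community's hub nodes and edge types."""
    members = set(nodes)
    degree = Counter({n: 0 for n in members})
    relations: Counter = Counter()
    for source, target, relation in edges:
        # Count only edges that stay inside the community.
        if source in members and target in members:
            degree[source] += 1
            degree[target] += 1
            relations[relation] += 1
    # Ties are broken alphabetically to keep the output deterministic.
    top_nodes = sorted(members, key=lambda n: (-degree[n], n))[:top_n]
    top_relations = sorted(relations, key=lambda r: (-relations[r], r))[:top_n]
    summary = f"Community of {len(members)} nodes centered on {', '.join(top_nodes)}."
    if top_relations:
        summary += f" Main relations: {', '.join(top_relations)}."
    return summary
```

Because the summary is derived purely from graph structure, it needs no model call and is reproducible across runs.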

Relevance-Pruned Query (serve.py)

  • _community_relevance_score() scores each community summary against query terms
  • _tool_query_graph now prunes to top-K relevant communities before BFS/DFS when hierarchy exists
  • Graceful fallback to existing behavior when no hierarchy present
  • New list_communities MCP tool returns summaries for browsing
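
The scoring and pruning steps above could look roughly like this. It is a sketch under stated assumptions, not the code in `serve.py`: the tokenizer, the normalization by query length, the function names, and the `{community_id: summary}` input shape are all hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def community_relevance_score(summary: str, query: str) -> float:
    """Fraction of query terms that also appear in the summary (0.0 to 1.0)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(summary)) / len(query_terms)


def top_k_communities(summaries: Dict[str, str], query: str, k: int = 3) -> List[str]:
    """Return ids of the k best-matching communities, dropping zero scores."""
    scored = [
        (community_relevance_score(summary, query), cid)
        for cid, summary in summaries.items()
    ]
    relevant = [(score, cid) for score, cid in scored if score > 0]
    # Highest score first; ties are broken by community id.
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in relevant[:k]]
```

In this sketch an empty result means no summary shares a term with the query, which is a natural point for a caller to fall back to unpruned traversal.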

CLI

  • graphify cluster-only <path> --hierarchical enables multi-resolution clustering + summaries
  • graphify cluster-only <path> --summary-backend <extractive|ollama|claude> selects summary backend
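
For readers who want to see how the two flags fit together, here is a small `argparse` sketch. It assumes an argparse-style CLI, which may not match how graphify actually parses arguments; only the flag names, the backend choices, and the extractive default come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the cluster-only flags described above."""
    parser = argparse.ArgumentParser(prog="graphify")
    subcommands = parser.add_subparsers(dest="command", required=True)
    cluster = subcommands.add_parser("cluster-only")
    cluster.add_argument("path")
    # Opt-in: without this flag, behavior stays as before.
    cluster.add_argument("--hierarchical", action="store_true")
    cluster.add_argument(
        "--summary-backend",
        choices=("extractive", "ollama", "claude"),
        default="extractive",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["cluster-only", "repo", "--hierarchical"])` enables hierarchical clustering and leaves the summary backend at `extractive`.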

Test plan

  • 26 new tests across test_hierarchical.py, test_summarize.py, and test_serve.py
  • All 457 tests passing (including all existing tests — zero regressions)

Open questions

  • graspologic's leiden() resolution parameter support may vary by version — code falls back to networkx gracefully
  • Term-overlap relevance scoring is intentionally simple (v1); TF-IDF or embedding similarity could follow
  • report.py and wiki.py not yet updated to include summaries — keeping scope minimal for this PR

Zane Harris and others added 4 commits April 13, 2026 20:24
Add hierarchical_cluster() to cluster.py that runs Leiden/Louvain at
multiple resolution parameters [0.5, 1.0, 2.0] to produce coarse,
medium, and fine community levels. Falls back gracefully to networkx
when graspologic is not installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New summarize.py with three backends: extractive (template-based, no
LLM), ollama (local model), and claude (Anthropic API). LLM backends
fall back to extractive on failure. Public API: summarize_community()
and summarize_all_communities().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add --hierarchical and --summary-backend flags to cluster-only command.
Update export.to_json() to store community_hierarchy and
community_summaries as optional top-level keys in graph.json.
Backward-compatible: omitted when flags are not used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When community_summaries exist in graph.json, query_graph now scores
communities by term overlap and restricts traversal to top-K relevant
communities before applying token budget. Falls back to original
behavior when no summaries are present. New list_communities tool
exposes summaries and hierarchy levels for browsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>