From 1b24985cf439695b22003539a7a3a6cbf3bda741 Mon Sep 17 00:00:00 2001 From: prosdev Date: Tue, 31 Mar 2026 11:50:04 -0700 Subject: [PATCH 1/8] =?UTF-8?q?docs:=20update=201.6=20plan=20=E2=80=94=20a?= =?UTF-8?q?ddress=20review=20findings=20before=20implementation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Key findings: - callers metadata is dead code (not stored in index) — callees-only is correct - Keep incomingRefs as real count, add score field for PageRank - Existing tests need callees data + relative ordering assertions - Add performance test for 2k-node graph Co-Authored-By: Claude Opus 4.6 (1M context) --- .../1.6-pagerank-map.md | 621 ++++++++++++++---- .../phase-1-mcp-tools-improvement/overview.md | 16 +- .claude/scratchpad.md | 4 +- 3 files changed, 527 insertions(+), 114 deletions(-) diff --git a/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/1.6-pagerank-map.md b/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/1.6-pagerank-map.md index 005d038..4ea4875 100644 --- a/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/1.6-pagerank-map.md +++ b/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/1.6-pagerank-map.md @@ -1,18 +1,24 @@ -# Part 1.6: PageRank File Ranking for dev_map +# Part 1.6: Graph Algorithms for dev_map and dev_refs > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. -**Goal:** Replace simple reference counting in `dev_map` hot paths with PageRank over the call graph for more meaningful file ranking. +**Goal:** Replace simple reference counting in `dev_map` hot paths with weighted PageRank +over the call graph for more meaningful file ranking. -**User stories:** US-12 (meaningful file importance in codebase map) +Connected components and shortest path are implemented alongside PageRank in the +same `graph.ts` module but wired into consumers (formatCodebaseMap, dev_refs) in a +follow-up PR. + +**User stories:** US-12 (meaningful file importance) **Inspiration:** [aider's repo map](https://aider.chat/docs/repomap.html) — Apache 2.0. Uses PageRank over dependency graph to identify architecturally central files. We already have the call graph data from the scanner (callees metadata in Antfly). **Files:** -- Create: `packages/core/src/map/pagerank.ts` -- Create: `packages/core/src/map/__tests__/pagerank.test.ts` -- Modify: `packages/core/src/map/index.ts` -- Modify: `packages/core/src/map/types.ts` +- Create: `packages/core/src/map/graph.ts` (PageRank, graph builder, connected components, shortest path) +- Create: `packages/core/src/map/__tests__/graph.test.ts` +- Modify: `packages/core/src/map/index.ts` (replace computeHotPaths with PageRank) +- Modify: `packages/core/src/map/types.ts` (add `score` to HotPath) +- Modify: `packages/core/src/map/__tests__/map.test.ts` (rewrite callers→callees tests) --- @@ -30,6 +36,24 @@ This is a good proxy but misses graph structure. A file could have few direct re We already have the data: every indexed document has `callees: [{ name, file, line }]` metadata. This is the dependency graph. +**Important finding:** `callers` metadata is NOT stored in the index — the scanner +comment says "callers are computed at query time via reverse lookup." The current +`computeHotPaths` reads `callers` from metadata (line 455 of map/index.ts) but this +field is always empty for real indexed docs. Only the `callees` path (lines 469-480) +produces results in production. This means switching to callees-only `buildDependencyGraph` +is not a regression — it matches what actually works. + +--- + +## Review findings (addressed before implementation) + +| Finding | Fix | Risk mitigation | +|---------|-----|-----------------| +| Plan only uses `callees`, current code uses both | `callers` is dead code in production (not stored in index). Use `callees` only. | Update tests from callers to callees mock data. Add comment explaining why. | +| `incomingRefs` changes meaning | Keep `incomingRefs` as actual incoming edge count. Add `score` field for PageRank value. Sort by `score`. | Backward compatible — `incomingRefs` is still a real count. Display stays "refs". | +| Existing tests will break | Update to use callees data + relative ordering assertions. | PageRank unit tests have exact assertions; integration tests verify behavior. | +| Performance claim unverified | Add perf test: 2k-node graph, assert <50ms. | Generous threshold avoids CI flakiness. | + --- ## Task 1: Implement PageRank algorithm @@ -41,59 +65,102 @@ Pure function — takes a graph, returns ranked nodes. No I/O. ```typescript // In packages/core/src/map/__tests__/pagerank.test.ts -import { pageRank } from '../pagerank'; +import { pageRank, buildDependencyGraph, type WeightedEdge } from '../pagerank'; + +function edge(target: string, weight = 1): WeightedEdge { + return { target, weight }; +} describe('pageRank', () => { it('should rank nodes by importance', () => { // A -> B -> C, A -> C // C should rank highest (most incoming from important nodes) - const graph = new Map(); - graph.set('A', ['B', 'C']); - graph.set('B', ['C']); - graph.set('C', []); + const graph = new Map(); + graph.set('A', [edge('B'), edge('C')]); + graph.set('B', [edge('C')]); const ranks = pageRank(graph); - expect(ranks.get('C')).toBeGreaterThan(ranks.get('A')!); - expect(ranks.get('C')).toBeGreaterThan(ranks.get('B')!); + expect(ranks.get('C')!).toBeGreaterThan(ranks.get('A')!); + expect(ranks.get('C')!).toBeGreaterThan(ranks.get('B')!); }); it('should handle cycles', () => { - // A -> B -> A (mutual dependency) - const graph = new Map(); - graph.set('A', ['B']); - graph.set('B', ['A']); + const graph = new Map(); + graph.set('A', [edge('B')]); + graph.set('B', [edge('A')]); const ranks = pageRank(graph); - // Both should have similar rank - expect(Math.abs(ranks.get('A')! - ranks.get('B')!)).toBeLessThan(0.1); + expect(Math.abs(ranks.get('A')! - ranks.get('B')!)).toBeLessThan(0.01); }); it('should handle disconnected nodes', () => { - const graph = new Map(); - graph.set('A', ['B']); - graph.set('B', []); - graph.set('C', []); // No connections + const graph = new Map(); + graph.set('A', [edge('B')]); + // B and C have no outgoing edges (dangling) + // B should rank higher — it has an incoming edge from A + + const ranks = pageRank(graph); + expect(ranks.get('B')!).toBeGreaterThan(ranks.get('C') || 0); + }); + + it('should handle dangling nodes (no outgoing edges)', () => { + // types.ts is imported by many but exports nothing callable + const graph = new Map(); + graph.set('a.ts', [edge('types.ts'), edge('b.ts')]); + graph.set('b.ts', [edge('types.ts')]); + // types.ts has no outgoing edges — dangling node + + const ranks = pageRank(graph); + // types.ts should rank highest (most incoming) + expect(ranks.get('types.ts')!).toBeGreaterThan(ranks.get('a.ts')!); + // Dangling node's rank should be distributed, not lost + const totalRank = Array.from(ranks.values()).reduce((a, b) => a + b, 0); + expect(totalRank).toBeCloseTo(1.0, 2); + }); + + it('should respect edge weights', () => { + const graph = new Map(); + // A depends heavily on B (weight 10), lightly on C (weight 1) + graph.set('A', [edge('B', 10), edge('C', 1)]); const ranks = pageRank(graph); - expect(ranks.get('A')).toBeDefined(); - expect(ranks.get('C')).toBeDefined(); - // Connected nodes should rank higher than isolated expect(ranks.get('B')!).toBeGreaterThan(ranks.get('C')!); }); it('should return empty map for empty graph', () => { - const ranks = pageRank(new Map()); - expect(ranks.size).toBe(0); + expect(pageRank(new Map()).size).toBe(0); }); - it('should converge within iterations', () => { - // Large-ish graph - const graph = new Map(); + it('should converge for large ring graph', () => { + const graph = new Map(); for (let i = 0; i < 100; i++) { - graph.set(`node${i}`, [`node${(i + 1) % 100}`]); + graph.set(`node${i}`, [edge(`node${(i + 1) % 100}`)]); } const ranks = pageRank(graph); expect(ranks.size).toBe(100); + // All nodes in a ring should have equal rank + const values = Array.from(ranks.values()); + const avg = values.reduce((a, b) => a + b, 0) / values.length; + for (const v of values) { + expect(v).toBeCloseTo(avg, 4); + } + }); + + it('should complete 2k-node graph in under 50ms', () => { + const graph = new Map(); + for (let i = 0; i < 2000; i++) { + const edges: WeightedEdge[] = []; + for (let j = 0; j < 5; j++) { + edges.push(edge(`node${(i + j + 1) % 2000}`, Math.random() * 5)); + } + graph.set(`node${i}`, edges); + } + const start = Date.now(); + const ranks = pageRank(graph); + const duration = Date.now() - start; + console.log(`PageRank: 2000 nodes, 10000 edges, ${duration}ms`); + expect(ranks.size).toBe(2000); + expect(duration).toBeLessThan(50); }); }); ``` @@ -103,71 +170,102 @@ describe('pageRank', () => { Run: `pnpm test -- packages/core/src/map/__tests__/pagerank.test.ts` Expected: FAIL — module not found -- [ ] **Step 3: Implement PageRank** +- [ ] **Step 3: Implement weighted PageRank with dangling node handling** + +Learnings from studying aider's implementation (Apache 2.0, uses NetworkX): +- Weighted edges (sqrt-dampened reference counts) +- Dangling node handling (files with no outgoing edges distribute rank to all) +- Convergence check (stop early if delta < 1e-6) +- Standard damping 0.85, max 100 iterations (matches NetworkX defaults) + +No external dependency — hand-rolled (~60 lines). If we ever need more +graph algorithms, graphology (MIT, TS types, 1.6k stars) is the upgrade path. ```typescript // packages/core/src/map/pagerank.ts +export interface WeightedEdge { + target: string; + weight: number; +} + /** - * PageRank algorithm for ranking nodes in a directed graph. + * Weighted PageRank with dangling node handling and convergence. * Pure function — no I/O. * - * Inspired by aider's repo map (https://github.com/Aider-AI/aider). - * - * @param graph - Map of node -> outgoing edges (dependencies) - * @param damping - Damping factor (default 0.85, standard for PageRank) - * @param iterations - Number of iterations (default 20, sufficient for convergence) - * @returns Map of node -> rank score (higher = more important) + * Inspired by aider's repo map (https://github.com/Aider-AI/aider) + * which uses NetworkX PageRank over a weighted dependency graph. */ export function pageRank( - graph: Map, + graph: Map, damping = 0.85, - iterations = 20 + maxIterations = 100, + tolerance = 1e-6 ): Map { + // Collect all nodes (sources + targets) const nodes = new Set(); - for (const [src, targets] of graph) { + for (const [src, edges] of graph) { nodes.add(src); - for (const t of targets) nodes.add(t); + for (const e of edges) nodes.add(e.target); } if (nodes.size === 0) return new Map(); const n = nodes.size; - const ranks = new Map(); - const initial = 1 / n; + let ranks = new Map(); // Initialize equal rank - for (const node of nodes) { - ranks.set(node, initial); - } + for (const node of nodes) ranks.set(node, 1 / n); - // Build reverse graph (who points to me?) - const inbound = new Map(); + // Build inbound map: target → [{ source, weight }] + const inbound = new Map>(); for (const node of nodes) inbound.set(node, []); - for (const [src, targets] of graph) { - for (const t of targets) { - inbound.get(t)?.push(src); + + // Build outgoing weight sums for normalization + const outWeightSum = new Map(); + for (const [src, edges] of graph) { + let sum = 0; + for (const e of edges) { + inbound.get(e.target)?.push({ source: src, weight: e.weight }); + sum += e.weight; } + outWeightSum.set(src, sum); } - // Iterate - for (let i = 0; i < iterations; i++) { + // Identify dangling nodes (no outgoing edges) + const danglingNodes: string[] = []; + for (const node of nodes) { + if (!outWeightSum.has(node) || outWeightSum.get(node) === 0) { + danglingNodes.push(node); + } + } + + // Iterate until convergence or max iterations + for (let iter = 0; iter < maxIterations; iter++) { const newRanks = new Map(); + // Dangling rank: sum of dangling nodes' ranks, distributed to all + let danglingRank = 0; + for (const d of danglingNodes) danglingRank += ranks.get(d) || 0; + for (const node of nodes) { let sum = 0; - const sources = inbound.get(node) || []; - for (const src of sources) { - const outDegree = graph.get(src)?.length || 1; - sum += (ranks.get(src) || 0) / outDegree; + for (const { source, weight } of inbound.get(node) || []) { + const srcOutWeight = outWeightSum.get(source) || 1; + sum += ((ranks.get(source) || 0) * weight) / srcOutWeight; } - newRanks.set(node, (1 - damping) / n + damping * sum); + // Standard PageRank formula with dangling node contribution + newRanks.set(node, (1 - damping) / n + damping * (sum + danglingRank / n)); } - // Update ranks - for (const [node, rank] of newRanks) { - ranks.set(node, rank); + // Check convergence + let delta = 0; + for (const node of nodes) { + delta += Math.abs((newRanks.get(node) || 0) - (ranks.get(node) || 0)); } + + ranks = newRanks; + if (delta < tolerance) break; } return ranks; @@ -198,32 +296,40 @@ git commit -m "feat(core): add PageRank algorithm for file importance ranking" import type { SearchResult } from '../vector/types.js'; /** - * Build a file dependency graph from indexed documents. - * Uses callees metadata to create edges: file A calls something in file B → A depends on B. + * Build a weighted file dependency graph from indexed documents. + * Uses callees metadata: file A calls N things in file B → edge weight = sqrt(N). + * sqrt dampening (from aider) prevents high-frequency references from dominating. * Pure function. */ -export function buildDependencyGraph(docs: SearchResult[]): Map { - const graph = new Map(); +export function buildDependencyGraph(docs: SearchResult[]): Map { + // Count raw references per (source, target) pair + const rawCounts = new Map>(); for (const doc of docs) { const sourceFile = doc.metadata.path as string; if (!sourceFile) continue; - if (!graph.has(sourceFile)) graph.set(sourceFile, []); + if (!rawCounts.has(sourceFile)) rawCounts.set(sourceFile, new Map()); const callees = doc.metadata.callees as Array<{ file?: string }> | undefined; if (callees && Array.isArray(callees)) { for (const callee of callees) { if (callee.file && callee.file !== sourceFile) { - graph.get(sourceFile)!.push(callee.file); + const targets = rawCounts.get(sourceFile)!; + targets.set(callee.file, (targets.get(callee.file) || 0) + 1); } } } } - // Deduplicate edges - for (const [node, edges] of graph) { - graph.set(node, [...new Set(edges)]); + // Convert to weighted edges with sqrt dampening + const graph = new Map(); + for (const [source, targets] of rawCounts) { + const edges: WeightedEdge[] = []; + for (const [target, count] of targets) { + edges.push({ target, weight: Math.sqrt(count) }); + } + graph.set(source, edges); } return graph; @@ -234,8 +340,8 @@ export function buildDependencyGraph(docs: SearchResult[]): Map { - it('should build graph from callees metadata', () => { - const docs: SearchResult[] = [ + it('should build weighted graph from callees metadata', () => { + const docs = [ { id: '1', score: 0.9, metadata: { path: 'src/a.ts', callees: [{ name: 'foo', file: 'src/b.ts', line: 10 }], @@ -246,38 +352,45 @@ describe('buildDependencyGraph', () => { }}, ]; - const graph = buildDependencyGraph(docs); - expect(graph.get('src/a.ts')).toContain('src/b.ts'); - expect(graph.get('src/b.ts')).toContain('src/c.ts'); + const graph = buildDependencyGraph(docs as any); + const aEdges = graph.get('src/a.ts')!; + expect(aEdges.some(e => e.target === 'src/b.ts')).toBe(true); + expect(aEdges[0].weight).toBe(1); // sqrt(1) = 1 }); - it('should handle docs without callees metadata', () => { - const docs: SearchResult[] = [ - { id: '1', score: 0.9, metadata: { path: 'src/types.ts', type: 'interface' } }, - { id: '2', score: 0.9, metadata: { + it('should sqrt-dampen weights for multiple references', () => { + const docs = [ + { id: '1', score: 0.9, metadata: { path: 'src/a.ts', - callees: [{ name: 'MyType', file: 'src/types.ts', line: 1 }], + callees: [ + { name: 'foo', file: 'src/b.ts', line: 10 }, + { name: 'bar', file: 'src/b.ts', line: 20 }, + { name: 'baz', file: 'src/b.ts', line: 30 }, + { name: 'qux', file: 'src/b.ts', line: 40 }, + ], }}, ]; - const graph = buildDependencyGraph(docs); - expect(graph.get('src/a.ts')).toContain('src/types.ts'); - expect(graph.get('src/types.ts')).toEqual([]); + const graph = buildDependencyGraph(docs as any); + const aEdges = graph.get('src/a.ts')!; + expect(aEdges.length).toBe(1); // deduplicated to one edge + expect(aEdges[0].target).toBe('src/b.ts'); + expect(aEdges[0].weight).toBe(2); // sqrt(4) = 2 }); - it('should deduplicate edges', () => { - const docs: SearchResult[] = [ - { id: '1', score: 0.9, metadata: { + it('should handle docs without callees metadata', () => { + const docs = [ + { id: '1', score: 0.9, metadata: { path: 'src/types.ts', type: 'interface' } }, + { id: '2', score: 0.9, metadata: { path: 'src/a.ts', - callees: [ - { name: 'foo', file: 'src/b.ts', line: 10 }, - { name: 'bar', file: 'src/b.ts', line: 20 }, - ], + callees: [{ name: 'MyType', file: 'src/types.ts', line: 1 }], }}, ]; - const graph = buildDependencyGraph(docs); - expect(graph.get('src/a.ts')).toEqual(['src/b.ts']); + const graph = buildDependencyGraph(docs as any); + expect(graph.get('src/a.ts')!.some(e => e.target === 'src/types.ts')).toBe(true); + // types.ts has no outgoing edges (not even in the graph as a source) + expect(graph.has('src/types.ts')).toBe(false); }); }); ``` @@ -298,12 +411,21 @@ git commit -m "feat(core): add dependency graph builder from indexed callees" Replace the current `computeHotPaths` function (simple reference count) with PageRank-based ranking: ```typescript -import { buildDependencyGraph, pageRank } from './pagerank.js'; +import { buildDependencyGraph, pageRank } from './graph.js'; function computeHotPaths(docs: SearchResult[], maxPaths: number): HotPath[] { const graph = buildDependencyGraph(docs); const ranks = pageRank(graph); + // Count real incoming edges per file (distinct source files) + const incomingCounts = new Map>(); + for (const [src, edges] of graph) { + for (const e of edges) { + if (!incomingCounts.has(e.target)) incomingCounts.set(e.target, new Set()); + incomingCounts.get(e.target)!.add(src); + } + } + // Build a lookup for primary component name per file const componentByFile = new Map(); for (const doc of docs) { @@ -313,48 +435,327 @@ function computeHotPaths(docs: SearchResult[], maxPaths: number): HotPath[] { } } - // Sort by PageRank score and take top N + // Sort by PageRank score, display real incoming ref count return Array.from(ranks.entries()) .sort((a, b) => b[1] - a[1]) .slice(0, maxPaths) .map(([file, score]) => ({ file, - incomingRefs: Math.round(score * 1000), // Normalized PageRank for display + incomingRefs: incomingCounts.get(file)?.size ?? 0, + score, primaryComponent: componentByFile.get(file), })); } ``` -- [ ] **Step 2: Update HotPath type if needed** +- [ ] **Step 2: Update HotPath type** -In `packages/core/src/map/types.ts`, consider adding a `pageRankScore` field: +In `packages/core/src/map/types.ts`, add `score` field: ```typescript export interface HotPath { file: string; - incomingRefs: number; + incomingRefs: number; // actual count of files that depend on this file + score: number; // PageRank score (used for sorting) primaryComponent?: string; - pageRankScore?: number; // Optional, for debugging/verbose output } ``` -- [ ] **Step 3: Run full test suite** +Sort by `score` (PageRank), display `incomingRefs` (real count) — keeps display meaningful. + +- [ ] **Step 3: Rewrite existing hot paths tests (callers → callees)** + +Three tests in `map.test.ts` use `callers` mock data which is dead code in production. +Rewrite them to use `callees` data and assert relative ordering (not exact counts): + +**Test 1: "should compute hot paths from callers data" (line 288)** +→ Rewrite as "should compute hot paths from callees data" + - Mock docs with `callees` pointing to a common target file + - Assert the target file ranks first (PageRank should rank it highest) + - Assert `incomingRefs` is the real incoming edge count + - Assert `score` is a positive number + +**Test 2: "should limit hot paths to maxHotPaths" (line 365)** +→ Keep structure, change mock data from `callers` to `callees` + - Assert `hotPaths.length <= maxHotPaths` + - Assert sorted by score descending + +**Test 3: "should format hot paths in output" (line 411)** +→ Keep structure, change mock data from `callers` to `callees` + - Assert output contains "refs" label + - Assert file names appear in output + +**Test 4 (existing, unchanged): "should not include hot paths when disabled" (line 390)** +→ No change needed — doesn't use callers data + +- [ ] **Step 4: Add test for empty callees array** + +```typescript +it('should handle docs with empty callees array', () => { + const docs = [ + { id: '1', score: 0.9, metadata: { path: 'src/types.ts', callees: [] } }, + ]; + const graph = buildDependencyGraph(docs as any); + expect(graph.get('src/types.ts')).toEqual([]); +}); +``` + +- [ ] **Step 5: Run full test suite** Run: `pnpm build && pnpm test` -Expected: ALL PASS (hot paths test may need updating for new ranking order) +Expected: ALL PASS - [ ] **Step 4: Commit** ```bash -git add packages/core/src/map/index.ts packages/core/src/map/types.ts packages/core/src/map/pagerank.ts packages/core/src/map/__tests__/ +git add packages/core/src/map/ git commit -m "feat(core): use PageRank for dev_map hot paths ranking" ``` --- +## Task 4 (deferred): Connected components for subsystem identification + +**Deferred to follow-up PR.** Implement the algorithm and tests in this PR +(it's in graph.ts alongside PageRank), but don't wire it into CodebaseMap or +formatCodebaseMap. Wire it when there's a consumer. + +Identifies clusters of files that form independent subsystems. Uses the +undirected version of the dependency graph (A depends on B = A and B are connected). + +**Value for agents:** "This codebase has 3 isolated subsystems: core (45 files), +CLI (12 files), and MCP server (18 files)." Helps agents scope their work. + +- [ ] **Step 1: Implement connected components (BFS)** + +```typescript +// In graph.ts + +/** + * Find connected components in the dependency graph (undirected). + * Returns groups of files sorted by size (largest first). + * Pure function — no I/O. + */ +export function connectedComponents( + graph: Map +): string[][] { + // Build undirected adjacency list + const adj = new Map>(); + const allNodes = new Set(); + for (const [src, edges] of graph) { + allNodes.add(src); + if (!adj.has(src)) adj.set(src, new Set()); + for (const e of edges) { + allNodes.add(e.target); + if (!adj.has(e.target)) adj.set(e.target, new Set()); + adj.get(src)!.add(e.target); + adj.get(e.target)!.add(src); + } + } + + const visited = new Set(); + const components: string[][] = []; + + for (const node of allNodes) { + if (visited.has(node)) continue; + // BFS from this node + const component: string[] = []; + const queue = [node]; + visited.add(node); + while (queue.length > 0) { + const current = queue.shift()!; + component.push(current); + for (const neighbor of adj.get(current) || []) { + if (!visited.has(neighbor)) { + visited.add(neighbor); + queue.push(neighbor); + } + } + } + components.push(component); + } + + // Sort by size (largest first) + return components.sort((a, b) => b.length - a.length); +} +``` + +- [ ] **Step 2: Write tests** + +```typescript +describe('connectedComponents', () => { + it('should identify separate clusters', () => { + const graph = new Map(); + // Cluster 1: A -> B -> C + graph.set('A', [edge('B')]); + graph.set('B', [edge('C')]); + // Cluster 2: D -> E + graph.set('D', [edge('E')]); + + const components = connectedComponents(graph); + expect(components.length).toBe(2); + expect(components[0].length).toBe(3); // A, B, C + expect(components[1].length).toBe(2); // D, E + }); + + it('should treat the graph as undirected', () => { + const graph = new Map(); + // A -> B, C -> B (B connects A and C even though edges point inward) + graph.set('A', [edge('B')]); + graph.set('C', [edge('B')]); + + const components = connectedComponents(graph); + expect(components.length).toBe(1); // All connected + expect(components[0].length).toBe(3); + }); + + it('should handle single-node components', () => { + const graph = new Map(); + graph.set('A', [edge('B')]); + graph.set('lonely', []); // Isolated node + + const components = connectedComponents(graph); + expect(components.length).toBe(2); + }); + + it('should return empty for empty graph', () => { + expect(connectedComponents(new Map()).length).toBe(0); + }); +}); +``` + +- [ ] **Step 3: Commit (algorithm + tests only, no wiring)** + +```bash +git add packages/core/src/map/graph.ts packages/core/src/map/__tests__/graph.test.ts +git commit -m "feat(core): add connected components algorithm (consumer wired in follow-up)" +``` + +--- + +## Task 5 (deferred): Shortest path for call chain tracing + +**Deferred to follow-up PR.** Implement algorithm and tests in this PR. +Wire into dev_refs as "trace path" feature in a separate PR. + +Answers "how does file A reach file B?" — BFS for hop count on the +unweighted dependency graph. Not Dijkstra — agents care about hops, not weights. + +**Value for agents:** Instead of multiple `dev_refs` calls, one query shows: +"auth.ts → user-service.ts → repository.ts → database.ts (3 hops)" + +- [ ] **Step 1: Implement shortest path (BFS)** + +```typescript +// In graph.ts + +/** + * Find shortest path between two files in the dependency graph. + * Uses BFS (unweighted — hop count, not edge weight). + * Returns the path as an array of files, or null if unreachable. + * Pure function — no I/O. + */ +export function shortestPath( + graph: Map, + from: string, + to: string +): string[] | null { + if (from === to) return [from]; + if (!graph.has(from)) return null; + + const visited = new Set([from]); + const parent = new Map(); + const queue = [from]; + + while (queue.length > 0) { + const current = queue.shift()!; + for (const { target } of graph.get(current) || []) { + if (visited.has(target)) continue; + visited.add(target); + parent.set(target, current); + if (target === to) { + // Reconstruct path + const path = [to]; + let node = to; + while (parent.has(node)) { + node = parent.get(node)!; + path.unshift(node); + } + return path; + } + queue.push(target); + } + } + + return null; // Unreachable +} +``` + +- [ ] **Step 2: Write tests** + +```typescript +describe('shortestPath', () => { + it('should find direct path', () => { + const graph = new Map(); + graph.set('A', [edge('B')]); + + expect(shortestPath(graph, 'A', 'B')).toEqual(['A', 'B']); + }); + + it('should find multi-hop path', () => { + const graph = new Map(); + graph.set('A', [edge('B')]); + graph.set('B', [edge('C')]); + graph.set('C', [edge('D')]); + + expect(shortestPath(graph, 'A', 'D')).toEqual(['A', 'B', 'C', 'D']); + }); + + it('should find shortest among multiple paths', () => { + const graph = new Map(); + graph.set('A', [edge('B'), edge('C')]); + graph.set('B', [edge('D')]); + graph.set('C', [edge('D')]); // A->C->D is same length as A->B->D + + const path = shortestPath(graph, 'A', 'D'); + expect(path?.length).toBe(3); // 3 nodes = 2 hops + }); + + it('should return null for unreachable target', () => { + const graph = new Map(); + graph.set('A', [edge('B')]); + graph.set('C', [edge('D')]); // Disconnected + + expect(shortestPath(graph, 'A', 'D')).toBeNull(); + }); + + it('should return single-node path for self', () => { + const graph = new Map(); + graph.set('A', [edge('B')]); + + expect(shortestPath(graph, 'A', 'A')).toEqual(['A']); + }); + + it('should return null for unknown source', () => { + expect(shortestPath(new Map(), 'X', 'Y')).toBeNull(); + }); +}); +``` + +- [ ] **Step 3: Commit (algorithm + tests only, no wiring)** + +```bash +git add packages/core/src/map/graph.ts packages/core/src/map/__tests__/graph.test.ts +git commit -m "feat(core): add shortest path algorithm (consumer wired in follow-up)" +``` + +--- + ## Notes -- **Existing map tests need updating:** Tests in `map.test.ts` mock `callers` data. After this change, `computeHotPaths` uses `callees` via `buildDependencyGraph`. Update mock data to use `callees` instead of `callers`, and adjust expected ranking order since PageRank differs from simple ref counting. -- PageRank is ~O(V + E) per iteration × 20 iterations. For a 2k-file repo with 10k edges, this is <10ms. Negligible. -- The `incomingRefs` field now shows normalized PageRank score, not raw count. Display label in `formatCodebaseMap` could change from "refs" to "importance" or keep "refs" for familiarity. -- Attribution: add to ARCHITECTURE.md: "File importance ranking inspired by [aider's repo map](https://github.com/Aider-AI/aider)" +- **Existing map tests:** 3 tests use `callers` mock data (dead code). Concrete rewrites specified in Task 3 Step 3. +- **Performance:** PageRank is ~O(V + E) per iteration × 100 max iterations. <10ms for 2k files. Perf test verifies. +- **Display:** `incomingRefs` = real incoming edge count. `score` = PageRank value for sorting. Label stays "refs". +- **Deferred consumers:** Connected components → `formatCodebaseMap`. Shortest path → `dev_refs`. Both in follow-up PRs. +- **All algorithms in one file:** `graph.ts` contains pageRank, buildDependencyGraph, connectedComponents, shortestPath. ~115 lines total. +- **Attribution:** "File importance ranking inspired by [aider's repo map](https://github.com/Aider-AI/aider)" diff --git a/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/overview.md b/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/overview.md index a482a85..1a8bb36 100644 --- a/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/overview.md +++ b/.claude/da-plans/mcp/phase-1-mcp-tools-improvement/overview.md @@ -1,6 +1,6 @@ # Phase 1: MCP Tools Improvement -**Status:** Draft +**Status:** In progress (Parts 1.1–1.5 merged, 1.6 in progress) ## Context @@ -67,8 +67,18 @@ This is an acceptable trade-off: line count is a cheap stat call. | [1.2](./1.2-index-based-analysis.md) | Add `getDocsByFilePath`, index analysis path, wire VectorStorage | Medium — new code path | | [1.3](./1.3-cleanup.md) | Consolidate reads, remove dead code, remove GitHub from health | Low — cleanup | | [1.4](./1.4-agent-usability.md) | Merge status/health, add error suggestions, rename params, JSON output | Medium — tool surface change | -| [1.5](./1.5-ast-pattern-analysis.md) | AST-based pattern analysis via ast-grep (optional dep) | Low — additive, regex fallback | -| [1.6](./1.6-pagerank-map.md) | PageRank file ranking for dev_map hot paths | Low — replaces simple counting | +| [1.5](./1.5-ast-pattern-analysis.md) | AST-based pattern analysis via tree-sitter queries | Low — additive, regex fallback | +| [1.6](./1.6-pagerank-map.md) | Graph algorithms: PageRank, connected components, shortest path | Low — replaces simple counting | + +### Part 1.6 Commit Plan + +| # | Commit | What changes | +|---|--------|-------------| +| 1 | `feat(core): add graph algorithms — PageRank, connected components, shortest path` | New `graph.ts` with pure functions + `graph.test.ts` (~20 tests). No wiring. | +| 2 | `feat(core): replace ref counting with PageRank in dev_map` | Wire PageRank into `computeHotPaths`. Add `score` to `HotPath`. Rewrite 3 callers→callees tests. | +| 3 | `feat(core): wire connected components into dev_map output` | Add `components` to `CodebaseMap` + `formatCodebaseMap`. | +| 4 | `feat(mcp): add path tracing to dev_refs` | New `trace` param on RefsAdapter. Schema + tests. | +| 5 | `docs: complete MCP Phase 1, attribution, plan status` | Plan updates, aider attribution, mark Phase 1 complete. | --- diff --git a/.claude/scratchpad.md b/.claude/scratchpad.md index b00ea8f..2067b41 100644 --- a/.claude/scratchpad.md +++ b/.claude/scratchpad.md @@ -11,7 +11,9 @@ ## Future Work - Antfly SDK: server-side path filter for `getDocsByFilePath` (eliminates 5k cap) -- PageRank for `dev_map` hot paths (MCP Phase 1, Part 1.6) +- Wire `shortestPath` into `dev_refs` as a "trace path" feature (graph.ts is ready, adapter wiring is separate scope) +- Wire `connectedComponents` into `dev_map` verbose output (graph.ts is ready) +- Betweenness centrality — identifies bridge files between subsystems. Worth adding if agents need refactoring guidance. graphology (MIT, 1.6k stars) is the upgrade path if we need more than 3 hand-rolled algorithms. - E2E tests in CI — blocked on Antfly memory requirements vs GitHub runner limits (7GB) - **Python language support** — tree-sitter-python WASM is ~300KB, already in tree-sitter-wasms. Needs a Python scanner (document extraction) + Python-specific pattern rules. High demand — large ecosystem. Worth a standalone plan covering: scanner, pattern rules, test fixtures, indexer integration. The PatternMatcher interface from 1.5 is language-agnostic so pattern rules slot right in; the scanner is the real work. - Vue/Svelte SFC support — `.vue`/`.svelte` files have embedded `