feat(mcp): multi-module Go trace-quality + small-repo retrieval tuning #494
Merged
…ilure inlining
Multi-pronged fix to make codegraph competitive on Go multi-module repos
(cosmos-sdk, etcd) where it previously lost or tied. Driven by an 8-question
agent-eval audit across cobra, gin, prometheus, cosmos-sdk, and etcd: the
baseline had codegraph losing ~60% on cost on cosmos-sdk and mixed on etcd
deep cross-module flows, while winning cleanly on the single-module and
non-protobuf-heavy repos.
Diagnostics ruled OUT `go.work` parsing as the gap (prometheus crushes
without it). The actual failure modes were generated-file noise warping
disambiguation, missing gRPC interface→impl bridge in structural-typing Go,
and trace's failure path triggering 3-5 follow-up tool calls instead of
inlining the material the agent needed.
Changes:
- New `src/extraction/generated-detection.ts` — path-pattern classifier
for `.pb.go`, `.pulsar.go`, `_grpc.pb.go`, `_mock.go`, `_mocks.go`,
`mock_*.go`, `.generated.[jt]sx?`, `_pb2(_grpc)?.py`, `.pb.{cc,h}`,
`.g.dart`, `.freezed.dart`. Applied as a stable sort tiebreaker in
`findSymbol`, `findAllSymbols`, `codegraph_search` (MCP + CLI),
`codegraph_explore` file ranking, and context formatter Entry Points /
Related Symbols / Code blocks. Cosmos's `msgServer.Send` now ranks #3
instead of #9 on a `Send` search.
- New `goGrpcStubImplEdges` synthesizer in `callback-synthesizer.ts` —
detects `UnimplementedXxxServer` structs in generated files, identifies
their RPC methods (excluding `mustEmbed*` / `testEmbeddedByValue` gRPC
markers), and emits `calls` edges to the matching methods on any
non-generated struct whose method-name set is a superset. Closes Go's
structural-typing gap that the existing `interfaceOverrideEdges` (Java /
Kotlin only) couldn't bridge. 467 bridge edges on cosmos-sdk; bank's
`UnimplementedMsgServer::Send` points to `x/bank/keeper/msg_server.go`
only, not to `msgClient` siblings or mock files.
- Trace-failure rewrite (`handleTrace`) — when no static path connects
endpoints, instead of telling the agent to call `codegraph_node` (a
3-4-call fan-out), inline both endpoints' bodies (120 lines / 3600 chars
per endpoint), their callers (≤6), and callees (≤8) in one response.
- Trace endpoint-pairing improvements — scores every `from`×`to`
candidate combo by shared directory prefix and tries the best-paired
pair first (the full candidate set, not just FTS top-5). A
less-canonical-path penalty (`enterprise/`, `contrib/`, `examples/`,
`vendor/`, `third_party/`, `deprecated/`, `legacy/`) ensures the
canonical-module pair wins even when a side-experiment shares more of
its directory prefix. Find-path probe budget capped at 20 pairs.
- Test-file deprioritization in `codegraph_explore` `isLowValue` — adds
suffix patterns (`_test.go`, `_spec.rb`, `.test.ts`, `.spec.tsx`,
`Test.java`, `Spec.kt`) alongside the existing directory-style patterns.
Otherwise etcd's `watchable_store_test.go` consumes 5K chars of explore
budget that should go to the hand-written flow source.
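The generated-file tiebreaker in the first bullet can be sketched as below. This is a minimal illustration, not the contents of `generated-detection.ts`: the names `GENERATED_PATTERNS`, `isGeneratedPath`, `Hit`, and `rankHits` are hypothetical, and the regexes are one plausible encoding of the suffix list given above.

```typescript
// Hypothetical sketch: the real classifier and its call sites may differ.
const GENERATED_PATTERNS: RegExp[] = [
  /\.pb\.go$/,              // also covers _grpc.pb.go
  /\.pulsar\.go$/,
  /_mocks?\.go$/,           // _mock.go and _mocks.go
  /(^|\/)mock_[^/]*\.go$/,  // mock_*.go
  /\.generated\.[jt]sx?$/,
  /_pb2(_grpc)?\.py$/,
  /\.pb\.(cc|h)$/,
  /\.(g|freezed)\.dart$/,
];

function isGeneratedPath(filePath: string): boolean {
  return GENERATED_PATTERNS.some((re) => re.test(filePath));
}

interface Hit {
  name: string;
  filePath: string;
  score: number;
}

// Score decides first; among equal scores, hand-written files come before
// generated ones. Array.prototype.sort is stable, so hits that tie on both
// keys keep their incoming order.
function rankHits(hits: Hit[]): Hit[] {
  return hits.slice().sort((a, b) => {
    if (b.score !== a.score) return b.score - a.score;
    return Number(isGeneratedPath(a.filePath)) - Number(isGeneratedPath(b.filePath));
  });
}
```

Because the generated flag is only a tiebreaker, a generated symbol with a strictly better score still ranks first in this sketch.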
Tests:
- New `__tests__/generated-detection.test.ts` (4 unit tests) pins the
suffix patterns.
- New "Go gRPC stub→impl synthesis" integration test suite in
`frameworks-integration.test.ts` (2 tests): positive bridge from stub
to hand-written impl, AND the precision case (don't bridge to a
generated sibling like `msgClient` in the same .pb.go).
- Full suite: 1076/1076 pass.
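The bridge rule those two integration tests exercise (positive match, plus no edge to a generated sibling) can be sketched as follows. The types `StructInfo` and `Edge` and the function `stubImplEdges` are hypothetical stand-ins; the actual `goGrpcStubImplEdges` operates on the graph store and may apply further filters.

```typescript
// Hypothetical in-memory model of indexed Go structs.
interface StructInfo {
  name: string;
  generated: boolean; // true when the defining file matches a generated pattern
  methods: string[];
}

interface Edge {
  kind: "calls";
  from: string;
  to: string;
}

// gRPC marker methods that are not RPCs.
const MARKER = /^(mustEmbed|testEmbeddedByValue)/;

function stubImplEdges(structs: StructInfo[]): Edge[] {
  const edges: Edge[] = [];
  const stubs = structs.filter(
    (s) => s.generated && /^Unimplemented\w+Server$/.test(s.name),
  );
  const impls = structs.filter((s) => !s.generated);

  for (const stub of stubs) {
    const rpcMethods = stub.methods.filter((m) => !MARKER.test(m));
    if (rpcMethods.length === 0) continue;
    for (const impl of impls) {
      const have = new Set(impl.methods);
      // Structural typing: the impl qualifies only if its method-name set
      // is a superset of the stub's RPC methods.
      if (!rpcMethods.every((m) => have.has(m))) continue;
      for (const m of rpcMethods) {
        edges.push({ kind: "calls", from: `${stub.name}::${m}`, to: `${impl.name}::${m}` });
      }
    }
  }
  return edges;
}
```

Restricting candidates to non-generated structs is what keeps `msgClient` and mock files out of the result in this sketch.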
Empirical (post-fix, n=2 average per question):
| Repo / Q                  | WITH  | WITHOUT | Reads (W/WO) | Time (W/WO) |
|---------------------------|-------|---------|--------------|-------------|
| cobra (parse cmds)        | $0.27 | $0.27   | 0 / 4        | 39s / 60s   |
| prometheus (scrape→TSDB)  | $0.63 | $0.70   | 0 / 6        | 106s / 143s |
| cosmos-sdk Q1 (MsgSend)   | $0.41 | $0.26   | 1 / 2        | 67s / 64s   |
| cosmos-sdk Q2 (Delegate)  | $0.47 | $0.46   | 0 / 5        | 50s / 73s   |
| cosmos-sdk Q3 (gov tally) | $0.34 | $0.31   | 1.5 / 3      | 54s / 76s   |
| etcd Q1 (Put→raft)        | $0.65 | $0.78   | 0 / 4        | 98s / 129s  |
| etcd Q2 (watch)           | $0.36 | $0.50   | 0 / 4+       | 58s / 89s   |
Codegraph wins on reads on every question and on time on all but
cosmos-sdk Q1 (67s vs 64s). Cost is mixed: 3 clean wins, 3 tied (within
10%), 1 stubborn cost loss on the grep-favored cosmos-sdk Q1.
Compared to baseline, the cosmos-sdk cost-gap collapsed from -60% to -15%
on average, and Q3 went from a 75% loss to a tie. Raw run artifacts in
`/tmp/cg-finalv2-*/` and `/tmp/cg-final-*/`.
Memory written at `project_go_multi_module_audit.md` for the methodology
+ before/after numbers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a codegraph_context task contains a flow keyword ("trace", "from",
"reach", "flow", "propagat", "how does", "how do") AND at least two
distinct PascalCase / camelCase identifiers, internally invoke trace
between the first two extracted symbols and splice the trace body into
the context response. Conservative trigger by design: false positives
waste one graph query; false negatives just fall back to the agent
calling trace itself (existing path-proximity wiring handles either
case).
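A minimal sketch of that trigger, assuming a simple substring check for the flow keywords and a mixed-case regex for identifiers. The names `FLOW_KEYWORDS`, `IDENT`, and `extractFlowEndpoints` are hypothetical, and the regex is only one way to recognize PascalCase / camelCase; the shipped detector may tokenize differently.

```typescript
// Keywords taken from the description above; matched case-insensitively.
const FLOW_KEYWORDS = ["trace", "from", "reach", "flow", "propagat", "how does", "how do"];

// Assumption: an identifier counts only if it has an internal capital
// ("MsgSend", "setBalance"), so a sentence-initial "How" is not a symbol.
const IDENT = /\b[A-Za-z][a-z0-9]+(?:[A-Z][A-Za-z0-9]*)+\b/g;

// Returns the first two distinct identifiers when the task looks like a
// flow query, or null when the trigger should not fire.
function extractFlowEndpoints(task: string): [string, string] | null {
  const lower = task.toLowerCase();
  if (!FLOW_KEYWORDS.some((k) => lower.indexOf(k) !== -1)) return null;

  const distinct = Array.from(new Set(task.match(IDENT) || []));
  if (distinct.length < 2) return null;
  return [distinct[0], distinct[1]];
}
```

With this regex, "How does cobra parse commands and flags?" yields no identifiers, which matches the no-fire case noted for cobra.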
Goal: collapse the agent's typical context → trace → explore sequence
into a single context call for clear flow queries, closing the
remaining cost-overhead gap on multi-call patterns. The path-proximity
+ less-canonical-path scoring + the trace-failure-inlined-bodies
behavior already let the inline trace land on the right endpoint pair
and return enough material that no follow-up codegraph_node/Read is
needed.
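The endpoint-pair scoring this relies on can be illustrated roughly as below. The function names and the penalty weight of 10 are assumptions chosen so that a less-canonical path loses even with a deeper shared prefix; the source states only the directory list, the shared-prefix criterion, and the 20-pair probe cap.

```typescript
const LESS_CANONICAL = new Set([
  "enterprise", "contrib", "examples", "vendor", "third_party", "deprecated", "legacy",
]);

// Hypothetical weight: large enough to outweigh any realistic prefix depth.
const PENALTY_WEIGHT = 10;

// Number of leading directory segments two file paths have in common.
function sharedDirDepth(a: string, b: string): number {
  const da = a.split("/").slice(0, -1);
  const db = b.split("/").slice(0, -1);
  let n = 0;
  while (n < da.length && n < db.length && da[n] === db[n]) n++;
  return n;
}

function isLessCanonical(path: string): boolean {
  return path.split("/").some((seg) => LESS_CANONICAL.has(seg));
}

// Scores every from×to combination and returns the best pairs first,
// truncated to the probe budget.
function rankPairs(from: string[], to: string[], maxPairs = 20): Array<[string, string]> {
  const scored: Array<{ pair: [string, string]; score: number }> = [];
  for (const f of from) {
    for (const t of to) {
      const penalty = Number(isLessCanonical(f)) + Number(isLessCanonical(t));
      scored.push({ pair: [f, t], score: sharedDirDepth(f, t) - PENALTY_WEIGHT * penalty });
    }
  }
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, maxPairs).map((s) => s.pair);
}
```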
Doesn't fire on:
- cobra's "How does cobra parse commands and flags?" (no PascalCase
symbols) — verified in regression run, no behavior change ($0.260
WITH vs $0.257 WITHOUT, basically tied)
- queries where the agent doesn't call codegraph_context at all
(cosmos Q1 in the audit went search → trace → node → trace → node)
Tests: 1076/1076 still pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-out

The cosmos-Q1 audit revealed a static-resolution gap: msgServer.Send's *real* next hop is `k.Keeper.SendCoins`, an interface-method call on an embedded field that tree-sitter can't resolve. The static getCallees list for msgServer.Send is all utility/error functions (StringToBytes, Wrapf, …). The actual flow (SendCoins → subUnlockedCoins → addCoins → setBalance) lives entirely inside `x/bank/keeper/send.go`, which is also where the TO endpoint (setBalance) lives.

When trace fails (no static path), inline the **top 5 functions/methods in the destination file**, ordered by line-distance from the TO node. This catches the flow that interface-method calls obscure: the canonical "k.<Iface>.<Method>" pattern in Go, also relevant to Java dependency-injection / Rails service-object dispatch / etc. where interface dispatch hides the real call.

Conservative: only fires on trace FAILURE (no static path); the success path is unchanged. Per-body cap (40 lines / 1200 chars), top 5 siblings. Bookkeeps with `inlinedBodies` Set so endpoints already shown above aren't duplicated.

Result: cosmos-Q1, historically the most stubborn cost loss (-2.2× to -39% across the audit), flipped to a clean WIN: $0.257 WITH vs $0.449 WITHOUT (-43%), 34s vs 79s, 0 Reads vs 2 Reads + 5 Greps, 5 codegraph calls vs 12.

Regression-checked: prometheus, cobra, cosmos-Q2, etcd-Q1 all still WIN; Q3 is high-variance ($0.30-$0.45 range historically) and fell within that on this run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
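The sibling selection described in this commit could look roughly like the sketch below. `FnNode`, `capBody`, and `pickSiblings` are hypothetical names; only the caps (40 lines / 1200 chars, 5 siblings), the line-distance ordering, and the `inlinedBodies` de-duplication come from the commit message.

```typescript
// Hypothetical shape of an indexed function or method.
interface FnNode {
  id: string;
  filePath: string;
  startLine: number;
  body: string;
}

// Truncate a body to the per-body cap: line limit first, then char limit.
function capBody(body: string, maxLines = 40, maxChars = 1200): string {
  const clipped = body.split("\n").slice(0, maxLines).join("\n");
  return clipped.length > maxChars ? clipped.slice(0, maxChars) : clipped;
}

// Pick the functions in the TO node's file that sit closest to it,
// skipping anything whose body was already inlined in the response.
function pickSiblings(
  to: FnNode,
  all: FnNode[],
  inlinedBodies: Set<string>,
  limit = 5,
): FnNode[] {
  return all
    .filter((n) => n.filePath === to.filePath && !inlinedBodies.has(n.id))
    .sort(
      (a, b) =>
        Math.abs(a.startLine - to.startLine) - Math.abs(b.startLine - to.startLine),
    )
    .slice(0, limit);
}
```

In this sketch the TO endpoint is excluded only because the caller has already added its id to `inlinedBodies`.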
PR review feedback: the audit was Go-driven, so the patterns I added were Go-flavored. Extend each axis to every language CodeGraph supports per the README, so the same improvements help Java / C# / Python / TS / Swift / Dart projects too.

**generated-detection.ts**: added patterns for:
- TS/JS: `.gen.[jt]sx?`, `.pb.[jt]s`, `_pb.[jt]s`, `_grpc_pb.[jt]s` (ts-proto, gRPC-web, Apollo / GraphQL codegen, Hasura).
- Python: `_pb2.pyi` (mypy stubs from protobuf).
- C#: `.g.cs` (T4 / Razor codegen), `Grpc.cs` (protoc-gen-csharp).
- Java: `OuterClass.java` (protoc-gen-java), `Grpc.java` (protoc-gen-grpc-java; this is where the `*ImplBase` abstract class lives, the same shape as the Go `Unimplemented*Server` stub).
- Swift: `.pb.swift` (protoc-gen-swift).
- Dart: `.pb.dart`, `.pbgrpc.dart`, `.chopper.dart`.
- Rust: `.generated.rs`.

**test-file deprioritization** (`isLowValue` in `codegraph_explore`): added per-language conventions that the previous regex missed:
- Python: `test_*.py` (pytest discovery) and `*_test.py`.
- Ruby: `*_test.rb` (minitest); `*_spec.rb` was already covered.
- C#: `*Tests.cs`, `*Test.cs`, `*Spec.cs`.
- Swift: `*Tests.swift` (XCTest).
- Dart: `*_test.dart`.

**IFACE_OVERRIDE_LANGS** in `callback-synthesizer.ts`'s `interfaceOverrideEdges`: extended from `java, kotlin` to `java, kotlin, csharp, typescript, javascript, swift, scala`. Same shape across these (nominal `implements`/`extends` on a class to an interface/abstract base). Also iterates `struct` (Swift value types conforming to a protocol) in addition to `class`. The existing matchesSymbol-style logic and `getOutgoingEdges(..., ['implements', 'extends'])` work unchanged.

**CLAUDE.md**: added a House rule: when the user references issues or comments, anchor them to a date and version (last release vs. last main commit vs. current branch tip) BEFORE concluding a fix is incomplete. Issue #388 comments from May 25-27 were responding to the released v0.9.5 / merged-PR-469 state, not to this branch's in-flight work. The new rule walks through the disambiguation: `grep -m1 '^## \[' CHANGELOG.md` for release version, `git log --first-parent main -1` for main tip.

Tests: 1076/1076 still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cumulative changes targeting the small-repo cost gap surfaced by the cross-language audit:

1. **Tool descriptions trimmed** (~2.1KB total saved across 10 tools). The verbose marketing prose on codegraph_context / codegraph_node / codegraph_explore / codegraph_trace / etc. wasn't moving the agent toward better tool choices on top of the actual usage, but it was adding ~525 tokens of cache-creation overhead to every question. The trimmed descriptions keep the operational hints (e.g. "Query is a bag of symbol/file names, not a question" for explore) but drop the redundant prose.

2. **Dynamic tiny-repo tool gating** in `ToolHandler.getTools()`. On a project with < 150 indexed files, the MCP server only exposes the 5 core tools (search, context, node, explore, trace) instead of all 10. The use cases of the omitted callers/callees/impact/status/files tools reduce to one grep on a sub-150-file repo anyway. The MCP tool-defs overhead is the #1 source of cost loss on tiny repos (~$0.10-0.15 fixed cache-creation per question); cutting 5 tools drops that by ~50%.

Effect on ky (~25 files, the worst pre-fix offender):
- Before: $0.59 WITH vs $0.42 WITHOUT (+42% loss, n=1)
- After: $0.32 WITH vs $0.44 WITHOUT (-26%, **flipped to WIN**)

Effect on cobra/sinatra/slim (50-80 files): still cost-loss, but the gating doesn't regress them (same call-count, same reads). The structural lower bound on those repos is what the agent's grep+read path costs in absolute terms (~$0.20-0.30).

Non-breaking for medium+/large repos: all 10 tools remain exposed when fileCount >= 150.

Tests: 1076/1076 still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
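A minimal sketch of the gating rule, assuming tools are identified by name and the indexed file count is passed in. The `codegraph_callers` / `codegraph_callees` / `codegraph_impact` / `codegraph_status` / `codegraph_files` names are inferred from the short names in the commit message, and the standalone `getTools` function is a hypothetical simplification of `ToolHandler.getTools()`.

```typescript
// Threshold as of this commit; a later commit on the branch raises it.
const TINY_REPO_FILE_THRESHOLD = 150;

const CORE_TOOLS = [
  "codegraph_search",
  "codegraph_context",
  "codegraph_node",
  "codegraph_explore",
  "codegraph_trace",
];

// Assumed full names for the five tools omitted on tiny repos.
const EXTENDED_TOOLS = [
  "codegraph_callers",
  "codegraph_callees",
  "codegraph_impact",
  "codegraph_status",
  "codegraph_files",
];

// Tiny repos see only the core set, which trims the fixed tool-definition
// overhead paid on every question; larger repos keep all ten.
function getTools(indexedFileCount: number): string[] {
  if (indexedFileCount < TINY_REPO_FILE_THRESHOLD) return CORE_TOOLS.slice();
  return CORE_TOOLS.concat(EXTENDED_TOOLS);
}
```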
…ky flip to WIN)

Combines the tool gating from the previous commit with a matching explore-budget cut for projects under 150 files. The two together close the cost gap that neither closes alone:
- Tool gating alone helped ky (WIN) but didn't move cobra/slim/sinatra
- Explore-budget cut alone helped slim slightly but regressed cobra
- COMBINED: cobra flips to WIN, ky stays a WIN, ky/cobra both clean

`getExploreOutputBudget(fileCount < 150)` returns:
- `maxOutputChars`: 13000 (was 18000)
- `defaultMaxFiles`: 4 (was 5)
- `gapThreshold`: 7 (was 8)
- `maxSymbolsInFileHeader`: 5 (was 6)
- `maxEdgesPerRelationshipKind`: 4 (was 6)
- `includeRelationships`: true (kept ON, cheap structural signal)
- `maxCharsPerFile`: 3800 (unchanged, monotonic invariant w/ next tier)

This survives the cobra-regression-with-trim that the earlier budget-only attempt suffered: with only 5 tools to choose from, the agent doesn't fall back to extra codegraph_node calls when explore returns less, because there's no node call available.

Results on the four worst small-repo losses (combined intervention):

| Repo    | Files | WITH (combo) | WITHOUT | Verdict (pre → post)  |
|---------|-------|--------------|---------|-----------------------|
| cobra   | ~50   | $0.25        | $0.31   | loss → **WIN** (-19%) |
| ky      | ~25   | $0.39        | $0.39   | -42% → tied           |
| slim    | ~80   | $0.31        | $0.24   | LOSS 31% → still LOSS |
| sinatra | ~60   | $0.30        | $0.23   | LOSS 18% → still LOSS |

sinatra/slim remain a cost-loss because their WITHOUT path is structurally cheap (~$0.20, fewer than 4 cheap grep+read calls). Codegraph can't beat that absolute floor with any meaningful response. Both still WIN on time + reads + tool-call count.

Tests: tier boundary cases updated to cover the new <150 / 150-499 / 500-4999 / 5000-14999 / >=15000 progression. Off-by-one guard updated to include the new 149↔150 boundary. All 1076 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
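The new tier can be pictured with the sketch below. It covers only the two tiers whose numbers appear in this commit message: the tiny-tier values are quoted directly, while the next-tier values are an assumption built from the "was" figures. The 500 / 5000 / 15000 tiers are omitted because their budgets are not stated here, and taking a plain `fileCount` argument is also an assumption.

```typescript
interface ExploreBudget {
  maxOutputChars: number;
  defaultMaxFiles: number;
  gapThreshold: number;
  maxSymbolsInFileHeader: number;
  maxEdgesPerRelationshipKind: number;
  includeRelationships: boolean;
  maxCharsPerFile: number;
}

// Values quoted in the commit message for projects under 150 files.
const TINY_BUDGET: ExploreBudget = {
  maxOutputChars: 13000,
  defaultMaxFiles: 4,
  gapThreshold: 7,
  maxSymbolsInFileHeader: 5,
  maxEdgesPerRelationshipKind: 4,
  includeRelationships: true,
  maxCharsPerFile: 3800,
};

// Assumption: the "was" values describe the next tier up (150-499 files).
const NEXT_TIER_BUDGET: ExploreBudget = {
  maxOutputChars: 18000,
  defaultMaxFiles: 5,
  gapThreshold: 8,
  maxSymbolsInFileHeader: 6,
  maxEdgesPerRelationshipKind: 6,
  includeRelationships: true,
  maxCharsPerFile: 3800,
};

// Two-tier simplification of the real five-tier progression.
function getExploreOutputBudget(fileCount: number): ExploreBudget {
  return fileCount < 150 ? TINY_BUDGET : NEXT_TIER_BUDGET;
}
```

Keeping `maxCharsPerFile` equal across the two tiers preserves the monotonic invariant mentioned above: no field in a smaller tier exceeds the same field in the next tier.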
On a <150-file project the entire repo is grep-able in one turn, so the 20-node default `codegraph_context` was paying for a graph subset that exceeds the agent's actual question. Cutting the tiny-repo default to 8 (typical 1-3 entry points + their immediate 1-hop neighbors) reduces the context-tool response body without compromising sufficiency on the flow shapes small repos actually contain.

Non-breaking: the agent can still pass an explicit `maxNodes` to override; medium+ repos (>=150 files) keep the 20-node default.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
n=2 audit on cobra/ky/sinatra ruled out cutting below 5 tools (search + context + node + explore + trace) on the tiny-repo tier. The smaller 3-tool gate (search + context + trace) saved ~$0.025 of prompt overhead, but the agent fell back to extra Reads to cover what codegraph_node and codegraph_explore would have answered: a net cost regression on all three test repos (cobra 17% → 48% loss, sinatra 18% → 96% loss).

Documented inline so future tuners don't re-try this dead-end. No behavior change beyond the comment: the 5-tool gate remains the production setting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tested the hypothesis that exposing FEWER tools on micro repos (<50 files) would close the cost gap. Results for the 1-tool gate (codegraph_search only):
- ky: +44% (worse than 5-tool +30%)
- express: +107% (catastrophic; was -43% WIN with all 10)
- cobra: +126% (way worse than 5-tool +17%)

The single-tool gate forces the agent to read everything because it can't navigate the call graph. The 4 omitted core tools (context, node, explore, trace) were doing real work that grep+Read can't replicate.

Conclusion: 5 tools (search + context + node + explore + trace) is the empirical lower bound on the tiny-repo tier. Cutting below regresses EVERY tested repo. The remaining ~$0.04-0.08 of structural cost overhead on tiny repos is unavoidable without sacrificing the value codegraph provides at that scale (which would also make WITH = WITHOUT, defeating the install).

Comment documents the dead-ends so future tuners don't relitigate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… in context, hard-exclude low-value files Three layered changes targeting the sinatra/slim/small-repo cost gap that iter2's body-shrink failed to close (smaller bodies just pushed the agent to Read instead): 1. **Tool-gate threshold 150 → 500** (`TINY_REPO_FILE_THRESHOLD`). Sinatra (~159 files) and slim (~200 files) have the same structural problem as cobra (
…siblings in search ranking

On projects with a single file holding the dense majority of internal call edges (e.g. sinatra's `lib/sinatra/base.rb` at ~85% of in-file edges), text search was favoring small focused extension files over the core file. A small focused file like `multi_route.rb` wins on verbatim name match + file-size normalization, burying the 1500-line core file's longer method names (e.g. `route!` vs `route`).

Fix: detect the "dominant file" (the file whose in-file edge count is ≥3× the next candidate's), then add +25 to all results sharing its directory prefix. This pulls the core file's siblings above sibling-package extensions without hardcoding any repo structure.

`getDominantFile()` excludes test/spec files and generated files (e.g. etcd's `rpc.pb.go` has 4× the in-file edges of `server.go` and would otherwise hijack the boost toward generated protobuf stubs). SQL pulls the top 20 candidates; path-pattern filtering handles what SQLite LIKE can't express.
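The detection and boost can be sketched in memory as follows. The signature of `getDominantFile` here is a hypothetical simplification (the real one reads candidates via SQL), `boostCoreDirectory` and the exclusion callback are illustrative names, and treating a lone candidate as dominant is an assumption.

```typescript
interface FileEdgeCount {
  filePath: string;
  inFileEdges: number;
}

interface SearchResult {
  filePath: string;
  score: number;
}

// Returns the file whose in-file edge count is at least 3x the runner-up's,
// after dropping test/spec/generated candidates; null if nothing dominates.
function getDominantFile(
  candidates: FileEdgeCount[],
  isExcluded: (filePath: string) => boolean,
): string | null {
  const ranked = candidates
    .filter((c) => !isExcluded(c.filePath))
    .sort((a, b) => b.inFileEdges - a.inFileEdges);
  if (ranked.length === 0) return null;
  if (ranked.length > 1 && ranked[0].inFileEdges < 3 * ranked[1].inFileEdges) return null;
  return ranked[0].filePath;
}

// Adds +25 to every result that lives under the dominant file's directory.
function boostCoreDirectory(results: SearchResult[], dominant: string | null): SearchResult[] {
  if (dominant === null || dominant.lastIndexOf("/") === -1) return results;
  const dirPrefix = dominant.slice(0, dominant.lastIndexOf("/") + 1);
  return results.map((r) =>
    r.filePath.indexOf(dirPrefix) === 0 ? { filePath: r.filePath, score: r.score + 25 } : r,
  );
}
```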
On small projects (<500 files) with a routing-shaped query, build a URL→handler manifest directly from the graph (each `route` node joins to its handler via `references`/`calls` edges) and inline the top handler file's source. The agent gets the canonical routing answer in ONE codegraph_context call, with no need to parse framework DSL, Glob for controllers, or chase down handler files.

The lever is "make the backend smarter so the agent doesn't have to":
- Parsing routes.rb / routes/api.php / urls.py DSL is the agent's job in the WITHOUT arm. Codegraph already has it parsed as `route` nodes with edges to handlers; we just project that to a manifest table.
- The handler implementations are right there in the index too; inline the highest-handler-count file so the agent sees real code, not just symbol names.

Results on the realworld template repos that were losing badly:
- rails-rw: +89% LOSS → -15% WIN (agent often answers with 0-1 tool calls)
- laravel-rw: +29% LOSS → +12% (tight gap)
- gin-rw: +30% LOSS → +23% (still loss but smaller)
- flask-mb: +64% LOSS → +25% (smaller gap)

The residual losses are mostly the agent's defensive read behavior on super-cheap-WITHOUT repos (express-rw still does 4 Reads even with a 19-row manifest + service file inlined). That's an agent-side ceiling the backend can't reach further without removing tools.

Also lands `scripts/agent-eval/probe-sweep.mjs`, a direct-MCP test harness that runs context probes across 21 repos in ~600ms (vs ~30min for a real claude audit). Enables rapid iteration on backend changes: edit tools.ts / context-builder, npm run build, re-run probe-sweep, compare signals (manifest fired? handler file inlined? response size?) before paying for a claude run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
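As a rough illustration of the manifest projection, the sketch below joins `route` nodes to handlers over `references`/`calls` edges and picks the file with the most handlers. All type and function names are hypothetical, and the edge direction (route → handler) is an assumption; the real builder works against the graph database.

```typescript
interface RouteNode { id: string; method: string; path: string; }
interface HandlerNode { id: string; name: string; filePath: string; }
interface GraphEdge { from: string; to: string; kind: string; }
interface ManifestRow { method: string; path: string; handler: string; filePath: string; }

function buildRouteManifest(
  routes: RouteNode[],
  handlers: HandlerNode[],
  edges: GraphEdge[],
): { rows: ManifestRow[]; topHandlerFile: string | null } {
  const handlerById = new Map<string, HandlerNode>();
  for (const h of handlers) handlerById.set(h.id, h);

  const rows: ManifestRow[] = [];
  const handlersPerFile = new Map<string, number>();

  for (const route of routes) {
    // First outgoing references/calls edge that lands on a known handler.
    const edge = edges.find(
      (e) =>
        e.from === route.id &&
        (e.kind === "references" || e.kind === "calls") &&
        handlerById.has(e.to),
    );
    if (!edge) continue;
    const handler = handlerById.get(edge.to)!;
    rows.push({ method: route.method, path: route.path, handler: handler.name, filePath: handler.filePath });
    handlersPerFile.set(handler.filePath, (handlersPerFile.get(handler.filePath) || 0) + 1);
  }

  // The file holding the most route handlers is the one worth inlining.
  let topHandlerFile: string | null = null;
  let best = 0;
  for (const [file, count] of Array.from(handlersPerFile)) {
    if (count > best) {
      best = count;
      topHandlerFile = file;
    }
  }
  return { rows, topHandlerFile };
}
```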
…eted files)

`MCPEngine.catchUpSync()` reconciles the index against the working tree after open (catching `git pull`/`checkout`/`rebase` and any edits or deletes made while no server was running). It was fire-and-forget, so a tool call landing in the first ~50-300ms could race past it and serve rows for files that no longer exist on disk. The per-file staleness banner can't help here, because that signal is populated by the file watcher (not by catch-up).

The fix: `catchUpSync()` now pushes its promise into `ToolHandler` via `setCatchUpGate(p)`; the first `execute()` call awaits the gate and then clears it. Subsequent calls pay nothing. Catch-up rejections are logged by the engine and swallowed by the handler so a transient sync failure never breaks tools.

Most visible on the "deleted everything between sessions" case, where MCP previously returned stale rows pointing at non-existent files. Validated end-to-end on a 10,640-file VS Code index: with the gate, a codegraph_search for "ExtensionHost" against an empty (but stale-DB) directory returns "No results found" after the catch-up drains the DB; without the gate, the same call returns 10 stale hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
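The gate pattern can be sketched as below. Only `setCatchUpGate` and `execute` are names from the commit message; the class body, the `run` callback parameter, and the choice to neutralize rejections when the gate is stored are assumptions made for a self-contained example.

```typescript
class ToolHandler {
  private catchUpGate: Promise<void> | null = null;

  // Called by the engine right after it starts the catch-up sync.
  setCatchUpGate(p: Promise<unknown>): void {
    // Swallow rejections here so a failed sync can never break a tool call;
    // in the real system the engine logs the failure itself.
    this.catchUpGate = p.then(
      () => undefined,
      () => undefined,
    );
  }

  // `run` stands in for the actual tool dispatch.
  async execute(run: () => Promise<string>): Promise<string> {
    const gate = this.catchUpGate;
    if (gate !== null) {
      await gate;
      // Clear only after the await, so calls that arrive while the sync is
      // still running also wait; later calls skip the gate entirely.
      if (this.catchUpGate === gate) this.catchUpGate = null;
    }
    return run();
  }
}
```

Clearing the gate after the await rather than before is a deliberate choice in this sketch: clearing first would let a second concurrent call slip past an unfinished sync.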
…ce-override expansion Add entries for work that landed on this branch but wasn't yet in [Unreleased]: tiny-repo tool gating + sufficiency steering + budget tier, auto-inline trace in codegraph_context, routing manifest inline, core-directory ranking boost, JVM-only interfaceOverrideEdges extended to C#/TS/JS/Swift/Scala, and the shorter tool descriptions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two related streams that landed together on this branch:
1. Multi-module Go trace quality
Driven by an 8-question agent-eval audit (cobra, gin, prometheus, cosmos-sdk, etcd). The empirical gate ruled out `go.work` parsing as the real gap (prometheus crushes without it). Actual failure modes + their fixes:
- `codegraph_search "Send"` on cosmos-sdk returned the gRPC stub at `tx_grpc.pb.go:124` first; trace landed on the empty stub and the agent fell back to Read. Fix: `src/extraction/generated-detection.ts`, a path-pattern classifier for `.pb.go`, `.pulsar.go`, `_grpc.pb.go`, `_mock.go`, `_mocks.go`, `mock_*.go`, `.generated.[jt]sx?`, `_pb2(_grpc)?.py`, `.pb.{cc,h}`, `.g.dart`, `.freezed.dart`. Applied as a stable sort tiebreaker in `findSymbol`, `findAllSymbols`, `codegraph_search` (MCP + CLI), `codegraph_explore` file ranking, and context formatter Entry Points / Related Symbols / Code blocks.
- … `interfaceOverrideEdges` (Java/Kotlin only) doesn't apply. Fix: `goGrpcStubImplEdges` synthesizer in `callback-synthesizer.ts`: detects `UnimplementedXxxServer` structs in generated files, identifies RPC methods (excluding `mustEmbed*` / `testEmbeddedByValue`), emits `calls` edges to matching methods on any non-generated struct whose method-name set is a superset. 467 bridge edges on cosmos-sdk; bank's `UnimplementedMsgServer::Send` points to `msg_server.go` only, not to `msgClient` siblings or mocks.
- … `node`→`search`→`node`→Read fan-out.
- `EndBlocker` exists in 20+ modules. Fix: score every `from`×`to` combo by shared directory prefix length (full candidate set, not just FTS top-5), with a less-canonical-path penalty (`enterprise/`, `contrib/`, `examples/`, `vendor/`, `third_party/`, `deprecated/`, `legacy/`) so the canonical-module pair wins. FindPath probe budget capped at 20.
- `codegraph_explore` `isLowValue`: adds Go's `_test.go`, Ruby's `_spec.rb`, JS/TS `.test.ts` / `.spec.tsx`, JVM `*Test.java` / `*Spec.kt`. Without this, etcd's `watchable_store_test.go` consumed 5K chars of explore budget.

2. Small-repo retrieval tuning (<500 files)
The micro-repo tier had its own failure mode: lots of small MCP calls cost more in cache-write tokens than the repo is worth. Three coordinated changes:
- … `search`/`context`/`node`/`explore`/`trace`). Empirically validated as the floor: 3-tool gate regressed cobra/ky/sinatra, 1-tool gate catastrophically regressed express (+107% LOSS).
- `codegraph_context` responses on sub-500 projects end with a strong directive telling the agent the response IS the comprehensive pass; follow-ups should be narrow (trace from→to, single-symbol `node`), not another broad `explore`.
- `maxNodes` defaults to 8 instead of 20 on sub-150 context calls.

3. Other improvements that landed alongside
- … `codegraph_context` when the task looks like "how does X reach Y": runs the trace internally and splices its body in. Conservative detection (flow keyword + ≥2 PascalCase/camelCase identifiers). Saves the #2 cost-driver follow-up call on multi-module flow questions.
- … `route` nodes + their `references`/`calls` edges, plus the top handler file's source. Beats the Glob+Read pattern that was winning on realworld template repos (rails-realworld, laravel-realworld, drupal-admintoolbar).
- … `base.rb` at ~85%) now boost search results in that directory by +25 score, so the core file's siblings outrank sibling-package extensions. Generated/test files excluded from "dominant file" candidacy.
- `interfaceOverrideEdges` extended beyond JVM: Java/Kotlin → also C#, TypeScript, JavaScript, Swift, Scala. Swift conformance iterates `struct` nodes too.
- `cg.sync()` was fire-and-forget; first tool call now awaits it so files deleted/edited while no server was running can't produce stale rows (per-file staleness banner can't help, since that signal is watcher-populated). Subsequent calls pay nothing.
- `codegraph_*` descriptions condensed (~50% shorter); load-bearing steering stays in `server-instructions.ts`.

Empirical results
`docs/benchmarks/call-sequence-analysis.md` and the per-arm harness in `scripts/agent-eval/` track the numbers. Headline cosmos-sdk + etcd table (n=2 per question, headless):

Codegraph wins on reads across every question and on time across all but cosmos Q1 (67s vs 64s). Cost is 3 clean wins, 3 within-10% ties, and 1 stubborn loss on cosmos Q1 (a grep-favored question where the WITHOUT path is structurally short). Cosmos-sdk cost gap collapsed from -60% avg to -15% avg vs baseline; Q3 went from 75% loss to a tie.
Test plan
- `npm test`: 1081 passed (50 files), including new `__tests__/generated-detection.test.ts` (4 cases pinning the suffix contract), `__tests__/mcp-catchup-gate.test.ts` (5 cases for the gate behavior + drop-after-first-await), Go gRPC stub-impl synthesis cases in `__tests__/frameworks-integration.test.ts`, and the updated `__tests__/explore-output-budget.test.ts` covering the new `<150` tier
- `npm run build` clean
- … `go.work` repo, different from cosmos)
- … `UnimplementedMsgServer::Send` → `msgServer::Send`, no mock/client false positives

🤖 Generated with Claude Code