test(routing): real-world integration report against forgecode CLI by Duy-Nguyen-2006 · Pull Request #3 · Duy-Nguyen-2006/ForgeKit

Duy-Nguyen-2006 · 2026-05-12T14:13:07Z

Summary

Verifies ForgeKit routing end-to-end against the real forge CLI (forgecode@2.12.14, model cx/gpt-5.5 via openai-compatible provider at api.trannhatcse.tokyo/v1), on real public repositories with concrete tasks, instead of generated model fixtures.

Adds tests/integration/:

real-world-tasks.json — 20 hand-written prompts tied to real repos (expressjs/express, vercel/next.js, prisma/prisma, stripe/stripe-node, microsoft/playwright, shadcn-ui/ui, …).
run-real-tasks.cjs — scores them with the deterministic router.
real-world-tasks-results.json — captured results.
README.md — full methodology and findings.

Results

Deterministic router (scripts/route-intent.cjs)

Existing fixtures: 98/100 (98.0%) — unchanged baseline.
Hand-written real-repo prompts: 19/20 (95.0%).

End-to-end via forge -p ':ck:auto ...' against real cloned repos: 5/5 expected behaviours.

#	Repo	Task (translated from Vietnamese)	Action	Primary	Conf	Verdict
1	expressjs/express	Create REST API endpoint POST /api/products with middleware that validates the request body	route	backend-development	1.00	PASS
2	expressjs/express	Write unit tests with Jest for lib/router/index.js, increase test coverage	route	test	1.00	PASS
3	expressjs/express	Write playwright tests for login page at tests/login.spec.ts	disambiguate	auth vs web-testing	0.50	PASS (asked, didn't mis-route)
4	expressjs/express	Add Google OAuth2 login with JWT session management	route	auth	1.00	PASS
5	shadcn-ui/ui	Design a beautiful, responsive landing page with dark mode for a coffee shop	route	ui-ux-pro-max	1.00	PASS
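Rows 1, 2, 4 and 5 resolved to a single skill, while row 3 came back as `disambiguate`. A minimal sketch of that kind of decision, assuming the router compares the gap between the top two skill scores against the 0.15 threshold quoted in the integration README excerpt. The function name `decide`, the [0, 1] score scale, and the return shape are hypothetical, not ForgeKit's API.

```javascript
// Sketch of a route-vs-disambiguate decision over per-skill scores.
// ASSUMPTIONS: scores are normalized to [0, 1]; the 0.15 gap threshold comes
// from the integration README; decide() and its return shape are hypothetical.
const GAP_THRESHOLD = 0.15;

function decide(scores, gapThreshold = GAP_THRESHOLD) {
  // Rank skills by score, highest first.
  const ranked = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  if (ranked.length === 0) return { action: 'none' };

  const [top, second] = ranked;
  if (second && top[1] - second[1] < gapThreshold) {
    // Top two skills are too close to call: ask the user instead of guessing.
    return { action: 'disambiguate', candidates: [top[0], second[0]] };
  }
  return { action: 'route', primary: top[0] };
}

// Hypothetical scores, chosen only to exercise both branches.
console.log(decide({ auth: 0.5, 'web-testing': 0.45 }));
// { action: 'disambiguate', candidates: [ 'auth', 'web-testing' ] }
console.log(decide({ 'backend-development': 1.0, test: 0.2 }));
// { action: 'route', primary: 'backend-development' }
```

Returning both candidates (rather than just the top one) is what lets the orchestrator ask a targeted question such as "auth or web-testing?".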

All routing decisions were written to .forgekit/route-log.jsonl with the intent hashed (no raw prompt text on disk); this was verified.
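A minimal sketch of logging a routing decision as JSON Lines while keeping only a digest of the intent. It assumes a SHA-256 hex digest and a hypothetical record shape (`ts`, `intentHash`, plus the decision fields); ForgeKit's actual hashing scheme and log fields are not shown in this PR.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical record shape; the real route-log.jsonl fields may differ.
function makeLogRecord(intent, decision) {
  // Keep only a SHA-256 hex digest of the prompt, never the raw text.
  const intentHash = crypto.createHash('sha256').update(intent, 'utf8').digest('hex');
  return { ts: new Date().toISOString(), intentHash, ...decision };
}

function appendRouteLog(logPath, intent, decision) {
  fs.mkdirSync(path.dirname(logPath), { recursive: true });
  // JSON Lines: one JSON object per line, appended.
  fs.appendFileSync(logPath, JSON.stringify(makeLogRecord(intent, decision)) + '\n');
}

// Usage against a throwaway directory instead of a real project.
const projectDir = fs.mkdtempSync(path.join(os.tmpdir(), 'forgekit-'));
const logPath = path.join(projectDir, '.forgekit', 'route-log.jsonl');
appendRouteLog(logPath, 'Add Google OAuth2 login with JWT session management', {
  action: 'route',
  primary: 'auth',
});
const line = fs.readFileSync(logPath, 'utf8').trim();
console.log(line.includes('OAuth2')); // false: the raw prompt never reaches disk
```

Because the digest is unsalted, identical prompts produce identical hashes. That is handy for counting repeats, but it also lets anyone holding a candidate prompt confirm it was logged; a keyed HMAC (`crypto.createHmac`) would close that gap.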

Issues found (documented, not fixed in this PR)

B1: the installer ships an incomplete MCP runtime. bin/lgmmo-forgekit-installer.js does not copy mcp-server/, does not write a real project-root .mcp.json (only .mcp.json.example inside .forge/), and does not pull in @modelcontextprotocol/sdk. After npx lgmmo-forgekit-installer, ForgeCode has no MCP router, and the "MANDATORY FIRST ACTION: call route_intent" instruction silently degrades.
B2: login verb collision (fixture #34 and the microsoft/playwright real-repo case). auth.verbs contains the bare word "login", so testing prompts that mention a login page are hijacked by auth.
B3: the compound prompt "Deploy và scan security" ("deploy and scan security") routes to security-scan instead of deploy, because the verbs and nouns of both skills match and the noun bonus tips the score toward security-scan.
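For B2, one possible direction is to stop counting a bare `login` as an auth verb when it is immediately followed by a UI or test noun. A minimal sketch, using a hypothetical verb table with per-verb `unless` follower lists; this is an illustration, not ForgeKit's routing-table format.

```javascript
// Hypothetical verb entries: each may list follower words that cancel the
// match, so "login page" or "login.spec" stop counting as an auth verb.
const AUTH_VERBS = [
  { word: 'login', unless: ['page', 'form', 'screen', 'spec'] },
  { word: 'sign in', unless: [] },
];

// Escape regex metacharacters so verbs are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function countVerbHits(prompt, verbs) {
  const text = prompt.toLowerCase();
  let hits = 0;
  for (const { word, unless } of verbs) {
    // Whole-word match, optionally capturing the next word after a space,
    // dot, slash, underscore or hyphen (e.g. "login page", "login.spec").
    const re = new RegExp(`\\b${escapeRegExp(word)}\\b(?:[\\s./_-]+(\\w+))?`, 'g');
    for (const m of text.matchAll(re)) {
      if (!unless.includes(m[1])) hits += 1;
    }
  }
  return hits;
}

console.log(countVerbHits('Write playwright tests for login page at tests/login.spec.ts', AUTH_VERBS)); // 0
console.log(countVerbHits('Add Google OAuth2 login with JWT session management', AUTH_VERBS)); // 1
```

This only removes the spurious auth hit; whether the prompt then lands on web-testing still depends on the rest of the scoring, and the compound intent in B3 would need its own rule.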

See tests/integration/README.md for full reproduction details and suggested fixes.

Test plan

npm run test:routing — baseline 98/100
node tests/integration/run-real-tasks.cjs — 19/20 on hand-written prompts
5 live forge -p runs against cloned express/ui repos with MCP wired manually
(Follow-up) fix B1 in installer
(Follow-up) fix B2/B3 in routing table
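For the B1 follow-up, the missing installer step might look roughly like the sketch below. The server name `forgekit`, the entry file `mcp-server/index.js`, and the `mcpServers` layout of `.mcp.json` are assumptions for illustration and should be checked against the shipped `.mcp.json.example`; installing `@modelcontextprotocol/sdk` is left to the package's dependencies.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical installer step for B1: copy mcp-server/ into the project and
// write a real project-root .mcp.json. Names and paths are assumptions.
function installMcpRuntime(packageRoot, projectRoot) {
  // fs.cpSync (Node >= 16.7) copies the directory tree recursively.
  fs.cpSync(path.join(packageRoot, 'mcp-server'), path.join(projectRoot, 'mcp-server'), {
    recursive: true,
  });

  const mcpJsonPath = path.join(projectRoot, '.mcp.json');
  // Never clobber a user's existing MCP configuration.
  if (!fs.existsSync(mcpJsonPath)) {
    const config = {
      mcpServers: {
        forgekit: { command: 'node', args: ['mcp-server/index.js'] },
      },
    };
    fs.writeFileSync(mcpJsonPath, JSON.stringify(config, null, 2) + '\n');
  }
  // @modelcontextprotocol/sdk must still be installed separately (for
  // example as a declared dependency); this sketch does not run npm.
  return mcpJsonPath;
}

// Usage against throwaway directories.
const pkg = fs.mkdtempSync(path.join(os.tmpdir(), 'pkg-'));
fs.mkdirSync(path.join(pkg, 'mcp-server'));
fs.writeFileSync(path.join(pkg, 'mcp-server', 'index.js'), '// server entry\n');
const proj = fs.mkdtempSync(path.join(os.tmpdir(), 'proj-'));
const written = installMcpRuntime(pkg, proj);
console.log(JSON.parse(fs.readFileSync(written, 'utf8')).mcpServers.forgekit.command); // node
```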

Generated by Claude Code

Tested the deterministic router and end-to-end MCP routing through the actual `forge` CLI (forgecode@2.12.14, model cx/gpt-5.5 via openai-compatible provider) on real repositories (expressjs/express, shadcn-ui/ui) with concrete tasks. Results, runner, and findings live under tests/integration/.

Key findings:

- Deterministic router: 19/20 (95%) on 20 hand-written real-repo prompts; the one miss reproduces existing fixture #34 (login → auth vs test).
- End-to-end via `forge -p ':ck:auto ...'`: 5/5 expected behaviours, including the correct `disambiguate` action on the ambiguous case.
- B1 (installer ships incomplete MCP runtime): mcp-server/ + .mcp.json + @modelcontextprotocol/sdk are not copied by lgmmo-forgekit-installer, so MCP routing silently fails after an `npx` install.
- B2 / B3: documented `login`-keyword collision and deploy+security compound-intent regression.

Copilot

Pull request overview

Adds a new tests/integration/ “real-world” routing evaluation bundle to validate ForgeKit’s deterministic router (and document end-to-end CLI+MCP observations) using hand-written prompts tied to public repos.

Changes:

Add a Node runner to score real-world prompts via scripts/route-intent.cjs and write a results JSON snapshot.
Add a curated prompt set (real-world-tasks.json) and a captured run output (real-world-tasks-results.json).
Add a methodology/report README documenting setup, results, and known routing/installer issues.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
tests/integration/run-real-tasks.cjs	Adds a CLI-style runner to route prompts and emit a results JSON file.
tests/integration/real-world-tasks.json	Adds 20 real-repo prompt cases with expected primary skills.
tests/integration/real-world-tasks-results.json	Commits a snapshot of routing results for the 20 cases.
tests/integration/README.md	Documents methodology, commands, results, and observed issues (B1–B3).


+const { route } = require('../../scripts/route-intent.cjs');
+const fs = require('fs');
+
+const cases = JSON.parse(fs.readFileSync('./real-world-tasks.json', 'utf8'));


+  }
+}
+console.log(`\n${pass}/${cases.length} passed (${(pass/cases.length*100).toFixed(1)}%)`);
+fs.writeFileSync('./real-world-tasks-results.json', JSON.stringify(results, null, 2));
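The fragment above reads and writes its JSON files relative to `process.cwd()`, so, unless the elided part of the script changes directory, running it as `node tests/integration/run-real-tasks.cjs` from the repository root (as in the README excerpt) looks for the fixtures in the wrong place. A minimal sketch of a cwd-independent runner that resolves paths from the script's own directory; the case shape `{ prompt, expected }` and the `{ action, primary }` return value of `route` are assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Resolve fixtures next to the script (__dirname), not process.cwd().
// ASSUMPTIONS: each case is { prompt, expected }; route(prompt) returns
// { action, primary }. Both shapes are hypothetical.
function runCases(route, dir = __dirname) {
  const cases = JSON.parse(fs.readFileSync(path.join(dir, 'real-world-tasks.json'), 'utf8'));
  let pass = 0;
  const results = cases.map((c) => {
    const r = route(c.prompt);
    const ok = r.primary === c.expected;
    if (ok) pass += 1;
    return { ...c, action: r.action, primary: r.primary, ok };
  });
  // Guard against division by zero on an empty fixture file.
  const pct = cases.length ? ((pass / cases.length) * 100).toFixed(1) : '0.0';
  console.log(`\n${pass}/${cases.length} passed (${pct}%)`);
  fs.writeFileSync(path.join(dir, 'real-world-tasks-results.json'), JSON.stringify(results, null, 2));
  return { pass, total: cases.length, results };
}

// Usage with a stub router and a throwaway fixture directory.
const fixtureDir = fs.mkdtempSync(path.join(os.tmpdir(), 'rw-'));
fs.writeFileSync(
  path.join(fixtureDir, 'real-world-tasks.json'),
  JSON.stringify([
    { prompt: 'Add Google OAuth2 login', expected: 'auth' },
    { prompt: 'Write Jest unit tests', expected: 'test' },
  ]),
);
const stubRoute = (p) => ({ action: 'route', primary: /oauth2/i.test(p) ? 'auth' : 'backend-development' });
const summary = runCases(stubRoute, fixtureDir); // logs "1/2 passed (50.0%)"
```

In the real script the call would be `runCases(require('../../scripts/route-intent.cjs').route)`, relying on the `__dirname` default.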


+deterministic router (no LLM in the loop):
+
+```
+$ node tests/integration/run-real-tasks.cjs


+because the action was `disambiguate` (gap 0.20 < 0.15 threshold violated),
+the orchestrator correctly stopped and asked the user. The product flow is
+not broken on that input even though the top score is wrong.


Duy-Nguyen-2006 marked this pull request as ready for review May 12, 2026 14:20

Copilot AI review requested due to automatic review settings May 12, 2026 14:20

Duy-Nguyen-2006 merged commit 1fb675b into main May 12, 2026
2 checks passed

Copilot started reviewing on behalf of Duy-Nguyen-2006 May 12, 2026 14:20

Copilot AI reviewed May 12, 2026

Duy-Nguyen-2006 merged 1 commit into main from claude/test-forgekit-routing-OGMPj

3 participants
