feat(skills): add debug-issue skill #2612
Closed
NilanshBansal wants to merge 1 commit into
Skyler-style end-to-end debugging skill ported from appsmith-v2/kite-triage-bot. Three-path decision tree (app_id given / derive via cluster logs / fallback), per-application sequence (metadata → orchestrator → logs → app files → Vercel), evidence persistence to /workspace/agent/debug_evidence/ with deterministic filenames, structured ###OUTPUT_START### / ###OUTPUT_END### output format.
This was referenced May 26, 2026
What this skill does
debug-issue is a Skyler-powered triage skill that investigates application or platform-wide incidents end-to-end. Given an incoming message describing an error, it fetches the right logs, correlates them, and produces a structured root-cause analysis, without requiring the user to specify what to look at.
Three execution paths
The skill picks exactly one path based on whether an application_id is present:
… search_cluster_logs (up to 5 refinement rounds), and, if a valid app UUID surfaces, continues into Path A's per-application steps.
MCP/tool requirements
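The path selection described under "Three execution paths" can be sketched roughly as follows. This is a minimal illustration, not the skill's actual logic: the `choose_path` function, the labels `"derived->A"` and `"fallback"`, and the `(message, round_number)` signature of the search callable are all hypothetical. Only the "Path A" name, the 5-round limit, and the hand-off into Path A once a valid app UUID surfaces come from the description.

```python
import re
import uuid
from typing import Callable, Optional, Tuple

MAX_REFINEMENT_ROUNDS = 5  # "up to 5 refinement rounds" in the description

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def first_valid_uuid(text: str) -> Optional[str]:
    """Return the first well-formed UUID in text (lowercased), or None."""
    for candidate in UUID_RE.findall(text):
        try:
            return str(uuid.UUID(candidate))
        except ValueError:
            continue
    return None


def choose_path(
    application_id: Optional[str],
    message: str,
    search_cluster_logs: Callable[[str, int], str],
) -> Tuple[str, Optional[str]]:
    """Pick exactly one path. Labels other than "A" are hypothetical."""
    if application_id:
        return ("A", application_id)
    # No app id given: try to derive one from cluster logs, a bounded
    # number of times. The callable stands in for the real log search.
    for round_number in range(1, MAX_REFINEMENT_ROUNDS + 1):
        hits = search_cluster_logs(message, round_number)
        derived = first_valid_uuid(hits)
        if derived:
            # A derived app id continues into Path A's per-application steps.
            return ("derived->A", derived)
    return ("fallback", None)
```

Passing the search function in as a parameter keeps the sketch independent of any real MCP client and makes the round limit easy to exercise with a stub.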
Primary (required):
- mcp__skyler__*: covers metadata, orchestrator summary, fetch_all_debug_logs (Langfuse + Grafana + E2B bundle), per-source re-fetches, cluster log search, app file download, and Vercel proxy.
Optional / conditional:
- mcp__skyler__get_slack_thread: when the user references a Slack thread_ts
- mcp__skyler__get_vercel: triggered automatically when deploy/domain keywords appear in the message
- mcp__nanoclaw__send_message: for progress updates at major step transitions
No Langfuse, GitHub, or Grafana MCP servers are called directly; all observability access goes through mcp__skyler__*.
Workspace layout
- /workspace/agent/debug_evidence/
- /workspace/agent/output.md (###OUTPUT_START### … ###OUTPUT_END### + JSON filter block + MCP call log)
Evidence filenames follow the pattern: <tool_id>__<start>_<end>__<app_id>.json (e.g. grafana__20260423T120000Z_20260423T130000Z__550e8400-e29b-41d4-a716-446655440000.json).
Provenance
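As an illustration of the evidence filename pattern from the workspace layout, the helper below builds a name from its four parts. It is a sketch under assumptions: the function name `evidence_filename` is hypothetical, and the UTC `YYYYMMDDTHHMMSSZ` timestamp format is inferred from the single example filename rather than stated in the description.

```python
from datetime import datetime, timezone

# Assumed from the example filename: compact UTC timestamps ending in "Z".
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def evidence_filename(tool_id: str, start: datetime, end: datetime, app_id: str) -> str:
    """Build <tool_id>__<start>_<end>__<app_id>.json from timezone-aware datetimes."""
    for value in (start, end):
        if value.tzinfo is None:
            # Naive datetimes would be interpreted as local time, which
            # would make the filename depend on the machine's timezone.
            raise ValueError("start and end must be timezone-aware")
    start_utc = start.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    end_utc = end.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    return f"{tool_id}__{start_utc}_{end_utc}__{app_id}.json"


name = evidence_filename(
    "grafana",
    datetime(2026, 4, 23, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2026, 4, 23, 13, 0, 0, tzinfo=timezone.utc),
    "550e8400-e29b-41d4-a716-446655440000",
)
print(name)
```

Normalising to UTC before formatting means the same tool, time window, and app always map to the same file, which is what makes the names deterministic.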
Ported from the debug-issue skill in appsmith-v2/kite-triage-bot (OpenCode/E2B environment). The core decision tree, step sequence, output format, and analysis principles transfer directly. Adapted for NanoClaw:
- skyler_* → mcp__skyler__* tool names
- /home/user/skyler/ sandbox paths → /workspace/agent/ workspace paths
- [STEP] / [STATUS] / [HEARTBEAT] runner logs → mcp__nanoclaw__send_message progress updates
- /home/user/skyler/repo and skyler_search_repo_code / skyler_read_repo_file (not available) removed; Step 4 fallback refactored accordingly
- mcp__skyler__fetch_all_debug_logs promoted to primary fast-path in Step 3c (it wasn't explicitly called out in the original flow)
Findings discovered during the port
Finding 1 (skill file location): Skills must live under container/skills/<name>/SKILL.md in the repo root, not under groups/<name>/skills/. This is not documented in CLAUDE.md, which caused initial confusion when writing the destination path. Recommend adding a one-liner to CLAUDE.md: "Skills live at container/skills/<skill-name>/SKILL.md."
Finding 2 (${VAR} not expanded in MCP args): The Claude MCP launcher does not expand ${VAR} environment variable references that appear inside the args array of the per-MCP env block. HTTP MCPs wrapped via mcp-remote that rely on injected env vars in their args fail silently at connection time, with no clear error. Recommend either (a) pre-expanding env vars in NanoClaw's container-runner before passing the config to Claude, or (b) documenting this limitation explicitly so skill authors know to use literal values or a different injection mechanism.
Validated by
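Option (a) from Finding 2, pre-expanding ${VAR} references before the config reaches the launcher, might look roughly like the sketch below. It is not NanoClaw's container-runner code: the `expand_args` function and the assumed config shape (a dict with `args` and an optional `env` mapping) are illustrative only.

```python
import re
from typing import Any, Dict, Mapping

VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_args(server: Dict[str, Any], process_env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of one MCP server entry with ${VAR} expanded in args.

    Assumed shape: {"command": str, "args": [str, ...], "env": {name: value}}.
    Values from the per-server env block take precedence over process_env.
    """
    env = {**process_env, **server.get("env", {})}

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            # Fail loudly here rather than silently at connection time.
            raise KeyError(f"unset variable in MCP args: {name}")
        return env[name]

    expanded = dict(server)  # shallow copy; the input entry is left untouched
    expanded["args"] = [VAR_RE.sub(substitute, arg) for arg in server.get("args", [])]
    return expanded
```

Raising on an unset variable is a deliberate choice in this sketch: it turns the silent connection failure described above into an explicit error at config-build time.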
Used this skill in production immediately after porting it. It correctly root-caused a fork/claim race condition on production app 5309a049-c630-41c2-b745-a2d805d66eb7 ("Paisajismo Nativo"): the /api/v1/applications/{id}/fork endpoint was returning HTTP 500 because sandbox boot was triggered before the forked website's EFS directory was fully provisioned. The failing forks consistently errored within 7 seconds of fork record creation, while a later successful fork (next morning) completed in 18 seconds with the filesystem ready. Four orphaned fork records were identified for the affected user (paisaje@paisajismonativo.com). The skill ran Path A (app_id provided), fetched metadata + orchestrator summary + full debug bundle + Vercel deployments, and identified the root cause without any manual log queries.