feat: add insforge diagnose command for backend health diagnostics #32
- Inline isOssMode check to avoid cross-module coupling
- Add ProjectNotLinkedError check to db and logs subcommands
- Replace Math.max(...array) with reduce to prevent stack overflow
- Remove unused source parameter from parseLogEntry
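The `Math.max(...array)` change in the commit above can be illustrated with a minimal sketch. The helper name `maxOf` is hypothetical and is not taken from the PR; it only shows why a `reduce` is safer than spreading a large array into a single call.

```typescript
// Hypothetical helper (not the PR's code): compute a maximum without
// spreading the array into function arguments.
function maxOf(values: readonly number[]): number {
  // Math.max(...values) passes every element as a separate call argument,
  // which can throw a RangeError for very large arrays. reduce keeps a
  // single running maximum instead. An empty array yields -Infinity,
  // matching Math.max() called with no arguments.
  return values.reduce((max, v) => (v > max ? v : max), -Infinity);
}

// A large series, e.g. many metric data points.
const samples: number[] = Array.from({ length: 500000 }, (_, i) => i % 1000);
const peak = maxOf(samples);
```

One behavioural difference worth noting: `Math.max` returns `NaN` if any argument is `NaN`, while this comparison-based reduce skips `NaN` values.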
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
…terfaces
Network metrics (network_in/network_out) are returned per interface by the API, which caused duplicate rows. They are now summed across interfaces into a single row per metric.
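A rough sketch of the per-interface summation described in this commit follows. The sample shape (`metric`, `iface`, `timestamp`, `value`) and the function name are assumptions made for illustration; the Platform API's real response format is not shown in this PR conversation.

```typescript
// Hypothetical shape of one per-interface data point (field names assumed).
interface InterfaceSample {
  metric: string; // e.g. "network_in" or "network_out"
  iface: string; // e.g. "eth0"
  timestamp: number;
  value: number;
}

interface MetricPoint {
  metric: string;
  timestamp: number;
  value: number;
}

// Collapse per-interface samples into one point per metric and timestamp,
// so each metric renders as a single row instead of one row per interface.
function sumAcrossInterfaces(samples: readonly InterfaceSample[]): MetricPoint[] {
  const totals = new Map<string, MetricPoint>();
  for (const s of samples) {
    const key = `${s.metric}@${s.timestamp}`;
    const existing = totals.get(key);
    if (existing) {
      existing.value += s.value;
    } else {
      totals.set(key, { metric: s.metric, timestamp: s.timestamp, value: s.value });
    }
  }
  return Array.from(totals.values());
}
```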
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/index.ts (1)

166-168: Add a description for the `diagnose` command group.

Other command groups include descriptions (e.g., `db`, `functions`, `secrets`, `schedules`), but `diagnose` is missing one. This affects `--help` output consistency.

Proposed fix:

     // Diagnose commands
    -const diagnoseCmd = program.command('diagnose');
    +const diagnoseCmd = program.command('diagnose').description('Backend health diagnostics');
     registerDiagnoseCommands(diagnoseCmd);

src/commands/diagnose/logs.ts (1)
51-61: Consider parallelizing source fetches.

The current implementation fetches each log source sequentially. With 4 sources, parallelization could reduce latency. However, this is a minor optimization, and the code is acceptable as-is.

Optional: parallel fetch implementation

     export async function fetchLogsSummary(limit = 100): Promise<SourceSummary[]> {
    -  const results: SourceSummary[] = [];
    -  for (const source of LOG_SOURCES) {
    -    try {
    -      results.push(await fetchSourceLogs(source, limit));
    -    } catch {
    -      results.push({ source, total: 0, errors: [] });
    -    }
    -  }
    -  return results;
    +  const settled = await Promise.allSettled(
    +    LOG_SOURCES.map((source) => fetchSourceLogs(source, limit)),
    +  );
    +  return settled.map((result, i) =>
    +    result.status === 'fulfilled'
    +      ? result.value
    +      : { source: LOG_SOURCES[i], total: 0, errors: [] },
    +  );
     }

docs/specs/2026-03-27-diagnose-command-design.md (1)
32-32: Add language specifiers to fenced code blocks.

Static analysis flagged code blocks at lines 32, 66, 94, and 151 as missing language specifiers. Since these are output mockups, use `text` or `plaintext` to satisfy the linter.

Example fix for line 32:

    -```
    +```text
     ┌─────────────────────────────────────────────────┐

Apply similar changes to the code blocks at lines 66, 94, and 151.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/specs/2026-03-27-diagnose-implementation-plan.md`:
- Around line 699-750: The plan snippet is out of sync with the shipped
implementation: it imports isOssMode from metrics.js and uses spread-based
Math.max(...vals), both of which were intentionally removed; update the plan to
match the real code by (1) importing the correct helper (or removing the
isOssMode import) and referencing the actual OSS detection used in
registerDiagnoseCommands, and (2) replacing any spread-based Math.max usage with
the safe alternative used in the implementation (an explicit loop or reduce;
Math.max.apply has the same argument-count limit as the spread form) for
computing max in the metrics handling so the document
mirrors the shipped functions fetchMetricsSummary, registerDiagnoseCommands, and
the metrics aggregation logic.
In `@src/commands/diagnose/index.ts`:
- Around line 33-38: The current code models OSS skips as rejected promises
which get treated as errors by the aggregator; change the ossMode branches for
metricsPromise and advisorPromise to return a fulfilled sentinel (e.g.,
Promise.resolve(null) or Promise.resolve({ skipped: true })) instead of
Promise.reject(...), keeping fetchMetricsSummary and fetchAdvisorSummary calls
unchanged, and ensure downstream code that inspects the results (the
aggregator/renderer that consumes metricsPromise and advisorPromise) explicitly
checks for that sentinel and renders a "skipped" state rather than treating it
as an error.
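The sentinel approach suggested for `src/commands/diagnose/index.ts` could look roughly like the sketch below. All names here (`SKIPPED`, `Section`, `toSection`, `collectMetrics`) are hypothetical and chosen for illustration; the shipped code may model the skipped state differently.

```typescript
// Hypothetical sentinel marking a section that was intentionally not run.
const SKIPPED = { skipped: true } as const;
type Skipped = typeof SKIPPED;

// Hypothetical per-section state consumed by a renderer.
type Section<T> =
  | { status: 'ok'; data: T }
  | { status: 'skipped'; reason: string }
  | { status: 'error'; message: string };

function isSkipped(value: unknown): value is Skipped {
  return value === SKIPPED;
}

// Convert one Promise.allSettled result into a section state, keeping
// "skipped" distinct from "failed".
function toSection<T>(result: PromiseSettledResult<T | Skipped>): Section<T> {
  if (result.status === 'rejected') {
    const reason: unknown = result.reason;
    return { status: 'error', message: reason instanceof Error ? reason.message : String(reason) };
  }
  if (isSkipped(result.value)) {
    return { status: 'skipped', reason: 'not available in OSS mode' };
  }
  return { status: 'ok', data: result.value as T };
}

// In OSS mode the promise is fulfilled with the sentinel rather than rejected,
// so the aggregator never counts the skip as an error.
async function collectMetrics(
  ossMode: boolean,
  fetchMetrics: () => Promise<number[]>,
): Promise<Section<number[]>> {
  const metricsPromise: Promise<number[] | Skipped> = ossMode
    ? Promise.resolve(SKIPPED)
    : fetchMetrics();
  const [metrics] = await Promise.allSettled([metricsPromise]);
  return toSection<number[]>(metrics);
}
```

A renderer can then switch on `status` and print something like "N/A" for `skipped` while reserving error output for `error`.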
---
Nitpick comments:
In `@docs/specs/2026-03-27-diagnose-command-design.md`:
- Line 32: Fenced code blocks that show the ASCII output mockups (they start
with a triple backtick followed by the box-drawing line
"┌─────────────────────────────────────────────────┐" and similar blocks later)
are missing language specifiers; update each triple-backtick fence to include a
language such as text or plaintext (e.g., change ``` to ```text) for every
output mockup block (the one beginning with the box-drawing line and the other
similar mockup blocks) so the linter stops flagging them.
In `@src/commands/diagnose/logs.ts`:
- Around line 51-61: fetchLogsSummary currently iterates LOG_SOURCES
sequentially calling fetchSourceLogs which increases latency; change it to kick
off parallel fetches (e.g., map LOG_SOURCES to promises of fetchSourceLogs) and
await them together using Promise.allSettled (or Promise.all with per-promise
catch) to preserve per-source error handling, then convert settled results into
SourceSummary objects (using fetchSourceLogs, LOG_SOURCES, and SourceSummary to
locate code) so failures produce { source, total: 0, errors: [] } while
successful results are pushed as before.
In `@src/index.ts`:
- Around line 166-168: The diagnose command group created via
program.command('diagnose') has no description, causing inconsistent --help
output; update the call that defines diagnoseCmd (the
program.command('diagnose') invocation) to include a short descriptive string
(e.g., program.command('diagnose').description('...')) or otherwise set a
description on the diagnoseCmd before calling
registerDiagnoseCommands(diagnoseCmd) so the --help output matches other groups;
ensure the description is concise and mirrors the style used for
db/functions/secrets/schedules.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 993d4c09-babb-42c3-8910-ec2e47b4258f
📒 Files selected for processing (10)
- docs/specs/2026-03-27-diagnose-command-design.md
- docs/specs/2026-03-27-diagnose-implementation-plan.md
- package.json
- src/commands/diagnose/advisor.ts
- src/commands/diagnose/db.ts
- src/commands/diagnose/index.ts
- src/commands/diagnose/logs.ts
- src/commands/diagnose/metrics.ts
- src/index.ts
- src/lib/api/platform.ts
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/commands/diagnose/db.ts`:
- Line 149: The diagnostic queries are static SELECTs but currently call
runRawSql with unrestricted=true, which unnecessarily broadens privileges;
update both call sites (the invocations that look like
runRawSql(DB_CHECKS[key].sql, true)) to call the read-only form (either pass
false instead of true or omit the unrestricted flag per runRawSql's signature)
so the checks use the restricted/read-only SQL path; keep the SQL source
DB_CHECKS[key].sql unchanged.
- Around line 169-181: The loop over checkNames currently logs unknown check
names and continues, which leads to partial/empty results; update the behavior
in the loop that inspects DB_CHECKS[name] (inside the for-of iterating
checkNames built from opts.check/ALL_CHECKS) to fail fast instead of continuing:
when a lookup yields no check, raise an error or call process.exit(1) after
printing a clear message that includes the invalid name and ALL_CHECKS, so the
command (including --json consumers) receives a non-zero failure rather than
silent partial success.
- Around line 145-153: The runDbChecks function currently swallows SQL errors
and sets results[key] = [] which hides failures; change the catch block in
runDbChecks (which iterates ALL_CHECKS and calls runRawSql with
DB_CHECKS[key].sql) to preserve per-check error metadata instead of coercing to
an empty array — e.g., assign results[key] to an array/object that includes the
error message/stack and identifying info (error, message, maybe sql or check id)
so downstream code in diagnose/index.ts can distinguish "no findings" vs "DB
unavailable"; keep the successful path returning rows unchanged and ensure error
serialization is safe (stringify message/stack).
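The last two findings for `src/commands/diagnose/db.ts` (fail fast on unknown check names, and keep per-check error metadata) might be sketched as follows. This is an illustration under assumptions: the `DB_CHECKS` table here is a two-entry stand-in with a placeholder query, `CheckResult` and `resolveCheckNames` are hypothetical names, and `runSql` is an injected function rather than the real `runRawSql`, whose signature is not shown in this conversation.

```typescript
type Row = Record<string, unknown>;

// Hypothetical result shape that separates "no findings" from "check failed".
type CheckResult =
  | { ok: true; rows: Row[] }
  | { ok: false; error: string };

// Stand-in registry; only the connections query comes from the design doc.
const DB_CHECKS: Record<string, { sql: string }> = {
  connections: {
    sql: 'SELECT count(*) AS active FROM pg_stat_activity WHERE state IS NOT NULL',
  },
  'cache-hit': { sql: 'SELECT 1 AS placeholder' }, // placeholder, not the PR's query
};
const ALL_CHECKS = Object.keys(DB_CHECKS);

// Validate every requested name before running anything, so an invalid
// --check value fails the command instead of yielding partial results.
function resolveCheckNames(requested?: string): string[] {
  if (!requested) return ALL_CHECKS;
  const names = requested
    .split(',')
    .map((n) => n.trim())
    .filter((n) => n.length > 0);
  const unknown = names.filter((n) => !Object.prototype.hasOwnProperty.call(DB_CHECKS, n));
  if (unknown.length > 0) {
    throw new Error(
      `Unknown check(s): ${unknown.join(', ')}. Valid checks: ${ALL_CHECKS.join(', ')}`,
    );
  }
  return names;
}

// Run each check, recording the error message on failure instead of
// coercing the result to an empty array.
async function runDbChecks(
  names: string[],
  runSql: (sql: string) => Promise<Row[]>,
): Promise<Record<string, CheckResult>> {
  const results: Record<string, CheckResult> = {};
  for (const name of names) {
    try {
      results[name] = { ok: true, rows: await runSql(DB_CHECKS[name].sql) };
    } catch (err) {
      results[name] = { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
  return results;
}
```

With this shape, the caller would catch the error from `resolveCheckNames` and exit non-zero, and the aggregate report can tell an empty `rows` array (healthy) apart from `ok: false` (database unavailable).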
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6ea8c721-14ab-4103-97c9-39f5b9f7be4e
📒 Files selected for processing (2)
- src/commands/diagnose/db.ts
- src/commands/diagnose/metrics.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/commands/diagnose/metrics.ts
@jwfing what happens if there's no data? So the cloud backend advisor only runs once per day, right? Oh, it can see the database, recent errors, and system metrics. It'd be nice if you could do a knowledge share or a short blurb in Slack!
just display

| Check | SQL |
|-------|-----|
| `connections` | `SELECT count(*) AS active FROM pg_stat_activity WHERE state IS NOT NULL` combined with `SHOW max_connections` |
Is this very heavy? Will it overload a nano instance?
No, not heavy at all; it counts from shared memory.
Summary
- Add `insforge diagnose` command group for SRE-style backend health diagnostics
- `diagnose` (no subcommand): comprehensive health report aggregating metrics, advisor, DB checks, and logs
- `diagnose metrics`: EC2 instance metrics (CPU, memory, disk, network) with latest/avg/max stats
- `diagnose advisor`: latest advisor scan results with severity/category filtering
- `diagnose db`: 7 predefined PostgreSQL health checks (connections, slow queries, bloat, table sizes, index usage, locks, cache hit ratio)
- `diagnose logs`: error-level log aggregation across all 4 backend log sources
- With `--api-key` (OSS mode), gracefully skips Platform API calls (metrics/advisor) and only runs DB + logs checks
- `--json` output for agent consumption

Test plan
- `insforge diagnose --help` shows all subcommands
- `insforge diagnose` produces comprehensive health report (with linked project)
- `insforge diagnose metrics --range 1h` displays EC2 metrics table
- `insforge diagnose advisor --severity critical` filters issues
- `insforge diagnose db --check connections,cache-hit` runs specific checks
- `insforge diagnose logs --source postgres.logs` filters by source
- `insforge --json diagnose` outputs valid JSON
- OSS mode (`--api-key`): metrics/advisor show N/A, db/logs work normally

🤖 Generated with Claude Code
Note
Add `insforge diagnose` command group for backend health diagnostics
- Adds a `diagnose` command group to the `insforge` CLI with four subcommands: `metrics`, `advisor`, `db`, and `logs`.
- The bare `diagnose` command runs all checks concurrently via `Promise.allSettled` and renders a combined health report in table or JSON format.
- `diagnose metrics` fetches CPU/memory/disk/network metrics from the Platform API; `diagnose advisor` fetches the latest advisor scan and issues; `diagnose db` runs predefined SQL health checks (connections, slow queries, bloat, locks, cache-hit); `diagnose logs` aggregates error-level log entries across sources.
- Exports `platformFetch` from src/lib/api/platform.ts so diagnose subcommands can call the Platform API.

Macroscope summarized efb7abb.
Summary by CodeRabbit
New Features
- `insforge diagnose` with subcommands `metrics`, `advisor`, `db`, `logs`, and a comprehensive health report.
User-Facing Enhancements
- `--json` consolidated output with per-section partial-failure reporting.
Documentation
Chores