feat: add insforge diagnose command for backend health diagnostics by jwfing · Pull Request #32 · InsForge/CLI

jwfing · 2026-03-27T18:46:39Z

Summary

- Add `insforge diagnose` command group for SRE-style backend health diagnostics
  - `diagnose` (no subcommand): comprehensive health report aggregating metrics, advisor, DB checks, and logs
  - `diagnose metrics`: EC2 instance metrics (CPU, memory, disk, network) with latest/avg/max stats
  - `diagnose advisor`: latest advisor scan results with severity/category filtering
  - `diagnose db`: 7 predefined PostgreSQL health checks (connections, slow queries, bloat, table sizes, index usage, locks, cache hit ratio)
  - `diagnose logs`: error-level log aggregation across all 4 backend log sources
- OSS mode support: when linked via `--api-key`, gracefully skips Platform API calls (metrics/advisor) and only runs DB + logs checks
- All commands support `--json` output for agent consumption
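The advisor's severity/category filtering and the "N critical · N warning · N info" summary line in the health report can be sketched as below. This is a minimal illustration: the `AdvisorIssue` shape, its field names, and the three severity levels are assumptions inferred from the sample report, not the Platform API's actual schema, and the helper names are hypothetical.

```typescript
// Hypothetical issue shape; the real Platform API fields may differ.
type Severity = "critical" | "warning" | "info";

interface AdvisorIssue {
  id: string;
  severity: Severity;
  category: string;
  title: string;
}

interface IssueFilter {
  severity?: Severity;
  category?: string;
}

// Keep only issues that match every filter the user supplied.
function filterIssues(issues: AdvisorIssue[], filter: IssueFilter): AdvisorIssue[] {
  return issues.filter(
    (issue) =>
      (filter.severity === undefined || issue.severity === filter.severity) &&
      (filter.category === undefined || issue.category === filter.category),
  );
}

// Count issues per severity; every bucket is present even when it is zero.
function countBySeverity(issues: AdvisorIssue[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { critical: 0, warning: 0, info: 0 };
  for (const issue of issues) {
    counts[issue.severity] += 1;
  }
  return counts;
}

// One-line summary in the style of the health report.
function formatSeveritySummary(counts: Record<Severity, number>): string {
  return `${counts.critical} critical · ${counts.warning} warning · ${counts.info} info`;
}
```

Counting always starts from a zeroed record, so a scan with no warnings still prints `0 warning` rather than omitting the bucket.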

Test plan

- `insforge diagnose --help` shows all subcommands
- `insforge diagnose` produces comprehensive health report (with linked project)
- `insforge diagnose metrics --range 1h` displays EC2 metrics table
- `insforge diagnose advisor --severity critical` filters issues
- `insforge diagnose db --check connections,cache-hit` runs specific checks
- `insforge diagnose logs --source postgres.logs` filters by source
- `insforge --json diagnose` outputs valid JSON
- OSS-linked project (`--api-key`): metrics/advisor show N/A, db/logs work normally
- Unlinked project: shows "No project linked" error
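The `--range 1h` option implies turning a short duration string into a query window for the metrics endpoint. A minimal sketch follows, assuming `m`/`h`/`d` suffixes. The accepted units, `parseRange`, and `rangeWindow` are all hypothetical; the real CLI may only accept a fixed set of values.

```typescript
// Hypothetical: accepted units for --range, in milliseconds.
const UNIT_MS: Record<string, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Parse strings like "30m", "1h" or "7d" into a duration in milliseconds.
function parseRange(range: string): number {
  const match = /^(\d+)([mhd])$/.exec(range.trim());
  if (!match) {
    throw new Error(`Invalid --range "${range}" (expected e.g. 30m, 1h, 7d)`);
  }
  const amount = Number(match[1]);
  if (amount <= 0) {
    throw new Error(`--range must be positive, got "${range}"`);
  }
  return amount * UNIT_MS[match[2]];
}

// Compute the [start, end] window that ends at `now`.
function rangeWindow(range: string, now: Date = new Date()): { start: Date; end: Date } {
  const start = new Date(now.getTime() - parseRange(range));
  return { start, end: now };
}
```

Rejecting malformed input up front gives a clear CLI error instead of an empty metrics table.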

🤖 Generated with Claude Code

Note

Add `insforge diagnose` command group for backend health diagnostics

Adds a new diagnose command group to the insforge CLI with four subcommands: metrics, advisor, db, and logs.
The top-level diagnose command runs all checks concurrently via Promise.allSettled and renders a combined health report in table or JSON format.
diagnose metrics fetches CPU/memory/disk/network metrics from the Platform API; diagnose advisor fetches the latest advisor scan and issues; diagnose db runs predefined SQL health checks (connections, slow queries, bloat, table sizes, index usage, locks, cache-hit); diagnose logs aggregates error-level log entries across sources.
OSS-linked projects skip metrics and advisor checks, marking them as N/A in the report.
Exports platformFetch from src/lib/api/platform.ts so diagnose subcommands can call the Platform API.
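The concurrent aggregation described above can be sketched as follows. This is a minimal illustration assuming each report section is exposed as an async function; `buildReport` and the `HealthReport` shape are hypothetical, not the PR's actual types. Rejected sections are collected into `errors` while fulfilled ones still populate the report, which is the per-section partial-failure behavior the summaries describe.

```typescript
type SectionName = "metrics" | "advisor" | "db" | "logs";

interface HealthReport {
  sections: Partial<Record<SectionName, unknown>>;
  errors: { section: SectionName; message: string }[];
}

// Run every section concurrently; one failing section never hides the others.
async function buildReport(
  checks: Record<SectionName, () => Promise<unknown>>,
): Promise<HealthReport> {
  const names = Object.keys(checks) as SectionName[];
  // The async wrapper turns a synchronous throw into a rejection, so
  // allSettled still records it against the right section.
  const settled = await Promise.allSettled(names.map(async (name) => checks[name]()));

  const report: HealthReport = { sections: {}, errors: [] };
  settled.forEach((result, i) => {
    const section = names[i];
    if (result.status === "fulfilled") {
      report.sections[section] = result.value;
    } else {
      const reason: unknown = result.reason;
      report.errors.push({
        section,
        message: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return report;
}
```

`Promise.allSettled` preserves input order, so results can be matched back to section names by index.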

Macroscope summarized efb7abb.

Summary by CodeRabbit

New Features
- Added insforge diagnose with subcommands: metrics, advisor, db, logs and a comprehensive health report.
User-Facing Enhancements
- Human-readable tables and --json consolidated output with per-section partial-failure reporting.
- Metrics: range selection, aggregated network metrics, latest/avg/max.
- Advisor: scan summary with filtered issues.
- DB: predefined PostgreSQL health checks with per-check results.
- Logs: multi-source retrieval with error/fatal filtering and summaries.
Documentation
- Added design spec and implementation plan for the diagnose command.
Chores
- Bumped package version to 0.1.32


coderabbitai · 2026-03-27T18:46:51Z

Walkthrough

Adds a new top-level diagnose CLI command group with four flat subcommands (metrics, advisor, db, logs) that concurrently collect backend health data and emit either human-friendly tables or a unified JSON report with per-source failure isolation and aggregated errors.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Specs: `docs/specs/2026-03-27-diagnose-command-design.md`, `docs/specs/2026-03-27-diagnose-implementation-plan.md` | Added design and implementation plan detailing command surface, subcommands, JSON report schema, error handling, data shaping rules, and recommended file layout/wiring. |
| Diagnose core & orchestration: `src/commands/diagnose/index.ts`, `src/index.ts` | New `diagnose` top-level command registration and root handler that runs sub-sources concurrently (Promise.allSettled), aggregates results/errors, formats JSON vs console output, and records CLI usage. |
| Subcommands (metrics & advisor): `src/commands/diagnose/metrics.ts`, `src/commands/diagnose/advisor.ts` | New `diagnose metrics` and `diagnose advisor` subcommands: auth/project checks, platform fetches, query option parsing, data enrichment (metrics latest/avg/max; advisor scan + issues), and JSON/table rendering. |
| Subcommands (db & logs): `src/commands/diagnose/db.ts`, `src/commands/diagnose/logs.ts` | New `diagnose db` implementing configurable read‑only SQL checks and `runDbChecks()`; `diagnose logs` fetching multiple log sources, error-line extraction, summaries, and per-source error details. |
| API helper export: `src/lib/api/platform.ts` | Made `platformFetch` exported for use by the new diagnose modules. |
| Package metadata: `package.json` | Bumped package version from `0.1.31` → `0.1.32`. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

0.1.21 #26 — package version bump (related to this PR's package.json version update).

Poem

🐇 I hopped through endpoints, sniffed each log and trace,
Metrics, scans, and DBs—lined up in tidy place.
Promise.allSettled kept each piece unfrayed,
Tables or JSON — a rabbit-built brigade,
I nudge a bug away and nibble on a grace. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.56%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title clearly and concisely summarizes the main change: adding a new `insforge diagnose` command for backend health diagnostics, which aligns with the comprehensive changeset introducing the diagnose command group and its subcommands. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

src/index.ts (1)

166-168: Add a description for the diagnose command group.

Other command groups include descriptions (e.g., db, functions, secrets, schedules), but diagnose is missing one. This affects --help output consistency.

Proposed fix

```diff
 // Diagnose commands
-const diagnoseCmd = program.command('diagnose');
+const diagnoseCmd = program.command('diagnose').description('Backend health diagnostics');
 registerDiagnoseCommands(diagnoseCmd);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 166 - 168, The diagnose command group created via
program.command('diagnose') has no description, causing inconsistent --help
output; update the call that defines diagnoseCmd (the
program.command('diagnose') invocation) to include a short descriptive string
(e.g., program.command('diagnose').description('...')) or otherwise set a
description on the diagnoseCmd before calling
registerDiagnoseCommands(diagnoseCmd) so the --help output matches other groups;
ensure the description is concise and mirrors the style used for
db/functions/secrets/schedules.

src/commands/diagnose/logs.ts (1)

51-61: Consider parallelizing source fetches.

The current implementation fetches each log source sequentially. With 4 sources, parallelization could reduce latency. However, this is a minor optimization and acceptable as-is.

Optional: parallel fetch implementation

```diff
 export async function fetchLogsSummary(limit = 100): Promise<SourceSummary[]> {
-  const results: SourceSummary[] = [];
-  for (const source of LOG_SOURCES) {
-    try {
-      results.push(await fetchSourceLogs(source, limit));
-    } catch {
-      results.push({ source, total: 0, errors: [] });
-    }
-  }
-  return results;
+  const settled = await Promise.allSettled(
+    LOG_SOURCES.map((source) => fetchSourceLogs(source, limit)),
+  );
+  return settled.map((result, i) =>
+    result.status === 'fulfilled'
+      ? result.value
+      : { source: LOG_SOURCES[i], total: 0, errors: [] },
+  );
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/commands/diagnose/logs.ts` around lines 51 - 61, fetchLogsSummary
currently iterates LOG_SOURCES sequentially calling fetchSourceLogs which
increases latency; change it to kick off parallel fetches (e.g., map LOG_SOURCES
to promises of fetchSourceLogs) and await them together using Promise.allSettled
(or Promise.all with per-promise catch) to preserve per-source error handling,
then convert settled results into SourceSummary objects (using fetchSourceLogs,
LOG_SOURCES, and SourceSummary to locate code) so failures produce { source,
total: 0, errors: [] } while successful results are pushed as before.
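A self-contained variant of the suggested parallel fetch, with the fetcher injected so it runs without the CLI and with error/fatal filtering applied per source. The `LogEntry` shape, its `level` field, and the `fetchSource` parameter are assumptions for illustration; in the PR, `fetchSourceLogs` and `LOG_SOURCES` are module-level and may parse raw log lines instead.

```typescript
// Hypothetical log entry shape; the real API may return raw lines to parse.
interface LogEntry {
  level: string;
  message: string;
}

interface SourceSummary {
  source: string;
  total: number;
  errors: LogEntry[];
}

// Error-level filtering: keep only error and fatal entries (case-insensitive).
const ERROR_LEVELS = new Set(["error", "fatal"]);

async function fetchLogsSummary(
  sources: readonly string[],
  fetchSource: (source: string, limit: number) => Promise<LogEntry[]>,
  limit = 100,
): Promise<SourceSummary[]> {
  // Start every fetch at once; allSettled keeps results in input order.
  const settled = await Promise.allSettled(
    sources.map(async (source) => fetchSource(source, limit)),
  );
  return settled.map((result, i): SourceSummary => {
    if (result.status === "rejected") {
      // Same fallback as the sequential version: a failed source reports empty.
      return { source: sources[i], total: 0, errors: [] };
    }
    const entries = result.value;
    return {
      source: sources[i],
      total: entries.length,
      errors: entries.filter((e) => ERROR_LEVELS.has(e.level.toLowerCase())),
    };
  });
}
```

Total latency becomes roughly that of the slowest source rather than the sum of all four.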

docs/specs/2026-03-27-diagnose-command-design.md (1)

32-32: Add language specifiers to fenced code blocks.

Static analysis flagged code blocks at lines 32, 66, 94, and 151 as missing language specifiers. Since these are output mockups, use text or plaintext to satisfy the linter.

Example fix for line 32

````diff
-```
+```text
 ┌─────────────────────────────────────────────────┐
````

Apply similar changes to code blocks at lines 66, 94, and 151.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/specs/2026-03-27-diagnose-command-design.md` at line 32, Fenced code
blocks that show the ASCII output mockups (they start with a triple backtick
followed by the box-drawing line
"┌─────────────────────────────────────────────────┐" and similar blocks later)
are missing language specifiers; update each triple-backtick fence to include a
language such as text or plaintext (e.g., change ``` to ```text) for every
output mockup block (the one beginning with the box-drawing line and the other
similar mockup blocks) so the linter stops flagging them.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/specs/2026-03-27-diagnose-implementation-plan.md`:
- Around lines 699-750: The plan snippet is out of sync with the shipped
implementation: it imports isOssMode from metrics.js and uses spread-based
Math.max(...vals), both of which were intentionally removed; update the plan to
match the real code by (1) importing the correct helper (or removing the
isOssMode import) and referencing the actual OSS detection used in
registerDiagnoseCommands, and (2) replacing any spread-based Math.max usage with
the safe alternative used in the implementation (an explicit loop or reduce;
Math.max.apply has the same argument-count limit as spread) for computing max
in the metrics handling so the document mirrors the shipped functions
fetchMetricsSummary, registerDiagnoseCommands, and the metrics aggregation logic.

In `@src/commands/diagnose/index.ts`:
- Around line 33-38: The current code models OSS skips as rejected promises
which get treated as errors by the aggregator; change the ossMode branches for
metricsPromise and advisorPromise to return a fulfilled sentinel (e.g.,
Promise.resolve(null) or Promise.resolve({ skipped: true })) instead of
Promise.reject(...), keeping fetchMetricsSummary and fetchAdvisorSummary calls
unchanged, and ensure downstream code that inspects the results (the
aggregator/renderer that consumes metricsPromise and advisorPromise) explicitly
checks for that sentinel and renders a "skipped" state rather than treating it
as an error.
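The sentinel approach suggested above can be sketched with a three-way section result, so OSS skips render as N/A instead of landing in the error list. `SKIPPED`, `platformOnly`, `toSectionResult`, and `render` are hypothetical names for illustration; the PR may use a different sentinel such as `null` or `{ skipped: true }`.

```typescript
// A section outcome is data, an explicit skip, or a genuine failure.
type SectionResult<T> =
  | { status: "ok"; data: T }
  | { status: "skipped"; reason: string }
  | { status: "error"; message: string };

// Hypothetical sentinel used in OSS mode instead of Promise.reject(...).
const SKIPPED = Symbol("skipped");

// In OSS mode resolve with the sentinel, so allSettled sees a fulfilled promise.
function platformOnly<T>(ossMode: boolean, fetcher: () => Promise<T>): Promise<T | typeof SKIPPED> {
  return ossMode ? Promise.resolve(SKIPPED) : fetcher();
}

function toSectionResult<T>(settled: PromiseSettledResult<T | typeof SKIPPED>): SectionResult<T> {
  if (settled.status === "rejected") {
    const r: unknown = settled.reason;
    return { status: "error", message: r instanceof Error ? r.message : String(r) };
  }
  if (settled.value === SKIPPED) {
    return { status: "skipped", reason: "not available in OSS mode" };
  }
  return { status: "ok", data: settled.value as T };
}

// Only the "error" branch should feed the aggregated error list.
function render<T>(label: string, result: SectionResult<T>): string {
  if (result.status === "ok") return `${label}: ${JSON.stringify(result.data)}`;
  if (result.status === "skipped") return `${label}: N/A`;
  return `${label}: error (${result.message})`;
}
```

Keeping skips on the fulfilled path means the renderer, not the aggregator, decides how an intentionally absent section looks.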


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 993d4c09-babb-42c3-8910-ec2e47b4258f

📥 Commits

Reviewing files that changed from the base of the PR and between b4bb6ad and d24e815.

📒 Files selected for processing (10)

docs/specs/2026-03-27-diagnose-command-design.md
docs/specs/2026-03-27-diagnose-implementation-plan.md
package.json
src/commands/diagnose/advisor.ts
src/commands/diagnose/db.ts
src/commands/diagnose/index.ts
src/commands/diagnose/logs.ts
src/commands/diagnose/metrics.ts
src/index.ts
src/lib/api/platform.ts


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/commands/diagnose/db.ts`:
- Line 149: The diagnostic queries are static SELECTs but currently call
runRawSql with unrestricted=true, which unnecessarily broadens privileges;
update both call sites (the invocations that look like
runRawSql(DB_CHECKS[key].sql, true)) to call the read-only form (either pass
false instead of true or omit the unrestricted flag per runRawSql's signature)
so the checks use the restricted/read-only SQL path; keep the SQL source
DB_CHECKS[key].sql unchanged.
- Around line 169-181: The loop over checkNames currently logs unknown check
names and continues, which leads to partial/empty results; update the behavior
in the loop that inspects DB_CHECKS[name] (inside the for-of iterating
checkNames built from opts.check/ALL_CHECKS) to fail fast instead of continuing:
when a lookup yields no check, raise an error or call process.exit(1) after
printing a clear message that includes the invalid name and ALL_CHECKS, so the
command (including --json consumers) receives a non-zero failure rather than
silent partial success.
- Around line 145-153: The runDbChecks function currently swallows SQL errors
and sets results[key] = [] which hides failures; change the catch block in
runDbChecks (which iterates ALL_CHECKS and calls runRawSql with
DB_CHECKS[key].sql) to preserve per-check error metadata instead of coercing to
an empty array — e.g., assign results[key] to an array/object that includes the
error message/stack and identifying info (error, message, maybe sql or check id)
so downstream code in diagnose/index.ts can distinguish "no findings" vs "DB
unavailable"; keep the successful path returning rows unchanged and ensure error
serialization is safe (stringify message/stack).
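The two `db.ts` findings (fail fast on unknown check names, and keep per-check error metadata instead of coercing failures to `[]`) can be sketched together. This assumes a hypothetical three-check subset of `DB_CHECKS` and injects a `runSql` function in place of the CLI's read-only `runRawSql` path; `parseCheckNames` and `CheckResult` are illustrative names, not the PR's code.

```typescript
// Hypothetical subset of check names; the PR defines seven in DB_CHECKS.
const ALL_CHECKS = ["connections", "cache-hit", "locks"] as const;
type CheckName = (typeof ALL_CHECKS)[number];

// Rows on success, a serializable error otherwise, so consumers can tell
// "no findings" (empty rows) apart from "check failed".
type CheckResult =
  | { ok: true; rows: Record<string, unknown>[] }
  | { ok: false; error: string };

// Reject unknown names up front instead of silently skipping them.
function parseCheckNames(input: string | undefined): CheckName[] {
  if (!input) return [...ALL_CHECKS];
  const names = input.split(",").map((s) => s.trim()).filter(Boolean);
  const invalid = names.filter((n) => !(ALL_CHECKS as readonly string[]).includes(n));
  if (invalid.length > 0) {
    throw new Error(`Unknown check(s): ${invalid.join(", ")}. Valid checks: ${ALL_CHECKS.join(", ")}`);
  }
  return names as CheckName[];
}

// runSql stands in for the read-only SQL path (an assumption).
async function runDbChecks(
  names: CheckName[],
  runSql: (check: CheckName) => Promise<Record<string, unknown>[]>,
): Promise<Record<string, CheckResult>> {
  const results: Record<string, CheckResult> = {};
  // Sequential on purpose: avoids bursting several queries at a small instance.
  for (const name of names) {
    try {
      results[name] = { ok: true, rows: await runSql(name) };
    } catch (err) {
      results[name] = { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
  return results;
}
```

A thrown error from `parseCheckNames` lets the command exit non-zero, which `--json` consumers can detect.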


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6ea8c721-14ab-4103-97c9-39f5b9f7be4e

📥 Commits

Reviewing files that changed from the base of the PR and between d24e815 and b6da2cf.

📒 Files selected for processing (2)

src/commands/diagnose/db.ts
src/commands/diagnose/metrics.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/commands/diagnose/metrics.ts


jwfing · 2026-03-27T19:12:32Z

Screenshot (terminal output):

```text
$ insforge diagnose --help
Usage: insforge diagnose [options] [command]

Backend diagnostics — run with no subcommand for a full health report

Options:
  -h, --help         display help for command

Commands:
  metrics [options]  Display EC2 instance metrics (CPU, memory, disk, network)
  advisor [options]  Display latest advisor scan results and issues
  db [options]       Run database health checks (connections, bloat, index usage, etc.)
  logs [options]     Aggregate error-level logs from all backend sources

$ insforge diagnose

  InsForge Health Report — Sudo Database Assistant

── System Metrics (last 1h) ────────────────────
  CPU: 4.9%   Memory: 75.7%
  Disk: 66.3%  Network: ↑555B/s ↓597B/s

── Advisor Scan ────────────────────────────────
  3/27/2026 (completed) — 3 critical · 0 warning · 0 info

── Database ────────────────────────────────────
  Connections: 5/100  Cache Hit: 98.7%
  Dead tuples: 26   Locks waiting: 0

── Recent Errors (last 100 logs/source) ────────
  insforge.logs: 0  postgREST.logs: 0  postgres.logs: 1  function.logs: 0
```

tonychang04 · 2026-03-27T20:16:57Z

@jwfing What happens if there's no data? The cloud backend advisor only runs once per day, right?

Oh, it can see the database, recent errors, and system metrics.

It would be nice if you could do a knowledge share or a short blurb in Slack!

jwfing · 2026-03-27T20:18:43Z

> what happens if there's no data?

It just displays N/A.

tonychang04 · 2026-03-27T20:19:52Z

docs/specs/2026-03-27-diagnose-command-design.md

```diff
+
+| Check | SQL |
+|-------|-----|
+| `connections` | `SELECT count(*) AS active FROM pg_stat_activity WHERE state IS NOT NULL` combined with `SHOW max_connections` |
```


Is this very heavy? Will this overload a nano instance?

No, not heavy at all; it counts from shared memory.
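The report line `Connections: 5/100  Cache Hit: 98.7%` combines the connections check above with a cache-hit check. A sketch of the cache-hit arithmetic follows, assuming the common `pg_stat_database` formulation `blks_hit / (blks_hit + blks_read)`. The PR's actual cache-hit query is not shown here, so the SQL string and the `cacheHitRatio`/`formatDbLine` helpers are illustrative assumptions. Like the connections count, this reads statistics counters rather than scanning tables, so it should likewise be cheap.

```typescript
// Illustrative cache-hit query (assumption, not necessarily the PR's SQL).
const CACHE_HIT_SQL = `
  SELECT sum(blks_hit) AS hit, sum(blks_read) AS read
  FROM pg_stat_database`;

// Postgres drivers often return bigint/numeric aggregates as strings.
type Numeric = number | string | null;

// hit / (hit + read), or null when no blocks have been accessed yet.
function cacheHitRatio(hit: Numeric, read: Numeric): number | null {
  const h = Number(hit ?? 0);
  const r = Number(read ?? 0);
  const total = h + r;
  return total === 0 ? null : h / total;
}

// Format the database line; missing data renders as N/A.
function formatDbLine(active: number, maxConnections: number, ratio: number | null): string {
  const cache = ratio === null ? "N/A" : `${(ratio * 100).toFixed(1)}%`;
  return `Connections: ${active}/${maxConnections}  Cache Hit: ${cache}`;
}
```

Returning `null` for a zero denominator keeps the "no data" case distinct from a genuine 0% hit ratio.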

jwfing added 8 commits March 27, 2026 11:25

feat(diagnose): add metrics subcommand with EC2 metrics display

155310a

feat(diagnose): add advisor subcommand with scan summary and issues

1038055

feat(diagnose): add db subcommand with predefined health checks

c9039a5

feat(diagnose): add logs subcommand with error aggregation

d83f0a1

feat(diagnose): add comprehensive health report and command registration

20bf502

feat(diagnose): register diagnose command group in CLI entry point

c588b8b

fix(diagnose): address code review findings

bba0744

- Inline isOssMode check to avoid cross-module coupling
- Add ProjectNotLinkedError check to db and logs subcommands
- Replace Math.max(...array) with reduce to prevent stack overflow
- Remove unused source parameter from parseLogEntry
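The `Math.max(...array)` fix matters because spreading passes every element as a separate function argument, which can throw a `RangeError` for very large arrays; a `reduce` has no such limit. A minimal sketch of the latest/avg/max statistics computed that way, assuming a series of `{ timestamp, value }` samples (the `Sample` shape and `seriesStats` name are hypothetical):

```typescript
// Hypothetical sample shape for one metric series.
interface Sample {
  timestamp: number;
  value: number;
}

interface SeriesStats {
  latest: number;
  avg: number;
  max: number;
}

// latest/avg/max without spreading the array into Math.max.
function seriesStats(samples: Sample[]): SeriesStats | null {
  if (samples.length === 0) return null; // caller renders N/A
  const sum = samples.reduce((acc, s) => acc + s.value, 0);
  const max = samples.reduce((acc, s) => (s.value > acc ? s.value : acc), -Infinity);
  // Pick the newest sample by timestamp rather than relying on array order.
  const newest = samples.reduce((a, b) => (b.timestamp > a.timestamp ? b : a));
  return { latest: newest.value, avg: sum / samples.length, max };
}
```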

bump version

d24e815

macroscopeapp bot reviewed Mar 27, 2026

src/commands/diagnose/advisor.ts (resolved)

docs/specs/2026-03-27-diagnose-implementation-plan.md (resolved)

fix(diagnose): aggregate metrics by name to merge multiple network interfaces

0a2c56f

Network metrics (network_in/network_out) are returned per-interface by the API, causing duplicate rows. Now sums across interfaces into a single row per metric.
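Summing per-interface series into one row per metric name can be sketched as below. The `MetricSeries` shape and its hypothetical `iface` field are assumptions about the API response; this sketch also assumes interfaces report at the same timestamps and does not interpolate samples that fall at different times.

```typescript
interface Point {
  timestamp: number;
  value: number;
}

// Hypothetical shape: one series per (metric, interface) pair.
interface MetricSeries {
  metric: string; // e.g. "network_in"
  iface?: string; // e.g. "eth0"; absent for non-network metrics
  points: Point[];
}

// Merge series sharing a metric name by summing values at equal timestamps.
function aggregateByMetric(series: MetricSeries[]): Map<string, Point[]> {
  // metric name -> (timestamp -> summed value)
  const byMetric = new Map<string, Map<number, number>>();
  for (const s of series) {
    const buckets = byMetric.get(s.metric) ?? new Map<number, number>();
    for (const p of s.points) {
      buckets.set(p.timestamp, (buckets.get(p.timestamp) ?? 0) + p.value);
    }
    byMetric.set(s.metric, buckets);
  }

  const merged = new Map<string, Point[]>();
  byMetric.forEach((buckets, metric) => {
    const points: Point[] = [];
    buckets.forEach((value, timestamp) => points.push({ timestamp, value }));
    points.sort((a, b) => a.timestamp - b.timestamp);
    merged.set(metric, points);
  });
  return merged;
}
```

The merged points can then feed the same latest/avg/max computation as any single-interface metric.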

coderabbitai bot reviewed Mar 27, 2026

docs/specs/2026-03-27-diagnose-implementation-plan.md (outdated, resolved)

src/commands/diagnose/index.ts (resolved)

jwfing added 3 commits March 27, 2026 11:57

fix(diagnose): add schema prefix to index-usage table names

b6da2cf

fix(diagnose): pass apiUrl to requireAuth for custom API server support

8f1e4ad

fix(diagnose): swap network direction arrows (↓in ↑out)

d085bb1

coderabbitai bot reviewed Mar 27, 2026

src/commands/diagnose/db.ts (resolved)

src/commands/diagnose/db.ts (resolved)

src/commands/diagnose/db.ts (resolved)

docs(diagnose): sync implementation plan with shipped code

efb7abb

jwfing requested a review from tonychang04 March 27, 2026 19:11

tonychang04 reviewed Mar 27, 2026


tonychang04 self-requested a review March 27, 2026 21:26

tonychang04 approved these changes Mar 27, 2026


jwfing merged commit 83b1ed0 into main Mar 27, 2026
3 checks passed

jwfing merged 13 commits into main from feat/diagnosis
