Weekly tech debt audit: miso-chat - 2026-06-24 #622

@itsmiso-ai

Issue URL: #622

Summary / Overall Risk Level

Overall risk level: P1 / medium. Compared to the 2026-06-03 audit (#533), the codebase has moved forward: server.js is down from 1,803 → 1,513 lines, with lib/auth-session.js, lib/session-auth.js, lib/mobile-manifest-validator.js, lib/link-preview-cache.js, and lib/ssrf-validation.js carved out; the test suite grew from 174 → 274 passing tests; lint runs real ESLint (0 errors, 36 warnings); npm audit reports 0 vulnerabilities; the Dockerfile now runs as the non-root node user; and OTA manifest trust is now schema/tag/asset-host validated. Almost every P1 from the prior audit closed via PRs #534–#546.

What remained or emerged during the 0.4.13 → 0.4.19 rapid release cycle:

No P0 was confirmed in this read-only audit. I did not implement fixes, open PRs, create child issues, or call decomposition APIs. Open audit follow-ups at audit time: only #440 (Renovate dashboard) and #387 (umbrella for 0.5.0 planning) remain open on the repo.

Top Findings

P0

  • None confirmed. No live secret exposure, no unauthenticated destructive route, no failing mainline CI in this pass.

P1

  • Route-level CSRF tokens ship, but the frontend never fetches or sends them (issue #537, "Add route-level CSRF tokens for browser session commands", partially regressed). security.js:121-129 documents that the frontend MUST fetch /api/csrf-token and send X-CSRF-Token. The route exists at server.js:453, but grep -rn "X-CSRF-Token\|/api/csrf-token" public/ public/lib/ public/mobile/ returns zero matches: the only references are in server-side files. csrfTokenCheck early-returns when req.session.csrfToken is unset (security.js:152-154), so today's protection is only the origin check plus SameSite. Send/abort/reaction/logout state changes are reachable from any origin the origin check accepts (capacitor://localhost, ionic://localhost, app://localhost, null, plus configured CORS_ORIGIN/ALLOWED_ORIGINS/CSRF_TRUSTED_ORIGINS).

  • Android release APKs are not signed for production (issue #536, "Replace debug APK release lane with production-grade artifact policy", partially regressed). android/app/build.gradle:19-25 declares release { minifyEnabled false; proguardFiles ... } with no signingConfig, and the release workflow builds assembleRelease + bundleRelease against that build type. The OTA manifest's apkUrl for stable/beta points at these artifacts (android-release.yml:113,121,135,151). Without a release keystore wired up (env-injected or committed), installs from OTA fail on real devices, and the published APK/AAB is indistinguishable from a debug build.

  • script-src allows 'unsafe-inline' since PR #604 ("fix: remove CSP nonce that blocks inline scripts") removed the inline-script nonce. security.js:204 now allows inline scripts; public/index.html has 30+ inline <script> blocks, and uses div.innerHTML = '<a href="' + url + '"...' (public/index.html:3233) plus other innerHTML sinks driven by gateway-controlled image URLs. The CSP defense-in-depth that would catch a future XSS bug is gone. PR #604 itself shipped because a frontend startup regression was caught late; the fix needs a CSP-preserving inline-script policy (hash-based or extracted scripts) plus a pre-merge frontend smoke test.

  • Auth regression introduced by merge automation and caught post-merge (PRs #618, "[codex] restore frontend startup", and #620, "[codex] restore shared session access"). PR #618 had to restore frontend startup after it broke; PR #620 had to reinstate shared session access after the auth boundary tightened too much and broke isAuthenticated semantics for agent:main:main keys. Both were catchable by integration tests that don't yet exist for the boot path or the deployment-boundary semantics. This is the same regression class as #374, "bug(release): release workflow fails before image build on app startup regressions" (release workflow), and #617, "chore(release): bump version to 0.4.17" / #618 (frontend startup).

P2

  • public/index.html is still 4,085 lines mixing UI, auth, queue, SSE, rendering, reactions, and notifications. Some extraction to public/lib/ (api-client, capacitor-detect, render-utils, reaction-events-browser) happened, but session-key hydration, mobile auth callback, queueing, theme, message rendering, history poll, and the rest still live in one HTML file. This is the same P2 that #541 ("Extract public/index.html runtime modules") tracked; it is only partially closed.

  • server.js is 1,513 lines. Auth/sessions, Gateway WS, sessions routes, link previews, mobile OTA manifest proxying, SSE, and reactions all coexist. #540 partially closed; remaining hotspots are session/agent routes (738-1346) and reactions (1321-1365).

  • No tests cover /api/agents, /api/assistant-identity, /api/csrf-token, /api/openclaw-status, /api/openclaw-stop, or the /api/config exposure contract (only /api/sessions redirects and CSRF-origin cases in tests/authz-integration.test.js). The previously closed #539 ("Add authorization/integration test matrix") added the test file, but the matrix is sparse for new endpoints.

  • No graceful shutdown handler. No process.on('SIGTERM'/'SIGINT') in server.js or lib/*.js — the gateway WebSocket manager does not get a disconnect() on SIGTERM, leaving an open WS until process exit. Affects rollouts in Kubernetes and the Docker HEALTHCHECK drain.

  • Dead devDependency: typescript ^6.0.0 in package.json:40 with no .ts files anywhere in the repo (only .js and one .gradle).

  • Two update managers with overlapping concepts. lib/update-manager.js (server metadata helper, 164 lines) and public/mobile/update-manager.js (browser OTA lifecycle, 280 lines) both compute semver comparisons and both look at GitHub releases, but they live separately. #542 ("Consolidate mobile update manager logic") is partially closed: the client-side module now documents lib/update-manager.js as "for reference", but the server module still pulls @capacitor/core at import time, which is only safe because Capacitor?.isNativePlatform?.() short-circuits in Node. Importing the server module on Node without @capacitor/core installed would still crash.

  • README/OTA-docs drift. README changelog stops at v0.4.13. docs/OTA-UPDATES.md still references CAPGO_API_KEY and Capgo Cloud despite public/mobile/update-manager.js:7-9 saying "No Capgo account/API key required" and the validator rejecting cloud-shaped artifacts. Current package.json is 0.4.19, current release is 0.4.19.

  • Link preview performance still soft-bounded. Cache + coalescer exist (lib/link-preview-cache.js, applied at server.js:967-983), but _fetchLinkPreview can still do DNS resolution + up to 5 redirect hops + up to LINK_PREVIEW_MAX_HTML_CHARS (default 250,000) per uncached request. Tests cover cache behavior but not concurrent SSRF-DNS timeout pressure.

  • CSRF/origin integration tests don't cover the rotated-token path or the path where X-CSRF-Token is missing while a token is present in the session. tests/authz-integration.test.js exercises csrfOriginCheck (trusted/untrusted) and the unauthenticated redirects for /api/sessions and /api/agents. It does not exercise csrfTokenCheck at all (the middleware name is even destructured out in tests/security.test.js:4).

P3

  • CSP is set per-request but not parsed/tested for frame-ancestors, object-src, form-action, or base-uri. security.js:198-205 emits the policy but no test asserts the full set.
  • addImage error path uses innerHTML with url interpolated unescaped (public/index.html:3233). Only reachable if a gateway response produces an images[] entry whose src= is javascript: — link-preview URLs are filtered to http(s), but data.images is not. Worth a unit test + use textContent/createElement instead.
  • No log-level management. 34 console.log/console.warn/console.error calls in server.js; nothing reads LOG_LEVEL or redirects structured logs to stderr. Pairs poorly with the systemd/k8s log shipping model.
  • docker-compose.yml mounts . to /app (docker-compose.yml:11), which means the SQLite DB and data/ directory inside the container are shadowed by the host bind mount. Fine for dev, surprising for ops, and a footgun if anyone copies the compose file for production.
  • scripts/release-readiness-check.sh and scripts/post-deploy-smoke.sh are not invoked by any workflow (manual-only). Good for humans, easy to forget before a release.
  • /api/health body shape is documented in scripts but not in README, and CI doesn't probe it post-deploy: the #543 container-hardening PR does not wire a healthcheck into the release/publish workflow.
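
The CSP coverage gap in the first P3 item could be closed with a header-level assertion. The following is a minimal sketch, assuming the policy is available as a single Content-Security-Policy header string. parseCsp and missingDirectives are hypothetical helper names, not functions that exist in security.js, and the sample policy is only the subset of directives quoted in this audit.

```javascript
// Hypothetical helpers for asserting on a Content-Security-Policy header value.
function parseCsp(headerValue) {
  const directives = new Map();
  for (const part of headerValue.split(';')) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const [name, ...sources] = tokens;
    const key = name.toLowerCase();
    // Browsers honor only the first occurrence of a directive; mirror that here.
    if (!directives.has(key)) directives.set(key, sources);
  }
  return directives;
}

// Returns the directive names from `required` that the policy does not set.
function missingDirectives(headerValue, required) {
  const parsed = parseCsp(headerValue);
  return required.filter((name) => !parsed.has(name));
}

// Illustrative subset of the policy quoted from security.js:198-205.
const samplePolicy =
  "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'";
console.log(
  missingDirectives(samplePolicy, ['frame-ancestors', 'object-src', 'form-action', 'base-uri'])
);
```

A test in tests/security.test.js could then assert that missingDirectives returns an empty array for the header the middleware actually emits.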

Evidence

Repository / context:

Code/test/docs volumes (from wc -l):

server.js                                  1513
security.js                                 234
lib/auth-session.js                         545
lib/capacitor-detect.js                      34
lib/db.js                                   160
lib/emoji-shortcodes.js                     94
lib/gateway-ws.js                           459
lib/link-preview-cache.js                  165
lib/mobile-manifest-validator.js           264
lib/reaction-events-browser.js             131
lib/reaction-events.js                      74
lib/render-utils.js                        201
lib/session-auth.js                         29
lib/ssrf-validation.js                     123
lib/update-manager.js                      164
public/lib/api-client.js                   304
public/lib/capacitor-detect.js              34
public/lib/reaction-events-browser.js      131
public/lib/render-utils.js                 203
public/mobile/sw.js                        122
public/mobile/update-manager.js            280
public/index.html                         4085
tests/                                     4492

Tests + lint + audit:

$ npm test
# summary
274 tests, 274 pass, 0 fail, duration ~30.1s
$ npx --no-install eslint server.js security.js lib/ tests/
✖ 36 problems (0 errors, 36 warnings)
$ npm audit --omit=dev
found 0 vulnerabilities

P1 evidence — CSRF token half-implementation:

  • security.js:121-129 — frontend-integration note: "The frontend MUST fetch a fresh CSRF token on page load via GET /api/csrf-token. The frontend MUST include the token in the X-CSRF-Token header for all state-changing browser requests."
  • security.js:152-154: if (!req.session || !req.session.csrfToken) { return next(); } (early-returns when the token is absent).
  • server.js:452-457 — route defined, no callers.
  • grep -rn "X-CSRF-Token\|x-csrf-token\|csrfToken\|/api/csrf-token" public/ public/lib/ public/mobile/ → zero matches in browser code. The only occurrences in the repo are in security.js and server.js.
  • tests/security.test.js:4 destructures csrfTokenCheck out of the array (it's not exercised).
  • tests/authz-integration.test.js only tests csrfOriginCheck, never csrfTokenCheck.
  • PR #537 (da48538, "feat: add route-level CSRF tokens for browser session commands") modified only security.js, server.js, and tests/security.test.js. No public/lib or public/index.html changes.
  • PR #620 (610bfe5, "[codex] restore shared session access") shows the same regression class: a codex-style change tightened isAuthenticated too far and broke the deployment access boundary; it was merged, then hot-fixed.
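
The missing browser half of the CSRF flow could look roughly like the wrapper below. This is a sketch under stated assumptions, not the project's API client: createCsrfFetch is a hypothetical name, the { csrfToken } response shape of /api/csrf-token is assumed (the audit does not state it), and request headers are assumed to be plain objects.

```javascript
// Hypothetical helper: wraps a fetch implementation so that state-changing
// requests carry X-CSRF-Token. The { csrfToken } response field is an assumption.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

function createCsrfFetch(fetchImpl, tokenUrl = '/api/csrf-token') {
  let token = null;

  async function refreshToken() {
    const res = await fetchImpl(tokenUrl, { credentials: 'same-origin' });
    if (!res.ok) throw new Error(`CSRF token fetch failed: ${res.status}`);
    const body = await res.json();
    token = body.csrfToken; // assumed field name
    return token;
  }

  return async function csrfFetch(url, options = {}) {
    const method = (options.method || 'GET').toUpperCase();
    // Read-only requests pass through untouched.
    if (SAFE_METHODS.has(method)) return fetchImpl(url, options);

    // Assumes options.headers is a plain object, not a Headers instance.
    const send = () =>
      fetchImpl(url, {
        ...options,
        headers: { ...(options.headers || {}), 'X-CSRF-Token': token },
      });

    if (!token) await refreshToken();
    let res = await send();
    if (res.status === 403) {
      // The token may have rotated: fetch a fresh one and retry exactly once.
      await refreshToken();
      res = await send();
    }
    return res;
  };
}
```

In the browser this would be created once, for example createCsrfFetch(window.fetch.bind(window)), and used in place of direct fetch calls for send/abort/reaction/logout. Injecting the fetch implementation keeps the accepted-token and rotated-token paths testable without a server.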

P1 evidence — Android release builds unsigned:

  • android/app/build.gradle:19-25:
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }
    No signingConfig. No signingConfigs block anywhere in the repo (grep -r signingConfig android/ returns nothing).
  • .github/workflows/android-release.yml:48-66 calls ./gradlew assembleRelease and ./gradlew bundleRelease against that build type.
  • android-release.yml:113,121,135,151 writes the manifest apkUrl for stable/beta/internal channels pointing at the unsigned APK.
  • No android/key.properties or env-injected keystore handling. grep -r keystore android/ → no release keystore.

P1 evidence — CSP unsafe-inline after PR #604:

  • security.js:198-205:
    "default-src 'self'",
    ...
    "script-src 'self' 'unsafe-inline'",
    "style-src 'self' 'unsafe-inline'"
  • security.js:88-89: "Since index.html is served as a static file (no template rendering), nonce-based inline script restriction is not feasible."
  • public/index.html contains 30+ inline script bodies (grep -c '<script' public/index.html = 24+).
  • public/index.html:3233: div.innerHTML = '<a href="' + url + '" target="_blank" ...' — the url is interpolated unescaped.
  • PR #604 (26cc382, "fix: remove CSP nonce that blocks inline scripts") is the proximate cause; #616 ("fix: render-utils.js module not defined + Rocket Loader script conflict") and #618 ("[codex] restore frontend startup") then restored frontend startup.
  • tests/security.test.js:24 asserts script-src 'self' 'unsafe-inline' — locking the regression in.
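
Because index.html is served as a static file, a hash-based policy is one way to drop 'unsafe-inline' without nonces. The sketch below shows how the hashes could be derived at build or startup time. inlineScriptHashes and buildScriptSrc are hypothetical names, and the regex-based extraction is a simplification that a real build step would likely replace with an HTML parser.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: collects CSP hash sources for inline <script> bodies.
// Regex extraction is a simplification and will miss unusual markup.
function inlineScriptHashes(html) {
  const hashes = [];
  const re = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    const [, attrs, body] = match;
    // Skip external scripts and empty bodies; only inline code needs a hash.
    if (/\bsrc\s*=/i.test(attrs) || body.length === 0) continue;
    // CSP hashes cover the exact script text, byte for byte.
    const digest = crypto.createHash('sha256').update(body, 'utf8').digest('base64');
    hashes.push(`'sha256-${digest}'`);
  }
  return [...new Set(hashes)]; // identical scripts share one hash
}

// Builds a script-src directive that allows 'self' plus the hashed inline scripts.
function buildScriptSrc(html) {
  return ["script-src 'self'", ...inlineScriptHashes(html)].join(' ');
}
```

Hash sources only cover <script> element bodies. Inline event-handler attributes such as onerror are not allowed by them, so any handlers of that kind in index.html would need to move to addEventListener.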

P1 evidence — auth / frontend regressions during 0.4.17-0.4.19:

These five fixes shipped in 0.4.15 → 0.4.19 to repair regressions from prior merges — pattern identical to closed P0 #374 ("release workflow fails before image build on app startup regressions").

P2 evidence — file size and module boundaries:

  • public/index.html:1-4085: still houses auth, secure storage, message rendering, queue, SSE, theme, emoji, mobile callback, session-key hydration, history polling, and update notifications in one file.
  • server.js:738-1365: agents / sessions / history / send / send-stream / link-preview / openclaw-status / openclaw-stop / reactions routes still inline.
  • lib/update-manager.js:1-13: requires @capacitor/core for Capacitor?.isNativePlatform?.(). The browser-side module no longer needs Capgo Cloud, but the server module still pulls @capacitor/core and silently relies on the short-circuit.
  • package.json:40: "typescript": "^6.0.0". find . -name "*.ts" -not -path "./node_modules/*" -not -path "./android/*" → no matches.
  • README.md:242-322: changelog last entry is ### v0.4.13 (2026-06-04). Current release is 0.4.19 (2026-06-23).
  • docs/OTA-UPDATES.md:14-38: still references CAPGO_API_KEY and "Optional Capgo Cloud Integration" despite the browser update-manager.js saying no cloud key is used.

P2 evidence — missing tests and missing shutdown:

  • server.js: no process.on('SIGTERM'|'SIGINT'|'SIGHUP') handler. gatewayWsManager.disconnect() is only called from inside request handlers.
  • tests/authz-integration.test.js:1-3 documents the matrix but contains only ~6 tests (origin trust, unauth /api/sessions redirect, unauth /api/agents redirect, mobile origins, null origin). No tests for /api/csrf-token behavior, /api/assistant-identity, /api/openclaw-status/-stop, or /api/config.
  • tests/mobile-update-behavioral.test.js and tests/mobile-apk-flow-regression.test.js test the validator/flow but do not test gatewayWsManager disconnect on shutdown or graceful drain.
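
A shutdown path for the gap described above might be shaped like the following sketch. Only gatewayWsManager.disconnect() comes from the codebase as described in this audit. createShutdown, installSignalHandlers, the sseClients set of open responses, and the 10-second budget are assumptions for illustration.

```javascript
// Sketch only: `sseClients` (a Set of open SSE responses) and the default
// 10 s budget are assumptions, not existing server.js structures.
function createShutdown({
  server,
  gatewayWsManager,
  sseClients = new Set(),
  timeoutMs = 10000,
  exit = process.exit,
}) {
  let shuttingDown = false;
  return function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`received ${signal}, draining`);

    // Hard deadline so a stuck connection cannot block a rollout forever.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // End long-lived SSE streams; otherwise server.close() never completes.
    for (const res of sseClients) res.end();
    sseClients.clear();

    // Close the upstream gateway WebSocket.
    gatewayWsManager.disconnect();

    // Stop accepting connections and exit once in-flight requests finish.
    server.close((err) => {
      clearTimeout(timer);
      exit(err ? 1 : 0);
    });
  };
}

function installSignalHandlers(shutdown) {
  for (const signal of ['SIGTERM', 'SIGINT']) {
    process.on(signal, () => shutdown(signal));
  }
}
```

Injecting exit and passing the server as a parameter lets a test drive the drain with fakes instead of killing the test process.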

P3 evidence — minor:

  • security.js:198-205: CSP doesn't assert form-action, base-uri, media-src, or worker-src.
  • public/index.html:3228-3233: addImage onerror uses innerHTML with url interpolation. Link-preview URLs are http(s) filtered (extractUniqueUrls at line 1487) but data.images from gateway responses are not. The same gateway controls chat.send results.
  • server.js: 34 console.* calls, no LOG_LEVEL reading, no log shipping hint.
  • docker-compose.yml:11: volumes: - .:/app shadowing the in-container /app/data directory.
  • scripts/release-readiness-check.sh and scripts/post-deploy-smoke.sh: not invoked by release.yaml, publish-release.yml, or manual-release.yml workflows (manual only).
  • .github/workflows/release.yaml/publish-release.yml/manual-release.yml do not invoke ./scripts/post-deploy-smoke.sh against a target.
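
The addImage sink noted above could be replaced along these lines. This is a hedged sketch: isSafeImageUrl and buildFallbackLink are hypothetical helpers, the raster-only data: allowlist is a conservative assumption, and the base URL default and link label are placeholders.

```javascript
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Hypothetical filter for gateway-supplied image URLs. `base` is a placeholder;
// in the browser it would normally be location.href.
function isSafeImageUrl(raw, base = 'http://localhost/') {
  let parsed;
  try {
    parsed = new URL(String(raw), base);
  } catch {
    return false; // unparseable input is rejected
  }
  if (ALLOWED_PROTOCOLS.has(parsed.protocol)) return true;
  // Permit inline raster images only; rejects data:text/html and similar.
  return (
    parsed.protocol === 'data:' &&
    /^image\/(png|jpeg|gif|webp)[;,]/i.test(parsed.pathname)
  );
}

// Builds the fallback link with DOM APIs instead of innerHTML, so the URL is
// never interpreted as markup. `doc` is the document (injected for testing).
function buildFallbackLink(doc, url) {
  const a = doc.createElement('a');
  if (isSafeImageUrl(url)) a.href = url; // unsafe URLs leave the link inert
  a.target = '_blank';
  a.rel = 'noopener noreferrer';
  a.textContent = 'Open image'; // placeholder label
  return a;
}
```

Parsing with URL before checking the scheme avoids string-prefix tricks such as leading whitespace or mixed case in javascript: URLs.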

Recommended Issue Breakdown

  1. P1 — Finish CSRF token frontend integration (follow-up to #537, "Add route-level CSRF tokens for browser session commands"): make public/index.html (or the extracted public/lib/api-client.js successor) fetch /api/csrf-token on page load, attach X-CSRF-Token to every state-changing request from apiFetch, retry once on 403 with a fresh token, and add integration tests covering both the accepted-token and rotated-token paths.
  2. P1 — Wire Android release signing: add signingConfigs.release driven by env-injected keystore (with android/key.properties gitignored) and document the required GitHub Actions secrets; reject release builds when signing config is missing instead of silently publishing unsigned APKs.
  3. P1 — Restore inline-script CSP without breaking startup: extract or hash-inline the 30+ scripts in public/index.html so script-src 'self' 'sha256-…' (or similar) replaces 'unsafe-inline'; update tests/security.test.js to assert the tightened policy; add a frontend-boot smoke that fails the PR if index.html can't load without 'unsafe-inline'.
  4. P1 — Add pre-merge guard-rails for codex-style regressions: an integration smoke that runs npm run lint && npm test && curl http://localhost:3000/api/health && curl -c jar http://localhost:3000/login, then a POST /api/sessions/:key/send against a stubbed gateway; make it a required check on every PR. This targets the same failure class that needed hot-fixes in #374 ("bug(release): release workflow fails before image build on app startup regressions"), #618 ("[codex] restore frontend startup"), and #620 ("[codex] restore shared session access").
  5. P2 — Continue extracting public/index.html runtime modules: session-key hydration (hydrateStoredSessionKey, persistStoredSessionKey), secure-storage wrapper, mobile-auth callback flow, message rendering (renderBubbleContent), queue persistence, theme/shortcodes, history polling. Each as a public/lib/*.js with unit tests; document the seam before extraction.
  6. P2 — Continue splitting server.js by boundary: move agents / sessions / history / send / send-stream into lib/routes/sessions.js, reactions into lib/routes/reactions.js, openclaw-status / openclaw-stop into lib/routes/openclaw.js, link-preview into lib/routes/link-preview.js. Keep middleware order and error contracts stable.
  7. P2 — Expand the authz/integration test matrix: cover /api/csrf-token (issue, rotate, reject), /api/assistant-identity, /api/openclaw-status/-stop denial paths, /api/config exposure contract, and OIDC vs local auth interactions under multi-user settings.
  8. P2 — Add graceful shutdown: install process.on('SIGTERM'|'SIGINT') handler that closes the HTTP server, drains the SSE client set, calls gatewayWsManager.disconnect(), waits up to a budget, then exits. Test the drain with a fake server.
  9. P2 — Remove dead typescript devDependency and re-evaluate whether the server-side lib/update-manager.js should drop its @capacitor/core import in favor of a tiny isNode check; either way, add a unit test that imports the module in a Node-only context.
  10. P2 — Repair README / docs drift: backfill changelog through v0.4.19, remove stale CAPGO_API_KEY and Capgo Cloud references from docs/OTA-UPDATES.md, align the release runbook with the current manual-release.yml + publish-release.yml flow.
  11. P2 — Harden link-preview performance: add explicit max concurrent fetches per host, jittered retry on 5xx, structured timeout metrics (dns, connect+headers, body-read, overall), and tests that simulate simultaneous SSRF-DNS slow responses.
  12. P3 — Replace addImage innerHTML sink with createElement/textContent and add an isSafeImageUrl filter that rejects javascript:/data: schemes (except data:image/...) for gateway-supplied image URLs.
  13. P3 — Wire post-deploy smoke into release/publish workflows: invoke scripts/post-deploy-smoke.sh after publish-release.yml creates the GitHub release and on manual-release.yml if a deploy URL is configured; fail the release on smoke failure.
  14. P3 — Tighten CSP with form-action, base-uri, frame-ancestors, media-src, worker-src and add header-presence tests; document any third-party origin that needs an explicit allowance.
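
For item 11, the "max concurrent fetches per host" bound could be prototyped with a small in-process limiter like the one below. createHostLimiter is a hypothetical name and the default of 2 per host is an arbitrary illustration. How it would wrap _fetchLinkPreview is left open, since that function's signature is not shown in this audit.

```javascript
// Hypothetical per-host concurrency limiter for uncached link-preview fetches.
function createHostLimiter(maxPerHost = 2) {
  const active = new Map(); // host -> number of running tasks
  const waiting = new Map(); // host -> queue of resolve callbacks

  async function run(host, task) {
    if ((active.get(host) || 0) >= maxPerHost) {
      // At capacity: park until a finishing task hands over its slot.
      await new Promise((resolve) => {
        const queue = waiting.get(host) || [];
        queue.push(resolve);
        waiting.set(host, queue);
      });
    } else {
      active.set(host, (active.get(host) || 0) + 1);
    }
    try {
      return await task();
    } finally {
      const queue = waiting.get(host);
      if (queue && queue.length > 0) {
        // Transfer the slot directly, so the active count stays unchanged.
        queue.shift()();
        if (queue.length === 0) waiting.delete(host);
      } else {
        const remaining = active.get(host) - 1;
        if (remaining > 0) active.set(host, remaining);
        else active.delete(host);
      }
    }
  }

  return { run };
}
```

A caller would wrap each uncached fetch as limiter.run(new URL(target).hostname, () => fetchPreview(target)), where fetchPreview stands in for whatever performs the request. Different hosts proceed independently, so one slow host cannot starve the rest.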

Not Worth Doing Yet

Decomposed into
