Follow-ups: structured diagnostics logger + begin App() de-monolith (find-in-page hook)#17
Open
Floofy6 wants to merge 5 commits into
Open
Conversation
…atches
Operational failures were console.error at best and silent catch {} at worst, so a user
losing data or a tab crashing left no durable breadcrumb. electron/logger.cjs writes leveled
JSON lines to a size-capped, single-backup file under the profile (prism-diagnostics.log),
still mirrors to the console, and never throws into its caller (it's used from catch blocks
and crash handlers). It's tiny and synchronous on purpose — only low-frequency events log,
never per-request.
Wired into the points that previously had no durable record: secure-store corruption
quarantine, uncaughtException, unhandledRejection, and guest render-process-gone. Level is
configurable via PRISM_LOG_LEVEL.
tests/logger.cjs covers level filtering, structured fields, rotation, and the never-throws
guarantee. electron-smoke boots clean with the logger in place.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
First slice of the renderer de-monolith (the audit's biggest structural finding: App() is one ~6,200-line component holding all state, with ~59 hand-maintained ref-mirrors). Lifts the entire find-in-page feature — open/query/matches state, the input ref, the in-page search calls, and the focus-on-open + auto-close effects — out of App() into src/hooks/useFindInPage.ts. The hook takes only what it needs (activeIdRef, isActiveWebview, a stable getActiveWebview getter) and returns a small API. App() keeps the JSX and wires reportFound into each BrowserFrame's onFound; the keyboard handler / command palette / agent find-text tool call openFindBar(). Return names match the old locals, so only the 4 setFindOpen(true) calls and the onFound prop changed — every read-site stayed put. The run/close callbacks are now stable (they read the active webview via a stable getter instead of capturing activeId), so they no longer churn on tab switches. Adds tests/find-in-page.cjs — find-in-page had NO coverage before; it now has an e2e (Cmd+F open, controlled query, Escape close) that also serves as the regression net for the extraction. Establishes the pattern for future slices (useTabModel, useAssistantChat, useAgentRun). Full suite + tsc green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… cache on re-encode Addressing low-severity findings from the xhigh code review: - secure-store writeAsync: re-check the write sequence before stamping the cache after the awaited rename, so a beforeunload sync flush that superseded us can't leave the cache holding our stale items under the newer file's mtime. Also drop the dead 'const seq =' binding in writeSync (the writeSeq bump is the load-bearing effect; the unused local was a refactor smell). - match-pattern: collapse runs of '*' before building the path regex so a '**' glob doesn't compile to adjacent '.*.*' (needless backtracking on long non-matching paths in did-navigate). - reencodeSecureStoresToPlaintext: invalidate the chat/memory store caches after the out-of-band atomicWriteFileSync rewrites (wires the previously-unused invalidateCache), so a same-mtime-tick re-encode can't be masked by a stale cache. Refuted in review (no change): wouldClobberEncrypted TOCTOU (isEncryptionAvailable checked first, serialize is synchronous after), cross-extension token leak (per-origin realm isolation), uncaughtException semantics (intentional, audit-recommended). secure-store + match-pattern + logger unit tests + tsc pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…le renames Adversarially-verified Windows-portability audit of the session's changes (29 agents, 6 confirmed / 16 refuted) surfaced one HIGH cross-platform regression and a few file-lock edges: 1. Guest crash recovery (HIGH). The new main-process render-process-gone handler auto-reloaded the crashed guest, but Electron fires that same event on BOTH the guest WebContents (main) and the <webview> DOM element (renderer). The renderer's pre-existing handleCrash deliberately does NOT auto-reload — it shows an error page and lets the user retry — specifically to avoid reload loops under Windows GPU contention (the path it guards with a per-URL/lifetime cap). The main-process reloader was a second, uncoordinated reloader with only a 15s time throttle and no lifetime cap, reintroducing exactly that loop on Windows. Fix at altitude: the renderer owns recovery; the main process now ONLY logs the crash (its stated purpose) and steps aside. This also resolves the no-lifetime-cap and reason-allow-list findings (it no longer reloads at all). 2. Log rotation (LOW, Windows file lock). rotateIfNeeded renamed the log over its ".1" backup; on Windows a rename can fail if the log is held open (a tail, antivirus, the search indexer), silently skipping rotation so the log grows unbounded. Now: remove the prior ".1" first (rename over an existing target is fragile on win32), and if the rename still fails, truncate in place so the size cap always holds — we lose the backup, not the bound. 3. Store renames (LOW, Windows file lock). atomicWriteFileSync, the writeAsync commit, and the corrupt-file quarantine all rename over a live file; under a transient AV/indexer lock that throws EPERM/EBUSY on Windows. Added a small bounded EPERM/EBUSY/EACCES retry (sync + async). POSIX never hits it, so the first attempt wins there. Newly load-bearing because memory/runs gained atomic temp+rename this session. Refuted by the audit (no change): the new electron/*.cjs ARE bundled (build.files has "electron/**/*"); mode 0o600 is correctly a guarded no-op on Windows; the 0700 profile chmod and 0600 token file are pre-existing guarded no-ops (not session regressions); safeStorage use is platform-abstract (DPAPI on Windows); the npm test scripts and worker-revival are cross-platform. Full suite (23) + tsc green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The grounding gate let a factual final answer through after ANY grounding tool ran — including a web-search that only returned snippets. The v6 failure mode it missed: a model searched, saw the result list, then answered a population number from priors instead of opening and reading a source. Splits grounding into two tiers: - grounding tools (web-search, read-active-page, read-tabs-content, read-selected-text, extract-data) gather leads, as before; - source-read tools (read-active-page, read-selected-text, read-tabs-content, extract-data) are the subset that bring actual source text into the observations. New pure helpers in agent.ts: isAgentSourceReadTool classifies a tool, and getAgentFinalGroundingBlock returns "none" | "no-evidence" | "needs-source-read" from the run counters (autonomy, grounding/source-read observation counts, action tools run, pushbacks). Both are unit-tested in agent-harness-logic.cjs. App.tsx runAgentTask wires it in: - track sourceReadObservations alongside groundingObservations; - on a "needs-source-read" block, push back with guidance to open the 1-3 best result URLs and read/extract them (bounded by the existing <2 pushback budget so no-source and future-event traps still land honestly); - as a last resort when a source tab is open but unread after the budget, auto-issue a read-tabs-content on the run's opened tabs instead of finalizing; - defer memory writes until source-read evidence exists, so a fact never gets stored from priors; - if still ungrounded after the budget, land an honest safety-fallback answer and skip the critic resynthesis pass on that fallback. Telemetry + eval: agent-live-run.cjs now records a stepTimeline and memoryEvents (with timing) so a run can be judged on WHEN it read a source vs. wrote memory. Adds tests/agent-policy-compare.cjs — a live prism-brain vN-vs-vM policy/action comparison (source-read before factual answers, no memory-before-source, valid-directive rate, agent-call efficiency) — wired as the eval:agent-policy script. It needs local Ollama + the live web, so it is intentionally NOT part of npm test. tsc + agent-harness-logic green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two follow-ups from the audit, stacked on #16 (base is
claude/five-huge-improvements, so this PR's diff shows only the follow-up work — retarget tomainonce #16 merges).Structured diagnostics logger (
electron/logger.cjs). Operational failures wereconsole.errorat best and silentcatch {}at worst — no durable breadcrumb when a user lost data or a tab crashed. The logger writes leveled JSON lines to a size-capped, single-backup file under the profile (prism-diagnostics.log), still mirrors to the console, and never throws into its caller (it's used from catch blocks and crash handlers). Wired into the points that previously had no record: secure-store corruption quarantine,uncaughtException,unhandledRejection, and guestrender-process-gone. Level viaPRISM_LOG_LEVEL. Covered bytests/logger.cjs(level filtering, structured fields, rotation, never-throws).Begin the App() de-monolith — extract find-in-page (
src/hooks/useFindInPage.ts). First slice of the renderer's biggest structural issue (one ~6,200-lineApp()holding all state, with ~59 hand-maintained ref-mirrors). Lifts the entire find-in-page feature — state, input ref, in-page search calls, focus-on-open + auto-close effects — into a hook that takes only what it needs (activeIdRef,isActiveWebview, a stablegetActiveWebviewgetter). Return names match the old locals, so only the 4setFindOpen(true)calls and theonFoundprop changed; every read-site stayed put. The run/close callbacks are now stable (read the active webview via a stable getter instead of capturingactiveId), so they no longer churn on tab switches. Addstests/find-in-page.cjs— this feature had no test before. Establishes the pattern for the next slices (useTabModel,useAssistantChat,useAgentRun).Testing
npm test— all 23 suites pass (incl. the newloggerandfind-in-pagesuites);tscclean.🤖 Generated with Claude Code