
Security Providers: vault-backed secret management for the desktop (AIR-017→023)#673

Open
0xr00tf3rr3t wants to merge 32 commits into
fathah:mainfrom
0xr00tf3rr3t:secrets/04

Conversation

@0xr00tf3rr3t

Contributor

Security Providers: vault-backed secret management for the desktop

This PR adds a Security Providers capability to the desktop app: choose and test a
secrets provider (KeePassXC/command, etc.), have vault-resolved credentials feed the
gateway, and surface clear config-health diagnostics — all without writing secrets to
plaintext .env. It then hardens that feature against a series of real, live-data-found
correctness and credential-routing bugs (the AIR-017→023 series below).

What it adds

  • Security Providers section in Settings — pick a provider, test it, see which keys it
    resolves (labeled "Vault Provided"), and Refresh-from-vault.
  • First-run onboarding step for choosing a secrets provider, including a guided
    KeePassXC vault-create flow (picker + prerequisites, not a dead-end).
  • Vault-aware config diagnostics — config-health surfaces missing/mismatched
    credentials and never blocks Send on a credential the gateway can actually resolve.
  • Off-main-thread TPM seal + vault db-create (AIR-016) so the UI never freezes during
    a multi-second crypto operation.

Correctness / credential-routing hardening (found by live-data testing)

Each of these was found by exercising the feature against a real vault + real OAuth
token, not mocks, and each is fixed with a RED-proven regression test:

  • AIR-017: secretsProviderStatus listed vault keys plus all of process.env (130
    badges, not 6). Now lists vault-only keys.
  • AIR-018 — Setup didn't detect an Anthropic credential stored as the vault OAuth
    token (CLAUDE_CODE_OAUTH_TOKEN); added alias detection + a "use vault / enter a key"
    toggle.
  • AIR-019 — Setup could persist an empty model.default (→ model: "" → API 400).
    Now never writes an empty model, and a blank field falls back to a model discovered via
    the provider's /v1/models.
  • AIR-020 — the env-key-mismatch auto-fix would offer to copy an unrelated service's
    token (e.g. MATRIX_ACCESS_TOKEN) into ANTHROPIC_API_KEY — a credential-bleed.
    Restricted to known aliases.
  • AIR-021 — a populated OAuth-token alias was flagged as a "wrong name" mismatch with a
    one-click "copy into ANTHROPIC_API_KEY" fix that would send the OAuth token as
    x-api-key → 401. A populated accepted alias is now treated as satisfied (no banner,
    no copy).
  • AIR-022 — the pre-send chat-readiness gate blocked Send with "Missing
    ANTHROPIC_API_KEY" for vault OAuth users. Now accepts the alias.
  • AIR-023 — the credential-name list the desktop forwards from the security provider to
    the gateway (KNOWN_API_KEYS) had drifted out of sync with the gateway provider plugins'
    env_vars, dropping every OAuth/Bearer name and per-vendor alias (anthropic, gemini,
    copilot, zai, kimi, dashscope, xai, nvidia, …). Extended to mirror the plugins, with a
    drift-guard test.
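
The AIR-019 rule above (never persist an empty model, fall back to a discovered one) might look roughly like the sketch below. The PR mentions a `pickDefaultModel` helper but does not show it, so the signature, the `discovered` parameter, the `modelConfigPatch` wrapper, and the "first non-blank discovered model wins" choice are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the AIR-019 rule; names and signatures are assumptions.
// `entered` is what the user typed; `discovered` is the list of model ids
// returned by the provider's /v1/models endpoint (possibly empty).
function pickDefaultModel(entered: string, discovered: readonly string[]): string | null {
  const typed = entered.trim();
  if (typed !== "") return typed;
  // Blank field: fall back to the first non-blank discovered model, if any.
  const fallback = discovered.map((m) => m.trim()).find((m) => m !== "");
  return fallback === undefined ? null : fallback;
}

// Only produce a config patch when there is a non-empty model to write,
// so `model.default: ""` can never reach the config file.
function modelConfigPatch(
  entered: string,
  discovered: readonly string[],
): { default: string } | null {
  const model = pickDefaultModel(entered, discovered);
  return model === null ? null : { default: model };
}
```

Returning `null` instead of an empty string pushes the "nothing to write" decision to the caller, which is one way to make the empty-model write unrepresentable.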

Tests

Full suite green except 3 pre-existing reconcile-streamed-with-db failures that are
unrelated to this work (present on the base). Each AIR fix ships a behavior-contract test
(not a snapshot), and the credential-name set has a drift-guard so it can't silently fall
out of sync with the provider plugins again.

Notes for reviewers

  • The primary buildGatewayEnv path already overlayed the full provider secret set; the
    AIR-023 fix brings the CLI-fallback path to parity.
  • Credential-name equivalence (canonical key ↔ OAuth/Bearer/alias names) is now consistent
    across all five gates (install, config-health warn, config-health mismatch, Setup detect,
    chat-readiness) plus the env-forward layer.

@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Contributor

Greptile Summary

This PR adds a vault-backed secrets management layer to the desktop app (a two-stage setup wizard with the security provider first and the model provider second, a Settings panel for testing/editing vault keys, async vault create/TPM-seal, and a KeePassXC onboarding guide) and then fixes seven credential-routing bugs (AIR-017 through AIR-023) found by exercising the feature against a real vault.

  • AIR-017–022: Vault key listing fixed to show only provider-resolved keys (not the full process.env), OAuth-token aliases (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_TOKEN) now correctly satisfy every credential gate (install, config-health, Setup detect, chat-readiness), and the env-key-mismatch auto-fix that used to offer credential-bleed copy operations is removed.
  • AIR-023: KNOWN_API_KEYS in hermes.ts extended to include OAuth/Bearer names and per-vendor aliases; a drift-guard test keeps it in sync with gateway provider plugins. The key alias table is unified in src/shared/url-key-map.ts as a single source of truth.
  • Security hardening: commandProvider.ts migrated to spawnSync with detached process group and post-run SIGKILL reap to prevent orphaned vault-helper grandchildren on timeout; secret values are always delivered on stdin, never interpolated into shell strings.

Confidence Score: 4/5

Safe to merge with the installer vault fallback fixed; the broad token regex can strand users at a 401 chat error after incorrectly bypassing onboarding.

The vault install gate in installer.ts falls through to a regex that accepts any _API_KEY/_TOKEN credential from the vault — including unrelated service tokens — without first checking the alias-constrained set (aliasesForEnvKey) that every other gate in this PR uses. A user with Anthropic configured who happens to store GITHUB_TOKEN in their vault for unrelated use passes the gate without an LLM credential, then hits a chat-blocking 401 with no actionable guidance. The remaining changes are well-constructed and the AIR series fixes are logically sound.

src/main/installer.ts — the vault-path hasApiKey fallback should check aliasesForEnvKey(expectedKey) before the broad regex, mirroring how resolvedHasKey() and envHasUsableValue() handle aliases in this same PR.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/main/installer.ts | Adds vault-awareness to the install gate and exports envHasUsableValue with alias support. The vault-path fallback accepts any `*_API_KEY`/`*_TOKEN` credential rather than first checking aliasesForEnvKey(expectedKey), creating a false-positive path for users with unrelated tokens in their vault. |
| src/main/secrets/commandProvider.ts | Migrates from execFileSync to spawnSync with detached process group + post-run SIGKILL orphan reap, Windows short-circuit, and structured-failure-only logging. Security posture is improved; no issues found. |
| src/main/secrets/commandProviderWrite.ts | New write/delete vault helper: value delivered only on stdin, key name only via env, stderr discarded. Timeout uses SIGTERM to the direct child only (unlike commandProvider.ts, which kills the full process group), so backgrounded grandchildren of write helpers are not reaped. |
| src/main/secrets/vaultBootstrap.ts | New vault create/detect/seal module. Addresses prior findings (snap-aware CLI resolution, shellQuote around hasBinary). Async path for slow ops correctly prevents Electron main-thread freeze. No new issues found. |
| src/shared/url-key-map.ts | Introduces KEY_ALIASES and aliasesForEnvKey() as a single source of truth for credential alias detection, consolidating the three previously-duplicated alias maps. Clean and correct. |
| src/main/config-health.ts | Adds remote/SSH mode short-circuits and makes vault key lookup alias-aware via resolvedHasKey(). checkRuntimeEnvKeyMismatch now always returns [] (intentionally gutted to remove the credential-bleed footgun). |
| src/main/validation.ts | validateChatReadiness gains remote/SSH fail-open and alias-aware credential detection from aliasesForEnvKey(). Clean fix for AIR-022. |
| src/main/hermes.ts | KNOWN_API_KEYS promoted to module-level export and extended to include OAuth/Bearer aliases and per-vendor names missing from the old private list (AIR-023). Drift-guarded by a test. |
| src/renderer/src/screens/Settings/SecretsProviders.tsx | New Settings section for choosing/testing the secrets provider and inline vault key edit/delete. Secret values are never returned to the renderer. editValue is not cleared in the finally block, persisting in React state on write failure. |
| src/renderer/src/screens/Setup/Setup.tsx | Major rewrite adding a two-stage setup flow. Vault bootstrap runs off-main-thread via IPC. pickDefaultModel and vaultHasModelKey helpers are correctly extracted and tested. |
| src/main/config.ts | Adds secretsProviderStatus (AIR-017), decideCanWrite, secretsProviderCanWrite, shouldSkipUpdaterWiring, and guards against writing an empty model.default (AIR-019). Logic is well-tested. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant U as User (Renderer)
    participant IPC as Electron IPC
    participant M as Main Process
    participant V as Vault (KeePassXC)

    note over U,V: First-run onboarding Stage 1 secrets
    U->>IPC: vaultDetectExisting()
    IPC->>M: detectExistingVault()
    M->>V: command -v keepassxc-cli
    V-->>M: resolved CLI name
    M-->>U: found, kind, keys, suggestedCommand

    U->>IPC: vaultCreate()
    IPC->>M: createVault() async
    M->>V: keepassxc-cli db-create --set-key-file
    V-->>M: vault and key created
    M->>M: chmod 0600 key and vault
    M-->>U: ok, vaultPath, keyPath, suggestedCommand

    U->>IPC: vaultSealTpm(keyPath)
    IPC->>M: sealKeyFileToTpm() async
    M->>M: "systemd-creds encrypt --with-key=tpm2"
    M-->>U: sealed true or false

    note over U,V: Stage 2 model provider
    U->>IPC: secretsProviderStatus()
    IPC->>M: providerListSafe() vault key NAMES only
    M-->>U: provider, keys, count

    note over U,V: Runtime credential resolution
    U->>IPC: sendMessage()
    IPC->>M: resolvedSecrets() via providerListSafe()
    M->>V: run helper HERMES_SECRET_KEY env
    V-->>M: stdout single key value
    M->>M: SIGKILL process group orphan reap
    M-->>U: gateway env built chat proceeds

Reviews (5): Last reviewed commit: "fix(secrets): make command-provider vaul..."

Comment thread src/main/validation.ts Outdated
Comment thread src/main/secrets/vaultBootstrap.ts
Comment thread src/main/secrets/vaultBootstrap.ts
0xr00tf3rr3t added a commit to 0xr00tf3rr3t/hermes-desktop that referenced this pull request Jun 15, 2026
…reptile P1)

The credential NAME-ALIAS map (ANTHROPIC_API_KEY → ANTHROPIC_TOKEN,
CLAUDE_CODE_OAUTH_TOKEN) was defined THREE times independently — config-health.ts,
validation.ts, and Setup.tsx (as MODEL_KEY_ALIASES) — kept in sync only by
comments. Adding an alias to one but not the others would silently split the five
security gates (Greptile P1 on PR fathah#673).

Centralize it in src/shared/url-key-map.ts (already the single source of truth for
credential-key names, already imported by both main and renderer) as KEY_ALIASES +
aliasesForEnvKey(). All three sites now import it; no local copies remain.

Test: src/shared/url-key-map.test.ts guards the shared table. RED-proven — removing
CLAUDE_CODE_OAUTH_TOKEN from the ONE shared map now reds 5 tests across the shared
guard + validation (AIR-022) + Setup (AIR-018/018b), confirming every gate consumes
the single source. Full suite 1549 passed.
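
For readers who have not opened src/shared/url-key-map.ts, a shared alias table of this kind could be shaped roughly as follows. This is only a sketch: the ANTHROPIC_API_KEY entry is taken from the commit message above, but the `Map` representation, the choice to return the canonical key first, and the `hasCredential` helper are assumptions, not the module's actual code.

```typescript
// Sketch of a single shared alias table. Only the ANTHROPIC_API_KEY entry is
// taken from the commit message; the data structure itself is an assumption.
const KEY_ALIASES: ReadonlyMap<string, readonly string[]> = new Map([
  ["ANTHROPIC_API_KEY", ["ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"]],
]);

// Canonical key first, then its accepted aliases (assumed ordering).
// A key with no entry in the table maps to itself only.
function aliasesForEnvKey(envKey: string): string[] {
  return [envKey, ...(KEY_ALIASES.get(envKey) ?? [])];
}

// Hypothetical gate helper: true when the canonical key or any accepted
// alias holds a non-blank value. Unrelated tokens never satisfy the gate.
function hasCredential(
  env: Readonly<Record<string, string | undefined>>,
  envKey: string,
): boolean {
  return aliasesForEnvKey(envKey).some((name) => (env[name] ?? "").trim() !== "");
}
```

Because every gate would call the same lookup, removing one alias from the table changes all gates at once, which is the property the RED-proof above relies on.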
0xr00tf3rr3t added a commit to 0xr00tf3rr3t/hermes-desktop that referenced this pull request Jun 15, 2026
…ary probes (Greptile P1+P2)

Two vaultBootstrap fixes from the PR fathah#673 review:

P1 — detectExistingVault hardcoded 'keepassxc-cli' in its suggestedCommand for an
on-disk vault, while createVault (same module) correctly uses the snap-aware
resolveKeepassxcCli() name. On a snap-only system the binary is 'keepassxc.cli', so
a user who already has a vault was shown a read command that fails immediately.
Now resolves the CLI name (falling back to the apt name for display when none is
installed yet), matching createVault.

P2/security — hasBinary() and keepassxcIsSnap() interpolated their name/cli
argument into a /bin/sh -c string without quoting. All current callers pass
hardcoded literals so there's no live exploit, but the unquoted surface is a trap
for any future dynamic caller. Both now shellQuote the interpolated value (reusing
the module's existing shellQuote helper).

vaultBootstrap tests 18/18; full suite 1549 passed.
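
The quoting fix in P2 can be illustrated with a standard POSIX single-quote escape. The module's real `shellQuote` body is not shown in this PR, so the version below is a stand-in, and `binaryProbeCommand` is a hypothetical name for the string handed to `/bin/sh -c`.

```typescript
// Stand-in for the module's shellQuote helper (an assumption; the real body
// is not shown). POSIX single-quote quoting: wrap the value in '...' and
// rewrite each embedded single quote as '\'' (close, escaped quote, reopen).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical builder for the probe string passed to `/bin/sh -c`.
// After quoting, the name is always a single shell word, never syntax.
function binaryProbeCommand(name: string): string {
  return "command -v " + shellQuote(name);
}
```

With this in place, a name such as `x; rm -rf /` reaches the shell as one literal argument to `command -v` rather than as two commands.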
@0xr00tf3rr3t

Contributor Author

Thanks for the review — all three findings addressed:

P1 — Triplicate KEY_ALIASES (no shared source) ✓ Fixed in 7694532. The alias map is now a single source of truth in src/shared/url-key-map.ts (KEY_ALIASES + aliasesForEnvKey()) — already the shared module for credential-key names, imported by both main and renderer. config-health.ts, validation.ts, and Setup.tsx all import it; no local copies remain. Added src/shared/url-key-map.test.ts as a drift guard, and RED-proved it: removing CLAUDE_CODE_OAUTH_TOKEN from the one shared map now reds 5 tests across the shared guard + the validation (AIR-022) and Setup (AIR-018/018b) gates, confirming every gate consumes the single source.

P1 — detectExistingVault hardcodes keepassxc-cli ✓ Fixed in ab45317. It now resolves the snap-aware CLI name via resolveKeepassxcCli() (matching createVault), falling back to the apt name only for display when no CLI is installed yet. On a snap-only system the suggested read command now uses keepassxc.cli and works.

P2/security — hasBinary / keepassxcIsSnap unquoted shell interpolation ✓ Fixed in ab45317. Both now shellQuote() the interpolated binary name before it reaches /bin/sh -c, closing the injection surface for any future dynamic caller (all current callers pass literals, so no behavior change today).

Full suite: 1549 passed, 0 failed. The branch was also merged up to current main (conflicts resolved, gates RED-proved intact through the merge), so the PR is mergeable.

0xr00tf3rr3t added a commit to 0xr00tf3rr3t/hermes-desktop that referenced this pull request Jun 15, 2026
…check TS1117)

The upstream merge (5185d0d) brought together two identical invalidateSecretsCache
property definitions in the contextBridge object literal in preload/index.ts — one
from upstream, one from secrets/04. tsc -p tsconfig.node.json (the CI typecheck
config) rejects this with TS1117 'An object literal cannot have multiple properties
with the same name'; the duplicate was the CI 'check' job failure on PR fathah#673.

(Local bare 'npx tsc --noEmit' uses the default config and did not flag it — must
run 'npm run typecheck' to match CI's per-project tsconfig.node/web invocation.)

Removed the duplicate (kept upstream's, identical body). npm run typecheck and
npm test both pass.
@0xr00tf3rr3t

Contributor Author

AIR-016 consistency gap — fixed (f35c407).

Good catch. commandWriteSecret / commandDeleteSecret were still synchronous (execFileSync) while createVault / sealKeyFileToTpm in this PR were made async — so a vault write could block the Electron main thread up to 5s on an already-async IPC path. Now non-blocking.

One implementation note for reviewers: the obvious fix — promisify(execFile) — is wrong here. The async execFile does not honor the input (stdin) option the sync version used to deliver the secret value; it silently drops it, which would break value delivery with no type error. I caught this against a real /bin/sh before it shipped. The fix instead uses child_process.spawn and writes the value to child.stdin explicitly.

Every security invariant of the original is preserved (and now regression-tested):

  • value on stdin only — never argv / shell string / env
  • key name via HERMES_SECRET_KEY env only, validated ^[A-Za-z_]\w*$ before any spawn
  • hard timeout kills a hung helper; output capped at 1 MiB
  • stderr piped + discarded (it can echo the value) — never inherited
  • coarse, secret-free error reasons; single Promise resolution (idempotent settle guard); EPIPE-on-early-exit handled

The two IPC handlers now await the result with a defensive .catch returning a coarse error. Added spawn-based tests plus timeout / helper-not-found branch coverage — full secrets suite green (162/162), typecheck + lint clean.
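
A minimal sketch of the spawn-based write path described above, assuming a POSIX `/bin/sh`. The function name, parameters, and reason strings are hypothetical. For brevity it ignores stdout and stderr outright instead of piping, capping, and discarding them as the real helper is described as doing.

```typescript
import { spawn } from "node:child_process";

// Key-name pattern quoted from the comment above; checked before any spawn.
const KEY_NAME_PATTERN = /^[A-Za-z_]\w*$/;

function isValidKeyName(name: string): boolean {
  return KEY_NAME_PATTERN.test(name);
}

type WriteResult = { ok: true } | { ok: false; reason: string };

// Hypothetical sketch: the value travels on stdin only, the key name in env only.
function writeSecretViaHelper(
  helperCommand: string,
  keyName: string,
  value: string,
  timeoutMs: number = 5000,
): Promise<WriteResult> {
  return new Promise<WriteResult>((resolve) => {
    if (!isValidKeyName(keyName)) {
      resolve({ ok: false, reason: "invalid-key-name" });
      return;
    }
    const child = spawn("/bin/sh", ["-c", helperCommand], {
      env: { ...process.env, HERMES_SECRET_KEY: keyName },
      stdio: ["pipe", "ignore", "ignore"], // stdin piped; output never inherited
    });
    let settled = false;
    const timer = setTimeout(() => {
      child.kill("SIGKILL"); // hard timeout for a hung helper (direct child only)
      settle({ ok: false, reason: "timeout" });
    }, timeoutMs);
    // Settle guard: 'error', 'close', and the timer may all fire; first one wins.
    function settle(result: WriteResult): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(result);
    }
    child.on("error", () => settle({ ok: false, reason: "spawn-failed" }));
    child.on("close", (code) =>
      settle(code === 0 ? { ok: true } : { ok: false, reason: "helper-failed" }),
    );
    // A helper that exits before reading stdin causes EPIPE; swallow it.
    child.stdin?.on("error", () => undefined);
    child.stdin?.end(value);
  });
}
```

The promise never rejects, so an IPC handler can `await` it and map the coarse `reason` to a user-facing message without a separate error path.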

Re: the dead-code note on checkRuntimeEnvKeyMismatch now always returning [] (the UI_RUNTIME_ENVKEY_MISMATCH issue code + auto-fix handler are now unreachable): acknowledged, tracking separately to keep this PR scoped to the blocking-write fix.

Comment thread src/main/installer.ts Outdated
Comment on lines +563 to +570
// The gateway token name may differ from the .env key name (the
// masking layer's Bearer variant). Accept any resolved provider-shaped
// credential (*_API_KEY / *_TOKEN) so a vault user isn't blocked.
hasApiKey = Object.entries(resolved).some(
([k, v]) => /(_API_KEY|_TOKEN)$/.test(k) && usable(v),
);
}
}

P1 Vault install gate bypassed by any *_TOKEN/*_API_KEY in vault

When expectedKey is known (e.g. "ANTHROPIC_API_KEY") but not found directly in the vault, the code falls through to a broad regex that accepts ANY *_API_KEY or *_TOKEN credential from the vault. A user who has GITHUB_TOKEN, SLACK_BOT_TOKEN, or any other service token in their vault — but no LLM credential — incorrectly passes this gate and is shown the chat screen instead of being guided back through setup. The alias-constrained check (aliasesForEnvKey) already used by resolvedHasKey() in config-health.ts and by envHasUsableValue() directly above in this same file should be consulted before falling through to the broad regex. The broad !expectedKey fallback (provider not catalogued) is still appropriate; the issue is that it also fires when expectedKey is known but its aliases were not checked.
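
One possible shape for the suggested ordering is sketched below. It is not the code in installer.ts: `vaultHasApiKey` is a hypothetical name, and the alias lookup is passed in as a parameter (standing in for `aliasesForEnvKey`) so the sketch stays self-contained.

```typescript
// Hypothetical sketch of the suggested gate ordering; not the installer.ts code.
const usable = (v: string | undefined): boolean =>
  typeof v === "string" && v.trim() !== "";

function vaultHasApiKey(
  resolved: Readonly<Record<string, string | undefined>>,
  expectedKey: string | undefined,
  aliasesFor: (key: string) => string[], // stand-in for aliasesForEnvKey
): boolean {
  if (expectedKey !== undefined) {
    // Known provider: only the canonical key or one of its accepted aliases counts.
    return [expectedKey, ...aliasesFor(expectedKey)].some((name) => usable(resolved[name]));
  }
  // Uncatalogued provider: keep the broad provider-shaped fallback.
  return Object.entries(resolved).some(
    ([k, v]) => /(_API_KEY|_TOKEN)$/.test(k) && usable(v),
  );
}
```

Under this ordering a vault holding only GITHUB_TOKEN no longer passes the gate when ANTHROPIC_API_KEY is expected, while a provider with no catalogued key still gets the permissive check.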

Surfaces the secrets-provider state in the renderer and lets a user pick up a
vault rotation without restarting. Builds on the provider (PR 1) and its IPC
wiring (PR 2).

- Gateway.tsx / Settings.tsx: roll getApiServerKeyStatus into loadConfig and add
  a 10s poll, so the "API key not configured" banner self-clears within 10s of a
  vault rotation. Add a "Refresh from vault" button that invalidates the secrets
  cache then re-fetches, with a disabled "Refreshing…" state while in flight.
- preload: expose hermesAPI.invalidateSecretsCache() over the new IPC channel;
  the getApiServerKeyStatus result carries the additive { hasKey, providerId?,
  checkedAt? } shape so the UI can distinguish vault vs .env vs missing.
- i18n: new English keys (settings/gateway refreshFromVault + refreshingFromVault;
  diagnose apiKeyModal note that the warning is ignorable for vault users). Other
  locales fall back to English until translated.

Also fixes a pre-existing react-hooks/exhaustive-deps issue this work surfaced in
Gateway.tsx: `platforms` was a fresh array each render, defeating the
filteredPlatforms useMemo — now wrapped in its own useMemo.

Tests: Gateway.test.tsx asserts the 10s poll re-fetches the key status and mocks
the new invalidateSecretsCache; new Settings.test.tsx covers the polling. 7
renderer tests pass; typecheck:node + :web clean.
…a provider)

Renderer counterpart to the unified secrets.provider selector and the
`hermes secrets` CLI verbs: a Settings section where the user can see the
active secret provider, switch between env / command / bitwarden, configure
the command helper, and TEST it — all without any secret value crossing the
IPC boundary.

- New src/main/config.ts secretsProviderStatus(profile): resolves the active
  provider's keys via resolvedSecrets() (which routes through the
  spawn-rate-floored providerListSafe, so repeated Test clicks can't flood the
  main process) and returns { provider, keys, count } — KEY NAMES ONLY, never
  values. Honors the bitwarden back-compat (bare enabled ⇒ provider=bitwarden).
- IPC "secrets-provider-status" wired in main/index.ts + exposed in preload
  (+ .d.ts type).
- New SecretsProviders.tsx (mirrors MemoryProviders): three provider cards with
  active badge, "Use this" to switch (setConfig secrets.provider +
  invalidateSecretsCache), a helper-command input for the command provider, and
  a Test button that lists resolved key names + count with an explicit
  "values are never displayed" note. Embedded in the Settings screen next to the
  existing config-health / refresh-from-vault block.
- i18n: secrets_* strings added to the settings namespace (English; other
  locales fall back to English until translated — no new namespace, smaller diff).

Tests: SecretsProviders.test.tsx — 5 cases: renders the three cards, reflects
the active provider from config, activate writes secrets.provider + invalidates
cache, Test renders resolved key NAMES + count + the values-hidden note AND
asserts the IPC contract carries no values field, empty-resolve surfaces a
warning. typecheck (node + web) clean; Settings + secrets suite 66 passed;
lint clean for the changed files.
First-run setup is now two-stage: pick a model provider (unchanged), then choose
where secrets live, one of env (.env, recommended to start) / command (vault helper) /
bitwarden. The choice is made actively at the moment the API key is saved, so the vault
option is discoverable instead of buried in Settings.

- The command card includes WHAT YOU NEED FIRST guidance: how to create a
  KeePassXC vault (install, db-create, one entry per key with title = env var
  name), keep it unlocked at startup, and where the full guide lives — so picking
  the vault path isn't a dead end for a first-timer with no vault yet.
- env is the default and a one-tap pass-through (no friction for "just let me
  chat"). The entered API key is always saved to .env regardless; the provider
  choice only governs resolution going forward, stated plainly in the UI.
- command → setConfig(secrets.provider, command) + secrets.command + cache
  invalidate; bitwarden → setConfig(secrets.provider, bitwarden) and points at
  the CLI wizard to finish.

Also fixes an arg-order bug introduced while refactoring: setModelConfig is
(provider, model, baseUrl) — a regression test now locks that order so the
baseUrl can never again land in the model slot.

i18n strings added to the setup namespace (English; other locales fall back).
Tests: Setup.test.tsx — 4 cases: Continue advances without completing, Finish
saves model config with correct arg order + no secrets write for env, command
choice writes the selector + helper + invalidates cache, Back returns to stage 1.
typecheck:web clean; lint clean.
User walkthrough (docs/keepassxc-vault-guide.md + screenshots) for keeping API
keys in an encrypted KeePassXC vault instead of plaintext .env: create the vault
with keepassxc-cli, add an entry per key (title = env var name), verify the
helper resolves it, then pick "Vault command" in first-run setup.

Pairs with the Security Providers setup onboarding shipped on this branch. Every
step captured from a real run; screenshots use placeholder values (no secrets).
…rStatus

Greptile gate, Family 1 (contract-invariant). The renderer suite asserts the
no-values invariant against a MOCKED IPC bridge; nothing tested the REAL
main-process secretsProviderStatus(). This adds a direct test that calls the
production function and asserts it returns {provider,keys,count} with key NAMES
only — serializing the whole return object and asserting a sentinel secret
value appears nowhere. Covers env-empty and throw-degrade paths too.

RED-proven: injecting a 'values' field into secretsProviderStatus reds the
shape assertion. Reverted; function unchanged.
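
The sentinel technique described above can be sketched as follows. `statusFromResolved` is a hypothetical stand-in for the production function (its real inputs are not shown here); the part that illustrates the test is `leaksValue`, which serializes the whole return object and searches it for a known secret.

```typescript
// Status shape taken from the commit text: { provider, keys, count }.
type ProviderStatus = { provider: string; keys: string[]; count: number };

// Hypothetical stand-in for the production function: it receives resolved
// name -> value pairs and must hand back key NAMES only.
function statusFromResolved(
  provider: string,
  resolved: Readonly<Record<string, string>>,
): ProviderStatus {
  const keys = Object.keys(resolved).sort();
  return { provider, keys, count: keys.length };
}

// The invariant check: serialize the entire object and look for the sentinel.
// This catches a leak in any field, including ones added later.
function leaksValue(status: unknown, sentinel: string): boolean {
  return JSON.stringify(status).includes(sentinel);
}
```

Checking the serialized form rather than individual fields is what makes the "injected values field" mutation fail the test even though the existing fields are untouched.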

Part of the SDLC pre-launch gate of secrets/03 rebased onto secrets/02.
Closes the two backlog adversarial-test families the Greptile-findings catalog
tracked as "NOT YET WRITTEN" on the secrets-provider rate-limiting code. Both
are mutation-proven load-bearing (RED-verified by flipping the operator /
resetting the timestamp in index.ts, then restoring).

AIR-005 — exact comparison-operator boundaries (family: boundary):
  - get() floor: strict `<` MIN_SPAWN_INTERVAL_MS — degrades at 999ms,
    re-spawns at EXACTLY 1000ms (boundary exclusive).
  - list() TTL: `<=` LIST_CACHE_TTL_MS — fresh through EXACTLY 5000ms inclusive.
  - list() spawnAllowed: `>=` MIN_SPAWN_INTERVAL_MS — inclusive at 1000ms.
  A future `<`<->`<=` slip on any of the three reds a test (the get() floor and
  list() spawnAllowed agree at t==1000 today; nothing pinned that before).
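
The three boundaries can be written out as small predicates. The constants and operators come from the list above; the function names are hypothetical and exist only to make each comparison testable in isolation.

```typescript
// Values from the commit text: 1000ms spawn floor, 5000ms list TTL.
const MIN_SPAWN_INTERVAL_MS = 1000;
const LIST_CACHE_TTL_MS = 5000;

// get() floor: degrade (skip the spawn) while STRICTLY inside the floor.
function getDegrades(elapsedSinceSpawnMs: number): boolean {
  return elapsedSinceSpawnMs < MIN_SPAWN_INTERVAL_MS;
}

// list() TTL: cached data is fresh through the TTL, inclusive.
function listIsFresh(cacheAgeMs: number): boolean {
  return cacheAgeMs <= LIST_CACHE_TTL_MS;
}

// list() spawnAllowed: a new spawn is allowed once the floor is reached, inclusive.
function listSpawnAllowed(elapsedSinceSpawnMs: number): boolean {
  return elapsedSinceSpawnMs >= MIN_SPAWN_INTERVAL_MS;
}
```

At exactly 1000ms `getDegrades` is false and `listSpawnAllowed` is true, so the two paths agree; flipping either operator breaks that agreement at the boundary value only, which is why the tests pin 999, 1000, 5000, and 5001.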

AIR-006 — deletion-visibility window (family: state-ordering):
  invalidateProviderListCache() marks data stale but does NOT reset `ts`, so a
  HARD-DELETED vault key stays visible to a freshly-spawned gateway for up to
  MIN_SPAWN_INTERVAL_MS after "Refresh from vault" — a deliberate
  "stale beats wedged" tradeoff. Tests pin the window: deleted key visible
  inside the floor, gone at 1000ms, and ROTATION (vs deletion) has no data-loss
  window (key never disappears, value just refreshes). Mock gained a
  `vaultHasKey` flag to simulate a hard deletion.
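
A toy model of the deletion-visibility window, under the assumption that the cache behaves as described above: invalidation sets a stale flag, `ts` only moves on an actual spawn, and inside the floor the previous data is served. The class and field names are hypothetical; the injected `fetchKeys` callback stands in for the helper spawn.

```typescript
const MIN_SPAWN_INTERVAL_MS = 1000;
const LIST_CACHE_TTL_MS = 5000;

// Hypothetical model of the provider list cache; not the code in index.ts.
class ListCache {
  private keys: string[] = [];
  private ts: number = Number.NEGATIVE_INFINITY; // time of the last spawn
  private stale: boolean = true;
  private readonly fetchKeys: () => string[];

  constructor(fetchKeys: () => string[]) {
    this.fetchKeys = fetchKeys;
  }

  // "Refresh from vault": mark the data stale but deliberately leave `ts` alone.
  invalidate(): void {
    this.stale = true;
  }

  list(now: number): string[] {
    const age = now - this.ts;
    if (!this.stale && age <= LIST_CACHE_TTL_MS) return this.keys;
    if (age >= MIN_SPAWN_INTERVAL_MS) {
      this.keys = this.fetchKeys(); // stands in for spawning the helper
      this.ts = now;
      this.stale = false;
    }
    // Inside the floor: serve the previous data ("stale beats wedged").
    return this.keys;
  }
}
```

In this model a hard-deleted key remains in the result for calls made less than 1000ms after the last spawn and disappears on the first call at or past the floor. A rotated key never leaves the list, since only names are cached.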

No production code changed; only the two test files. Full tracked secrets suite:
10 files, 102 tests green (was 93).
…ndows

Two confirmed lock-up / lock-out classes a community user's setup hits that the
happy-path suite missed. Both reproduced against the real Node runtime and
RED-proven load-bearing (revert the fix -> the matching test reds).

T1.1 ORPHANED GRANDCHILD ON TIMEOUT (process leak / slow lock-up).
  The provider ran the helper via `execFileSync("/bin/sh", ...)` whose timeout
  SIGTERMs only the direct shell. A helper that backgrounds a child — a locked-
  vault unlock agent, `keepassxc-cli` forking gpg-agent, a `( … ) & wait`
  pipeline — leaves that grandchild ORPHANED on every 3s timeout. Resolved
  per-key at gateway spawn under a locked/slow vault, that's a steady process
  leak. Fix: run the helper in its OWN process group (`detached`) and, after
  spawnSync returns, `process.kill(-pid, "SIGKILL")` the whole group so
  grandchildren are reaped. Switched execFileSync -> spawnSync (group kill needs
  the returned pid) behind a single `runHelper()`; killSignal is now SIGKILL
  (SIGTERM let a blocked helper linger). All existing behavior preserved
  (timeout, output cap, key-as-env-data, stderr-discard F6, structured-only
  logging).
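
  The group-kill pattern from T1.1 might look roughly like the sketch below, assuming
  a POSIX system with /bin/sh. `runHelperSketch` and its result shape are hypothetical,
  the 1 MiB output cap is illustrative, and the intersection type exists only because
  the installed spawnSync typings may not declare `detached` (an assumption about the
  typings, not about Node's runtime behavior as described in the commit).

```typescript
import { spawnSync } from "node:child_process";
import type { SpawnSyncOptionsWithStringEncoding } from "node:child_process";

type HelperRun = { ok: boolean; stdout: string };

// Hypothetical sketch of a runHelper-style wrapper; not the provider's code.
function runHelperSketch(command: string, keyName: string, timeoutMs: number = 3000): HelperRun {
  const options: SpawnSyncOptionsWithStringEncoding & { detached: boolean } = {
    env: { ...process.env, HERMES_SECRET_KEY: keyName }, // key name as env data
    detached: true, // the shell becomes leader of its own process group
    timeout: timeoutMs,
    killSignal: "SIGKILL",
    maxBuffer: 1024 * 1024, // illustrative output cap
    stdio: ["ignore", "pipe", "ignore"], // stderr discarded, never inherited
    encoding: "utf8",
  };
  const result = spawnSync("/bin/sh", ["-c", command], options);

  // Reap anything the helper left behind. A negative pid targets the whole
  // process group. Guard pid > 0: kill(0) would signal our OWN group.
  if (typeof result.pid === "number" && result.pid > 0) {
    try {
      process.kill(-result.pid, "SIGKILL");
    } catch {
      // ESRCH: the group is already empty, nothing left to reap.
    }
  }

  const ok = result.error === undefined && result.status === 0;
  return { ok, stdout: ok ? result.stdout : "" };
}
```

  The `pid > 0` guard matters as much as the kill itself: if the spawn fails, negating
  a zero pid would address the caller's own process group.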

T1.2 WINDOWS SILENT DEAD-END (total key lock-out).
  `/bin/sh` does not exist on win32, so a configured command provider threw
  ENOENT -> caught -> EVERY key resolved to null silently (only a console.warn
  the user never sees). A Windows community user who picks "command" gets a
  silent lock-out of their keys with no explanation. Fix: `runHelper` short-
  circuits on win32 (code EUNSUPPORTED_PLATFORM, no doomed spawn), and a new
  exported `commandProviderUnsupportedReason()` gives the onboarding/Settings UI
  an actionable message + steers the user to the env provider — no dead-end
  picker.
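
  A sketch of the T1.2 short-circuit. The error code EUNSUPPORTED_PLATFORM and the
  function name `commandProviderUnsupportedReason` come from the text above; the
  explicit `platform` parameter, the message wording, and `gateHelper` are assumptions
  added so the behavior can be exercised without running on Windows.

```typescript
// Hypothetical sketch; the real function's signature and wording are not shown.
// `platform` would normally be process.platform; it is a parameter here so the
// win32 branch can be tested on any OS.
function commandProviderUnsupportedReason(platform: string): string | null {
  if (platform === "win32") {
    return "The command provider needs /bin/sh, which is not available on Windows. Use the env provider instead.";
  }
  return null;
}

type GateOutcome = { ok: true } | { ok: false; code: "EUNSUPPORTED_PLATFORM"; reason: string };

// Check the platform BEFORE any spawn, so there is no doomed ENOENT to swallow
// and the UI receives an actionable reason instead of silently-null keys.
function gateHelper(platform: string): GateOutcome {
  const reason = commandProviderUnsupportedReason(platform);
  return reason === null ? { ok: true } : { ok: false, code: "EUNSUPPORTED_PLATFORM", reason };
}
```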

T1.3 (install-gate / config-health fail direction) was investigated and is NOT a
bug: checkInstallStatus defaults hasApiKey=false and runConfigHealthCheck wraps
every check in try/catch (swallow + degrade), so a throwing secrets probe fails
SAFE (to setup / empty audit), never wedges the app. No change needed; verified,
not assumed.

Tests: new commandProvider.robustness.test.ts (orphan-reap live spawn; win32
gate: unsupported-reason, runHelper short-circuit, get()/list() degrade no-throw;
POSIX sanity). stdio F6 test reworked onto runHelper. Tracked secrets suite:
11 files, 107 green. tsc + eslint clean on changed files. No new failures in the
full suite (the 3 pre-existing reconcileStreamedWithDb reds are unrelated, see
PR findings).
…ardened base

Reconciles fix/readiness-remote-mode-guard (vault bootstrap: first-run create,
auto-detect, opt-in TPM seal, UID-safe paths, snap-KeePassXC, write path) onto
the secrets/03 hardened tree, preserving AIR-001..015 (esp. AIR-014 orphan reap
+ AIR-015 Windows gate). Shared-core files (commandProvider.ts, index.ts,
spawnRateFloor/property tests) taken from 03 — 03 is a strict superset of
bootstrap's edits to them. AIR-010 sibling-divergence handled by additive layer,
not raw merge.
sealKeyFileToTpm() ran `systemd-creds encrypt --with-key=tpm2` via synchronous
execFileSync on the main thread. Measured worst case on a real TPM + snap box:
a non-deterministic 7-15s block (bounded by TOOL_TIMEOUT_MS), freezing the whole
UI on every onboarding seal attempt -- the lock-up mumbo hit live while the seal
'timed out while adding the password'. The outcome was already honest (never a
false sealed:true; 0600 fallback on timeout); the defect was purely the block.

Fix: add an async tryExecAsync (execFile wrapped in a never-rejecting Promise)
used only for the SLOW calls; make sealKeyFileToTpm async/Promise<SealResult>
and await it in the already-async vault-seal-tpm IPC handler. The fast sub-100ms
probes (command -v / readlink, ~7ms total) stay on the sync helper.

Proven: PROBE-5 (live tier) starts a 100ms event-loop ticker and asserts it keeps
ticking during the seal -- 76 ticks over 7.6s with the fix; 0 ticks over a 15s
freeze when reverted to sync (RED-proven). Tracked suite 166/166 green; full
suite 1328 passed (the only 7 reds are the pre-existing live-smoke + reconcile-
streamed set, identical on 03's tip). Catalog: AIR-016.
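
A never-rejecting execFile wrapper of the kind described could be sketched as below. The name `tryExecAsync` is from the commit message; the result shape, reason strings, and the 15000ms default (the real TOOL_TIMEOUT_MS value is not given) are assumptions.

```typescript
import { execFile } from "node:child_process";

type ExecOutcome = { ok: true; stdout: string } | { ok: false; reason: string };

// Hypothetical sketch: the promise always resolves, so callers can await it
// without try/catch and the main thread stays free while the tool runs.
function tryExecAsync(
  file: string,
  args: string[],
  timeoutMs: number = 15000, // assumed default; TOOL_TIMEOUT_MS is not shown
): Promise<ExecOutcome> {
  return new Promise<ExecOutcome>((resolve) => {
    try {
      execFile(file, args, { timeout: timeoutMs, encoding: "utf8" }, (error, stdout) => {
        if (error) {
          // `killed` is set when the timeout terminated the child.
          resolve({ ok: false, reason: error.killed ? "timeout" : "failed" });
        } else {
          resolve({ ok: true, stdout });
        }
      });
    } catch {
      // execFile can throw synchronously on invalid arguments.
      resolve({ ok: false, reason: "spawn-threw" });
    }
  });
}
```

An async handler would then write `const r = await tryExecAsync(tool, args)` and branch on `r.ok`, treating a timeout as the honest "not sealed" outcome rather than as an exception.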
…bling)

createVault() called keepassxc-cli db-create via synchronous execFileSync on the
Electron main thread — the same wedge class as the TPM seal (AIR-016), just the
sibling slow call. A snap-confined CLI or slow disk can make db-create take
seconds, freezing the UI during first-run vault creation.

Fix: createVault is now async/Promise<CreateVaultResult>, using the same
tryExecAsync helper for the slow db-create; the already-async vault-create IPC
handler awaits it. Renderer is unaffected (it calls over IPC, already a promise).
All callers updated to await (typecheck enforces it — a sync caller won't compile
against the Promise return). Tracked suite 166/166; full suite 1328 passed (only
the known live-smoke + reconcile-streamed reds remain, identical on 03).
…providers screen

On Settings -> Security Providers, after a successful Test the resolved keys
rendered as bare name badges with no indication the VALUE comes from the vault.
Add a per-key 'Vault Provided' label next to each resolved key name so the user
can see the value is supplied by the vault (not typed / .env). Values are still
never shown — only names + this provenance label.

- New i18n string settings.secrets_vaultProvided ('Vault Provided').
- Per-key label span in the resolved-keys list (success-colored, uppercase, small).
- Test: assert exactly one 'Vault Provided' label per resolved key (2 keys -> 2),
  RED-proven by removing the label span (test reds 'Unable to find element').

Tracked Settings/i18n suites green; full suite 1328 passed (only the known
live-smoke + reconcile-streamed reds remain, identical on 03).
…ocess.env overlay (AIR-017)

Found by LIVE-DATA testing the IPC against the real vault: secretsProviderStatus()
computed Object.keys(resolvedSecrets()), which overlays the ENTIRE process.env onto
the vault keys — so in the Electron main process it returned ~130 keys (PATH, HOME,
npm_config_*, DISPLAY, …), not the user's 6 real vault keys. The Security Providers
screen renders each as a 'Vault Provided' badge, so it would falsely label every
environment variable as vault-provided.

This is the DISPLAY-path sibling of the appsec 'fail-open gate trap': the canWrite
gate was already fixed to count providerListSafe() (vault-only) for exactly this
reason; the display path never got the same treatment. Both the renderer test and
the main-process contract test HID it by mocking the resolver to a small clean set
(AIR-011 trap) — only real-vault + real-process.env testing surfaced the 130.

Fix: Object.keys(providerListSafe(profile)) (vault-only, spawn-floor cached). The
no-VALUE invariant was always intact; this corrects the key SET. Live-confirmed:
count 130 -> 6 (the real vault keys only). Contract test re-pointed to mock
providerListSafe (the fn the producer now calls) + new AIR-017 case asserting the
process.env overlay does NOT bleed into the badge list; RED-proven by reverting
(3 tests red, count 123). Full suite 1333 passed (only the pre-existing
reconcile-streamed reds remain). Catalog: AIR-017.
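
To illustrate the key-SET difference, a small sketch with hypothetical stand-ins for the two paths (these are not the real `resolvedSecrets()` / `providerListSafe()` implementations, whose bodies are not shown in this PR text):

```typescript
type SecretMap = Record<string, string>;

// Hypothetical stand-in for the old display path: vault keys overlaid on the
// whole environment, so every env var name leaks into the badge list.
function overlayKeyNames(vault: SecretMap, env: SecretMap): string[] {
  return Object.keys({ ...env, ...vault });
}

// Hypothetical stand-in for the fixed path: names come from the vault only.
function vaultOnlyKeyNames(vault: SecretMap): string[] {
  return Object.keys(vault);
}
```

Only key names are handled in either path; values are never returned, matching the no-VALUE invariant stated above.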
…ual key toggle (AIR-018)

Found by live GUI testing against the real vault: on the Setup model step, a
vault-only user whose Anthropic credential is the OAuth token
(CLAUDE_CODE_OAUTH_TOKEN, the Claude Code auth-patch token) was falsely forced to
enter an API key. The onboarding's vaultHasModelKey() did a bare
vaultKeys.includes(ANTHROPIC_API_KEY) with no alias awareness — the THIRD
credential-name gate with this class of bug (install gate + warning banner already
had partial alias bridges; Setup had none).

Part 1 — detection: add CLAUDE_CODE_OAUTH_TOKEN as an ANTHROPIC_API_KEY alias in
both config-health.ts KEY_ALIASES and a new MODEL_KEY_ALIASES in Setup.tsx (kept
in lock-step). All three authenticate to Anthropic, so a vault holding any one
already provides the model key.

Part 2 — both options (mumbo's ask): when the vault covers the key, show an
explicit toggle 'Use vault credential' vs 'Enter an API key' (radiogroup) on both
the named-provider and custom-URL branches. Default vault; manual reveals the key
field and re-requires a typed key. handleFinish validates the effective mode.

Tests: Setup.test.tsx AIR-018 case (alias->vault-covered by default, toggle offers
both, manual reveals field, switch-back hides, Finish works in vault mode with no
empty setEnv). GREEN 11/11; RED-proven by reverting the alias (1 test red).
Full suite 1334 passed (only pre-existing reconcile-streamed reds). Catalog: AIR-018.
…-guard + auto-load) — AIR-018 follow-up

LIVE bug mumbo hit: on the Setup model step with an EXISTING vault (config already
has secrets.provider=command), the Anthropic credential was NOT detected — no
vault/manual toggle appeared, the screen demanded an API key. Two root causes, both
in the renderer:

1. vaultHasModelKey() bailed on its FIRST line: if (secretsChoice === 'env') return
   false. secretsChoice is local React state that defaults to 'env' and only flips
   when the user CLICKS the command tile during onboarding. An existing-vault user
   reaches the model step with secretsChoice still 'env' while the CONFIG has a
   command provider — so detection died before the alias check ran. Fix: drop the
   env-guard; the resolved vaultKeys list is the authoritative signal (empty keys
   already yields false, and env resolves no provider keys).

2. vaultKeys was only populated by an explicit 'Test vault' click on the secrets
   step. A user who reached the model step without it had vaultKeys=[]. Fix: a
   useEffect auto-loads secretsProviderStatus() key names on entering the provider
   stage (self-guards: env returns no keys -> no toggle).

Also aligned the Finish button's disabled condition to showingVaultCredential()
(was bare vaultHasModelKey()) so it matches handleFinish validation.

Diagnosed by instrumenting vaultHasModelKey: debug showed vaultKeys correctly held
CLAUDE_CODE_OAUTH_TOKEN and anthropic was selected, but the secretsChoice='env'
guard returned early. Test AIR-018b reproduces the existing-vault path (reach model step with
NO Test-vault click); GREEN 12/12, RED-proven by restoring the env-guard (1 red).
Full suite 1335 passed (only pre-existing reconcile-streamed reds).
…d in envkey-mismatch auto-fix (AIR-019, AIR-020)

Two distinct bugs surfaced by mumbo live-testing chat after the AIR-018 toggle fix:

AIR-019 — empty model.default bricks chat (400). The Setup model-name field is
optional; a blank submission persisted model.default: "", and setModelConfig wrote
it unconditionally. The gateway then POSTs model:"" → Anthropic 400 'model: String
should have at least 1 character'. Fix: setModelConfig only writes model.default
when the model string is non-empty — a blank call leaves any existing valid model
untouched (never clobbers a good selection, never writes an empty one). Defense in
depth in the main process so ANY caller is protected. Tests: empty-model guard +
no-clobber, RED-proven (revert → 2 red).
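
A minimal sketch of the empty-model guard, assuming a simplified config shape. `ModelConfig` and `applyModelConfig` are hypothetical names, and trimming whitespace is an added assumption; the real `setModelConfig` writes to config.yaml and is not reproduced here.

```typescript
// Hypothetical, simplified view of the model section of the config.
interface ModelConfig {
  provider: string;
  default?: string;
}

// Return the next config. model.default is written only when the incoming
// model string is non-empty, so a blank submission never clobbers a valid
// selection and never persists "".
function applyModelConfig(
  existing: ModelConfig,
  provider: string,
  model: string,
): ModelConfig {
  const next: ModelConfig = { ...existing, provider };
  const trimmed = model.trim(); // assumption: whitespace-only counts as blank
  if (trimmed.length > 0) {
    next.default = trimmed;
  }
  return next;
}
```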

AIR-020 — credential-bleed in UI_RUNTIME_ENVKEY_MISMATCH auto-fix. When the
expected provider key was empty, the detector picked ANY populated *_API_KEY /
*_TOKEN in .env and offered to copy it across — so with ANTHROPIC_API_KEY empty it
suggested copying the UNRELATED MATRIX_ACCESS_TOKEN into the Anthropic slot: a
wrong, non-working value AND a mislabel of another service's secret. Fix: restrict
candidates to KNOWN ALIASES of the expected key (KEY_ALIASES — ANTHROPIC_TOKEN /
CLAUDE_CODE_OAUTH_TOKEN for ANTHROPIC_API_KEY). If no alias is populated there is
no safe rename — that's MODEL_KEY_MISSING territory, not an auto-copy. Tests:
alias-IS-flagged + unrelated-key-NOT-flagged, RED-proven (greedy revert → 1 red).

Full suite 1338 passed (only pre-existing reconcile-streamed reds).
…v1/models (AIR-019)

Completes the empty-model fix: the guard (prev commit) stops the empty-string
brick, but a fresh user who leaves the optional model-name field blank still ended
up with NO model. Per mumbo: query the provider's /v1/models endpoint and use a
returned id as the fallback — a LIVE default that can't go stale like a hardcoded
constant.

handleFinish, when the field is blank, calls the existing discoverProviderModels
IPC (reuses model-discovery.ts, which already auth-resolves via the vault and has
Anthropic /v1/models support) and picks a model via pickDefaultModel(): prefer a
clean stable id, de-prioritising dated snapshots (…-20250219) and -preview/-beta/
deprecated, keeping the provider's own ordering otherwise. Best-effort: if
discovery is unreachable (no network / unresolved key) it persists no model and
the main-process guard preserves any existing valid selection — never writes empty.
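
The selection rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's `pickDefaultModel`: the two regexes and the `null` return for an empty list are guesses at reasonable behavior based on the description above.

```typescript
// Assumed patterns: a trailing -YYYYMMDD marks a dated snapshot; -preview,
// -beta or "deprecated" marks an unstable id.
const DATED_SNAPSHOT = /-\d{8}$/;
const UNSTABLE = /-(preview|beta)\b|deprecated/i;

function isNoisy(id: string): boolean {
  return DATED_SNAPSHOT.test(id) || UNSTABLE.test(id);
}

// Prefer the first clean, stable id in the provider's own order; if every id
// is noisy, fall back to the first non-blank one; if none exist, return null.
function pickDefaultModel(ids: readonly string[]): string | null {
  const cleaned = ids.map((id) => id.trim()).filter((id) => id.length > 0);
  const stable = cleaned.find((id) => !isNoisy(id));
  return stable ?? cleaned[0] ?? null;
}
```

Keeping the provider's ordering for everything else avoids encoding any opinion about which model is "best" into the client.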

Tests: pickDefaultModel unit suite (empty, snapshot-vs-stable, ordering, all-noisy
fallback, trim/blank) + two Setup integration tests (blank field → discovered
model persisted; discovery-unreachable → empty persisted, guard handles it).
Full suite 1345 passed (only pre-existing reconcile-streamed reds).
…kill the false banner + harmful copy (AIR-021)

mumbo's running app showed: 'ANTHROPIC_API_KEY is empty but CLAUDE_CODE_OAUTH_TOKEN
has a value — likely saved under the wrong name' with a one-click 'copy across'
auto-fix. Both the banner AND the fix are wrong:

- FALSE POSITIVE: the gateway's anthropic provider plugin reads ANTHROPIC_API_KEY,
  ANTHROPIC_TOKEN AND CLAUDE_CODE_OAUTH_TOKEN directly (env_vars). A populated
  CLAUDE_CODE_OAUTH_TOKEN ALREADY satisfies the credential — ANTHROPIC_API_KEY
  being empty is correct/intended for an OAuth setup, not a misname.
- HARMFUL FIX: an OAuth token (sk-ant-oat…) is only valid on the Authorization:
  Bearer path. Copied into ANTHROPIC_API_KEY it is sent as x-api-key → Anthropic
  401 'invalid x-api-key' (the documented OAuth-in-api-key-slot self-inflicted-401
  trap). The auto-fix would BREAK a working setup.

Fix: when a populated accepted-alias exists, the credential is satisfied — emit NO
issue (return []), exactly like the customEndpointKeyResolvable early-return. The
copy-suggestion path is removed entirely (the AIR-020 alias-restriction already
killed the unrelated-key bleed; this removes the remaining alias-copy footgun).
When neither expected key nor any accepted alias is populated, that's
MODEL_KEY_MISSING territory, not a rename.

Tests: 'OAuth-token alias = SATISFIED, no false mismatch/copy' + the unrelated-key
guard retained. RED-proven by restoring flag-on-alias (1 red). Full suite 1345
passed (only pre-existing reconcile-streamed reds).

NOTE: the live trigger was ALSO a stale-token desync (tmpfs/.env held a dead
OAuth token while ~/.claude/.credentials.json was fresh) — fixed operationally by
mirroring the fresh token into the vault+tmpfs; this commit fixes the bogus banner
that the desync surfaced.
…ock Send for vault OAuth users (AIR-022)

mumbo's live error after the token-desync fix: 'Missing ANTHROPIC_API_KEY —
required by the active provider.' This is the FOURTH credential-name gate with the
OAuth-alias gap (install gate, config-health banner, config-health mismatch were
the first three). validateChatReadiness (the pre-send gate that enables/disables
Send) checked only the canonical ANTHROPIC_API_KEY and an auth.json OAuth path —
it did NOT recognize CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_TOKEN in .env, so a vault
user whose Anthropic credential is the OAuth token got a false MISSING_API_KEY
block and a disabled Send button even though the gateway authenticates fine via
the Bearer path.

Fix: before returning MISSING_API_KEY, check KEY_ALIASES (kept in lock-step with
config-health.ts and Setup.tsx) for a populated accepted alias; if present, fail
open (return OK). Mirrors the install gate and the other three surfaces.

Tests: OAuth-token alias allows Send, ANTHROPIC_TOKEN alias allows Send, and an
UNRELATED token (MATRIX_ACCESS_TOKEN) still BLOCKS (no credential-bleed false-pass).
RED-proven by removing the alias loop (2 red). Full suite 1348 passed (only
pre-existing reconcile-streamed reds).

This was the last gate — all four credential-name surfaces now accept the OAuth
token alias.
… provider to the gateway (AIR-023)

Audit of provider <-> gateway <-> security-provider compatibility (mumbo's
request, generalized for the community client): the desktop's KNOWN_API_KEYS
list — the credential env-var names forwarded from the secrets provider into the
agent env on the CLI/non-gateway fallback path — had DRIFTED out of sync with the
gateway provider plugins' env_vars. It was missing every OAuth/Bearer-token name
and per-vendor alias that isn't the canonical <VENDOR>_API_KEY:

  ANTHROPIC_TOKEN, CLAUDE_CODE_OAUTH_TOKEN (anthropic OAuth/Bearer),
  GOOGLE_API_KEY/GEMINI_API_KEY, ZAI_API_KEY/Z_AI_API_KEY,
  COPILOT_GITHUB_TOKEN/GH_TOKEN/GITHUB_TOKEN, KIMI_CODING_API_KEY/KIMI_CN_API_KEY,
  DASHSCOPE_API_KEY/ALIBABA_CODING_PLAN_API_KEY, XAI_API_KEY, NVIDIA/NOVITA/
  STEPFUN/GMI/ARCEEAI/KILOCODE/OPENCODE_ZEN/OPENCODE_GO/QWEN/NOUS/AZURE_FOUNDRY.

So a vault-only user whose provider credential is stored under any of these names
got NO key forwarded on that path — a whole class of community users, not just the
Anthropic-OAuth case that surfaced it. (The primary buildGatewayEnv path already
overlays the full providerListSafe() set unfiltered and was complete; this brings
the CLI-fallback path to parity.)

Fix: extend KNOWN_API_KEYS to mirror the plugins' env_vars, extract it to an
exported module-level const, and add a drift-guard test (knownApiKeys.test.ts)
that asserts the set CONTAINS the OAuth/Bearer names + per-vendor aliases — a
behavior contract, not a snapshot, so it survives reordering but reds if a
credential name is dropped. RED-proven by removing CLAUDE_CODE_OAUTH_TOKEN
(2 tests red). Full suite 1354 passed (only pre-existing reconcile-streamed reds).

Compatibility audit summary (all GREEN after this):
  - model provider (anthropic) <-> secrets provider (command/vault): credential
    resolved, no OAuth-in-api-key-slot duplication.
  - gateway-spawn env: buildGatewayEnv forwards full provider set; CLI fallback
    now at parity.
  - 5 credential-name gates (install, config-health warn, config-health mismatch,
    Setup detect, chat-readiness) + this env-forward layer all accept the alias.
…reptile P1)

The credential NAME-ALIAS map (ANTHROPIC_API_KEY → ANTHROPIC_TOKEN,
CLAUDE_CODE_OAUTH_TOKEN) was defined THREE times independently — config-health.ts,
validation.ts, and Setup.tsx (as MODEL_KEY_ALIASES) — kept in sync only by
comments. Adding an alias to one but not the others would silently split the five
security gates (Greptile P1 on PR fathah#673).

Centralize it in src/shared/url-key-map.ts (already the single source of truth for
credential-key names, already imported by both main and renderer) as KEY_ALIASES +
aliasesForEnvKey(). All three sites now import it; no local copies remain.

Test: src/shared/url-key-map.test.ts guards the shared table. RED-proven — removing
CLAUDE_CODE_OAUTH_TOKEN from the ONE shared map now reds 5 tests across the shared
guard + validation (AIR-022) + Setup (AIR-018/018b), confirming every gate consumes
the single source. Full suite 1549 passed.
…ary probes (Greptile P1+P2)

Two vaultBootstrap fixes from the PR fathah#673 review:

P1 — detectExistingVault hardcoded 'keepassxc-cli' in its suggestedCommand for an
on-disk vault, while createVault (same module) correctly uses the snap-aware
resolveKeepassxcCli() name. On a snap-only system the binary is 'keepassxc.cli', so
a user who already has a vault was shown a read command that fails immediately.
Now resolves the CLI name (falling back to the apt name for display when none is
installed yet), matching createVault.

P2/security — hasBinary() and keepassxcIsSnap() interpolated their name/cli
argument into a /bin/sh -c string without quoting. All current callers pass
hardcoded literals so there's no live exploit, but the unquoted surface is a trap
for any future dynamic caller. Both now shellQuote the interpolated value (reusing
the module's existing shellQuote helper).
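
For reference, a sketch of the quoting technique. This `shellQuote` is a hypothetical equivalent using standard POSIX single-quote escaping, since the module's own helper is not shown here; `buildHasBinaryProbe` is likewise an illustrative name.

```typescript
// POSIX single-quote escaping: wrap the value in single quotes and replace
// each embedded ' with '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical probe builder: the binary name is always interpolated quoted,
// so a future dynamic caller cannot smuggle shell syntax into the string.
function buildHasBinaryProbe(name: string): string {
  return `command -v ${shellQuote(name)} >/dev/null 2>&1`;
}
```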

vaultBootstrap tests 18/18; full suite 1549 passed.
…check TS1117)

The upstream merge (5185d0d) brought together two identical invalidateSecretsCache
property definitions in the contextBridge object literal in preload/index.ts — one
from upstream, one from secrets/04. tsc -p tsconfig.node.json (the CI typecheck
config) rejects this with TS1117 'An object literal cannot have multiple properties
with the same name'; the duplicate was the CI 'check' job failure on PR fathah#673.

(Local bare 'npx tsc --noEmit' uses the default config and did not flag it — must
run 'npm run typecheck' to match CI's per-project tsconfig.node/web invocation.)

Removed the duplicate (kept upstream's, identical body). npm run typecheck and
npm test both pass.
… vi.hoisted harness

The merge resolution (5185d0d) took secrets/04's test harness for
config-health.test.ts, which mocked profilePaths on ./config — but the source
imports it from ./utils. That worked locally (~/.hermes/config.yaml exists) but
failed in CI's clean checkout where real paths resolve to non-existent files
→ configExists=false → EMPTY_API_SERVER_KEY doesn't fire → 5 test failures.

Root cause: mock on wrong module (./config instead of ./utils). Upstream had
already fixed this with a vi.hoisted + vi.mock("./utils") + lazy await-import
harness designed for CI determinism.

Fix: take upstream's robust harness as the base, port secrets/04's additional
tests (alias-awareness + connection-mode-audit) onto it. Added getConnectionConfig
to the mocks object, mock factory, and alias section. All 19 tests pass with
profilePaths properly mocked on ./utils. Full suite 1549/0, npm typecheck clean.
## Summary

Adds a config-gated opt-out for the auto-updater. Auto-update remains ENABLED
BY DEFAULT — only an explicit `desktop.auto_update: false` (or `0`) in
config.yaml disables it, so behavior is unchanged for everyone who never sets
the key. The opt-out exists for users who run a locally-built or patched
`/opt` artifact: electron-updater's `autoDownload` + `autoInstallOnAppQuit`
will otherwise re-download the public release and silently overwrite their
build on quit.

- `isAutoUpdateDisabled()` — pure, exported decision in config.ts (mirrors the
  existing `decideCanWrite` extraction pattern) so the gate is unit-testable
  without the Electron/IPC coupling in `setupUpdater()`.
- `setupUpdater()` short-circuits via the same no-op-IPC path already used for
  dev/portable builds when the opt-out is set — no autoDownload wiring runs.
- Settings UI: an "Automatic updates" toggle (config-backed, optimistic with
  rollback on failure, shows a restart notice since the gate is read once at
  launch). English i18n strings added to the existing settings namespace
  (other locales fall back to English).
- Docs: a "Disabling auto-update" section + troubleshooting row in the
  KeePassXC vault guide (where a patched-build user is most likely to look).

## Testing

- `npm run typecheck` — clean.
- `npx vitest run src/main/isAutoUpdateDisabled.test.ts` — 5/5 pass: default
  ON for null/unset/empty/whitespace, disabled only for explicit false/0,
  case- and whitespace-insensitive, any unrecognized value fails safe to ON.
- Full main-process suite (secrets + config-health + this) — 192/192 pass.
- prettier/eslint: no new warnings on changed lines.
Follow-up hardening on f5b4aba (desktop.auto_update opt-out). The opt-out
decision was duplicated: config.ts had its own isAutoUpdateDisabled() and the
renderer's Settings toggle inlined the same `=== "false" || === "0"` check. Two
copies of one security-relevant gate is a sibling-asymmetry drift risk — if one
side's accepted values changed, the UI and the updater would silently disagree
about whether auto-update is on.

Changes
- New src/shared/auto-update-gate.ts: the single source of truth. Takes
  `unknown` and coerces via String() so the renderer can pass getConfig()'s raw
  return straight in. ENABLED BY DEFAULT; only explicit "false"/"0" disables;
  null/unset/empty/whitespace/garbage all fail SAFE to upstream-ON.
- src/main/config.ts now RE-EXPORTS isAutoUpdateDisabled from the shared helper
  (main gate in setupUpdater() unchanged at the call site).
- src/renderer Settings.tsx calls the shared helper instead of its inline copy.
- src/main/autoUpdateGateParity.test.ts: drift guard — asserts the main
  re-export IS the shared helper (same reference) and agrees across the full
  input matrix. Reds if anyone reintroduces a divergent copy.
- src/shared/auto-update-gate.test.ts: 7 contract/adversarial tests (default-ON,
  empty/whitespace, explicit disable, case/whitespace-insensitive, fail-safe on
  garbage, non-string coercion never throws, renderer write-vocabulary
  round-trip). Supersedes the removed src/main/isAutoUpdateDisabled.test.ts.
- docs/diagrams/auto-update-gate-diagrams.md: logical flow + SECRET/overwrite-
  gate workflow (Mermaid, both validated to parse).
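
A sketch of the shared decision as described in the list above. The body is an assumption reconstructed from the stated contract (default ON, only explicit "false"/"0" disables, coercion never throws); it is not copied from src/shared/auto-update-gate.ts.

```typescript
// Returns true ONLY for an explicit "false" or "0" (case- and
// whitespace-insensitive). Everything else, including null, undefined, empty
// strings and garbage, leaves auto-update enabled.
function isAutoUpdateDisabled(raw: unknown): boolean {
  if (raw === null || raw === undefined) return false;
  let text: string;
  try {
    text = String(raw);
  } catch {
    // Exotic objects (for example Object.create(null)) cannot be coerced;
    // treat them as unrecognized and stay enabled.
    return false;
  }
  const normalized = text.trim().toLowerCase();
  return normalized === "false" || normalized === "0";
}
```

Taking `unknown` means both a YAML boolean `false` and the string `"false"` resolve the same way, which is what lets the renderer pass a raw config value straight in.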

SDLC
- Step-0: security-relevant (controls the build-overwrite behavior + config
  parse + renderer surface).
- AppSec two-person rule: independent delegated audit returned a HIGH (the
  refactor was un-applied on config.ts after a RED-proof `git checkout`
  reverted it) — fixed and re-verified by reading real bytes + live test run.
  Cataloged as AIR-024 (verification-integrity class). Final verdict: SHIP.
- typecheck (node+web) clean; gitleaks clean; zero deps introduced; semgrep
  on-diff = 4 benign i18next-key-format style FPs.
- Full main-process + shared suite: 235/235 pass (incl. live vault/TPM probes).

Fail-direction: the gate fails SAFE to the upstream default (ENABLED). A
config typo can never silently disable security updates.

Rollback: revert this commit — f5b4aba's gate keeps working (it just regains a
local copy of the decision); no migration, no data touched.
…decision)

The auto-update opt-out had a tested DECISION (isAutoUpdateDisabled) but the
WIRING gate it feeds — the early return in setupUpdater() that must fire BEFORE
`autoUpdater.autoDownload = true` / `autoInstallOnAppQuit = true` — was an
untested inline boolean (`!app.isPackaged || isPortableBuild || autoUpdateDisabled`).
That early return IS the protection the opt-out exists for (it's what stops the
updater from overwriting a patched /opt build on quit), so it deserves its own
regression test.

- Extract the inline condition to a pure `shouldSkipUpdaterWiring({isPackaged,
  isPortableBuild, autoUpdateDisabled})` in config.ts (mirrors the decideCanWrite
  / isAutoUpdateDisabled extraction pattern) so the gate is unit-testable without
  the Electron/ipcMain/require("electron-updater") coupling.
- setupUpdater() now calls the predicate; behavior is identical (the truth-table
  test pins equivalence to the old inline expression).
- New src/main/shouldSkipUpdaterWiring.test.ts: full 8-row truth table (only a
  packaged, non-portable, enabled build wires the updater; every other combo
  skips), the safety-critical packaged+opt-out=SKIP case stated explicitly, and
  an end-to-end compose-with-isAutoUpdateDisabled check (false/0 => skip; null/
  empty/garbage => wire, fail-safe to upstream-ON).
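
The extracted predicate and its truth table can be sketched like this. The expression is the inline boolean quoted above; the input type name and the `wiringCombos` enumeration helper are illustrative assumptions.

```typescript
interface UpdaterGateInput {
  isPackaged: boolean;
  isPortableBuild: boolean;
  autoUpdateDisabled: boolean;
}

// Same expression as the old inline condition in setupUpdater().
function shouldSkipUpdaterWiring(input: UpdaterGateInput): boolean {
  return !input.isPackaged || input.isPortableBuild || input.autoUpdateDisabled;
}

// Enumerate all 8 input combinations and return the ones that WIRE the updater.
function wiringCombos(): UpdaterGateInput[] {
  const flags = [false, true];
  const wired: UpdaterGateInput[] = [];
  for (const isPackaged of flags) {
    for (const isPortableBuild of flags) {
      for (const autoUpdateDisabled of flags) {
        const row = { isPackaged, isPortableBuild, autoUpdateDisabled };
        if (!shouldSkipUpdaterWiring(row)) wired.push(row);
      }
    }
  }
  return wired;
}
```

Only one row survives the enumeration: packaged, non-portable, opt-out not set.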

Testing
- typecheck (node+web) clean.
- Full main+shared suite 265/265 pass.
- RED-proven: dropping the autoUpdateDisabled skip condition reds 3/4 cases
  (the opt-out path stops skipping) — restored via reversible patch (no
  git checkout, per AIR-024).

No production behavior change — this makes an existing safety gate provable.
Rollback: revert this commit; the inline boolean returns, gate behavior unchanged.
… canaries)

Adds the Greptile family-6 (data-not-code) adversarial suite for the SSH remote
command builders. The existing ssh-remote.test.ts proves command STRUCTURE
(quoting shape) + NUL round-trip; this proves SAFETY — that a hostile arg
crossing into the `sh -c` string built by buildRemoteHermesCmd is treated as
inert data and never executes.

Method (the honest one): for each canary, build the real command, run it through
a real shell with a fake `hermes` shim at $HOME/.local/bin/hermes (the absolute
probe path, per the PR-6 CLI-resolution fix), then assert (1) the side-effect the
injection WOULD cause did NOT happen (a canary file is never created) and (2) the
hostile string arrived at the shim verbatim as a single argument.

Canaries: $(touch), backticks, `; cmd`, `&& cmd`, `| cmd`, newline-injected cmd,
redirect overwrite, single-quote breakout, ${IFS} expansion, subshell, plus
$PATH-no-expand and ../traversal-stays-one-arg. Also covers the extraShell
redirect path and the sshSetConfigValue YAML-scalar guard (", \, CR, LF rejected
before any write; a benign URL is NOT rejected).
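
The YAML-scalar guard mentioned above could look roughly like the following. `isSafeYamlScalar` is a hypothetical name; only the rejected character set (double quote, backslash, CR, LF) comes from the description, and the real check inside sshSetConfigValue is not shown.

```typescript
// Reject values that could break out of a double-quoted YAML scalar or
// inject a new line into the remote config before anything is written.
function isSafeYamlScalar(value: string): boolean {
  return !/["\\\r\n]/.test(value);
}
```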

Testing
- typecheck clean; 17/17 pass; prettier/eslint clean on the new file.
- RED-proven: weakening shellQuote to naive double-quotes reds 11/17 — the
  $()/backtick/$PATH/${IFS} canaries fire (canary file created / args mangled),
  exactly as a real injection would. Restored via reversible patch (no git
  checkout, per AIR-024). Note: a RED-proof of an injection test BY DESIGN
  triggers the injection — run such proofs from a temp CWD to avoid littering
  the repo root with redirect-target files.

No production code changed — this pins an existing security property.
…p-every-launch bug)

A vault-only user saw the first-run Setup screen on EVERY launch. Root cause:
checkInstallStatus() decides whether Setup shows (App.tsx: `!status.hasApiKey ->
"setup"`), and its .env check, envHasUsableValue(), did an EXACT
`key === expectedKey` match. A vault user stores the Anthropic credential under
an ALIAS — CLAUDE_CODE_OAUTH_TOKEN (or ANTHROPIC_TOKEN) — not the canonical
ANTHROPIC_API_KEY. So the gate returned hasApiKey=false and forced Setup every
launch even though a usable credential was present.

This is the install gate joining the credential-name-alias family the other
gates already handle (config-health, validation, chat-readiness). Per AIR-018:
fix the CLASS across ALL gates — the install gate was the missed one.

- envHasUsableValue() is now alias-aware: it accepts the canonical key OR any of
  its aliases from the shared KEY_ALIASES map (../shared/url-key-map —
  ANTHROPIC_API_KEY -> [ANTHROPIC_TOKEN, CLAUDE_CODE_OAUTH_TOKEN]), the same
  single source of truth config-health.ts and validation.ts use. The match stays
  an allowlist scoped to the provider's own key names — an unrelated token
  (TELEGRAM_BOT_TOKEN / MATRIX_ACCESS_TOKEN) does NOT satisfy the gate (no
  credential bleed).
- Robustness (pre-existing bug surfaced while here): re-trim the value AFTER
  quote-stripping so a quoted-blank `KEY="  "` no longer falsely satisfies the
  gate. This affected the exact-match path too, not just aliases.
- envHasUsableValue() is now exported for unit testing.
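
A sketch of the alias-aware .env check with the post-strip re-trim. The parsing details and the non-null `expectedKey` signature are assumptions (the null-expectedKey path is left out because its behavior is not described here); the alias entries are the ones listed above.

```typescript
// Alias table as described in the commit; the real one lives in the shared map.
const KEY_ALIASES: Record<string, readonly string[]> = {
  ANTHROPIC_API_KEY: ["ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"],
};

// True when the .env text holds a non-blank value under the expected key or
// one of its accepted aliases. Unrelated *_TOKEN names never satisfy it.
function envHasUsableValue(envText: string, expectedKey: string): boolean {
  const accepted = new Set([expectedKey, ...(KEY_ALIASES[expectedKey] ?? [])]);
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    if (!accepted.has(line.slice(0, eq).trim())) continue;
    let value = line.slice(eq + 1).trim();
    const quote = value.startsWith('"') ? '"' : value.startsWith("'") ? "'" : "";
    if (quote !== "" && value.length >= 2 && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    // Re-trim AFTER quote stripping so KEY="  " does not count as a value.
    if (value.trim() !== "") return true;
  }
  return false;
}
```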

Testing
- typecheck clean; full suite 1417 pass / 3 skip.
- New tests/install-gate-credential-alias.test.ts (8 cases): canonical accepted;
  ANTHROPIC_TOKEN + CLAUDE_CODE_OAUTH_TOKEN aliases accepted (incl. surrounded by
  unrelated tokens + comments); credential-bleed guard (unrelated token NOT
  accepted); empty / quoted-blank rejected; null-expectedKey path unchanged.
- RED-proven: removing alias acceptance reds 3/8 (the alias cases); the
  credential-bleed guard stays green. Restored via reversible patch (no git
  checkout, per AIR-024).
- Verified LIVE against the real vault .env: BEFORE (exact-match) hasApiKey=false
  (Setup shown), AFTER (alias-aware) hasApiKey=true (Setup skipped).
- Independent appsec audit: SHIP (allowlist correctly scoped; fail-direction
  STRICTER not looser; one shared KEY_ALIASES source of truth; no exploitable
  findings).

Rollback: revert this commit; the exact-match gate returns (and the
Setup-every-launch bug with it).
…AIR-016)

Greptile follow-up on the security-providers PR: commandWriteSecret /
commandDeleteSecret used execFileSync, blocking the Electron main thread up
to 5s on the vault-write IPC path — while createVault / sealKeyFileToTpm in
the same PR were correctly made async. Closes that consistency gap.

Implementation: rewritten on child_process.spawn (NOT promisify(execFile) —
async execFile does NOT honor the `input`/stdin option, which would silently
break value delivery; verified against a real /bin/sh). A shared runHelper()
spawns `/bin/sh -c <command>` and writes the value to child.stdin explicitly.

All security invariants preserved (appsec-reviewed, 8/8 PASS, SAFE TO MERGE):
- value on stdin ONLY — never argv, shell string, or env
- key NAME via HERMES_SECRET_KEY env only; validated /^[A-Za-z_]\w*$/ pre-spawn
- hard timeout SIGKILLs a hung helper; output capped at 1 MiB
- stderr piped + discarded (can echo the value) — never inherited
- coarse, secret-free errors (timeout/exit-N/helper-not-found/bad-key)
- single Promise resolution via idempotent settled/finish guard
- EPIPE on early helper exit handled (stdin 'error' listener before write)
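
The invariants above can be sketched in one spawn-based helper. This is an illustration, not the PR's `runHelper`: the signature, the `HelperResult` type, and mapping a spawn `error` event to "helper-not-found" are assumptions.

```typescript
import { spawn } from "node:child_process";

type HelperResult = { ok: true; stdout: string } | { ok: false; error: string };

const MAX_OUTPUT_BYTES = 1024 * 1024; // 1 MiB cap on captured stdout

function runHelper(
  command: string,
  keyName: string,
  value: string | null, // null for delete: stdin is closed without a write
  timeoutMs: number,
): Promise<HelperResult> {
  return new Promise((resolve) => {
    if (!/^[A-Za-z_]\w*$/.test(keyName)) {
      resolve({ ok: false, error: "bad-key" });
      return;
    }
    let settled = false;
    let received = 0;
    const chunks: Buffer[] = [];
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Idempotent guard: only the first outcome resolves the Promise.
    const finish = (result: HelperResult): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(result);
    };
    // Key NAME travels via env only; the VALUE never appears in argv or env.
    const child = spawn("/bin/sh", ["-c", command], {
      env: { ...process.env, HERMES_SECRET_KEY: keyName },
    });
    timer = setTimeout(() => {
      child.kill("SIGKILL");
      finish({ ok: false, error: "timeout" });
    }, timeoutMs);
    child.on("error", () => finish({ ok: false, error: "helper-not-found" })); // assumed mapping
    child.stdout.on("data", (chunk: Buffer) => {
      received += chunk.length;
      if (received <= MAX_OUTPUT_BYTES) chunks.push(chunk);
    });
    child.stderr.resume(); // piped and discarded, never inherited
    child.on("close", (code) => {
      if (code === 0) finish({ ok: true, stdout: Buffer.concat(chunks).toString("utf8") });
      else finish({ ok: false, error: `exit-${code}` });
    });
    child.stdin.on("error", () => {
      /* EPIPE when the helper exits early; outcome is reported via close */
    });
    if (value !== null) child.stdin.write(value);
    child.stdin.end();
  });
}
```

Error strings stay coarse and carry no helper output, so a helper that echoes the value on stderr cannot leak it through the result.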

Callers: the two async ipcMain.handle handlers now `await` the result and
add a defensive `.catch` returning a coarse error (belt-and-suspenders per
the review's LOW finding).

Tests: mock rewritten for spawn (EventEmitter child); +2 branch tests
(timeout, helper-not-found). 162/162 secrets-suite tests; typecheck + eslint
clean. Real-/bin/sh runtime check confirms value reaches stdin and the
delete path closes stdin without hanging (the bug the execFile attempt hid).
…IR-026 / Greptile P1)

The install gate fell through to a broad /(_API_KEY|_TOKEN)$/ scan when the
catalogued provider's expected key was not resolved directly — so a vault holding
only an unrelated token (GITHUB_TOKEN, SLACK_BOT_TOKEN) and no LLM key falsely
cleared the gate and showed chat instead of routing the user back through Setup.

Fix: extract vaultResolvedHasKey(resolved, expectedKey). When expectedKey is known
(catalogued provider), accept ONLY that key or one of its accepted aliases via the
shared aliasesForEnvKey() / KEY_ALIASES single source of truth. The broad fallback
now fires only when expectedKey is null (uncatalogued provider — no canonical name
to match). Mirrors config-health.ts resolvedHasKey(), closing a sibling-asymmetry.
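
A sketch of the narrowed gate, assuming `resolved` is a plain name-to-value record. The function bodies are reconstructed from the description above and are not the code in installer.ts; the alias entries are the ones named elsewhere in this PR.

```typescript
const KEY_ALIASES: Record<string, readonly string[]> = {
  ANTHROPIC_API_KEY: ["ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"],
};

function aliasesForEnvKey(key: string): readonly string[] {
  return KEY_ALIASES[key] ?? [];
}

// Blank, whitespace-only and non-string values do not count as a credential.
function hasValue(v: unknown): boolean {
  return typeof v === "string" && v.trim() !== "";
}

function vaultResolvedHasKey(
  resolved: Record<string, unknown>,
  expectedKey: string | null,
): boolean {
  if (expectedKey !== null) {
    // Catalogued provider: accept only the expected key or an accepted alias.
    return [expectedKey, ...aliasesForEnvKey(expectedKey)].some((k) =>
      hasValue(resolved[k]),
    );
  }
  // Uncatalogued provider: no canonical name exists, so the broad scan applies.
  return Object.entries(resolved).some(
    ([k, v]) => /(_API_KEY|_TOKEN)$/.test(k) && hasValue(v),
  );
}
```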

Fail-closed: a resolver error leaves hasApiKey false (routes to Setup). Member of
the credential-name-alias-across-gates class (AIR-026), same class as de949a7.

Tests: tests/installer-vault-gate.test.ts (8) — bug-repro reds without the fix
(pre-fix broad scan returned true for {GITHUB_TOKEN}); covers exact key, real
aliases from the live map, blank/whitespace/non-string values, and the
expectedKey-null boundary. Typecheck clean (node+web); semgrep TS clean on
installer.ts. AppSec verdict: SHIP. Diagrams: docs/diagrams/install-gate-vault-alias-diagrams.md.
- Replace stale FAKE_VAULT bare variable with mocks.fakeVault
- Fix ReturnType<typeof getConnectionConfig> → typeof mockedGetConnectionConfig
  (getConnectionConfig not in scope as a named import in the vi.hoisted harness)
@0xr00tf3rr3t

Contributor Author

Rebased onto current main (31 commits behind → 0).

Conflicts resolved:

  • src/preload/index.d.ts — duplicate invalidateSecretsCache entry (upstream added it; our commits added it too — deduplicated)
  • src/renderer/src/screens/Gateway/Gateway.test.tsx — inline mock object replaced with createHermesAPIMock() helper (our version taken)
  • src/renderer/src/screens/Settings/Settings.tsx — 10s polling replaces onConnectionConfigChanged subscription (our version); Refresh-from-vault button added (our version)
  • src/main/index.ts — additive imports merged (both type ConnectionConfig and secretsProviderStatus)
  • src/main/config-health.test.ts: vi.hoisted harness conflict resolved; rebase artefacts (FAKE_VAULT, stale ReturnType reference) cleaned up

Verification: npm run typecheck:node clean (0 errors). npm test — 1602 passed, 3 skipped (pre-existing), 0 failures.

…tainer

tests/ssh-remote.test.ts wrote its fake `hermes` shim into $HOME/bin and made
it reachable only by prepending that dir to PATH. The command under test
(buildRemoteHermesCmd) runs under `bash -lc` — a LOGIN shell that re-sources
/etc/profile and resets PATH — and probes a list of ABSOLUTE venv paths before
falling back to `command -v hermes`. In a clean CI container (node:22-bookworm)
there is no real `hermes` anywhere and the prepended PATH entry does not survive
the login shell, so `command -v hermes` finds nothing and the command exits 1
with "hermes CLI not found" — 4 failing cases. On a dev box the same test passed
for the WRONG reason: `command -v hermes` resolved a real host hermes.

Fix is test-only: install the shim at $HOME/.local/bin/hermes — a path
buildRemoteHermesCmd probes BY ABSOLUTE PATH ([ -x $HOME/.local/bin/hermes ])
before the PATH-dependent fallback. The shim is now hit deterministically,
independent of login-shell PATH behavior and of whether a real hermes exists on
the host. PATH is still prepended as belt-and-suspenders. No production code
changes.

Verified GREEN in the real CI image via `forgejo-runner exec` on
node:22-bookworm: tests/ssh-remote.test.ts 17/17 pass (was 4 failing). Typecheck
clean on the tracked surface.