security: 5 polarity fixes — fail-closed in prod, FSM default-on #9
bGOATnote wants to merge 3 commits into
Per the 2026-04-27 Glasswing-grade sweep + user review. Unifying thread:
critical controls were fail-OPEN / opt-IN. Flipped to fail-CLOSED /
default-ON. No live-demo behavior change — the B300 pod already sets
PRISM42_ENABLE_FSM=1 and prod has the worker/HMAC secrets configured.
The polarity flip protects deployments where an env var is missing.
H1 — Predictable session IDs (CRITICAL chain root)
mvp/911-console-live/lib/session-store.ts
Math.random() (V8 XorShift128+, predictable from a few outputs)
→ crypto.randomUUID() (CSPRNG). Session IDs gate auth on
/api/session/:id/{stream,turn,end}; the comment "not used for any
cryptographic purpose" misjudged the threat model.
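A minimal sketch of the H1 change, assuming a Node.js runtime. `newSessionId` is a hypothetical name used for illustration; the real helper in `session-store.ts` may be named and shaped differently.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper (name is an assumption, not taken from session-store.ts).
function newSessionId(): string {
  // randomUUID() returns an RFC 4122 version-4 UUID drawn from a CSPRNG,
  // unlike Math.random(), whose generator state can be recovered from outputs.
  return randomUUID();
}
```

Because the ID gates access to the per-session routes, it is effectively a bearer token, which is why the generator needs to be unpredictable.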
H2 — /api/session/:id/turn fail-open auth
mvp/911-console-live/app/prism42/api/session/[id]/turn/route.ts
When VERCEL_ENV === "production" and PRISM42_WORKER_KEY is unset,
return 503 instead of treating missing-env as "open". Preview/dev
unchanged.
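The polarity flip for H2 can be sketched as a pure function over the environment. This is an illustration under assumptions, not the code in `route.ts`: the name `authorizeTurn`, the `TurnEnv` type, and the 401 response for a wrong or missing key are all hypothetical.

```typescript
import { timingSafeEqual } from "node:crypto";

type TurnEnv = Record<string, string | undefined>;

// Hypothetical helper: returns the HTTP status to answer with; 200 means "proceed".
function authorizeTurn(env: TurnEnv, presentedKey: string | null): number {
  const expected = env.PRISM42_WORKER_KEY;
  if (!expected) {
    // Fail closed in production: a missing secret is a misconfiguration, not "open".
    // Preview/dev keep the old behavior and let the request through.
    return env.VERCEL_ENV === "production" ? 503 : 200;
  }
  if (presentedKey === null) return 401; // assumption: 401 for a missing key
  const given = Buffer.from(presentedKey);
  const wanted = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === wanted.length && timingSafeEqual(given, wanted) ? 200 : 401;
}
```

Returning 503 rather than 401 in the unset case signals a server-side configuration fault instead of blaming the caller.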
H3 — /api/chat/completions HMAC fail-open
mvp/911-console-live/app/prism42/api/chat/completions/route.ts
Same polarity flip on both escape hatches: missing
ELEVENLABS_SIGNING_SECRET in production = 503; the
PRISM42_SKIP_HMAC_PREVIEW flag is hard-blocked when
VERCEL_ENV === "production" regardless of NEXT_PUBLIC_VERCEL_ENV.
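A hedged sketch of the two H3 escape hatches after the flip. Several details are assumptions made only for illustration: the function name `verifyCompletionRequest`, the flag value `"1"`, the 401 status, and a signature that is a hex-encoded HMAC-SHA256 over the raw body. The actual ElevenLabs signature header format is not described in this PR and may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type HmacEnv = Record<string, string | undefined>;

// Hypothetical helper: returns the HTTP status to answer with; 200 means "proceed".
function verifyCompletionRequest(env: HmacEnv, body: string, signatureHex: string | null): number {
  // Only the server-side VERCEL_ENV is consulted; NEXT_PUBLIC_VERCEL_ENV is ignored.
  const isProd = env.VERCEL_ENV === "production";
  // The preview skip flag is honored only outside production.
  if (!isProd && env.PRISM42_SKIP_HMAC_PREVIEW === "1") return 200;
  const secret = env.ELEVENLABS_SIGNING_SECRET;
  // Missing secret: fail closed in production, stay open in preview/dev.
  if (!secret) return isProd ? 503 : 200;
  if (signatureHex === null) return 401;
  const expected = createHmac("sha256", secret).update(body).digest();
  const given = Buffer.from(signatureHex, "hex");
  // Length check first, because timingSafeEqual throws on unequal lengths.
  return given.length === expected.length && timingSafeEqual(given, expected) ? 200 : 401;
}
```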
H4 — LiveKit signaling port (7880/tcp) exposed publicly
infra/b300/setup.sh
Drop `ufw allow 7880/tcp` (and 7881/tcp). Caddy terminates TLS in
front of 127.0.0.1:7880; opening the port let attackers bypass TLS
and hit the signaling WebSocket directly. 7882/udp media path is
the supported public surface.
H5 — FSM safety latch was opt-IN (safety-integrity bug, not just sec)
agents/livekit/dispatcher_fsm.py
Default flipped from PRISM42_ENABLE_FSM=0 → ON. Opt-out via
PRISM42_DISABLE_FSM=1 or PRISM42_ENABLE_FSM=0. Without the FSM the
worker is just an LLM voice demo — the deterministic-dispatcher
claim collapses, CPR latch / pronoun discipline / anti-repetition
go away. Live pod's systemd unit already sets
PRISM42_ENABLE_FSM=1, so prod behavior is unchanged.
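The default-ON polarity can be sketched as follows. The real logic lives in Python in `dispatcher_fsm.py`; this TypeScript version only illustrates the decision table, and the name `fsmEnabled` plus the rule that an explicit disable wins over an enable are assumptions.

```typescript
// Hypothetical helper: decides whether the FSM latch is active.
function fsmEnabled(env: Record<string, string | undefined>): boolean {
  // Explicit opt-outs are the only way to turn the FSM off.
  if (env.PRISM42_DISABLE_FSM === "1") return false;
  if (env.PRISM42_ENABLE_FSM === "0") return false;
  // Unset or any other value: the safety latch stays on.
  return true;
}
```

With this shape, a deployment that forgets to set either variable still runs with the FSM, which is the case the flip is meant to protect.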
M3 — CI swallowed pytest failures
.github/workflows/verify.yml
Drop `|| true` from the test step. Tests that need
corpus/kernel_bugs.yaml already auto-skip via conftest.py, so the
fallback was masking real regressions and undermining the
`make verify-all` discipline (CLAUDE.md §4).
Merge gate: hold until Wed 2026-04-29 13:00 PT (post-judges).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
prism42.thegoatnote.com (B300 self-hosted LiveKit signaling) is unreachable on tcp/443 as of ~2026-04-27 02:18 PT: the frontend WSS upgrade fails with "WebSocket closed before connection established." Pod tcp/22 still responds (SSH key auth rejected separately), so the machine is alive but the LiveKit/Caddy stack on 443 is down. Without brev CLI auth or pod SSH there is no way to restart the stack from this session.

Surgical fallback: bare-host and /prism42 redirects on prism42-app.thegoatnote.com flip from /prism42/livekit (broken) to /prism42-v3 (cloud backup, ElevenLabs ConvAI path, independently deployed on Vercel, verified HTTP 200). 307 temporary redirects so this reverts cleanly when the pod is back.

The /prism42/livekit page itself is untouched: judges who type that path explicitly still get there (and see the WSS error). Only the canonical-host default routing changes. Team A's perceptual-SOTA work on agents/livekit/* is unaffected.

Revert: change destination back to /prism42/livekit in both redirects once the pod is verified healthy via:

    curl -sI https://prism42.thegoatnote.com/   # expect 200, not timeout

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The
    # FSM-routing-bug fix (2026-04-27, see
    # findings/research/2026-04-27-future-stack/fsm-routing-bug-diagnosis.md §7):
    # if reassurance_done is already latched, do NOT re-enter the
    # reassurance path. Defer to the after-reassurance helper, which
    # handles direct-question routing + advancement to KEY_QUESTIONS
    # without re-emitting any DELIVER_REASSURANCE_* intent.
    if self.reassurance_done:
        self.state = State.REASSURANCE_DELIVERED
        return self._intent_in_after_reassurance(f, t0)
…and reverts the prompt at line 1218 from
Per-section read of the PR:
Recommended rebase path:
After rebase, H1-H4 + a clean H5 are mergeable. (Comment from left/H100 session, see
Auto-merge aborted: 2 blockers as of 2026-04-29 13:00 PT
1. Merge conflicts (
Fixes the failing Tests (pytest) check: main's verify.yml now installs structlog + pytest-asyncio, so the tests/voice collection errors clear. Preserves this PR's hardening (removal of the '|| true' pytest guard). Resolves the sole conflict (mvp/911-console-live/vercel.json) by taking main's catch-all redirect, which supersedes the stale two-redirect block.
⛔ Hold until Wed 2026-04-29 13:00 PT (post-judges)
This PR is draft on purpose so it cannot auto-merge before the demo
window. Do not mark ready-for-review until after the Wednesday 1pm
PT cutoff. A scheduled agent will flip the draft state and merge then,
assuming CI is green.
Summary
The 2026-04-27 Glasswing-grade sweep flagged a unifying pattern across
five HIGH and one MEDIUM finding: too many critical controls were
fail-OPEN or opt-IN. This PR flips them to fail-CLOSED /
default-ON.
Live demo behavior is unchanged. The B300 pod's systemd unit
already sets `PRISM42_ENABLE_FSM=1`, and production already has
`PRISM42_WORKER_KEY` + `ELEVENLABS_SIGNING_SECRET` configured. The
polarity flip only changes behavior when an env var is missing, the
exact case the sweep flagged as catastrophic.
Fixes
- H1, `mvp/911-console-live/lib/session-store.ts`: `Math.random()` → `crypto.randomUUID()` for session IDs (chain root for predictable-token attacks)
- H2, `mvp/911-console-live/app/prism42/api/session/[id]/turn/route.ts`: 503 in production when `PRISM42_WORKER_KEY` unset
- H3, `mvp/911-console-live/app/prism42/api/chat/completions/route.ts`: 503 in production when `ELEVENLABS_SIGNING_SECRET` unset; `PRISM42_SKIP_HMAC_PREVIEW` hard-blocked in prod
- H4, `infra/b300/setup.sh`: drop `ufw allow 7880/tcp` (also 7881/tcp); Caddy terminates TLS in front of 127.0.0.1:7880; public port let attackers bypass TLS
- H5, `agents/livekit/dispatcher_fsm.py`: default ON; opt out via `PRISM42_DISABLE_FSM=1` or `PRISM42_ENABLE_FSM=0`. Without FSM, the deterministic-dispatcher claim collapses
- M3, `.github/workflows/verify.yml`: drop `|| true` from the test step
Test plan
- … (`VERCEL_ENV !== "production"`, all escape hatches still open)
- … `|| true` fallback
- `git diff origin/main..HEAD -- mvp/911-console-live/` reviewed by a human before Wed cutoff
- `systemctl show prism42-worker -p Environment | grep PRISM42_ENABLE_FSM` confirms `=1` already set, so flipping the default is a no-op for live behavior
- … `sudo ufw status`
Out of scope (deferred to follow-up PRs)
- … `/api/session/start`)
- … (`livekit-server` compose pin from v1.9.0 → match deployed v1.11.0)
- … `_jwt` from tracked artifacts; extend pre-commit secret regex)
Why drafted (merge gate)
User directive (2026-04-27): hold for the judge demo window today;
auto-merge after Wed 2026-04-29 13:00 PT. A separate scheduled agent
handles the flip-and-merge.
🤖 Generated with Claude Code