
security: 5 polarity fixes — fail-closed in prod, FSM default-on #9

Draft
bGOATnote wants to merge 3 commits into main from security/polarity-fixes-2026-04-27


@bGOATnote (Contributor)

⛔ Hold until Wed 2026-04-29 13:00 PT (post-judges)

This PR is a draft on purpose so it cannot auto-merge before the demo
window. Do not mark ready-for-review until after the Wednesday 1pm
PT cutoff. A scheduled agent will flip the draft state and merge then,
assuming CI is green.

Summary

The 2026-04-27 Glasswing-grade sweep flagged a unifying pattern across
five HIGH findings and one MEDIUM finding: too many critical controls were
fail-OPEN or opt-IN. This PR flips them to fail-CLOSED /
default-ON.

Live demo behavior is unchanged. The B300 pod's systemd unit
already sets PRISM42_ENABLE_FSM=1, and production already has
PRISM42_WORKER_KEY + ELEVENLABS_SIGNING_SECRET configured. The
polarity flip only changes behavior when an env var is missing — the
exact case the sweep flagged as catastrophic.

Fixes

ID | File | Fix | Severity
H1 | mvp/911-console-live/lib/session-store.ts | Math.random() → crypto.randomUUID() for session IDs (chain root for predictable-token attacks) | HIGH
H2 | mvp/911-console-live/app/prism42/api/session/[id]/turn/route.ts | 503 in production when PRISM42_WORKER_KEY is unset | HIGH
H3 | mvp/911-console-live/app/prism42/api/chat/completions/route.ts | 503 in production when ELEVENLABS_SIGNING_SECRET is unset; PRISM42_SKIP_HMAC_PREVIEW hard-blocked in prod | HIGH
H4 | infra/b300/setup.sh | Drop ufw allow 7880/tcp (also 7881/tcp). Caddy terminates TLS in front of 127.0.0.1:7880; the public port let attackers bypass TLS | HIGH
H5 | agents/livekit/dispatcher_fsm.py | FSM default flipped ON. Opt-out via PRISM42_DISABLE_FSM=1 or PRISM42_ENABLE_FSM=0. Without the FSM, the deterministic-dispatcher claim collapses | HIGH (safety-integrity, not just security)
M3 | .github/workflows/verify.yml | Drop `|| true` from the test step (it was masking real pytest failures) | MEDIUM

Test plan

  • Vercel preview deployment succeeds and behaves identically (preview = VERCEL_ENV !== "production", all escape hatches still open)
  • CI verify.yml passes without the || true fallback
  • git diff origin/main..HEAD -- mvp/911-console-live/ reviewed by a human before Wed cutoff
  • B300 pod posture check: systemctl show prism42-worker -p Environment | grep PRISM42_ENABLE_FSM confirms PRISM42_ENABLE_FSM=1 is already set, so flipping the default is a no-op for live behavior
  • Post-merge: confirm 7880/tcp is closed on the pod via sudo ufw status
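A stricter variant of the posture check, as a sketch: instead of grepping, parse the `Environment=` line and require the exact value. The helper names are hypothetical, and it assumes the `Environment=KEY=VALUE KEY2=VALUE2` format that `systemctl show -p Environment` prints, with shell-style quoting for values containing spaces.

```python
import shlex


def parse_systemd_environment(show_output: str) -> dict[str, str]:
    """Parse `systemctl show <unit> -p Environment` output into a dict."""
    env: dict[str, str] = {}
    for line in show_output.splitlines():
        if not line.startswith("Environment="):
            continue
        # Assumption: space-separated KEY=VALUE pairs; shlex undoes the
        # quoting systemd applies to values that contain spaces.
        for token in shlex.split(line[len("Environment="):]):
            key, sep, value = token.partition("=")
            if sep:
                env[key] = value
    return env


def fsm_enabled_on_pod(show_output: str) -> bool:
    # Exact match, so PRISM42_ENABLE_FSM=0 or =10 does not pass like a grep would.
    return parse_systemd_environment(show_output).get("PRISM42_ENABLE_FSM") == "1"
```

On the pod this would be fed `subprocess.run(["systemctl", "show", "prism42-worker", "-p", "Environment"], capture_output=True, text=True).stdout`.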

Out of scope (deferred to follow-up PRs)

  • M2 (rate-limit + drop-GET on /api/session/start)
  • M4 (bump the livekit-server compose pin from v1.9.0 to the deployed v1.11.0)
  • M5 + L1 (strip Runway _jwt from tracked artifacts; extend pre-commit secret regex)

Why drafted (merge gate)

User directive (2026-04-27): hold for the judge demo window today;
auto-merge after Wed 2026-04-29 13:00 PT. A separate scheduled agent
handles the flip-and-merge.

🤖 Generated with Claude Code

Per the 2026-04-27 Glasswing-grade sweep + user review. Unifying thread:
critical controls were fail-OPEN / opt-IN. Flipped to fail-CLOSED /
default-ON. No live-demo behavior change — the B300 pod already sets
PRISM42_ENABLE_FSM=1 and prod has the worker/HMAC secrets configured.
The polarity flip protects deployments where an env var is missing.

H1 — Predictable session IDs (CRITICAL chain root)
  mvp/911-console-live/lib/session-store.ts
  Math.random() (V8 XorShift128+, predictable from a few outputs)
  → crypto.randomUUID() (CSPRNG). Session IDs gate auth on
  /api/session/:id/{stream,turn,end}; the comment "not used for any
  cryptographic purpose" misjudged the threat model.
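For illustration only, a Python analog of the change (the real fix is TypeScript): `random` is a Mersenne Twister whose state, like V8's XorShift128+, can be recovered from observed outputs, while `uuid.uuid4()` draws its bits from `os.urandom`. The function names are hypothetical.

```python
import random
import uuid


def weak_session_id() -> str:
    # Anti-pattern, analogous to the old Math.random() IDs: a non-cryptographic
    # PRNG, so an attacker who sees enough IDs can predict the next ones.
    return "%032x" % random.getrandbits(128)


def strong_session_id() -> str:
    # Analogous to crypto.randomUUID(): 122 random bits from the OS CSPRNG.
    return str(uuid.uuid4())
```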

H2 — /api/session/:id/turn fail-open auth
  mvp/911-console-live/app/prism42/api/session/[id]/turn/route.ts
  When VERCEL_ENV === "production" and PRISM42_WORKER_KEY is unset,
  return 503 instead of treating missing-env as "open". Preview/dev
  unchanged.
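The decision table, sketched in Python rather than the route's TypeScript, with hypothetical names. Two details are assumptions not stated above: when PRISM42_WORKER_KEY is set, a wrong or missing key gets 401, and the comparison is constant-time.

```python
import hmac
from typing import Optional


def turn_auth_status(vercel_env: Optional[str], worker_key: Optional[str],
                     presented_key: Optional[str]) -> int:
    """HTTP status the turn route returns before doing any work (200 = proceed)."""
    if not worker_key:
        # Fail closed: a missing secret in production is an outage, not an open door.
        if vercel_env == "production":
            return 503
        return 200  # preview/dev keep the previous open behavior
    if presented_key is None or not hmac.compare_digest(
        presented_key.encode(), worker_key.encode()
    ):
        return 401  # assumption: bad or absent credential
    return 200
```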

H3 — /api/chat/completions HMAC fail-open
  mvp/911-console-live/app/prism42/api/chat/completions/route.ts
  Same polarity flip on both escape hatches: missing
  ELEVENLABS_SIGNING_SECRET in production = 503; the
  PRISM42_SKIP_HMAC_PREVIEW flag is hard-blocked when
  VERCEL_ENV === "production" regardless of NEXT_PUBLIC_VERCEL_ENV.
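A sketch of the gate ordering in Python (the route is TypeScript; names are hypothetical). It assumes the skip flag is enabled by the value "1" and that "skip" reproduces the old preview behavior when the secret is absent. Only the server-side VERCEL_ENV is an input, so NEXT_PUBLIC_VERCEL_ENV has no say.

```python
from typing import Optional


def hmac_gate(vercel_env: Optional[str], signing_secret: Optional[str],
              skip_hmac_preview: Optional[str]) -> str:
    """Return "verify", "skip", or "unavailable" (503) for /api/chat/completions."""
    in_production = vercel_env == "production"
    # Escape hatch 1: the skip flag is ignored outright in production.
    if skip_hmac_preview == "1" and not in_production:
        return "skip"
    # Escape hatch 2: a missing secret is fatal in production, open elsewhere.
    if not signing_secret:
        return "unavailable" if in_production else "skip"
    return "verify"
```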

H4 — LiveKit signaling port (7880/tcp) exposed publicly
  infra/b300/setup.sh
  Drop `ufw allow 7880/tcp` (and 7881/tcp). Caddy terminates TLS in
  front of 127.0.0.1:7880; opening the port let attackers bypass TLS
  and hit the signaling WebSocket directly. 7882/udp media path is
  the supported public surface.
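A minimal checker for the post-merge `sudo ufw status` step, assuming the default `To  Action  From` column layout. It inspects ufw rules only, not any firewall outside the host. Helper names are hypothetical.

```python
SIGNALING_PORTS = {"7880/tcp", "7881/tcp", "7880", "7881"}  # "7880" = any protocol


def allowed_ports(ufw_status: str) -> set[str]:
    """Collect the `To` column of every ALLOW rule in `ufw status` output."""
    ports: set[str] = set()
    for line in ufw_status.splitlines():
        fields = line.split()
        # Rule rows look like "7882/udp  ALLOW  Anywhere", optionally with "(v6)".
        if "ALLOW" in fields:
            ports.add(fields[0])
    return ports


def signaling_ports_closed(ufw_status: str) -> bool:
    return not (SIGNALING_PORTS & allowed_ports(ufw_status))
```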

H5 — FSM safety latch was opt-IN (safety-integrity bug, not just sec)
  agents/livekit/dispatcher_fsm.py
  Default flipped from PRISM42_ENABLE_FSM=0 → ON. Opt-out via
  PRISM42_DISABLE_FSM=1 or PRISM42_ENABLE_FSM=0. Without the FSM the
  worker is just an LLM voice demo: the deterministic-dispatcher claim
  collapses, and the CPR latch, pronoun discipline, and anti-repetition
  guarantees go away. The live pod's systemd unit already sets
  PRISM42_ENABLE_FSM=1, so prod behavior is unchanged.
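A sketch of the new polarity in Python, using only the two opt-outs named above. `fsm_enabled` is a hypothetical name, not necessarily the helper in dispatcher_fsm.py, and the precedence when both variables are set (PRISM42_DISABLE_FSM=1 wins) is an assumption.

```python
import os
from typing import Mapping, Optional


def fsm_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    if env.get("PRISM42_DISABLE_FSM", "").strip() == "1":
        return False  # explicit opt-out
    if env.get("PRISM42_ENABLE_FSM", "").strip() == "0":
        return False  # legacy opt-out
    return True  # default ON: a missing variable no longer drops the safety latch
```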

M3 — CI swallowed pytest failures
  .github/workflows/verify.yml
  Drop `|| true` from the test step. Tests that need
  corpus/kernel_bugs.yaml already auto-skip via conftest.py, so the
  fallback was masking real regressions and undermining the
  `make verify-all` discipline (CLAUDE.md §4).
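One way the corpus auto-skip could be made explicit, sketched as conftest.py hooks. The `needs_corpus` marker is hypothetical, and the sketch assumes corpus/kernel_bugs.yaml sits under the pytest rootdir (`config.rootpath`, pytest 6.1+). It is a suggestion, not a description of the current conftest.py.

```python
# conftest.py (sketch)
from pathlib import Path

import pytest

# Assumed location of the private corpus, relative to the pytest rootdir.
CORPUS_RELPATH = Path("corpus") / "kernel_bugs.yaml"


def corpus_available(rootdir) -> bool:
    return (Path(rootdir) / CORPUS_RELPATH).is_file()


def pytest_configure(config):
    # Register the hypothetical marker so --strict-markers accepts it.
    config.addinivalue_line(
        "markers", "needs_corpus: test requires corpus/kernel_bugs.yaml"
    )


def pytest_collection_modifyitems(config, items):
    if corpus_available(config.rootpath):
        return  # corpus present: run everything
    skip = pytest.mark.skip(reason=f"{CORPUS_RELPATH} not available")
    for item in items:
        # Marker names appear in item.keywords.
        if "needs_corpus" in item.keywords:
            item.add_marker(skip)
```

Corpus-dependent tests would then opt in with `@pytest.mark.needs_corpus`, and everything else keeps failing loudly without `|| true`.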

Merge gate: hold until Wed 2026-04-29 13:00 PT (post-judges).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented Apr 27, 2026


The latest updates on your projects.

Project | Deployment | Updated (UTC)
prism42 | Ready | Jun 2, 2026 12:09am
prism42-console | Ready | Jun 2, 2026 12:09am

prism42.thegoatnote.com (B300 self-hosted LiveKit signaling) is
unreachable on tcp/443 as of ~2026-04-27 02:18 PT — frontend WSS
upgrade fails with "WebSocket closed before connection established."
Pod tcp/22 still responds (SSH key auth rejected separately), so the
machine is alive but the LiveKit/Caddy stack on 443 is down. Without
brev CLI auth or pod SSH there is no way to restart the stack from
this session.

Surgical fallback: the bare-host and /prism42 redirects on
prism42-app.thegoatnote.com flip from /prism42/livekit (broken) to
/prism42-v3 (cloud backup, ElevenLabs ConvAI path, independently
deployed on Vercel, verified HTTP 200). Both are 307 temporary
redirects, so this reverts cleanly when the pod is back.

The /prism42/livekit page itself is untouched — judges who type that
path explicitly still get there (and see the WSS error). Only the
canonical-host default routing changes. Team A's perceptual-SOTA work
on agents/livekit/* is unaffected.

Revert: change destination back to /prism42/livekit in both redirects
once the pod is verified healthy via:
  curl -sI https://prism42.thegoatnote.com/  # expect 200, not timeout
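The same revert check as a Python sketch, e.g. for a scheduled job (helper names hypothetical). Unlike `curl -sI`, `urllib` follows redirects, so a 200 may come from a redirect target; a timeout or refused connection maps to None.

```python
import urllib.error
import urllib.request
from typing import Optional


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """HTTP status of a HEAD request, or None if nothing answered."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered with a non-2xx status
    except (urllib.error.URLError, OSError):
        return None  # timeout, DNS failure, refused connection


def pod_healthy(url: str = "https://prism42.thegoatnote.com/") -> bool:
    return head_status(url) == 200
```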

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bGOATnote (Contributor, Author)

⚠️ Before merge: this PR silently regresses the FSM reassurance-latch fix from aa23de6.

The dispatcher_fsm.py diff in this PR deletes lines 843-851 of the current file:

# FSM-routing-bug fix (2026-04-27, see
# findings/research/2026-04-27-future-stack/fsm-routing-bug-diagnosis.md §7):
# if reassurance_done is already latched, do NOT re-enter the
# reassurance path. Defer to the after-reassurance helper, which
# handles direct-question routing + advancement to KEY_QUESTIONS
# without re-emitting any DELIVER_REASSURANCE_* intent.
if self.reassurance_done:
    self.state = State.REASSURANCE_DELIVERED
    return self._intent_in_after_reassurance(f, t0)

…and reverts the prompt at line 1218 from "Do NOT add reassurance phrasing — no 'stay with me'…" back to "Reassure the caller…". Net effect: merging as written reintroduces the exact "I'm with you / stay with me / help is on the way" template-loop bug named in findings/research/2026-04-27-future-stack/fsm-routing-bug-diagnosis.md §7, which aa23de6 fixed and covered with 8 regression tests (tests/voice/test_fsm_reassurance_latch.py, all passing on main).
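A toy model of what the deleted short-circuit guarantees, for illustration only: `State`, `LatchSketch`, and the intent strings are stand-ins, not the real dispatcher_fsm.py API (the real after-reassurance helper also takes `f, t0`). Once `reassurance_done` is latched, later turns never emit the reassurance intent again.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical subset of the real FSM states.
    REASSURANCE = auto()
    REASSURANCE_DELIVERED = auto()
    KEY_QUESTIONS = auto()


class LatchSketch:
    """Toy reassurance latch; not the real dispatcher."""

    def __init__(self) -> None:
        self.state = State.REASSURANCE
        self.reassurance_done = False

    def next_intent(self) -> str:
        if self.reassurance_done:
            # The short-circuit this PR deletes: never re-enter reassurance.
            self.state = State.REASSURANCE_DELIVERED
            return self._after_reassurance()
        self.reassurance_done = True
        return "DELIVER_REASSURANCE"  # stand-in for DELIVER_REASSURANCE_* intents

    def _after_reassurance(self) -> str:
        self.state = State.KEY_QUESTIONS
        return "ASK_KEY_QUESTION"
```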

Per-section read of the PR:

  • H1-H4 (session-id crypto.randomUUID, PRISM42_WORKER_KEY fail-closed in prod, ELEVENLABS_SIGNING_SECRET fail-closed, PRISM42_SKIP_HMAC_PREVIEW blocked in prod, drop UFW 7880 exposure) — additive defense, real wins, keep these unchanged.
  • H5 (FSM default-on) — the goal is correct, but the implementation deletes a separate bug fix. Also, on the live B300 pod the FSM is already default-on via the 120-cycle2Q-fsm.conf systemd drop-in (PRISM42_ENABLE_FSM=1), so the env-var polarity flip is redundant for prod. It IS additive for environments without the drop-in (preview, fresh pods).
  • CI (5/7 red): pytest fails because the PR drops || true from verify.yml:121, unmasking real failures. That is intentional, but the conftest auto-skip for the missing private corpus is incomplete. Fix conftest before unmasking.

Recommended rebase path:

  1. git fetch origin && git rebase origin/main (the branch is 35 commits behind main).
  2. Reconcile dispatcher_fsm.py — preserve the if self.reassurance_done short-circuit at lines 843-851 and the prompt at 1218 ("Do NOT add reassurance phrasing"). The H5 env-var polarity flip can co-exist with the safety fix without touching either.
  3. Verify tests/voice/test_fsm_reassurance_latch.py still passes (8 tests).
  4. Tighten conftest skip logic so pytest passes when the private corpus is unavailable (or land that as a separate prerequisite PR).
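A cheap post-rebase guard for step 2, sketched in Python: it matches strings rather than line numbers, since 843-851 and 1218 will shift after the rebase. The marker strings come from the snippet and prompt quoted above; the helper names are hypothetical, and the 8 regression tests in step 3 remain the real check.

```python
from pathlib import Path

REQUIRED_MARKERS = (
    "if self.reassurance_done:",
    "return self._intent_in_after_reassurance(",
    "Do NOT add reassurance phrasing",
)


def missing_latch_markers(source: str) -> list[str]:
    """Markers from the reassurance-latch fix that are absent from `source`."""
    return [m for m in REQUIRED_MARKERS if m not in source]


def check_file(path: str = "agents/livekit/dispatcher_fsm.py") -> list[str]:
    return missing_latch_markers(Path(path).read_text(encoding="utf-8"))
```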

After rebase, H1-H4 + a clean H5 are mergeable.

(Comment from left/H100 session, see findings/ops/parallel-session-coord.md for the parallel-Claude coordination contract this is operating under.)

Contributor Author

Auto-merge aborted — 2 blockers as of 2026-04-29 13:00 PT

1. Merge conflicts (mergeable_state: dirty)

The branch has conflicts with main that require human resolution. Rebase or merge main into the branch, resolve conflicts, and push.

2. CI failure: Tests (pytest)

No merge attempted. Human action required on both blockers before re-scheduling.


Generated by Claude Code

Fixes the failing Tests (pytest) check: main's verify.yml now installs
structlog + pytest-asyncio, so the tests/voice collection errors clear.
Preserves this PR's hardening (removal of the '|| true' pytest guard).
Resolves the sole conflict (mvp/911-console-live/vercel.json) by taking
main's catch-all redirect, which supersedes the stale two-redirect block.

Labels

None yet

Projects

None yet


1 participant