iOS: dock redesign + voice mode (M1 + M2)#61

Open
lakunle wants to merge 201 commits into raroque:main from lakunle:feat/ios-channel

Conversation

lakunle commented May 21, 2026

Summary

Combined PR covering two iOS milestones on feat/ios-channel:

M1 — Dock redesign + chat UX polish

Composer pill + welded sliding active tab, bare inactive thread icons, attach picker UI with chip staging, reduce-motion handling, dock hosted via safeAreaInset. 15 commits ending at 1c74f55.

M2 — Voice mode (new)

Dedicated full-screen voice mode sheet: continuous-loop hands-free conversation with on-device SFSpeechRecognizer + VAD, server-streamed TTS via ElevenLabs Flash v2.5, AVSpeechSynthesizer fallback when ELEVENLABS_API_KEY is unset, Lottie-driven orb (animations downloaded manually post-merge), permissions card, mute/exit/keyboard controls, Reduce Motion fallback. 31 commits from 0571ee2 to acf43bb.

Spec: docs/superpowers/specs/2026-05-23-ios-voice-m2-design.html
Plan: docs/superpowers/plans/2026-05-23-ios-voice-m2-plan.html

What changed (M2)

  • Server: new server/voice/ (sentence-buffer, markdown-strip, ElevenLabs WS adapter, TtsSidecar orchestrator). Extended ParsedInbound with source: "voice" + voiceTurnId. System-prompt addendum for voice turns. New SSE events: tts_chunk / tts_done / tts_error / tts_use_local. New env vars in .env.example.
  • iOS: new Voice/ (VoiceSession actor, VoiceVAD, AudioQueue, VoicePermissions) + Views/Voice/ (VoiceModeSheet, VoicePermissionsCard). Lottie SPM dep + LottieOrbView. Dock mic button now opens the sheet. SSE client extended for the four new events.
  • Tests: tests/voice/ — 27 unit tests passing (sentence-buffer, markdown-strip, tts-elevenlabs, tts-sidecar, interaction-prompt, parsed-inbound). 2 harness-gated tests skip cleanly. iOS BoopTests adds VAD unit tests. iOS BoopUITests adds an open/dismiss UI smoke test.
  • Docs: ios/README.md voice mode setup + smoke checklist. Boop IOS Design.md M2 status block.
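
As a rough illustration of how the four new events might look on the wire, here is a minimal sketch that serializes them in the standard Server-Sent Events text format. It assumes the event names map to the SSE `event:` field; the `formatSse` helper and the payload fields (`voiceTurnId`, `seq`) are hypothetical and not taken from this PR.

```typescript
// The four event names listed in this PR.
type TtsEventName = "tts_chunk" | "tts_done" | "tts_error" | "tts_use_local";

// Hypothetical helper: serialize one SSE frame.
// An SSE frame is "event: <name>", "data: <payload>", then a blank line.
function formatSse(event: TtsEventName, data: unknown): string {
  // JSON.stringify never emits raw newlines, so one data: line is enough.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical payload shape, for illustration only.
const frame = formatSse("tts_chunk", { voiceTurnId: "t1", seq: 0 });
console.log(frame);
```

A client would dispatch on the `event:` name and parse the `data:` line as JSON.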

Test plan

  • M1 regression: typing in dock still works; attach picker, send button, chip row, scroll behavior unchanged.
  • M2 first-tap: dock mic shows permissions card.
  • M2 happy path (with ELEVENLABS_API_KEY set): tap mic → speak "hello" → orb transitions listening → thinking → speaking → listening. First audio ≤ 1.5s after VAD commit.
  • M2 fallback (key unset): tap mic → speak → assistant text read aloud via AVSpeechSynthesizer.
  • M2 continuous loop: 10 turns without exiting.
  • M2 controls: mute toggles correctly; tap orb during Speaking skips audio; ⌨ keyboard exit lands transcript in dock composer.
  • M2 lifecycle: pulling AirPods mid-Speaking fades audio and pauses the session; a phone call mid-session auto-pauses and resumes; background → resume works.
  • iMessage path unchanged — Sendblue send still works.
  • Lottie animations: drop listening.json / thinking.json / speaking.json into ios/Boop/Resources/lottie/ from the candidates linked in the spec. Without them, the static SF Symbol orb still renders correctly.
  • TestFlight build 8 archived + uploaded.

Known follow-ups (deferred, not blocking)

  • toggleMute() flips state but doesn't actually stop the input tap — needs a real VoiceSession.pauseListening().
  • No in-UI tap-to-retry from the error state (copy now says "exit and try again").
  • Pre-existing tests/ios-thread-routes.test.ts + tests/threads.test.ts fail in CI without CONVEX_URL — needs env-aware skip gating (separate task).
  • Real-device audio playback QA — AudioQueue not unit-testable without a physical device.

🤖 Generated with Claude Code

lakunle and others added 30 commits April 30, 2026 09:47
Spec for giving Boop the ability to design and generate beautifully-designed
PDFs (six doc types — brief, invoice, itinerary, resume, newsletter,
reference), with boop-design as the universal design quality gate, plus
the bite-sized implementation plan derived from it.

Also gitignore .superpowers/ (brainstorm session content).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dispatcher was routing document-generation requests to Canva (when
connected) instead of using the new pdf-* skills. The new prompt section
explicitly tells the dispatcher about Boop's built-in PDF skills, when to
spawn with empty integrations vs data-source integrations, and steers it
away from Canva for documents. Discovered during E2E verification.
- server/pdf-tools.ts: gate getBrowser() restart with a `restarting` flag.
  Two concurrent renders at exactly the 100-render boundary could each
  launch a fresh Chromium and overwrite each other's reference, leaking
  the orphaned process.

- convex/pdfArtifacts.ts: throw if storage.getUrl returns null after store.
  Silently caching `""` would have produced text-only iMessage replies with
  no warning when storage broke.

- .claude/skills/pdf-{invoice,resume}/SKILL.md: replace "Studio Hera" /
  "Hera Studio" placeholder names with "Meridian Studio". This is a public
  repo and the original placeholder maps to the maintainer's actual brand
  (hera.ng); replacing with a fully fictional name keeps templates generic.
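
The getBrowser() race described above can be sketched roughly as follows. This is not the code in server/pdf-tools.ts: it uses a shared in-flight promise rather than a boolean `restarting` flag, and `Browser`, `launchBrowser`, and `MAX_RENDERS` are hypothetical stand-ins for the real Chromium handling.

```typescript
// Hypothetical stand-in for a headless browser handle.
interface Browser {
  id: number;
  close(): Promise<void>;
}

let launches = 0;

// Hypothetical launcher; a real one would start Chromium.
async function launchBrowser(): Promise<Browser> {
  launches += 1;
  return { id: launches, close: async () => {} };
}

const MAX_RENDERS = 100; // restart boundary mentioned in the commit
let browser: Browser | null = null;
let renders = 0;
let restarting: Promise<Browser> | null = null;

async function getBrowser(): Promise<Browser> {
  // While a (re)start is in flight, every caller shares the same promise,
  // so only one browser is ever launched per restart.
  if (restarting) return restarting;
  if (browser && renders < MAX_RENDERS) {
    renders += 1;
    return browser;
  }
  const old = browser;
  restarting = (async () => {
    try {
      if (old) await old.close();
      const fresh = await launchBrowser();
      browser = fresh;
      renders = 1;
      return fresh;
    } finally {
      restarting = null;
    }
  })();
  return restarting;
}

// Two concurrent callers trigger a single launch.
const first = getBrowser();
const second = getBrowser();
```

Sharing the promise means late callers also wait for the new instance instead of receiving a reference that is about to be closed.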
Tightened the `description:` frontmatter on each Convex skill so the SDK's
trigger-rich routing fits more skills in the available context budget.
skills-lock.json picks up the corresponding skillPath field and refreshed
computedHash values from npx convex ai-files install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Claude Code harness writes .claude/scheduled_tasks.lock with the
current sessionId/pid. Scoped to top-level *.lock so tracked
.claude/skills/ files keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Append --dns-result-order=ipv4first to NODE_OPTIONS on every spawn so
undici/fetch inside children prefers IPv4. Networks with broken IPv6 to
Cloudflare-fronted endpoints (Convex deployments, npm registry) hang
intermittently under happy-eyeballs dual-stack — surfacing as tick/sweep
timeouts. Local-network workaround; harmless on healthy IPv6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
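
A minimal sketch of the NODE_OPTIONS handling described above, assuming the spawn sites pass an explicit env object. The helper name `withIpv4First` is hypothetical; `--dns-result-order=ipv4first` is the Node.js flag named in the commit.

```typescript
const IPV4_FIRST = "--dns-result-order=ipv4first";

type Env = Record<string, string | undefined>;

// Hypothetical helper: return a copy of env with the flag appended to
// NODE_OPTIONS, preserving whatever options were already set.
function withIpv4First(env: Env): Env {
  const existing = (env.NODE_OPTIONS ?? "").trim();
  // Skip the append if an ancestor process already added the flag.
  if (existing.split(/\s+/).includes(IPV4_FIRST)) return { ...env };
  const merged = existing ? `${existing} ${IPV4_FIRST}` : IPV4_FIRST;
  return { ...env, NODE_OPTIONS: merged };
}

// Usage at a spawn site might look like:
//   spawn("node", ["child.js"], { env: withIpv4First(process.env) });
console.log(withIpv4First({ NODE_OPTIONS: "--max-old-space-size=512" }).NODE_OPTIONS);
```

Appending rather than overwriting keeps any options the parent process already relies on.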
Composio doesn't host a managed OAuth app for Confluence, so it requires
a manually-registered auth config in the Composio Dashboard. authMode
"byo" routes the connect flow through the existing BYO auth-config
lookup path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swap the boilerplate example data in pdf-resume and its companion plan
doc to a clearly fictional persona so the templates are unambiguously
generic. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec covers:
- Channel abstraction (server/channels/) with sendblue + telegram impls
- Active-channel setting (user controls where unsolicited messages go)
- Unioned recent-history across channels
- Telegram inbound voice via OpenAI gpt-4o-mini-transcribe
- Hybrid env+Convex allowlist with interactive approval CLI

Plan decomposes into 34 tasks across 6 phases. Each task ends in a
runnable system; no flag-day refactors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for the channel-abstraction layer. Adds:
- types.ts: Channel interface, ChannelId, ConversationId template type, SendOpts, ParsedInbound
- text.ts: stripMarkdown, chunk, formatDuration (pure utilities, shared by all channels)

No runtime wiring yet — these are scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin wrapper around the existing server/sendblue.ts exports. No behavior
change — the adapter just rehomes the existing functions behind the
Channel interface so the registry can treat all channels uniformly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lakunle and others added 29 commits May 23, 2026 02:45
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt/undefined' case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reads `source` and `voiceTurnId` from the POST /channels/ios/inbound
body with strict validation (source must be the literal "voice";
voiceTurnId must be a non-empty string) and passes them into runTurn so
the interaction agent can enter voice-prompt mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
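
The validation rule described above could be sketched like this. The function name `parseVoiceFields` and the choice to silently drop invalid values are assumptions; the actual handler may reject the request instead.

```typescript
interface VoiceFields {
  source?: "voice";
  voiceTurnId?: string;
}

// Hypothetical helper: extract the two voice fields from an untrusted body.
function parseVoiceFields(body: unknown): VoiceFields {
  if (typeof body !== "object" || body === null) return {};
  const { source, voiceTurnId } = body as Record<string, unknown>;
  const out: VoiceFields = {};
  // source must be exactly the literal "voice"; anything else is ignored.
  if (source === "voice") out.source = "voice";
  // voiceTurnId must be a non-empty string.
  if (typeof voiceTurnId === "string" && voiceTurnId.length > 0) {
    out.voiceTurnId = voiceTurnId;
  }
  return out;
}

console.log(parseVoiceFields({ source: "voice", voiceTurnId: "turn-1" }));
```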
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a `disposed` flag so that push() and flush() become silent no-ops
after dispose() is called, preventing onSentence from firing through a
closed WebSocket (T7 TtsSidecar race condition). dispose() is also now
idempotent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
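
A minimal sketch of the `disposed` guard described above. The class shape and the sentence-boundary rule (terminal punctuation followed by whitespace) are assumptions for illustration, not the PR's sentence-buffer implementation.

```typescript
class SentenceBuffer {
  private pending = "";
  private disposed = false;
  private readonly onSentence: (sentence: string) => void;

  constructor(onSentence: (sentence: string) => void) {
    this.onSentence = onSentence;
  }

  push(text: string): void {
    if (this.disposed) return; // silent no-op after dispose()
    this.pending += text;
    // Naive boundary rule (assumption): ., ! or ? followed by whitespace.
    let match: RegExpExecArray | null;
    while ((match = /[.!?]\s+/.exec(this.pending)) !== null) {
      const end = match.index + 1; // keep the punctuation mark
      this.onSentence(this.pending.slice(0, end).trim());
      this.pending = this.pending.slice(match.index + match[0].length);
    }
  }

  flush(): void {
    if (this.disposed) return; // silent no-op after dispose()
    const rest = this.pending.trim();
    this.pending = "";
    if (rest) this.onSentence(rest);
  }

  dispose(): void {
    // Safe to call more than once.
    this.disposed = true;
    this.pending = "";
  }
}

const spoken: string[] = [];
const buffer = new SentenceBuffer((s) => spoken.push(s));
buffer.push("Hello there. How are");
buffer.push(" you? ");
buffer.dispose();
buffer.push("This is ignored. ");
buffer.flush();
```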
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use word-boundary lookarounds (?<!\w) and (?!\w) on the underscore-italic
pattern so snake_case identifiers are not corrupted by the AVSpeech fallback.
Two new tests cover the fix and the original italic-stripping behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
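
To illustrate the lookaround fix, here is a small sketch. The exact pattern in the PR is not shown, so the regular expression and the function name `stripUnderscoreItalics` are assumptions that follow the description above.

```typescript
// An underscore only opens italics when no word character precedes it,
// and only closes italics when no word character follows it.
const UNDERSCORE_ITALIC = /(?<!\w)_([^_\n]+)_(?!\w)/g;

// Hypothetical helper: remove _italic_ markers, keep the inner text.
function stripUnderscoreItalics(text: string): string {
  return text.replace(UNDERSCORE_ITALIC, "$1");
}

console.log(stripUnderscoreItalics("call _now_ please"));
console.log(stripUnderscoreItalics("set voice_turn_id first"));
```

Because `\w` includes letters, digits, and the underscore itself, identifiers such as `voice_turn_id` never satisfy both lookarounds and pass through unchanged.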
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- send() now checks `ended` before readyState so text frames cannot
  slip in after the EOS frame has been sent.
- end() clears the 2-second fallback setTimeout on the happy path,
  preventing the event loop from being held open unnecessarily.
- Adds test: "send() after end() is silently ignored".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
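
The ordering of the two checks can be sketched as follows. `SocketLike`, `TtsStream`, `onFinalFrame`, and the JSON frame shapes are hypothetical; only the `ended`-before-`readyState` order and the cleared 2-second timer come from the commit message.

```typescript
// Hypothetical minimal socket surface, so the sketch runs without a network.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const OPEN = 1; // same numeric value as WebSocket.OPEN

class TtsStream {
  private ended = false;
  private fallbackTimer: ReturnType<typeof setTimeout> | null = null;
  private resolveEnd: (() => void) | null = null;
  private readonly socket: SocketLike;

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  send(text: string): void {
    // Check `ended` first: once EOS is sent, no text frame may follow,
    // even though the socket is still OPEN while audio drains.
    if (this.ended) return;
    if (this.socket.readyState !== OPEN) return;
    this.socket.send(JSON.stringify({ text }));
  }

  end(): Promise<void> {
    if (this.ended) return Promise.resolve();
    this.ended = true;
    if (this.socket.readyState === OPEN) {
      // Assumed EOS frame shape: an empty text payload.
      this.socket.send(JSON.stringify({ text: "" }));
    }
    return new Promise<void>((resolve) => {
      this.resolveEnd = resolve;
      // Fallback in case the final-audio signal never arrives.
      this.fallbackTimer = setTimeout(() => this.finish(), 2000);
    });
  }

  // Call when the server signals its final audio frame (happy path).
  onFinalFrame(): void {
    this.finish();
  }

  private finish(): void {
    if (this.fallbackTimer !== null) {
      clearTimeout(this.fallbackTimer); // do not hold the event loop open
      this.fallbackTimer = null;
    }
    this.resolveEnd?.();
    this.resolveEnd = null;
  }
}

const sentFrames: string[] = [];
const stream = new TtsStream({ readyState: OPEN, send: (d) => sentFrames.push(d) });
stream.send("Hello.");
const finished = stream.end();
stream.send("Too late."); // silently ignored
stream.onFinalFrame();
```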
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…isposed guards + test glob

- Fix 1 (critical): stale-closure bug — sentencesInFlight[] tracks every
  sentence sent to ElevenLabs; on onError all of them are flushed into
  fallbackBuffer, not just the first closure-captured sentence.
- Fix 2 (important): midTurnFailed sticky flag prevents re-opening an
  ElevenLabs stream after an error; future sentences route directly to
  fallbackBuffer instead of retrying.
- Fix 3 (important): every emit helper (emitChunk, emitDone, emitLocalFallback,
  emitError) now guards with `if (disposed) return` to prevent ghost events
  after dispose() — in particular the elevenlabs.end().then(emitDone)
  continuation can no longer fire tts_done post-disposal.
- Fix 4 (minor): package.json test glob updated from tests/*.test.ts to
  'tests/**/*.test.ts' so tests/voice/ suites are picked up by npm test.
- Test: adds dispose-race test confirming tts_done is suppressed after dispose.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
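
Fixes 1 to 3 above might fit together roughly as in the sketch below. It reuses the names from the commit message (`sentencesInFlight`, `fallbackBuffer`, `midTurnFailed`, `disposed`), but the class `FallbackRouter`, its method names, and the event payloads are hypothetical simplifications of the TtsSidecar.

```typescript
type Emit = (event: "tts_done" | "tts_use_local", payload: string) => void;

class FallbackRouter {
  private sentencesInFlight: string[] = [];
  private fallbackBuffer: string[] = [];
  private midTurnFailed = false;
  private disposed = false;
  private readonly sendRemote: (sentence: string) => void;
  private readonly emit: Emit;

  constructor(sendRemote: (sentence: string) => void, emit: Emit) {
    this.sendRemote = sendRemote;
    this.emit = emit;
  }

  handleSentence(sentence: string): void {
    if (this.disposed) return;
    if (this.midTurnFailed) {
      // Sticky flag: after an error, never reopen the remote stream.
      this.fallbackBuffer.push(sentence);
      return;
    }
    this.sentencesInFlight.push(sentence);
    this.sendRemote(sentence);
  }

  handleRemoteError(): void {
    if (this.disposed) return;
    this.midTurnFailed = true;
    // Move every in-flight sentence, not just one captured by a closure.
    this.fallbackBuffer.push(...this.sentencesInFlight);
    this.sentencesInFlight = [];
  }

  finishTurn(): void {
    if (this.disposed) return; // no ghost events after dispose()
    if (this.fallbackBuffer.length > 0) {
      this.emit("tts_use_local", this.fallbackBuffer.join(" "));
    } else {
      this.emit("tts_done", "");
    }
  }

  dispose(): void {
    this.disposed = true;
  }
}

const remote: string[] = [];
const events: string[] = [];
const router = new FallbackRouter(
  (s) => remote.push(s),
  (event, payload) => events.push(`${event}:${payload}`),
);
router.handleSentence("One.");
router.handleSentence("Two.");
router.handleRemoteError();
router.handleSentence("Three.");
router.finishTurn();
```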
…* events

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ctor store

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds VoiceVAD (pure logic with noise-floor calibration, silence/min/max
utterance guards), rmsDB() PCM helper, TranscriptBox, and startListening()
extension on VoiceSession. Also adds BoopTests unit-test target with 2/2
VoiceVADTests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
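
The level computation behind an rmsDB()-style helper can be sketched as below. The PR's version is Swift; this is a TypeScript transliteration of the standard RMS-to-decibel formula for consistency with the other sketches here, and `isSpeech` with its margin parameter is a hypothetical simplification of the noise-floor check, not the VoiceVAD logic.

```typescript
// RMS level of a PCM frame in dBFS, with samples normalized to [-1, 1].
function rmsDb(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  // Pure silence has no finite level in decibels.
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Hypothetical gate: treat a frame as speech when it is louder than the
// calibrated noise floor by at least marginDb.
function isSpeech(levelDb: number, noiseFloorDb: number, marginDb: number): boolean {
  return levelDb > noiseFloorDb + marginDb;
}

const frame = new Float32Array(480).fill(0.5);
console.log(rmsDb(frame).toFixed(2)); // about -6.02 dBFS for half-scale samples
```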
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VoiceModeStore: full implementation with VoiceSession + AudioQueue +
  SSE consumer loop; commitTurn posts voice turns with source/voiceTurnId;
  ttsChunk/ttsDone/ttsError/ttsUseLocal events dispatched correctly
- BoopClient: extend sendInbound with optional source/voiceTurnId params
  (existing call sites unchanged); add streamSSE(threadId:) helper
- VoiceModeSheet: state-driven orb using LottieOrbView per state;
  enter() called on permissions grant; exit() wrapped in Task
- Dock: lazy-init VoiceModeStore in sheet closure with BoopClient +
  conversationId + threadId; add AppSettings environment dependency
- ChatStore: expose currentBearer getter (needed by Dock to build
  BoopClient without duplicating settings reference)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rdown guards

- Dock: hoist VoiceModeStore to @State var voiceStore (created lazily via
  .onChange when showVoiceMode flips true, released on dismiss) so the store
  is never reconstructed on upstream re-renders, preventing audio/sseTask leaks
- VoiceModeStore.enter(): guard state == .permissionPending for idempotency
- VoiceModeStore.enter() catch: call await session.deactivate() to release
  AVAudioSession on partial-init failure
- AudioQueue.enqueue(): guard isStarted to drop late tts_chunk arrivals
  during the teardown window

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VoiceModeSheet: swap Lottie orb for SF Symbol when Reduce Motion is on
- Fix mute button icon (mic.fill/mic.slash.fill — was speaker.slash.fill)
- Add accessibilityLabel to mute, exit, and keyboard buttons
- Ensure store.exit() fires before dismiss() in exit + keyboard actions
- Add BoopUITests target + VoiceModeUITests smoke test (skips without pairing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lakunle changed the title from "iOS: dock redesign + chat UX polish" to "iOS: dock redesign + voice mode (M1 + M2)" on May 23, 2026
