iOS: dock redesign + voice mode (M1 + M2) #61
Open
lakunle wants to merge 201 commits into
Spec for giving Boop the ability to design and generate polished PDFs (six doc types: brief, invoice, itinerary, resume, newsletter, reference), with boop-design as the universal design quality gate, plus the bite-sized implementation plan derived from it. Also gitignore .superpowers/ (brainstorm session content).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dispatcher was routing document-generation requests to Canva (when connected) instead of using the new pdf-* skills. The new prompt section explicitly tells the dispatcher about Boop's built-in PDF skills, when to spawn with empty integrations vs data-source integrations, and steers it away from Canva for documents. Discovered during E2E verification.
- server/pdf-tools.ts: gate getBrowser() restart with a `restarting` flag.
  Two concurrent renders at exactly the 100-render boundary could each
  launch a fresh Chromium and overwrite each other's reference, leaking
  the orphaned process.
- convex/pdfArtifacts.ts: throw if storage.getUrl returns null after store.
  Silently caching `""` would have produced text-only iMessage replies with
  no warning when storage broke.
- .claude/skills/pdf-{invoice,resume}/SKILL.md: replace "Studio Hera" /
  "Hera Studio" placeholder names with "Meridian Studio". This is a public
  repo and the original placeholder maps to the maintainer's actual brand
  (hera.ng); replacing with a fully fictional name keeps templates generic.
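The `getBrowser()` race described in the first bullet can be illustrated with a single-flight guard. This is a minimal sketch, not the code in server/pdf-tools.ts: `launchBrowser`, the `Browser` shape, and `MAX_RENDERS` are hypothetical stand-ins, and the guard here is a shared promise rather than the boolean `restarting` flag the commit mentions.

```typescript
// Hypothetical stand-ins: the real module presumably launches Chromium.
type Browser = { id: number; close(): Promise<void> };

let launches = 0;
async function launchBrowser(): Promise<Browser> {
  launches += 1;
  return { id: launches, close: async () => {} };
}

const MAX_RENDERS = 100; // restart threshold taken from the commit message
let browser: Browser | null = null;
let renders = 0;
let restarting: Promise<Browser> | null = null;

function getBrowser(): Promise<Browser> {
  // A restart is already in flight: join it instead of launching again.
  if (restarting !== null) return restarting;

  if (browser !== null && renders < MAX_RENDERS) {
    renders += 1;
    return Promise.resolve(browser);
  }

  const old = browser;
  restarting = (async () => {
    try {
      const fresh = await launchBrowser();
      browser = fresh;
      renders = 1;
      // Retire the old process only after the reference has been swapped.
      if (old !== null) await old.close();
      return fresh;
    } finally {
      // Reopen the gate whether or not the launch succeeded.
      restarting = null;
    }
  })();
  return restarting;
}
```

Two callers that arrive during the same restart receive the same promise, so only one process is launched and no reference is overwritten.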
Tightened the `description:` frontmatter on each Convex skill so the SDK's trigger-rich routing fits more skills in the available context budget. skills-lock.json picks up the corresponding skillPath field and refreshed computedHash values from `npx convex ai-files install`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Claude Code harness writes .claude/scheduled_tasks.lock with the current sessionId/pid. Scoped to top-level *.lock so tracked .claude/skills/ files keep working.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Append --dns-result-order=ipv4first to NODE_OPTIONS on every spawn so undici/fetch inside children prefers IPv4. Networks with broken IPv6 to Cloudflare-fronted endpoints (Convex deployments, npm registry) hang intermittently under happy-eyeballs dual-stack, surfacing as tick/sweep timeouts. Local-network workaround; harmless on healthy IPv6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
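A minimal sketch of the NODE_OPTIONS handling described above. The helper name `withIpv4First` and the `Env` type are hypothetical, and skipping the append when the flag is already present is an assumption of this sketch, not something the commit states.

```typescript
const IPV4_FIRST = "--dns-result-order=ipv4first";

type Env = Record<string, string | undefined>;

// Returns a copy of `env` whose NODE_OPTIONS ends with the IPv4-first flag.
// Existing options are preserved; the flag is not added twice (assumption).
function withIpv4First(env: Env): Env {
  const current = (env["NODE_OPTIONS"] ?? "").trim();
  const alreadySet = current.split(/\s+/).indexOf(IPV4_FIRST) !== -1;
  if (alreadySet) return { ...env };
  const next = current === "" ? IPV4_FIRST : `${current} ${IPV4_FIRST}`;
  return { ...env, NODE_OPTIONS: next };
}
```

The returned object would then be passed as the `env` option when spawning a child process, so the parent's own environment is left untouched.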
Composio doesn't host a managed OAuth app for Confluence, so it requires a manually-registered auth config in the Composio Dashboard. authMode "byo" routes the connect flow through the existing BYO auth-config lookup path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swap the boilerplate example data in pdf-resume and its companion plan doc to a clearly fictional persona so the templates are unambiguously generic. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec covers:
- Channel abstraction (server/channels/) with sendblue + telegram impls
- Active-channel setting (user controls where unsolicited messages go)
- Unioned recent-history across channels
- Telegram inbound voice via OpenAI gpt-4o-mini-transcribe
- Hybrid env+Convex allowlist with interactive approval CLI

Plan decomposes into 34 tasks across 6 phases. Each task ends in a runnable system; no flag-day refactors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for the channel-abstraction layer. Adds:
- types.ts: Channel interface, ChannelId, ConversationId template type, SendOpts, ParsedInbound
- text.ts: stripMarkdown, chunk, formatDuration (pure utilities, shared by all channels)

No runtime wiring yet; these are scaffolding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin wrapper around the existing server/sendblue.ts exports. No behavior change: the adapter just rehomes the existing functions behind the Channel interface so the registry can treat all channels uniformly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt/undefined' case
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reads `source` and `voiceTurnId` from the POST /channels/ios/inbound body with strict validation (source must be the literal "voice"; voiceTurnId must be a non-empty string) and passes them into runTurn so the interaction agent can enter voice-prompt mode.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
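The two validation rules above could look roughly like this. It is a sketch under stated assumptions: `parseVoiceFields` and `VoiceFields` are hypothetical names, and returning `undefined` for an invalid body is a placeholder, since the commit does not say whether the route rejects the request or falls back to a text turn.

```typescript
type VoiceFields = { source: "voice"; voiceTurnId: string };

// Returns the voice fields when both pass validation, otherwise undefined.
function parseVoiceFields(body: unknown): VoiceFields | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const { source, voiceTurnId } = body as Record<string, unknown>;
  // `source` must be exactly the literal "voice" (case-sensitive).
  if (source !== "voice") return undefined;
  // `voiceTurnId` must be a non-empty string.
  if (typeof voiceTurnId !== "string" || voiceTurnId.length === 0) return undefined;
  return { source: "voice", voiceTurnId };
}
```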
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a `disposed` flag so that push() and flush() become silent no-ops after dispose() is called, preventing onSentence from firing through a closed WebSocket (T7 TtsSidecar race condition). dispose() is also now idempotent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
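A minimal sketch of the `disposed` guard described above. Only `push()`, `flush()`, `dispose()`, and `onSentence` come from the commit message; the class name and the naive punctuation-based sentence splitting are assumptions made for illustration.

```typescript
class SentenceBuffer {
  private pending = "";
  private disposed = false;
  private readonly onSentence: (sentence: string) => void;

  constructor(onSentence: (sentence: string) => void) {
    this.onSentence = onSentence;
  }

  push(text: string): void {
    if (this.disposed) return; // silent no-op after dispose()
    this.pending += text;
    // Emit every complete sentence: terminal punctuation followed by whitespace.
    for (;;) {
      const idx = this.pending.search(/[.!?]\s/);
      if (idx === -1) break;
      this.onSentence(this.pending.slice(0, idx + 1));
      this.pending = this.pending.slice(idx + 1).replace(/^\s+/, "");
    }
  }

  flush(): void {
    if (this.disposed) return; // silent no-op after dispose()
    const rest = this.pending.trim();
    this.pending = "";
    if (rest !== "") this.onSentence(rest);
  }

  dispose(): void {
    if (this.disposed) return; // idempotent
    this.disposed = true;
    this.pending = "";
  }
}
```

After `dispose()`, text that arrives late from the model stream is dropped instead of being forwarded to a socket that may already be closed.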
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use word-boundary lookarounds `(?<!\w)` and `(?!\w)` on the underscore-italic pattern so snake_case identifiers are not corrupted by the AVSpeech fallback. Two new tests cover the fix and the original italic-stripping behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
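The effect of the two lookarounds can be shown with a small sketch. The function name `stripUnderscoreItalic` and the inner `[^_\n]+` group are assumptions; only the `(?<!\w)` and `(?!\w)` guards come from the commit message.

```typescript
// Same as the literal /(?<!\w)_([^_\n]+)_(?!\w)/g, built from a string here.
// The opening underscore must not follow a word character, and the closing
// underscore must not precede one, so underscores inside identifiers are skipped.
const UNDERSCORE_ITALIC = new RegExp("(?<!\\w)_([^_\\n]+)_(?!\\w)", "g");

function stripUnderscoreItalic(text: string): string {
  return text.replace(UNDERSCORE_ITALIC, "$1");
}
```

With these guards, `_really_` surrounded by spaces loses its underscores, while an identifier such as `get_user_id` is left intact.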
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- send() now checks `ended` before readyState so text frames cannot slip in after the EOS frame has been sent.
- end() clears the 2-second fallback setTimeout on the happy path, preventing the event loop from being held open unnecessarily.
- Adds test: "send() after end() is silently ignored".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
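The ordering of the two guards can be sketched as follows. This is an illustration only: `TtsStream`, `SocketLike`, `handleClose()`, and the JSON frame shapes are assumptions, and the timer is cleared in a close handler here, whereas the commit clears it inside end() on the happy path.

```typescript
// Minimal socket shape for the sketch; a real WebSocket exposes the same members.
interface SocketLike {
  readyState: number;
  send(data: string): void;
  close(): void;
}

const OPEN = 1; // numeric value of WebSocket.OPEN
const EOS_FRAME = JSON.stringify({ text: "" }); // assumed end-of-stream frame

class TtsStream {
  private ended = false;
  private fallbackTimer: ReturnType<typeof setTimeout> | undefined;
  private readonly socket: SocketLike;

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  send(text: string): void {
    if (this.ended) return; // checked first: nothing may follow the EOS frame
    if (this.socket.readyState !== OPEN) return;
    this.socket.send(JSON.stringify({ text }));
  }

  end(): void {
    if (this.ended) return;
    this.ended = true;
    if (this.socket.readyState === OPEN) this.socket.send(EOS_FRAME);
    // Force-close if the server never closes the stream on its own.
    this.fallbackTimer = setTimeout(() => this.socket.close(), 2000);
  }

  // Call when the socket closes normally, so the pending timer
  // does not keep the event loop alive.
  handleClose(): void {
    if (this.fallbackTimer !== undefined) clearTimeout(this.fallbackTimer);
    this.fallbackTimer = undefined;
  }
}
```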
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…isposed guards + test glob
- Fix 1 (critical): stale-closure bug. sentencesInFlight[] tracks every sentence sent to ElevenLabs; on onError all of them are flushed into fallbackBuffer, not just the first closure-captured sentence.
- Fix 2 (important): midTurnFailed sticky flag prevents re-opening an ElevenLabs stream after an error; future sentences route directly to fallbackBuffer instead of retrying.
- Fix 3 (important): every emit helper (emitChunk, emitDone, emitLocalFallback, emitError) now guards with `if (disposed) return` to prevent ghost events after dispose(). In particular, the elevenlabs.end().then(emitDone) continuation can no longer fire tts_done post-disposal.
- Fix 4 (minor): package.json test glob updated from tests/*.test.ts to 'tests/**/*.test.ts' so tests/voice/ suites are picked up by npm test.
- Test: adds dispose-race test confirming tts_done is suppressed after dispose.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
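Fixes 1 and 2 above amount to a small routing rule, sketched here. The names `sentencesInFlight`, `fallbackBuffer`, and `midTurnFailed` come from the commit message; the `FallbackRouter` class, its methods, and the `sendToPrimary` callback are hypothetical.

```typescript
class FallbackRouter {
  readonly fallbackBuffer: string[] = [];
  private sentencesInFlight: string[] = [];
  private midTurnFailed = false;
  private readonly sendToPrimary: (sentence: string) => void;

  constructor(sendToPrimary: (sentence: string) => void) {
    this.sendToPrimary = sendToPrimary;
  }

  route(sentence: string): void {
    // Sticky flag: after a failure, never retry the primary stream this turn.
    if (this.midTurnFailed) {
      this.fallbackBuffer.push(sentence);
      return;
    }
    this.sentencesInFlight.push(sentence);
    this.sendToPrimary(sentence);
  }

  onError(): void {
    this.midTurnFailed = true;
    // Flush every sentence already sent, not only the one a closure captured.
    this.fallbackBuffer.push(...this.sentencesInFlight);
    this.sentencesInFlight = [];
  }
}
```

Keeping the in-flight list on the instance rather than in a per-sentence closure is what lets the error handler see every sentence sent so far.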
…* events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ctor store
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds VoiceVAD (pure logic with noise-floor calibration, silence/min/max utterance guards), rmsDB() PCM helper, TranscriptBox, and startListening() extension on VoiceSession. Also adds BoopTests unit-test target with 2/2 VoiceVADTests passing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
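For readers unfamiliar with the RMS level a VAD compares against its noise floor, here is the standard calculation, 20·log10(RMS), as a TypeScript sketch. The actual `rmsDB()` helper is Swift and its input format is not shown in this PR; the Float32 samples in [-1, 1] and the -160 dB silence floor are assumptions.

```typescript
const SILENCE_FLOOR_DB = -160; // assumed floor so silent buffers do not yield -Infinity

// Root-mean-square level of a PCM buffer in dB relative to full scale.
function rmsDb(samples: Float32Array): number {
  if (samples.length === 0) return SILENCE_FLOOR_DB;
  let sumSquares = 0;
  samples.forEach((s) => {
    sumSquares += s * s;
  });
  const rms = Math.sqrt(sumSquares / samples.length);
  if (rms === 0) return SILENCE_FLOOR_DB;
  return Math.max(20 * Math.log10(rms), SILENCE_FLOOR_DB);
}
```

A full-scale square wave measures 0 dB, and halving the amplitude lowers the level by about 6 dB.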
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VoiceModeStore: full implementation with VoiceSession + AudioQueue + SSE consumer loop; commitTurn posts voice turns with source/voiceTurnId; ttsChunk/ttsDone/ttsError/ttsUseLocal events dispatched correctly
- BoopClient: extend sendInbound with optional source/voiceTurnId params (existing call sites unchanged); add streamSSE(threadId:) helper
- VoiceModeSheet: state-driven orb using LottieOrbView per state; enter() called on permissions grant; exit() wrapped in Task
- Dock: lazy-init VoiceModeStore in sheet closure with BoopClient + conversationId + threadId; add AppSettings environment dependency
- ChatStore: expose currentBearer getter (needed by Dock to build BoopClient without duplicating settings reference)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rdown guards
- Dock: hoist VoiceModeStore to @State var voiceStore (created lazily via .onChange when showVoiceMode flips true, released on dismiss) so the store is never reconstructed on upstream re-renders, preventing audio/sseTask leaks
- VoiceModeStore.enter(): guard state == .permissionPending for idempotency
- VoiceModeStore.enter() catch: call await session.deactivate() to release AVAudioSession on partial-init failure
- AudioQueue.enqueue(): guard isStarted to drop late tts_chunk arrivals during the teardown window
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VoiceModeSheet: swap Lottie orb for SF Symbol when Reduce Motion is on
- Fix mute button icon (mic.fill/mic.slash.fill; was speaker.slash.fill)
- Add accessibilityLabel to mute, exit, and keyboard buttons
- Ensure store.exit() fires before dismiss() in exit + keyboard actions
- Add BoopUITests target + VoiceModeUITests smoke test (skips without pairing)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Combined PR covering two iOS milestones on `feat/ios-channel`:

M1: Dock redesign + chat UX polish
Composer pill + welded sliding active tab, bare inactive thread icons, attach picker UI with chip staging, reduce-motion handling, dock hosted via `safeAreaInset`. 15 commits ending at `1c74f55`.

M2: Voice mode (new)
Dedicated full-screen voice mode sheet: continuous-loop hands-free conversation with on-device SFSpeechRecognizer + VAD, server-streamed TTS via ElevenLabs Flash v2.5, AVSpeechSynthesizer fallback when `ELEVENLABS_API_KEY` is unset, Lottie-driven orb (animations downloaded manually post-merge), permissions card, mute/exit/keyboard controls, Reduce Motion fallback. 31 commits from `0571ee2` to `acf43bb`.

Spec: `docs/superpowers/specs/2026-05-23-ios-voice-m2-design.html`
Plan: `docs/superpowers/plans/2026-05-23-ios-voice-m2-plan.html`

What changed (M2)
- `server/voice/` (sentence-buffer, markdown-strip, ElevenLabs WS adapter, TtsSidecar orchestrator). Extended `ParsedInbound` with `source: "voice"` + `voiceTurnId`. System-prompt addendum for voice turns. New SSE events: `tts_chunk` / `tts_done` / `tts_error` / `tts_use_local`. New env vars in `.env.example`.
- `Voice/` (VoiceSession actor, VoiceVAD, AudioQueue, VoicePermissions) + `Views/Voice/` (VoiceModeSheet, VoicePermissionsCard). Lottie SPM dep + `LottieOrbView`. Dock mic button now opens the sheet. SSE client extended for the four new events.
- `tests/voice/`: 27 unit tests passing (sentence-buffer, markdown-strip, tts-elevenlabs, tts-sidecar, interaction-prompt, parsed-inbound). 2 harness-gated tests skip cleanly. iOS `BoopTests` adds VAD unit tests. iOS `BoopUITests` adds an open/dismiss UI smoke test.
- `ios/README.md` voice mode setup + smoke checklist. `Boop IOS Design.md` M2 status block.

Test plan
- … `ELEVENLABS_API_KEY` set): tap mic → speak "hello" → orb transitions listening → thinking → speaking → listening. First audio ≤ 1.5s after VAD commit.
- … `AVSpeechSynthesizer`.
- … `listening.json` / `thinking.json` / `speaking.json` into `ios/Boop/Resources/lottie/` from the candidates linked in the spec. Without them, the static SF Symbol orb still renders correctly.

Known follow-ups (deferred, not blocking)
- `toggleMute()` flips state but doesn't actually stop the input tap; needs a real `VoiceSession.pauseListening()`.
- `tests/ios-thread-routes.test.ts` + `tests/threads.test.ts` fail in CI without `CONVEX_URL`; needs env-aware skip gating (separate task).
- `AudioQueue` not unit-testable without a physical device.

🤖 Generated with Claude Code