feat(web): add push-to-talk, VAD continuous listening, and voice settings by P2Chill · Pull Request #2 · P2Chill/moltis

P2Chill · 2026-03-02T18:12:30Z

Summary

Adds two new voice input modes alongside the existing toggle:

Push-to-Talk (PTT): Configurable hotkey (default F13), hold to record, release to send. BroadcastChannel tab coordination prevents dual-tab recording.
Voice Activity Detection (VAD): Energy-based continuous listening with conversation mode button. Auto-sends after silence, auto-resumes after TTS playback. Configurable sensitivity slider in settings.
Voice Settings UI: PTT key picker and VAD sensitivity slider in the Voice settings tab.
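
To make the energy-based detection concrete, here is a minimal sketch of how an RMS energy measurement per audio frame could drive a small listening state machine. It is not the code from voice-input.js: the names `rms` and `createVad`, the default threshold, and the return values are assumptions for illustration. The 2.5 s silence window and 30 s cap are the values stated in this PR's commit message.

```javascript
// Hypothetical sketch of energy-based VAD; not the code from voice-input.js.

// Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// State machine: idle -> speaking -> "send" after silence or at the max length.
// The threshold default is an assumed value, not taken from the PR.
function createVad({ threshold = 0.02, silenceMs = 2500, maxMs = 30000 } = {}) {
  let speaking = false;
  let speechStart = 0;
  let lastVoice = 0;
  return {
    // Feed one frame plus its timestamp in ms; returns "start", "send", or null.
    push(samples, nowMs) {
      const loud = rms(samples) >= threshold;
      if (!speaking) {
        if (!loud) return null;
        speaking = true;
        speechStart = nowMs;
        lastVoice = nowMs;
        return "start";
      }
      if (loud) lastVoice = nowMs;
      const silentTooLong = nowMs - lastVoice >= silenceMs;
      const recordingTooLong = nowMs - speechStart >= maxMs;
      if (silentTooLong || recordingTooLong) {
        speaking = false;
        return "send";
      }
      return null;
    },
  };
}
```

In a browser, the frames would typically come from an `AnalyserNode` or an audio worklet; raising the threshold makes detection less sensitive to background noise.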

Changed files

voice-input.js — Complete rewrite with PTT, VAD, tab coordination, health monitoring
page-chat.js — VAD waveform button next to mic button
page-settings.js — PTT key picker + VAD sensitivity slider
components.css — VAD button CSS states (listening glow, speech pulse)
input.css — Waveform icon SVG
locales/en,fr,zh/chat.js — i18n keys

Test plan

Toggle mode still works (click mic, click again to send)
PTT: hold configured key, speak, release - transcribes and sends
PTT key rebind in Settings > Voice works
VAD: click waveform button → listening state → speak → silence → auto-sends
VAD sensitivity slider adjusts detection threshold
VAD mutes during TTS playback, resumes after
VAD survives 10+ back-and-forth turns without degradation
No duplicate recorder starts in console logs
Tab coordination: only one tab records at a time
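
For reviewers unfamiliar with BroadcastChannel, the tab coordination item can be pictured as a simple claim/release protocol. The sketch below is an illustration under assumptions, not the code in this PR: the message shapes, the `createRecordingLock` name, and the channel name in the usage comment are all hypothetical. The channel is injected so the logic can be exercised without a browser.

```javascript
// Hypothetical tab-coordination sketch; message shapes and names are assumptions.
// `channel` is anything with postMessage() and an onmessage property,
// such as a real BroadcastChannel in the browser.
function createRecordingLock(channel, tabId) {
  let owner = null; // id of the tab that is recording, as far as this tab knows

  channel.onmessage = (event) => {
    const msg = event.data;
    if (msg.type === "claim") owner = msg.tabId;
    if (msg.type === "release" && owner === msg.tabId) owner = null;
  };

  return {
    // Returns true if this tab may start recording.
    tryAcquire() {
      if (owner !== null && owner !== tabId) return false;
      owner = tabId;
      channel.postMessage({ type: "claim", tabId });
      return true;
    },
    // Announces that this tab has stopped recording.
    release() {
      if (owner !== tabId) return;
      owner = null;
      channel.postMessage({ type: "release", tabId });
    },
  };
}

// Browser usage (assumed channel name):
//   const lock = createRecordingLock(
//     new BroadcastChannel("voice-recording"), crypto.randomUUID());
//   if (lock.tryAcquire()) { /* start the recorder */ }
```

This sketch does not resolve two tabs claiming at the same instant; a real implementation would need a tie-breaker for that case.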

… and snapcraft

style.css was gitignored in 25f327b, but the release workflow, Dockerfile, and snapcraft.yaml were not updated to generate it before cargo build. This caused all three CI gate jobs (clippy, test, e2e) and all downstream build jobs to fail with:

error: couldn't read `crates/web/src/assets/style.css`

Add a "Build Tailwind CSS" step to every job in release.yml that compiles Rust, using platform-appropriate standalone binaries (linux-x64/arm64, macos-arm64/x64, windows-x64). Also add the step to the Dockerfile builder stage and the snapcraft override-build.
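
The platform-to-binary mapping behind that step can be sketched as a small lookup. The real change lives in workflow YAML, the Dockerfile, and snapcraft.yaml; this JavaScript version only illustrates the selection logic. The function name is hypothetical, and the asset names are assumed to follow the `tailwindcss-<os>-<arch>` pattern of Tailwind's standalone CLI downloads, which should be checked against the release being pinned.

```javascript
// Hypothetical helper: pick the Tailwind standalone CLI asset for a build host.
// Asset naming is an assumption; verify it against the pinned Tailwind release.
const TAILWIND_OS = new Map([
  ["linux", "linux"],
  ["darwin", "macos"],
  ["win32", "windows"],
]);

// Architectures listed in the commit message for each OS.
const TAILWIND_ARCHES = new Map([
  ["linux", ["x64", "arm64"]],
  ["macos", ["x64", "arm64"]],
  ["windows", ["x64"]],
]);

// `platform` and `arch` use Node's process.platform / process.arch values.
function tailwindAssetName(platform, arch) {
  const os = TAILWIND_OS.get(platform);
  if (!os || !TAILWIND_ARCHES.get(os).includes(arch)) {
    throw new Error(`unsupported build target: ${platform}/${arch}`);
  }
  const suffix = os === "windows" ? ".exe" : "";
  return `tailwindcss-${os}-${arch}${suffix}`;
}
```

Calling `tailwindAssetName(process.platform, process.arch)` in a build script would select the asset for the current machine.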

…t-abort

The "aborted broadcast cleans up UI state" test was flaky because it injected a fake #thinkingIndicator into #messages before the sessions.switch RPC response arrived. When renderHistory() then cleared chatMsgBox.textContent, the injected element disappeared.

Wait for state.sessionSwitchInProgress and state.chatBatchLoading to be false in beforeEach, ensuring history rendering is complete before any test injects DOM elements.
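
The wait described above amounts to polling two flags until both are false. A minimal sketch follows, assuming a plain `state` object; the helper names are hypothetical and the actual e2e suite may rely on its test framework's own waiting primitive instead.

```javascript
// Hypothetical polling helper; resolves once predicate() is true,
// rejects if that does not happen within timeoutMs.
async function waitUntil(predicate, { timeoutMs = 5000, intervalMs = 25 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() >= deadline) throw new Error("waitUntil: timed out");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// What a beforeEach hook could await before a test injects DOM elements.
// `state` stands in for the app state holding the two flags named above.
async function waitForHistoryRendered(state, options) {
  await waitUntil(
    () => !state.sessionSwitchInProgress && !state.chatBatchLoading,
    options,
  );
}
```

Waiting on both flags matters because the session switch and the batch load finish at different times, and either one can still trigger a re-render.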

…tHub API

Replace direct GitHub releases API polling with a website-hosted releases.json manifest. This decouples update announcements from GitHub releases, so broken or draft releases are never surfaced.

The manifest supports stable and unstable channels: pre-release builds check unstable, stable builds check stable. All fetch errors (404, parse failure, network) are silently ignored.

Config field renamed: server.update_repository_url → server.update_releases_url
Default URL: https://www.moltis.org/releases.json
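
The channel selection and silent error handling can be sketched as follows. This is an illustration in JavaScript, not the server-side implementation. The manifest shape (`{ "stable": { "version": ... }, "unstable": { "version": ... } }`) and the rule that a version containing `-` counts as a pre-release are assumptions; the commit message does not specify either.

```javascript
// Hypothetical sketch of channel selection against a releases.json manifest.
// The manifest shape and the pre-release test are assumptions.

// Pre-release builds check "unstable", stable builds check "stable".
function channelFor(currentVersion) {
  return currentVersion.includes("-") ? "unstable" : "stable";
}

// Returns the manifest entry for the build's channel, or null on any problem.
function parseManifest(text, currentVersion) {
  try {
    const manifest = JSON.parse(text);
    const entry = manifest[channelFor(currentVersion)];
    return entry && typeof entry.version === "string" ? entry : null;
  } catch {
    return null; // parse failure is ignored, as the commit message describes
  }
}

// Fetches the manifest; 404s, network errors, and bad JSON all yield null.
async function checkForUpdate(url, currentVersion, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return null;
    return parseManifest(await res.text(), currentVersion);
  } catch {
    return null;
  }
}
```

Returning null for every failure mode keeps a missing or malformed manifest from ever surfacing as an update prompt.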

…tis-org#299)

* fix(config): accept provider url alias for base_url
* fix(web): make build-css install local tailwind deps

…oltis-org#301)

* fix(providers): use Ollama capabilities field for tool support detection

The hardcoded OLLAMA_NATIVE_TOOL_FAMILIES whitelist caused ~50% tool call failures for models not in the list (e.g. MiniMax M2.5) by forcing them into text mode, even though Ollama's /api/show reports native tool support via the capabilities field.

- Read capabilities from OllamaShowResponse, fall back to family whitelist only for pre-0.5.x Ollama without capabilities
- Sanitize tool names (trim whitespace, strip quotes) before registry lookup to handle models that wrap names in quotes
- Add <invoke> XML format parser for models that emit tool calls as <invoke name="tool"><arg name="key">value</arg></invoke>
- Add "invoke" and "tool_calls" to response sanitizer INTERNAL_TAGS

Closes moltis-org#281

* fix(providers): collapse nested if per clippy collapsible_if lint

* test(agents,providers): add regression and edge-case tests for tool calling

Backward compatibility:
- Verify fenced, XML function, and bare JSON formats still parse identically after adding the invoke parser
- Verify sanitize_tool_name is a no-op on all real production tool names
- Verify OllamaShowResponse deserializes from both old (no capabilities) and new (with capabilities) Ollama JSON

Edge cases:
- invoke: no args, unclosed tag, missing name, empty name, JSON arg values, multiline values, mixed with fenced blocks, multiple blocks
- sanitize_tool_name: empty, only-quotes, internal quotes, single quotes
- response sanitizer: invoke/tool_calls tags stripped, prose "invoke" preserved, tool_call recovery unaffected by new INTERNAL_TAGS
- resolve_ollama_tool_mode: no probe result, explicit overrides capabilities, single "tools" capability, deserialization roundtrip

* fix(web): install node_modules before resolving tailwindcss binary

The npm install was only running in the fallback npx branch, so when a global tailwindcss CLI was found the local tailwindcss package was never installed. The CLI needs the local package to resolve CSS imports like `@import "tailwindcss"`, causing failures in fresh worktrees/clones. Move the node_modules check before binary resolution so it runs unconditionally.
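
To illustrate the `<invoke>` format and the tool-name sanitizing mentioned in that commit, here is a rough JavaScript sketch. The upstream change is in the Rust crates and is not reproduced here; the function names, the regex-based approach, and the handling of edge cases (empty names skipped, argument values kept as raw strings, only wrapping quotes stripped) are assumptions chosen for illustration.

```javascript
// Hypothetical sketch; the real parser lives in the Rust crates and may differ.

// Trim whitespace and strip quotes that wrap the name (single or double).
// Quotes inside the name are left alone, which is an assumption.
function sanitizeToolName(name) {
  return name.trim().replace(/^["']+|["']+$/g, "").trim();
}

// Extract tool calls written as
//   <invoke name="tool"><arg name="key">value</arg></invoke>
// Unclosed blocks and blocks with an empty name are skipped.
function parseInvokeBlocks(text) {
  const calls = [];
  const blockRe = /<invoke\s+name="([^"]*)"\s*>([\s\S]*?)<\/invoke>/g;
  for (const block of text.matchAll(blockRe)) {
    const name = sanitizeToolName(block[1]);
    if (name === "") continue;
    const args = {};
    const argRe = /<arg\s+name="([^"]*)"\s*>([\s\S]*?)<\/arg>/g;
    for (const arg of block[2].matchAll(argRe)) {
      args[arg[1]] = arg[2]; // values stay raw strings, including multiline ones
    }
    calls.push({ name, args });
  }
  return calls;
}
```

Because the pattern requires a `name` attribute and a closing tag, the word "invoke" in ordinary prose produces no calls.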

…ings

Add two new voice input modes alongside the existing toggle:

Push-to-Talk (PTT):
- Configurable hotkey (default F13, stored in localStorage)
- Hold to record, release to send
- Function keys work even when focused in text inputs
- BroadcastChannel tab coordination prevents dual-tab recording

Voice Activity Detection (VAD):
- Energy-based continuous listening with conversation mode button
- Exponential sensitivity curve (0-100%) configurable in settings
- Auto-sends after 2.5s silence, 30s max recording safety valve
- Mutes during TTS playback, auto-resumes after with echo settle delay
- AudioContext health monitoring with auto-resume on browser suspension
- MediaStream track health check with automatic reacquisition
- Race condition guards (vadTranscribing flag) prevent recorder restart storms during async transcription fetches
- EBML header validation catches corrupt WebM blobs before API submission
- 15s fetch timeout prevents stuck transcription state

Voice Settings UI:
- PTT key picker (click to listen, press any key to rebind)
- VAD sensitivity slider with real-time threshold preview
- Waveform icon button with CSS states (listening glow, speech pulse)

Also adds i18n keys for en/fr/zh locales.
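
Three of the safeguards listed in this commit message lend themselves to short sketches: the exponential sensitivity curve, the EBML header check, and the fetch timeout. None of this is taken from voice-input.js. The function names and the threshold range (0.005 to 0.1) are assumptions; the EBML magic bytes `1A 45 DF A3` are the standard start of a WebM/Matroska file, and the 15 s default matches the commit message.

```javascript
// Hypothetical sketches; names and the threshold range are assumptions.

const MIN_THRESHOLD = 0.005; // assumed threshold at 100% sensitivity
const MAX_THRESHOLD = 0.1;   // assumed threshold at 0% sensitivity

// Exponential curve: each slider step scales the threshold by a fixed ratio.
function thresholdForSensitivity(percent) {
  const p = Math.min(100, Math.max(0, percent)) / 100;
  return MAX_THRESHOLD * Math.pow(MIN_THRESHOLD / MAX_THRESHOLD, p);
}

// WebM files start with the EBML magic number 0x1A 0x45 0xDF 0xA3.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// `bytes` is a Uint8Array holding at least the first four bytes of the blob.
function hasEbmlHeader(bytes) {
  if (bytes.length < EBML_MAGIC.length) return false;
  return EBML_MAGIC.every((value, i) => bytes[i] === value);
}

// Aborts the request if it has not settled within timeoutMs.
async function fetchWithTimeout(url, options = {}, timeoutMs = 15000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clear, so no stray timer outlives the call
  }
}
```

In a browser the header bytes could be read with `new Uint8Array(await blob.slice(0, 4).arrayBuffer())` before the blob is submitted for transcription.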

Sets commit statuses (local/lint, local/test, etc.) that the upstream local-validation jobs poll for. Required because upstream ci.yml skips actual checks on pull_request events from forks.

penso and others added 16 commits March 2, 2026 08:57

chore(release): prepare v0.10.8

44992cb

fix(ci): harden tailwindcss cli downloads

8308a3f

chore(release): 0.10.9

dc1ba00

fix(swift-bridge): stabilize gateway migration and stream tests

d57b73d

chore(release): 0.10.10

e3739f9

chore(release): 0.10.11

edbb9f1

chore: update deploy templates to 0.10.10

89082a2

fix(config): support provider url alias for remote Ollama config (mol…

a03c79b

feat(ci): add release dry-run mode

9d60d70

fix(ci): make release dry-run job conditions valid

edae809

chore(ci): add release workflow dispatch recipes

89a72b2

chore: update deploy templates to 0.10.11

31c42fb

P2Chill force-pushed the feat/voice-modes branch 2 times, most recently from ac30fc0 to 24bb027 on March 3, 2026 00:23

P2Chill added 2 commits March 3, 2026 01:42

ci: add local-ci workflow for fork PR validation

f4e14b0

P2Chill force-pushed the feat/voice-modes branch from 57399d8 to f4e14b0 on March 3, 2026 00:42

P2Chill wants to merge 18 commits into main from feat/voice-modes
