feat(web): add push-to-talk, VAD continuous listening, and voice settings#2

Open
P2Chill wants to merge 18 commits into main from feat/voice-modes

P2Chill (Owner) commented Mar 2, 2026

Summary

Adds two new voice input modes alongside the existing toggle:

  • Push-to-Talk (PTT): Configurable hotkey (default F13), hold to record, release to send. BroadcastChannel tab coordination prevents dual-tab recording.
  • Voice Activity Detection (VAD): Energy-based continuous listening with conversation mode button. Auto-sends after silence, auto-resumes after TTS playback. Configurable sensitivity slider in settings.
  • Voice Settings UI: PTT key picker and VAD sensitivity slider in the Voice settings tab.
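
The hold-to-record behaviour described for PTT can be sketched roughly as follows. This is a minimal illustration, not the code in voice-input.js: `createPushToTalk`, the storage key `voice.pttKey`, and the callback names are all assumptions made for the example. Only the F13 default and the localStorage persistence come from this PR.

```javascript
// Illustrative sketch only: createPushToTalk, "voice.pttKey", and the
// callback names are hypothetical, not identifiers from voice-input.js.
const DEFAULT_PTT_KEY = "F13"; // default hotkey named in the PR summary

function createPushToTalk({ storage, onStart, onStop }) {
  let held = false;

  // Read the configured key on every event so a rebind applies immediately.
  const configuredKey = () => storage.getItem("voice.pttKey") || DEFAULT_PTT_KEY;

  return {
    keydown(event) {
      // A held key fires repeated keydown events; only the first one starts recording.
      if (event.key !== configuredKey() || event.repeat || held) return;
      held = true;
      onStart();
    },
    keyup(event) {
      if (event.key !== configuredKey() || !held) return;
      held = false;
      onStop(); // releasing the key ends the recording and sends it
    },
  };
}

// Browser wiring (not executed here; startRecording/stopAndSend are placeholders):
// const ptt = createPushToTalk({ storage: localStorage, onStart: startRecording, onStop: stopAndSend });
// window.addEventListener("keydown", (e) => ptt.keydown(e));
// window.addEventListener("keyup", (e) => ptt.keyup(e));
```

Passing `storage` in rather than reading `localStorage` directly keeps the handler testable outside a browser.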

Changed files

  • voice-input.js — Complete rewrite with PTT, VAD, tab coordination, health monitoring
  • page-chat.js — VAD waveform button next to mic button
  • page-settings.js — PTT key picker + VAD sensitivity slider
  • components.css — VAD button CSS states (listening glow, speech pulse)
  • input.css — Waveform icon SVG
  • locales/en,fr,zh/chat.js — i18n keys

Test plan

  • Toggle mode still works (click mic, click again to send)
  • PTT: hold the configured key, speak, release; the audio is transcribed and sent
  • PTT key rebind in Settings > Voice works
  • VAD: click waveform button -> listening state -> speak -> silence -> auto-sends
  • VAD sensitivity slider adjusts detection threshold
  • VAD mutes during TTS playback, resumes after
  • VAD survives 10+ back-and-forth turns without degradation
  • No duplicate recorder starts in console logs
  • Tab coordination: only one tab records at a time
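
The VAD flow exercised by the test plan (listening, speech, silence, auto-send, mute during TTS) can be pictured as a small state machine. The sketch below is an assumption-laden illustration: `createVad`, `rms`, and the option names are invented for the example, and the real code works on live audio rather than hand-fed frames. The 2.5 s silence window and 30 s cap are the values quoted in this PR's commit message.

```javascript
// Illustrative sketch: createVad, rms, and the option names are hypothetical.
// Root-mean-square energy of one frame of PCM samples (floats in [-1, 1]).
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return samples.length ? Math.sqrt(sum / samples.length) : 0;
}

function createVad({ threshold, silenceMs = 2500, maxMs = 30000, onSend }) {
  let state = "listening"; // "listening" | "speech" | "muted"
  let speechStart = 0;
  let lastVoice = 0;

  return {
    get state() { return state; },
    mute() { state = "muted"; },        // call when TTS playback starts
    unmute() { state = "listening"; },  // call when TTS playback ends
    // Feed one frame of samples together with its timestamp in milliseconds.
    process(samples, now) {
      if (state === "muted") return;
      const loud = rms(samples) >= threshold;
      if (state === "listening") {
        if (loud) { state = "speech"; speechStart = lastVoice = now; }
        return;
      }
      // state === "speech": track the last loud frame and check both limits.
      if (loud) lastVoice = now;
      const silentTooLong = now - lastVoice >= silenceMs;
      const recordingTooLong = now - speechStart >= maxMs;
      if (silentTooLong || recordingTooLong) {
        state = "listening";
        onSend({ startedAt: speechStart, endedAt: now });
      }
    },
  };
}
```

In this sketch, muting while speech is in progress simply drops the partial recording; whether the real implementation does the same is not stated in the PR.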

penso and others added 16 commits March 2, 2026 08:57
… and snapcraft

style.css was gitignored in 25f327b but the release workflow,
Dockerfile, and snapcraft.yaml were not updated to generate it
before cargo build. This caused all three CI gate jobs (clippy,
test, e2e) and all downstream build jobs to fail with:
  error: couldn't read `crates/web/src/assets/style.css`

Add "Build Tailwind CSS" step to every job in release.yml that
compiles Rust, using platform-appropriate standalone binaries
(linux-x64/arm64, macos-arm64/x64, windows-x64). Also add the
step to the Dockerfile builder stage and snapcraft override-build.
…t-abort

The "aborted broadcast cleans up UI state" test was flaky because it
injected a fake #thinkingIndicator into #messages before the
sessions.switch RPC response arrived. When renderHistory() then
cleared chatMsgBox.textContent, the injected element disappeared.

Wait for state.sessionSwitchInProgress and state.chatBatchLoading
to be false in beforeEach, ensuring history rendering is complete
before any test injects DOM elements.
…tHub API

Replace direct GitHub releases API polling with a website-hosted
releases.json manifest. This decouples update announcements from
GitHub releases, so broken or draft releases are never surfaced.

The manifest supports stable and unstable channels — pre-release
builds check unstable, stable builds check stable. All fetch errors
(404, parse failure, network) are silently ignored.

Config field renamed: server.update_repository_url → server.update_releases_url
Default URL: https://www.moltis.org/releases.json
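
As a rough illustration of the channel selection and silent-failure behaviour described above, here is a JavaScript sketch. The manifest shape (`{ "stable": { "version": "1.2.3" }, "unstable": { "version": "1.3.0-rc.1" } }`), the function names, and the rule that a `-` in the version marks a pre-release are all assumptions; the commit does not show the real schema or code.

```javascript
// Hypothetical sketch: the manifest shape and both function names are assumed.
// Pre-release builds check the unstable channel, stable builds check stable.
function channelFor(currentVersion) {
  // Assumes semver-style pre-release suffixes such as "1.2.3-rc.1".
  return currentVersion.includes("-") ? "unstable" : "stable";
}

// Returns the manifest entry for the matching channel, or null on any problem.
function latestRelease(manifestText, currentVersion) {
  try {
    const manifest = JSON.parse(manifestText);
    const entry = manifest[channelFor(currentVersion)];
    return entry && typeof entry.version === "string" ? entry : null;
  } catch {
    return null; // parse failures are swallowed, never surfaced to the user
  }
}
```

A caller would treat `null` as "no update to announce", which matches the stated policy of ignoring 404s, parse failures, and network errors.
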
…tis-org#299)

* fix(config): accept provider url alias for base_url

* fix(web): make build-css install local tailwind deps
…oltis-org#301)

* fix(providers): use Ollama capabilities field for tool support detection

The hardcoded OLLAMA_NATIVE_TOOL_FAMILIES whitelist caused ~50% tool
call failures for models not in the list (e.g. MiniMax M2.5) by forcing
them into text mode, even though Ollama's /api/show reports native tool
support via the capabilities field.

- Read capabilities from OllamaShowResponse, fall back to family
  whitelist only for pre-0.5.x Ollama without capabilities
- Sanitize tool names (trim whitespace, strip quotes) before registry
  lookup to handle models that wrap names in quotes
- Add <invoke> XML format parser for models that emit tool calls as
  <invoke name="tool"><arg name="key">value</arg></invoke>
- Add "invoke" and "tool_calls" to response sanitizer INTERNAL_TAGS
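
The first three points above can be sketched in JavaScript as follows. This is only an illustration of the described behaviour: the function names, argument shapes, and regular expressions are assumptions, and the project's actual code is not shown in this commit message.

```javascript
// Illustrative sketch; all three function names are hypothetical.

// Prefer the capabilities list when the server reports one; fall back to the
// family whitelist only when capabilities are absent (older Ollama).
function supportsNativeTools(capabilities, family, familyWhitelist) {
  if (Array.isArray(capabilities)) return capabilities.includes("tools");
  return familyWhitelist.includes(family);
}

// Trim whitespace and strip quotes that some models wrap around tool names.
function sanitizeToolName(name) {
  return name.trim().replace(/^["']+|["']+$/g, "").trim();
}

// Extract <invoke name="tool"><arg name="key">value</arg></invoke> blocks.
function parseInvokeCalls(text) {
  const calls = [];
  const invokeRe = /<invoke\s+name="([^"]*)"\s*>([\s\S]*?)<\/invoke>/g;
  for (const m of text.matchAll(invokeRe)) {
    const name = sanitizeToolName(m[1]);
    if (!name) continue; // skip blocks with an empty name
    const args = {};
    const argRe = /<arg\s+name="([^"]*)"\s*>([\s\S]*?)<\/arg>/g;
    for (const a of m[2].matchAll(argRe)) args[a[1]] = a[2];
    calls.push({ name, args });
  }
  return calls;
}
```

In this sketch, unclosed `<invoke>` tags simply produce no call, and argument values are kept as raw strings rather than being parsed as JSON.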

Closes moltis-org#281

* fix(providers): collapse nested if per clippy collapsible_if lint

* test(agents,providers): add regression and edge-case tests for tool calling

Backward compatibility:
- Verify fenced, XML function, and bare JSON formats still parse
  identically after adding the invoke parser
- Verify sanitize_tool_name is a no-op on all real production tool names
- Verify OllamaShowResponse deserializes from both old (no capabilities)
  and new (with capabilities) Ollama JSON

Edge cases:
- invoke: no args, unclosed tag, missing name, empty name, JSON arg
  values, multiline values, mixed with fenced blocks, multiple blocks
- sanitize_tool_name: empty, only-quotes, internal quotes, single quotes
- response sanitizer: invoke/tool_calls tags stripped, prose "invoke"
  preserved, tool_call recovery unaffected by new INTERNAL_TAGS
- resolve_ollama_tool_mode: no probe result, explicit overrides
  capabilities, single "tools" capability, deserialization roundtrip

* fix(web): install node_modules before resolving tailwindcss binary

The npm install was only running in the fallback npx branch, so when a
global tailwindcss CLI was found the local tailwindcss package was never
installed. The CLI needs the local package to resolve CSS imports like
`@import "tailwindcss"`, causing failures in fresh worktrees/clones.

Move the node_modules check before binary resolution so it runs
unconditionally.
P2Chill force-pushed the feat/voice-modes branch 2 times, most recently from ac30fc0 to 24bb027 on March 3, 2026 00:23
P2Chill added 2 commits March 3, 2026 01:42
…ings

Add two new voice input modes alongside the existing toggle:

Push-to-Talk (PTT):
- Configurable hotkey (default F13, stored in localStorage)
- Hold to record, release to send
- Function keys work even when focused in text inputs
- BroadcastChannel tab coordination prevents dual-tab recording
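
One way to picture the tab coordination is a small lock that each tab keeps in sync over a BroadcastChannel. The sketch below is a simplified assumption, not the protocol used in voice-input.js: `createTabCoordinator`, the message types, and the channel name are invented here, and it does not resolve the race where two tabs start at the same instant.

```javascript
// Illustrative sketch: createTabCoordinator and the message types are hypothetical.
// `post` sends a message to the other tabs (e.g. channel.postMessage).
function createTabCoordinator(tabId, post) {
  let activeTab = null; // id of the tab currently recording, if any

  return {
    // Try to start recording; returns false if another tab holds the microphone.
    tryAcquire() {
      if (activeTab !== null && activeTab !== tabId) return false;
      activeTab = tabId;
      post({ type: "recording-started", tabId });
      return true;
    },
    release() {
      if (activeTab !== tabId) return;
      activeTab = null;
      post({ type: "recording-stopped", tabId });
    },
    // Call with event.data from the channel's message handler.
    onMessage(msg) {
      if (msg.type === "recording-started") activeTab = msg.tabId;
      else if (msg.type === "recording-stopped" && activeTab === msg.tabId) activeTab = null;
    },
    isLocked() { return activeTab !== null && activeTab !== tabId; },
  };
}

// Browser wiring (not executed here; the channel name is an assumption):
// const channel = new BroadcastChannel("voice-input");
// const coord = createTabCoordinator(crypto.randomUUID(), (m) => channel.postMessage(m));
// channel.onmessage = (e) => coord.onMessage(e.data);
```

A PTT keydown handler would call `tryAcquire()` before starting the recorder and `release()` once the recording has been sent.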

Voice Activity Detection (VAD):
- Energy-based continuous listening with conversation mode button
- Exponential sensitivity curve (0-100%) configurable in settings
- Auto-sends after 2.5s silence, 30s max recording safety valve
- Mutes during TTS playback, auto-resumes after with echo settle delay
- AudioContext health monitoring with auto-resume on browser suspension
- MediaStream track health check with automatic reacquisition
- Race condition guards (vadTranscribing flag) prevent recorder restart
  storms during async transcription fetches
- EBML header validation catches corrupt WebM blobs before API submission
- 15s fetch timeout prevents stuck transcription state
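
The EBML header check mentioned above can be illustrated with a few lines. WebM files begin with the four-byte EBML magic number `1A 45 DF A3`; the helper name `hasEbmlHeader` and the way it is wired to a Blob are assumptions for this sketch, and the real validation may inspect more than the magic bytes.

```javascript
// Illustrative sketch: hasEbmlHeader is a hypothetical helper name.
// Every EBML document (including WebM) starts with these four bytes.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// `bytes` is an array-like of byte values, e.g. a Uint8Array.
function hasEbmlHeader(bytes) {
  if (bytes.length < EBML_MAGIC.length) return false;
  return EBML_MAGIC.every((b, i) => bytes[i] === b);
}

// Browser usage with a recorded Blob (not executed here):
// const head = new Uint8Array(await blob.slice(0, 4).arrayBuffer());
// if (!hasEbmlHeader(head)) { /* discard the recording instead of uploading it */ }
//
// One possible way to get a 15 s request timeout (the URL is a placeholder):
// fetch(transcribeUrl, { method: "POST", body: blob, signal: AbortSignal.timeout(15000) });
```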

Voice Settings UI:
- PTT key picker (click to listen, press any key to rebind)
- VAD sensitivity slider with real-time threshold preview
- Waveform icon button with CSS states (listening glow, speech pulse)

Also adds i18n keys for en/fr/zh locales.

Sets commit statuses (local/lint, local/test, etc.) that the upstream
local-validation jobs poll for. Required because upstream ci.yml skips
actual checks on pull_request events from forks.
P2Chill force-pushed the feat/voice-modes branch from 57399d8 to f4e14b0 on March 3, 2026 00:42
2 participants