feat(web): add push-to-talk, VAD continuous listening, and voice settings #2
Open
… and snapcraft

style.css was gitignored in 25f327b, but the release workflow, Dockerfile, and snapcraft.yaml were not updated to generate it before cargo build. This caused all three CI gate jobs (clippy, test, e2e) and all downstream build jobs to fail with:

    error: couldn't read `crates/web/src/assets/style.css`

Add a "Build Tailwind CSS" step to every job in release.yml that compiles Rust, using platform-appropriate standalone binaries (linux-x64/arm64, macos-arm64/x64, windows-x64). Also add the step to the Dockerfile builder stage and the snapcraft override-build.
…t-abort

The "aborted broadcast cleans up UI state" test was flaky because it injected a fake #thinkingIndicator into #messages before the sessions.switch RPC response arrived. When renderHistory() then cleared chatMsgBox.textContent, the injected element disappeared.

Wait for state.sessionSwitchInProgress and state.chatBatchLoading to be false in beforeEach, ensuring history rendering is complete before any test injects DOM elements.
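The wait described above can be sketched roughly as follows. This is an illustration only, not the test code from the commit: `waitUntil` and `historyRendered` are hypothetical helpers, and the `state` object is assumed to expose the two flags named in the commit message.

```javascript
// Polls `predicate` until it returns true, or rejects once `timeoutMs` elapses.
// (Hypothetical helper; a real suite would likely use its framework's wait API.)
function waitUntil(predicate, timeoutMs = 5000, intervalMs = 25) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      if (predicate()) return resolve();
      if (Date.now() >= deadline) {
        return reject(new Error("waitUntil: timed out"));
      }
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}

// True once neither flag from the commit message is set, i.e. the session
// switch and the history batch load have both finished.
function historyRendered(state) {
  return !state.sessionSwitchInProgress && !state.chatBatchLoading;
}
```

A `beforeEach` hook would then await `waitUntil(() => historyRendered(state))` before any test touches the DOM, so an injected element cannot be wiped by a late history render.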
…tHub API

Replace direct GitHub releases API polling with a website-hosted releases.json manifest. This decouples update announcements from GitHub releases, so broken or draft releases are never surfaced.

The manifest supports stable and unstable channels: pre-release builds check unstable, stable builds check stable. All fetch errors (404, parse failure, network) are silently ignored.

Config field renamed: server.update_repository_url → server.update_releases_url
Default URL: https://www.moltis.org/releases.json
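For illustration, the channel selection and the ignore-all-errors behavior could look like the sketch below. It is not the project's implementation. The manifest shape (one entry per channel key), the rule that a `-` in the version marks a pre-release, and the names `channelFor` and `fetchLatest` are all assumptions made for this example.

```javascript
// Assumption: a semver-style pre-release suffix ("1.5.0-rc.1") marks an
// unstable build; anything else is treated as stable.
function channelFor(version) {
  return version.includes("-") ? "unstable" : "stable";
}

// Resolves to the manifest entry for the build's channel, or null on any
// failure. `fetchImpl` is injectable so the function can be exercised offline.
async function fetchLatest(url, currentVersion, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return null; // e.g. 404
    const manifest = await res.json(); // throws on a parse failure
    // Assumed shape: { stable: {...}, unstable: {...} }
    return manifest?.[channelFor(currentVersion)] ?? null;
  } catch {
    return null; // network or parse error: silently ignored
  }
}
```

Returning `null` for every failure mode keeps the caller simple: it either has an entry to compare against or it does nothing.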
…tis-org#299)

* fix(config): accept provider url alias for base_url
* fix(web): make build-css install local tailwind deps
…oltis-org#301)

* fix(providers): use Ollama capabilities field for tool support detection

  The hardcoded OLLAMA_NATIVE_TOOL_FAMILIES whitelist caused ~50% tool call failures for models not in the list (e.g. MiniMax M2.5) by forcing them into text mode, even though Ollama's /api/show reports native tool support via the capabilities field.

  - Read capabilities from OllamaShowResponse, fall back to family whitelist only for pre-0.5.x Ollama without capabilities
  - Sanitize tool names (trim whitespace, strip quotes) before registry lookup to handle models that wrap names in quotes
  - Add <invoke> XML format parser for models that emit tool calls as <invoke name="tool"><arg name="key">value</arg></invoke>
  - Add "invoke" and "tool_calls" to response sanitizer INTERNAL_TAGS

  Closes moltis-org#281

* fix(providers): collapse nested if per clippy collapsible_if lint

* test(agents,providers): add regression and edge-case tests for tool calling

  Backward compatibility:
  - Verify fenced, XML function, and bare JSON formats still parse identically after adding the invoke parser
  - Verify sanitize_tool_name is a no-op on all real production tool names
  - Verify OllamaShowResponse deserializes from both old (no capabilities) and new (with capabilities) Ollama JSON

  Edge cases:
  - invoke: no args, unclosed tag, missing name, empty name, JSON arg values, multiline values, mixed with fenced blocks, multiple blocks
  - sanitize_tool_name: empty, only-quotes, internal quotes, single quotes
  - response sanitizer: invoke/tool_calls tags stripped, prose "invoke" preserved, tool_call recovery unaffected by new INTERNAL_TAGS
  - resolve_ollama_tool_mode: no probe result, explicit overrides capabilities, single "tools" capability, deserialization roundtrip

* fix(web): install node_modules before resolving tailwindcss binary

  The npm install was only running in the fallback npx branch, so when a global tailwindcss CLI was found the local tailwindcss package was never installed. The CLI needs the local package to resolve CSS imports like `@import "tailwindcss"`, causing failures in fresh worktrees/clones. Move the node_modules check before binary resolution so it runs unconditionally.
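The tool-name sanitizing and `<invoke>` parsing described in the first commit above can be illustrated with a small sketch. The real code lives in the Rust crates and is not shown on this page, so the JavaScript below only mirrors the described behavior. `sanitizeToolName` and `parseInvokeBlocks` are hypothetical names, and keeping argument values as trimmed raw strings is an assumption.

```javascript
// Trim whitespace and strip quotes that wrap the name; quotes inside the
// name are left untouched.
function sanitizeToolName(name) {
  return name.trim().replace(/^["']+|["']+$/g, "").trim();
}

// Extracts tool calls of the form
//   <invoke name="tool"><arg name="key">value</arg></invoke>
// Unclosed <invoke> tags and empty names yield no call.
function parseInvokeBlocks(text) {
  const calls = [];
  const invokeRe = /<invoke\s+name="([^"]*)"\s*>([\s\S]*?)<\/invoke>/g;
  for (const m of text.matchAll(invokeRe)) {
    const name = sanitizeToolName(m[1]);
    if (!name) continue; // skip empty names
    const args = {};
    const argRe = /<arg\s+name="([^"]*)"\s*>([\s\S]*?)<\/arg>/g;
    for (const a of m[2].matchAll(argRe)) {
      args[a[1]] = a[2].trim(); // assumption: values kept as raw strings
    }
    calls.push({ name, args });
  }
  return calls;
}
```

For example, `parseInvokeBlocks('<invoke name="tool"><arg name="key">value</arg></invoke>')` yields a single call named `tool` with `args.key` set to `value`.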
ac30fc0 to 24bb027
…ings

Add two new voice input modes alongside the existing toggle:

Push-to-Talk (PTT):
- Configurable hotkey (default F13, stored in localStorage)
- Hold to record, release to send
- Function keys work even when focused in text inputs
- BroadcastChannel tab coordination prevents dual-tab recording

Voice Activity Detection (VAD):
- Energy-based continuous listening with conversation mode button
- Exponential sensitivity curve (0-100%) configurable in settings
- Auto-sends after 2.5s silence, 30s max recording safety valve
- Mutes during TTS playback, auto-resumes after with echo settle delay
- AudioContext health monitoring with auto-resume on browser suspension
- MediaStream track health check with automatic reacquisition
- Race condition guards (vadTranscribing flag) prevent recorder restart storms during async transcription fetches
- EBML header validation catches corrupt WebM blobs before API submission
- 15s fetch timeout prevents stuck transcription state

Voice Settings UI:
- PTT key picker (click to listen, press any key to rebind)
- VAD sensitivity slider with real-time threshold preview
- Waveform icon button with CSS states (listening glow, speech pulse)

Also adds i18n keys for en/fr/zh locales.
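Two of the VAD items above, the energy threshold with an exponential sensitivity curve and the EBML header check, can be sketched as below. This is a minimal illustration and not the code in voice-input.js. The threshold bounds `MAX_THRESHOLD` and `MIN_THRESHOLD`, the exact curve, and all function names are assumptions. The only fixed fact is the four-byte EBML magic number `1A 45 DF A3` that opens a WebM file.

```javascript
// Magic number at the start of every EBML (WebM/Matroska) file.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// True when the byte buffer begins with the EBML magic number.
function hasEbmlHeader(bytes) {
  if (!bytes || bytes.length < EBML_MAGIC.length) return false;
  return EBML_MAGIC.every((b, i) => bytes[i] === b);
}

// Root-mean-square energy of one frame of samples in [-1, 1].
function rmsEnergy(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Assumed bounds: 0% sensitivity needs loud input, 100% reacts to faint input.
const MAX_THRESHOLD = 0.1;
const MIN_THRESHOLD = 0.001;

// Exponential interpolation between the bounds, so each slider step scales
// the threshold by a constant factor rather than a constant amount.
function sensitivityToThreshold(percent) {
  const p = Math.min(100, Math.max(0, percent)) / 100;
  return MAX_THRESHOLD * Math.pow(MIN_THRESHOLD / MAX_THRESHOLD, p);
}

// A frame counts as speech when its energy exceeds the current threshold.
function isSpeech(samples, sensitivityPercent) {
  return rmsEnergy(samples) > sensitivityToThreshold(sensitivityPercent);
}
```

In a browser, the first bytes of a recorded blob could be read with `new Uint8Array(await blob.slice(0, 4).arrayBuffer())` and passed to `hasEbmlHeader` before upload. An exponential curve is a natural fit here because perceived loudness is roughly logarithmic, so a linear slider would crowd the useful range into one end.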
Sets commit statuses (local/lint, local/test, etc.) that the upstream local-validation jobs poll for. Required because upstream ci.yml skips actual checks on pull_request events from forks.
57399d8 to f4e14b0
Summary
Adds two new voice input modes alongside the existing toggle: push-to-talk (PTT) and voice activity detection (VAD) continuous listening.
Changed files
- voice-input.js: Complete rewrite with PTT, VAD, tab coordination, health monitoring
- page-chat.js: VAD waveform button next to mic button
- page-settings.js: PTT key picker + VAD sensitivity slider
- components.css: VAD button CSS states (listening glow, speech pulse)
- input.css: Waveform icon SVG
- locales/en,fr,zh/chat.js: i18n keys

Test plan