[Feature] Rewrite Speech to Tauri 2 + Rust with multi-model support (v3.0.0) #3

@NOGIT007

Goal

Rewrite Speech from Swift/SwiftUI to Rust + Tauri 2 (Svelte 5 frontend) while keeping all existing features and adding multi-model support (Whisper, Parakeet, Moonshine, SenseVoice) from Handy's transcribe-rs crate. Version 3.0.0.

Tasks

# Task Status
1 Initialize Tauri 2 + Svelte project scaffold
2 Implement system tray + menu bar panel (no dock icon)
3 Implement audio recording manager (cpal, 16kHz, WAV)
4 Implement transcription manager + Whisper via transcribe-rs
5 Implement global hotkey (hold-to-record + escape-cancel)
6 Implement paste manager (clipboard + CGEvent Cmd+V)
7 Wire up core recording loop (state machine)
8 Implement recording overlay with waveform visualization
9 Implement settings window - General tab + hotkey recorder
10 Implement settings - Model tab with download UI
11 Implement settings - Permissions tab
12 Implement menu bar panel UI (status, history, actions)
13 Implement model profiles + switch overlay
14 Add all Handy models + auto-update + release prep v3.0.0

Architecture

Frontend: Svelte 5 + Tailwind CSS
Backend:  Rust + Tauri 2
IPC:      Tauri commands + events

Managers: AudioMgr (cpal) | ModelMgr (transcribe-rs) | TranscriptionMgr
          HotkeyMgr (plugin) | PasteMgr (CGEvent) | SettingsMgr (store)
          OverlayMgr | UpdateMgr (updater) | PermissionsMgr

Menu bar: TrayIconBuilder + set_activation_policy(Accessory) (= LSUIElement)

Task Details

Task 1: Initialize Tauri 2 + Svelte project scaffold

Files:

File Lines Change
src-tauri/Cargo.toml new All Rust dependencies (tauri 2, transcribe-rs, cpal, cocoa, core-graphics, etc.)
src-tauri/tauri.conf.json new Window config (main, recording-overlay, switch-overlay), tray, bundle
package.json new Svelte 5 + Tailwind + Tauri CLI

Verification: bun run tauri dev launches without errors

Task 2: Implement system tray + menu bar panel

Files:

File Lines Change
src-tauri/src/tray.rs new TrayIconBuilder, click handler, dynamic icon swap
src-tauri/src/lib.rs new set_activation_policy(Accessory), setup_tray()
src/components/MenuBarPanel.svelte new Basic panel with Quit button

Verification: Icon in menu bar, no dock icon, panel toggles on click

Task 3: Implement audio recording manager

Files:

File Lines Change
src-tauri/src/managers/audio.rs new cpal stream, 16kHz resampling, hound WAV writer, RMS level calc
src-tauri/src/commands/audio.rs new start_recording, stop_recording Tauri commands

Porting from: Sources/Recording/AudioRecorder.swift:1-150

Verification: Record 3s audio, WAV is valid 16kHz mono 16-bit PCM
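
The level calculation and the 16kHz conversion in Task 3 can be sketched without any audio dependency. This is a minimal sketch, not the planned audio.rs: the function names are hypothetical, and the linear-interpolation resampler is an assumption (it has no anti-aliasing filter, so the real manager may prefer a dedicated resampling crate). The cpal stream and hound writer are left out.

```rust
/// Root-mean-square level of a buffer of f32 samples in [-1.0, 1.0].
/// Returns 0.0 for an empty buffer.
pub fn rms_level(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Naive linear-interpolation resampler for mono audio.
/// Assumption: good enough for a sketch; it does not low-pass filter
/// before downsampling, so high frequencies can alias.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Convert f32 samples to 16-bit PCM, clamping out-of-range values.
pub fn to_pcm16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16)
        .collect()
}
```

Resampling one second of 48kHz input with resample_linear(&buf, 48_000, 16_000) yields 16,000 samples, which to_pcm16 then converts for the WAV writer.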

Task 4: Implement transcription manager + Whisper via transcribe-rs

Files:

File Lines Change
src-tauri/src/managers/model.rs new Model registry (6 Whisper sizes), download with progress, storage
src-tauri/src/managers/transcription.rs new transcribe-rs WhisperEngine wrapper
src-tauri/src/text_cleaner.rs new Filler word/phrase removal, stutter collapse (regex)

Porting from: Sources/Transcription/WhisperService.swift:1-107, Sources/Transcription/TextCleaner.swift:1-49

Verification: Download whisper-small, transcribe test audio, verify filler removal
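
A rough idea of what the text cleaner does, as a std-only sketch. The plan calls for regex; this version uses plain word matching instead so it has no dependencies. The filler list and function names are hypothetical (the actual list in TextCleaner.swift is not reproduced in this issue), and multi-word filler phrases are not handled.

```rust
/// Hypothetical filler list; the real list lives in TextCleaner.swift.
const FILLER_WORDS: [&str; 4] = ["um", "uh", "erm", "hmm"];

/// Lowercased word with surrounding punctuation stripped.
/// Used only for comparison, never for output.
fn normalize(word: &str) -> String {
    word.trim_matches(|c: char| !c.is_alphanumeric())
        .to_lowercase()
}

/// Remove filler words and collapse immediate word repetitions
/// ("I I think" becomes "I think").
/// Known limitations of this sketch: punctuation attached to a dropped
/// word is lost, and legitimate repetitions ("that that") are collapsed too.
pub fn clean_transcript(text: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for word in text.split_whitespace() {
        let key = normalize(word);
        if FILLER_WORDS.contains(&key.as_str()) {
            continue; // drop filler
        }
        if let Some(prev) = out.last() {
            if !key.is_empty() && normalize(prev) == key {
                continue; // drop stuttered repeat of the previous word
            }
        }
        out.push(word);
    }
    out.join(" ")
}
```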

Task 5: Implement global hotkey (hold-to-record + escape-cancel)

Files:

File Lines Change
src-tauri/src/managers/hotkey.rs new tauri-plugin-global-shortcut, Pressed/Released, escape monitor

Porting from: Sources/Hotkeys/HotkeyManager.swift:1-153

Verification: Holding Alt+Space emits start, releasing it emits stop, Escape cancels

Task 6: Implement paste manager (clipboard + CGEvent Cmd+V)

Files:

File Lines Change
src-tauri/src/managers/paste.rs new Save focused app, clipboard save/restore, CGEvent Cmd+V simulation, modifier wait

Porting from: Sources/Injection/TextInjector.swift:1-138

Verification: Text auto-pasted into TextEdit, original clipboard restored
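
The save, paste, restore order is the part worth pinning down before touching CGEvent. A minimal sketch, assuming a hypothetical PasteBackend trait that hides the clipboard and the Cmd+V keystroke; the in-memory backend exists only to make the order checkable. A real backend would likely need a short delay before restoring, since the target app reads the clipboard asynchronously.

```rust
/// Hypothetical abstraction over the system clipboard and keystroke
/// simulation. The real manager would back this with CGEvent on macOS.
pub trait PasteBackend {
    fn clipboard_text(&self) -> Option<String>;
    fn set_clipboard_text(&mut self, text: &str);
    fn send_paste_keystroke(&mut self);
}

/// Put `text` on the clipboard, trigger a paste, then restore whatever
/// text was on the clipboard before. If the clipboard held no text,
/// nothing is restored in this sketch.
pub fn paste_with_restore<B: PasteBackend>(backend: &mut B, text: &str) {
    let saved = backend.clipboard_text();
    backend.set_clipboard_text(text);
    backend.send_paste_keystroke();
    if let Some(previous) = saved {
        backend.set_clipboard_text(&previous);
    }
}

/// In-memory backend for tests: records what the clipboard contained
/// at the moment each paste keystroke was sent.
#[derive(Default)]
pub struct MemoryBackend {
    pub clipboard: Option<String>,
    pub pasted: Vec<String>,
}

impl PasteBackend for MemoryBackend {
    fn clipboard_text(&self) -> Option<String> {
        self.clipboard.clone()
    }
    fn set_clipboard_text(&mut self, text: &str) {
        self.clipboard = Some(text.to_string());
    }
    fn send_paste_keystroke(&mut self) {
        self.pasted.push(self.clipboard.clone().unwrap_or_default());
    }
}
```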

Task 7: Wire up core recording loop (state machine)

Files:

File Lines Change
src-tauri/src/state.rs new AppState, TranscriptionCoordinator (Idle/Recording/Processing)
src-tauri/src/lib.rs mod Initialize managers, register commands, wire hotkey events

Porting from: Sources/AppState.swift:109-215 (recording flow), Sources/AppDelegate.swift:8-58 (lifecycle)

Verification: End-to-end: hold hotkey, speak, release -> text pasted into active app
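
The Idle/Recording/Processing loop can be expressed as a pure transition function, which keeps it testable without Tauri. This is a sketch under assumptions: only the three state names come from the plan; the Event and Action enums and the transition function are hypothetical names, and error handling (for example a failed transcription) is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordingState {
    Idle,
    Recording,
    Processing,
}

/// Hypothetical inputs to the coordinator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyPressed,
    HotkeyReleased,
    EscapePressed,
    TranscriptionDone,
}

/// Hypothetical side effects the caller should perform after a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartRecording,
    StopAndTranscribe,
    DiscardRecording,
    PasteText,
}

/// Pure transition function: returns the next state and an optional action.
/// Events that make no sense in the current state are ignored.
pub fn transition(state: RecordingState, event: Event) -> (RecordingState, Option<Action>) {
    use RecordingState::*;
    match (state, event) {
        (Idle, Event::HotkeyPressed) => (Recording, Some(Action::StartRecording)),
        (Recording, Event::HotkeyReleased) => (Processing, Some(Action::StopAndTranscribe)),
        (Recording, Event::EscapePressed) => (Idle, Some(Action::DiscardRecording)),
        (Processing, Event::TranscriptionDone) => (Idle, Some(Action::PasteText)),
        (unchanged, _) => (unchanged, None),
    }
}
```

Keeping side effects out of the function means the hotkey handler only has to feed events in and execute whatever action comes back.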

Task 8: Implement recording overlay with waveform visualization

Files:

File Lines Change
src-tauri/src/overlay.rs new Show/hide/position overlay windows, multi-monitor
src/overlay/RecordingOverlay.svelte new 3 modes: recording/processing/ready
src/overlay/AudioWaveform.svelte new 5-bar animated waveform (blue-cyan gradient)

Porting from: Sources/UI/RecordingOverlay.swift:1-185

Verification: Overlay appears with waveform during recording, transitions through all 3 modes
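
For the waveform, the mapping from an RMS level to five bar heights is simple enough to sketch. The 8-80px range and the center weighting come from the Swift overlay (RecordingOverlay.swift:106-147); the specific weights below are an assumption, as is computing the heights in Rust rather than in the Svelte component.

```rust
pub const MIN_BAR_PX: f32 = 8.0;
pub const MAX_BAR_PX: f32 = 80.0;

/// Assumed center-weighted profile: the middle bar reacts most strongly.
const BAR_WEIGHTS: [f32; 5] = [0.4, 0.7, 1.0, 0.7, 0.4];

/// Map a normalized audio level (0.0 to 1.0) to five bar heights in pixels.
/// Levels outside the range are clamped.
pub fn bar_heights(level: f32) -> [f32; 5] {
    let level = level.clamp(0.0, 1.0);
    BAR_WEIGHTS.map(|w| MIN_BAR_PX + (MAX_BAR_PX - MIN_BAR_PX) * level * w)
}
```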

Task 9: Implement settings window - General tab + hotkey recorder

Files:

File Lines Change
src-tauri/src/managers/settings.rs new tauri-plugin-store JSON persistence
src/components/SettingsWindow.svelte new Tabbed layout (450x560)
src/components/GeneralTab.svelte new Toggles + hotkey recorder

Porting from: Sources/UI/SettingsView.swift:1-99

Verification: Settings open, toggles persist across restart, hotkey recorder works

Task 10: Implement settings - Model tab with download UI

Files:

File Lines Change
src/components/ModelTab.svelte new Model list, download progress, status display
src-tauri/src/commands/model.rs new list_models, download_model, set_active, delete_model

Porting from: Sources/UI/SettingsView.swift:184-265

Verification: Browse models, download with progress bar, status shows Ready

Task 11: Implement settings - Permissions tab

Files:

File Lines Change
src/components/PermissionsTab.svelte new 3 permission rows with status + Grant buttons
src-tauri/src/managers/permissions.rs new macOS permission checks via cocoa/objc FFI
src-tauri/src/commands/permissions.rs new check_permissions, open_settings, reset

Porting from: Sources/AppDelegate.swift:62-115, Sources/UI/SettingsView.swift:267-386

Verification: Correct permission status, Grant opens System Settings

Task 12: Implement menu bar panel UI (status, history, actions)

Files:

File Lines Change
src/components/MenuBarPanel.svelte mod Full panel: status header, history, error, version, actions
src/components/TranscriptionRow.svelte new History item with copy/delete/hover
src/lib/api.ts new Typed Tauri invoke wrappers

Porting from: Sources/UI/MenuBarView.swift:1-314

Verification: Panel shows status, history populates, copy/delete work

Task 13: Implement model profiles + switch overlay

Files:

File Lines Change
src/components/ProfileCard.svelte new Profile editor with model/language pickers
src-tauri/src/commands/profiles.rs new CRUD, switch_to_next, auto-migrate
src/overlay/SwitchOverlay.svelte new 260x80 notification, 1.5s auto-hide

Porting from: Sources/AppState.swift:413-477, Sources/UI/SwitchOverlayController.swift:1-79

Verification: Create 2 profiles, switch via hotkey, overlay shows
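
The switch_to_next behaviour is a wrap-around over the profile list. A minimal sketch, assuming a hypothetical Profile shape (name, model id, language) and an in-memory store; persistence and the auto-migrate step are not shown.

```rust
/// Hypothetical profile shape; the real fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Profile {
    pub name: String,
    pub model_id: String,
    pub language: String,
}

pub struct ProfileStore {
    profiles: Vec<Profile>,
    active: usize,
}

impl ProfileStore {
    /// Returns None for an empty list, so there is always an active profile.
    pub fn new(profiles: Vec<Profile>) -> Option<Self> {
        if profiles.is_empty() {
            None
        } else {
            Some(Self { profiles, active: 0 })
        }
    }

    pub fn active(&self) -> &Profile {
        &self.profiles[self.active]
    }

    /// Advance to the next profile, wrapping back to the first after the last.
    pub fn switch_to_next(&mut self) -> &Profile {
        self.active = (self.active + 1) % self.profiles.len();
        &self.profiles[self.active]
    }
}
```

The returned profile is what the switch overlay would display for its 1.5s notification.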

Task 14: Add all Handy models + auto-update + release prep v3.0.0

Files:

File Lines Change
src-tauri/src/managers/model.rs mod Add Parakeet V2/V3, Moonshine (4 sizes), SenseVoice to registry
src-tauri/src/managers/update.rs new tauri-plugin-updater integration
CLAUDE.md mod Version 3.0.0, updated architecture + commands

Verification: All model types downloadable, auto-update works, bun run tauri build produces .app

Model Registry (v3.0.0)

Engine      Model                                            Size         Languages
Whisper     tiny/base/small/medium/large-v3/large-v3-turbo   75MB-1.6GB   Multilingual
Parakeet    V2                                               ~473MB       English
Parakeet    V3                                               ~478MB       25 European languages
Moonshine   tiny/base/small/medium                           31MB-192MB   English
SenseVoice  sensevoice                                       ~160MB       Chinese, English, Japanese, Korean, Cantonese
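
One way the registry in model.rs could hold this table is a static list of entries. This is a sketch, not the planned structure: the type names and model ids are hypothetical, and sizes are filled in only where the table gives a per-model figure (the Whisper and Moonshine rows give ranges only).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    Whisper,
    Parakeet,
    Moonshine,
    SenseVoice,
}

#[derive(Debug)]
pub struct ModelEntry {
    pub engine: Engine,
    /// Hypothetical identifier, used here only for illustration.
    pub id: &'static str,
    /// Approximate download size, where the table states one per model.
    pub approx_size_mb: Option<u32>,
    pub languages: &'static str,
}

const fn entry(
    engine: Engine,
    id: &'static str,
    approx_size_mb: Option<u32>,
    languages: &'static str,
) -> ModelEntry {
    ModelEntry { engine, id, approx_size_mb, languages }
}

pub const REGISTRY: &[ModelEntry] = &[
    entry(Engine::Whisper, "whisper-tiny", None, "Multilingual"),
    entry(Engine::Whisper, "whisper-base", None, "Multilingual"),
    entry(Engine::Whisper, "whisper-small", None, "Multilingual"),
    entry(Engine::Whisper, "whisper-medium", None, "Multilingual"),
    entry(Engine::Whisper, "whisper-large-v3", None, "Multilingual"),
    entry(Engine::Whisper, "whisper-large-v3-turbo", None, "Multilingual"),
    entry(Engine::Parakeet, "parakeet-v2", Some(473), "English"),
    entry(Engine::Parakeet, "parakeet-v3", Some(478), "25 European languages"),
    entry(Engine::Moonshine, "moonshine-tiny", None, "English"),
    entry(Engine::Moonshine, "moonshine-base", None, "English"),
    entry(Engine::Moonshine, "moonshine-small", None, "English"),
    entry(Engine::Moonshine, "moonshine-medium", None, "English"),
    entry(Engine::SenseVoice, "sensevoice", Some(160), "Chinese, English, Japanese, Korean, Cantonese"),
];

/// All registry entries for one engine, in declaration order.
pub fn models_for(engine: Engine) -> Vec<&'static ModelEntry> {
    REGISTRY.iter().filter(|m| m.engine == engine).collect()
}
```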

Research Summary

Files Analyzed

File Lines Purpose
Sources/AppState.swift 524 Central state machine, models, profiles, languages, history
Sources/SpeechApp.swift 19 App entry with MenuBarExtra
Sources/AppDelegate.swift 116 Lifecycle, permissions, hotkey/whisper setup
Sources/Recording/AudioRecorder.swift 150 AVAudioEngine 16kHz recording with RMS levels
Sources/Transcription/WhisperService.swift 107 WhisperKit model download + transcription
Sources/Transcription/TextCleaner.swift 49 Filler word removal via regex
Sources/Injection/TextInjector.swift 138 Clipboard + CGEvent Cmd+V auto-paste
Sources/Hotkeys/HotkeyManager.swift 153 Global hotkey with hold-to-record pattern
Sources/UI/RecordingOverlay.swift 185 3-mode overlay with 5-bar waveform
Sources/UI/MenuBarView.swift 314 Menu bar dropdown panel
Sources/UI/SettingsView.swift 536 Settings with General/Model/Permissions tabs
Sources/UpdateManager.swift 166 GitHub release checking + auto-download
Package.swift 34 Swift dependencies (HotKey, WhisperKit)
build_app.sh 88 Release build + bundle creation

Key Code References

  • Recording flow: AppState.swift:109-215 — startRecording → stopAndTranscribe → injectText
  • CGEvent paste: TextInjector.swift:98-113 — keycode 0x09, maskCommand, cghidEventTap
  • Audio format: AudioRecorder.swift:38-63 — 16kHz mono 16-bit PCM, 4096 buffer
  • Hotkey pattern: HotkeyManager.swift:80-93 — keyDown/keyUp + flagsChanged monitor
  • Waveform: RecordingOverlay.swift:106-147 — 5 bars, center-weighted, 8-80px height
  • Model variants: AppState.swift:266-275 — WhisperKit variant strings
  • Text cleaner: TextCleaner.swift:4-48 — filler phrases/words, stutter collapse regex

Risks & Edge Cases

  • Risk: transcribe-rs v0.2.5 maturity — Mitigation: pin version, keep whisper-rs as fallback
  • Risk: Overlay focus stealing — Mitigation: set_ignore_cursor_events(true) + alwaysOnTop, escalate to tauri-nspanel
  • Risk: Hold-to-record reliability — Mitigation: add rdev fallback for modifier/key release
  • Risk: CGEvent in Rust — Mitigation: proven approach (Handy uses enigo), only ~30 LOC

Branch: v3
Tasks: View with ctrl+t

Created with /plan-issue
