Skip to content

[Feature] UX improvements inspired by Handy #2

@NOGIT007

Description

@NOGIT007

Goal

Improve Speech UX based on patterns from cjpais/Handy. Only changes that solve real user frustrations — no changes for the sake of changes.

Tasks

# Task Status
1 Add clipboard save/restore to TextInjector
2 Add Escape key to cancel recording
3 Drive overlay waveform from real mic levels
4 Add filler word removal post-processing
5 Fix overlay positioning for multi-monitor setups

Task Details

Task 1: Add clipboard save/restore to TextInjector

Every transcription destroys the user's clipboard. Save before, restore after paste.

Files:

File Lines Change
Sources/Injection/TextInjector.swift 19-62 Save pasteboard before copyToClipboard(), restore after simulatePaste() succeeds. Only restore when auto-paste is on.

Verification: Copy text, dictate, verify original clipboard is restored after auto-paste.

Task 2: Add Escape key to cancel recording

No way to abort an accidental recording. Add Escape key as cancel.

Files:

File Lines Change
Sources/Hotkeys/HotkeyManager.swift 89-93 Add Escape key check in existing global event monitor
Sources/AppState.swift 107-125 Add cancelRecording() — stop engine, hide overlay, discard audio
Sources/UI/RecordingOverlay.swift 62-64 Update hint: "Release to transcribe · Esc to cancel"

Verification: Start recording, press Escape, verify overlay hides and no transcription happens.

Task 3: Drive overlay waveform from real mic levels

Waveform bars are purely cosmetic (random heights). Drive them from actual mic input.

Files:

File Lines Change
Sources/Recording/AudioRecorder.swift 73-98 Compute RMS level in tap callback, expose as observable value
Sources/AppState.swift 10 Add @Published var audioLevel: Float bridging from AudioRecorder
Sources/UI/RecordingOverlay.swift 105-158 Replace random bar heights with level-driven heights

Verification: Record while speaking vs silent — bars should visually respond to voice.

Task 4: Add filler word removal post-processing

Strip "um", "uh", "you know" and collapse stutters ("I I I think" → "I think").

Files:

File Lines Change
Sources/Transcription/TextCleaner.swift New Static clean() function with regex-based filler removal + stutter collapse
Sources/AppState.swift 153 Apply TextCleaner.clean() after transcription
Sources/UI/SettingsView.swift 39 Add "Remove filler words" toggle (default: on)

Verification: Say "um, so like, I I think this is good, you know" — verify clean output.

Task 5: Fix overlay positioning for multi-monitor setups

Overlay always appears on main screen, not where user is working.

Files:

File Lines Change
Sources/UI/RecordingOverlay.swift 32 Replace NSScreen.main with cursor-position-based screen detection
Sources/UI/SwitchOverlayController.swift 39 Same replacement

Verification: Move cursor to secondary display, record, verify overlay appears there.

Research Summary

Files Analyzed

File Lines Purpose
Sources/Injection/TextInjector.swift 1-122 Text injection via clipboard + Cmd+V simulation
Sources/Hotkeys/HotkeyManager.swift 1-145 Global hotkey registration and event monitoring
Sources/AppState.swift 1-483 Central state, recording flow orchestration
Sources/Recording/AudioRecorder.swift 1-128 AVAudioEngine-based recording with 16kHz conversion
Sources/UI/RecordingOverlay.swift 1-195 Floating overlay with waveform animation
Sources/UI/SwitchOverlayController.swift 1-78 Profile switch overlay
Sources/UI/SettingsView.swift 1-533 Settings UI with General/Model/Permissions tabs

Key Code References

  • injectText() at TextInjector.swift:19 — clipboard write + paste simulation flow
  • startRecording() at AppState.swift:107 — recording lifecycle entry
  • installTap() at AudioRecorder.swift:73 — audio buffer callback (where RMS can be computed)
  • WaveformBar.startAnimation() at RecordingOverlay.swift:149 — current random height at line 155
  • NSScreen.main at RecordingOverlay.swift:32 and SwitchOverlayController.swift:39

Explicitly Not Doing (and why)

Feature Reason
VAD (Voice Activity Detection) Requires ONNX runtime. WhisperKit handles silence. Big binary size increase.
LLM post-processing Changes app identity. Adds API key management + network dependency.
Custom word correction Niche. Levenshtein + Soundex is complex for marginal benefit.
Audio feedback sounds Overlay provides visual feedback. Asset management overhead.
Multiple transcription engines WhisperKit is excellent on Apple Silicon.

Risks & Edge Cases

  • Risk: Clipboard restore may interfere with apps that monitor pasteboard changes — Mitigation: restore after short delay, only when auto-paste is on
  • Risk: Escape key monitor could conflict with other global shortcuts — Mitigation: only active during recording state
  • Risk: AudioRecorder is an actor, so exposing level to @MainActor UI needs careful bridging — Mitigation: use nonisolated property with atomic access or async polling

Branch: feature/handy-improvements
Tasks: View with ctrl+t

Created with /plan-issue

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions