Goal
Improve Speech UX based on patterns from cjpais/Handy. Only changes that solve real user frustrations — no changes for the sake of changes.
Tasks
| # |
Task |
Status |
| 1 |
Add clipboard save/restore to TextInjector |
✅ |
| 2 |
Add Escape key to cancel recording |
✅ |
| 3 |
Drive overlay waveform from real mic levels |
✅ |
| 4 |
Add filler word removal post-processing |
✅ |
| 5 |
Fix overlay positioning for multi-monitor setups |
✅ |
Task Details
Task 1: Add clipboard save/restore to TextInjector
Every transcription destroys the user's clipboard. Save before, restore after paste.
Files:
| File |
Lines |
Change |
Sources/Injection/TextInjector.swift |
19-62 |
Save pasteboard before copyToClipboard(), restore after simulatePaste() succeeds. Only restore when auto-paste is on. |
Verification: Copy text, dictate, verify original clipboard is restored after auto-paste.
Task 2: Add Escape key to cancel recording
No way to abort an accidental recording. Add Escape key as cancel.
Files:
| File |
Lines |
Change |
Sources/Hotkeys/HotkeyManager.swift |
89-93 |
Add Escape key check in existing global event monitor |
Sources/AppState.swift |
107-125 |
Add cancelRecording() — stop engine, hide overlay, discard audio |
Sources/UI/RecordingOverlay.swift |
62-64 |
Update hint: "Release to transcribe · Esc to cancel" |
Verification: Start recording, press Escape, verify overlay hides and no transcription happens.
Task 3: Drive overlay waveform from real mic levels
Waveform bars are purely cosmetic (random heights). Drive them from actual mic input.
Files:
| File |
Lines |
Change |
Sources/Recording/AudioRecorder.swift |
73-98 |
Compute RMS level in tap callback, expose as observable value |
Sources/AppState.swift |
10 |
Add @Published var audioLevel: Float bridging from AudioRecorder |
Sources/UI/RecordingOverlay.swift |
105-158 |
Replace random bar heights with level-driven heights |
Verification: Record while speaking vs silent — bars should visually respond to voice.
Task 4: Add filler word removal post-processing
Strip "um", "uh", "you know" and collapse stutters ("I I I think" → "I think").
Files:
| File |
Lines |
Change |
Sources/Transcription/TextCleaner.swift |
New |
Static clean() function with regex-based filler removal + stutter collapse |
Sources/AppState.swift |
153 |
Apply TextCleaner.clean() after transcription |
Sources/UI/SettingsView.swift |
39 |
Add "Remove filler words" toggle (default: on) |
Verification: Say "um, so like, I I think this is good, you know" — verify clean output.
Task 5: Fix overlay positioning for multi-monitor setups
Overlay always appears on main screen, not where user is working.
Files:
| File |
Lines |
Change |
Sources/UI/RecordingOverlay.swift |
32 |
Replace NSScreen.main with cursor-position-based screen detection |
Sources/UI/SwitchOverlayController.swift |
39 |
Same replacement |
Verification: Move cursor to secondary display, record, verify overlay appears there.
Research Summary
Files Analyzed
| File |
Lines |
Purpose |
Sources/Injection/TextInjector.swift |
1-122 |
Text injection via clipboard + Cmd+V simulation |
Sources/Hotkeys/HotkeyManager.swift |
1-145 |
Global hotkey registration and event monitoring |
Sources/AppState.swift |
1-483 |
Central state, recording flow orchestration |
Sources/Recording/AudioRecorder.swift |
1-128 |
AVAudioEngine-based recording with 16kHz conversion |
Sources/UI/RecordingOverlay.swift |
1-195 |
Floating overlay with waveform animation |
Sources/UI/SwitchOverlayController.swift |
1-78 |
Profile switch overlay |
Sources/UI/SettingsView.swift |
1-533 |
Settings UI with General/Model/Permissions tabs |
Key Code References
injectText() at TextInjector.swift:19 — clipboard write + paste simulation flow
startRecording() at AppState.swift:107 — recording lifecycle entry
installTap() at AudioRecorder.swift:73 — audio buffer callback (where RMS can be computed)
WaveformBar.startAnimation() at RecordingOverlay.swift:149 — current random height at line 155
NSScreen.main at RecordingOverlay.swift:32 and SwitchOverlayController.swift:39
Explicitly Not Doing (and why)
| Feature |
Reason |
| VAD (Voice Activity Detection) |
Requires ONNX runtime. WhisperKit handles silence. Big binary size increase. |
| LLM post-processing |
Changes app identity. Adds API key management + network dependency. |
| Custom word correction |
Niche. Levenshtein + Soundex is complex for marginal benefit. |
| Audio feedback sounds |
Overlay provides visual feedback. Asset management overhead. |
| Multiple transcription engines |
WhisperKit is excellent on Apple Silicon. |
Risks & Edge Cases
Branch: feature/handy-improvements
Tasks: View with ctrl+t
Created with /plan-issue
Goal
Improve Speech UX based on patterns from cjpais/Handy. Only changes that solve real user frustrations — no changes for the sake of changes.
Tasks
Task Details
Task 1: Add clipboard save/restore to TextInjector
Every transcription destroys the user's clipboard. Save before, restore after paste.
Files:
Sources/Injection/TextInjector.swiftcopyToClipboard(), restore aftersimulatePaste()succeeds. Only restore when auto-paste is on.Verification: Copy text, dictate, verify original clipboard is restored after auto-paste.
Task 2: Add Escape key to cancel recording
No way to abort an accidental recording. Add Escape key as cancel.
Files:
Sources/Hotkeys/HotkeyManager.swiftSources/AppState.swiftcancelRecording()— stop engine, hide overlay, discard audioSources/UI/RecordingOverlay.swiftVerification: Start recording, press Escape, verify overlay hides and no transcription happens.
Task 3: Drive overlay waveform from real mic levels
Waveform bars are purely cosmetic (random heights). Drive them from actual mic input.
Files:
Sources/Recording/AudioRecorder.swiftSources/AppState.swift@Published var audioLevel: Floatbridging from AudioRecorderSources/UI/RecordingOverlay.swiftVerification: Record while speaking vs silent — bars should visually respond to voice.
Task 4: Add filler word removal post-processing
Strip "um", "uh", "you know" and collapse stutters ("I I I think" → "I think").
Files:
Sources/Transcription/TextCleaner.swiftclean()function with regex-based filler removal + stutter collapseSources/AppState.swiftTextCleaner.clean()after transcriptionSources/UI/SettingsView.swiftVerification: Say "um, so like, I I think this is good, you know" — verify clean output.
Task 5: Fix overlay positioning for multi-monitor setups
Overlay always appears on main screen, not where user is working.
Files:
Sources/UI/RecordingOverlay.swiftNSScreen.mainwith cursor-position-based screen detectionSources/UI/SwitchOverlayController.swiftVerification: Move cursor to secondary display, record, verify overlay appears there.
Research Summary
Files Analyzed
Sources/Injection/TextInjector.swiftSources/Hotkeys/HotkeyManager.swiftSources/AppState.swiftSources/Recording/AudioRecorder.swiftSources/UI/RecordingOverlay.swiftSources/UI/SwitchOverlayController.swiftSources/UI/SettingsView.swiftKey Code References
injectText()atTextInjector.swift:19— clipboard write + paste simulation flowstartRecording()atAppState.swift:107— recording lifecycle entryinstallTap()atAudioRecorder.swift:73— audio buffer callback (where RMS can be computed)WaveformBar.startAnimation()atRecordingOverlay.swift:149— current random height at line 155NSScreen.mainatRecordingOverlay.swift:32andSwitchOverlayController.swift:39Explicitly Not Doing (and why)
Risks & Edge Cases
actor, so exposing level to@MainActorUI needs careful bridging — Mitigation: usenonisolatedproperty with atomic access or async pollingBranch:
feature/handy-improvementsTasks: View with
ctrl+tCreated with
/plan-issue