
Replace Piper/macOS TTS with Pocket TTS #646

Merged
dharmab merged 23 commits into v2-dev from pocket-tts on Mar 20, 2026
@dharmab (Owner) commented Mar 18, 2026

Replace Piper TTS (Windows/Linux) and macOS Speech Synthesis with Pocket TTS via sherpa-onnx.

Closes #635

🤖 Generated with Claude Code

dharmab and others added 2 commits March 18, 2026 00:43
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Pocket TTS speaker using sherpa-onnx voice cloning
- Remove Piper and macOS Speech Synthesis backends
- Verify archive hashes before extracting model downloads
- Add WAV decoder bounds checking for malformed files
- Add Application.Close() to release TTS C resources on shutdown
- Deduplicate model setup logic in CLI entrypoint
- Use named constants for model filenames instead of positional indices
- Handle unexpected model verification errors (e.g. permission denied)
- Use "Magic" callsign in integration tests for better TTS recognition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
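
The "verify archive hashes before extracting" item above can be pictured with a minimal Go sketch. This is not the code from this PR: `verifySHA256`, `verifyBytes`, and `errHashMismatch` are hypothetical names, and the sketch assumes the expected digest is stored as a lowercase hex string.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// errHashMismatch is a hypothetical sentinel error for this sketch.
var errHashMismatch = errors.New("archive hash mismatch")

// verifySHA256 streams r through SHA-256 and compares the digest with
// expectedHex (assumed to be lowercase hex). Extraction should only start
// after this returns nil.
func verifySHA256(r io.Reader, expectedHex string) error {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return fmt.Errorf("failed to hash archive: %w", err)
	}
	actual := hex.EncodeToString(h.Sum(nil))
	if actual != expectedHex {
		return fmt.Errorf("%w: expected %s, got %s", errHashMismatch, expectedHex, actual)
	}
	return nil
}

// verifyBytes is a convenience wrapper for in-memory data.
func verifyBytes(data []byte, expectedHex string) error {
	return verifySHA256(bytes.NewReader(data), expectedHex)
}

func main() {
	archive := []byte("pretend this is a model archive")
	sum := sha256.Sum256(archive)
	expected := hex.EncodeToString(sum[:])

	fmt.Println("intact archive:", verifyBytes(archive, expected))
	fmt.Println("tampered archive:", verifyBytes([]byte("tampered"), expected))
}
```

Hashing the whole archive before touching the extractor means a corrupted or substituted download is rejected before any file is written to disk.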
@dharmab changed the title from "Replace Piper/macOS TTS with Pocket TTS via sherpa-onnx" to "Replace Piper/macOS TTS with Pocket TTS" on Mar 18, 2026
dharmab and others added 21 commits March 18, 2026 02:13
Add a new integration-test job that downloads both Parakeet and Pocket
TTS models and runs the TTS→STT round-trip tests. Gate release and
push-images jobs on integration tests passing.

Fix model download paths in all build jobs from models/parakeet to
models so download-models correctly creates both parakeet/ and pocket/
subdirectories, including Pocket TTS models in release archives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Suppress gosec false positives:
- G101 on model file SHA256 hashes (not credentials)
- G115 on uint16→int16 PCM sample reinterpretation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
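
For context on the G115 suppression above: 16-bit PCM samples are usually read as unsigned little-endian words and then reinterpreted as signed values, which is exactly the conversion gosec flags as a possible overflow. The sketch below is an illustration only; `decodePCM16` is a hypothetical helper, and the odd-length check stands in for the kind of bounds checking mentioned earlier in this PR.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodePCM16 converts little-endian 16-bit PCM bytes into signed samples.
// It rejects input that cannot be split into whole samples instead of
// reading past the end of the slice.
func decodePCM16(data []byte) ([]int16, error) {
	if len(data)%2 != 0 {
		return nil, errors.New("failed to decode PCM: odd number of bytes")
	}
	samples := make([]int16, len(data)/2)
	for i := range samples {
		u := binary.LittleEndian.Uint16(data[2*i:])
		// The bit pattern is kept as-is; values above 0x7FFF become negative.
		// This wraparound is intended, so the gosec warning is a false positive.
		samples[i] = int16(u) // #nosec G115
	}
	return samples, nil
}

func main() {
	samples, err := decodePCM16([]byte{0xFF, 0xFF, 0x00, 0x80, 0x01, 0x00})
	fmt.Println(samples, err)
}
```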
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests advisory

Add digit homophone substitutions (won→1, to/too→2, free/tree→3, for/fore→4,
ate→8, niner→9, tutu→22) and ordinal suffix stripping (5th→5) to
ParsePilotCallsign. Deduplicate consecutive repeated words to handle STT
stutter (e.g. "eagle eagle 2 7" → "eagle 2 7"). Truncate callsign text at
"request" instead of just removing the word.

Add comprehensive unit tests for homophones, ordinals, and stutter
deduplication. Add integration round trip test covering all 81 two-digit
callsign combinations (1-1 through 9-9).

Make CI integration tests advisory (continue-on-error) and run them only
after lint and unit tests pass, since TTS→STT round trips are inherently
nondeterministic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
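
The callsign normalization described in this commit can be sketched roughly as follows. This is not `ParsePilotCallsign` itself: `normalizeCallsign`, `stripOrdinal`, and `isDigits` are hypothetical names, and the sketch assumes that repeated digits are exempt from stutter deduplication so that a callsign such as "eagle 2 2" survives.

```go
package main

import (
	"fmt"
	"strings"
)

// homophones maps common STT mishearings to digits, as listed in the commit.
var homophones = map[string]string{
	"won": "1", "to": "2", "too": "2", "free": "3", "tree": "3",
	"for": "4", "fore": "4", "ate": "8", "niner": "9", "tutu": "22",
}

// isDigits reports whether s is a non-empty run of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// stripOrdinal turns "5th" into "5" and leaves other words untouched.
func stripOrdinal(word string) string {
	for _, suffix := range []string{"st", "nd", "rd", "th"} {
		if stem := strings.TrimSuffix(word, suffix); stem != word && isDigits(stem) {
			return stem
		}
	}
	return word
}

// normalizeCallsign truncates at "request", substitutes homophones, strips
// ordinal suffixes, and drops consecutive repeated non-digit words.
func normalizeCallsign(text string) string {
	var out []string
	for _, word := range strings.Fields(strings.ToLower(text)) {
		if word == "request" {
			break // everything from "request" onward is not part of the callsign
		}
		if digit, ok := homophones[word]; ok {
			word = digit
		}
		word = stripOrdinal(word)
		// Assumption: only non-digit words are deduplicated.
		if len(out) > 0 && out[len(out)-1] == word && !isDigits(word) {
			continue
		}
		out = append(out, word)
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(normalizeCallsign("eagle eagle 2 7"))
	fmt.Println(normalizeCallsign("Eagle won niner request bogey dope"))
}
```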
Split model version metadata (URLs, hashes, filenames) into dedicated
version.go files so the CI cache key only changes when the actual model
version changes, not when download logic is modified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n integration tests

Add new bogey dope misrecognitions (bodidoda, bougie, vogie, wajidoke)
to the replacements LUT. Extract callsign similarity threshold to an
exported constant and update integration tests to snap parsed callsigns
to the closest candidate using edit distance, mirroring the real
application's radar database behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
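
Snapping a parsed callsign to the closest candidate by edit distance could look something like the sketch below. The names `levenshtein`, `snapCallsign`, and `similarityThreshold`, the value 0.7, and the normalized-similarity formula are all assumptions made for illustration; the exported constant and scoring used in the repository may differ.

```go
package main

import "fmt"

// similarityThreshold is a hypothetical cutoff; the real constant may differ.
const similarityThreshold = 0.7

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b using two rows.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// snapCallsign returns the most similar candidate, or false if none reaches
// the threshold. Similarity is 1 - distance/longerLength.
func snapCallsign(parsed string, candidates []string) (string, bool) {
	best, bestScore := "", 0.0
	for _, c := range candidates {
		longest := len([]rune(parsed))
		if n := len([]rune(c)); n > longest {
			longest = n
		}
		if longest == 0 {
			continue
		}
		score := 1 - float64(levenshtein(parsed, c))/float64(longest)
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	if bestScore < similarityThreshold {
		return "", false
	}
	return best, true
}

func main() {
	flights := []string{"eagle 1 1", "mobius 1 1", "wardog 2 1"}
	fmt.Println(snapCallsign("eagle 1 7", flights))
}
```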
Add "dody dot" and "ody do" to the bogey dope replacements LUT.
Rewrite TestRoundTripCallsignNumbers to use a probabilistic approach
with multiple callsign words (Eagle, Mobius, Wardog), request phrasings,
and a multi-flight candidate list, requiring >99% success rate instead
of 100% to account for inherent TTS→STT lossiness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set GitHub Actions job timeout to 60 minutes and Go test timeout to
45 minutes to allow the probabilistic callsign round-trip tests to
complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use a worker pool (NumCPU/2 goroutines) with independent TTS/STT
pipelines to run integration test permutations concurrently, reducing
wall-clock time from ~10m to ~2m.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
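
A worker pool of this shape might be sketched as below. The `pipeline` type and its `roundTrip` method are placeholders standing in for a real TTS→STT pipeline, and `runAll` is a hypothetical name; only the NumCPU/2 sizing and the one-pipeline-per-worker idea come from the commit message.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// pipeline is a placeholder for an independent TTS→STT pipeline.
type pipeline struct{}

// roundTrip fakes synthesis followed by recognition.
func (p *pipeline) roundTrip(text string) string {
	return strings.ToLower(text)
}

// runAll processes inputs concurrently and returns results in input order.
func runAll(inputs []string) []string {
	workers := runtime.NumCPU() / 2
	if workers < 1 {
		workers = 1 // single-core machines still need one worker
	}
	results := make([]string, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p := &pipeline{} // each worker owns its pipeline, so nothing is shared
			for i := range jobs {
				results[i] = p.roundTrip(inputs[i]) // each index is written once
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runAll([]string{"Eagle 1 1", "Mobius 2 2", "Wardog 3 3"}))
}
```

Giving each goroutine its own pipeline avoids locking around the inference engines, at the cost of loading the models once per worker.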
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add replacements for bodhi, bojy, boy do, boyido, budgie, moji,
ogie, og da, vaughi, vogee, voji with parser unit tests.

Rework integration test to sample 40 random callsigns (20 common,
20 random) and repeat to 500+ inputs for statistical sensitivity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmarks showed 2 threads gives 15-65% speedup over single-threaded
inference. Adds --voice-multithreading flag (default 2) and WithThreads
option. Includes a skipped benchmark for comparing thread counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
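
The commit names a `WithThreads` option and a default of 2 but does not show its signature. One common way to build such an option in Go is the functional options pattern, sketched here under that assumption; the `speaker` type, `Option` type, and `newSpeaker` constructor are hypothetical.

```go
package main

import "fmt"

// defaultThreads mirrors the default of 2 stated in the commit message.
const defaultThreads = 2

// speaker is a stand-in for the TTS speaker; only the thread count is modeled.
type speaker struct {
	threads int
}

// Option configures a speaker at construction time.
type Option func(*speaker)

// WithThreads sets the number of inference threads. Non-positive values are
// ignored so the default stays in effect (an assumption of this sketch).
func WithThreads(n int) Option {
	return func(s *speaker) {
		if n > 0 {
			s.threads = n
		}
	}
}

// newSpeaker applies options on top of the defaults.
func newSpeaker(opts ...Option) *speaker {
	s := &speaker{threads: defaultThreads}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	fmt.Println(newSpeaker().threads, newSpeaker(WithThreads(4)).threads)
}
```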
Register the actions/cache step early (right after checkout) so the
post-job save hook is active even if build steps fail. Move model
downloads before build/test steps so they can benefit from cache
hits sooner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace standard log with zerolog and flag with cobra in download-models
- Extract shared model error types and setup logic into pkg/models
- Deduplicate model setup between cmd/skyeye and cmd/download-models
- Use "failed to" prefix consistently in error messages for end-user clarity
- Fix .gitignore to only ignore root-level models/ directory
- Check MarkFlagFilename/MarkFlagDirname return values (errcheck)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract shared model download/verify/extract logic from parakeet and
pocket model packages into pkg/models/archive.go. Move DownsampleF32
into the pocket package as unexported downsample since it has no other
callers. Add "Ford 1 1" callsign test cases across all parser commands.
Fix misplaced godoc comment on isDigitLike/deduplicateConsecutiveWords.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dharmab (Owner, Author) commented Mar 20, 2026

Merging so I can iterate on quality issues - code is fine, models are not

@dharmab marked this pull request as ready for review March 20, 2026 05:16
@dharmab merged commit 5e87fed into v2-dev Mar 20, 2026
8 of 9 checks passed
@dharmab deleted the pocket-tts branch March 20, 2026 05:17