💬 Talk to Me

A macOS menu bar utility for local speech-to-text, powered by Whisper and NVIDIA Parakeet. No cloud, no API key — everything runs on your machine.

Built with Tauri v2 (Rust + vanilla HTML/CSS/JS).

Features

🎙 Dictation anywhere — Press Alt+Space to record, press again to transcribe and inject text into any app
🤖 Multiple engines — Whisper (via whisper.cpp) and NVIDIA Parakeet (via ONNX Runtime), including multilingual TDT models
📦 Model management — Browse, download, and switch models from HuggingFace directly in the app
🔒 100% local — No data leaves your machine, no account required
⚡ Apple Silicon optimized — CoreML/Metal acceleration for fast inference

Quick start

# Prerequisites: Rust toolchain, Node.js, and the Tauri CLI (cargo install tauri-cli)
cargo tauri dev

On first launch, open the settings window to download a model. Recommended starting points:

Model	Engine	Size	Languages
Whisper Small	whisper.cpp	~244 MB	Multilingual
Whisper Large v3 Turbo	whisper.cpp	~1.5 GB	Multilingual
Parakeet CTC 0.6B	ONNX	~700 MB	English
Parakeet TDT 0.6B v3	ONNX	~2.5 GB	25 languages (EN, FR, DE, ES…)

Models are stored in ~/Library/Application Support/TalkToMe/models/.
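
For scripts or tooling that need to locate downloaded models, the path above can be resolved from the user's home directory. This is a minimal sketch; the function names `models_dir_in` and `models_dir` are hypothetical and not taken from the app's source, which may resolve the directory differently (for example through Tauri's path APIs).

```rust
use std::env;
use std::path::PathBuf;

/// Builds the models directory under a given home directory.
/// The relative path is the one stated in this README; the function
/// name is an illustration only.
fn models_dir_in(home: &str) -> PathBuf {
    PathBuf::from(home)
        .join("Library")
        .join("Application Support")
        .join("TalkToMe")
        .join("models")
}

/// Resolves the models directory for the current user.
/// Returns None when the HOME environment variable is not set.
fn models_dir() -> Option<PathBuf> {
    env::var("HOME").ok().map(|home| models_dir_in(&home))
}
```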

How it works

Alt+Space → start recording (mic capture via cpal)
Alt+Space → stop recording
   → resample to 16 kHz
   → compute mel spectrogram
   → run inference (Whisper or Parakeet)
   → inject text into active app (CGEvent or clipboard)

The overlay window shows recording state and transcription progress.
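
To make the resampling step in the pipeline above concrete, here is a minimal sketch that converts mono samples to 16 kHz. It uses plain linear interpolation, which is an assumption chosen for brevity: the app's own resampler in `audio/` may use a different method (such as a windowed-sinc filter), and this version applies no anti-aliasing filter when downsampling. The name `resample_to_16k` is hypothetical.

```rust
/// Sample rate the pipeline resamples to before computing the mel spectrogram.
const TARGET_RATE: u32 = 16_000;

/// Resamples mono f32 samples from `src_rate` to 16 kHz by linear interpolation.
/// Returns an empty vector for empty input or a zero source rate.
fn resample_to_16k(input: &[f32], src_rate: u32) -> Vec<f32> {
    if input.is_empty() || src_rate == 0 {
        return Vec::new();
    }
    if src_rate == TARGET_RATE {
        return input.to_vec();
    }
    // Number of output samples covering the same duration as the input.
    let out_len = (input.len() as u64 * TARGET_RATE as u64 / src_rate as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = src_rate as f64 / TARGET_RATE as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(last);
            let next = (idx + 1).min(last);
            let frac = (pos - idx as f64) as f32;
            // Blend the two neighbouring input samples.
            input[idx] * (1.0 - frac) + input[next] * frac
        })
        .collect()
}
```

One second of 48 kHz audio (48,000 samples) comes out as 16,000 samples, and 8 kHz input is upsampled by inserting interpolated values between the original samples.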

Architecture

src-tauri/src/          Rust backend
├── engine/             SttEngine trait → whisper_stt.rs, onnx_stt.rs
├── audio/              Mic capture, resampling, mel spectrogram (pure Rust)
├── commands/           Tauri IPC: STT, models, settings
├── hub/                HuggingFace API, downloads, model registry
├── hotkey/             Global shortcut dispatch
└── platform/           OS abstraction (TextInjector, TextSelector traits)

src/                    Vanilla JS frontend
├── index.html          Settings window (model management, preferences)
└── overlay.html        Floating recording/transcription overlay

Designed for future TTS support (Phase 6) and cross-platform portability (Windows/Linux via platform/ trait abstraction).
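
The engine abstraction in the tree above might look roughly like the following. Only the trait name `SttEngine` comes from this README; the method names, signatures, and the `EchoEngine` stand-in are assumptions for illustration, and the real trait in `src-tauri/src/engine/` may differ.

```rust
/// Hypothetical shape of the engine abstraction. Method names and
/// signatures are assumptions, not the app's actual API.
trait SttEngine {
    /// Short identifier for the engine.
    fn name(&self) -> &str;
    /// Transcribes 16 kHz mono samples to text.
    fn transcribe(&self, samples_16k: &[f32]) -> Result<String, String>;
}

/// Stand-in engine used only to demonstrate dispatch; it runs no inference.
struct EchoEngine;

impl SttEngine for EchoEngine {
    fn name(&self) -> &str {
        "echo"
    }

    fn transcribe(&self, samples_16k: &[f32]) -> Result<String, String> {
        if samples_16k.is_empty() {
            return Err("no audio captured".to_string());
        }
        Ok(format!("{} samples received", samples_16k.len()))
    }
}

/// Callers hold a trait object, so switching between engines
/// does not change any call site.
fn run(engine: &dyn SttEngine, samples: &[f32]) -> Result<String, String> {
    engine.transcribe(samples)
}
```

With this shape, a Whisper-backed and an ONNX-backed implementation would each provide their own `transcribe`, and the command layer would only ever see `&dyn SttEngine`.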

Requirements

macOS 13 (Ventura) or later
Rust toolchain
Node.js
Microphone access permission
Accessibility permission (optional; needed for keystroke injection, otherwise text is inserted via the clipboard)
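
The clipboard fallback for the Accessibility permission can be pictured as an ordered chain of injectors. In this sketch only the trait name `TextInjector` comes from this README; its method, the two stand-in structs, and `inject_with_fallback` are hypothetical, and neither stand-in touches CGEvent or the real clipboard.

```rust
/// Hypothetical version of the TextInjector trait; the real signature may differ.
trait TextInjector {
    fn inject(&self, text: &str) -> Result<(), String>;
}

/// Stand-in for keystroke injection: fails when Accessibility access is missing.
struct KeystrokeInjector {
    accessibility_granted: bool,
}

/// Stand-in for the clipboard path, which needs no extra permission.
struct ClipboardInjector;

impl TextInjector for KeystrokeInjector {
    fn inject(&self, _text: &str) -> Result<(), String> {
        if self.accessibility_granted {
            Ok(())
        } else {
            Err("accessibility permission not granted".to_string())
        }
    }
}

impl TextInjector for ClipboardInjector {
    fn inject(&self, _text: &str) -> Result<(), String> {
        Ok(())
    }
}

/// Tries each injector in order and returns the index of the first one
/// that succeeds, or the last error if all of them fail.
fn inject_with_fallback(injectors: &[&dyn TextInjector], text: &str) -> Result<usize, String> {
    let mut last_err = String::from("no injectors configured");
    for (i, injector) in injectors.iter().enumerate() {
        match injector.inject(text) {
            Ok(()) => return Ok(i),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Placing the keystroke injector first and the clipboard injector second reproduces the behaviour described above: without the permission, the first attempt fails and the clipboard path is used.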

Build

cargo tauri build       # Production .dmg

⚠️ Without an Apple Developer certificate, the app is unsigned, so macOS Gatekeeper blocks it on first launch; users will need to right-click the app and choose Open.

License

MIT
