A professional, high-performance voice dictation application for Windows that converts speech to text in real-time using Google's Gemini AI API. Built with Electron, React, and TypeScript.
Voice to Prompt is a desktop application designed for seamless speech-to-text transcription. It runs as a system tray application with a floating overlay window, allowing users to dictate text directly into any application. The application features ultra-low latency audio processing, multi-provider API support (Google Gemini, Antigravity, and custom endpoints), and automatic text injection into the active application.
- Real-time Voice Recording - Capture audio from the microphone with settings tuned for speech recognition (16kHz, mono, Opus encoding)
- Automatic Transcription - Send the recorded audio to the selected AI API and receive the transcribed text
- Multiple API Providers - Support for Google Gemini, Antigravity, and custom API endpoints
- Language Support - Configurable language setting for transcription accuracy
- Floating Overlay Window - Compact, always-on-top recording indicator that stays out of your way
- System Tray Integration - Runs silently in the system tray with context menu controls
- Settings Panel - Comprehensive configuration for API keys, language, custom prompts, and endpoint settings
- Recording Indicator - Visual feedback showing recording status and duration
- Global Shortcut - Press Super+Alt+H to toggle recording from anywhere in the system
- Enter Key Stop - Press Enter to immediately stop recording and process the audio
- Automatic Text Injection - Automatically pastes transcribed text into the active application using clipboard simulation
- Multi-language Support - Optimized for Vietnamese with support for other languages
- Low-Latency Audio Pipeline - Target latency under 50ms from audio capture to API submission
- Connection Pooling - Pre-warmed WebSocket connections, so submitting a recording does not wait on connection setup
- Audio Compression - Opus encoding at 24kbps for efficient bandwidth usage
- Process Priority Elevation - High-priority audio processing for consistent performance
- React 18 - UI framework for component-based interface
- TypeScript - Type-safe development
- Vite - Fast build tool and dev server
- Electron 33 - Cross-platform desktop application framework
- Electron Builder - Application packaging and distribution
- Web Audio API - Real-time audio capture and processing
- MediaRecorder API - Audio chunking and encoding
- Opus Codec - High-efficiency audio compression
- Google Gemini API - Primary speech-to-text provider
- Antigravity API - Alternative provider option
- Custom Endpoint - Support for self-hosted or third-party APIs
- @nut-tree-fork/nut-js - Keyboard automation for text injection
Before building and running the application, make sure you have:
- Node.js (v18 or higher)
- npm (v9 or higher)
- Windows 10/11 (64-bit)
- Microphone - Required for voice input
- API Key - Google Gemini API key (or compatible provider)
git clone <repository-url>
cd voice-to-text
npm install
Create a .env file in the project root directory:
VITE_GEMINI_API_KEY=your_api_key_here
Alternatively, you can enter your API key in the application's settings panel after launching the app.
To run the app in development mode:
npm run dev
To create a Windows executable:
npm run build
The built executable will be located in the release directory.
| Setting | Description | Default |
|---|---|---|
| API Type | Provider selection (Google/Antigravity/Custom) | Google |
| API Key | Authentication key for the selected provider | - |
| Custom Endpoint | URL for custom API endpoint | - |
| Language | Transcription language code | Vietnamese (vi) |
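The settings table above could map onto a typed object like the one below. This is a minimal sketch: the field names, the resolveSettings and validateSettings helpers, and the rule that a key entered in the settings panel takes precedence over VITE_GEMINI_API_KEY are all assumptions, not taken from the app's source.

```typescript
// Hypothetical settings shape mirroring the settings table; field names are assumptions.
type ApiType = "google" | "antigravity" | "custom";

interface AppSettings {
  apiType: ApiType;
  apiKey: string;
  customEndpoint: string;
  language: string; // language code, e.g. "vi"
}

const DEFAULT_SETTINGS: AppSettings = {
  apiType: "google",
  apiKey: "",
  customEndpoint: "",
  language: "vi",
};

// Merge user-provided values over the defaults.
// Assumption: a key typed into the settings panel wins over the .env value.
function resolveSettings(partial: Partial<AppSettings>, envApiKey?: string): AppSettings {
  const merged: AppSettings = { ...DEFAULT_SETTINGS, ...partial };
  if (!merged.apiKey && envApiKey) {
    merged.apiKey = envApiKey;
  }
  return merged;
}

// Return human-readable problems that would block transcription (empty array = OK).
function validateSettings(s: AppSettings): string[] {
  const errors: string[] = [];
  if (!s.apiKey.trim()) {
    errors.push("API key is missing");
  }
  if (s.apiType === "custom") {
    try {
      const url = new URL(s.customEndpoint);
      if (url.protocol !== "https:" && url.protocol !== "http:") {
        errors.push("Custom endpoint must be an http(s) URL");
      }
    } catch {
      errors.push("Custom endpoint is not a valid URL");
    }
  }
  return errors;
}
```

Keeping defaults in one constant means the settings panel and any .env fallback resolve to the same values.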
| Parameter | Value | Description |
|---|---|---|
| Sample Rate | 16000 Hz | Optimized for speech recognition |
| Channels | 1 (Mono) | Voice-optimized |
| Codec | Opus | Low-latency compression |
| Bitrate | 24 kbps | Voice-optimized quality |
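As a rough illustration of what the table implies, the sketch below expresses these parameters as getUserMedia constraints and MediaRecorder options and compares raw PCM bandwidth with the Opus bitrate. The container MIME type audio/webm;codecs=opus is an assumption (it is what Chromium's MediaRecorder commonly produces), and browsers may treat sampleRate as a preference rather than a guarantee.

```typescript
// Parameters from the audio table above.
const SAMPLE_RATE_HZ = 16000;
const CHANNELS = 1;
const OPUS_BITRATE_BPS = 24000;

// Assumption: the renderer passes objects like these to getUserMedia / MediaRecorder;
// the exact option set used in useAudioRecorder.ts is not documented here.
const audioConstraints = {
  audio: {
    sampleRate: SAMPLE_RATE_HZ,
    channelCount: CHANNELS,
  },
};

const recorderOptions = {
  mimeType: "audio/webm;codecs=opus", // assumed container/codec string
  audioBitsPerSecond: OPUS_BITRATE_BPS,
};

// Raw PCM throughput in bytes per second (16-bit samples by default).
function pcmBytesPerSecond(sampleRate: number, channels: number, bitsPerSample = 16): number {
  return (sampleRate * channels * bitsPerSample) / 8;
}

// Opus throughput in bytes per second at a constant target bitrate.
function opusBytesPerSecond(bitrateBps: number): number {
  return bitrateBps / 8;
}

const pcm = pcmBytesPerSecond(SAMPLE_RATE_HZ, CHANNELS); // 32000
const opus = opusBytesPerSecond(OPUS_BITRATE_BPS); // 3000
console.log(`PCM: ${pcm} B/s, Opus: ${opus} B/s, ratio ~${(pcm / opus).toFixed(1)}x`);
```

In the renderer these would be used as navigator.mediaDevices.getUserMedia(audioConstraints) and new MediaRecorder(stream, recorderOptions); checking MediaRecorder.isTypeSupported(recorderOptions.mimeType) first avoids a NotSupportedError where Opus is unavailable.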
| Shortcut | Action |
|---|---|
| Super + Alt + H | Toggle recording (global) |
| Enter | Stop recording and transcribe |
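The two shortcuts can be modeled as a small state machine. This sketch assumes that pressing the toggle shortcut while recording stops and transcribes (like Enter) rather than discarding the audio, and that input is ignored while a transcription is in flight; the state and event names are hypothetical.

```typescript
// Hypothetical recorder states and input events.
type RecorderState = "idle" | "recording" | "processing";
type RecorderEvent = "toggleShortcut" | "enterKey" | "transcriptionDone";

function nextState(state: RecorderState, event: RecorderEvent): RecorderState {
  if (state === "idle") {
    // Only the global toggle starts a recording; Enter is ignored while idle.
    return event === "toggleShortcut" ? "recording" : "idle";
  }
  if (state === "recording") {
    // Both the toggle and Enter end the recording and hand the audio to transcription.
    return event === "toggleShortcut" || event === "enterKey" ? "processing" : "recording";
  }
  // state === "processing": wait until the transcript has been injected.
  return event === "transcriptionDone" ? "idle" : "processing";
}

// Example: Enter while idle does nothing; toggle starts; Enter stops; done returns to idle.
let state: RecorderState = "idle";
const events: RecorderEvent[] = ["enterKey", "toggleShortcut", "enterKey", "transcriptionDone"];
for (const e of events) {
  state = nextState(state, e);
}
console.log(state);
```

In the main process, the toggleShortcut event would typically come from Electron's globalShortcut.register("Super+Alt+H", callback).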
- Launch the application - it will minimize to the system tray
- Click the tray icon to open the settings panel
- Configure your API key and preferences
- Close settings - the app continues running in the background
- Press Super + Alt + H, or click the tray icon and select "Bắt đầu ghi âm" (Start recording)
- A floating overlay window appears, indicating that recording is active
- Speak clearly into your microphone
- Press Enter to stop recording and process the audio
- The transcribed text is automatically injected into your active application
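Clipboard-based injection comes down to three steps: write the transcript to the clipboard, send Ctrl+V to the focused window, then optionally restore the previous clipboard text. In the sketch below, ClipboardLike and PasteKeyboard are hypothetical interfaces so the sequence can run without Electron or nut-js; restoring the old clipboard and the 100 ms delay are design assumptions, not documented app behavior.

```typescript
// Hypothetical minimal interfaces over the clipboard and keyboard automation.
interface ClipboardLike {
  readText(): string;
  writeText(text: string): void;
}

interface PasteKeyboard {
  // Sends the paste chord (Ctrl+V on Windows) to the focused window.
  sendPaste(): Promise<void>;
}

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Paste `text` into the active app, then put back whatever text the user had copied.
async function injectText(
  text: string,
  clipboard: ClipboardLike,
  keyboard: PasteKeyboard,
  restoreDelayMs = 100,
): Promise<void> {
  const previous = clipboard.readText();
  clipboard.writeText(text);
  try {
    await keyboard.sendPaste();
    // Give the target app time to read the clipboard before it is overwritten.
    await delay(restoreDelayMs);
  } finally {
    clipboard.writeText(previous);
  }
}
```

Electron's clipboard module already provides readText and writeText, so it fits ClipboardLike directly; with @nut-tree-fork/nut-js, sendPaste could call keyboard.pressKey(Key.LeftControl, Key.V) followed by keyboard.releaseKey(Key.LeftControl, Key.V). Only text is restored, so images or files on the clipboard would be lost under this approach.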
Access the settings panel by:
- Right-clicking the system tray icon and selecting "Cài đặt" (Settings)
- Clicking the system tray icon (single click)
Available settings:
- API Key Configuration - Enter and validate your API key
- Language Selection - Choose the transcription language
- Custom Prompt - Add context hints for better transcription
- API Type - Select between Google Gemini, Antigravity, or custom endpoints
┌─────────────────────────────────────────────────────────────────────────────┐
│ LOW-LATENCY AUDIO PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ AudioCapture │───▶│ AudioEncoder │───▶│ StreamBuffer │───▶│ WebSocket │ │
│ │ (Device) │ │ (Opus) │ │ (Priority) │ │ (Pooled) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
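The WebSocket (Pooled) stage can be sketched as a generic pre-warmed pool: connections are opened before the user speaks and reused while still alive. WarmPool and its open and isAlive callbacks are hypothetical; for a real WebSocket, isAlive might check readyState === WebSocket.OPEN.

```typescript
// Hypothetical pre-warmed pool, generic over the connection type so it can be tested offline.
class WarmPool<T> {
  private readonly idle: T[] = [];
  private readonly open: () => Promise<T>;
  private readonly isAlive: (conn: T) => boolean;
  private readonly size: number;

  constructor(open: () => Promise<T>, isAlive: (conn: T) => boolean, size: number) {
    this.open = open;
    this.isAlive = isAlive;
    this.size = size;
  }

  // Open connections ahead of time so the first submission skips the handshake.
  async warm(): Promise<void> {
    const missing = Math.max(0, this.size - this.idle.length);
    const opened = await Promise.all(Array.from({ length: missing }, () => this.open()));
    this.idle.push(...opened);
  }

  // Hand out a live idle connection, discarding dead ones; open on demand if none remain.
  async acquire(): Promise<T> {
    while (this.idle.length > 0) {
      const conn = this.idle.pop()!;
      if (this.isAlive(conn)) {
        return conn;
      }
    }
    return this.open();
  }

  // Return a connection for reuse, unless it has died or the pool is already full.
  release(conn: T): void {
    if (this.isAlive(conn) && this.idle.length < this.size) {
      this.idle.push(conn);
    }
  }

  get idleCount(): number {
    return this.idle.length;
  }
}
```

Calling warm() at app startup (and again after each acquire) keeps the pool topped up, trading a few idle sockets for lower latency when recording stops.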
voice-to-text/
├── electron/ # Electron main process
│ ├── main.ts # Main entry point, window management
│ └── preload.ts # Preload script for IPC
├── src/
│ ├── components/ # React UI components
│ │ ├── OverlayView.tsx # Recording overlay window
│ │ └── SettingsView.tsx # Settings panel
│ ├── hooks/ # Custom React hooks
│ │ └── useAudioRecorder.ts # Audio recording logic
│ ├── lib/ # Core libraries
│ │ ├── voice-stream-engine.ts # Audio processing engine
│ │ ├── http2-stream.ts # HTTP/2 streaming
│ │ ├── performance-monitor.ts # Performance metrics
│ │ └── types.ts # TypeScript definitions
│ ├── App.tsx # Main application component
│ └── main.tsx # React entry point
├── public/ # Static assets
└── release/ # Built executables
npm run dev
This starts the Vite development server together with Electron.
npm run build
This compiles TypeScript, builds the React frontend, and packages the Electron application into an executable.
- electron/ - Electron main process code
- src/ - React frontend source code
- public/ - Static assets (icons, images)
- dist/ - Compiled frontend output
- dist-electron/ - Compiled Electron output
- release/ - Final executable packages
Default configuration uses Google's Gemini Flash model for transcription:
- Endpoint: https://generativelanguage.googleapis.com/v1beta
- Model: gemini-3-flash-preview
- Authentication: X-goog-api-key header
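A request against this endpoint could look like the sketch below, which sends the recorded clip as base64 inline_data in a generateContent call and reads the text of the first candidate. The prompt wording, the helper names, and uploading the whole clip after recording stops are assumptions; the MIME type must match whatever the recorder actually produces.

```typescript
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta";

interface GeminiRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a generateContent request with the audio clip attached as inline data.
function buildGeminiRequest(
  apiKey: string,
  audioBase64: string,
  mimeType: string,
  language: string,
  model = "gemini-3-flash-preview",
): GeminiRequest {
  // Illustrative prompt only; the app's actual prompt is not documented here.
  const prompt = `Transcribe this audio verbatim. Language: ${language}. Return only the transcript.`;
  return {
    url: `${GEMINI_BASE}/models/${model}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({
        contents: [
          { parts: [{ text: prompt }, { inline_data: { mime_type: mimeType, data: audioBase64 } }] },
        ],
      }),
    },
  };
}

// Concatenate the text parts of the first candidate in a generateContent response.
function extractTranscript(response: unknown): string {
  const r = response as { candidates?: { content?: { parts?: { text?: string }[] } }[] };
  const parts = r.candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("").trim();
}
```

With the global fetch, usage would be along the lines of: const { url, init } = buildGeminiRequest(key, audioBase64, "audio/ogg", "vi"); const text = extractTranscript(await (await fetch(url, init)).json());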
Antigravity is an alternative provider that uses Bearer token authentication:
- Endpoint: Configurable (default: https://api.antigravity.app)
- Authentication: Authorization: Bearer <api_key> header
Custom endpoints support self-hosted or third-party APIs:
- Endpoint: User-configured URL
- Authentication: Bearer token (same format as Antigravity)
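The authentication rules above reduce to one branch on the provider type. In this sketch the ApiType union and the helper names are hypothetical, and trimming trailing slashes from user-entered URLs is an added convenience, not documented behavior.

```typescript
type ApiType = "google" | "antigravity" | "custom";

const GOOGLE_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta";
const ANTIGRAVITY_DEFAULT_ENDPOINT = "https://api.antigravity.app";

// Google uses an API-key header; Antigravity and custom endpoints share the Bearer format.
function authHeaders(apiType: ApiType, apiKey: string): Record<string, string> {
  if (apiType === "google") {
    return { "x-goog-api-key": apiKey };
  }
  return { Authorization: `Bearer ${apiKey}` };
}

// Pick the base URL: fixed for Google, overridable for Antigravity, required for custom.
function resolveEndpoint(apiType: ApiType, configured?: string): string {
  const trimmed = configured?.trim().replace(/\/+$/, "");
  if (apiType === "google") {
    return GOOGLE_ENDPOINT;
  }
  if (apiType === "antigravity") {
    return trimmed || ANTIGRAVITY_DEFAULT_ENDPOINT;
  }
  if (!trimmed) {
    throw new Error("The custom API type requires an endpoint URL");
  }
  return trimmed;
}
```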
No API Key
- Solution: Enter your API key in the settings panel or create a .env file
Microphone Not Detected
- Solution: Ensure microphone permissions are granted in Windows Settings
Text Not Being Injected
- Solution: Check that no other application is blocking clipboard access
Recording Not Starting
- Solution: Verify global shortcut permissions in Windows
Application logs are available in the developer console when running in development mode. Check for:
- [VoiceStreamEngine] - Audio processing logs
- [ConnectionPool] - API connection logs
- [PerformanceMonitor] - Performance metrics
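To check the 50 ms capture-to-submission target from the feature list, a tracker along these lines could feed the [PerformanceMonitor] log. The class, the nearest-rank p95, and the log format are assumptions; the app's performance-monitor.ts may measure differently.

```typescript
const TARGET_LATENCY_MS = 50;

// Hypothetical tracker for capture-to-submission latency samples.
class LatencyTracker {
  private samples: number[] = [];

  record(capturedAtMs: number, submittedAtMs: number): void {
    this.samples.push(submittedAtMs - capturedAtMs);
  }

  // Nearest-rank percentile (p in 0..100); NaN when no samples exist.
  percentile(p: number): number {
    if (this.samples.length === 0) {
      return NaN;
    }
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    const index = Math.min(Math.max(rank, 1), sorted.length) - 1;
    return sorted[index] ?? NaN;
  }

  report(): string {
    const p95 = this.percentile(95);
    const status = p95 <= TARGET_LATENCY_MS ? "within" : "over";
    return `[PerformanceMonitor] p95=${p95.toFixed(1)}ms (${status} ${TARGET_LATENCY_MS}ms target)`;
  }
}
```

In the renderer, timestamps would usually come from performance.now() taken when a chunk is captured and again when it is handed to the connection.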
This project is provided as-is for personal and commercial use. See the LICENSE file for details.
Contributions are welcome. Please ensure:
- TypeScript strict mode compliance
- React best practices
- Proper error handling
- Performance-conscious code
- Google Gemini API for speech-to-text capabilities
- Electron and React communities
- Opus codec developers
The Windows installer can be code-signed by setting the following environment variables before npm run build:
- WIN_CSC_LINK - Path to a .pfx certificate file (or base64-encoded PFX content)
- WIN_CSC_KEY_PASSWORD - Password for the certificate
Without these set, builds proceed unsigned. Unsigned installers will trigger SmartScreen warnings on user systems and should be treated as dev-only artifacts.
The signing config lives under win.signtoolOptions in electron-builder.yml and uses DigiCert's RFC3161 timestamp server so signatures stay valid after the certificate expires.
Auto-update uses GitHub Releases via electron-updater. The publish channel is configured in electron-builder.yml:
publish:
  provider: github
  owner: quangtruong2003
  repo: VoiceToPrompt
  releaseType: release
Releases must be published with electron-builder --publish always for the updater to find them. The release artifact must include latest.yml alongside the installer for electron-updater to detect the new version.
The app currently exposes two update paths in the main process. Both are intentionally preserved while the v2 path is validated:
- Legacy custom flow (check-for-update): fetches releases directly from api.github.com and shows a custom progress dialog. Default user-facing path; runs on startup when autoUpdate !== false.
- electron-updater (v2) flow (check-for-update-v2, download-update, install-update): uses the electron-updater library against the configured GitHub Releases publish channel. Opt-in; no startup auto-trigger yet.
Migration of the renderer to the v2 path is a follow-up task once the publish channel is verified end-to-end (signed installer + latest.yml + updater detection).
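The legacy flow's core decision is whether the latest GitHub release is newer than the running version. This sketch assumes the legacy flow queries the same quangtruong2003/VoiceToPrompt repository as the publish config, that tags look like v1.2.3, and that pre-release suffixes can be ignored; checkLatest is a hypothetical helper.

```typescript
const LATEST_RELEASE_URL =
  "https://api.github.com/repos/quangtruong2003/VoiceToPrompt/releases/latest";

// "v1.2.3-beta.1" -> [1, 2, 3]; non-numeric segments count as 0.
function parseVersion(tag: string): number[] {
  const core = tag.replace(/^v/i, "").split("-")[0] ?? "";
  return core.split(".").map((n) => Number.parseInt(n, 10) || 0);
}

// True only when latestTag is strictly newer than currentVersion.
function isNewer(latestTag: string, currentVersion: string): boolean {
  const a = parseVersion(latestTag);
  const b = parseVersion(currentVersion);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) {
      return diff > 0;
    }
  }
  return false;
}

// Network call against the GitHub REST API; returns the newer tag, or null.
async function checkLatest(currentVersion: string): Promise<string | null> {
  const res = await fetch(LATEST_RELEASE_URL, {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) {
    return null;
  }
  const release = (await res.json()) as { tag_name: string };
  return isNewer(release.tag_name, currentVersion) ? release.tag_name : null;
}
```

In the main process, currentVersion would come from app.getVersion().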
electron-builder reads GH_TOKEN from the environment when publishing. Use a token with public_repo permission (or repo for private releases).
$env:WIN_CSC_LINK = "C:\path\to\cert.pfx"
$env:WIN_CSC_KEY_PASSWORD = "..."
$env:GH_TOKEN = "ghp_..."
npm run build
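A small pre-build check can make an unsigned or unpublishable build obvious before electron-builder runs. This is a hypothetical helper, not part of the repository; it only inspects the variable names documented above, and treating a certificate path without a password (or vice versa) as a misconfiguration is an assumption.

```typescript
interface ReleaseEnvStatus {
  signed: boolean;
  canPublish: boolean;
  warnings: string[];
}

// Inspect release-related environment variables and explain what the build will produce.
function checkReleaseEnv(env: Record<string, string | undefined>): ReleaseEnvStatus {
  const warnings: string[] = [];
  const hasLink = Boolean(env.WIN_CSC_LINK);
  const hasPassword = Boolean(env.WIN_CSC_KEY_PASSWORD);
  if (hasLink !== hasPassword) {
    warnings.push("Set both WIN_CSC_LINK and WIN_CSC_KEY_PASSWORD, or neither.");
  }
  const signed = hasLink && hasPassword;
  if (!signed) {
    warnings.push("Installer will be unsigned: treat it as a dev-only artifact (SmartScreen will warn).");
  }
  const canPublish = Boolean(env.GH_TOKEN);
  if (!canPublish) {
    warnings.push("GH_TOKEN is not set; electron-builder cannot publish to GitHub Releases.");
  }
  return { signed, canPublish, warnings };
}

// Example, run before `npm run build`:
// checkReleaseEnv(process.env).warnings.forEach((w) => console.warn(w));
```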