Voice to Prompt

A high-performance voice dictation application for Windows that converts speech to text in real time using Google's Gemini API. Built with Electron, React, and TypeScript.

Overview

Voice to Prompt is a desktop application designed for seamless speech-to-text transcription. It runs as a system tray application with a floating overlay window, allowing users to dictate text directly into any application. The application features low-latency audio processing, multi-provider API support (Google Gemini, Antigravity, and custom endpoints), and automatic text injection into the active application.

Core Features

Recording & Transcription

Real-time Voice Recording - Capture audio from the microphone with settings tuned for speech recognition (16 kHz, mono, Opus encoding)
Automatic Transcription - Send recorded audio to the selected AI provider and receive the transcribed text
Multiple API Providers - Support for Google Gemini, Antigravity, and custom API endpoints
Language Support - Configurable transcription language to improve accuracy

User Interface

Floating Overlay Window - Compact, always-on-top recording indicator that stays out of your way
System Tray Integration - Runs silently in the system tray with context menu controls
Settings Panel - Comprehensive configuration for API keys, language, custom prompts, and endpoint settings
Recording Indicator - Visual feedback showing recording status and duration

Automation

Global Shortcut - Press Super+Alt+H to toggle recording from anywhere in the system
Enter Key Stop - Press Enter to immediately stop recording and process audio
Automatic Text Injection - Pastes the transcribed text into the active application by placing it on the clipboard and simulating a paste keystroke
Multi-language Support - Optimized for Vietnamese with support for other languages
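The clipboard-based text injection described above can be sketched as follows. This is a hypothetical outline, not the app's actual code: `TextInjectionDeps`, `injectText`, the 150 ms delay, and restoring the user's previous clipboard contents are all assumptions. The platform calls are injected so the same logic could be backed by Electron's clipboard and nut-js key presses in the app, or by fakes in a test.

```typescript
// Hypothetical dependency interface; the real app may wire these to Electron's
// clipboard module and to nut-js keyboard automation.
interface TextInjectionDeps {
  readClipboard: () => string;
  writeClipboard: (text: string) => void;
  pressPaste: () => Promise<void>; // e.g. simulate Ctrl+V in the focused window
  delay: (ms: number) => Promise<void>;
}

// Puts the transcript on the clipboard, pastes it into the active window,
// then puts back whatever the user had on the clipboard before.
async function injectText(
  text: string,
  deps: TextInjectionDeps,
  restoreDelayMs = 150, // assumption: give the target app time to read the clipboard
): Promise<void> {
  if (text.length === 0) return; // nothing to paste
  const previous = deps.readClipboard();
  deps.writeClipboard(text);
  try {
    await deps.pressPaste();
    await deps.delay(restoreDelayMs);
  } finally {
    deps.writeClipboard(previous); // restore even if the paste fails
  }
}
```

In Electron, `readClipboard` and `writeClipboard` could map to `clipboard.readText()` and `clipboard.writeText()`, and `pressPaste` to nut-js `keyboard.pressKey(Key.LeftControl, Key.V)` followed by `keyboard.releaseKey(Key.LeftControl, Key.V)`.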

Performance

Low-Latency Audio Pipeline - Target latency under 50 ms from audio capture to API submission
Connection Pooling - Pre-warmed WebSocket connections that avoid connection setup delay on each request
Audio Compression - Opus encoding at 24 kbps for efficient bandwidth usage
Process Priority Elevation - High-priority audio processing for consistent performance
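The pre-warmed connection pooling listed above could look roughly like this generic pool. It is a sketch under assumptions: `WarmPool`, its `open`/`close` callbacks, and the last-in-first-out reuse policy are hypothetical, and a real WebSocket pool would also need to detect and discard sockets that closed while idle. Opening connections ahead of time moves the TCP/TLS handshake out of the path between the end of recording and the API request.

```typescript
// Hypothetical generic pool: keeps already-opened connections idle so a
// request can reuse one instead of paying connection setup at send time.
class WarmPool<T> {
  private readonly idle: T[] = [];
  private readonly open: () => Promise<T>;
  private readonly close: (conn: T) => void;
  private readonly maxIdle: number;

  constructor(open: () => Promise<T>, close: (conn: T) => void, maxIdle = 2) {
    this.open = open; // e.g. opens a WebSocket (assumption)
    this.close = close;
    this.maxIdle = maxIdle;
  }

  // Open connections ahead of time, e.g. at startup or when recording begins.
  async warm(count = this.maxIdle): Promise<void> {
    const needed = Math.max(0, Math.min(count, this.maxIdle) - this.idle.length);
    const opened = await Promise.all(Array.from({ length: needed }, () => this.open()));
    this.idle.push(...opened);
  }

  // Reuse a warm connection if one is available; otherwise open a new one.
  async acquire(): Promise<T> {
    const conn = this.idle.pop();
    return conn !== undefined ? conn : this.open();
  }

  // Return a connection for reuse, or close it if the pool is already full.
  release(conn: T): void {
    if (this.idle.length < this.maxIdle) this.idle.push(conn);
    else this.close(conn);
  }

  get idleCount(): number {
    return this.idle.length;
  }
}
```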

Technology Stack

Frontend

React 18 - UI framework for component-based interface
TypeScript - Type-safe development
Vite - Fast build tool and dev server

Desktop

Electron 33 - Cross-platform desktop application framework
Electron Builder - Application packaging and distribution

Audio Processing

Web Audio API - Real-time audio capture and processing
MediaRecorder API - Audio chunking and encoding
Opus Codec - High-efficiency audio compression

AI Integration

Google Gemini API - Primary speech-to-text provider
Antigravity API - Alternative provider option
Custom Endpoint - Support for self-hosted or third-party APIs

Utilities

@nut-tree-fork/nut-js - Keyboard automation for text injection

Prerequisites

Before building and running the application, ensure you have the following installed:

Node.js (v18 or higher)
npm (v9 or higher)
Windows 10/11 (64-bit)
Microphone - Required for voice input
API Key - Google Gemini API key (or compatible provider)

Installation

1. Clone the Repository

git clone <repository-url>
cd voice-to-text

2. Install Dependencies

npm install

3. Configure API Key

Create a .env file in the project root directory:

VITE_GEMINI_API_KEY=your_api_key_here

Alternatively, you can enter your API key through the application settings panel after running the app.
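One way the two key sources could be combined is sketched below. `resolveApiKey` is a hypothetical helper, and the precedence (a key saved in the settings panel wins over the .env value) is an assumption. Keep in mind that Vite exposes only `VITE_`-prefixed variables to client code through `import.meta.env` and substitutes them statically at build time, so a key supplied through .env ends up embedded in the built app; entering it in the settings panel at runtime keeps it out of the bundle.

```typescript
// Hypothetical helper. In the renderer, `env` would be `import.meta.env`;
// it is a parameter here so the logic can run outside Vite.
function resolveApiKey(
  settingsKey: string | undefined,
  env: Record<string, string | undefined>,
): string | null {
  const fromSettings = settingsKey?.trim();
  if (fromSettings) return fromSettings; // assumption: the settings panel takes precedence
  const fromEnv = env.VITE_GEMINI_API_KEY?.trim();
  // null means no key is configured (the "No API Key" case in Troubleshooting)
  return fromEnv ? fromEnv : null;
}
```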

4. Run in Development Mode

npm run dev

5. Build for Production

To create a Windows executable:

npm run build

The built executable will be located in the release directory.

Configuration

API Settings

Setting	Description	Default
API Type	Provider selection (Google/Antigravity/Custom)	Google
API Key	Authentication key for the selected provider	-
Custom Endpoint	URL for custom API endpoint	-
Language	Transcription language code	Vietnamese (vi)
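The API settings table maps onto a typed settings object with defaults. The field names, `DEFAULT_SETTINGS`, `withDefaults`, and `validateSettings` below are hypothetical and not taken from the app's types.ts; `customPrompt` corresponds to the Custom Prompt option in the settings panel.

```typescript
// Hypothetical settings shape mirroring the API Settings table.
type ApiType = "google" | "antigravity" | "custom";

interface ApiSettings {
  apiType: ApiType;
  apiKey: string;
  customEndpoint: string;
  language: string; // ISO 639-1 code, e.g. "vi"
  customPrompt: string;
}

const DEFAULT_SETTINGS: ApiSettings = {
  apiType: "google",
  apiKey: "", // no default: the user must supply a key
  customEndpoint: "",
  language: "vi", // Vietnamese, per the table
  customPrompt: "",
};

// Fill in fields missing from stored settings (e.g. after an upgrade).
function withDefaults(stored: Partial<ApiSettings>): ApiSettings {
  return {
    apiType: stored.apiType ?? DEFAULT_SETTINGS.apiType,
    apiKey: stored.apiKey ?? DEFAULT_SETTINGS.apiKey,
    customEndpoint: stored.customEndpoint ?? DEFAULT_SETTINGS.customEndpoint,
    language: stored.language ?? DEFAULT_SETTINGS.language,
    customPrompt: stored.customPrompt ?? DEFAULT_SETTINGS.customPrompt,
  };
}

// Returns a list of problems; an empty list means the settings are usable.
function validateSettings(s: ApiSettings): string[] {
  const problems: string[] = [];
  if (!s.apiKey.trim()) problems.push("API key is required");
  if (s.apiType === "custom" && !s.customEndpoint.trim()) {
    problems.push("Custom API type requires an endpoint URL");
  }
  return problems;
}
```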

Audio Settings

Parameter	Value	Description
Sample Rate	16000 Hz	Optimized for speech recognition
Channels	1 (Mono)	Voice-optimized
Codec	Opus	Low-latency compression
Bitrate	24 kbps	Voice-optimized quality
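The audio parameters above could be expressed as getUserMedia constraints and MediaRecorder options, together with a payload estimate. This is a sketch: the `audio/webm;codecs=opus` container is an assumption (Chromium's MediaRecorder commonly records Opus into WebM), the helper names are hypothetical, and plain constraint values such as `sampleRate` are treated by the browser as ideal values rather than guarantees.

```typescript
// Values from the Audio Settings table.
const SAMPLE_RATE_HZ = 16000;
const CHANNELS = 1;
const BITRATE_BPS = 24000;
const MIME_TYPE = "audio/webm;codecs=opus"; // assumption: container Chromium records Opus into

// Shape accepted by navigator.mediaDevices.getUserMedia(); bare values are "ideal" hints.
function captureConstraints() {
  return { audio: { sampleRate: SAMPLE_RATE_HZ, channelCount: CHANNELS } };
}

// Shape accepted by the MediaRecorder constructor.
function recorderOptions() {
  return { mimeType: MIME_TYPE, audioBitsPerSecond: BITRATE_BPS };
}

// Approximate encoded size: bits per second * seconds / 8, ignoring container overhead.
function estimatedBytes(seconds: number): number {
  return Math.round((BITRATE_BPS * seconds) / 8);
}

// If audio is sent inline as base64 JSON, every 3 bytes become 4 characters.
function base64Length(bytes: number): number {
  return Math.ceil(bytes / 3) * 4;
}
```

In the renderer these would be used as `const stream = await navigator.mediaDevices.getUserMedia(captureConstraints())` and `new MediaRecorder(stream, recorderOptions())`, ideally after checking `MediaRecorder.isTypeSupported(MIME_TYPE)`. At 24 kbps a 60-second dictation is about 180,000 bytes, or 240,000 characters once base64-encoded.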

Keyboard Shortcuts

Shortcut	Action
`Super + Alt + H`	Toggle recording (global)
`Enter`	Stop recording and transcribe

Usage

Starting the Application

Launch the application - it will minimize to the system tray
Click the tray icon to open the settings panel
Configure your API key and preferences
Close settings - the app continues running in the background

Recording Voice

Press Super + Alt + H, or right-click the tray icon and select "Bắt đầu ghi âm" ("Start recording")
A floating overlay window appears indicating recording is active
Speak clearly into your microphone
Press Enter to stop recording and process the audio
The transcribed text is automatically injected into your active application
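The recording flow above behaves like a small state machine: the global shortcut toggles recording, Enter stops it, and the overlay reflects the current state. The sketch below models that under assumptions; `RecordingSession`, the state names, and ignoring input while a transcription is in flight are hypothetical choices, not the app's documented behavior.

```typescript
type RecordingState = "idle" | "recording" | "transcribing";

// Hypothetical session controller; `transcribe` stands in for the
// send-to-provider-and-inject-text step.
class RecordingSession {
  state: RecordingState = "idle";
  private readonly transcribe: () => Promise<void>;

  constructor(transcribe: () => Promise<void>) {
    this.transcribe = transcribe;
  }

  // Super+Alt+H: start when idle, stop when recording, ignored while transcribing.
  toggle(): Promise<void> {
    if (this.state === "idle") {
      this.state = "recording";
      return Promise.resolve();
    }
    return this.stop();
  }

  // Enter: stop and transcribe; does nothing unless currently recording.
  async stop(): Promise<void> {
    if (this.state !== "recording") return;
    this.state = "transcribing";
    try {
      await this.transcribe();
    } finally {
      this.state = "idle"; // back to idle even if transcription fails
    }
  }
}
```

In the main process, the toggle could be wired up with Electron's `globalShortcut.register("Super+Alt+H", () => session.toggle())`, which returns `false` when the shortcut could not be registered.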

Settings Panel

Access the settings panel by:

Right-clicking the system tray icon and selecting "Cài đặt" ("Settings")
Clicking the system tray icon (single click)

Available settings:

API Key Configuration - Enter and validate your API key
Language Selection - Choose the transcription language
Custom Prompt - Add context hints for better transcription
API Type - Select between Google Gemini, Antigravity, or custom endpoints
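The Language and Custom Prompt settings are presumably combined into the instruction sent alongside the audio. A hypothetical way to assemble it is shown below; the instruction wording, the `LANGUAGE_NAMES` table, and `buildTranscriptionPrompt` are assumptions, not the app's actual prompt.

```typescript
// Hypothetical language-name lookup; unknown codes are passed through as-is.
const LANGUAGE_NAMES: Record<string, string> = { vi: "Vietnamese", en: "English" };

function buildTranscriptionPrompt(languageCode: string, customPrompt = ""): string {
  const language = LANGUAGE_NAMES[languageCode] ?? languageCode;
  const lines = [
    `Transcribe the attached audio verbatim in ${language}.`,
    "Return only the transcribed text, with no commentary.",
  ];
  const hints = customPrompt.trim();
  if (hints) lines.push(`Context hints: ${hints}`); // Custom Prompt from the settings panel
  return lines.join("\n");
}
```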

Architecture

Audio Pipeline

┌─────────────────────────────────────────────────────────────────────────────┐
│                         LOW-LATENCY AUDIO PIPELINE                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌────────────┐ │
│  │ AudioCapture │───▶│ AudioEncoder │───▶│ StreamBuffer │───▶│  WebSocket │ │
│  │   (Device)   │    │    (Opus)    │    │  (Priority)  │    │  (Pooled)  │ │
│  └──────────────┘    └──────────────┘    └──────────────┘    └────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Component Structure

voice-to-text/
├── electron/                        # Electron main process
│   ├── main.ts                      # Main entry point, window management
│   └── preload.ts                   # Preload script for IPC
├── src/
│   ├── components/                  # React UI components
│   │   ├── OverlayView.tsx          # Recording overlay window
│   │   └── SettingsView.tsx         # Settings panel
│   ├── hooks/                       # Custom React hooks
│   │   └── useAudioRecorder.ts      # Audio recording logic
│   ├── lib/                         # Core libraries
│   │   ├── voice-stream-engine.ts   # Audio processing engine
│   │   ├── http2-stream.ts          # HTTP/2 streaming
│   │   ├── performance-monitor.ts   # Performance metrics
│   │   └── types.ts                 # TypeScript definitions
│   ├── App.tsx                      # Main application component
│   └── main.tsx                     # React entry point
├── public/                          # Static assets
└── release/                         # Built executables

Development

Running in Development Mode

npm run dev

This starts the Vite development server with Electron.

Building for Production

npm run build

This compiles TypeScript, builds the React frontend, and packages the Electron application into an executable.

Project Structure

electron/ - Electron main process code
src/ - React frontend source code
public/ - Static assets (icons, images)
dist/ - Compiled frontend output
dist-electron/ - Compiled Electron output
release/ - Final executable packages

API Integration

Google Gemini API

Default configuration uses Google's Gemini Flash model for transcription:

Endpoint: https://generativelanguage.googleapis.com/v1beta
Model: gemini-3-flash-preview
Authentication: X-goog-api-key header
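A request matching the parameters above might be built as follows. This sketch assumes the audio is sent inline as base64 in a single `generateContent` call, using the REST request shape from the Gemini API documentation; the helper name, the prompt, and the MIME type the app actually sends are assumptions, and the app's own pipeline (which mentions HTTP/2 streaming and pooled connections) may work differently.

```typescript
// Hypothetical helper: builds (but does not send) a generateContent request
// carrying the recorded audio inline as base64.
const GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildGeminiTranscriptionRequest(
  apiKey: string,
  audioBase64: string,
  mimeType: string, // e.g. "audio/ogg"; which format the app sends is an assumption
  prompt: string,
  model = "gemini-3-flash-preview",
): PreparedRequest {
  const body = {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: mimeType, data: audioBase64 } },
        ],
      },
    ],
  };
  return {
    url: `${GEMINI_BASE_URL}/models/${model}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify(body),
    },
  };
}
```

The request could then be sent with `fetch(url, init)`; in a successful JSON response the transcript is at `candidates[0].content.parts[0].text`. Inline audio suits short dictation clips; the Gemini documentation directs larger uploads to the Files API.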

Antigravity API

Alternative provider with Bearer token authentication:

Endpoint: Configurable (default: https://api.antigravity.app)
Authentication: Authorization: Bearer <api_key>

Custom Endpoint

For self-hosted or third-party APIs:

Endpoint: User-configured URL
Authentication: Bearer token (same format as Antigravity)
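Because the three providers differ only in base URL and authentication header, the selection logic is small. The sketch below encodes the rules listed in this section; the function names and the trailing-slash handling are hypothetical.

```typescript
type ApiType = "google" | "antigravity" | "custom";

const GOOGLE_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";
const ANTIGRAVITY_DEFAULT_URL = "https://api.antigravity.app";

// Authentication header per provider.
function authHeaders(apiType: ApiType, apiKey: string): Record<string, string> {
  if (apiType === "google") return { "x-goog-api-key": apiKey };
  return { Authorization: `Bearer ${apiKey}` }; // Antigravity and custom endpoints
}

// Base URL per provider; Antigravity falls back to its default when none is configured.
function baseUrl(apiType: ApiType, configuredEndpoint = ""): string {
  const endpoint = configuredEndpoint.trim().replace(/\/+$/, ""); // drop trailing slashes
  switch (apiType) {
    case "google":
      return GOOGLE_BASE_URL;
    case "antigravity":
      return endpoint || ANTIGRAVITY_DEFAULT_URL;
    case "custom":
      if (!endpoint) throw new Error("Custom API type requires an endpoint URL");
      return endpoint;
  }
}
```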

Troubleshooting

Common Issues

No API Key

Solution: Enter your API key in the settings panel, or set VITE_GEMINI_API_KEY in a .env file in the project root

Microphone Not Detected

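Solution: Ensure microphone access is enabled in Windows Settings (Privacy & security > Microphone on Windows 11, Privacy > Microphone on Windows 10), including the option that lets desktop apps access your microphone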

Text Not Being Injected

Solution: Check that no other application is blocking clipboard access

Recording Not Starting

Solution: Check that no other application has already registered Super + Alt + H; a global shortcut cannot be registered while another program holds it

Logs

Application logs are available in the developer console when running in development mode. Check for:

[VoiceStreamEngine] - Audio processing logs
[ConnectionPool] - API connection logs
[PerformanceMonitor] - Performance metrics

License

This project is provided as-is for personal and commercial use. See the LICENSE file for details.

Contributing

Contributions are welcome. Please ensure:

TypeScript strict mode compliance
React best practices
Proper error handling
Performance-conscious code

Acknowledgments

Google Gemini API for speech-to-text capabilities
Electron and React communities
Opus codec developers

Release & Auto-Update

Code Signing

The Windows installer can be code-signed by setting the following environment variables before npm run build:

WIN_CSC_LINK — Path to a .pfx certificate file (or base64-encoded PFX content)
WIN_CSC_KEY_PASSWORD — Password for the certificate

Without these set, builds proceed unsigned. Unsigned installers will trigger SmartScreen warnings on user systems and should be treated as dev-only artifacts.

The signing config lives under win.signtoolOptions in electron-builder.yml and uses DigiCert's RFC3161 timestamp server so signatures stay valid after the certificate expires.

Auto-Update

Auto-update uses GitHub Releases via electron-updater. The publish channel is configured in electron-builder.yml:

publish:
  provider: github
  owner: quangtruong2003
  repo: VoiceToPrompt
  releaseType: release

Releases must be published with electron-builder --publish always for the updater to find them. The release artifact must include latest.yml alongside the installer for electron-updater to detect the new version.

The app currently exposes two update paths in the main process. Both are intentionally preserved while the v2 path is validated:

Legacy custom flow (check-for-update) — fetches releases directly from api.github.com and shows a custom progress dialog. Default user-facing path; runs on startup when autoUpdate !== false.
electron-updater (v2) flow (check-for-update-v2, download-update, install-update) — uses the electron-updater library against the configured GitHub Releases publish channel. Opt-in; no startup auto-trigger yet.

Migration of the renderer to the v2 path is a follow-up task once the publish channel is verified end-to-end (signed installer + latest.yml + updater detection).
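The three v2 IPC channels could be registered roughly as below. To keep the sketch testable without Electron, `IpcLike` and `UpdaterLike` are hypothetical minimal interfaces covering only what is used; in the app they would be Electron's `ipcMain` and electron-updater's `autoUpdater`, whose `checkForUpdates()`, `downloadUpdate()`, and `quitAndInstall()` methods they mirror. Setting `autoDownload = false` is an assumption that fits a flow with a separate download-update step.

```typescript
// Minimal structural stand-ins for Electron's ipcMain and electron-updater's autoUpdater.
interface IpcLike {
  handle(channel: string, listener: (...args: unknown[]) => unknown): void;
}

interface UpdaterLike {
  autoDownload: boolean;
  checkForUpdates(): Promise<unknown>;
  downloadUpdate(): Promise<unknown>;
  quitAndInstall(): void;
}

// Registers the v2 channels. Nothing runs at startup: the flow stays
// opt-in until the renderer invokes check-for-update-v2.
function registerUpdateV2Handlers(ipc: IpcLike, updater: UpdaterLike): void {
  updater.autoDownload = false; // download only when the renderer asks (assumption)
  ipc.handle("check-for-update-v2", () => updater.checkForUpdates());
  ipc.handle("download-update", () => updater.downloadUpdate());
  ipc.handle("install-update", () => updater.quitAndInstall());
}
```

In the main process this would be called once as `registerUpdateV2Handlers(ipcMain, autoUpdater)`, with `ipcMain` imported from `electron` and `autoUpdater` from `electron-updater`.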

electron-builder reads GH_TOKEN from the environment when publishing. Use a token with the public_repo scope (or repo for private repositories). Example PowerShell session that sets the signing and publishing variables before building:

$env:WIN_CSC_LINK = "C:\path\to\cert.pfx"
$env:WIN_CSC_KEY_PASSWORD = "..."
$env:GH_TOKEN = "ghp_..."
npm run build
