Native Speech Generation for NVDA

Author: Muhammad Gagah muha.aku@gmail.com

Native Speech Generation is an NVDA add-on that integrates Google Gemini AI to generate high-quality, natural-sounding speech directly within NVDA. It provides a clean, fully accessible interface for converting text into audio, supporting both single-speaker narration and dynamic multi-speaker dialogues.

This add-on is designed for smooth workflows, accessibility-first interaction, and flexible voice control suitable for narration, dialogue, and audio content production.

Features

High-Quality Speech Generation

Choose between:
- Gemini Flash 3.1 Preview: Powerful, low-latency speech generation, very good for short audio.
- Gemini Flash 2.5: Standard quality, fast generation, low latency.
- Gemini Pro 2.5: Premium, more realistic voices (paid model).

Single & Multi-Speaker Modes

Single-speaker narration for standard text-to-speech.
Multi-speaker (2 speakers) mode for dialogues with distinct voices.

Advanced Voice Control

Speaker Naming: Assign custom names (e.g., John, Mary) in multi-speaker mode. The AI automatically maps voices based on speaker names in the script.
Style Instructions: Provide prompts such as “Speak in a cheerful tone” or “Narrate calmly” to guide delivery.
Temperature Control: Adjust output variation and creativity:
- Lower values → more stable and predictable speech.
- Higher values → more expressive and varied speech.

Accessible & Clean Interface

Fully accessible with screen readers.
Advanced options are placed in a collapsible panel to keep the main dialog simple and focused.

Seamless Workflow

Audio plays automatically after generation.
Generated audio can be replayed or saved as a high-quality .wav file.
Designed for minimal friction during repeated generation and playback.
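
As a rough illustration of the save step, the sketch below writes raw PCM audio to a .wav file with Python's standard wave module. The function names and the default format (24 kHz, 16-bit, mono) are assumptions made for this example; this README does not state the exact format the add-on uses.

```python
import wave


def save_pcm_as_wav(target, pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Write raw PCM bytes to a .wav file or file-like object.

    The default format values are assumptions for this sketch.
    """
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


def pcm_duration_seconds(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Return the playback length of a raw PCM buffer in seconds."""
    return len(pcm_bytes) / (sample_rate * channels * sample_width)
```

Because wave.open accepts either a path or an open binary file object, the same helper can write to disk or to an in-memory buffer.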

Smart Voice Loading & Caching

Available voices are fetched dynamically from the Gemini API.
Voice data is cached for 24 hours to reduce API calls and speed up startup.
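
The 24-hour cache can be pictured as a timestamp check on a small JSON file. This is a minimal sketch, not the add-on's actual code: the file layout (saved_at, voices) and all function names are hypothetical.

```python
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as described above


def is_cache_fresh(saved_at, now=None, ttl=CACHE_TTL_SECONDS):
    """Return True when the cache entry is younger than the TTL."""
    now = time.time() if now is None else now
    return 0 <= now - saved_at < ttl


def save_voices(cache_path, voices, now=None):
    """Store the voice list together with the time it was fetched."""
    payload = {"saved_at": time.time() if now is None else now, "voices": voices}
    Path(cache_path).write_text(json.dumps(payload), encoding="utf-8")


def load_cached_voices(cache_path, now=None):
    """Return the cached voice list, or None when missing, unreadable, or expired."""
    try:
        payload = json.loads(Path(cache_path).read_text(encoding="utf-8"))
        saved_at = float(payload["saved_at"])
        voices = payload["voices"]
    except (OSError, ValueError, KeyError, TypeError):
        return None
    return voices if is_cache_fresh(saved_at, now) else None
```

A caller would try load_cached_voices first and only query the API (then call save_voices) when it returns None.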

Talk With AI (Live Conversation)

Real-time Voice Chat: Have a natural, low-latency spoken conversation with Gemini.
Grounding with Google Search: Enable the AI to access real-time information from the web during your chat.
Interruptible: You can interrupt the AI at any time by speaking or pressing "Stop Conversation".
Customizable: Uses your selected voice and style instructions.
Thinking Level Control: Choose No Thinking, Low, Medium, or High depending on the reasoning depth you want.
Reconnect Continuity: Recent conversation context is restored automatically after a reconnect, without a separate memory toggle.
More Stable Streaming: Improved reconnection behavior (backoff + retry) and adaptive audio buffering for better resilience on unstable networks.
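
The "backoff + retry" idea mentioned above is commonly implemented as exponential backoff. The sketch below shows that general technique under assumed values (1 second base delay, 30 second cap, 5 retries); the add-on's real timings and function names are not documented here, so treat every name and number as illustrative.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=30.0, jitter=0.0, rng=random.random):
    """Yield the wait time in seconds before each retry attempt."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Optional jitter spreads out reconnects; 0.0 keeps delays deterministic.
        yield delay + delay * jitter * rng()


def retry_with_backoff(connect, sleep=time.sleep, **kwargs):
    """Call connect() once, then retry after each backoff delay until it succeeds."""
    try:
        return connect()
    except ConnectionError as error:
        last_error = error
    for delay in backoff_delays(**kwargs):
        sleep(delay)
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
    raise ConnectionError("reconnect failed") from last_error
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.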

Requirements

NVDA 2024.1 or newer, tested through NVDA 2026.1.
Active internet connection.
A valid Google Gemini API Key.

Installation

Download the latest add-on package from the Releases page: https://github.com/MuhammadGagah/native-speech-generation/releases
Install it like any standard NVDA add-on.
Restart NVDA when prompted.

API Key Setup (Required)

Create an API key from Google AI Studio: https://aistudio.google.com/apikey
Open NVDA and go to: NVDA Menu → Tools → Native Speech Generation
Click “API Key Settings”.
This opens NVDA Settings directly in the Native Speech Generation category.
Paste your Gemini API Key into the GEMINI API Key field.
Click OK to save.

Saved keys are stored securely using Windows DPAPI, so the encrypted value cannot be decrypted on a different Windows machine or user account.

For advanced deployments, you can also provide the key through the GEMINI_API_KEY environment variable. The add-on will use it automatically when no stored key is available.
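
The lookup order described above (stored key first, then the GEMINI_API_KEY environment variable) could be sketched as follows. The function name resolve_api_key is hypothetical, and the stored key is assumed to be already decrypted by the caller.

```python
import os


def resolve_api_key(stored_key, environ=None):
    """Prefer the stored key; fall back to GEMINI_API_KEY; return None if neither is set."""
    environ = os.environ if environ is None else environ
    if stored_key and stored_key.strip():
        return stored_key.strip()
    env_key = environ.get("GEMINI_API_KEY", "").strip()
    return env_key or None
```

For example, resolve_api_key(None) returns the value of GEMINI_API_KEY when that variable is set, and None otherwise.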

How to Use

Open the dialog using:

NVDA+Control+Shift+G, or
NVDA Menu → Tools → Native Speech Generation

Main Interface Elements

Text to convert: Enter or paste the text you want to convert to speech.
Style instructions (optional): Provide guidance for tone, emotion, or delivery.
Select Model:
- Flash 3.1 Preview
- Flash 2.5
- Pro 2.5 (High Quality)
Speaker Mode:
- Single-speaker
- Multi-speaker (2)

Generating Speech

Single-Speaker Mode

Select Single-speaker.
Choose a voice from the Select Voice dropdown.
Enter your text.
Optionally add style instructions.
Click Generate Speech.
The audio will play automatically after generation.

Multi-Speaker Mode

Select Multi-speaker (2).
For each speaker:
- Enter a unique Speaker Name.
- Select a distinct Voice.
Format the text so each line starts with the speaker name followed by a colon.

Example:

Alice: Hi Bob, how are you today?
Bob: I'm doing great, Alice! The weather is fantastic.

Click Generate Speech. Voices will be assigned automatically based on the speaker names.
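
If you prepare scripts outside the dialog, a quick pre-check of the "Name: text" format can catch typos in speaker names before generation. The sketch below is a hypothetical helper, not part of the add-on (which leaves the mapping to the AI); matching names case-insensitively is an assumption of this example.

```python
def parse_dialogue(script, speakers):
    """Split 'Name: text' lines into (speaker, text) pairs.

    Raises ValueError for a non-empty line that does not start with a known name.
    """
    known = {name.strip().lower(): name.strip() for name in speakers}
    turns = []
    for line_number, raw_line in enumerate(script.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # blank lines between turns are ignored
        name, separator, text = line.partition(":")
        key = name.strip().lower()
        if not separator or key not in known:
            raise ValueError(
                f"line {line_number}: expected one of {sorted(known.values())} followed by a colon"
            )
        turns.append((known[key], text.strip()))
    return turns
```

Running it on the example script above with speakers ["Alice", "Bob"] yields two turns, one per speaker.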

Talk With AI (Live Mode)

Experience a natural, two-way voice conversation with Gemini.

Configure your desired Voice and Style Instructions in the main dialog. Note: Talk With AI currently supports Single-speaker mode only.
Click Talk With AI.
In the new window:
- Start Conversation: Begins the session. Speak into your microphone.
- Stop Conversation: Ends the session.
- Grounding with Google Search: Check this box to allow Gemini to search the web for answers (e.g., current news, weather).
  - Note: This checkbox is hidden while a conversation is active. Stop the conversation to change it.
- Thinking level: Choose No Thinking, Low, Medium, or High.
- Microphone Toggle: Mute/Unmute your microphone.
- Volume: Adjust the AI's playback volume.

Advanced Settings

Enable Advanced Settings (Temperature) to show the slider.
Temperature Range:
- 0.0 → Most deterministic and stable.
- 1.0 → Default balance.
- 2.0 → Most creative and varied.
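
When a temperature value comes from a script or config rather than the slider, it helps to keep it inside the documented 0.0 to 2.0 range. A minimal sketch, with the hypothetical helper name clamp_temperature and the default of 1.0 taken from the list above:

```python
def clamp_temperature(value, low=0.0, high=2.0, default=1.0):
    """Return value limited to [low, high]; fall back to default for invalid input."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    if number != number:  # NaN is the only float that is not equal to itself
        return default
    return max(low, min(high, number))
```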

Buttons Overview

Generate Speech - Start speech generation.
Play - Replay the last generated audio.
Talk With AI - Open the real-time voice conversation interface.
Save Audio - Save the last audio as a .wav file.
API Key Settings - Open the add-on configuration in NVDA Settings.
View voices in AI Studio - Opens Google AI Studio in a browser.
Close - Close the dialog (or press Escape).

Input Gestures

Customizable via: NVDA Menu → Preferences → Input Gestures → Native Speech Generation

Default gesture:

NVDA+Control+Shift+G – Open Native Speech Generation dialog.

Development & Contribution Guide

If you want to develop or modify this add-on, follow the steps below.

Environment Setup

Python matching your target NVDA runtime:
- Use Python 3.13 64-bit when testing or packaging dependencies for NVDA 2026.1 and newer.
- Use Python 3.11 32-bit only when packaging dependencies for older supported NVDA builds.
uv for the pinned build and lint toolchain.
```
uv sync
uv run pre-commit run --all-files
uv run scons
uv run scons pot
```
SCons 4.10.1, Markdown 3.10, Ruff 0.14.10, Pyright 1.1.407, and the other build tools are installed from uv.lock.
GNU Gettext Tools (optional, recommended for localization)
- Usually preinstalled on Linux/Cygwin.
- Windows: https://gnuwin32.sourceforge.net/downlinks/gettext.php

Additional Dependencies

For local development only, install the audio-only Talk With AI dependencies directly into the add-on library path using the Python version and architecture that match the NVDA runtime you are testing:

python.exe -m pip install google-genai pyaudio --target "D:/myAdd-on/Native-Speech-Generation/addon/globalPlugins/NativeSpeechGeneration/lib"

Adjust the path according to your local add-on source directory.

For the current audio-only Talk With AI implementation, you do not need opencv-python, pillow, or mss.

For release packages, the add-on downloads the latest verified dependency archive based on the running NVDA version:

lib.zip for NVDA 2025.3.3 and older supported builds.
lib64.zip for NVDA 2026.1 and newer.
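
The version rule in the list above can be expressed as a simple tuple comparison. This is only a sketch: the function name and the (year, major) tuple input are assumptions, and how the add-on actually reads the NVDA version is not shown in this README.

```python
def dependency_archive(nvda_version):
    """Return the archive name for an NVDA version given as (year, major, ...)."""
    year_and_major = tuple(nvda_version[:2])
    # NVDA 2026.1 and newer use the 64-bit archive; older supported builds use lib.zip.
    return "lib64.zip" if year_and_major >= (2026, 1) else "lib.zip"
```

For example, dependency_archive((2025, 3, 3)) selects lib.zip, while dependency_archive((2026, 1)) selects lib64.zip.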

The add-on reads SHA-256 data from the latest GitHub dependency release, using the release asset digest or checksum files. Bundled approved checksums are kept only as a fallback for first-time installs when the latest-release lookup fails. Manual library reinstalls require the latest verified release. The extracted folder is always installed as addon/globalPlugins/NativeSpeechGeneration/lib.
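
Checking a downloaded archive against a published SHA-256 value generally looks like the sketch below. It uses only the standard library; the function names are hypothetical, and accepting an optional "sha256:" prefix on the expected value is an assumption about how the digest may be written.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True when the file's SHA-256 matches the expected hex digest."""
    expected = expected_hex.strip().lower().removeprefix("sha256:")
    return hmac.compare_digest(sha256_of_file(path), expected)
```

An archive that fails this check should be discarded rather than extracted.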

For local development, also copy the following from your Python installation into addon/globalPlugins/NativeSpeechGeneration/lib:

- zoneinfo folder
- secrets.py file

Contributing

Contributions, suggestions, and bug reports are very welcome.

Open an Issue for bugs or feature requests.
Submit a Pull Request for code contributions.

Contact

Email: muha.aku@gmail.com
GitHub: https://github.com/MuhammadGagah
