Herald - Text-to-Speech

A text-to-speech utility for Windows that reads clipboard text aloud using high-quality neural voices. The inverse of Whisper Voice-to-Text.

v0.3.0+ is a C# (.NET 8) rewrite with local neural TTS via Kokoro. The Python version (v0.2.1) remains available.

Why Herald?

Accessibility - Have articles, emails, and documents read aloud while you multitask or rest your eyes.

Productivity - Listen to written content while driving, exercising, or doing other tasks.

Learning - Improve comprehension by engaging both visual and auditory learning styles.

Hands-Free - Works globally across any Windows application with a simple hotkey.

vs. Windows Built-in TTS

| Feature | Herald | Windows TTS |
|---|---|---|
| Voice Quality | Kokoro neural (local) + Edge neural (online) | Basic SAPI voices |
| Speed Range | 100-600+ wpm | Limited range |
| Pause/Resume | ✅ Yes | ❌ No |
| Hotkey Control | ✅ Global hotkeys | ❌ Manual trigger |
| System Tray | ✅ Quick access | ❌ No controls |
| Offline Support | ✅ Kokoro (default) + SAPI fallback | N/A |

Features

  • Kokoro Neural TTS (default): Studio-quality local voices — 27 voices, runs offline, Apache 2.0 licensed
  • Edge Neural Voices: Microsoft Edge neural voices (Aria, Jenny, Guy, Christopher) — requires internet
  • SAPI Fallback: Windows SAPI voices (Zira, David) — always available offline
  • OCR Support: Read text from screenshots and images (Win+Shift+S → Ctrl+Shift+S)
  • Region Capture: Draw a box on screen to OCR and read (Ctrl+Shift+O) - great for PDFs
  • Auto-Copy: Just select text and press Ctrl+Shift+S - no need to Ctrl+C first
  • Global Hotkeys: Works in any application
  • System Tray: Unobtrusive tray icon with menu controls
  • Pause/Resume: Pause mid-speech and resume later
  • Adjustable Speed: 100-600+ wpm depending on engine
  • Settings Persistence: Remembers your voice and speed preferences
  • Verbal Error Alerts: Speaks errors via SAPI fallback so you're never left with silence

Requirements

  • OS: Windows 10/11
  • Internet: Not required once Kokoro has downloaded its model on first use (Kokoro then runs locally). Edge TTS voices always need internet.

Installation

Option 1: Standalone Executable (Easiest)

Download the pre-built executable - no Python or .NET installation required:

  1. Go to Releases
  2. Download Herald.zip from the latest release
  3. Extract and run Herald.exe as Administrator

The executable is portable and self-contained. On first use, Kokoro will download its ONNX model (~320MB).

Option 2: Build from Source (C# — v0.3.0+)

Requires .NET 8 SDK.

# Build
dotnet build cs/Herald.sln

# Run
dotnet run --project cs/Herald

# Publish self-contained executable
dotnet publish cs/Herald -c Release -r win-x64 --self-contained

The published exe is at cs/Herald/bin/Release/net8.0-windows/win-x64/publish/Herald.exe.

Option 3: Run from Source (Python — v0.2.1)

Requires Python 3.10, 3.11, or 3.12.

1. Install Python

Download and install Python from python.org

Make sure to check "Add Python to PATH" during installation.

2. Run the Application

Double-click Launch_Herald.bat - it will:

  • Request administrator privileges (required for global hotkeys)
  • Create a virtual environment (first run only)
  • Install dependencies (first run only)
  • Launch the application

That's it! On subsequent runs, it starts immediately.

3. Usage

Read selected text:

  1. Select any text in any application
  2. Press Ctrl+Shift+S to hear it read aloud (auto-copies selection)

Read from screenshot:

  1. Take a screenshot with Win+Shift+S and select a region
  2. Press Ctrl+Shift+S to OCR and read the text

Read from screen region (great for PDFs):

  1. Press Ctrl+Shift+O to enter region capture mode
  2. Draw a box around the text you want to read
  3. Text is OCR'd and read aloud automatically

Persistent region for PDFs/videos:

  1. Press Ctrl+Shift+M to define a screen region (green border appears)
  2. Press Ctrl+Shift+S anytime to read from that region
  3. Enable "Auto-Read Monitor Region" in tray menu to read automatically when text changes
  4. Press Ctrl+Shift+M again to clear the region
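
The "read automatically when text changes" behaviour can be pictured as a small change detector sitting between OCR and speech. The sketch below is an illustration only, not Herald's actual code: the `ChangeDetector` class and its `update` method are hypothetical names, and it assumes that differences in whitespace alone should not count as a change.

```python
class ChangeDetector:
    """Hypothetical helper: decides whether freshly OCR'd text should be spoken."""

    def __init__(self):
        self._last = None  # last text that was handed to the speaker

    def update(self, ocr_text):
        # Collapse runs of whitespace so OCR jitter (extra spaces, stray
        # newlines) does not look like new content. This is an assumption.
        normalized = " ".join(ocr_text.split())
        if not normalized or normalized == self._last:
            return None  # nothing new to read
        self._last = normalized
        return normalized
```

A polling loop would call `update()` with each OCR result and speak only when the return value is not `None`.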

Controls:

  • Press Ctrl+Shift+P to pause/resume
  • Press Escape to stop
  • Right-click the tray icon to change voice, speed, or hotkeys

4. Auto-Start with Windows (Optional)

To launch automatically when you log in:

Double-click Autostart_Enable.bat (will prompt for admin rights).

To remove auto-start:

Double-click Autostart_Disable.bat (will prompt for admin rights).

Hotkeys

All hotkeys are configurable via the system tray menu and config/settings.json.

| Hotkey | Action |
|---|---|
| Ctrl+Shift+S | Speak selection/clipboard (auto-copies, supports OCR for images) |
| Ctrl+Shift+O | OCR region capture (draw box on screen) |
| Ctrl+Shift+M | Toggle persistent OCR region (for PDFs/videos) |
| Ctrl+Shift+P | Pause/resume |
| Ctrl+Shift+N | Skip to next line |
| Ctrl+Shift+B | Go back to previous line |
| Ctrl+Shift+] | Speed up |
| Ctrl+Shift+[ | Slow down |
| Escape | Stop speaking |
| Ctrl+Shift+Q | Quit application |
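
Because every hotkey is a plain string such as `ctrl+shift+s`, it can help to normalize the strings and check for duplicates before editing config/settings.json by hand. The following is a rough sketch of that idea; `parse_hotkey` and `find_conflicts` are hypothetical helpers, not part of Herald, and the modifier list is an assumption.

```python
MODIFIERS = ("ctrl", "alt", "shift", "win")  # assumed set of modifier names


def parse_hotkey(combo):
    """Split 'ctrl+shift+s' into (frozenset of modifiers, key)."""
    parts = [part.strip().lower() for part in combo.split("+")]
    modifiers = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    if len(keys) != 1 or not keys[0]:
        raise ValueError(f"expected exactly one non-modifier key in {combo!r}")
    return modifiers, keys[0]


def find_conflicts(settings):
    """Return lists of hotkey_* setting names that map to the same combination."""
    seen = {}
    for name, combo in settings.items():
        if name.startswith("hotkey_"):
            seen.setdefault(parse_hotkey(combo), []).append(name)
    return [names for names in seen.values() if len(names) > 1]
```

Modifier order and letter case are ignored, so `Shift+Ctrl+S` and `ctrl+shift+s` count as the same binding.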

Line Navigation

When you press Ctrl+Shift+S, text is split by newlines and read one line at a time. Use Ctrl+Shift+N to skip ahead or Ctrl+Shift+B to go back. This is useful for:

  • Bouncing through code blocks
  • Skipping sections you've already heard
  • Replaying a line you missed
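
One way to picture line navigation is a cursor over the list of lines. The sketch below is illustrative and not taken from Herald's source: `LineCursor` is a hypothetical name, and skipping blank lines is an assumption.

```python
class LineCursor:
    """Hypothetical helper: tracks the current line of the captured text."""

    def __init__(self, text):
        # Split on newlines; blank lines are dropped (an assumption) so that
        # navigation never lands on silence.
        self.lines = [line.strip() for line in text.splitlines() if line.strip()]
        self.index = 0

    def current(self):
        return self.lines[self.index] if self.lines else None

    def next(self):
        # Ctrl+Shift+N: move forward, staying on the last line at the end.
        if self.index < len(self.lines) - 1:
            self.index += 1
        return self.current()

    def prev(self):
        # Ctrl+Shift+B: move back, staying on the first line at the start.
        if self.index > 0:
            self.index -= 1
        return self.current()
```

Replaying a missed line then amounts to calling `prev()` and speaking the returned string again.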

System Tray Menu

Right-click the tray icon to access:

  • Voice (Online): Aria, Jenny, Guy, Christopher (neural voices, requires internet)
  • Voice (Offline): Zira, David (Windows SAPI voices, no internet needed)
  • Speed: Preset speeds (150-900 wpm online, up to 1500 wpm offline)
  • Read Mode: Line by Line (default) or Continuous (reads all text as one block)
  • Line Delay: Add a pause between lines (0-2000ms, only applies in Line by Line mode)
  • Pause/Resume: Toggle when speaking
  • Grab & Speak Selection: Auto-copy and speak when you select text
  • Copy OCR to Clipboard: Save OCR'd text for pasting
  • Auto-Read Monitor Region: Continuously read when text changes in persistent region
  • Hotkeys: Configure all hotkeys (organized by Reading, Navigation, OCR, App categories)
  • Console: Show or hide the console window
  • Show Text Preview: Toggle whether text content appears in console/logs (privacy option)
  • Quit: Exit the application

Configuration

Settings are saved to config/settings.json:

{
  "engine": "edge",
  "voice": "aria",
  "rate": 900,
  "hotkey_speak": "ctrl+shift+s",
  "hotkey_pause": "ctrl+shift+p",
  "hotkey_stop": "escape",
  "hotkey_speed_up": "ctrl+shift+]",
  "hotkey_speed_down": "ctrl+shift+[",
  "hotkey_next": "ctrl+shift+n",
  "hotkey_prev": "ctrl+shift+b",
  "hotkey_ocr": "ctrl+shift+o",
  "hotkey_monitor": "ctrl+shift+m",
  "hotkey_quit": "ctrl+shift+q",
  "line_delay": 0,
  "read_mode": "lines",
  "log_preview": true
}

| Setting | Options | Description |
|---|---|---|
| engine | kokoro, edge, pyttsx3 | TTS engine (kokoro is default in v0.3.0+) |
| voice | heart, bella, michael, emma, aria, jenny, guy, christopher, zira, david | Voice name (varies by engine) |
| rate | 100-1500 | Words per minute (varies by engine) |
| hotkey_speak | ctrl+shift+s, alt+s, f9, alt+r | Speak hotkey |
| hotkey_pause | ctrl+shift+p, alt+p, f10 | Pause hotkey |
| hotkey_stop | escape, f12 | Stop hotkey |
| hotkey_speed_up | ctrl+shift+], alt+] | Speed up hotkey |
| hotkey_speed_down | ctrl+shift+[, alt+[ | Slow down hotkey |
| hotkey_next | ctrl+shift+n, alt+n, f7 | Next line hotkey |
| hotkey_prev | ctrl+shift+b, alt+b, f6 | Previous line hotkey |
| hotkey_ocr | ctrl+shift+o, ctrl+alt+shift+o, alt+o, f8 | OCR capture hotkey |
| hotkey_monitor | ctrl+shift+m, ctrl+alt+shift+m, alt+m, f11 | Monitor region hotkey |
| hotkey_quit | ctrl+shift+q, alt+q | Quit hotkey |
| line_delay | 0-2000 | Milliseconds to pause between lines |
| read_mode | lines, continuous | Line by line or read all text at once |
| log_preview | true, false | Show text content in console/logs |
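
If you edit settings.json by hand, a quick sanity check against the documented ranges can catch typos. The snippet below is a minimal sketch of such a check, not Herald's own loader: `load_settings`, `validate`, and the `DEFAULTS` dictionary are hypothetical, only a subset of keys is validated, and the default `rate` of 200 is a placeholder because the real default is not documented here.

```python
import json
from pathlib import Path

# Assumed defaults for illustration; engine and voice follow the documented
# v0.3.0+ defaults, while the rate value is a placeholder.
DEFAULTS = {
    "engine": "kokoro",
    "voice": "heart",
    "rate": 200,
    "line_delay": 0,
    "read_mode": "lines",
    "log_preview": True,
}
VALID_ENGINES = ("kokoro", "edge", "pyttsx3")
VALID_READ_MODES = ("lines", "continuous")


def _clamp_int(value, low, high, default):
    """Coerce value to an int within [low, high]; fall back to default."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return max(low, min(number, high))


def validate(settings):
    """Fill in missing keys and pull out-of-range values back into range."""
    s = {**DEFAULTS, **settings}
    if s["engine"] not in VALID_ENGINES:
        s["engine"] = DEFAULTS["engine"]
    if s["read_mode"] not in VALID_READ_MODES:
        s["read_mode"] = DEFAULTS["read_mode"]
    s["rate"] = _clamp_int(s["rate"], 100, 1500, DEFAULTS["rate"])
    s["line_delay"] = _clamp_int(s["line_delay"], 0, 2000, DEFAULTS["line_delay"])
    return s


def load_settings(path="config/settings.json"):
    """Read the JSON file; a missing or malformed file yields the defaults."""
    try:
        loaded = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        loaded = {}
    if not isinstance(loaded, dict):
        loaded = {}
    return validate(loaded)
```

Keys the sketch does not know about, such as the `hotkey_*` entries, are passed through unchanged.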

TTS Engines

| Engine | Type | Internet | Voices | Speed Range | Notes |
|---|---|---|---|---|---|
| Kokoro (default) | Local neural | No | 27 | 100-600 wpm | Best quality up to ~260 wpm (1.3x). Quality degrades above that. All values above 600 wpm produce identical output (capped at 3.0x). |
| Edge TTS | Cloud neural | Yes | 4 | 150-900 wpm | Microsoft Edge voices. Not for commercial use. |
| SAPI | Windows built-in | No | 2 | 150-1500 wpm | Basic offline fallback. |

Note: Edge TTS relies on Microsoft Edge's neural voices, which are intended for personal and educational use only. For commercial use, switch to Kokoro (default) or SAPI.

Available Voices

Kokoro (local neural — default)

| Voice | Gender | Accent | Grade |
|---|---|---|---|
| heart (default) | Female | American | A |
| bella | Female | American | A- |
| nicole | Female | American | B- |
| sarah | Female | American | C+ |
| nova, sky, alloy, jessica, kore, aoede, river | Female | American | |
| michael | Male | American | C+ |
| fenrir, puck | Male | American | C+ |
| adam, echo, eric, liam, onyx | Male | American | |
| emma | Female | British | B- |
| alice, isabella, lily | Female | British | |
| daniel, george, lewis, fable | Male | British | |

Edge TTS (online)

| Voice | Description |
|---|---|
| aria | Female, conversational |
| jenny | Female, news anchor |
| guy | Male, friendly |
| christopher | Male, professional |

SAPI (offline)

| Voice | Description |
|---|---|
| zira | Female, Windows default |
| david | Male, Windows default |

Troubleshooting

"No text to speak"

  • Make sure text is selected, or copy it to the clipboard manually (Ctrl+C), before pressing the speak hotkey
  • Some applications use special clipboard formats; try copying from Notepad

Neural voices not working

  • Check your internet connection
  • The app will fall back to offline voices if edge-tts fails
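
The fallback behaviour can be thought of as trying engines in order of preference until one succeeds. The sketch below only illustrates that pattern and is not Herald's implementation: `speak_with_fallback` is a hypothetical function, and each engine is represented by a plain callable that either speaks the text or raises an exception.

```python
def speak_with_fallback(text, engines):
    """Try each (name, speak) pair in order; return the name that worked.

    `engines` is an ordered list such as [("edge", edge_speak), ("sapi", sapi_speak)],
    where each callable is a stand-in for a real TTS backend.
    """
    errors = []
    for name, speak in engines:
        try:
            speak(text)
            return name
        except Exception as exc:  # e.g. no network for an online engine
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

Placing an always-available offline engine last in the list means an error in an earlier engine still produces audible output.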

Hotkeys not working

  • Ensure the application is running as administrator
  • Check for conflicting hotkeys in other applications

Pause doesn't work

  • Pause only works with neural voices (edge-tts)
  • Offline voices (pyttsx3) don't support true pause

Speed limits

  • Kokoro: Best quality up to ~260 wpm (1.3x). Usable up to 600 wpm (3.0x). Values above 600 wpm are capped.
  • Edge TTS: Effective range 150-900 wpm
  • SAPI: Full 150-1500 wpm range
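
The two Kokoro figures above (260 wpm at 1.3x, 600 wpm at 3.0x) are consistent with a linear mapping of 200 wpm per 1.0x. The sketch below works through that arithmetic. The 200 wpm base is inferred from those two data points rather than stated by the project, and the function names and the `ENGINE_WPM_RANGE` table are hypothetical.

```python
KOKORO_BASE_WPM = 200        # inferred: 260 / 1.3 == 600 / 3.0 == 200
KOKORO_MAX_MULTIPLIER = 3.0  # documented cap

# Documented effective ranges per engine, in words per minute.
ENGINE_WPM_RANGE = {
    "kokoro": (100, 600),
    "edge": (150, 900),
    "sapi": (150, 1500),
}


def kokoro_multiplier(wpm):
    """Convert words per minute to a Kokoro speed multiplier, capped at 3.0x."""
    return min(wpm / KOKORO_BASE_WPM, KOKORO_MAX_MULTIPLIER)


def clamp_wpm(engine, wpm):
    """Pull a requested rate into the engine's documented range."""
    low, high = ENGINE_WPM_RANGE[engine]
    return max(low, min(wpm, high))
```

Under this mapping a setting of 900 wpm on Kokoro yields the same 3.0x multiplier as 600 wpm, which matches the note that values above 600 wpm produce identical output.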

Project Structure

herald/
├── cs/                        # C# rewrite (v0.3.0+)
│   ├── Herald/                # WinForms app (hotkeys, OCR, tray, UI)
│   ├── Herald.Tts/            # Shared TTS library (Kokoro, Edge, SAPI)
│   ├── Herald.Tests/          # xUnit tests
│   └── Herald.sln             # Solution file
├── src/                       # Python version (v0.2.1)
│   ├── main.py                # Application entry point
│   ├── tts_engine.py          # TTS engine abstraction
│   ├── tray_app.py            # System tray icon
│   ├── text_grab.py           # Clipboard handling + OCR
│   ├── region_capture.py      # Screen region selection
│   └── config.py              # Settings management
├── config/
│   └── settings.json          # Your settings (shared format, auto-created)
├── Launch_Herald.bat          # Python launcher
└── requirements.txt           # Python dependencies

Related Projects

Herald Pro

Get Herald Pro on the Microsoft Store -- Free, 41 AI neural voices in 6 languages, fully offline. Includes batch audio export, sentence highlighting, voice preview, and EPUB/PDF/DOCX import.

License

MIT License - feel free to use and modify.
