voicepeak-cli

A command-line interface wrapper for VOICEPEAK text-to-speech software with preset management and automatic audio playback.

What's Different from the Original VOICEPEAK Command?

This wrapper adds the following features to the original VOICEPEAK CLI:

  • 🎵 Auto-play with mpv - Automatically plays generated audio when no output file is specified
  • 📝 Voice presets - Save and reuse combinations of narrator, emotions, and pitch settings
  • 📜 Long text support - Automatically splits text longer than 140 characters into chunks and merges the resulting audio
  • 🔧 Advanced playback modes - Choose between batch (generate all → merge → play) or sequential (generate → play one by one)
  • 🔄 Pipe input support - Accept text from stdin: echo "text" | vp
  • 🔇 Clean output - Suppresses technical output by default (use --verbose to see debug info)
  • ⚙️ Configuration file - Store your preferred settings in ~/.config/vp/config.toml

Key Benefits

  • Enhanced Workflow: No need to manually save and play audio files - just run and listen
  • Batch Processing: Handle long documents without worrying about character limits
  • Flexible Input: Works with direct text, files, or piped input from other commands
  • Personalization: Save your favorite voice configurations for consistent results
  • Quiet Output: VOICEPEAK's technical messages stay hidden unless --verbose is given

Requirements

  • macOS
  • VOICEPEAK installed at /Applications/voicepeak.app/
  • mpv for audio playback (install via Homebrew: brew install mpv)
  • ffmpeg for batch mode and multi-chunk file output (install via Homebrew: brew install ffmpeg)

Installation

From crates.io (Recommended)

cargo install voicepeak-cli

From source

  1. Clone this repository
  2. Build and install:
    cargo install --path .

Usage

Basic Usage

# Simple text-to-speech (requires a preset or --narrator)
vp "こんにちは、世界!"

# With explicit narrator
vp "こんにちは、世界!" --narrator "夏色花梨"

# Save to file instead of auto-play
vp "こんにちは、世界!" --narrator "夏色花梨" -o output.wav

# Read from file
vp -t input.txt --narrator "夏色花梨"

# Pipe input
echo "こんにちは、世界!" | vp --narrator "夏色花梨"
cat document.txt | vp -p karin-happy

Using Presets

# List available presets
vp --list-presets

# Use a preset
vp "こんにちは、世界!" -p karin-happy

# Override preset settings
vp "こんにちは、世界!" -p karin-normal --emotion "happy=50"

Voice Controls

# Control speech parameters
vp "こんにちは、世界!" --narrator "夏色花梨" --speed 120 --pitch 50

# List available narrators
vp --list-narrator

# List emotions for a specific narrator
vp --list-emotion "夏色花梨"

Text Length Handling

# Allow automatic text splitting (default)
vp "very long text..."

# Strict mode: reject texts longer than 140 characters
vp "text" --strict-length
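
The README does not say where the 140-character splits fall. As a minimal sketch, assuming chunks break after sentence-ending punctuation when possible and at the hard limit otherwise, the splitting step might look like this in Rust. The function `split_text` and its punctuation list are hypothetical and not taken from the voicepeak-cli source.

```rust
/// Split `text` into chunks of at most `max_chars` characters (not bytes),
/// preferring to break right after sentence-ending punctuation.
/// Hypothetical helper: the real splitting rules of `vp` may differ.
fn split_text(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    // Assumed set of sentence endings (Japanese and ASCII).
    const SENTENCE_ENDS: [char; 6] = ['。', '！', '？', '.', '!', '?'];

    // Work on chars so multi-byte Japanese text is counted correctly.
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let hard_end = usize::min(start + max_chars, chars.len());
        let end = if hard_end == chars.len() {
            // The remainder fits into one chunk.
            hard_end
        } else {
            // Look backwards for the last sentence end inside the window;
            // fall back to the hard limit if there is none.
            chars[start..hard_end]
                .iter()
                .rposition(|c| SENTENCE_ENDS.contains(c))
                .map(|i| start + i + 1)
                .unwrap_or(hard_end)
        };
        let chunk: String = chars[start..end].iter().collect();
        let chunk = chunk.trim();
        if !chunk.is_empty() {
            chunks.push(chunk.to_string());
        }
        start = end;
    }
    chunks
}
```

With `--strict-length`, the equivalent check would simply reject any input whose `text.chars().count()` exceeds 140 instead of calling a splitter.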

Playback Modes

# Batch mode: generate all chunks first, merge, then play (default)
vp "long text" --playback-mode batch

# Sequential mode: generate and play chunks one by one (does not need ffmpeg)
vp "long text" --playback-mode sequential

# Long text file output (uses ffmpeg to merge chunks)
vp "very long text" -o output.wav
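
The difference between the two modes is the order of the generate, merge, and play steps. The sketch below models that ordering only; it does not invoke VOICEPEAK, ffmpeg, or mpv. The names `PlaybackMode`, `Step`, and `plan` are hypothetical and are not taken from the voicepeak-cli source.

```rust
/// The two playback modes named in the README.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PlaybackMode {
    Batch,
    Sequential,
}

/// Abstract steps; a real implementation would run external tools here.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    Generate(usize), // synthesize chunk i
    Merge,           // merge all chunks (the README names ffmpeg for this)
    Play(usize),     // play chunk i on its own
    PlayMerged,      // play the merged audio
}

/// Return the order of steps for `chunk_count` text chunks.
fn plan(mode: PlaybackMode, chunk_count: usize) -> Vec<Step> {
    let mut steps = Vec::new();
    match mode {
        // Batch: generate all -> merge -> play once.
        PlaybackMode::Batch => {
            steps.extend((0..chunk_count).map(Step::Generate));
            steps.push(Step::Merge);
            steps.push(Step::PlayMerged);
        }
        // Sequential: generate -> play, one chunk at a time, no merge step.
        PlaybackMode::Sequential => {
            for i in 0..chunk_count {
                steps.push(Step::Generate(i));
                steps.push(Step::Play(i));
            }
        }
    }
    steps
}
```

Because the sequential plan never contains a merge step, it matches the README's note that sequential playback works without ffmpeg.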

Configuration

Configuration is stored in ~/.config/vp/config.toml. The file is automatically created on first run.

Example Configuration

default_preset = "karin-custom"

[[presets]]
name = "karin-custom"
narrator = "夏色花梨"
emotions = [
    { name = "hightension", value = 10 },
    { name = "sasayaki", value = 20 },
]
pitch = 30
speed = 120

[[presets]]
name = "karin-normal"
narrator = "夏色花梨"
emotions = []

[[presets]]
name = "karin-happy"
narrator = "夏色花梨"
emotions = [{ name = "hightension", value = 50 }]

Configuration Fields

  • default_preset: Optional. Preset to use when no -p option is specified
  • presets: Array of voice presets

Preset Fields

  • name: Unique preset identifier
  • narrator: Voice narrator name
  • emotions: Array of emotion parameters with name and value
  • pitch: Optional pitch adjustment (-300 to 300)
  • speed: Optional speed adjustment (50 to 200)
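
The fields above map naturally onto a small data model. The following is a sketch under assumptions, not the crate's actual types: the struct names, the `i32` value type, and the helpers `validate_preset` and `select_preset` are hypothetical, and TOML parsing is left out to keep the example dependency-free. Only the pitch and speed ranges and the `default_preset` fallback come from the README.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Emotion {
    name: String,
    value: i32,
}

#[derive(Debug, Clone, PartialEq)]
struct Preset {
    name: String,
    narrator: String,
    emotions: Vec<Emotion>,
    pitch: Option<i32>, // optional, -300 to 300
    speed: Option<i32>, // optional, 50 to 200
}

/// Check the documented ranges for the optional fields.
fn validate_preset(p: &Preset) -> Result<(), String> {
    if let Some(pitch) = p.pitch {
        if !(-300..=300).contains(&pitch) {
            return Err(format!("preset '{}': pitch {} is outside -300..=300", p.name, pitch));
        }
    }
    if let Some(speed) = p.speed {
        if !(50..=200).contains(&speed) {
            return Err(format!("preset '{}': speed {} is outside 50..=200", p.name, speed));
        }
    }
    Ok(())
}

/// Pick the preset named by `-p`, or fall back to `default_preset`.
/// Returns None when neither is set or the name is unknown.
fn select_preset<'a>(
    presets: &'a [Preset],
    requested: Option<&str>,
    default_preset: Option<&str>,
) -> Option<&'a Preset> {
    let wanted = requested.or(default_preset)?;
    presets.iter().find(|p| p.name == wanted)
}
```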

Command-Line Options

Usage: vp [OPTIONS] [TEXT]

Arguments:
  [TEXT]  Text to say (or pipe from stdin)

Options:
  -t, --text <FILE>              Text file to say
  -o, --out <FILE>               Path of output file (optional - will play with mpv if not specified)
  -n, --narrator <NAME>          Name of voice
  -e, --emotion <EXPR>           Emotion expression (e.g., happy=50,sad=50)
  -p, --preset <NAME>            Use voice preset
      --list-narrator            Print voice list
      --list-emotion <NARRATOR>  Print emotion list for given voice
      --list-presets             Print available presets
      --speed <VALUE>            Speed (50 - 200)
      --pitch <VALUE>            Pitch (-300 - 300)
      --strict-length            Reject input longer than 140 characters (default: false, allows splitting)
      --playback-mode <MODE>     Playback mode: sequential or batch (default: batch)
  -v, --verbose                  Enable verbose output (show VOICEPEAK debug messages)
  -h, --help                     Print help
  -V, --version                  Print version
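
The `--emotion` option takes a comma-separated list of `name=value` pairs such as `happy=50,sad=50`. A minimal parsing sketch follows; the function `parse_emotions` is hypothetical, and details such as whitespace trimming and the integer value type are assumptions rather than documented behavior.

```rust
/// Parse an emotion expression like "happy=50,sad=50" into (name, value) pairs.
/// Hypothetical helper: the real CLI may accept or reject different inputs.
fn parse_emotions(expr: &str) -> Result<Vec<(String, i32)>, String> {
    expr.split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(|part| -> Result<(String, i32), String> {
            // Each part must contain exactly one name and one value.
            let (name, value) = part
                .split_once('=')
                .ok_or_else(|| format!("expected name=value, got '{}'", part))?;
            let value: i32 = value
                .trim()
                .parse()
                .map_err(|_| format!("invalid value in '{}'", part))?;
            Ok((name.trim().to_string(), value))
        })
        // Collecting into Result stops at the first malformed pair.
        .collect()
}
```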

Parameter Priority

When multiple sources specify the same parameter, the priority order is:

  1. Command-line options (highest priority)
  2. Preset values
  3. Default values, or no adjustment if there is no default (lowest priority)

For example:

  • vp "text" -p my-preset --pitch 100 uses pitch=100 (CLI override)
  • vp "text" -p my-preset uses preset's pitch value
  • vp "text" --narrator "voice" uses no pitch adjustment
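
The priority order above amounts to "first value that is set wins". A minimal sketch, assuming each parameter is modeled as an `Option` (the helper `resolve` is hypothetical and not taken from the voicepeak-cli source):

```rust
/// Resolve one parameter: a command-line value overrides a preset value.
/// `None` means neither source set it, so no adjustment is applied.
fn resolve<T>(cli: Option<T>, preset: Option<T>) -> Option<T> {
    cli.or(preset)
}
```

For the three examples above, `resolve(Some(100), Some(30))` yields `Some(100)`, `resolve(None, Some(30))` yields `Some(30)`, and `resolve::<i32>(None, None)` yields `None`, taking 30 as an example preset pitch.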

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
