
No base/default voice — impossible to validate reference WAV quality #16

@DragonbornElric

Environment: VoXtream v0.2.0, herimor/voxtream2 model, RTX 3060 12GB, CUDA 12.8, Ubuntu Linux, Python 3.12

Description:
VoXtream requires a reference WAV for voice cloning and has no built-in base/default voice. If no reference WAV is provided, synthesis fails with only a "No reference WAV available" message and produces no audio output.

This makes it difficult to tell whether audio quality issues come from the model itself or from a poorly matched reference WAV. Without a base voice to compare against, users cannot determine whether their reference WAV has problems (e.g., a gender mismatch with the training data, incompatible recording quality, or an unsuitable duration).
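Until a base voice exists, some of the properties above can at least be inspected directly. The following is a minimal sketch using only Python's standard `wave` module; the duration and sample-rate thresholds, and the mono expectation, are illustrative assumptions, not documented VoXtream requirements. It only reads header data, so it cannot detect a gender mismatch or noisy audio, and non-PCM files (e.g. float WAVs) raise `wave.Error`.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_s: float


def read_wav_info(path: str) -> WavInfo:
    """Read basic header properties of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return WavInfo(
            channels=wf.getnchannels(),
            sample_rate=rate,
            sample_width_bytes=wf.getsampwidth(),
            duration_s=frames / rate if rate else 0.0,
        )


def check_reference_wav(
    path: str,
    min_duration_s: float = 3.0,   # assumed lower bound, not from VoXtream docs
    max_duration_s: float = 20.0,  # assumed upper bound, not from VoXtream docs
    min_sample_rate: int = 16000,  # assumed minimum, not from VoXtream docs
) -> list[str]:
    """Return human-readable warnings about a candidate reference WAV."""
    info = read_wav_info(path)
    warnings = []
    if info.channels != 1:
        warnings.append(f"{info.channels} channels; mono is assumed here")
    if info.sample_rate < min_sample_rate:
        warnings.append(f"sample rate {info.sample_rate} Hz is below {min_sample_rate} Hz")
    if info.sample_width_bytes < 2:
        warnings.append("8-bit samples; 16-bit or higher is assumed here")
    if info.duration_s < min_duration_s:
        warnings.append(f"duration {info.duration_s:.1f}s is shorter than {min_duration_s:.1f}s")
    elif info.duration_s > max_duration_s:
        warnings.append(f"duration {info.duration_s:.1f}s is longer than {max_duration_s:.1f}s")
    return warnings
```

Usage: `for w in check_reference_wav("speaker.wav"): print(w)`. An empty list only means the header looks plausible under these assumptions; it does not prove the WAV will clone well.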

Feature request:
Provide a built-in default/base voice so users can:

  1. Verify the model works correctly before introducing voice cloning
  2. Compare cloned output against baseline to isolate reference WAV issues
  3. Use the model without voice cloning for applications that don't require it
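One possible shape for the requested behavior, sketched in Python: use the caller's reference WAV if given, otherwise fall back to a bundled default voice, and raise a clear error instead of producing no audio. `resolve_reference_wav`, `MissingReferenceError`, and the `default_voice_wav` parameter are hypothetical names for illustration, not part of VoXtream's API.

```python
from pathlib import Path
from typing import Optional


class MissingReferenceError(RuntimeError):
    """Hypothetical error raised instead of silently producing no audio."""


def resolve_reference_wav(
    reference_wav: Optional[str],
    default_voice_wav: Optional[str],
) -> Path:
    """Pick the user's reference WAV, else a bundled default voice, else fail loudly."""
    if reference_wav is not None:
        path = Path(reference_wav)
        # An explicit but missing path is an error; do not silently swap voices.
        if not path.is_file():
            raise FileNotFoundError(f"Reference WAV not found: {path}")
        return path
    if default_voice_wav is not None and Path(default_voice_wav).is_file():
        return Path(default_voice_wav)
    raise MissingReferenceError(
        "No reference WAV available and no default voice is configured"
    )
```

Raising on an explicit-but-missing path keeps the fallback from masking typos, while the default voice would cover points 1 to 3 above when no reference is passed at all.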
