Environment: VoXtream v0.2.0, herimor/voxtream2 model, RTX 3060 12GB, CUDA 12.8, Ubuntu Linux, Python 3.12
Description:
VoXtream requires a reference WAV for voice cloning and has no built-in base/default voice. If no reference WAV is provided, synthesis silently fails: it reports "No reference WAV available" and produces no audio output.
This makes it difficult to diagnose whether audio quality issues stem from the model itself or from a poor reference WAV match. Without a base voice to compare against, users cannot determine if their reference WAV has problems (e.g., gender mismatch with training data, incompatible recording quality, wrong duration).
Feature request:
Provide a built-in default/base voice so users can:
- Verify the model works correctly before introducing voice cloning
- Compare cloned output against a baseline to isolate reference WAV issues
- Use the model without voice cloning for applications that don't require it
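
Until a base voice exists, one partial workaround for the diagnosis problem described above is to inspect the reference WAV directly. The following is a minimal sketch using only Python's standard `wave` module. The helper name `describe_wav` is hypothetical and not part of VoXtream. It only reports the file's properties, because the sample rate, channel count, and duration that VoXtream expects are not stated in this issue.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper, not a VoXtream API)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wf.getsampwidth(),
            # Duration follows from frame count divided by sample rate.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("reference.wav")` (the path is a placeholder) would at least rule out obvious problems such as an unexpectedly short clip or a stereo file. It cannot detect a voice mismatch with the training data, which is why a built-in baseline would still be needed.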