Word substitution hallucinations during synthesis #14

@DragonbornElric

Environment: VoXtream v0.2.0, herimor/voxtream2 model, RTX 3060 12GB, CUDA 12.8, Ubuntu Linux, Python 3.12

Description:
The model occasionally substitutes entirely unrelated words in the output. For example, the word "small" in the input text was spoken as "fish" in the audio output. These are not mispronunciations: the substituted words are completely different, with no phonetic similarity to the originals.

Steps to reproduce:
Intermittent; no reliable reproduction steps identified yet. It occurs during normal conversational synthesis with a voice-cloned reference WAV.

Expected behavior:
The spoken output should match the input text.
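
Since the problem is intermittent, one way to catch it might be to transcribe each generated clip with any speech-to-text tool and diff the transcript against the input text. The sketch below covers only the comparison step and uses nothing from VoXtream itself. The helper names `normalize` and `find_substitutions` are hypothetical, the transcript is assumed to arrive as a plain string, and the example sentence is made up around the "small"/"fish" pair reported above.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_substitutions(input_text: str, transcript: str) -> list[tuple[str, str]]:
    """Return (expected, heard) pairs where words were replaced.

    Insertions and deletions are ignored; only spans that the word-level
    diff classifies as replacements are reported.
    """
    expected = normalize(input_text)
    heard = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return pairs


if __name__ == "__main__":
    # Hypothetical sentence; the transcript would come from an ASR pass
    # over the synthesized audio.
    print(find_substitutions("We saw a small boat.", "we saw a fish boat"))
```

Logging the input text alongside any non-empty result over many runs could help narrow down which inputs trigger the substitution. Note that the speech-to-text step can introduce its own errors, so flagged pairs would still need a manual listen.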
