Long text (>500 chars) causes repetition and audio degradation

**Environment:** VoXtream v0.2.0, herimor/voxtream2 model, RTX 3060 12GB, CUDA 12.8, Ubuntu Linux, Python 3.12

**Description:**
When synthesis input exceeds approximately 500 characters, the model begins repeating the last sentence of the output multiple times and audio quality degrades progressively. The documentation suggests a 1000-character limit, but in practice the model becomes unreliable well before that.

**Steps to reproduce:**

1. Send a synthesis request with 500-800 characters of text
2. Observe: the model repeats the final sentence multiple times. This does not happen on every run; it occurs intermittently on longer inputs.
3. Audio quality degrades toward the end of the output (pacing issues, garbled speech)

**Expected behavior:**
The model should synthesize the full text without looping or repeating sentences.

**Workaround:**
Split input text into chunks of ~250 characters before sending to the model. At 250 characters we saw improved results, but pacing issues still occur even at these shorter input lengths.
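
For reference, here is a minimal sketch of the chunking step described above. It is not part of VoXtream; `chunk_text` and `max_chars` are hypothetical names used for illustration. It assumes sentence boundaries can be detected by `.`, `!`, or `?` followed by whitespace, and it falls back to word boundaries (or a hard cut) when a single sentence exceeds the limit.

```python
import re


def chunk_text(text: str, max_chars: int = 250) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for the workaround; not a VoXtream API.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is split on word boundaries.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        # Pack sentences together while the chunk stays within the limit.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the resulting audio concatenated by the caller. That step is omitted here because it depends on how the model is invoked in your setup.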
