Inconsistent pacing and emphasis during synthesis #15

**Environment:** VoXtream v0.2.0, herimor/voxtream2 model, RTX 3060 12GB, CUDA 12.8, Ubuntu Linux, Python 3.12

**Description:**
The model applies inconsistent pacing and emphasis during synthesis. Some words receive dramatic emphasis that does not fit the sentence context, and the speaking rate varies unpredictably, sometimes speeding up and sometimes slowing down mid-sentence. This may be caused by the base model's training data influencing the output cadence.

**Steps to reproduce:**
Send several synthesis requests containing conversational text and listen to the output. Emphasis placement and speaking rate vary unpredictably, both from one utterance to the next and within a single utterance.
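To make "the speaking rate varies" comparable across runs, it can help to put a number on it. The sketch below is a hypothetical helper, not part of VoXtream. It assumes you already have word-level start/end timestamps for each synthesized utterance (for example from a forced aligner or an ASR model that emits word timestamps). It computes the coefficient of variation of the local speaking rate over sliding windows of words. The names `Word`, `local_rates`, and `rate_variation`, and the default window of 4 words, are illustrative choices.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


# Hypothetical helper, not part of VoXtream. Word timings are assumed to
# come from an external tool such as a forced aligner.
@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the utterance
    end: float    # seconds from the start of the utterance


def local_rates(words: list[Word], window: int = 4) -> list[float]:
    """Words per second over each run of `window` consecutive words."""
    rates = []
    for i in range(len(words) - window + 1):
        span = words[i + window - 1].end - words[i].start
        if span > 0:  # skip degenerate spans from bad timestamps
            rates.append(window / span)
    return rates


def rate_variation(words: list[Word], window: int = 4) -> float:
    """Coefficient of variation of the local speaking rate.

    0.0 means perfectly steady pacing; larger values mean the rate
    drifts more within the utterance. Returns 0.0 when there are too
    few words to form at least two windows.
    """
    rates = local_rates(words, window)
    if len(rates) < 2:
        return 0.0
    return pstdev(rates) / mean(rates)
```

An utterance with evenly spaced words scores close to 0, while one that speeds up mid-sentence scores noticeably higher. Scoring several requests built from the same text also gives a rough measure of the between-utterance variability described above. This only captures pacing; emphasis would need separate features such as pitch or energy.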

**Expected behavior:**
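Pacing and emphasis should be relatively consistent and contextually appropriate: the speaking rate should stay roughly steady within an utterance, and emphasis should fall on words that the sentence context calls for.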
