Generated audio is complete noise #168

I cloned [Step-Audio](https://github.com/stepfun-ai/Step-Audio) at commit "64cf0a6", along with [Step-Audio-Tokenizer](https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer/tree/main) at "af7e5a3" and [Step-Audio-TTS-3B](https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B/tree/main) at "9ddb7cb". No code was changed. I ran `tts_inference.py` with the "synthesis-type" parameter set to "tts" and then to "clone". In both cases the generated audio is complete noise. What's wrong?

[output_tts.wav](https://github.com/user-attachments/files/22288198/output_tts.wav)
[output_clone.wav](https://github.com/user-attachments/files/22288197/output_clone.wav)
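
To put numbers on "complete noise", here is a minimal sketch that uses only the Python standard library. It assumes the output files are 16-bit PCM WAV, which I have not verified; if they are float WAV, the `wave` module will refuse to open them. `wav_stats` is a hypothetical helper name, not part of Step-Audio.

```python
import array
import math
import sys
import wave


def wav_stats(path):
    """Return basic statistics for a WAV file (assumption: 16-bit PCM)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        samples = array.array("h", wf.readframes(wf.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    # Keep only the first channel if the file is interleaved multi-channel.
    mono = samples[::channels]
    n = len(mono)
    if n == 0:
        raise ValueError("file contains no audio frames")

    peak = max(abs(s) for s in mono) / 32768.0
    rms = math.sqrt(sum(s * s for s in mono) / n) / 32768.0

    # Fraction of adjacent sample pairs whose sign differs.
    crossings = sum(1 for a, b in zip(mono, mono[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0

    # Fraction of samples at (or next to) the 16-bit limits.
    clipped = sum(1 for s in mono if abs(s) >= 32767) / n

    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": n / rate,
        "peak": peak,
        "rms": rms,
        "zero_crossing_rate": zcr,
        "clipped_fraction": clipped,
    }
```

Calling `wav_stats("output_tts.wav")` and `wav_stats("output_clone.wav")` returns one dictionary per file. As a rough guide, white noise has a zero-crossing rate near 0.5, while speech is usually well below that, and a large `clipped_fraction` would point to saturated or overflowing samples rather than a bad waveform shape.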
