
"last one means y is already too long, shouldn't happen, but put it here" #175

@royrs


While working with the model, I encountered some cases where the segment I wanted to edit was removed, but nothing was generated in its place.
After some investigation, I found the following line in your code:

```python
): # last one means y is already too long, shouldn't happen, but put it here
```

which checks whether `y_input > 10 * x_lens` and, if so, generates nothing.
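For illustration, here is a minimal sketch of the guard as I understand it. It assumes `y_len` is the number of audio tokens already in the sequence and `x_len` is the number of input text tokens; the function name `should_stop` and the `ratio` parameter are hypothetical and not the repository's actual code.

```python
def should_stop(y_len: int, x_len: int, ratio: int = 10) -> bool:
    """Hypothetical stand-in for the length guard: stop once the audio
    sequence is more than `ratio` times longer than the text input."""
    return y_len > ratio * x_len


# With 30 text tokens, the allowed audio length is 300 tokens.
print(should_stop(250, 30))  # False: generation may continue
print(should_stop(320, 30))  # True: the guard ends generation
```

If the condition already holds before the first decoding step, the guard fires immediately, which would match the "nothing generated" behavior described above.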

Why do we need this check?
I'm not sure why the ratio between the target transcript length and the input size should limit generation.
The code comment says this shouldn't happen, but it can happen when the audio contains only a few words and is long because of silences.
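To make the silence scenario concrete, here is some rough arithmetic. The frame rate of 50 codec tokens per second is an assumption for illustration (check the codec the model actually uses), and text length is counted in hypothetical text tokens. The same 5-second clip stays within the 10x budget when the transcript is dense, but exceeds it when the clip is mostly silence and the transcript is short.

```python
# Assumed values for illustration only.
FRAMES_PER_SECOND = 50  # hypothetical codec frame rate
RATIO = 10              # multiplier used by the guard described above


def audio_tokens(seconds: float) -> int:
    """Number of codec tokens for a clip of the given length."""
    return int(seconds * FRAMES_PER_SECOND)


def token_budget(num_text_tokens: int) -> int:
    """Maximum audio length the guard allows for this much text."""
    return RATIO * num_text_tokens


# A 5 s clip with a dense transcript vs. the same length padded with silence.
dense = (audio_tokens(5.0), token_budget(40))   # 250 tokens vs. budget 400
sparse = (audio_tokens(5.0), token_budget(20))  # 250 tokens vs. budget 200
print(dense[0] > dense[1])    # False: within budget
print(sparse[0] > sparse[1])  # True: the guard would fire
```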

All the audio clips I tested are 4 to 5 seconds long, which you suggest works best.

I tried removing this check, and it gave good results on the few examples I tested.
