Fail to generate video

The audio is generated normally, but the video fails to generate. Do you know what the problem might be? Thanks for your help.
The log is as follows:
```
CUDA_VISIBLE_DEVICES=0 python3 inference.py --config-file ovi/configs/inference/inference_fusion.yaml
[2026-02-03 11:16:52,976] INFO: Using SP: False, SP_SIZE: 1
[2026-02-03 11:16:52,979] INFO: Loading OVI Fusion Engine...
Score model (Fusion) all parameters:11660753108
[2026-02-03 11:16:53,616] INFO: loading ./ckpts/Wan2.2-TI2V-5B/Wan2.2_VAE.pth
Removing weight norm...
[2026-02-03 11:16:59,390] INFO: loading ./ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth
Successfully loaded fusion checkpoint from ./ckpts/Ovi/model_960x960_10s.safetensors
[2026-02-03 11:17:12,437] INFO: OVI Fusion Engine initialized, cpu_offload=False. GPU VRAM allocated: 36.67 GB, reserved: 37.49 GB
[2026-02-03 11:17:12,438] INFO: OVI Fusion Engine loaded!
0it [00:00, ?it/s][2026-02-03 11:17:12,439] INFO: 
========== Generation Parameters ==========
             Text Prompt: A young man wearing a light blue hoodie, dark pants, and a white baseball cap is performing a dynamic street dance on a stone terrace. In the background, there's a stunning panoramic view of a city sprawling along a large body of water, with a long bridge and a distant statue visible under a bright, sunny sky. He begins on one knee, leaning back with one hand raised, then fluidly rises, bringing his hands to his head before dropping them down. He executes a quick succession of intricate footwork, shifting his weight rapidly and performing small hops and shuffles. His movements are sharp and precise, with a strong rhythmic quality. He bends his knees and extends his arms, then continues with more fast-paced footwork, incorporating body isolations and flowing arm movements. The dance is energetic and expressive, with the dancer's shadow stretching out behind him on the sunlit pavement. <S>Ovi diez segundos, mira cómo baila este algoritmo.<E> Audio: Upbeat, electronic dance music with a strong beat and synthesised elements, a processed, slightly robotic-sounding male voice speaking a Spanish phrase.
              Image Path: example_prompts/pngs_10s/19.png
      Frame Height Width: [704, 1280]
                    Seed: 103
                  Solver: unipc
            Sample Steps: 50
                   Shift: 5.0
    Video Guidance Scale: 4.0
    Audio Guidance Scale: 3.0
               SLG Layer: 11
   Video Negative Prompt: jitter, bad hands, blur, distortion
   Audio Negative Prompt: robotic, muffled, echo, distorted
==========================================
50it [15:51, 19.04s/it]
1it [16:52, 1012.35s/it][2026-02-03 11:34:04,789] INFO: 
```
https://github.com/user-attachments/assets/2dec62ce-d421-43b7-99a6-a61ffd4030ec
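
To narrow this down, it may help to check whether the output file contains a video stream at all, or only audio. Below is a minimal sketch, assuming `ffprobe` (from FFmpeg) is installed and on the PATH. The helper names `stream_types` and `probe` and the path `output.mp4` are placeholders for illustration; they are not part of the Ovi code and do not come from the log above.

```python
import json
import subprocess


def stream_types(ffprobe_json: str) -> list[str]:
    """Return the codec_type of every stream listed in ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return [s.get("codec_type", "unknown") for s in data.get("streams", [])]


def probe(path: str) -> list[str]:
    """Run ffprobe on `path` and return its stream types, e.g. ["video", "audio"]."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "stream=codec_type",
            "-of", "json",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if ffprobe cannot read the file
    )
    return stream_types(result.stdout)


# Example usage (placeholder path, replace with the file the run actually wrote):
# print(probe("output.mp4"))
```

If the list has no `"video"` entry, the file was likely written without a video track. If `"video"` is present but the picture is blank or corrupted, the problem is more likely in the generated frames themselves than in the saving step.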

Fail to generate video #91
