The audio is generated normally, but the video fails to generate. Do you know what the problem might be? Thanks for your help.
The log is as follows:
CUDA_VISIBLE_DEVICES=0 python3 inference.py --config-file ovi/configs/inference/inference_fusion.yaml
[2026-02-03 11:16:52,976] INFO: Using SP: False, SP_SIZE: 1
[2026-02-03 11:16:52,979] INFO: Loading OVI Fusion Engine...
Score model (Fusion) all parameters:11660753108
[2026-02-03 11:16:53,616] INFO: loading ./ckpts/Wan2.2-TI2V-5B/Wan2.2_VAE.pth
Removing weight norm...
[2026-02-03 11:16:59,390] INFO: loading ./ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth
Successfully loaded fusion checkpoint from ./ckpts/Ovi/model_960x960_10s.safetensors
[2026-02-03 11:17:12,437] INFO: OVI Fusion Engine initialized, cpu_offload=False. GPU VRAM allocated: 36.67 GB, reserved: 37.49 GB
[2026-02-03 11:17:12,438] INFO: OVI Fusion Engine loaded!
0it [00:00, ?it/s][2026-02-03 11:17:12,439] INFO:
========== Generation Parameters ==========
Text Prompt: A young man wearing a light blue hoodie, dark pants, and a white baseball cap is performing a dynamic street dance on a stone terrace. In the background, there's a stunning panoramic view of a city sprawling along a large body of water, with a long bridge and a distant statue visible under a bright, sunny sky. He begins on one knee, leaning back with one hand raised, then fluidly rises, bringing his hands to his head before dropping them down. He executes a quick succession of intricate footwork, shifting his weight rapidly and performing small hops and shuffles. His movements are sharp and precise, with a strong rhythmic quality. He bends his knees and extends his arms, then continues with more fast-paced footwork, incorporating body isolations and flowing arm movements. The dance is energetic and expressive, with the dancer's shadow stretching out behind him on the sunlit pavement. <S>Ovi diez segundos, mira cómo baila este algoritmo.<E> Audio: Upbeat, electronic dance music with a strong beat and synthesised elements, a processed, slightly robotic-sounding male voice speaking a Spanish phrase.
Image Path: example_prompts/pngs_10s/19.png
Frame Height Width: [704, 1280]
Seed: 103
Solver: unipc
Sample Steps: 50
Shift: 5.0
Video Guidance Scale: 4.0
Audio Guidance Scale: 3.0
SLG Layer: 11
Video Negative Prompt: jitter, bad hands, blur, distortion
Audio Negative Prompt: robotic, muffled, echo, distorted
==========================================
50it [15:51, 19.04s/it]
1it [16:52, 1012.35s/it][2026-02-03 11:34:04,789] INFO:
A_young_man_wearing_a_light_blue_hoodie._dark_pant_704x1280_103_0.mp4