Fail to generate video #91

@Hao-tianWang

The audio is generated normally, but video generation fails. Do you know what the problem might be? Thanks for your help.
The log is as follows:

CUDA_VISIBLE_DEVICES=0 python3 inference.py --config-file ovi/configs/inference/inference_fusion.yaml
[2026-02-03 11:16:52,976] INFO: Using SP: False, SP_SIZE: 1
[2026-02-03 11:16:52,979] INFO: Loading OVI Fusion Engine...
Score model (Fusion) all parameters:11660753108
[2026-02-03 11:16:53,616] INFO: loading ./ckpts/Wan2.2-TI2V-5B/Wan2.2_VAE.pth
Removing weight norm...
[2026-02-03 11:16:59,390] INFO: loading ./ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth
Successfully loaded fusion checkpoint from ./ckpts/Ovi/model_960x960_10s.safetensors
[2026-02-03 11:17:12,437] INFO: OVI Fusion Engine initialized, cpu_offload=False. GPU VRAM allocated: 36.67 GB, reserved: 37.49 GB
[2026-02-03 11:17:12,438] INFO: OVI Fusion Engine loaded!
0it [00:00, ?it/s][2026-02-03 11:17:12,439] INFO: 
========== Generation Parameters ==========
             Text Prompt: A young man wearing a light blue hoodie, dark pants, and a white baseball cap is performing a dynamic street dance on a stone terrace. In the background, there's a stunning panoramic view of a city sprawling along a large body of water, with a long bridge and a distant statue visible under a bright, sunny sky. He begins on one knee, leaning back with one hand raised, then fluidly rises, bringing his hands to his head before dropping them down. He executes a quick succession of intricate footwork, shifting his weight rapidly and performing small hops and shuffles. His movements are sharp and precise, with a strong rhythmic quality. He bends his knees and extends his arms, then continues with more fast-paced footwork, incorporating body isolations and flowing arm movements. The dance is energetic and expressive, with the dancer's shadow stretching out behind him on the sunlit pavement. <S>Ovi diez segundos, mira cómo baila este algoritmo.<E> Audio: Upbeat, electronic dance music with a strong beat and synthesised elements, a processed, slightly robotic-sounding male voice speaking a Spanish phrase.
              Image Path: example_prompts/pngs_10s/19.png
      Frame Height Width: [704, 1280]
                    Seed: 103
                  Solver: unipc
            Sample Steps: 50
                   Shift: 5.0
    Video Guidance Scale: 4.0
    Audio Guidance Scale: 3.0
               SLG Layer: 11
   Video Negative Prompt: jitter, bad hands, blur, distortion
   Audio Negative Prompt: robotic, muffled, echo, distorted
==========================================
50it [15:51, 19.04s/it]
1it [16:52, 1012.35s/it][2026-02-03 11:34:04,789] INFO: 
A_young_man_wearing_a_light_blue_hoodie._dark_pant_704x1280_103_0.mp4
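
Since the run finishes without an error and still writes an .mp4, one way to narrow this down is to check which streams the output file actually contains. The following is only a minimal sketch, not part of the Ovi code: it assumes `ffprobe` (from FFmpeg) is installed and on the PATH, and that the file sits in the current directory (the log does not show the output directory). The helper names `stream_types` and `probe` are made up for illustration.

```python
import json
import subprocess


def stream_types(ffprobe_json: str) -> list[str]:
    """Return the codec_type of every stream listed in ffprobe's JSON output."""
    info = json.loads(ffprobe_json)
    return [s.get("codec_type", "unknown") for s in info.get("streams", [])]


def probe(path: str) -> list[str]:
    """Run ffprobe on `path` and return its stream types (needs ffprobe on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return stream_types(result.stdout)


if __name__ == "__main__":
    # File name copied from the last line of the log; the directory is an assumption.
    types = probe("A_young_man_wearing_a_light_blue_hoodie._dark_pant_704x1280_103_0.mp4")
    print(types)
    if "video" not in types:
        print("No video stream found: the file contains audio only.")
```

If a video stream is listed, the file may still be fine and the problem could lie with the player or codec support rather than with generation.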
