SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

This repository covers environment setup and inference for the paper "SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion."

result

result_sv3d.mp4

result2_sv3d.mp4

setup

The default install method uses Conda for package and environment management:

conda create -n sv3d python=3.10
conda activate sv3d
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # cu118 = CUDA 11.8; check your CUDA version and adjust the index URL to match
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
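To follow up on the "check CUDA version" note, the sketch below reports which CUDA build the installed PyTorch wheel targets. It is a minimal check, not part of the repository; the helper name `describe_torch_install` is hypothetical, and it only reads standard PyTorch attributes.

```python
import importlib.util


def describe_torch_install():
    """Summarize the installed PyTorch build (hypothetical helper, not part of this repo)."""
    # Avoid an ImportError if the install step has not been run yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


print(describe_torch_install())
```

If the reported CUDA build does not match the driver on your machine, reinstalling with a different index URL is the likely fix.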

To run SV3D_u on a single image:

1. Download the model

pip install huggingface_hub
huggingface-cli login
huggingface-cli download stabilityai/sv3d sv3d_u.safetensors --local-dir checkpoints

2. Run the code

python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_u
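Before running, it can help to confirm that the download step left the weights under `checkpoints/`, since that is the `--local-dir` used above. The sketch below is an illustrative check only; the helper names `checkpoint_path` and `has_checkpoint` are hypothetical and not part of the repository.

```python
from pathlib import Path


def checkpoint_path(version, checkpoint_dir="checkpoints"):
    """Path where the huggingface-cli command above places <version>.safetensors."""
    return Path(checkpoint_dir) / f"{version}.safetensors"


def has_checkpoint(version, checkpoint_dir="checkpoints"):
    """Return True if the checkpoint file for the given version exists."""
    return checkpoint_path(version, checkpoint_dir).is_file()


# Report the status of both model variants named in this README.
for version in ("sv3d_u", "sv3d_p"):
    status = "found" if has_checkpoint(version) else "missing"
    print(f"{checkpoint_path(version)}: {status}")
```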

To run SV3D_p on a single image:

1. Download the model

pip install huggingface_hub
huggingface-cli login
huggingface-cli download stabilityai/sv3d sv3d_p.safetensors --local-dir checkpoints

2. Run the code

Generate a static orbit at a specified elevation, e.g. 10.0 degrees:

python scripts/sampling/simple_video_sample.py \
  --input_path <path/to/image.png> \
  --version sv3d_p \
  --elevations_deg 10.0

Generate a dynamic orbit at specified elevations and azimuths: pass a sequence of 21 elevations (in degrees, within [-90, 90]) to elevations_deg, and a sequence of 21 azimuths (in degrees, within [0, 360], sorted in ascending order from 0 to 360) to azimuths_deg.

python scripts/sampling/simple_video_sample.py \
  --input_path <path/to/image.png> \
  --version sv3d_p \
  --elevations_deg [<list of 21 elevations in degrees>] \
  --azimuths_deg [<list of 21 azimuths in degrees>]
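Writing 21 values by hand is error-prone, so the sketch below builds one possible pair of sequences and prints them as flag values. The sinusoidal elevation pattern, the even azimuth spacing over [0, 360), and the helper names `make_dynamic_orbit` and `as_flag_value` are assumptions chosen for illustration, not something the repository prescribes.

```python
import math

NUM_FRAMES = 21  # the README asks for sequences of 21 values


def make_dynamic_orbit(num_frames=NUM_FRAMES, max_elevation_deg=10.0):
    """Return (elevations, azimuths) in degrees for an illustrative wavy orbit."""
    if not 0.0 <= max_elevation_deg <= 90.0:
        raise ValueError("max_elevation_deg must lie in [0, 90]")
    # Azimuths: evenly spaced and ascending, starting at 0 and staying below 360.
    azimuths = [360.0 * i / num_frames for i in range(num_frames)]
    # Elevations: one sine period, so every value stays within [-90, 90].
    elevations = [
        max_elevation_deg * math.sin(2.0 * math.pi * i / num_frames)
        for i in range(num_frames)
    ]
    return elevations, azimuths


def as_flag_value(values):
    """Format values as a bracketed list without spaces, so the shell sees one argument."""
    return "[" + ",".join(f"{v:.1f}" for v in values) + "]"


elevations, azimuths = make_dynamic_orbit()
print("--elevations_deg", as_flag_value(elevations))
print("--azimuths_deg", as_flag_value(azimuths))
```

The printed values can be pasted in place of the bracketed placeholders in the command above. Depending on your shell, the bracketed lists may need quoting.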

BibTeX

@misc{voleti2024sv3dnovelmultiviewsynthesis,
  title        = {SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion},
  author       = {Vikram Voleti and Chun-Han Yao and Mark Boss and Adam Letts and David Pankratz and Dmitry Tochilkin and Christian Laforte and Robin Rombach and Varun Jampani},
  year         = {2024},
  eprint       = {2403.12008},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2403.12008}
}
