autonomousvision/gta

Geometric Transform Attention

Takeru Miyato · Bernhard Jaeger · Max Welling · Andreas Geiger

ICLR2024

Official code for reproducing the experiments in "GTA: A Geometry-Aware Attention Mechanism for Multi-view Transformers".

⭐ (12/27/2025) ⭐ Recently, GTA-style camera encoding has been increasingly adopted across a variety of works, particularly for improved camera control:

  • PRoPE: extends GTA with intrinsic-aware camera encoding.
  • UCPE: further extends GTA and PRoPE to support non-pinhole camera models.
  • Kaleido: a large generative model for scene-level neural rendering.
  • WorldPlay: a generative world model with real-time interaction.
  • ReDirector: GTA-like camera encoding for video diffusion models.

Contents

This repository contains the following codebases, each on its own branch:

  • NVS experiments on CLEVR-TR and MSN-Hard (this branch)
  • NVS experiments on ACID and RealEstate (link)
  • ImageNet generation with Diffusion transformers (DiT) (link)

You can find the core implementation of GTA for multi-view ViTs here and for image ViTs here.
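
As a rough orientation before reading the linked code, the sketch below illustrates the general idea of geometry-aware attention on a toy case. It is not the repository's implementation. It assumes each token carries a planar rotation angle as its pose, uses 2x2 rotation matrices as the group representation, and follows the formulation recalled from the paper, in which queries, keys, and values are transformed by the token's pose before standard attention and the output is transformed back. The names `rot`, `matvec`, `transpose`, and `gta_attention` are hypothetical.

```python
import math

def rot(theta):
    """2x2 rotation matrix, used here as the representation of an SO(2) pose."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def transpose(m):
    """Transpose a 2x2 matrix."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def gta_attention(queries, keys, values, thetas):
    """Toy single-head attention whose scores and outputs depend only on relative poses.

    Assumed formulation: q_i <- rho(g_i)^T q_i, k_j <- rho(g_j)^-1 k_j,
    v_j <- rho(g_j)^-1 v_j, then softmax attention, then o_i <- rho(g_i) o_i.
    """
    q = [matvec(transpose(rot(t)), x) for t, x in zip(thetas, queries)]
    k = [matvec(rot(-t), x) for t, x in zip(thetas, keys)]
    v = [matvec(rot(-t), x) for t, x in zip(thetas, values)]
    outputs = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores against every transformed key.
        scores = [(qi[0] * kj[0] + qi[1] * kj[1]) / math.sqrt(2) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        mixed = [sum(w * vj[d] for w, vj in zip(weights, v)) / total for d in range(2)]
        # Map the aggregated value back into token i's own frame.
        outputs.append(matvec(rot(thetas[i]), mixed))
    return outputs
```

Because only the relative rotation between token i and token j enters the computation, adding the same offset to every angle in `thetas` leaves the outputs unchanged in this toy setting.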

Please feel free to reach out to us if you have any questions!

Setup

1. Create the conda environment and install the Python libraries

conda create -n gta python=3.9
conda activate gta
pip3 install -r requirements.txt

2. Download the datasets

export DATADIR=<path_to_datadir>
mkdir -p $DATADIR

CLEVR-TR

Download the dataset from this link and place it under $DATADIR (the training command below reads it from ${DATADIR}/clevrtr).

MultiShapeNet Hard (MSN-Hard)

gsutil -m cp -r gs://kubric-public/tfds/kubric_frames/multi_shapenet_conditional/2.8.0/ ${DATADIR}/multi_shapenet_frames/
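
Before launching training, it can help to confirm that both datasets landed where the commands expect them. The following is a minimal sketch, assuming the directory names `clevrtr` and `multi_shapenet_frames` that appear in the commands in this README. The helper `missing_datasets` is hypothetical and not part of the repository.

```python
import os

# Directory names taken from the download and training commands in this README.
EXPECTED_SUBDIRS = ("clevrtr", "multi_shapenet_frames")

def missing_datasets(datadir, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under datadir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(datadir, name))]

if __name__ == "__main__":
    # Falls back to the current directory when DATADIR is not exported.
    datadir = os.environ.get("DATADIR", ".")
    missing = missing_datasets(datadir)
    if missing:
        print("Missing under", datadir + ":", ", ".join(missing))
    else:
        print("Both dataset directories found under", datadir)
```

This only checks that the top-level directories exist. It does not validate their contents.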

Pretrained models (MSN-Hard pretrained models will be uploaded soon)

Training

CLEVR-TR

torchrun --standalone --nnodes 1 --nproc_per_node 4 train.py runs/clevrtr/GTA/gta/config.yaml ${DATADIR}/clevrtr --seed=0

MSN-Hard

torchrun --standalone --nnodes 1 --nproc_per_node 4 train.py runs/msn/GTA/gta_so3/config.yaml ${DATADIR} --seed=0

Evaluation of PSNR, SSIM and LPIPS

python evaluate.py runs/clevrtr/GTA/gta/config.yaml ${DATADIR}/clevrtr $PATH_TO_CHECKPOINT # CLEVR-TR
python evaluate.py runs/msn/GTA/gta_so3/config.yaml ${DATADIR} $PATH_TO_CHECKPOINT # MSN-Hard
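
For readers unfamiliar with the first of these metrics, the sketch below shows the standard PSNR definition, 10 · log10(MAX² / MSE), on flat lists of pixel values. It is an illustration only. The value range and the way `evaluate.py` averages over images are not stated here, so `max_value=1.0` and the function name `psnr` are assumptions.

```python
import math

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the largest possible pixel value (assumed 1.0 for images in [0, 1]).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the rendered view is closer to the ground truth. SSIM and LPIPS capture structural and perceptual similarity and are not reproduced here.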

Acknowledgements

This repository is built on top of SRT and OSRT created by @stelzner. We would like to thank him for his open-source contribution of the SRT models. We also thank @lucidrains for providing the values of J matrices, which are needed to compute the irreps of SO(3) efficiently.

Citation

@inproceedings{Miyato2024GTA,
    title={GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers},
    author={Miyato, Takeru and Jaeger, Bernhard and Welling, Max and Geiger, Andreas},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2024}
}
