Geometric Transform Attention

Takeru Miyato · Bernhard Jaeger · Max Welling · Andreas Geiger

OpenReview | arXiv | Project Page

ICLR2024

Official code for reproducing the experiments in "GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers".

⭐ (12/27/2025) ⭐ Recently, GTA-style camera encoding has been increasingly adopted across a variety of works, particularly for improved camera control:

PRoPE: extends GTA by incorporating intrinsic-aware camera encoding.
UCPE: further extends GTA and PRoPE to support non-pinhole camera models.
Kaleido: a large generative model for scene-level neural rendering.
WorldPlay: a generative world model with real-time interaction.
ReDirector: GTA-like camera encoding for video diffusion models.

Setup

1. Create the environment and install the Python libraries

conda create -n gta python=3.9
conda activate gta
pip3 install -r requirements.txt

2. Download dataset

export DATADIR=<path_to_datadir>
mkdir -p $DATADIR

CLEVR-TR

Download the dataset from this link and place it under $DATADIR

MultiShapeNet Hard (MSN-Hard)

gsutil -m cp -r gs://kubric-public/tfds/kubric_frames/multi_shapenet_conditional/2.8.0/ ${DATADIR}/multi_shapenet_frames/
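Before launching training, it can help to confirm that the datasets landed where the commands in this README expect them. The following is a minimal sketch, not part of the repository: the helper name `check_datadir` is hypothetical, and the two subdirectory names (`clevrtr` and `multi_shapenet_frames`) are inferred from the download and training commands in this README.

```python
import os
from pathlib import Path


def check_datadir(datadir):
    """Report which expected dataset directories exist under datadir.

    The subdirectory names are assumptions taken from the commands in
    this README, not from the repository's code.
    """
    root = Path(datadir)
    expected = {
        "CLEVR-TR": root / "clevrtr",
        "MSN-Hard": root / "multi_shapenet_frames",
    }
    return {name: path.is_dir() for name, path in expected.items()}


if __name__ == "__main__":
    datadir = os.environ.get("DATADIR")
    if datadir is None:
        print("DATADIR is not set")
    else:
        for name, found in check_datadir(datadir).items():
            print(f"{name}: {'found' if found else 'missing'}")
```

Run it in the same shell in which DATADIR was exported; a "missing" entry only means the directory is not at the assumed location.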

Pretrained models (MSN-Hard pretrained models will be uploaded soon)

CLEVR-TR: link
MSN-Hard: link

Training

CLEVR-TR

torchrun --standalone --nnodes 1 --nproc_per_node 4 train.py runs/clevrtr/GTA/gta/config.yaml ${DATADIR}/clevrtr --seed=0

MSN-Hard

torchrun --standalone --nnodes 1 --nproc_per_node 4 train.py runs/msn/GTA/gta_so3/config.yaml ${DATADIR} --seed=0

Evaluation of PSNR, SSIM and LPIPS

python evaluate.py runs/clevrtr/GTA/gta/config.yaml ${DATADIR}/clevrtr $PATH_TO_CHECKPOINT # CLEVR-TR
python evaluate.py runs/msn/GTA/gta_so3/config.yaml ${DATADIR} $PATH_TO_CHECKPOINT # MSN-Hard
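For readers unfamiliar with the first metric, PSNR is defined as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the predicted and ground-truth pixels and MAX is the largest possible pixel value. The sketch below only illustrates that definition; it is not the code used by evaluate.py. The function name `psnr` is hypothetical, and pixel values are assumed to lie in [0, 1] and to be given as flat sequences.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher PSNR means the prediction is closer to the ground truth. SSIM and LPIPS are perceptual metrics and are not covered by this sketch.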

Acknowledgements

This repository is built on top of SRT and OSRT created by @stelzner. We would like to thank him for his open-source contribution of the SRT models. We also thank @lucidrains for providing the values of J matrices, which are needed to compute the irreps of SO(3) efficiently.

Citation

@inproceedings{Miyato2024GTA,
    title={GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers},
    author={Miyato, Takeru and Jaeger, Bernhard and Welling, Max and Geiger, Andreas},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2024}
}
