Official PyTorch implementation of "Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction" (SIGGRAPH'26)

Hao Phung, Hadar Averbuch-Elor

Cornell University

[Page] [Paper] [HuggingFace]

TLDR: We introduce Raster2Seq, an approach that transforms rasterized floorplan images to vectorized format using a labeled polygon sequence representation.

Details of the model architecture and experimental results can be found in our paper:

@inproceedings{phung2026raster2seq,
   title={Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction},
   author={Phung, Hao and Averbuch-Elor, Hadar},
   booktitle={Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
   year= {2026},
}

Please CITE our paper and give us a ⭐ whenever this repository is used to help produce published results or incorporated into other software.

Table of Contents

Abstract
Method
Installation
Data
Inference
Evaluation
Training
Acknowledgment

Abstract

Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and varying numbers of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task in which each room is represented as a polygon sequence labeled with the room's semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, using guidance from learnable anchors. These anchors represent spatial coordinates in image space and thereby direct the attention mechanism toward informative image regions. By embracing the autoregressive mechanism, our method offers flexibility in the output format, enabling efficient handling of complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D and CubiCasa5K, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contain diverse room structures and complex geometric variations.

Method

Given a rasterized floorplan image (left), our approach converts it into a vectorized format, represented as a sequence of labeled polygons separated by special tokens. The main architectural component of our framework is an anchor-based autoregressive decoder, which predicts the next token given image features ($f_{img}$), learnable anchors ($v_{anc}$), and the previously generated tokens. Above, we visualize the first two predicted labeled polygons (colored in orange and pink, respectively).
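As a rough illustration of the labeled polygon sequence described above, the sketch below flattens a list of rooms into a single token sequence and parses it back. The special-token names (`<sep>`, `<eos>`), the integer pixel coordinates, and the placement of the room label after its corners are all assumptions made for this example; the actual tokenization used by the model may differ.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Room = Tuple[str, List[Point]]  # (semantic label, polygon corners)

SEP, EOS = "<sep>", "<eos>"  # hypothetical special tokens


def encode(rooms: List[Room]) -> List[object]:
    """Flatten rooms into: corner, corner, ..., label, <sep>, ..., <eos>."""
    seq: List[object] = []
    for label, corners in rooms:
        seq.extend(corners)  # one token per (x, y) corner
        seq.append(label)    # the semantic label follows the room's corners
        seq.append(SEP)      # separator between consecutive rooms
    seq.append(EOS)
    return seq


def decode(seq: List[object]) -> List[Room]:
    """Inverse of encode: rebuild the (label, corners) list."""
    rooms: List[Room] = []
    corners: List[Point] = []
    label = ""
    for tok in seq:
        if tok == EOS:
            break
        if tok == SEP:
            rooms.append((label, corners))
            corners, label = [], ""
        elif isinstance(tok, tuple):
            corners.append(tok)
        else:
            label = tok
    return rooms


rooms = [
    ("bedroom", [(10, 10), (120, 10), (120, 90), (10, 90)]),
    ("kitchen", [(120, 10), (200, 10), (200, 90), (120, 90)]),
]
tokens = encode(rooms)
```

Because the sequence simply grows with each room and corner, a representation like this places no fixed limit on the number of rooms or corners per polygon.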

Installation

The code has been tested on Linux with Python 3.10.13, PyTorch 2.3.1, and CUDA 11.8.

Create an environment:

conda create -n raster2seq python=3.10
conda activate raster2seq

Install PyTorch and the required libraries:

# adjust the cuda version accordingly
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Compile the deformable-attention modules (from deformable-DETR) and the differentiable rasterization module (from BoundaryFormer):

cd models/ops
sh make.sh

# unit test for the deformable-attention modules (every check should print True)
# python test.py

cd ../../diff_ras
python setup.py build develop

Data

We use the COCO-style format for all experiments. Data preprocessing is detailed in data_preprocess. In short, the input is an RGB image and the output is a set of 2D coordinate vectors, one per room region, each represented as a closed-loop segmentation polygon.

The data tree structure of Structured3D, for instance, is as follows:

code_root/
└── data/
    └── stru3d/
        ├── train/
        ├── val/
        ├── test/
        └── annotations/
            ├── train.json
            ├── val.json
            └── test.json

In this code, we experiment with three datasets: Structured3D, CubiCasa5K, and Raster2Graph. We also conduct zero-shot evaluation on a WAFFLE subset of 100 samples with provided segmentation annotations.
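To make the COCO-style layout more concrete, here is a minimal sketch of what a single room annotation could look like. The top-level keys (`images`, `categories`, `annotations`) and per-annotation fields follow the general COCO convention; the file name, image size, and category name are placeholders, and the exact fields written by data_preprocess may differ.

```python
import json


def polygon_to_annotation(ann_id, image_id, category_id, corners):
    """Build one COCO-style annotation from a closed room polygon."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    # Shoelace formula; the polygon is implicitly closed (last corner -> first).
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(corners, corners[1:] + corners[:1]):
        twice_area += x1 * y2 - x2 * y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # Flattened corner list: [x1, y1, x2, y2, ...]
        "segmentation": [[c for pt in corners for c in pt]],
        # Bounding box as [x, y, width, height]
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }


dataset = {
    # Placeholder image entry; real file names and sizes come from preprocessing.
    "images": [{"id": 0, "file_name": "00000.png", "width": 256, "height": 256}],
    "categories": [{"id": 0, "name": "room"}],  # hypothetical category
    "annotations": [
        polygon_to_annotation(0, 0, 0, [(10, 10), (120, 10), (120, 90), (10, 90)])
    ],
}
print(json.dumps(dataset["annotations"][0]))
```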

Checkpoints

Our model checkpoints are hosted at haopt/Raster2Seq, with one subfolder per trained model. You can download all checkpoints at once or only the checkpoint subfolder you need.

Dataset	RoomF1	Hugging Face key
Structured3D	99.6	`s3d-bw`
CubiCasa5K	88.7	`cubicasa5k`
Raster2Graph	97.0	`raster2graph`
Structured3D-DensityMap	99.1	`s3d-density`

The provided evaluation and inference bash scripts in the following sections use Hugging Face checkpoint aliases by default (e.g. hf:cubicasa5k) and automatically download the corresponding checkpoint on first use. To download checkpoints manually, use tools/download_checkpoints.sh.

High-res models (512x512)

Dataset	RoomF1	Hugging Face key
Raster2Graph	98.1	`raster2graph-512`
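The `hf:<key>` alias convention can be pictured with the short sketch below. It is not the repository's own loader; the function names are hypothetical, and the download helper assumes that each model's subfolder on the Hub is named after its key. The only library call used is `snapshot_download` from `huggingface_hub`, with `allow_patterns` restricting the download to one subfolder.

```python
from typing import Tuple

HF_REPO = "haopt/Raster2Seq"  # repository named in this README
KNOWN_KEYS = {"s3d-bw", "cubicasa5k", "raster2graph", "s3d-density", "raster2graph-512"}


def parse_checkpoint_spec(spec: str) -> Tuple[str, str]:
    """Return ("hf", key) for 'hf:<key>' aliases, otherwise ("local", path)."""
    if spec.startswith("hf:"):
        key = spec[len("hf:"):]
        if key not in KNOWN_KEYS:
            raise ValueError(f"unknown checkpoint key: {key!r}")
        return "hf", key
    return "local", spec


def download_subfolder(key: str, local_dir: str = "checkpoints") -> str:
    """Fetch one model's subfolder (assumes the subfolder is named after the key)."""
    # Imported lazily so the parsing helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=HF_REPO, allow_patterns=[f"{key}/*"], local_dir=local_dir
    )
```

For example, `parse_checkpoint_spec("hf:cubicasa5k")` yields `("hf", "cubicasa5k")`, while a plain file path is passed through as a local checkpoint.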

Inference

To run inference, use the following bash scripts:

Dataset	Bash Script
Structured3D	`tools/predict_s3d.sh`
CubiCasa5K	`tools/predict_cc5k.sh`
Raster2Graph	`tools/predict_r2g.sh`
WAFFLE	`tools/predict_waffle.sh`
Structured3D-DensityMap	`tools/predict_s3d_density.sh`

For WAFFLE, inference uses the checkpoint pretrained on CubiCasa5K.

VLM-based floorplan refinement

For detailed instructions, see the VLM refinement's README.

Evaluation

Dataset	Bash Script
Structured3D	`tools/eval_s3d.sh`
CubiCasa5K	`tools/eval_cc5k.sh`
Raster2Graph	`tools/eval_r2g.sh`
Structured3D-DensityMap	`tools/eval_s3d_density.sh`

High-res models (512x512)

Dataset	Bash Script
Raster2Graph	`tools/eval_r2g_res512.sh`

Cross-evaluation: We perform cross-evaluation on three datasets: CubiCasa5K, Raster2Graph, and WAFFLE. For CubiCasa5K and Raster2Graph, we use the geometric evaluation on Room, Corner, and Angle, while for WAFFLE we report segmentation IoU results.

Dataset	Bash Script
CubiCasa5K	`tools/cross_eval_cc5k.sh`
Raster2Graph	`tools/cross_eval_r2g.sh`
WAFFLE	`tools/cross_eval_waffle.sh`
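For readers unfamiliar with the IoU metric reported for WAFFLE, the following is a minimal sketch of intersection-over-union between two binary masks, written in plain Python. It only illustrates the definition; the evaluation scripts may rasterize polygons, aggregate over classes, or handle empty masks differently (returning 1.0 for two empty masks is an assumption made here).

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 pixels, same shape for both masks


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
iou = mask_iou(pred, gt)  # 2 shared pixels out of 6 covered pixels
```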

Training

Stage 1: Training with only structural room predictions

Dataset	Bash Script
Structured3D	`tools/pretrain_s3d.sh`
CubiCasa5K	`tools/pretrain_cc5k.sh`
Raster2Graph	`tools/pretrain_r2g.sh`
Structured3D-DensityMap	`tools/pretrain_s3d_density.sh`

Stage 2: Finetuning with semantic room predictions

Dataset	Bash Script
Structured3D	`tools/finetune_s3d.sh`
CubiCasa5K	`tools/finetune_cc5k.sh`
Raster2Graph	`tools/finetune_r2g.sh`
Structured3D-DensityMap	`tools/finetune_s3d_density.sh`

High-res models (512x512)

Dataset	Bash Script
Raster2Graph	`tools/finetune_r2g_res512.sh`

Acknowledgment

We gratefully acknowledge the authors of RoomFormer, HEAT, Raster2Graph and MonteFloor for releasing their code and datasets. Our approach builds upon Deformable-DETR for the architecture design and draws inspiration from PolyFormer for the seq2seq framework.

Contacts

If you have any problems, please open an issue in this repository or send an email to htp26@cornell.edu.
