Official PyTorch implementation of "Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction" (SIGGRAPH'26)
TLDR: We introduce Raster2Seq, an approach that transforms rasterized floorplan images to vectorized format using a labeled polygon sequence representation.
Details of the model architecture and experimental results can be found in our paper:
@inproceedings{phung2026raster2seq,
title={Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction},
author={Phung, Hao and Averbuch-Elor, Hadar},
booktitle={Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
year={2026},
}

Please cite our paper and give us a ⭐ whenever this repository is used to help produce published results or is incorporated into other software.
Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and varying numbers of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task where each room is represented as a polygon sequence labeled with the room's semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, guided by learnable anchors. These anchors represent spatial coordinates in image space and therefore direct the attention mechanism toward informative image regions. By embracing the autoregressive mechanism, our method offers flexibility in the output format, allowing it to handle complex floorplans with numerous rooms and diverse polygon structures efficiently. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D and CubiCasa5K, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contains diverse room structures and complex geometric variations.
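To make the labeled polygon sequence concrete, here is a minimal sketch of how a set of rooms could be flattened into a single token sequence. The special token names (`<sep>`, `<eos>`), the coordinate quantization into `num_bins` bins, and the placement of the label token after each polygon are illustrative assumptions, not the tokenization used in this repository.

```python
from typing import List, Tuple

# (semantic label, polygon corners in pixel coordinates)
Room = Tuple[str, List[Tuple[float, float]]]

SEP, EOS = "<sep>", "<eos>"  # hypothetical special tokens


def quantize(value: float, size: int, num_bins: int) -> int:
    """Map a pixel coordinate in [0, size) to an integer bin in [0, num_bins)."""
    bin_idx = int(value / size * num_bins)
    return min(max(bin_idx, 0), num_bins - 1)


def rooms_to_sequence(rooms: List[Room], image_size: int, num_bins: int) -> List[str]:
    """Flatten rooms into one sequence: corners, then the room label, then a separator."""
    tokens: List[str] = []
    for label, corners in rooms:
        for x, y in corners:
            tokens.append(f"x{quantize(x, image_size, num_bins)}")
            tokens.append(f"y{quantize(y, image_size, num_bins)}")
        tokens.append(f"<{label}>")  # semantic label of the room just emitted
        tokens.append(SEP)           # marks the end of one polygon
    tokens.append(EOS)               # marks the end of the whole floorplan
    return tokens
```

An autoregressive decoder would then be trained to emit such a sequence one token at a time, which is why the number of rooms and the number of corners per room do not need to be fixed in advance.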
Given a rasterized floorplan image (left), our approach converts it into vectorized format, represented as a labeled polygon sequence, separated using special tokens. The main architectural component of our framework is an anchor-based autoregressive decoder, which predicts the next token given image features (
- The code has been tested on Linux with Python 3.10.13, PyTorch 2.3.1, and CUDA 11.8.
- Create an environment:

      conda create -n raster2seq python=3.10
      conda activate raster2seq
- Install PyTorch and required libraries:

      # adjust the CUDA version accordingly
      pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
      pip install -r requirements.txt

- Compile the deformable-attention modules (from Deformable-DETR) and the differentiable rasterization module (from BoundaryFormer):

      cd models/ops
      sh make.sh
      # unit test for deformable-attention modules (all checks should print True)
      # python test.py
      cd ../../diff_ras
      python setup.py build develop
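After installation, a quick check like the following can confirm which versions are actually active in the environment. This is a small sketch that is not part of the repository; the function name `check_env` is hypothetical, and it only reports what it finds rather than enforcing the tested versions listed above.

```python
import importlib
import importlib.util
import sys


def check_env(packages=("torch", "torchvision")):
    """Report the Python version and the versions of the given packages (None if missing)."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # package is not installed
            continue
        module = importlib.import_module(name)
        report[name] = getattr(module, "__version__", "unknown")
    if report.get("torch"):
        import torch

        report["cuda_build"] = torch.version.cuda            # CUDA version PyTorch was built with
        report["cuda_available"] = torch.cuda.is_available()  # whether a GPU is usable right now
    return report


if __name__ == "__main__":
    for key, value in check_env().items():
        print(f"{key}: {value}")
```

If `cuda_build` does not match the CUDA toolkit on the machine, compiling the custom modules may fail, so it is worth checking before running `make.sh`.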
We use the COCO-style format for all experiments. Data preprocessing is detailed in data_preprocess. In short, the input is an RGB image and the output is a set of 2D coordinate vectors, one per room region, each represented as a closed-loop segmentation.
The data tree structure of Structured3D, for instance, is as follows:
    code_root/
    └── data/
        └── stru3d/
            ├── train/
            ├── val/
            ├── test/
            └── annotations/
                ├── train.json
                ├── val.json
                └── test.json
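For orientation, the sketch below reads one such annotation file and groups room polygons by image. It assumes the files follow the standard COCO layout (`categories`, `annotations`, `category_id`, `image_id`, and `segmentation` stored as flat `[x1, y1, x2, y2, ...]` lists); the helper name `load_room_polygons` is hypothetical, and the exact fields written by the preprocessing scripts may differ.

```python
import json
from collections import defaultdict


def load_room_polygons(annotation_path):
    """Return {image_id: [(label, [(x, y), ...]), ...]} from a COCO-style JSON file."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    # Map category ids to readable names; fall back to the raw id if names are absent.
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}

    polygons = defaultdict(list)
    for ann in coco["annotations"]:
        label = id_to_name.get(ann["category_id"], str(ann["category_id"]))
        # Assumes polygon-style segmentation: a list of flat coordinate lists.
        for flat in ann["segmentation"]:
            corners = list(zip(flat[0::2], flat[1::2]))
            polygons[ann["image_id"]].append((label, corners))
    return dict(polygons)
```

With the tree above, a call such as `load_room_polygons("data/stru3d/annotations/train.json")` would give one list of labeled polygons per training image.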
In this code, we experiment with three datasets: Structured3D, CubiCasa5K, and Raster2Graph. We also conduct zero-shot evaluation on a WAFFLE subset of 100 samples with provided segmentation annotations.
Our model checkpoints are hosted at haopt/Raster2Seq, with one subfolder per trained model. You can download all checkpoints at once or only the checkpoint subfolder you need.
| Dataset | RoomF1 | Hugging Face key |
|---|---|---|
| Structured3D | 99.6 | s3d-bw |
| CubiCasa5K | 88.7 | cubicasa5k |
| Raster2Graph | 97.0 | raster2graph |
| Structured3D-DensityMap | 99.1 | s3d-density |
The eval and inference bash scripts in the following sections use Hugging Face checkpoint aliases by default (e.g. hf:cubicasa5k) and automatically download the corresponding checkpoint on first use. If you prefer to download checkpoints manually, use tools/download_checkpoints.sh.
High-res models (512x512)
| Dataset | RoomF1 | Hugging Face key |
|---|---|---|
| Raster2Graph | 98.1 | raster2graph-512 |
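The alias mechanism can be pictured roughly as follows. This is a sketch, not the repository's loader: the function names are hypothetical, and it assumes each checkpoint subfolder in haopt/Raster2Seq is named after its Hugging Face key. The only external call used is `snapshot_download` from the `huggingface_hub` package.

```python
REPO_ID = "haopt/Raster2Seq"  # Hugging Face repository named in this README
KNOWN_KEYS = {"s3d-bw", "cubicasa5k", "raster2graph", "s3d-density", "raster2graph-512"}


def parse_checkpoint_alias(value):
    """Return the key for an 'hf:<key>' alias, or None if value looks like a local path."""
    if not value.startswith("hf:"):
        return None
    key = value[len("hf:"):]
    if key not in KNOWN_KEYS:
        raise ValueError(f"unknown checkpoint key: {key!r}")
    return key


def fetch_checkpoint(value, cache_dir=None):
    """Resolve a checkpoint argument to a local path, downloading one subfolder if needed."""
    key = parse_checkpoint_alias(value)
    if key is None:
        return value  # already a local path, nothing to download

    # Imported lazily so that local paths work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Assumption: the subfolder name equals the key, so only that folder is fetched.
    return snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{key}/*"], cache_dir=cache_dir)
```

Restricting the download with `allow_patterns` keeps the transfer limited to the one checkpoint you need instead of the whole repository.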
To run inference, we provide the following bash scripts:
| Dataset | Bash Script |
|---|---|
| Structured3D | tools/predict_s3d.sh |
| CubiCasa5K | tools/predict_cc5k.sh |
| Raster2Graph | tools/predict_r2g.sh |
| WAFFLE | tools/predict_waffle.sh |
| Structured3D-DensityMap | tools/predict_s3d_density.sh |
For WAFFLE, we run inference with the CubiCasa5K-pretrained checkpoints.
For detailed instructions, see the VLM refinement's README.
| Dataset | Bash Script |
|---|---|
| Structured3D | tools/eval_s3d.sh |
| CubiCasa5K | tools/eval_cc5k.sh |
| Raster2Graph | tools/eval_r2g.sh |
| Structured3D-DensityMap | tools/eval_s3d_density.sh |
High-res models (512x512)
| Dataset | Bash Script |
|---|---|
| Raster2Graph | tools/eval_r2g_res512.sh |
Cross-evaluation: We perform cross-evaluation on three datasets: CubiCasa5K, Raster2Graph, and WAFFLE. For CubiCasa5K and Raster2Graph, we use the geometric evaluation on Room, Corner, and Angle, while for WAFFLE we report IoU segmentation results.
| Dataset | Bash Script |
|---|---|
| CubiCasa5K | tools/cross_eval_cc5k.sh |
| Raster2Graph | tools/cross_eval_r2g.sh |
| WAFFLE | tools/cross_eval_waffle.sh |
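As a reminder of what an IoU segmentation score measures, the sketch below computes intersection over union for two binary masks of the same size. It is a generic illustration using plain nested lists, not the evaluation code invoked by the scripts above, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def mask_iou(pred, gt):
    """IoU of two equally sized binary masks given as nested lists of 0/1 values."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")

    inter = 0  # pixels set in both masks
    union = 0  # pixels set in at least one mask
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Two empty masks agree perfectly under the convention used here.
    return inter / union if union else 1.0
```

For predicted polygons, the masks would first be obtained by rasterizing each polygon onto the image grid before comparing them with the annotated segmentation.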
| Dataset | Bash Script |
|---|---|
| Structured3D | tools/pretrain_s3d.sh |
| CubiCasa5K | tools/pretrain_cc5k.sh |
| Raster2Graph | tools/pretrain_r2g.sh |
| Structured3D-DensityMap | tools/pretrain_s3d_density.sh |
| Dataset | Bash Script |
|---|---|
| Structured3D | tools/finetune_s3d.sh |
| CubiCasa5K | tools/finetune_cc5k.sh |
| Raster2Graph | tools/finetune_r2g.sh |
| Structured3D-DensityMap | tools/finetune_s3d_density.sh |
High-res models (512x512)
| Dataset | Bash Script |
|---|---|
| Raster2Graph | tools/finetune_r2g_res512.sh |
We gratefully acknowledge the authors of RoomFormer, HEAT, Raster2Graph and MonteFloor for releasing their code and datasets. Our approach builds upon Deformable-DETR for the architecture design and draws inspiration from PolyFormer for the seq2seq framework.
If you have any problems, please open an issue in this repository or send an email to htp26@cornell.edu.

