Cornell-VAILab/Raster2Seq

Official PyTorch implementation of "Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction" (SIGGRAPH'26)

Hao Phung     Hadar Averbuch-Elor

Cornell University  

[Page] [Paper] [HuggingFace]

TLDR: We introduce Raster2Seq, an approach that transforms rasterized floorplan images into a vectorized format using a labeled polygon sequence representation.

Details of the model architecture and experimental results can be found in our paper:

@inproceedings{phung2026raster2seq,
   title={Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction},
   author={Phung, Hao and Averbuch-Elor, Hadar},
   booktitle={Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
   year= {2026},
}

Please CITE our paper and give us a ⭐ whenever this repository is used to help produce published results or incorporated into other software.

Table of Contents
  1. Abstract
  2. Method
  3. Installation
  4. Data
  5. Inference
  6. Evaluation
  7. Training
  8. Acknowledgment

Abstract

Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and varying numbers of polygon corners. To this end, we propose Raster2Seq, framing floorplan reconstruction as a sequence-to-sequence task, where each room is represented as a polygon sequence labeled with the room's semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners using guidance from learnable anchors. These anchors represent spatial coordinates in image space, effectively directing the attention mechanism to focus on informative image regions. By embracing the autoregressive mechanism, our method offers flexibility in the output format, enabling it to efficiently handle complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D and CubiCasa5K, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contain diverse room structures and complex geometric variations.

Method

space-1.jpg

Given a rasterized floorplan image (left), our approach converts it into a vectorized format, represented as a sequence of labeled polygons separated by special tokens. The main architectural component of our framework is an anchor-based autoregressive decoder, which predicts the next token given image features ($f_{img}$), learnable anchors ($v_{anc}$), and the previously generated tokens. Above, we visualize the first two labeled polygons predicted (colored in orange and pink, respectively).

Installation

  • The code has been tested on Linux with Python 3.10.13, PyTorch 2.3.1, and CUDA 11.8.

  • Create an environment:

    conda create -n raster2seq python=3.10
    conda activate raster2seq
  • Install PyTorch and the required libraries:

    # adjust the CUDA version (cu118 below) to match your system
    pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
  • Compile the deformable-attention modules (from Deformable-DETR) and the differentiable rasterization module (from BoundaryFormer):

    cd models/ops
    sh make.sh
    
    # optional unit test for the deformable-attention modules (every check should report True)
    # python test.py
    
    cd ../../diff_ras
    python setup.py build develop

Data

We use the COCO-style format for all experiments. Data preprocessing is detailed in data_preprocess. In short, the input is an RGB image and the output is a set of 2D coordinate vectors, one per room region, each represented as a closed-loop segmentation polygon.
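As a rough illustration of what a COCO-style room annotation looks like, the sketch below turns a list of polygon corners into one annotation entry. The field names follow the standard COCO annotation layout; the helper `polygon_to_coco`, the example IDs, and the category ID are hypothetical, and the exact fields produced by data_preprocess may differ.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_to_coco(corners: List[Point], image_id: int,
                    ann_id: int, category_id: int) -> Dict:
    """Build one COCO-style annotation from a room polygon (hypothetical helper)."""
    # COCO stores a polygon as a flat list [x1, y1, x2, y2, ...].
    flat = [float(v) for xy in corners for v in xy]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x0, y0 = min(xs), min(ys)
    # Shoelace formula; the polygon is treated as closed (last corner joins the first).
    n = len(corners)
    twice_area = sum(
        corners[i][0] * corners[(i + 1) % n][1]
        - corners[(i + 1) % n][0] * corners[i][1]
        for i in range(n)
    )
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [flat],
        "bbox": [x0, y0, max(xs) - x0, max(ys) - y0],  # [x, y, width, height]
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }


ann = polygon_to_coco([(10, 20), (110, 20), (110, 70), (10, 70)],
                      image_id=1, ann_id=1, category_id=3)
```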

The data tree structure of Structured3D, for instance, is as follows:

code_root/
└── data/
    └── stru3d/
        ├── train/
        ├── val/
        ├── test/
        └── annotations/
            ├── train.json
            ├── val.json
            └── test.json

In this code, we experiment with three datasets: Structured3D, CubiCasa5K, and Raster2Graph. We also conduct zero-shot evaluation on a WAFFLE subset of 100 samples with provided segmentation annotations.

Checkpoints

Our model checkpoints are hosted at haopt/Raster2Seq, with one subfolder per trained model. You can download all checkpoints at once or only the checkpoint subfolder you need.

Dataset                  RoomF1  Hugging Face key
Structured3D             99.6    s3d-bw
CubiCasa5K               88.7    cubicasa5k
Raster2Graph             97.0    raster2graph
Structured3D-DensityMap  99.1    s3d-density

The provided eval and inference bash scripts in the following sections use Hugging Face checkpoint aliases by default (e.g. hf:cubicasa5k) and automatically download the corresponding checkpoint on first use. To download checkpoints manually, use tools/download_checkpoints.sh.

High-res models (512x512)

Dataset       RoomF1  Hugging Face key
Raster2Graph  98.1    raster2graph-512
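For readers who want to fetch a single checkpoint subfolder from Python, the sketch below shows one way the `hf:<key>` alias convention described above could be resolved with `huggingface_hub.snapshot_download`. This is not the repository's own loader: the helpers `parse_alias`, `subfolder_patterns`, and `download_checkpoint`, the default `local_dir`, and the assumption that each subfolder is named exactly after its Hugging Face key are all illustrative. The supported manual route remains tools/download_checkpoints.sh.

```python
from typing import List, Optional

REPO_ID = "haopt/Raster2Seq"  # checkpoint repository named in this README
HF_PREFIX = "hf:"


def parse_alias(spec: str) -> Optional[str]:
    """Return the key of an 'hf:<key>' alias, or None for anything else."""
    if spec.startswith(HF_PREFIX) and len(spec) > len(HF_PREFIX):
        return spec[len(HF_PREFIX):]
    return None


def subfolder_patterns(key: str) -> List[str]:
    """Glob patterns selecting one checkpoint subfolder.

    Assumption: the subfolder name equals the Hugging Face key.
    """
    return [f"{key}/*"]


def download_checkpoint(spec: str, local_dir: str = "checkpoints") -> str:
    """Resolve a checkpoint spec to a local path (hypothetical helper)."""
    key = parse_alias(spec)
    if key is None:
        return spec  # not an alias: treat it as an existing local path
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=subfolder_patterns(key),
        local_dir=local_dir,
    )
```

Calling `download_checkpoint("hf:cubicasa5k")` would download only the files matching `cubicasa5k/*`, provided the subfolder naming assumption holds.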

Inference

To run inference, use the following bash scripts:

Dataset                  Bash Script
Structured3D             tools/predict_s3d.sh
CubiCasa5K               tools/predict_cc5k.sh
Raster2Graph             tools/predict_r2g.sh
WAFFLE                   tools/predict_waffle.sh
Structured3D-DensityMap  tools/predict_s3d_density.sh

For WAFFLE, we run inference with the CubiCasa5K pretrained checkpoints.

VLM-based floorplan refinement

For detailed instructions, see the VLM refinement's README.

Evaluation

Dataset                  Bash Script
Structured3D             tools/eval_s3d.sh
CubiCasa5K               tools/eval_cc5k.sh
Raster2Graph             tools/eval_r2g.sh
Structured3D-DensityMap  tools/eval_s3d_density.sh

High-res models (512x512)

Dataset       Bash Script
Raster2Graph  tools/eval_r2g_res512.sh

Cross-evaluation: We perform cross-evaluation on three datasets: CubiCasa5K, Raster2Graph, and WAFFLE. For CubiCasa5K and Raster2Graph, we report the geometric Room, Corner, and Angle metrics, while for WAFFLE we report segmentation IoU.

Dataset       Bash Script
CubiCasa5K    tools/cross_eval_cc5k.sh
Raster2Graph  tools/cross_eval_r2g.sh
WAFFLE        tools/cross_eval_waffle.sh
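For reference, the arithmetic behind the two kinds of scores mentioned above can be sketched as follows. This only shows the generic definitions of mask IoU and F1; the matching rules that decide when a predicted room, corner, or angle counts as correct are not described in this README, so the `num_matched` argument is assumed to come from the evaluation scripts. The function names are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask, 1 = room pixel, 0 = background


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 0.0


def f1_score(num_matched: int, num_pred: int, num_gt: int) -> float:
    """F1 from counts of matched, predicted, and ground-truth elements."""
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred = [[1, 1, 0],
        [1, 1, 0]]
gt = [[0, 1, 1],
      [0, 1, 1]]
iou = mask_iou(pred, gt)      # 2 shared pixels out of 6 covered pixels
room_f1 = f1_score(8, 10, 8)  # 8 matched rooms, 10 predicted, 8 ground truth
```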

Training

Stage 1: Training with only structural room predictions

Dataset                  Bash Script
Structured3D             tools/pretrain_s3d.sh
CubiCasa5K               tools/pretrain_cc5k.sh
Raster2Graph             tools/pretrain_r2g.sh
Structured3D-DensityMap  tools/pretrain_s3d_density.sh

Stage 2: Finetuning with semantic room predictions

Dataset                  Bash Script
Structured3D             tools/finetune_s3d.sh
CubiCasa5K               tools/finetune_cc5k.sh
Raster2Graph             tools/finetune_r2g.sh
Structured3D-DensityMap  tools/finetune_s3d_density.sh

High-res models (512x512)

Dataset       Bash Script
Raster2Graph  tools/finetune_r2g_res512.sh

Acknowledgment

We gratefully acknowledge the authors of RoomFormer, HEAT, Raster2Graph and MonteFloor for releasing their code and datasets. Our approach builds upon Deformable-DETR for the architecture design and draws inspiration from PolyFormer for the seq2seq framework.

Contacts

If you have any problems, please open an issue in this repository or send an email to htp26@cornell.edu.
