[Paper] Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction

Published: February 9, 2026 at 01:58 PM EST

Source: arXiv - 2602.09016v1

Overview

Raster2Seq addresses a common problem for anyone working with building layouts: turning a raster image of a floorplan into a clean, editable vector representation. By treating each room, door, and window as a labeled polygon sequence, the authors recast reconstruction as a sequence‑to‑sequence problem that an autoregressive model can solve. According to the reported results, the system recovers both geometry and semantics on cluttered, multi‑room plans, which makes it relevant to downstream CAD automation, indoor‑navigation AI, and real‑estate analytics.

Key Contributions

Sequence‑based representation: Encodes every floorplan element as an ordered list of (x, y) vertices plus a semantic label, unifying geometry and meaning in a single stream.
Learnable spatial anchors: Introduces a set of trainable coordinate “anchors” that guide the decoder’s attention to the most informative image regions when predicting the next vertex.
Autoregressive decoder: Predicts each corner conditioned on image features and previously generated corners, enabling flexible handling of polygons with arbitrary numbers of vertices.
State‑of‑the‑art performance: Reports new best results on Structured3D, CubiCasa5K, and Raster2Graph, and shows strong generalization to the challenging WAFFLE dataset.
Scalable to complex layouts: Demonstrates that the method scales gracefully to floorplans with dozens of rooms and highly irregular shapes, without needing handcrafted post‑processing.
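
To make the sequence‑based representation concrete, here is a minimal Python sketch of how labeled polygons might be flattened into one stream. The marker names `END` and `NEXT-OBJECT` follow the Methodology description, but the function `encode_floorplan` and the per‑vertex `(x, y, label)` tuple layout are illustrative assumptions, not the paper's actual tokenization.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Polygon = Tuple[str, List[Point]]  # (semantic label, ordered vertices)

# Marker names as described in the summary; their encoding here is assumed.
END = "END"                  # closes the current polygon
NEXT_OBJECT = "NEXT-OBJECT"  # opens a new element


def encode_floorplan(polygons: List[Polygon]) -> list:
    """Flatten labeled polygons into a single ordered token stream."""
    tokens: list = []
    for label, vertices in polygons:
        tokens.append(NEXT_OBJECT)
        for x, y in vertices:
            # Geometry and semantics travel together in each step.
            tokens.append((x, y, label))
        tokens.append(END)
    return tokens


plan = [
    ("room", [(0, 0), (10, 0), (10, 8), (0, 8)]),
    ("door", [(4, 0), (6, 0)]),
]
tokens = encode_floorplan(plan)
```

Because every element carries its own start and end markers, polygons with different vertex counts fit in the same stream without padding.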

Methodology

Feature Extraction – A CNN backbone processes the input raster floorplan and produces a dense feature map.
Anchor Initialization – A small set of learnable anchor points (e.g., 64) is placed in image coordinates. During training, they migrate to the positions most useful for locating corners.
Autoregressive Decoding – The decoder is a transformer‑style sequence model. At each step it receives:
- the current hidden state,
- the feature map sampled at the anchor locations, and
- the previously emitted vertices.
  It then predicts the next (x, y) coordinate and the associated semantic label (room, door, window, etc.).
Polygon Termination – A special “END” token signals the completion of a polygon; a separate “NEXT‑OBJECT” token starts a new element.
Training Objective – A combined loss penalizes coordinate regression error (L1) and classification error (cross‑entropy), encouraging the model to learn both precise geometry and correct semantics.
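
The training objective can be sketched in plain Python as a weighted sum of an L1 coordinate term and a cross‑entropy label term. The weights `coord_weight` and `class_weight`, the function names, and the averaging conventions are assumptions for illustration; the summary does not state how the paper balances or normalizes the two terms.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]


def l1_loss(pred: Sequence[Coord], target: Sequence[Coord]) -> float:
    """Mean absolute error over all x and y coordinates."""
    total = sum(abs(px - tx) + abs(py - ty)
                for (px, py), (tx, ty) in zip(pred, target))
    return total / (2 * len(target))


def cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the true class under a softmax."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[label]
    return total / len(labels)


def combined_loss(pred_xy, true_xy, logits, labels,
                  coord_weight: float = 1.0, class_weight: float = 1.0) -> float:
    """Weighted sum of geometry and semantics terms (weights are assumed)."""
    return (coord_weight * l1_loss(pred_xy, true_xy)
            + class_weight * cross_entropy(logits, labels))
```

In a real training loop both terms would be computed on batched tensors in a deep‑learning framework; the scalar version above only shows how the two penalties combine.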

Because the decoder works step‑by‑step, it naturally adapts to polygons of any size—no fixed‑length output or complex graph‑matching post‑processing is required.
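
The step‑by‑step behavior can be illustrated with a small greedy decoding loop. This is a sketch under several assumptions: `next_token` is a hypothetical stand‑in for the trained decoder, returning `None` marks the end of the whole plan, and each polygon takes the label of its first vertex. None of these choices are confirmed by the summary.

```python
from typing import Callable, List, Optional, Tuple

END = "END"                  # closes the current polygon
NEXT_OBJECT = "NEXT-OBJECT"  # opens a new element


def decode(next_token: Callable[[list], Optional[object]],
           max_steps: int = 1000) -> List[Tuple[str, list]]:
    """Greedy autoregressive loop: feed the history back in, collect polygons."""
    history: list = []
    polygons: List[Tuple[str, list]] = []
    current = None  # vertices of the polygon being built, or None between objects
    for _ in range(max_steps):
        token = next_token(history)
        if token is None:  # assumed stop signal from the stand-in model
            break
        history.append(token)
        if token == NEXT_OBJECT:
            current = []
        elif token == END:
            if current:
                label = current[0][2]  # assumed: label of the first vertex
                polygons.append((label, [(x, y) for x, y, _ in current]))
            current = None
        elif current is not None:
            current.append(token)  # an (x, y, label) step
    return polygons


def scripted_model(script: list) -> Callable[[list], Optional[object]]:
    """Stand-in for a trained decoder: replays a fixed token script."""
    def step(history: list) -> Optional[object]:
        return script[len(history)] if len(history) < len(script) else None
    return step


script = [NEXT_OBJECT, (0, 0, "room"), (4, 0, "room"), (4, 3, "room"), END,
          NEXT_OBJECT, (1, 0, "door"), (2, 0, "door"), END]
result = decode(scripted_model(script))
```

The loop never needs to know in advance how many vertices a polygon has, which is the property the paragraph above describes.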

Results & Findings

Dataset	Metric (IoU / F‑score)	Improvement vs. Prior Art
Structured3D	0.92 IoU	+4.3 %
CubiCasa5K	0.88 F‑score	+5.1 %
Raster2Graph	0.90 IoU	+3.8 %
WAFFLE (out‑of‑domain)	0.84 IoU	+6.7 %

Higher corner accuracy: The average vertex error drops from ~3 px (previous methods) to <1 px, thanks to the anchor‑guided attention.
Robust semantics: Mis‑label rates for doors/windows fall below 2 %, enabling reliable downstream CAD pipelines.
Speed: Inference runs at ~15 fps on a single RTX 3080 for a 1024×1024 floorplan, making it practical for real‑time applications.
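
For readers unfamiliar with the metrics quoted above, the following sketch shows one simple way to compute IoU on pixel masks and a mean vertex error in pixels. It assumes masks are given as sets of `(row, col)` pixels and that predicted and ground‑truth vertices are already matched one‑to‑one; the benchmarks' actual evaluation protocols may differ.

```python
import math
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]


def mask_iou(pred: Set[Pixel], target: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = pred | target
    if not union:
        return 1.0  # convention chosen here: two empty masks agree
    return len(pred & target) / len(union)


def mean_vertex_error(pred: Sequence[Tuple[float, float]],
                      target: Sequence[Tuple[float, float]]) -> float:
    """Average Euclidean distance between already-matched vertices."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need equal-length, non-empty vertex lists")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Two 4x4 squares that overlap in a 4x2 strip.
a = {(r, c) for r in range(4) for c in range(4)}
b = {(r, c) for r in range(4) for c in range(2, 6)}
iou = mask_iou(a, b)
```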

Practical Implications

Automated CAD import – Developers can feed a scanned blueprint into Raster2Seq and convert the resulting polygons to DXF/DWG, reducing the need for manual tracing.
Indoor‑navigation AI – Robotics platforms can convert floorplan images into vector maps that serve as a starting point for path planning and as priors for SLAM.
Real‑estate tech – Property portals can auto‑generate interactive floorplan viewers, allowing users to click on rooms for details or virtual tours.
Facility management – Maintenance software can ingest legacy paper plans and overlay sensor data directly onto vector rooms, doors, and windows.
Extensible pipeline – Because the output is a simple sequence, developers can plug the model into existing GIS or BIM tools with minimal glue code.
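
As an example of such glue code, the sketch below wraps labeled polygons in a GeoJSON `FeatureCollection`, a format many GIS tools can read. The function name `polygons_to_geojson` and the input layout are assumptions. Note also that GeoJSON formally expects geographic coordinates, so pixel coordinates would need scaling or georeferencing before use in a real GIS workflow, and elements with fewer than three distinct vertices do not form a valid polygon ring.

```python
import json
from typing import List, Tuple


def polygons_to_geojson(polygons: List[Tuple[str, List[Tuple[float, float]]]]) -> dict:
    """Convert (label, vertices) pairs into a GeoJSON FeatureCollection."""
    features = []
    for label, vertices in polygons:
        ring = [[float(x), float(y)] for x, y in vertices]
        if ring and ring[0] != ring[-1]:
            ring.append(list(ring[0]))  # GeoJSON rings must be closed
        features.append({
            "type": "Feature",
            "properties": {"label": label},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


fc = polygons_to_geojson([("room", [(0, 0), (4, 0), (4, 3), (0, 3)])])
text = json.dumps(fc)  # ready to write to a .geojson file
```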

Limitations & Future Work

Anchor count sensitivity – Too few anchors degrade performance on ultra‑high‑resolution plans; the paper notes a trade‑off between memory and accuracy.
Complex line styles – Heavily stylized or low‑contrast drawings still cause occasional vertex misplacements.
3‑D extension – The current formulation is 2‑D only; extending the approach to multi‑level building models remains an open challenge.
Training data bias – Benchmarks are dominated by residential layouts; the authors suggest curating more commercial and industrial floorplans to improve generalization.

Overall, Raster2Seq demonstrates that a well‑designed sequence model can bridge the gap between raster images and structured vector graphics, offering a practical toolset for developers across CAD, robotics, and real‑estate tech.

Authors

Hao Phung
Hadar Averbuch-Elor

Paper Information

arXiv ID: 2602.09016v1
Categories: cs.CV
Published: February 9, 2026
