[Paper] Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction

Published: February 9, 2026 at 01:58 PM EST
Source: arXiv - 2602.09016v1

Overview

Raster2Seq tackles a common problem for anyone working with building layouts: turning a raster image of a floorplan into a clean, editable vector representation. By treating each room, door, and window as a labeled polygon sequence, the authors recast reconstruction as a sequence‑to‑sequence problem that modern autoregressive models can solve. The resulting system extracts both geometry and semantics from cluttered, multi‑room plans, which opens the door (pun intended) to downstream CAD automation, indoor‑navigation AI, and real‑estate analytics.

Key Contributions

  • Sequence‑based representation: Encodes every floorplan element as an ordered list of (x, y) vertices plus a semantic label, unifying geometry and meaning in a single stream.
  • Learnable spatial anchors: Introduces a set of trainable coordinate “anchors” that guide the decoder’s attention to the most informative image regions when predicting the next vertex.
  • Autoregressive decoder: Predicts each corner conditioned on image features and previously generated corners, enabling flexible handling of polygons with arbitrary numbers of vertices.
  • State‑of‑the‑art performance: Sets new benchmarks on Structure3D, CubiCasa5K, and Raster2Graph, and shows strong generalization to the challenging WAFFLE dataset.
  • Scalable to complex layouts: Demonstrates that the method scales gracefully to floorplans with dozens of rooms and highly irregular shapes, without needing handcrafted post‑processing.
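
To make the sequence-based representation concrete, here is a minimal sketch of how labeled polygons might be flattened into a single token stream. The `Polygon` class, the token names `END` and `NEXT_OBJECT`, and the choice to attach the label to every vertex token are illustrative assumptions, not the paper's actual vocabulary or encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical special tokens, named after the markers described in the post.
END = "END"                  # closes the current polygon
NEXT_OBJECT = "NEXT_OBJECT"  # opens a new floorplan element

Vertex = Tuple[float, float]
Token = Union[str, Tuple[float, float, str]]


@dataclass
class Polygon:
    label: str              # e.g. "room", "door", "window"
    vertices: List[Vertex]  # ordered (x, y) corners


def serialize(polygons: List[Polygon]) -> List[Token]:
    """Flatten labeled polygons into one ordered token stream."""
    tokens: List[Token] = []
    for poly in polygons:
        tokens.append(NEXT_OBJECT)
        for x, y in poly.vertices:
            # Each vertex token carries its coordinates and semantic label.
            tokens.append((x, y, poly.label))
        tokens.append(END)
    return tokens


plan = [
    Polygon("room", [(0, 0), (4, 0), (4, 3), (0, 3)]),
    Polygon("door", [(4, 1), (4, 2)]),
]
tokens = serialize(plan)
```

Because every element reduces to the same flat stream, a room with four corners and one with forty differ only in sequence length.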

Methodology

  1. Feature Extraction – A CNN backbone processes the input raster floorplan and produces a dense feature map.
  2. Anchor Initialization – A small set of learnable anchor points (e.g., 64) is placed in image coordinates. During training, they migrate to the positions most useful for locating corners.
  3. Autoregressive Decoding – The decoder is a transformer‑style sequence model. At each step it receives:
    • the current hidden state,
    • the feature map sampled at the anchor locations, and
    • the previously emitted vertices.
     It then predicts the next (x, y) coordinate and the associated semantic label (room, door, window, etc.).
  4. Polygon Termination – A special “END” token signals the completion of a polygon; a separate “NEXT‑OBJECT” token starts a new element.
  5. Training Objective – A combined loss penalizes coordinate regression error (L1) and classification error (cross‑entropy), encouraging the model to learn both precise geometry and correct semantics.
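
The combined objective in step 5 can be sketched in plain Python as follows. This is a toy version under stated assumptions: the function name `combined_loss`, the two weighting parameters, and the per-step averaging are hypothetical choices for illustration, and the post does not say how the paper balances the two terms.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def combined_loss(
    pred_xy: Sequence[Point],
    true_xy: Sequence[Point],
    class_probs: Sequence[Sequence[float]],
    true_cls: Sequence[int],
    coord_weight: float = 1.0,  # hypothetical weight, not from the paper
    cls_weight: float = 1.0,    # hypothetical weight, not from the paper
) -> float:
    """L1 coordinate error plus cross-entropy, averaged over decoding steps."""
    n = len(true_xy)
    # L1 term: absolute error summed over x and y, averaged over steps.
    l1 = sum(
        abs(px - tx) + abs(py - ty)
        for (px, py), (tx, ty) in zip(pred_xy, true_xy)
    ) / n
    # Cross-entropy term: negative log-probability of the true class.
    # The clamp avoids log(0) when a class gets zero probability.
    ce = -sum(
        math.log(max(probs[c], 1e-12))
        for probs, c in zip(class_probs, true_cls)
    ) / n
    return coord_weight * l1 + cls_weight * ce


# One step: x is off by 0.5, and the true class gets probability 0.5.
loss = combined_loss([(1.0, 2.0)], [(1.5, 2.0)], [[0.5, 0.5]], [0])
```

In this example the loss is 0.5 from the coordinate term plus ln 2 from the classification term.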

Because the decoder works step‑by‑step, it naturally adapts to polygons of any size—no fixed‑length output or complex graph‑matching post‑processing is required.
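
The step-by-step decoding described above might look roughly like the loop below. It is only a sketch: `predict_next` stands in for the real image-conditioned decoder, the `STOP` token that ends the whole sequence is an assumption (the post only mentions "END" and "NEXT‑OBJECT"), and `scripted_model` is a toy replacement that replays a fixed token list.

```python
from typing import Callable, List, Tuple, Union

END, NEXT_OBJECT = "END", "NEXT_OBJECT"
STOP = "STOP"  # assumed end-of-sequence marker, not named in the post

Token = Union[str, Tuple[float, float, str]]
Polygon = Tuple[str, List[Tuple[float, float]]]


def decode(predict_next: Callable[[List[Token]], Token],
           max_steps: int = 1000) -> List[Polygon]:
    """Greedy autoregressive loop: each step sees all tokens emitted so far."""
    history: List[Token] = []
    polygons: List[Polygon] = []  # finished (label, vertices) pairs
    current = None                # vertices of the polygon being built
    label = None
    for _ in range(max_steps):
        token = predict_next(history)
        history.append(token)
        if token == STOP:
            break
        if token == NEXT_OBJECT:
            current, label = [], None
        elif token == END:
            if current:
                polygons.append((label, current))
            current = None
        elif current is not None:
            x, y, label = token   # vertex token: coordinates plus label
            current.append((x, y))
    return polygons


# Toy stand-in for the decoder: replays a fixed script one token per step.
script: List[Token] = [
    NEXT_OBJECT, (0, 0, "room"), (4, 0, "room"), (4, 3, "room"), END,
    NEXT_OBJECT, (4, 1, "door"), (4, 2, "door"), END,
    STOP,
]


def scripted_model(history: List[Token]) -> Token:
    return script[len(history)]


result = decode(scripted_model)
```

The loop never needs to know in advance how many vertices a polygon has; it simply keeps appending until the terminating token arrives.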

Results & Findings

Dataset                   Metric (IoU / F‑score)   Improvement vs. Prior Art
Structure3D               0.92 IoU                 +4.3 %
CubiCasa5K                0.88 F‑score             +5.1 %
Raster2Graph              0.90 IoU                 +3.8 %
WAFFLE (out‑of‑domain)    0.84 IoU                 +6.7 %

  • Higher corner accuracy: The average vertex error drops from ~3 px (previous methods) to <1 px, thanks to the anchor‑guided attention.
  • Robust semantics: Mis‑label rates for doors/windows fall below 2 %, enabling reliable downstream CAD pipelines.
  • Speed: Inference runs at ~15 fps on a single RTX 3080 for a 1024×1024 floorplan, making it practical for real‑time applications.
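
For readers less familiar with the metrics above, the sketch below shows one simple way IoU and mean vertex error could be computed. It assumes regions are given as sets of pixel coordinates and that predicted corners are already matched one-to-one with ground-truth corners. The benchmarks' official evaluation protocols are not described in this post and may differ, and `rect_pixels` is a hypothetical helper used only to build the example.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Point = Tuple[float, float]


def rect_pixels(x0: int, y0: int, x1: int, y1: int) -> Set[Pixel]:
    """Hypothetical helper: pixels covered by an axis-aligned rectangle."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def mask_iou(pred: Set[Pixel], true: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = pred | true
    if not union:
        return 1.0  # two empty regions agree perfectly
    return len(pred & true) / len(union)


def mean_vertex_error(pred: List[Point], true: List[Point]) -> float:
    """Mean Euclidean distance between already-matched corner pairs."""
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(true)


# A 4x4 room predicted one pixel too far to the right.
iou = mask_iou(rect_pixels(1, 0, 5, 4), rect_pixels(0, 0, 4, 4))
err = mean_vertex_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

A one-pixel shift on a 4×4 room already lowers IoU to 0.6, which illustrates why small corner errors weigh heavily on small elements such as doors and windows.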

Practical Implications

  • Automated CAD import – Developers can feed a scanned blueprint into Raster2Seq and convert the predicted polygons to a DXF/DWG file without manual tracing, cutting the hours spent on redrawing.
  • Indoor‑navigation AI – Robotics platforms can instantly convert floorplan images into graph‑ready maps for path planning and SLAM.
  • Real‑estate tech – Property portals can auto‑generate interactive floorplan viewers, allowing users to click on rooms for details or virtual tours.
  • Facility management – Maintenance software can ingest legacy paper plans and overlay sensor data directly onto vector rooms, doors, and windows.
  • Extensible pipeline – Because the output is a simple sequence, developers can plug the model into existing GIS or BIM tools with minimal glue code.
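
As an illustration of the "minimal glue code" point, the sketch below converts decoded polygons into a GeoJSON FeatureCollection using only the standard library. The `(label, vertices)` input format and the function name `to_geojson` are assumptions made for this example. Note also that GeoJSON coordinates are nominally geographic, so the pixel coordinates used here would still need georeferencing or scaling before real GIS use.

```python
import json
from typing import List, Tuple

Polygon = Tuple[str, List[Tuple[float, float]]]


def to_geojson(polygons: List[Polygon]) -> str:
    """Serialize (label, vertices) pairs as a GeoJSON FeatureCollection."""
    features = []
    for label, vertices in polygons:
        ring = [list(v) for v in vertices]
        if ring[0] != ring[-1]:
            # GeoJSON linear rings must repeat the first position at the end.
            ring.append(list(ring[0]))
        features.append({
            "type": "Feature",
            "properties": {"label": label},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


rooms = [("room", [(0, 0), (4, 0), (4, 3), (0, 3)])]
geojson_text = to_geojson(rooms)
```

The same loop could target another vector format by swapping out the dictionary construction, since the model's output is just labeled vertex lists.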

Limitations & Future Work

  • Anchor count sensitivity – Too few anchors degrade performance on ultra‑high‑resolution plans; the paper notes a trade‑off between memory and accuracy.
  • Complex line styles – Heavily stylized or low‑contrast drawings still cause occasional vertex misplacements.
  • 3‑D extension – The current formulation is 2‑D only; extending the approach to multi‑level building models remains an open challenge.
  • Training data bias – Benchmarks are dominated by residential layouts; the authors suggest curating more commercial and industrial floorplans to improve generalization.

Overall, Raster2Seq demonstrates that a well‑designed sequence model can bridge the gap between raster images and structured vector graphics, offering a practical toolset for developers across CAD, robotics, and real‑estate tech.

Authors

  • Hao Phung
  • Hadar Averbuch-Elor

Paper Information

  • arXiv ID: 2602.09016v1
  • Categories: cs.CV
  • Published: February 9, 2026