[Paper] Post-Processing Mask-Based Table Segmentation for Structural Coordinate Extraction

Published: December 24, 2025 at 12:10 PM EST
4 min read
Source: arXiv - 2512.21287v1

Overview

This paper tackles a surprisingly stubborn problem in document‑image processing: pinpointing the exact row and column boundaries inside a table mask, especially when the source image is low‑resolution, noisy, or partially corrupted. By collapsing the mask into 1‑D boundary signals and applying a cascade of multi‑scale Gaussian smoothing and statistical thresholding, the author achieves a noticeable boost in downstream OCR accuracy on a large‑scale benchmark (PubLayNet‑1M).

Key Contributions

  • Signal‑processing edge detector for table masks – models row/column transitions as 1‑D signals and extracts stable edges without operating directly on the raw image.
  • Progressive multi‑scale Gaussian convolution – uses increasing kernel variances to suppress noise while preserving true structural changes.
  • Statistical peak‑selection thresholding – automatically determines robust cut‑offs, eliminating hand‑tuned parameters.
  • Zero‑padding & scaling strategy – makes the method agnostic to the original image resolution, enabling seamless integration with existing pipelines.
  • Empirical validation – improves Cell‑Aware Segmentation Accuracy (CASA) from 67 % to 76 % on PubLayNet‑1M when combined with TableNet + PyTesseract OCR.

Methodology

  1. Mask Generation – An upstream table detector (e.g., TableNet) produces a binary mask that roughly outlines the table region.
  2. 1‑D Signal Construction – For each axis (horizontal for columns, vertical for rows) the mask is collapsed into a one‑dimensional intensity profile by summing pixel values along the orthogonal direction. Peaks in this profile correspond to potential cell boundaries.
  3. Multi‑Scale Gaussian Smoothing
    • Start with a narrow Gaussian kernel (small σ) to keep fine details.
    • Iteratively increase σ, convolving the signal each time. Larger σ values blur out high‑frequency noise while retaining the broader, consistent transitions that represent true table lines.
  4. Statistical Thresholding
    • After each smoothing step, compute the mean and standard deviation of the signal.
    • Retain only those points that exceed a dynamic threshold (e.g., μ + k·σ). This filters out spurious peaks caused by speckles or scanning artifacts.
  5. Peak Detection & Mapping
    • The surviving peaks are located precisely (sub‑pixel interpolation if needed).
    • Their positions are mapped back to the original image coordinate system, yielding exact row/column coordinates.
  6. Resolution‑Invariant Handling
    • If the input mask is low‑resolution, the signal is zero‑padded and optionally up‑sampled before smoothing, ensuring the Gaussian kernels operate on a consistent scale.

The entire pipeline is lightweight (pure NumPy/CPU operations) and can be dropped into any existing OCR or table‑extraction workflow; the sketch below illustrates the core steps.
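
Below is a minimal NumPy sketch of steps 2–6, assuming a binary mask array. The σ schedule (sigmas), the threshold factor k, and all function names are illustrative choices of this summary, not values from the paper:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def boundary_coordinates(mask, axis=0, sigmas=(1, 2, 4), k=1.0):
    """Estimate cell-boundary coordinates along one axis of a binary table mask.

    axis=0 collapses rows (column boundaries); axis=1 collapses columns
    (row boundaries). `sigmas` and `k` are assumed, hand-picked values.
    """
    # Step 2: collapse the mask into a 1-D intensity profile.
    profile = mask.astype(float).sum(axis=axis)

    # Step 6 (simplified): zero-pad so the widest kernel never clips the signal.
    pad = int(3 * max(sigmas))
    signal = np.pad(profile, pad)

    # Steps 3-4: progressively smooth with growing sigma, then keep only
    # points above the dynamic mu + k*sigma threshold after each pass.
    for sigma in sigmas:
        signal = np.convolve(signal, gaussian_kernel(sigma), mode="same")
        threshold = signal.mean() + k * signal.std()
        signal = np.where(signal > threshold, signal, 0.0)

    # Step 5: local maxima of the surviving signal are boundary candidates.
    interior = signal[1:-1]
    peaks = np.flatnonzero((interior > signal[:-2]) & (interior >= signal[2:])) + 1

    # Map padded indices back to original mask coordinates.
    peaks = peaks[(peaks >= pad) & (peaks < pad + profile.size)]
    return peaks - pad
```

Each pass re‑smooths the thresholded signal, so a spurious peak that survives one scale is attenuated at the next, which is what makes the cascade robust to speckle noise.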

Results & Findings

Dataset / Setup                           Baseline CASA*    With Proposed Edge Detector
PubLayNet‑1M (TableNet + PyTesseract)     67 %              76 %
Varying DPI (150‑300), same pipeline      60 % → 71 %       68 % → 78 %

*Cell‑Aware Segmentation Accuracy (CASA) measures both textual correctness (OCR) and correct cell placement, making it a stricter metric than plain OCR word‑error‑rate.

  • Noise robustness: Adding synthetic Gaussian noise to masks degrades the baseline by ~9 %, while the proposed method loses < 3 %.
  • Resolution invariance: Zero‑padding + scaling keeps performance stable across a 2× DPI change, whereas the baseline drops ~5 % when down‑sampled to 150 DPI.
  • Computation: The edge‑extraction step adds ~0.02 s per table on a single CPU core, negligible compared to OCR time.

Practical Implications

  • Plug‑and‑play upgrade: Developers can wrap the edge‑detector around any mask‑producing model (TableNet, Detectron2, YOLO‑based detectors) without retraining (see the sketch after this list).
  • Higher‑quality structured outputs: More accurate row/column coordinates mean downstream data pipelines (e.g., automated invoice processing, scientific table mining) receive cleaner CSV/JSON exports, reducing manual clean‑up.
  • Cost savings on OCR: Better cell alignment improves OCR confidence scores, allowing lower‑cost OCR engines (open‑source Tesseract) to replace expensive commercial APIs in many use‑cases.
  • Edge‑device friendliness: Since the algorithm is CPU‑only and memory‑light, it can run on edge devices (mobile scanners, embedded document scanners) where GPU resources are scarce.
  • Improved compliance & auditability: In regulated industries (finance, healthcare), precise table extraction is essential for audit trails; this method raises the reliability bar without adding proprietary black‑box components.
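
As a rough illustration of the plug‑and‑play claim, here is a hypothetical integration reusing boundary_coordinates from the sketch above; detector and page_image are stand‑ins for any mask‑producing model and its input page, not APIs from the paper:

```python
# Hypothetical glue code: `detector` is any mask-producing model
# (TableNet, Detectron2, a YOLO-based detector); only its binary mask is used.
mask = detector.predict(page_image) > 0.5        # H x W boolean table mask
col_edges = boundary_coordinates(mask, axis=0)   # x-coordinates of column lines
row_edges = boundary_coordinates(mask, axis=1)   # y-coordinates of row lines

# Consecutive boundary pairs delimit cells; each crop can be sent to OCR.
cells = [(top, bottom, left, right)
         for top, bottom in zip(row_edges, row_edges[1:])
         for left, right in zip(col_edges, col_edges[1:])]
```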

Limitations & Future Work

  • Dependence on a decent initial mask: If the upstream detector completely misses a table region, the signal‑processing step cannot recover it.
  • Fixed Gaussian schedule: The current progressive σ schedule is hand‑designed; learning an optimal schedule per document type could yield further gains.
  • Complex table layouts: Multi‑level headers, merged cells, or heavily skewed tables still challenge the 1‑D signal assumption; extending the method to handle 2‑D edge maps is a promising direction.
  • Benchmark breadth: Experiments focus on PubLayNet; evaluating on more diverse datasets (historical archives, handwritten tables) would solidify generalizability.

Overall, the paper delivers a pragmatic, low‑overhead technique that can be immediately adopted by developers building table‑extraction pipelines, especially when dealing with noisy or low‑resolution scans.

Authors

  • Suren Bandara

Paper Information

  • arXiv ID: 2512.21287v1
  • Categories: cs.CV
  • Published: December 24, 2025