[Paper] Post-Processing Mask-Based Table Segmentation for Structural Coordinate Extraction

Published: December 24, 2025 at 12:10 PM EST
4 min read
Source: arXiv - 2512.21287v1

Overview

This paper tackles a surprisingly stubborn problem in document‑image processing: pinpointing the exact row and column boundaries inside a table mask, especially when the source image is low‑resolution, noisy, or partially corrupted. By collapsing the mask into 1‑D boundary signals and applying a cascade of multi‑scale Gaussian smoothing and statistical thresholding, the author achieves a noticeable boost in downstream OCR accuracy on a large‑scale benchmark (PubLayNet‑1M).

Key Contributions

  • Signal‑processing edge detector for table masks – models row/column transitions as 1‑D signals and extracts stable edges without operating directly on the raw image.
  • Progressive multi‑scale Gaussian convolution – uses increasing kernel variances to suppress noise while preserving true structural changes.
  • Statistical peak‑selection thresholding – automatically determines robust cut‑offs, eliminating hand‑tuned parameters.
  • Zero‑padding & scaling strategy – makes the method agnostic to the original image resolution, enabling seamless integration with existing pipelines.
  • Empirical validation – improves Cell‑Aware Segmentation Accuracy (CASA) from 67 % to 76 % on PubLayNet‑1M when combined with TableNet + PyTesseract OCR.

Methodology

  1. Mask Generation – An upstream table detector (e.g., TableNet) produces a binary mask that roughly outlines the table region.
  2. 1‑D Signal Construction – For each axis (horizontal for columns, vertical for rows) the mask is collapsed into a one‑dimensional intensity profile by summing pixel values along the orthogonal direction. Peaks in this profile correspond to potential cell boundaries.
  3. Multi‑Scale Gaussian Smoothing
    • Start with a narrow Gaussian kernel (small σ) to keep fine details.
    • Iteratively increase σ, convolving the signal each time. Larger σ values blur out high‑frequency noise while retaining the broader, consistent transitions that represent true table lines.
  4. Statistical Thresholding
    • After each smoothing step, compute the mean and standard deviation of the signal.
    • Retain only those points that exceed a dynamic threshold (e.g., μ + k·σ). This filters out spurious peaks caused by speckles or scanning artifacts.
  5. Peak Detection & Mapping
    • The surviving peaks are located precisely (sub‑pixel interpolation if needed).
    • Their positions are mapped back to the original image coordinate system, yielding exact row/column coordinates.
  6. Resolution‑Invariant Handling
    • If the input mask is low‑resolution, the signal is zero‑padded and optionally up‑sampled before smoothing, ensuring the Gaussian kernels operate on a consistent scale.

The entire pipeline is lightweight (pure NumPy/CPU operations) and can be dropped into any existing OCR or table‑extraction workflow; the sketch below illustrates the core steps.
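
Below is a minimal NumPy sketch of steps 2–6, assuming a binary mask array. The σ schedule (sigmas), the threshold factor k, and all function names are illustrative choices of this summary, not values from the paper:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def boundary_coordinates(mask, axis=0, sigmas=(1, 2, 4), k=1.0):
    """Estimate cell-boundary coordinates along one axis of a binary table mask.

    axis=0 collapses rows (column boundaries); axis=1 collapses columns
    (row boundaries). `sigmas` and `k` are assumed, hand-picked values.
    """
    # Step 2: collapse the mask into a 1-D intensity profile.
    profile = mask.astype(float).sum(axis=axis)

    # Step 6 (simplified): zero-pad so the widest kernel never clips the signal.
    pad = int(3 * max(sigmas))
    signal = np.pad(profile, pad)

    # Steps 3-4: progressively smooth with growing sigma, then keep only
    # points above the dynamic mu + k*sigma threshold after each pass.
    for sigma in sigmas:
        signal = np.convolve(signal, gaussian_kernel(sigma), mode="same")
        threshold = signal.mean() + k * signal.std()
        signal = np.where(signal > threshold, signal, 0.0)

    # Step 5: local maxima of the surviving signal are boundary candidates.
    interior = signal[1:-1]
    peaks = np.flatnonzero((interior > signal[:-2]) & (interior >= signal[2:])) + 1

    # Map padded indices back to original mask coordinates.
    peaks = peaks[(peaks >= pad) & (peaks < pad + profile.size)]
    return peaks - pad
```

Each pass re‑smooths the thresholded signal, so a spurious peak that survives one scale is attenuated at the next, which is what makes the cascade robust to speckle noise.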

Results & Findings

Dataset / Setup                           Baseline CASA*    With Proposed Edge Detector
PubLayNet‑1M (TableNet + PyTesseract)     67 %              76 %
Varying DPI (150‑300), same pipeline      60 % → 71 %       68 % → 78 %

*Cell‑Aware Segmentation Accuracy (CASA) measures both textual correctness (OCR) and correct cell placement, making it a stricter metric than plain OCR word‑error‑rate.

  • Noise robustness: Adding synthetic Gaussian noise to masks degrades the baseline by ~9 %, while the proposed method loses < 3 %.
  • Resolution invariance: Zero‑padding + scaling keeps performance stable across a 2× DPI change, whereas the baseline drops ~5 % when down‑sampled to 150 DPI.
  • Computation: The edge‑extraction step adds ~0.02 s per table on a single CPU core, negligible compared to OCR time.

Practical Implications

  • Plug‑and‑play upgrade: Developers can wrap the edge‑detector around any mask‑producing model (TableNet, Detectron2, YOLO‑based detectors) without retraining (see the sketch after this list).
  • Higher‑quality structured outputs: More accurate row/column coordinates mean downstream data pipelines (e.g., automated invoice processing, scientific table mining) receive cleaner CSV/JSON exports, reducing manual clean‑up.
  • Cost savings on OCR: Better cell alignment improves OCR confidence scores, allowing lower‑cost OCR engines (open‑source Tesseract) to replace expensive commercial APIs in many use‑cases.
  • Edge‑device friendliness: Since the algorithm is CPU‑only and memory‑light, it can run on edge devices (mobile scanners, embedded document scanners) where GPU resources are scarce.
  • Improved compliance & auditability: In regulated industries (finance, healthcare), precise table extraction is essential for audit trails; this method raises the reliability bar without adding proprietary black‑box components.
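
As a rough illustration of the plug‑and‑play claim, here is a hypothetical integration reusing boundary_coordinates from the sketch above; detector and page_image are stand‑ins for any mask‑producing model and its input page, not APIs from the paper:

```python
# Hypothetical glue code: `detector` is any mask-producing model
# (TableNet, Detectron2, a YOLO-based detector); only its binary mask is used.
mask = detector.predict(page_image) > 0.5        # H x W boolean table mask
col_edges = boundary_coordinates(mask, axis=0)   # x-coordinates of column lines
row_edges = boundary_coordinates(mask, axis=1)   # y-coordinates of row lines

# Consecutive boundary pairs delimit cells; each crop can be sent to OCR.
cells = [(top, bottom, left, right)
         for top, bottom in zip(row_edges, row_edges[1:])
         for left, right in zip(col_edges, col_edges[1:])]
```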

Limitations & Future Work

  • Dependence on a decent initial mask: If the upstream detector completely misses a table region, the signal‑processing step cannot recover it.
  • Fixed Gaussian schedule: The current progressive σ schedule is hand‑designed; learning an optimal schedule per document type could yield further gains.
  • Complex table layouts: Multi‑level headers, merged cells, or heavily skewed tables still challenge the 1‑D signal assumption; extending the method to handle 2‑D edge maps is a promising direction.
  • Benchmark breadth: Experiments focus on PubLayNet; evaluating on more diverse datasets (historical archives, handwritten tables) would solidify generalizability.

Overall, the paper delivers a pragmatic, low‑overhead technique that can be immediately adopted by developers building table‑extraction pipelines, especially when dealing with noisy or low‑resolution scans.

Authors

  • Suren Bandara

Paper Information

  • arXiv ID: 2512.21287v1
  • Categories: cs.CV
  • Published: December 24, 2025