[Paper] PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation
Source: arXiv - 2511.21365v1
Overview
The paper introduces PFF‑Net, a neural architecture that estimates surface normals directly from raw point clouds by intelligently fusing multi‑scale patch features. By letting the network “fit” a patch’s geometry across several neighborhood sizes, it sidesteps the classic problem of manually picking a single patch radius, delivering more accurate normals with fewer parameters and faster inference.
Key Contributions
- Patch Feature Fitting (PFF) paradigm: a new way to approximate the optimal geometric description of a point by aggregating multi‑scale patch features.
- Multi‑scale Feature Aggregation module: progressively merges features from large to small neighborhoods while discarding far‑away points, preserving both global shape cues and fine‑grained details.
- Cross‑scale Feature Compensation module: re‑uses early‑layer (large‑scale) features to enrich later (small‑scale) representations, ensuring no information is lost during down‑sampling.
- Lightweight design: achieves state‑of‑the‑art normal estimation accuracy on synthetic and real datasets with fewer network parameters and lower runtime than prior deep‑learning methods.
- Extensive validation: thorough experiments on benchmark point‑cloud collections (e.g., ModelNet40, ScanNet) demonstrate robustness across varying densities, noise levels, and geometric complexities.
Methodology
- Input Patch Construction – For each query point, the algorithm extracts several concentric neighborhoods (e.g., radii of 0.01, 0.02, 0.04 m). Each neighborhood forms a patch that captures geometry at a different scale (a code sketch of the full pipeline follows this list).
- Feature Extraction – A shared MLP (multi‑layer perceptron) processes points in each patch, producing a per‑patch feature vector.
- Feature Aggregation – Starting from the largest patch, the network iteratively shrinks the patch by removing points far from the center and adds the corresponding feature to a running representation. This yields a hierarchical descriptor that encodes both coarse shape and fine detail.
- Feature Compensation – To avoid discarding useful information when moving to smaller scales, a lightweight attention‑style module injects the earlier large‑scale features back into the current representation, effectively “compensating” for lost context.
- Normal Prediction – The final fused feature is fed through a small regression head that outputs a 3‑D normal vector, normalized to unit length.
- Training – The network is trained end‑to‑end with a cosine‑distance loss between predicted and ground‑truth normals, encouraging angular accuracy.
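A minimal PyTorch sketch of the pipeline described in the list above. This is an illustration, not the authors' implementation: the kNN patch sizes (standing in for the radius neighborhoods), the layer widths, and the gated fusion (standing in for the aggregation and compensation modules) are all assumptions.

```python
import torch
import torch.nn as nn


def knn_patches(points, center_idx, ks=(64, 32, 16)):
    """Gather concentric patches of decreasing size around each query point.

    points:     (N, 3) point cloud
    center_idx: (Q,)   indices of the query points
    ks:         patch sizes from largest to smallest (illustrative values)
    Returns a list of (Q, k, 3) patches centered on the query points.
    """
    centers = points[center_idx]                     # (Q, 3)
    d = torch.cdist(centers, points)                 # (Q, N) pairwise distances
    patches = []
    for k in ks:
        idx = d.topk(k, largest=False).indices       # k nearest neighbors
        patches.append(points[idx] - centers.unsqueeze(1))  # center each patch
    return patches


class SharedMLP(nn.Sequential):
    """PointNet-style shared MLP applied independently to every point."""
    def __init__(self, dims):
        layers = []
        for i in range(len(dims) - 1):
            layers += [nn.Conv1d(dims[i], dims[i + 1], 1),
                       nn.BatchNorm1d(dims[i + 1]), nn.ReLU()]
        super().__init__(*layers)


class MultiScaleNormalNet(nn.Module):
    """Encodes each patch, fuses scales from large to small, predicts a unit normal."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = SharedMLP([3, 64, feat_dim])
        # Gate that re-injects the coarser-scale feature into the current one,
        # a simple stand-in for the cross-scale feature compensation idea.
        self.gate = nn.Linear(2 * feat_dim, feat_dim)
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 3))

    def encode(self, patch):
        # patch: (Q, k, 3) -> per-patch feature via max pooling: (Q, feat_dim)
        f = self.encoder(patch.transpose(1, 2))      # (Q, feat_dim, k)
        return f.max(dim=2).values

    def forward(self, patches):
        # patches: list of (Q, k_i, 3), ordered from largest to smallest scale
        feat = self.encode(patches[0])
        for patch in patches[1:]:
            cur = self.encode(patch)
            # Compensate the fine-scale feature with the coarser context.
            feat = torch.relu(self.gate(torch.cat([cur, feat], dim=-1)))
        n = self.head(feat)
        return nn.functional.normalize(n, dim=-1)    # unit-length normals
```

In this sketch, kNN patches of decreasing size play the role of the shrinking neighborhoods described above, and a single gated fusion layer plays the role of the paper's aggregation and compensation modules.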
The whole pipeline is fully differentiable and can be executed on a GPU in a single forward pass.
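A minimal sketch of the cosine-distance training objective mentioned in the training step above. The absolute value makes the loss insensitive to normal orientation, a common convention in unoriented normal estimation; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def cosine_normal_loss(pred, gt):
    """pred, gt: (Q, 3) predicted and ground-truth normals."""
    cos = F.cosine_similarity(pred, gt, dim=-1)
    return (1.0 - cos.abs()).mean()

# Typical use inside a training step:
# loss = cosine_normal_loss(model(patches), gt_normals); loss.backward()
```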
Results & Findings
| Dataset | Mean Angular Error (°) | Params (M) | Inference Time (ms) |
|---|---|---|---|
| ModelNet40 (synthetic) | 4.2 (vs. 5.8‑6.3 for prior methods) | 1.1 | 7.3 |
| ScanNet (real‑world) | 6.5 (vs. 8.1‑9.4) | 1.1 | 9.1 |
| noisy / sparse variants | error increase < 1° compared to clean data | — | — |
- Accuracy: PFF‑Net consistently outperforms both classic PCA‑based estimators and recent deep models (e.g., PointNet++, PCPNet).
- Efficiency: The multi‑scale aggregation adds negligible overhead; the model runs ~30 % faster than the closest competitor while using ~40 % fewer parameters.
- Robustness: Experiments with varying point densities, Gaussian noise, and outliers show that the cross‑scale compensation keeps performance stable, confirming the method’s adaptability to real‑world scanning conditions.
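For reference, the mean angular error reported in the table above is typically computed as the average angle between predicted and ground-truth normals, ignoring orientation. A minimal sketch of that metric (the paper's exact evaluation protocol may differ):

```python
import torch

def mean_angular_error_deg(pred, gt):
    """pred, gt: (N, 3) unit normals; orientation is ignored via abs()."""
    cos = (pred * gt).sum(dim=-1).abs().clamp(max=1.0)
    return torch.rad2deg(torch.acos(cos)).mean()
```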
Practical Implications
- 3‑D Reconstruction Pipelines – Accurate normals are essential for Poisson surface reconstruction, mesh refinement, and texture mapping. PFF‑Net can be dropped into existing pipelines to improve mesh quality without a heavy computational budget (see the sketch after this list).
- Robotics & SLAM – Real‑time normal estimation helps with surface‑based localization, obstacle detection, and grasp planning. The lightweight nature of PFF‑Net makes it suitable for on‑board inference on edge GPUs (e.g., NVIDIA Jetson).
- AR/VR Content Creation – Artists working with scanned assets can obtain cleaner shading and lighting cues instantly, reducing manual cleanup.
- Quality Control in Manufacturing – Point‑cloud inspection systems can leverage PFF‑Net to detect subtle surface deviations (e.g., dents, warps) by comparing estimated normals against CAD specifications.
- Open‑source Integration – Because the architecture builds on standard point‑cloud operations (MLP, radius search), it can be implemented in popular frameworks like PyTorch3D or Open3D‑ML, facilitating rapid adoption.
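As a concrete example of the reconstruction use case above, the sketch below feeds externally predicted normals into Open3D's Poisson reconstruction instead of letting Open3D estimate them. `points` and `normals` are assumed to be (N, 3) arrays, with the normals coming from a trained estimator such as PFF‑Net; this is an illustrative integration, not part of the paper.

```python
import open3d as o3d

def reconstruct_with_predicted_normals(points, normals, depth=9):
    """Build a mesh from a point cloud using externally estimated normals."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.normals = o3d.utility.Vector3dVector(normals)  # bypass Open3D's own normal estimation
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    return mesh
```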
Limitations & Future Work
- Neighborhood Sampling Cost – While the model itself is lightweight, extracting neighborhoods at several radii for every query point can dominate runtime on very large scenes; optimized spatial indexing (e.g., hierarchical grids) could mitigate this (see the sketch after this list).
- Generalization to Extreme Sparsity – The authors note a modest drop in accuracy when the point cloud is extremely sparse (fewer than five points per local neighborhood); future work may explore adaptive radius selection or learned sampling strategies.
- Extension to Other Attributes – The current design focuses on normals; extending the PFF paradigm to jointly predict curvature, semantic labels, or even implicit surface functions is an open research direction.
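As a possible mitigation for the sampling cost noted above, the neighborhoods at all radii can be served from a single spatial index built once per scene. The sketch below uses SciPy's KD-tree as one common choice of index (the radii are the illustrative values from the methodology section, not a prescription from the paper):

```python
from scipy.spatial import cKDTree

def multi_radius_neighborhoods(points, query_points, radii=(0.01, 0.02, 0.04)):
    """Return, for each radius, the neighbor index lists of every query point."""
    tree = cKDTree(points)  # built once, reused for every radius and query
    return {r: tree.query_ball_point(query_points, r) for r in radii}
```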
Overall, PFF‑Net offers a compelling blend of accuracy, speed, and simplicity that makes it a strong candidate for any production‑grade point‑cloud processing stack.
Authors
- Qing Li
- Huifang Feng
- Kanle Shi
- Yue Gao
- Yi Fang
- Yu-Shen Liu
- Zhizhong Han
Paper Information
- arXiv ID: 2511.21365v1
- Categories: cs.CV
- Published: November 26, 2025