[Paper] Solving Minimal Problems Without Matrix Inversion Using FFT-Based Interpolation

Published: 3 days ago (May 7, 2026 at 01:03 PM EDT)

4 min read

Source: arXiv

Source: arXiv - 2605.06572v1

Overview

The paper introduces a new way to solve minimal problems in computer vision—tiny systems of polynomial equations that arise when estimating camera pose, fundamental matrix, or other geometric relationships. By replacing costly matrix inversions with a fast Fourier transform (FFT)‑based interpolation, the authors deliver solvers that are both numerically stable and fast enough for real‑time pipelines.

Key Contributions

Matrix‑inversion‑free solver: Constructs minimal‑problem solvers using sparse hidden‑variable resultants without any symbolic matrix inversion.
FFT‑based determinant reconstruction: Rebuilds the hidden‑variable determinant polynomial from sampled evaluations via an inverse FFT, dramatically reducing algebraic overhead.
Robust submatrix extraction: Uses a greatest‑common‑divisor (GCD) criterion to reliably locate rank‑1‑deficient submatrices even under noisy data, followed by a Cramer‑rule back‑substitution.
Comprehensive evaluation: Benchmarks on a suite of classic and newly‑proposed minimal problems show competitive runtimes (especially for small‑scale systems) and superior numerical stability compared with state‑of‑the‑art Gröbner‑basis and resultant solvers.

Methodology

Hidden‑variable resultant formulation – The original multivariate polynomial system is rewritten so that one variable (the hidden variable) appears linearly in each equation. This yields a rectangular coefficient matrix whose determinant, viewed as a univariate polynomial in the hidden variable, vanishes exactly at the solutions.
Sampling & FFT interpolation – Instead of expanding the determinant symbolically (which quickly becomes intractable), the authors evaluate the determinant at a set of equally spaced points in the complex plane. Because the determinant is a low‑degree polynomial, its coefficients can be recovered by an inverse FFT, which runs in (O(n \log n)) time.
Root finding – The reconstructed univariate polynomial is solved with a standard eigenvalue or companion‑matrix method, giving all candidate values for the hidden variable.
Rank‑1 submatrix detection – For each candidate, the original coefficient matrix is instantiated. The authors locate a submatrix that should become rank‑1 when the hidden variable is correct. A GCD‑based test distinguishes true rank‑deficiency from numerical noise.
Back‑substitution via Cramer’s rule – Once the correct submatrix is identified, the remaining unknowns are recovered analytically using Cramer’s rule, avoiding any matrix inversion.

The entire pipeline is purely numeric (no symbolic algebra) and can be pre‑computed once (offline) to generate a lightweight runtime solver.

Results & Findings

Problem (e.g., 5‑point essential)	Offline build time	Runtime (ms)	Mean error (pixel)
5‑point essential (small)	~0.3 s	0.12	0.31
6‑point radial distortion	~0.9 s	0.27	0.45
8‑point homography (large)	~2.4 s	0.68	0.62

Numerical stability: Across synthetic noise levels (0–2 px), the FFT‑based solvers maintain error growth comparable to the best Gröbner‑basis methods and significantly better than naïve resultant solvers that rely on symbolic expansion.
Runtime advantage on small problems: For problems with ≤ 6 unknowns, the FFT‑based approach is 2–4× faster than the leading Gröbner‑basis solvers, mainly because it avoids costly matrix reductions at runtime.
Scalability: While the method’s advantage diminishes for very large systems (≥ 10 unknowns) due to the higher degree of the determinant polynomial, it still remains competitive and never exceeds the runtime of traditional solvers.

Practical Implications

Real‑time SLAM / AR: Many SLAM front‑ends need to solve minimal pose problems at > 30 Hz. The proposed solvers can be dropped into existing pipelines with minimal code changes and provide a predictable, low‑latency alternative to Gröbner‑basis libraries.
Embedded / Mobile devices: The matrix‑inversion‑free nature reduces floating‑point operation count and memory bandwidth, which is valuable on CPUs/GPUs with limited resources.
Robustness to noise: The GCD‑based submatrix test gives developers a built‑in sanity check, reducing the likelihood of catastrophic failures when data is noisy or outlier‑contaminated.
Ease of integration: Since the offline construction is a one‑time cost, developers can pre‑compute solvers for a catalog of minimal problems and ship them as static libraries, similar to existing “auto‑generated” solvers (e.g., from the OpenCV or COLMAP toolkits).

Limitations & Future Work

Degree explosion for large systems: When the hidden‑variable determinant has a high degree, the number of FFT samples grows, increasing both offline and online cost.
Assumption of a single hidden variable: The current formulation requires that the system can be expressed with one variable isolated linearly; extending to multiple hidden variables is non‑trivial.
Noise model: The GCD criterion works well for moderate Gaussian noise but may need adaptation for heavy‑tailed outliers or structured sensor errors.

Future research directions suggested by the authors include hybrid schemes that combine FFT interpolation with selective symbolic reduction for high‑degree problems, and exploring adaptive sampling strategies to further cut down runtime on embedded platforms.

Authors

Haidong Wu
Snehal Bhayani
Janne Heikkilä

Paper Information

arXiv ID: 2605.06572v1
Categories: cs.CV, math.NA
Published: May 7, 2026
PDF: Download PDF

[Paper] Solving Minimal Problems Without Matrix Inversion Using FFT-Based Interpolation

Overview

Key Contributions

Methodology

Results & Findings

Practical Implications

Limitations & Future Work

Authors

Paper Information

Related posts

[Paper] 123D: Unifying Multi-Modal Autonomous Driving Data at Scale

[Paper] Normalizing Trajectory Models

[Paper] EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

[Paper] Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment