[Paper] High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations

Published: December 17, 2025 at 01:38 PM EST
4 min read

Source: arXiv - 2512.15684v1

Overview

Partial Least Squares (PLS) is a workhorse for linking two high‑dimensional data sets, such as genomics and imaging, or user behavior and product attributes. Léger and Chatelain deliver the first rigorous high‑dimensional theory for the most common PLS variant, PLS‑SVD, which extracts latent directions from a singular value decomposition of the cross‑covariance matrix. Their analysis explains when the method reliably uncovers the shared latent structure and why it sometimes fails, giving practitioners a solid footing for using PLS in modern, “big‑data” pipelines.

Key Contributions

  • Random‑matrix‑based spectral analysis of the cross‑covariance matrix that underlies PLS‑SVD, yielding closed‑form asymptotic formulas for the alignment between estimated and true latent directions.
  • Quantitative phase diagram that delineates regimes of successful recovery, partial recovery, and complete failure as a function of signal strength, dimensionality ratios, and noise levels.
  • Proof of asymptotic superiority of PLS‑SVD over applying PCA separately to each data set for detecting the common low‑rank subspace.
  • Identification of counter‑intuitive phenomena, such as “signal swamping” where adding more samples can degrade the estimated components in certain noise configurations.
  • Clear practical guidelines (e.g., required signal‑to‑noise ratios, optimal scaling of regularization) derived from the theoretical limits.

Methodology

  1. Model setup – Two data matrices \(X \in \mathbb{R}^{n \times p}\) and \(Y \in \mathbb{R}^{n \times q}\) are generated as
     \[ X = L\,U^\top + E_X, \qquad Y = L\,V^\top + E_Y, \]
     where \(L\) is an \(n \times r\) low‑rank latent factor matrix shared by both views, \(U\) and \(V\) contain the true loading vectors, and \(E_X, E_Y\) are independent Gaussian noise matrices.

  2. PLS‑SVD estimator – The algorithm forms the empirical cross‑covariance \(\hat{C} = X^\top Y\) and extracts its top singular vectors \((\hat{u}, \hat{v})\) as estimates of the leading columns of \((U, V)\).

  3. Random matrix tools – By letting \(n, p, q \to \infty\) with fixed ratios \(p/n\) and \(q/n\), the authors invoke the Marchenko–Pastur law and recent “spiked‑model” results to track how the singular values and vectors of \(\hat{C}\) behave.

  4. Alignment metrics – The cosine similarity between \(\hat{u}\) and the true \(u\) (and similarly for \(v\)) is expressed in terms of deterministic functions of the signal strengths (the singular values of the true low‑rank part) and the aspect ratios.

  5. Comparison with PCA – A parallel analysis is carried out for the top eigenvectors of \(X^\top X\) and \(Y^\top Y\) separately, allowing a clean asymptotic comparison (a small simulation of the full pipeline is sketched after this list).
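
The NumPy sketch below runs this pipeline end to end on synthetic data. The dimensions, rank, signal strength, and noise scale are illustrative assumptions rather than the paper's settings; it is meant only to make the model, the PLS‑SVD estimator, the alignment metric, and the separate‑PCA baseline concrete.

```python
# Minimal simulation of the shared low-rank model described above.
# All parameter values (n, p, q, r, signal strength) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, p, q, r = 2000, 400, 300, 1          # samples, view dimensions, shared rank
signal = 1.5                            # latent signal strength (assumed scale)

L = rng.standard_normal((n, r)) * signal          # shared latent factors
U = np.linalg.qr(rng.standard_normal((p, r)))[0]  # true loadings for X
V = np.linalg.qr(rng.standard_normal((q, r)))[0]  # true loadings for Y
X = L @ U.T + rng.standard_normal((n, p))         # view 1 = signal + Gaussian noise
Y = L @ V.T + rng.standard_normal((n, q))         # view 2 = signal + Gaussian noise

# PLS-SVD: top singular vectors of the empirical cross-covariance X^T Y / n
C_hat = X.T @ Y / n
u_hat, _, vt_hat = np.linalg.svd(C_hat)
u_hat, v_hat = u_hat[:, 0], vt_hat[0, :]

# Alignment (cosine similarity) with the true loading directions
print("PLS-SVD alignment:", abs(u_hat @ U[:, 0]), abs(v_hat @ V[:, 0]))

# Separate-PCA baseline: top eigenvector of each view's own covariance
u_pca = np.linalg.svd(X.T @ X / n)[0][:, 0]
v_pca = np.linalg.svd(Y.T @ Y / n)[0][:, 0]
print("Separate-PCA alignment:", abs(u_pca @ U[:, 0]), abs(v_pca @ V[:, 0]))
```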

Results & Findings

| Regime | Condition (simplified) | What happens to PLS‑SVD |
| --- | --- | --- |
| Strong signal | Signal eigenvalue above the critical threshold \(\sqrt{c_x c_y}\) (with \(c_x = p/n\), \(c_y = q/n\)) | Top singular vectors align strongly with the true loadings (cosine → 1). |
| Weak signal | Signal eigenvalue below the threshold | Estimated vectors become essentially random (alignment → 0). |
| Intermediate | Near the threshold | Partial alignment; the exact formula predicts the cosine as a smooth function of signal strength. |
| Noise‑dominated | Very high noise variance relative to the signal | Counter‑intuitive “swamping”: adding more samples can reduce alignment because the noise inflates the bulk spectrum. |
  • Superiority over separate PCA: Even when each view alone cannot recover its own latent subspace (because the signal is below the PCA threshold), the joint PLS‑SVD can succeed as long as the product of the two signal strengths exceeds the joint threshold.
  • Phase transition: The analysis uncovers a sharp transition akin to the BBP (Baik–Ben Arous–Péché) phase transition, but now in the cross‑covariance domain; a toy sweep illustrating this behaviour is sketched below.
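
A toy sweep of the latent signal strength makes the transition visible empirically. The strength grid and the model normalisation are assumptions of this sketch, so the empirical transition point need not coincide exactly with the quoted \(\sqrt{c_x c_y}\) threshold; the sweep only illustrates the qualitative below/above‑threshold behaviour.

```python
# Sweep the latent signal strength and watch alignment turn on, BBP-style.
# Dimensions and the strength grid are illustrative; the exact transition
# location depends on normalisation conventions not pinned down here.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 1000, 300, 200
c_x, c_y = p / n, q / n
print(f"nominal threshold sqrt(c_x*c_y) = {np.sqrt(c_x * c_y):.3f}")

u = np.linalg.qr(rng.standard_normal((p, 1)))[0][:, 0]   # true loading for X
v = np.linalg.qr(rng.standard_normal((q, 1)))[0][:, 0]   # true loading for Y

for strength in [0.2, 0.5, 1.0, 1.5, 2.0, 3.0]:
    l = rng.standard_normal(n) * np.sqrt(strength)        # shared latent factor
    X = np.outer(l, u) + rng.standard_normal((n, p))
    Y = np.outer(l, v) + rng.standard_normal((n, q))
    u_hat, _, vt_hat = np.linalg.svd(X.T @ Y / n)
    align = abs(u_hat[:, 0] @ u) * abs(vt_hat[0] @ v)     # joint cosine alignment
    print(f"strength={strength:>4}: joint alignment = {align:.3f}")
```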

Practical Implications

  • Guideline for data collection – Before committing to PLS, compute the empirical aspect ratios \(p/n\) and \(q/n\) and estimate the signal‑to‑noise ratios. If the product of the two estimated signal strengths falls below the derived threshold, expect poor latent‑component recovery (a minimal check along these lines is sketched after this list).
  • Model selection – The asymptotic formulas can be turned into a quick diagnostic tool (e.g., a “PLS feasibility plot”) that tells you how many components are statistically identifiable.
  • Algorithmic choices – In regimes where PLS‑SVD is near the threshold, adding a modest amount of regularization (ridge‑type shrinkage on \(X\) and \(Y\)) can push the effective signal above the critical value.
  • Benchmarking – When comparing PLS‑SVD to deep‑learning based multimodal embeddings, the theory provides a baseline: any method that cannot beat the PLS‑SVD asymptotic limit in the high‑dimensional regime is unlikely to add value.
  • Interpretability – Because the alignment metrics are explicit, developers can report confidence scores for each extracted component, improving transparency in downstream applications (e.g., biomarker discovery, recommendation systems).
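
As a concrete version of the first two guidelines, here is a minimal feasibility check. The decision rule (product of per‑view signal strengths versus \(\sqrt{c_x c_y}\)) follows the condition quoted in the results section; how the per‑view signal strengths are estimated from data, and the example numbers, are assumptions of this sketch.

```python
# Back-of-the-envelope "PLS feasibility" check following the guideline above.
# Estimating signal_x and signal_y from data is left to the user; the example
# values below are purely illustrative.
def pls_feasibility(n: int, p: int, q: int,
                    signal_x: float, signal_y: float) -> dict:
    """Return aspect ratios, the critical threshold, and a go/no-go flag."""
    c_x, c_y = p / n, q / n
    threshold = (c_x * c_y) ** 0.5          # joint threshold sqrt(c_x * c_y)
    joint_signal = signal_x * signal_y      # product of per-view signal strengths
    return {
        "c_x": c_x,
        "c_y": c_y,
        "threshold": threshold,
        "joint_signal": joint_signal,
        "recovery_expected": joint_signal > threshold,
    }

# Example: 500 samples, 2000 genomic features, 800 imaging features,
# with (estimated) per-view signal strengths of 1.2 and 0.9.
print(pls_feasibility(n=500, p=2000, q=800, signal_x=1.2, signal_y=0.9))
```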

Limitations & Future Work

  • Gaussian noise assumption – The proofs rely on i.i.d. Gaussian noise; heavy‑tailed or structured noise may shift the thresholds.
  • Exact low‑rank model – Real‑world data often contain more complex, possibly hierarchical latent structures that are not captured by a single shared low‑rank factor.
  • Finite‑sample corrections – The asymptotic results may be optimistic for modest sample sizes; deriving non‑asymptotic error bounds is an open challenge.
  • Extension to regularized PLS – While the paper hints at ridge‑type modifications, a full spectral analysis of regularized PLS‑SVD (including sparsity constraints) remains to be done.

Overall, Léger and Chatelain’s work equips developers with a solid theoretical compass for navigating high‑dimensional PLS, clarifying both its power and its boundaries.

Authors

  • Victor Léger
  • Florent Chatelain

Paper Information

  • arXiv ID: 2512.15684v1
  • Categories: stat.ML, cs.LG
  • Published: December 17, 2025