[Paper] High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations

Published: December 17, 2025 at 01:38 PM EST
4 min read

Source: arXiv - 2512.15684v1

Overview

Partial Least Squares (PLS) is a workhorse for linking two high‑dimensional data sets, such as genomics and imaging, or user behavior and product attributes. Léger and Chatelain deliver the first rigorous high‑dimensional theory for the most common PLS variant, PLS‑SVD, which extracts latent directions from a singular value decomposition of the cross‑covariance matrix. Their analysis explains when the method reliably uncovers the shared latent structure and why it sometimes fails, giving practitioners a solid footing for using PLS in modern, “big‑data” pipelines.

Key Contributions

  • Random‑matrix‑based spectral analysis of the cross‑covariance matrix that underlies PLS‑SVD, yielding closed‑form asymptotic formulas for the alignment between estimated and true latent directions.
  • Quantitative phase diagram that delineates regimes of successful recovery, partial recovery, and complete failure as a function of signal strength, dimensionality ratios, and noise levels.
  • Proof of asymptotic superiority of PLS‑SVD over applying PCA separately to each data set for detecting the common low‑rank subspace.
  • Identification of counter‑intuitive phenomena, such as “signal swamping” where adding more samples can degrade the estimated components in certain noise configurations.
  • Clear practical guidelines (e.g., required signal‑to‑noise ratios, optimal scaling of regularization) derived from the theoretical limits.

Methodology

  1. Model setup – Two data matrices \(X \in \mathbb{R}^{n \times p}\) and \(Y \in \mathbb{R}^{n \times q}\) are generated as
     \[ X = L\,U^\top + E_X, \qquad Y = L\,V^\top + E_Y, \]
     where \(L\) is an \(n \times r\) low‑rank latent factor matrix shared by both views, \(U\) and \(V\) contain the true loading vectors, and \(E_X, E_Y\) are independent Gaussian noise matrices.

  2. PLS‑SVD estimator – The algorithm forms the empirical cross‑covariance \(\hat{C} = X^\top Y\) and extracts its top singular vectors \((\hat{u}, \hat{v})\) as estimates of the leading columns of \((U, V)\).

  3. Random matrix tools – By letting \(n, p, q \to \infty\) with fixed ratios \(p/n\) and \(q/n\), the authors invoke the Marchenko–Pastur law and recent “spiked‑model” results to track how the singular values and vectors of \(\hat{C}\) behave.

  4. Alignment metrics – The cosine similarity between \(\hat{u}\) and the true \(u\) (and similarly for \(v\)) is expressed in terms of deterministic functions of the signal strengths (the singular values of the true low‑rank part) and the aspect ratios.

  5. Comparison with PCA – A parallel analysis is carried out for the top eigenvectors of \(X^\top X\) and \(Y^\top Y\) separately, allowing a clean asymptotic comparison (a small simulation of the full pipeline is sketched after this list).
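
The NumPy sketch below runs this pipeline end to end on synthetic data. The dimensions, rank, signal strength, and noise scale are illustrative assumptions rather than the paper's settings; it is meant only to make the model, the PLS‑SVD estimator, the alignment metric, and the separate‑PCA baseline concrete.

```python
# Minimal simulation of the shared low-rank model described above.
# All parameter values (n, p, q, r, signal strength) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, p, q, r = 2000, 400, 300, 1          # samples, view dimensions, shared rank
signal = 1.5                            # latent signal strength (assumed scale)

L = rng.standard_normal((n, r)) * signal          # shared latent factors
U = np.linalg.qr(rng.standard_normal((p, r)))[0]  # true loadings for X
V = np.linalg.qr(rng.standard_normal((q, r)))[0]  # true loadings for Y
X = L @ U.T + rng.standard_normal((n, p))         # view 1 = signal + Gaussian noise
Y = L @ V.T + rng.standard_normal((n, q))         # view 2 = signal + Gaussian noise

# PLS-SVD: top singular vectors of the empirical cross-covariance X^T Y / n
C_hat = X.T @ Y / n
u_hat, _, vt_hat = np.linalg.svd(C_hat)
u_hat, v_hat = u_hat[:, 0], vt_hat[0, :]

# Alignment (cosine similarity) with the true loading directions
print("PLS-SVD alignment:", abs(u_hat @ U[:, 0]), abs(v_hat @ V[:, 0]))

# Separate-PCA baseline: top eigenvector of each view's own covariance
u_pca = np.linalg.svd(X.T @ X / n)[0][:, 0]
v_pca = np.linalg.svd(Y.T @ Y / n)[0][:, 0]
print("Separate-PCA alignment:", abs(u_pca @ U[:, 0]), abs(v_pca @ V[:, 0]))
```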

Results & Findings

| Regime | Condition (simplified) | What happens to PLS‑SVD |
| --- | --- | --- |
| Strong signal | Signal eigenvalue above the critical threshold \(\sqrt{c_x c_y}\) (with \(c_x = p/n\), \(c_y = q/n\)) | Top singular vectors align strongly with the true loadings (cosine → 1). |
| Weak signal | Signal eigenvalue below the threshold | Estimated vectors become essentially random (alignment → 0). |
| Intermediate | Near the threshold | Partial alignment; the exact formula predicts the cosine as a smooth function of signal strength. |
| Noise‑dominated | Very high noise variance relative to the signal | Counter‑intuitive “swamping”: adding more samples can reduce alignment because the noise inflates the bulk spectrum. |
  • Superiority over separate PCA: Even when each view alone cannot recover its own latent subspace (because the signal is below the PCA threshold), the joint PLS‑SVD can succeed as long as the product of the two signal strengths exceeds the joint threshold.
  • Phase transition: The analysis uncovers a sharp transition akin to the BBP (Baik–Ben Arous–Péché) phase transition, but now in the cross‑covariance domain; a toy sweep illustrating this behaviour is sketched below.
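
A toy sweep of the latent signal strength makes the transition visible empirically. The strength grid and the model normalisation are assumptions of this sketch, so the empirical transition point need not coincide exactly with the quoted \(\sqrt{c_x c_y}\) threshold; the sweep only illustrates the qualitative below/above‑threshold behaviour.

```python
# Sweep the latent signal strength and watch alignment turn on, BBP-style.
# Dimensions and the strength grid are illustrative; the exact transition
# location depends on normalisation conventions not pinned down here.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 1000, 300, 200
c_x, c_y = p / n, q / n
print(f"nominal threshold sqrt(c_x*c_y) = {np.sqrt(c_x * c_y):.3f}")

u = np.linalg.qr(rng.standard_normal((p, 1)))[0][:, 0]   # true loading for X
v = np.linalg.qr(rng.standard_normal((q, 1)))[0][:, 0]   # true loading for Y

for strength in [0.2, 0.5, 1.0, 1.5, 2.0, 3.0]:
    l = rng.standard_normal(n) * np.sqrt(strength)        # shared latent factor
    X = np.outer(l, u) + rng.standard_normal((n, p))
    Y = np.outer(l, v) + rng.standard_normal((n, q))
    u_hat, _, vt_hat = np.linalg.svd(X.T @ Y / n)
    align = abs(u_hat[:, 0] @ u) * abs(vt_hat[0] @ v)     # joint cosine alignment
    print(f"strength={strength:>4}: joint alignment = {align:.3f}")
```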

Practical Implications

  • Guideline for data collection – Before committing to PLS, compute the empirical aspect ratios \(p/n\) and \(q/n\) and estimate the signal‑to‑noise ratios. If the product of the two estimated signal strengths falls below the derived threshold, expect poor latent‑component recovery (a minimal check along these lines is sketched after this list).
  • Model selection – The asymptotic formulas can be turned into a quick diagnostic tool (e.g., a “PLS feasibility plot”) that tells you how many components are statistically identifiable.
  • Algorithmic choices – In regimes where PLS‑SVD is near the threshold, adding a modest amount of regularization (ridge‑type shrinkage on \(X\) and \(Y\)) can push the effective signal above the critical value.
  • Benchmarking – When comparing PLS‑SVD to deep‑learning based multimodal embeddings, the theory provides a baseline: any method that cannot beat the PLS‑SVD asymptotic limit in the high‑dimensional regime is unlikely to add value.
  • Interpretability – Because the alignment metrics are explicit, developers can report confidence scores for each extracted component, improving transparency in downstream applications (e.g., biomarker discovery, recommendation systems).
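
As a concrete version of the first two guidelines, here is a minimal feasibility check. The decision rule (product of per‑view signal strengths versus \(\sqrt{c_x c_y}\)) follows the condition quoted in the results section; how the per‑view signal strengths are estimated from data, and the example numbers, are assumptions of this sketch.

```python
# Back-of-the-envelope "PLS feasibility" check following the guideline above.
# Estimating signal_x and signal_y from data is left to the user; the example
# values below are purely illustrative.
def pls_feasibility(n: int, p: int, q: int,
                    signal_x: float, signal_y: float) -> dict:
    """Return aspect ratios, the critical threshold, and a go/no-go flag."""
    c_x, c_y = p / n, q / n
    threshold = (c_x * c_y) ** 0.5          # joint threshold sqrt(c_x * c_y)
    joint_signal = signal_x * signal_y      # product of per-view signal strengths
    return {
        "c_x": c_x,
        "c_y": c_y,
        "threshold": threshold,
        "joint_signal": joint_signal,
        "recovery_expected": joint_signal > threshold,
    }

# Example: 500 samples, 2000 genomic features, 800 imaging features,
# with (estimated) per-view signal strengths of 1.2 and 0.9.
print(pls_feasibility(n=500, p=2000, q=800, signal_x=1.2, signal_y=0.9))
```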

Limitations & Future Work

  • Gaussian noise assumption – The proofs rely on i.i.d. Gaussian noise; heavy‑tailed or structured noise may shift the thresholds.
  • Exact low‑rank model – Real‑world data often contain more complex, possibly hierarchical latent structures that are not captured by a single shared low‑rank factor.
  • Finite‑sample corrections – The asymptotic results may be optimistic for modest sample sizes; deriving non‑asymptotic error bounds is an open challenge.
  • Extension to regularized PLS – While the paper hints at ridge‑type modifications, a full spectral analysis of regularized PLS‑SVD (including sparsity constraints) remains to be done.

Overall, Léger and Chatelain’s work equips developers with a solid theoretical compass for navigating high‑dimensional PLS, clarifying both its power and its boundaries.

Authors

  • Victor Léger
  • Florent Chatelain

Paper Information

  • arXiv ID: 2512.15684v1
  • Categories: stat.ML, cs.LG
  • Published: December 17, 2025