[Paper] Diffusion-Guided Feature Selection via Nishimori Temperature: Noise-Based Spectral Embedding
Source: arXiv - 2604.24692v1
Overview
The authors introduce Noise‑Based Spectral Embedding (NBSE), a physics‑inspired technique for picking the most informative features from very high‑dimensional data. By leveraging concepts from statistical physics (the Nishimori temperature and the Bethe Hessian), NBSE builds a diffusion‑based similarity graph that automatically groups redundant dimensions and selects a compact, high‑quality subset—without the need for costly greedy searches.
Key Contributions
- A novel spectral embedding pipeline that operates directly on the feature space via a sparse similarity graph and the Bethe Hessian’s singularity at the Nishimori temperature.
- Theoretical robustness guarantee: coloured Gaussian noise perturbs the critical temperature by at most \(O(\bar{\sigma}^{2})\), ensuring stable feature selection under realistic measurement noise (restated in equation form after this list).
- One‑dimensional embedding for feature grouping: the eigenvector associated with the smallest eigenvalue of the Bethe Hessian yields a scalar score per feature, exposing clusters of semantically related dimensions.
- Practical compression results on large‑scale image embeddings (MobileNetV2, EfficientNet‑B4) showing < 1 % accuracy loss while retaining only 30 % of the original features, beating classic ANOVA‑F and random baselines by up to 6.8 %.
- No greedy search or hyper‑parameter sweep: the method is essentially parameter‑free apart from the graph sparsity level, making it easy to plug into existing pipelines.
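Stated compactly, the two theoretical claims above read as follows (a restatement in the notation introduced in the Methodology section; the constant \(C\) is left unspecified, since this summary only gives the order of the bound):

\[
\lambda_{\min}\!\left(H(\beta_{N})\right) = 0,
\qquad
\left|\beta_{N}^{\mathrm{noisy}} - \beta_{N}^{\mathrm{clean}}\right| \le C\,\bar{\sigma}^{2},
\]

where \(\bar{\sigma}\) denotes the scale of the coloured Gaussian perturbation.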
Methodology
- Construct a sparse similarity graph on the data samples (or on the transposed data for feature‑wise analysis). Edges connect nearest neighbors under cosine similarity, yielding a lightweight adjacency matrix (a runnable sketch of the full pipeline follows this list).
- Compute the Bethe Hessian \(H(\beta) = (\beta^{2} - 1)I - \beta A + D\), where \(A\) is the adjacency matrix and \(D\) is the degree matrix.
- Find the Nishimori temperature \(\beta_{N}\) – the smallest inverse temperature at which the smallest eigenvalue of \(H(\beta)\) hits zero (i.e., the matrix becomes singular). This can be located efficiently with a simple bisection search.
- Extract the eigenvector associated with the smallest (near‑zero) eigenvalue of \(H(\beta_{N})\). In the sample‑space case it gives a diffusion‑aware weighting of points; in the feature‑space case it provides a scalar “importance score” for each dimension.
- Group and bin features: sort features by their scores, apply a balanced binning (e.g., k‑means on the 1‑D scores) to form clusters of redundant dimensions, then pick a single representative from each cluster.
- Optional post‑processing: the selected features can be fed to downstream models unchanged, or used to fine‑tune a smaller network.
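The sketch below is a minimal rendering of these steps, assuming scikit‑learn's `kneighbors_graph` for the cosine kNN graph, SciPy's sparse eigensolver, and k‑means for the 1‑D binning. Function names, the bisection bracket, and the representative‑selection rule are illustrative choices, not the authors' reference implementation.

```python
# Minimal NBSE-style feature-selection sketch (illustrative, not the paper's code).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans


def bethe_hessian(A, beta):
    """H(beta) = (beta^2 - 1) I - beta A + D for a sparse adjacency matrix A."""
    n = A.shape[0]
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
    return (beta**2 - 1.0) * sp.identity(n, format="csr") - beta * A + D


def smallest_eigval(H):
    """Smallest eigenvalue of a sparse symmetric matrix."""
    return eigsh(H, k=1, which="SA", return_eigenvectors=False)[0]


def nishimori_temperature(A, beta_lo=1.01, beta_hi=20.0, tol=1e-4):
    """Bisection for a beta at which H(beta) becomes singular, i.e. the smallest
    eigenvalue crosses zero. The bracket [beta_lo, beta_hi] must straddle that
    sign change; the defaults here are illustrative and data-dependent."""
    sign_lo = np.sign(smallest_eigval(bethe_hessian(A, beta_lo)))
    while beta_hi - beta_lo > tol:
        mid = 0.5 * (beta_lo + beta_hi)
        if np.sign(smallest_eigval(bethe_hessian(A, mid))) == sign_lo:
            beta_lo = mid          # same side of the crossing as the lower end
        else:
            beta_hi = mid          # crossed zero: shrink from above
    return 0.5 * (beta_lo + beta_hi)


def nbse_select(X, n_keep, n_neighbors=10):
    """Select n_keep feature indices from a data matrix X (samples x features)."""
    # 1. Sparse cosine kNN graph built on the *features* (transpose of X).
    F = X.T
    A = kneighbors_graph(F, n_neighbors=n_neighbors, metric="cosine",
                         mode="connectivity", include_self=False)
    A = ((A + A.T) > 0).astype(float)                 # symmetrize

    # 2-3. Bethe Hessian evaluated at the Nishimori temperature.
    beta_n = nishimori_temperature(A)
    H = bethe_hessian(A, beta_n)

    # 4. Eigenvector of the (near-)zero eigenvalue -> one scalar score per feature.
    _, vecs = eigsh(H, k=1, which="SA")
    scores = vecs[:, 0]

    # 5. Bin the 1-D scores with k-means and keep one representative per bin
    #    (here: the feature whose score is closest to its bin centroid).
    km = KMeans(n_clusters=n_keep, n_init=10).fit(scores.reshape(-1, 1))
    selected = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        centroid = km.cluster_centers_[c, 0]
        selected.append(members[np.argmin(np.abs(scores[members] - centroid))])
    return np.sort(np.array(selected))
```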
The whole pipeline runs in near‑linear time with respect to the number of samples/features because the graph is sparse and only a few eigenvectors are needed.
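In practice, applying the sketch above to a matrix of pre‑computed embeddings could look like the following; the shapes and the 30 % budget mirror the experiments summarized below, while the data here is a random stand‑in.

```python
# Illustrative usage of nbse_select: keep ~30 % of 2048 embedding dimensions,
# then feed the reduced matrix to any downstream model unchanged.
import numpy as np

X = np.random.randn(10_000, 2048).astype(np.float32)     # stand-in for real embeddings
keep = nbse_select(X, n_keep=int(0.30 * X.shape[1]), n_neighbors=10)
X_small = X[:, keep]                                      # shape: (10_000, 614)
print(keep[:10], X_small.shape)
```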
Results & Findings
| Dataset / Backbone | % Features Retained | Top‑1 Accuracy Δ | NBSE vs. ANOVA‑F | NBSE vs. Random |
|---|---|---|---|---|
| ImageNet (MobileNetV2) | 30 % | –0.8 % | +4.2 % | +5.6 % |
| ImageNet (EfficientNet‑B4) | 30 % | –0.9 % | +5.1 % | +6.8 % |
| CIFAR‑10 (ResNet‑18) | 25 % | –0.5 % | +3.7 % | +4.9 % |
- Accuracy preservation: Even with aggressive compression (30 % of original dimensions) the drop in classification performance stays below 1 %.
- Robustness to noise: Synthetic coloured Gaussian perturbations added to the embeddings shift the critical temperature by less than \(10^{-3}\) for typical noise levels, confirming the theoretical bound (a reproduction‑style sketch follows this list).
- Speed: Feature selection on a 1 M‑sample, 2048‑dimensional embedding completes in under 2 minutes on a single GPU, far faster than iterative wrapper methods.
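A hedged sketch of how the noise‑robustness check could be reproduced with the helpers from the pipeline sketch above; the colored‑noise model, its scale, and the data shapes are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative robustness check: compare the Nishimori temperature of the clean
# feature graph with that of a graph built from noise-perturbed embeddings.
# Uses nishimori_temperature from the pipeline sketch above.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)

def colored_noise(shape, sigma=0.01):
    """Correlated ('coloured') Gaussian noise: white noise smoothed along the feature axis."""
    white = rng.normal(scale=sigma, size=shape)
    kernel = np.array([0.25, 0.5, 0.25])
    return np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, white)

def beta_n_of(X, n_neighbors=10):
    """Nishimori temperature of the feature-wise cosine kNN graph of X."""
    A = kneighbors_graph(X.T, n_neighbors=n_neighbors, metric="cosine",
                         mode="connectivity", include_self=False)
    A = ((A + A.T) > 0).astype(float)
    return nishimori_temperature(A)

X = np.random.randn(5_000, 512).astype(np.float32)        # stand-in embeddings
shift = abs(beta_n_of(X + colored_noise(X.shape)) - beta_n_of(X))
print(f"critical-temperature shift: {shift:.2e}")          # paper reports < 1e-3
```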
Practical Implications
- Model deployment on edge devices – NBSE can shrink the size of pre‑computed embeddings (e.g., from a backbone CNN) without retraining, reducing memory bandwidth and storage costs.
- Feature‑budgeted APIs – Services that expose embeddings (search, recommendation) can enforce strict bandwidth limits while keeping query quality high.
- Data‑centric AI pipelines – By automatically exposing groups of redundant dimensions, NBSE helps data engineers understand feature redundancy, guide dimensionality‑reduction decisions, and debug noisy sensor streams.
- Plug‑and‑play compatibility – Since NBSE works on any numeric matrix, it can be applied to text embeddings (BERT, CLIP), graph node features, or even tabular sensor data with minimal code changes.
- Reduced training time for downstream models – Smaller feature sets lead to faster training and inference for linear classifiers, gradient‑boosted trees, or fine‑tuned transformers built on top of the embeddings.
Limitations & Future Work
- Graph construction hyper‑parameter (number of nearest neighbors) still requires a modest amount of tuning; overly sparse graphs may miss subtle similarity patterns.
- Scalability to billions of points: while the sparse eigen‑solver scales well, constructing the nearest‑neighbor graph at that scale may need approximate methods (e.g., FAISS; see the sketch after this list).
- Extension beyond Euclidean/cosine similarity: the current formulation assumes a symmetric similarity measure; adapting NBSE to asymmetric or learned similarity kernels is an open direction.
- Theoretical analysis for non‑Gaussian noise: the robustness proof covers coloured Gaussian perturbations; exploring heavy‑tailed or adversarial noise would broaden applicability.
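For the billion‑scale regime mentioned above, the exact neighbor search could be swapped for an approximate index. A minimal sketch using FAISS's HNSW index follows; this particular index type and its parameters are assumptions for illustration, not a recommendation from the paper.

```python
# Illustrative approximate kNN graph with FAISS (HNSW). Inner product on
# L2-normalized vectors approximates cosine similarity, matching the exact pipeline.
import faiss
import numpy as np
import scipy.sparse as sp

def approx_knn_graph(F, k=10, hnsw_m=32):
    """F: one row per feature (e.g., the transposed data matrix).
    Returns a sparse, symmetric 0/1 adjacency matrix."""
    F = np.ascontiguousarray(F, dtype=np.float32)
    faiss.normalize_L2(F)                                   # cosine -> inner product
    index = faiss.IndexHNSWFlat(F.shape[1], hnsw_m, faiss.METRIC_INNER_PRODUCT)
    index.add(F)
    _, nbrs = index.search(F, k + 1)                        # first hit is usually the query itself
    rows = np.repeat(np.arange(F.shape[0]), k)
    cols = nbrs[:, 1:].ravel()
    data = np.ones_like(rows, dtype=float)
    A = sp.csr_matrix((data, (rows, cols)), shape=(F.shape[0], F.shape[0]))
    return ((A + A.T) > 0).astype(float)                    # symmetrize
```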
The authors suggest exploring adaptive graph sparsity, integrating NBSE with end‑to‑end differentiable pipelines, and applying the framework to multimodal embeddings as promising next steps.
Authors
- Vasiliy S. Usatyuk
- Denis A. Sapozhnikov
- Sergey I. Egorov
Paper Information
- arXiv ID: 2604.24692v1
- Categories: cs.LG
- Published: April 27, 2026