[Paper] Diffusion-Guided Feature Selection via Nishimori Temperature: Noise-Based Spectral Embedding
Source: arXiv - 2604.24692v1
Overview
The authors introduce Noise‑Based Spectral Embedding (NBSE), a physics‑inspired technique for picking the most informative features from very high‑dimensional data. By leveraging concepts from statistical physics (the Nishimori temperature and the Bethe Hessian), NBSE builds a diffusion‑based similarity graph that automatically groups redundant dimensions and selects a compact, high‑quality subset—without the need for costly greedy searches.
Key Contributions
- A novel spectral embedding pipeline that operates directly on the feature space via a sparse similarity graph and the Bethe Hessian’s singularity at the Nishimori temperature.
- Theoretical robustness guarantee: coloured Gaussian noise perturbs the critical temperature by at most \(O(\bar{\sigma}^{2})\), ensuring stable feature selection under realistic measurement noise (restated in equation form after this list).
- One‑dimensional embedding for feature grouping: the eigenvector associated with the smallest eigenvalue of the Bethe Hessian yields a scalar score per feature, exposing clusters of semantically related dimensions.
- Practical compression results on large‑scale image embeddings (MobileNetV2, EfficientNet‑B4) showing < 1 % accuracy loss while retaining only 30 % of the original features, beating classic ANOVA‑F and random baselines by up to 6.8 %.
- No greedy search or hyper‑parameter sweep: the method is essentially parameter‑free apart from the graph sparsity level, making it easy to plug into existing pipelines.
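Stated compactly, the two theoretical claims above read as follows (a restatement in the notation introduced in the Methodology section; the constant \(C\) is left unspecified, since this summary only gives the order of the bound):

\[
\lambda_{\min}\!\left(H(\beta_{N})\right) = 0,
\qquad
\left|\beta_{N}^{\mathrm{noisy}} - \beta_{N}^{\mathrm{clean}}\right| \le C\,\bar{\sigma}^{2},
\]

where \(\bar{\sigma}\) denotes the scale of the coloured Gaussian perturbation.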
Methodology
- Construct a sparse similarity graph on the data samples (or on the transposed data for feature‑wise analysis). Edges connect nearest neighbors under cosine similarity, yielding a lightweight adjacency matrix (a runnable sketch of the full pipeline follows this list).
- Compute the Bethe Hessian \(H(\beta) = (\beta^{2} - 1)I - \beta A + D\), where \(A\) is the adjacency matrix and \(D\) is the degree matrix.
- Find the Nishimori temperature \(\beta_{N}\) – the smallest inverse temperature at which the smallest eigenvalue of \(H(\beta)\) hits zero (i.e., the matrix becomes singular). This can be located efficiently with a simple bisection search.
- Extract the eigenvector associated with the smallest (near‑zero) eigenvalue of \(H(\beta_{N})\). In the sample‑space case it gives a diffusion‑aware weighting of points; in the feature‑space case it provides a scalar “importance score” for each dimension.
- Group and bin features: sort features by their scores, apply a balanced binning (e.g., k‑means on the 1‑D scores) to form clusters of redundant dimensions, then pick a single representative from each cluster.
- Optional post‑processing: the selected features can be fed to downstream models unchanged, or used to fine‑tune a smaller network.
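The sketch below is a minimal rendering of these steps, assuming scikit‑learn's `kneighbors_graph` for the cosine kNN graph, SciPy's sparse eigensolver, and k‑means for the 1‑D binning. Function names, the bisection bracket, and the representative‑selection rule are illustrative choices, not the authors' reference implementation.

```python
# Minimal NBSE-style feature-selection sketch (illustrative, not the paper's code).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans


def bethe_hessian(A, beta):
    """H(beta) = (beta^2 - 1) I - beta A + D for a sparse adjacency matrix A."""
    n = A.shape[0]
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
    return (beta**2 - 1.0) * sp.identity(n, format="csr") - beta * A + D


def smallest_eigval(H):
    """Smallest eigenvalue of a sparse symmetric matrix."""
    return eigsh(H, k=1, which="SA", return_eigenvectors=False)[0]


def nishimori_temperature(A, beta_lo=1.01, beta_hi=20.0, tol=1e-4):
    """Bisection for a beta at which H(beta) becomes singular, i.e. the smallest
    eigenvalue crosses zero. The bracket [beta_lo, beta_hi] must straddle that
    sign change; the defaults here are illustrative and data-dependent."""
    sign_lo = np.sign(smallest_eigval(bethe_hessian(A, beta_lo)))
    while beta_hi - beta_lo > tol:
        mid = 0.5 * (beta_lo + beta_hi)
        if np.sign(smallest_eigval(bethe_hessian(A, mid))) == sign_lo:
            beta_lo = mid          # same side of the crossing as the lower end
        else:
            beta_hi = mid          # crossed zero: shrink from above
    return 0.5 * (beta_lo + beta_hi)


def nbse_select(X, n_keep, n_neighbors=10):
    """Select n_keep feature indices from a data matrix X (samples x features)."""
    # 1. Sparse cosine kNN graph built on the *features* (transpose of X).
    F = X.T
    A = kneighbors_graph(F, n_neighbors=n_neighbors, metric="cosine",
                         mode="connectivity", include_self=False)
    A = ((A + A.T) > 0).astype(float)                 # symmetrize

    # 2-3. Bethe Hessian evaluated at the Nishimori temperature.
    beta_n = nishimori_temperature(A)
    H = bethe_hessian(A, beta_n)

    # 4. Eigenvector of the (near-)zero eigenvalue -> one scalar score per feature.
    _, vecs = eigsh(H, k=1, which="SA")
    scores = vecs[:, 0]

    # 5. Bin the 1-D scores with k-means and keep one representative per bin
    #    (here: the feature whose score is closest to its bin centroid).
    km = KMeans(n_clusters=n_keep, n_init=10).fit(scores.reshape(-1, 1))
    selected = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        centroid = km.cluster_centers_[c, 0]
        selected.append(members[np.argmin(np.abs(scores[members] - centroid))])
    return np.sort(np.array(selected))
```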
The whole pipeline runs in near‑linear time with respect to the number of samples/features because the graph is sparse and only a few eigenvectors are needed.
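In practice, applying the sketch above to a matrix of pre‑computed embeddings could look like the following; the shapes and the 30 % budget mirror the experiments summarized below, while the data here is a random stand‑in.

```python
# Illustrative usage of nbse_select: keep ~30 % of 2048 embedding dimensions,
# then feed the reduced matrix to any downstream model unchanged.
import numpy as np

X = np.random.randn(10_000, 2048).astype(np.float32)     # stand-in for real embeddings
keep = nbse_select(X, n_keep=int(0.30 * X.shape[1]), n_neighbors=10)
X_small = X[:, keep]                                      # shape: (10_000, 614)
print(keep[:10], X_small.shape)
```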
Results & Findings
| Dataset / Backbone | % Features Retained | Top‑1 Accuracy Δ | NBSE vs. ANOVA‑F | NBSE vs. Random |
|---|---|---|---|---|
| ImageNet (MobileNetV2) | 30 % | –0.8 % | +4.2 % | +5.6 % |
| ImageNet (EfficientNet‑B4) | 30 % | –0.9 % | +5.1 % | +6.8 % |
| CIFAR‑10 (ResNet‑18) | 25 % | –0.5 % | +3.7 % | +4.9 % |
- Accuracy preservation: Even with aggressive compression (30 % of original dimensions) the drop in classification performance stays below 1 %.
- Robustness to noise: Synthetic coloured Gaussian perturbations added to the embeddings shift the critical temperature by less than \(10^{-3}\) for typical noise levels, confirming the theoretical bound (a reproduction‑style sketch follows this list).
- Speed: Feature selection on a 1 M‑sample, 2048‑dimensional embedding completes in under 2 minutes on a single GPU, far faster than iterative wrapper methods.
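A hedged sketch of how the noise‑robustness check could be reproduced with the helpers from the pipeline sketch above; the colored‑noise model, its scale, and the data shapes are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative robustness check: compare the Nishimori temperature of the clean
# feature graph with that of a graph built from noise-perturbed embeddings.
# Uses nishimori_temperature from the pipeline sketch above.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)

def colored_noise(shape, sigma=0.01):
    """Correlated ('coloured') Gaussian noise: white noise smoothed along the feature axis."""
    white = rng.normal(scale=sigma, size=shape)
    kernel = np.array([0.25, 0.5, 0.25])
    return np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, white)

def beta_n_of(X, n_neighbors=10):
    """Nishimori temperature of the feature-wise cosine kNN graph of X."""
    A = kneighbors_graph(X.T, n_neighbors=n_neighbors, metric="cosine",
                         mode="connectivity", include_self=False)
    A = ((A + A.T) > 0).astype(float)
    return nishimori_temperature(A)

X = np.random.randn(5_000, 512).astype(np.float32)        # stand-in embeddings
shift = abs(beta_n_of(X + colored_noise(X.shape)) - beta_n_of(X))
print(f"critical-temperature shift: {shift:.2e}")          # paper reports < 1e-3
```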
Practical Implications
- Model deployment on edge devices – NBSE can shrink the size of pre‑computed embeddings (e.g., from a backbone CNN) without retraining, reducing memory bandwidth and storage costs.
- Feature‑budgeted APIs – Services that expose embeddings (search, recommendation) can enforce strict bandwidth limits while keeping query quality high.
- Data‑centric AI pipelines – By automatically exposing groups of redundant dimensions, NBSE helps data engineers understand feature redundancy, guide dimensionality‑reduction decisions, and debug noisy sensor streams.
- Plug‑and‑play compatibility – Since NBSE works on any numeric matrix, it can be applied to text embeddings (BERT, CLIP), graph node features, or even tabular sensor data with minimal code changes.
- Reduced training time for downstream models – Smaller feature sets lead to faster training and inference for linear classifiers, gradient‑boosted trees, or fine‑tuned transformers built on top of the embeddings.
Limitations & Future Work
- Graph construction hyper‑parameter (number of nearest neighbors) still requires a modest amount of tuning; overly sparse graphs may miss subtle similarity patterns.
- Scalability to billions of points: while the sparse eigen‑solver scales well, constructing the nearest‑neighbor graph at that scale may need approximate methods (e.g., FAISS; see the sketch after this list).
- Extension beyond Euclidean/cosine similarity: the current formulation assumes a symmetric similarity measure; adapting NBSE to asymmetric or learned similarity kernels is an open direction.
- Theoretical analysis for non‑Gaussian noise: the robustness proof covers coloured Gaussian perturbations; exploring heavy‑tailed or adversarial noise would broaden applicability.
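For the billion‑scale regime mentioned above, the exact neighbor search could be swapped for an approximate index. A minimal sketch using FAISS's HNSW index follows; this particular index type and its parameters are assumptions for illustration, not a recommendation from the paper.

```python
# Illustrative approximate kNN graph with FAISS (HNSW). Inner product on
# L2-normalized vectors approximates cosine similarity, matching the exact pipeline.
import faiss
import numpy as np
import scipy.sparse as sp

def approx_knn_graph(F, k=10, hnsw_m=32):
    """F: one row per feature (e.g., the transposed data matrix).
    Returns a sparse, symmetric 0/1 adjacency matrix."""
    F = np.ascontiguousarray(F, dtype=np.float32)
    faiss.normalize_L2(F)                                   # cosine -> inner product
    index = faiss.IndexHNSWFlat(F.shape[1], hnsw_m, faiss.METRIC_INNER_PRODUCT)
    index.add(F)
    _, nbrs = index.search(F, k + 1)                        # first hit is usually the query itself
    rows = np.repeat(np.arange(F.shape[0]), k)
    cols = nbrs[:, 1:].ravel()
    data = np.ones_like(rows, dtype=float)
    A = sp.csr_matrix((data, (rows, cols)), shape=(F.shape[0], F.shape[0]))
    return ((A + A.T) > 0).astype(float)                    # symmetrize
```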
The authors suggest exploring adaptive graph sparsity, integrating NBSE with end‑to‑end differentiable pipelines, and applying the framework to multimodal embeddings as promising next steps.
Authors
- Vasiliy S. Usatyuk
- Denis A. Sapozhnikov
- Sergey I. Egorov
Paper Information
- arXiv ID: 2604.24692v1
- Categories: cs.LG
- Published: April 27, 2026