[Paper] Evolutionary Neural Architecture Search with Dual Contrastive Learning

Published: December 23, 2025 at 02:15 AM EST

Source: arXiv - 2512.20112v1

Overview

The paper proposes DCL‑ENAS, a method that speeds up Evolutionary Neural Architecture Search (ENAS) by teaching the search algorithm to rank candidate architectures instead of fully training each one. It applies two stages of contrastive learning: a self‑supervised stage that learns structural embeddings from unlabeled architectures, followed by a fine‑tuning stage that trains a predictor which only needs to know which of two architectures is better. This sharply cuts the GPU time required while still beating state‑of‑the‑art NAS methods on benchmark suites and a real‑world ECG classification task.

Key Contributions

  • Dual‑contrastive learning pipeline:
    1. Self‑supervised contrastive stage learns architecture embeddings without any performance labels.
    2. Relative‑performance contrastive fine‑tuning trains a lightweight predictor to rank architectures rather than predict exact accuracies.
  • Label‑efficient predictor: Achieves high‑quality ranking with far fewer fully‑trained architecture‑label pairs, addressing the biggest bottleneck in ENAS.
  • Empirical superiority: Sets new best‑in‑class validation accuracy on NASBench‑101, NASBench‑201, and ImageNet‑16‑120, improving by 0.05 %–0.39 % over the strongest baselines.
  • Real‑world validation: On an ECG arrhythmia detection dataset, DCL‑ENAS outperforms a manually designed baseline model by ~2.5 % absolute accuracy while consuming only 7.7 GPU‑days.
  • Generalizable framework: The contrastive learning stages are architecture‑agnostic and can be plugged into any evolutionary NAS loop.

Methodology

  1. Architecture encoding – Each candidate network is represented as a graph (nodes = operations, edges = connections). A graph neural network (GNN) converts this graph into a fixed‑size embedding.
  2. Stage 1: Contrastive Self‑Supervision
    • Randomly augment the graph (e.g., drop edges, permute node order).
    • Use a contrastive loss (InfoNCE) to push embeddings of augmented views of the same architecture together while pushing different architectures apart.
    • No performance labels are needed; the model learns a “semantic” space where similar structures cluster.
  3. Stage 2: Relative‑Performance Contrastive Fine‑Tuning
    • Collect a small budget of fully trained architectures (e.g., 200–500).
    • For each pair (A, B) compute which one performed better on the validation set.
    • Apply a contrastive loss that encourages the embedding of the better architecture to have a higher “score” than the worse one. This turns the predictor into a ranking model rather than a regression model.
  4. Evolutionary Search Loop
    • Initialize a population of random architectures.
    • At each generation, use the trained predictor to rank offspring and keep the top‑k for the next round.
    • Only a handful of individuals are fully trained each generation to refresh the predictor, keeping the overall compute budget low (illustrative code sketches of both contrastive stages and this search loop follow below).
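
To make Stage 1 concrete, here is a minimal PyTorch sketch of a symmetric InfoNCE loss over two augmented views of the same batch of architectures. The GNN encoder, the augment function, the temperature, and the batch handling are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE over two augmented views of the same architecture batch.

    z1, z2: (batch, dim) embeddings from the GNN encoder for view 1 and view 2
    of the same architectures. Positive pairs sit on the diagonal of the
    similarity matrix; every other architecture in the batch acts as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # matching indices are positives
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Hypothetical pretraining step (encoder and augment are placeholders):
#   view_a, view_b = augment(arch_graphs), augment(arch_graphs)  # e.g., edge drop, node permutation
#   loss = info_nce_loss(encoder(view_a), encoder(view_b))
```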
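Stage 2 can be read as a pairwise ranking objective on the learned embeddings. The sketch below uses a RankNet‑style logistic pairwise loss and a small MLP score head; both are assumptions chosen to illustrate the "rank, don't regress" idea, since the paper describes this stage only as relative‑performance contrastive fine‑tuning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RankingHead(nn.Module):
    """Scores an architecture embedding; a higher score means predicted-better (illustrative)."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.mlp(z).squeeze(-1)

def pairwise_rank_loss(score_better: torch.Tensor, score_worse: torch.Tensor) -> torch.Tensor:
    """Logistic pairwise loss: the architecture that validated better in each
    labeled pair should receive the higher predicted score."""
    return F.softplus(score_worse - score_better).mean()

# Hypothetical fine-tuning step on a small budget of trained pairs (A validated better than B):
#   loss = pairwise_rank_loss(head(encoder(arch_a)), head(encoder(arch_b)))
```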
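Finally, the search loop itself only consumes the predictor's scores to rank offspring. The framework‑agnostic sketch below shows how the pieces fit together; the population size, mutation operator, refresh budget, and stopping rule are placeholders, and the predictor refresh is noted in a comment rather than implemented.

```python
import random
from typing import Callable, List, Tuple

def evolve(init_population: List[object],
           predictor: Callable[[object], float],
           mutate: Callable[[object], object],
           train_and_eval: Callable[[object], float],
           generations: int = 50,
           top_k: int = 20,
           refresh_budget: int = 5) -> Tuple[object, float]:
    """Predictor-guided evolutionary search (simplified sketch).

    predictor(arch) returns a cheap ranking score; train_and_eval(arch) fully
    trains an architecture and returns its validation accuracy. Only
    `refresh_budget` individuals per generation are fully trained.
    """
    population = list(init_population)
    labeled: List[Tuple[object, float]] = []          # (arch, accuracy) pairs from full trainings
    for _ in range(generations):
        offspring = [mutate(random.choice(population)) for _ in range(len(population))]
        candidates = population + offspring
        candidates.sort(key=predictor, reverse=True)  # cheap ranking, no training needed
        population = candidates[:top_k]
        # Fully train a few survivors; in DCL-ENAS these fresh labels would also
        # fine-tune the ranking predictor before the next generation.
        for arch in population[:refresh_budget]:
            labeled.append((arch, train_and_eval(arch)))
    return max(labeled, key=lambda pair: pair[1])
```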

Results & Findings

| Benchmark | GPU‑days (budget) | Best validation accuracy (↑) | Improvement vs. prior SOTA |
| --- | --- | --- | --- |
| NASBench‑101 | ~8 | 94.12 % | +0.39 % |
| NASBench‑201 (CIFAR‑10) | ~6 | 93.71 % | +0.22 % |
| ImageNet‑16‑120 | ~10 | 58.73 % | +0.05 % |
| ECG Arrhythmia (real‑world) | 7.7 | 87.4 % | +2.5 % over manual baseline |
  • The predictor’s ranking quality (Kendall’s τ) reaches >0.85 after just a few hundred labeled samples (a quick way to compute this metric is sketched after this list).
  • Ablation studies show that removing either contrastive stage drops performance by 0.2 %–0.4 % and increases required GPU days by ~30 %.
  • The method is robust to different GNN encoders and evolutionary operators (mutation/crossover).
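
Kendall's τ measures how often the predictor orders a pair of architectures the same way their true validation accuracies do. A quick, self‑contained way to compute it with SciPy, using made‑up scores and accuracies purely for illustration, looks like this:

```python
from scipy.stats import kendalltau

# Hypothetical example: predictor scores vs. true validation accuracies
# for the same set of fully trained architectures.
predicted_scores = [0.71, 0.64, 0.88, 0.52, 0.79]
true_accuracies  = [93.1, 92.4, 94.0, 91.8, 93.6]

tau, p_value = kendalltau(predicted_scores, true_accuracies)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3g})")
```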

Practical Implications

  • Faster NAS pipelines: Teams can now run ENAS cycles on a single workstation (or modest cloud budget) instead of needing large GPU clusters.
  • Better use of limited data: Since only relative performance matters, the predictor can be trained on a few fully‑trained models and still guide the search effectively.
  • Plug‑and‑play: The dual‑contrastive learning module can be dropped into existing evolutionary NAS frameworks (e.g., DEvol, Regularized Evolution) with minimal code changes.
  • Domain‑specific NAS: The ECG experiment demonstrates that DCL‑ENAS works beyond image benchmarks, making it attractive for healthcare, IoT, and other verticals where compute budgets are tight.
  • Reduced carbon footprint: Cutting the search cost from dozens of GPU‑days to under ten translates into measurable energy savings for organizations with sustainability goals.

Limitations & Future Work

  • Scalability to very large search spaces (e.g., full‑scale ImageNet models) is not yet proven; the current experiments stay within the NASBench‑style micro‑search spaces.
  • The approach still requires a small but non‑trivial set of fully trained architectures; in domains where even a single full training run is prohibitively expensive, further label‑free techniques may be needed.
  • The contrastive augmentations are handcrafted for graph‑structured architectures; learning augmentation policies automatically could improve robustness.
  • Future research could explore multi‑objective extensions (e.g., latency, memory) and integrate the predictor into gradient‑based NAS methods for hybrid search strategies.

Authors

  • Xian‑Rong Zhang
  • Yue‑Jiao Gong
  • Wei‑Neng Chen
  • Jun Zhang

Paper Information

  • arXiv ID: 2512.20112v1
  • Categories: cs.NE, cs.AI
  • Published: December 23, 2025