[Paper] Neural-Symbolic Integration with Evolvable Policies

Published: January 8, 2026 at 05:29 AM EST
3 min read
Source: arXiv - 2601.04799v1

Overview

The paper introduces a Neural‑Symbolic (NeSy) framework in which a neural network and a symbolic policy evolve together, with no need for pre‑written rules or a differentiable policy. By treating each NeSy system as an "organism" that mutates and competes on fitness, the authors show how to discover interpretable, non‑differentiable policies from scratch, opening the door to AI solutions in domains where expert knowledge is scarce.

Key Contributions

  • Evolvable NeSy architecture: Extends the NEUROLOG system so that symbolic policies become mutable, evolvable entities.
  • Evolutionary learning loop: Applies Valiant’s evolvability theory to jointly evolve symbolic rule sets and neural‑network weights.
  • Differentiability‑free training: Uses abductive reasoning from the symbolic component to train the neural part, removing the need for gradient‑based updates.
  • Machine‑Coaching semantics: Introduces a lightweight, mutable representation for symbolic rules that can be incrementally refined during evolution.
  • Empirical validation: Demonstrates that populations initialized with empty policies and random weights converge to hidden, non‑differentiable target policies with median accuracies near 100%.

Methodology

  1. Population encoding – Each individual in the evolutionary population consists of:
    • A symbolic policy (a set of logical rules) that can be empty or grow over time.
    • A neural network whose weights are free parameters.
  2. Mutation operators – Two kinds of mutations are applied:
    • Symbolic mutation: randomly add, delete, or modify a rule.
    • Neural mutation: perturb network weights (e.g., Gaussian noise).
  3. Fitness evaluation – For a given task, the system receives inputs, the symbolic part proposes a decision, and the neural part supplies perceptual features. The combined output is compared against a hidden target policy; the match score becomes the fitness.
  4. Selection & reproduction – Standard evolutionary strategies (e.g., tournament selection) pick higher‑fitness individuals to spawn the next generation; a minimal sketch of steps 1–4 appears after this list.
  5. Training the neural component – Instead of back‑propagation, the network is trained via abductive reasoning: the symbolic layer explains observed outcomes, and the network adjusts to better support those explanations. This sidesteps any requirement that the policy be differentiable (see the second sketch below).
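
For concreteness, here is a minimal Python sketch of steps 1–4 under simplifying assumptions: the rule encoding, the `Individual` class, and the `nesy_decide` / `target_policy` callables are hypothetical stand-ins, not the paper's actual NEUROLOG machinery.

```python
import random
import numpy as np

# Minimal sketch of the co-evolutionary loop (steps 1-4). The rule encoding
# and the nesy_decide / target_policy callables are illustrative stand-ins.

class Individual:
    """One organism: a (possibly empty) symbolic policy plus free NN weights."""
    def __init__(self, rules=None, weights=None, n_weights=32):
        self.rules = list(rules) if rules else []
        self.weights = weights if weights is not None else np.random.randn(n_weights)

def mutate(parent, rule_pool, sigma=0.1):
    """Apply one symbolic mutation (add/delete/modify a rule) or one neural
    mutation (Gaussian weight perturbation)."""
    child = Individual(parent.rules, parent.weights.copy())
    op = random.choice(["add", "delete", "modify", "neural"])
    if op == "neural":
        child.weights += sigma * np.random.randn(*child.weights.shape)
    elif op == "add" or not child.rules:          # empty policies can only grow
        child.rules.append(random.choice(rule_pool))
    elif op == "delete":
        child.rules.pop(random.randrange(len(child.rules)))
    else:                                         # modify: swap one rule
        child.rules[random.randrange(len(child.rules))] = random.choice(rule_pool)
    return child

def fitness(ind, inputs, target_policy, nesy_decide):
    """Fraction of inputs where the combined NeSy output matches the hidden target."""
    return sum(nesy_decide(ind, x) == target_policy(x) for x in inputs) / len(inputs)

def evolve(pop, inputs, target_policy, nesy_decide, rule_pool,
           generations=500, k=3):
    for _ in range(generations):
        scored = [(fitness(ind, inputs, target_policy, nesy_decide), ind)
                  for ind in pop]
        # Tournament selection: each child descends from the best of k samples.
        pop = [mutate(max(random.sample(scored, k), key=lambda s: s[0])[1], rule_pool)
               for _ in range(len(scored))]
    return pop
```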
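
Step 5 can be sketched in the same spirit. The assumption here is that abduction reduces to looking up which perceptual label would entail the observed outcome under the current rules; `abduce_label` and the linear scorer are invented for illustration, whereas the real system would invoke an abductive reasoner over the logic program.

```python
import numpy as np

# Hedged sketch of step 5: abduced pseudo-labels supervise the perceptual
# network directly. Each rule here pairs an output decision with the
# perceptual label that would entail it (hypothetical encoding).

def abduce_label(rules, observed_output):
    """Return a perceptual label that, under the rules, explains the outcome."""
    for output, perceptual_label in rules:
        if output == observed_output:
            return perceptual_label
    return None  # no rule explains the observation; skip this example

def abductive_update(W, x, rules, observed_output, lr=0.01):
    """One perceptron-style correction of a linear scorer W toward the
    abduced label. No gradient ever passes through the symbolic policy."""
    label = abduce_label(rules, observed_output)
    if label is None:
        return W
    pred = int(np.argmax(W @ x))
    if pred != label:
        W[label] += lr * x                # pull the abduced class up
        W[pred] -= lr * x                 # push the wrong prediction down
    return W
```

Because the abduced label supervises the network directly, the symbolic policy itself is free to be any non‑differentiable program.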

Results & Findings

  • Convergence speed: Across multiple benchmark tasks, populations typically reached >90% accuracy within 200–500 generations.
  • Policy complexity: Starting from an empty rule set, the evolved policies grew to a modest size (average 5–12 rules) yet captured the full behavior of the hidden target.
  • Robustness: The approach handled non‑differentiable target policies (e.g., discrete decision trees) that traditional gradient‑based NeSy methods cannot learn.
  • Ablation: Removing either symbolic or neural mutation dramatically reduced performance, confirming that co‑evolution is essential.

Practical Implications

  • Rapid prototyping in low‑knowledge domains – Developers can deploy NeSy agents without hand‑crafting rule bases, letting the system discover interpretable policies automatically.
  • Explainable AI for safety‑critical systems – Since the final policy is symbolic, engineers can audit and modify it post‑hoc, satisfying regulatory or compliance requirements.
  • Edge‑friendly inference – The symbolic component can be compiled into lightweight rule engines (see the sketch after this list), while the neural part can be quantized, enabling hybrid models on constrained devices.
  • Integration with existing pipelines – The evolutionary loop can be wrapped around any off‑the‑shelf neural architecture (CNNs, Transformers) and any logical language (Prolog‑style Horn clauses, Datalog), making adoption straightforward for ML engineers.
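
To make the edge-deployment point concrete, here is a toy sketch of compiling an evolved rule set, taken in priority order, into a plain decision function with no ML runtime; the condition-set encoding and the example facts are hypothetical, not the paper's rule language.

```python
# Toy sketch: serving an evolved symbolic policy as a tiny rule engine.
# Each rule is (set_of_required_facts, action), checked in priority order.

def compile_rules(rules):
    def decide(facts):
        for conditions, action in rules:
            if conditions <= facts:       # rule fires when all conditions hold
                return action
        return None                       # no rule fires
    return decide

# Facts would come from the (quantized) neural perception module.
policy = compile_rules([
    (frozenset({"obstacle_near", "moving"}), "brake"),
    (frozenset({"moving"}), "cruise"),
])
print(policy({"obstacle_near", "moving"}))   # -> "brake"
```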

Limitations & Future Work

  • Scalability – Evolutionary search can become costly for very high‑dimensional neural nets or large rule vocabularies; the paper notes the need for smarter mutation heuristics or hybrid gradient‑evolution strategies.
  • Fitness design – The current fitness function assumes access to a hidden target policy; real‑world scenarios may require surrogate objectives (e.g., reward signals) that are noisier.
  • Rule expressiveness – Experiments used relatively simple propositional rules; extending to richer first‑order logic or temporal reasoning remains an open challenge.
  • Benchmark breadth – Future work should test the framework on larger, industry‑scale datasets (e.g., autonomous driving perception‑decision loops) to validate practical viability.

Authors

  • Marios Thoma
  • Vassilis Vassiliades
  • Loizos Michael

Paper Information

  • arXiv ID: 2601.04799v1
  • Categories: cs.LG, cs.NE
  • Published: January 8, 2026