[Paper] Neural-Symbolic Integration with Evolvable Policies

Published: January 8, 2026 at 05:29 AM EST
3 min read
Source: arXiv - 2601.04799v1

Overview

The paper introduces a Neural‑Symbolic (NeSy) framework in which a neural network and a symbolic policy evolve together, with no need for pre‑written rules or a differentiable policy. By treating each NeSy system as an "organism" that mutates and competes on fitness, the authors show how to discover interpretable, non‑differentiable policies from scratch, opening the door to AI solutions in domains where expert knowledge is scarce.

Key Contributions

  • Evolvable NeSy architecture: Extends the NEUROLOG system so that symbolic policies become mutable, evolvable entities.
  • Evolutionary learning loop: Applies Valiant’s evolvability theory to jointly evolve symbolic rule sets and neural‑network weights.
  • Differentiability‑free training: Uses abductive reasoning from the symbolic component to train the neural part, removing the need for gradient‑based updates.
  • Machine‑Coaching semantics: Introduces a lightweight, mutable representation for symbolic rules that can be incrementally refined during evolution.
  • Empirical validation: Demonstrates that populations initialized with empty policies and random weights converge to hidden, non‑differentiable target policies with median accuracies near 100%.

Methodology

  1. Population encoding – Each individual in the evolutionary population consists of:
    • A symbolic policy (a set of logical rules) that can be empty or grow over time.
    • A neural network whose weights are free parameters.
  2. Mutation operators – Two kinds of mutations are applied:
    • Symbolic mutation: randomly add, delete, or modify a rule.
    • Neural mutation: perturb network weights (e.g., Gaussian noise).
  3. Fitness evaluation – For a given task, the system receives inputs, the symbolic part proposes a decision, and the neural part supplies perceptual features. The combined output is compared against a hidden target policy; the match score becomes the fitness.
  4. Selection & reproduction – Standard evolutionary strategies (e.g., tournament selection) pick higher‑fitness individuals to spawn the next generation; a minimal sketch of steps 1–4 appears after this list.
  5. Training the neural component – Instead of back‑propagation, the network is trained via abductive reasoning: the symbolic layer explains observed outcomes, and the network adjusts to better support those explanations. This sidesteps any requirement that the policy be differentiable (see the second sketch below).
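
For concreteness, here is a minimal Python sketch of steps 1–4 under simplifying assumptions: the rule encoding, the `Individual` class, and the `nesy_decide` / `target_policy` callables are hypothetical stand-ins, not the paper's actual NEUROLOG machinery.

```python
import random
import numpy as np

# Minimal sketch of the co-evolutionary loop (steps 1-4). The rule encoding
# and the nesy_decide / target_policy callables are illustrative stand-ins.

class Individual:
    """One organism: a (possibly empty) symbolic policy plus free NN weights."""
    def __init__(self, rules=None, weights=None, n_weights=32):
        self.rules = list(rules) if rules else []
        self.weights = weights if weights is not None else np.random.randn(n_weights)

def mutate(parent, rule_pool, sigma=0.1):
    """Apply one symbolic mutation (add/delete/modify a rule) or one neural
    mutation (Gaussian weight perturbation)."""
    child = Individual(parent.rules, parent.weights.copy())
    op = random.choice(["add", "delete", "modify", "neural"])
    if op == "neural":
        child.weights += sigma * np.random.randn(*child.weights.shape)
    elif op == "add" or not child.rules:          # empty policies can only grow
        child.rules.append(random.choice(rule_pool))
    elif op == "delete":
        child.rules.pop(random.randrange(len(child.rules)))
    else:                                         # modify: swap one rule
        child.rules[random.randrange(len(child.rules))] = random.choice(rule_pool)
    return child

def fitness(ind, inputs, target_policy, nesy_decide):
    """Fraction of inputs where the combined NeSy output matches the hidden target."""
    return sum(nesy_decide(ind, x) == target_policy(x) for x in inputs) / len(inputs)

def evolve(pop, inputs, target_policy, nesy_decide, rule_pool,
           generations=500, k=3):
    for _ in range(generations):
        scored = [(fitness(ind, inputs, target_policy, nesy_decide), ind)
                  for ind in pop]
        # Tournament selection: each child descends from the best of k samples.
        pop = [mutate(max(random.sample(scored, k), key=lambda s: s[0])[1], rule_pool)
               for _ in range(len(scored))]
    return pop
```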
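
Step 5 can be sketched in the same spirit. The assumption here is that abduction reduces to looking up which perceptual label would entail the observed outcome under the current rules; `abduce_label` and the linear scorer are invented for illustration, whereas the real system would invoke an abductive reasoner over the logic program.

```python
import numpy as np

# Hedged sketch of step 5: abduced pseudo-labels supervise the perceptual
# network directly. Each rule here pairs an output decision with the
# perceptual label that would entail it (hypothetical encoding).

def abduce_label(rules, observed_output):
    """Return a perceptual label that, under the rules, explains the outcome."""
    for output, perceptual_label in rules:
        if output == observed_output:
            return perceptual_label
    return None  # no rule explains the observation; skip this example

def abductive_update(W, x, rules, observed_output, lr=0.01):
    """One perceptron-style correction of a linear scorer W toward the
    abduced label. No gradient ever passes through the symbolic policy."""
    label = abduce_label(rules, observed_output)
    if label is None:
        return W
    pred = int(np.argmax(W @ x))
    if pred != label:
        W[label] += lr * x                # pull the abduced class up
        W[pred] -= lr * x                 # push the wrong prediction down
    return W
```

Because the abduced label supervises the network directly, the symbolic policy itself is free to be any non‑differentiable program.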

Results & Findings

  • Convergence speed: Across multiple benchmark tasks, populations typically reached >90% accuracy within 200–500 generations.
  • Policy complexity: Starting from an empty rule set, the evolved policies grew to a modest size (average 5–12 rules) yet captured the full behavior of the hidden target.
  • Robustness: The approach handled non‑differentiable target policies (e.g., discrete decision trees) that traditional gradient‑based NeSy methods cannot learn.
  • Ablation: Removing either symbolic or neural mutation dramatically reduced performance, confirming that co‑evolution is essential.

Practical Implications

  • Rapid prototyping in low‑knowledge domains – Developers can deploy NeSy agents without hand‑crafting rule bases, letting the system discover interpretable policies automatically.
  • Explainable AI for safety‑critical systems – Since the final policy is symbolic, engineers can audit and modify it post‑hoc, satisfying regulatory or compliance requirements.
  • Edge‑friendly inference – The symbolic component can be compiled into lightweight rule engines (see the sketch after this list), while the neural part can be quantized, enabling hybrid models on constrained devices.
  • Integration with existing pipelines – The evolutionary loop can be wrapped around any off‑the‑shelf neural architecture (CNNs, Transformers) and any logical language (Prolog‑style Horn clauses, Datalog), making adoption straightforward for ML engineers.
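
To make the edge-deployment point concrete, here is a toy sketch of compiling an evolved rule set, taken in priority order, into a plain decision function with no ML runtime; the condition-set encoding and the example facts are hypothetical, not the paper's rule language.

```python
# Toy sketch: serving an evolved symbolic policy as a tiny rule engine.
# Each rule is (set_of_required_facts, action), checked in priority order.

def compile_rules(rules):
    def decide(facts):
        for conditions, action in rules:
            if conditions <= facts:       # rule fires when all conditions hold
                return action
        return None                       # no rule fires
    return decide

# Facts would come from the (quantized) neural perception module.
policy = compile_rules([
    (frozenset({"obstacle_near", "moving"}), "brake"),
    (frozenset({"moving"}), "cruise"),
])
print(policy({"obstacle_near", "moving"}))   # -> "brake"
```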

Limitations & Future Work

  • Scalability – Evolutionary search can become costly for very high‑dimensional neural nets or large rule vocabularies; the paper notes the need for smarter mutation heuristics or hybrid gradient‑evolution strategies.
  • Fitness design – The current fitness function assumes access to a hidden target policy; real‑world scenarios may require surrogate objectives (e.g., reward signals) that are noisier.
  • Rule expressiveness – Experiments used relatively simple propositional rules; extending to richer first‑order logic or temporal reasoning remains an open challenge.
  • Benchmark breadth – Future work should test the framework on larger, industry‑scale datasets (e.g., autonomous driving perception‑decision loops) to validate practical viability.

Authors

  • Marios Thoma
  • Vassilis Vassiliades
  • Loizos Michael

Paper Information

  • arXiv ID: 2601.04799v1
  • Categories: cs.LG, cs.NE
  • Published: January 8, 2026