[Paper] SymPlex: A Structure-Aware Transformer for Symbolic PDE Solving

Published: February 3, 2026 at 01:18 PM EST
Source: arXiv - 2602.03816v1

Overview

The paper introduces SymPlex, a reinforcement‑learning system that can automatically discover exact symbolic formulas for solutions of partial differential equations (PDEs). By treating the search for a formula as a tree‑structured decision problem and using a novel structure‑aware Transformer (SymFormer), the method works directly in the space of mathematical expressions—producing human‑readable, interpretable solutions without ever seeing a ground‑truth answer.

Key Contributions

  • SymPlex framework: Casts symbolic PDE solving as a reinforcement‑learning problem that optimizes candidate expressions using only the PDE and its boundary conditions.
  • SymFormer architecture: A Transformer variant that respects the hierarchical tree structure of mathematical expressions via tree‑relative self‑attention and guarantees syntactically valid outputs through grammar‑constrained autoregressive decoding.
  • Structure‑aware generation: Moves beyond the linear token sequences used by standard language models and directly models the nested, tree‑like nature of symbolic math, improving expressivity and correctness.
  • Exact recovery of non‑smooth and parametric solutions: Demonstrates that the system can discover closed‑form solutions that include piecewise definitions, absolute values, and explicit parameter dependencies—cases where numerical or implicit neural solvers struggle.
  • Empirical validation: Shows that SymPlex matches or exceeds prior symbolic regression baselines on a suite of benchmark PDEs, achieving 100 % exact recovery on several challenging examples.
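
To make "tree‑relative" concrete, here is a minimal sketch of how pairwise structural relations between nodes of an expression tree could be tabulated. The function name `tree_relations`, the parent-array encoding, and the relation labels are assumptions for illustration; the paper's actual attention parameterization is not reproduced here. In a Transformer, each label would typically index a learned bias that is added to the attention logit for that pair of positions.

```python
def tree_relations(parent):
    """Label the structural relation of node j relative to node i.

    parent[i] is the index of node i's parent, or -1 for the root.
    Returns an n x n table of strings (hypothetical label set).
    """
    n = len(parent)

    def ancestors(i):
        # Walk up to the root, collecting every node on the way.
        chain = []
        while parent[i] != -1:
            i = parent[i]
            chain.append(i)
        return chain

    anc = [ancestors(i) for i in range(n)]
    rel = [["other"] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                rel[i][j] = "self"
            elif parent[i] == j:
                rel[i][j] = "parent"
            elif parent[j] == i:
                rel[i][j] = "child"
            elif parent[i] == parent[j]:
                rel[i][j] = "sibling"
            elif j in anc[i]:
                rel[i][j] = "ancestor"
            elif i in anc[j]:
                rel[i][j] = "descendant"
    return rel


# add(mul(x, x), sin(t)) listed in prefix order:
# index: 0=add, 1=mul, 2=x, 3=x, 4=sin, 5=t
parent = [-1, 0, 1, 1, 0, 4]
rel = tree_relations(parent)
print(rel[2][3], rel[2][0], rel[5][1])
```

Two leaves that sit next to each other in the token sequence (such as index 3 and index 4 here) can be structurally unrelated, which is the information a purely sequential position encoding does not expose.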

Methodology

  1. Problem formulation – The goal is to find a symbolic expression \(u(x)\) that satisfies a given PDE \(\mathcal{L}[u]=0\) together with boundary/initial conditions. No training data (i.e., known solutions) are provided.
  2. Tree‑structured action space – Each candidate solution is represented as a syntax tree (operators as internal nodes, variables/constants as leaves). The RL agent incrementally builds this tree, choosing the next node based on the current partial structure.
  3. SymFormer encoder‑decoder
    • Encoder processes the PDE description (operators, variables, boundary terms) using a standard Transformer.
    • Decoder generates the solution tree. It uses tree‑relative self‑attention: attention scores are computed relative to the parent, sibling, and ancestor positions, preserving the hierarchical dependencies of math expressions.
    • Grammar constraints enforce that only syntactically valid tokens can be selected at each step (e.g., a binary operator must be followed by two sub‑expressions).
  4. Reward signal – After a full expression is generated, the system evaluates it on a set of collocation points sampled from the domain. The reward combines:
    • PDE residual loss (how well the expression satisfies the differential equation),
    • Boundary loss (how well it meets the prescribed conditions), and
    • Complexity penalty (favoring simpler formulas).
  5. Training loop – Policy gradient (REINFORCE) updates the decoder parameters to maximize expected reward, while the encoder is jointly fine‑tuned to better condition the decoder on the PDE context.
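
Steps 2 and 3 can be illustrated with a small sketch of grammar‑constrained tree construction. It assumes expressions are emitted in prefix order and that validity is tracked by counting open operand slots; the token set `ARITY`, the length budget, and the uniform random choice (standing in for the learned policy) are all hypothetical and not taken from the paper.

```python
import random

# Hypothetical token set: name -> number of operands it requires.
ARITY = {"add": 2, "mul": 2, "sin": 1, "abs": 1, "x": 0, "t": 0, "c": 0}


def valid_tokens(open_slots, remaining):
    """Tokens that keep the expression completable within `remaining` tokens.

    Emitting a token of arity a fills one slot and opens a new ones, and
    every open slot needs at least one further token to be closed.
    """
    return [tok for tok, a in ARITY.items()
            if open_slots - 1 + a <= remaining - 1]


def sample_expression(max_len=15, seed=None):
    """Build a prefix-order expression, masking invalid tokens at each step."""
    rng = random.Random(seed)
    tokens, open_slots = [], 1  # one slot: the root of the tree
    while open_slots > 0:
        remaining = max_len - len(tokens)
        allowed = valid_tokens(open_slots, remaining)
        tok = rng.choice(allowed)  # stand-in for sampling from the policy
        tokens.append(tok)
        open_slots += ARITY[tok] - 1
    return tokens


def is_complete(tokens):
    """True if `tokens` is exactly one well-formed prefix expression."""
    open_slots = 1
    for tok in tokens:
        if open_slots == 0:
            return False  # tokens left over after the tree was closed
        open_slots += ARITY[tok] - 1
    return open_slots == 0


print(sample_expression(max_len=9, seed=0))
```

Because a leaf token is always allowed while at least one slot is open, the mask never becomes empty, and every sampled sequence is a complete tree no longer than `max_len`. In a trained system the uniform choice would be replaced by the decoder's distribution restricted to `allowed`.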

Results & Findings

| Benchmark PDE | Exact symbolic recovery? | Notable features recovered |
| --- | --- | --- |
| 1‑D Burgers (viscous) | | Piecewise linear shock, explicit viscosity parameter |
| 2‑D Laplace with Dirichlet BC | | Harmonic polynomial with parametric coefficients |
| Heat equation with time‑dependent BC | | Series solution with explicit time factor |
| Non‑smooth Poisson (x term) | | |

  • Zero‑error solutions: For all tested equations, SymPlex produced expressions whose residual is zero to machine precision on unseen points.
  • Interpretability: The recovered formulas are concise and directly usable in downstream analytical work (e.g., stability analysis).
  • Comparison to baselines: Traditional symbolic regression (e.g., Eureqa, Deep Symbolic Regression) failed on non‑smooth or parametric cases, while SymPlex succeeded consistently.
  • Ablation: Removing tree‑relative attention or grammar constraints caused a >30 % drop in exact recovery rates, confirming their importance.
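
As a rough illustration of what checking a candidate on sampled points involves, the sketch below scores an expression for the 2‑D Laplace equation on the unit square by combining a PDE residual, a boundary mismatch, and a size penalty. Everything here is an assumption made for the example: the tuple-based expression format, the helper names (`evaluate`, `diff`, `size`, `score`), the boundary data, and the way the three terms are weighted. The paper's actual reward definition may differ.

```python
import math
import random

# Hypothetical expression format: nested tuples such as ("sub", a, b),
# variable names ("x", "y"), or float constants.


def evaluate(e, env):
    """Numerically evaluate expression `e` at the point `env`."""
    if isinstance(e, str):
        return env[e]
    if isinstance(e, (int, float)):
        return float(e)
    op, *args = e
    v = [evaluate(a, env) for a in args]
    if op == "add":
        return v[0] + v[1]
    if op == "sub":
        return v[0] - v[1]
    if op == "mul":
        return v[0] * v[1]
    if op == "sin":
        return math.sin(v[0])
    if op == "cos":
        return math.cos(v[0])
    raise ValueError(f"unknown operator: {op}")


def diff(e, var):
    """Symbolic derivative of `e` with respect to `var` (no simplification)."""
    if isinstance(e, str):
        return 1.0 if e == var else 0.0
    if isinstance(e, (int, float)):
        return 0.0
    op, *args = e
    if op in ("add", "sub"):
        return (op, diff(args[0], var), diff(args[1], var))
    if op == "mul":
        a, b = args
        return ("add", ("mul", diff(a, var), b), ("mul", a, diff(b, var)))
    if op == "sin":
        return ("mul", ("cos", args[0]), diff(args[0], var))
    if op == "cos":
        return ("mul", ("sub", 0.0, ("sin", args[0])), diff(args[0], var))
    raise ValueError(f"unknown operator: {op}")


def size(e):
    """Number of nodes in the expression tree (the complexity measure)."""
    return 1 + sum(size(a) for a in e[1:]) if isinstance(e, tuple) else 1


def boundary_value(p):
    """Assumed Dirichlet data on the unit square: g(x, y) = x^2 - y^2."""
    return p["x"] * p["x"] - p["y"] * p["y"]


def score(u, n_points=64, complexity_weight=0.01, seed=0):
    """Negative of (PDE residual + boundary mismatch + size penalty)."""
    rng = random.Random(seed)
    laplacian = ("add", diff(diff(u, "x"), "x"), diff(diff(u, "y"), "y"))
    interior = [{"x": rng.random(), "y": rng.random()} for _ in range(n_points)]
    pde_loss = sum(evaluate(laplacian, p) ** 2 for p in interior) / n_points
    boundary = []
    for _ in range(n_points):
        t = rng.random()
        boundary += [{"x": t, "y": 0.0}, {"x": t, "y": 1.0},
                     {"x": 0.0, "y": t}, {"x": 1.0, "y": t}]
    bc_loss = sum((evaluate(u, p) - boundary_value(p)) ** 2
                  for p in boundary) / len(boundary)
    return -(pde_loss + bc_loss + complexity_weight * size(u))


exact = ("sub", ("mul", "x", "x"), ("mul", "y", "y"))  # x^2 - y^2
wrong = ("add", "x", "y")                              # harmonic, wrong BC
print(score(exact), score(wrong))
```

Note that `wrong` also has zero PDE residual, since any linear function is harmonic, so only the boundary term separates it from the exact solution. This is why the residual and boundary checks are both needed.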

Practical Implications

  • Rapid prototyping of analytical models: Engineers can feed a PDE description and let SymPlex suggest closed‑form solutions, accelerating the design of control laws, material models, or fluid dynamics approximations.
  • Explainable AI for scientific computing: Unlike black‑box neural PDE solvers that output discretized fields, SymPlex yields formulas that can be inspected, differentiated, and embedded into larger symbolic pipelines (e.g., symbolic optimization, theorem proving).
  • Parameter‑sensitive design: Because the output retains explicit dependence on physical parameters, developers can perform sensitivity analysis or embed the solution directly into simulation code without re‑training.
  • Integration with existing toolchains: The generated expressions use standard mathematical syntax, making them compatible with CAS (Mathematica, SymPy) and automatic code generators for high‑performance computing.
  • Educational use: Students and researchers can use SymPlex as a “symbolic assistant” to verify hand‑derived solutions or explore alternative forms.
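
As a small illustration of the toolchain point, the sketch below turns a prefix token list into a fully parenthesized infix string that can be pasted into a CAS session or a source file. The token names, the operator tables, and the function `to_infix` are hypothetical; the output format SymPlex actually emits is not specified in this summary.

```python
# Hypothetical operator tables for the example.
BINARY = {"add": "+", "sub": "-", "mul": "*", "div": "/"}
UNARY = {"sin", "cos", "exp", "abs"}


def to_infix(tokens):
    """Convert a prefix token list into a fully parenthesized infix string."""

    def parse(pos):
        # Returns (text of the subexpression starting at pos, next position).
        tok = tokens[pos]
        if tok in BINARY:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return f"({left} {BINARY[tok]} {right})", pos
        if tok in UNARY:
            arg, pos = parse(pos + 1)
            return f"{tok}({arg})", pos
        return tok, pos + 1  # variable, parameter, or numeric literal

    text, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return text


# nu * |x - t|, with nu kept as an explicit symbolic parameter
print(to_infix(["mul", "nu", "abs", "sub", "x", "t"]))
```

Function names may still need mapping to the target's conventions (for example `Abs` in SymPy or `fabs` in C), and parameters such as `nu` remain symbols in the output, so they can be differentiated or substituted downstream.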

Limitations & Future Work

  • Scalability to high‑dimensional PDEs: Current experiments are limited to 1‑D/2‑D problems; extending to 3‑D or systems with many coupled fields will require more efficient tree search and possibly hierarchical decomposition.
  • Reward evaluation cost: Computing PDE residuals for complex expressions can be expensive; smarter surrogate rewards or adaptive sampling could reduce overhead.
  • Grammar expressiveness: The predefined grammar restricts the operator set (e.g., no special functions like Bessel or hypergeometric). Future work could learn or expand grammars dynamically.
  • Generalization across PDE families: While the encoder learns to condition on a single PDE, transferring knowledge to entirely new equation families remains an open challenge.
  • Robustness to noisy or approximate boundary data: Real‑world scenarios often involve measurement noise; incorporating uncertainty handling is a promising direction.

Authors

  • Yesom Park
  • Annie C. Lu
  • Shao‑Ching Huang
  • Qiyang Hu
  • Y. Sungtaek Ju
  • Stanley Osher

Paper Information

  • arXiv ID: 2602.03816v1
  • Categories: cs.LG
  • Published: February 3, 2026