[Paper] SymPlex: A Structure-Aware Transformer for Symbolic PDE Solving
Source: arXiv - 2602.03816v1
Overview
The paper introduces SymPlex, a reinforcement‑learning system that automatically discovers exact symbolic formulas for solutions of partial differential equations (PDEs). It treats the search for a formula as a tree‑structured decision problem and uses a novel structure‑aware Transformer (SymFormer) to work directly in the space of mathematical expressions. The result is a human‑readable, interpretable solution, produced without ever seeing a ground‑truth answer.
Key Contributions
- SymPlex framework: Casts symbolic PDE solving as a reinforcement‑learning problem that optimizes candidate expressions using only the PDE and its boundary conditions.
- SymFormer architecture: A Transformer variant that respects the hierarchical tree structure of mathematical expressions via tree‑relative self‑attention and guarantees syntactically valid outputs through grammar‑constrained autoregressive decoding.
- Structure‑aware generation: Moves beyond the linear token sequences used by standard language models to directly model the nested, tree‑like nature of symbolic math, improving expressivity and correctness.
- Exact recovery of non‑smooth and parametric solutions: Demonstrates that the system can discover closed‑form solutions that include piecewise definitions, absolute values, and explicit parameter dependencies—cases where numerical or implicit neural solvers struggle.
- Empirical validation: Shows that SymPlex matches or exceeds prior symbolic regression baselines on a suite of benchmark PDEs, achieving 100 % exact recovery on several challenging examples.
Methodology
- Problem formulation – The goal is to find a symbolic expression $u(x)$ that satisfies a given PDE $\mathcal{L}[u]=0$ together with boundary/initial conditions. No training data (i.e., known solutions) are provided.
- Tree‑structured action space – Each candidate solution is represented as a syntax tree (operators as internal nodes, variables/constants as leaves). The RL agent incrementally builds this tree, choosing the next node based on the current partial structure.
- SymFormer encoder‑decoder
  - Encoder processes the PDE description (operators, variables, boundary terms) using a standard Transformer.
  - Decoder generates the solution tree. It uses tree‑relative self‑attention: attention scores are computed relative to the parent, sibling, and ancestor positions, preserving the hierarchical dependencies of math expressions.
  - Grammar constraints enforce that only syntactically valid tokens can be selected at each step (e.g., a binary operator must be followed by two sub‑expressions).
- Reward signal – After a full expression is generated, the system evaluates it on a set of collocation points sampled from the domain. The reward combines:
  - PDE residual loss (how well the expression satisfies the differential equation),
  - Boundary loss (how well it meets the prescribed conditions), and
  - Complexity penalty (favoring simpler formulas).
- Training loop – Policy gradient (REINFORCE) updates the decoder parameters to maximize expected reward, while the encoder is jointly fine‑tuned to better condition the decoder on the PDE context.
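The tree construction, grammar constraint, and reward steps above can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the token vocabulary `ARITY`, the helpers `valid_next_tokens`, `to_expr`, and `reward`, the weight `lam`, and the choice of reward as a negative sum of the three terms are all assumptions. The toy problem is the 1‑D equation $u'' + \pi^2 \sin(\pi x) = 0$ on $(0, 1)$ with $u(0) = u(1) = 0$, whose exact solution is $\sin(\pi x)$.

```python
import numpy as np
import sympy as sp

x = sp.Symbol("x")

# Hypothetical vocabulary: token -> arity (number of child sub-expressions).
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "pi": 0}
BUILD = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": lambda a: sp.sin(a),
    "x": lambda: x,
    "pi": lambda: sp.pi,
}

def open_slots(prefix):
    """Number of sub-expressions still missing after a prefix-order token list."""
    slots = 1  # the root expression itself
    for tok in prefix:
        slots += ARITY[tok] - 1  # a token fills one slot and opens `arity` new ones
    return slots

def valid_next_tokens(prefix, max_len):
    """Grammar mask: tokens that keep the tree completable within max_len tokens."""
    slots = open_slots(prefix)
    if slots == 0:
        return []  # the tree is already complete
    remaining = max_len - len(prefix) - 1  # budget left after the next token
    return [t for t, k in ARITY.items() if slots - 1 + k <= remaining]

def to_expr(tokens):
    """Turn a complete prefix-order token list into a SymPy expression."""
    it = iter(tokens)
    def build():
        tok = next(it)
        args = [build() for _ in range(ARITY[tok])]
        return BUILD[tok](*args)
    return build()

def reward(tokens, n_points=64, lam=0.01, seed=0):
    """Assumed reward: -(PDE residual loss + boundary loss + lam * tree size)."""
    u = to_expr(tokens)
    residual = sp.diff(u, x, 2) + sp.pi**2 * sp.sin(sp.pi * x)
    pts = np.random.default_rng(seed).uniform(0.0, 1.0, n_points)  # collocation points
    res = np.broadcast_to(sp.lambdify(x, residual, "numpy")(pts), pts.shape)
    pde_loss = float(np.mean(res**2))
    bc_loss = float(u.subs(x, 0) ** 2 + u.subs(x, 1) ** 2)
    return -(pde_loss + bc_loss + lam * len(tokens))

exact = ["sin", "mul", "pi", "x"]  # prefix form of sin(pi * x)
print(valid_next_tokens(["sin", "mul"], max_len=4))  # only leaves fit the budget
print(reward(exact), reward(["x"]))
```

In this sketch the exact solution is penalized only by its size, while the candidate `x` also pays for a nonzero residual and a violated boundary condition. A policy‑gradient learner would use such scalar rewards to weight the log‑probabilities of the sampled token sequences.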
Results & Findings
| Benchmark PDE | Exact symbolic recovery? | Notable features recovered |
|---|---|---|
| 1‑D Burgers (viscous) | ✅ | Piecewise linear shock, explicit viscosity parameter |
| 2‑D Laplace with Dirichlet BC | ✅ | Harmonic polynomial with parametric coefficients |
| Heat equation with time‑dependent BC | ✅ | Series solution with explicit time factor |
| Non‑smooth Poisson (\|x\| term) | | |
- Zero‑error solutions: For all tested equations, SymPlex produced expressions that evaluate to machine‑precision zero residual on unseen points.
- Interpretability: The recovered formulas are concise and directly usable in downstream analytical work (e.g., stability analysis).
- Comparison to baselines: Traditional symbolic regression (e.g., Eureqa, Deep Symbolic Regression) failed on non‑smooth or parametric cases, while SymPlex succeeded consistently.
- Ablation: Removing tree‑relative attention or grammar constraints caused a >30 % drop in exact recovery rates, confirming their importance.
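A check of the kind described under "Zero‑error solutions" can be reproduced for any candidate formula with a computer algebra system. The sketch below uses SymPy on an illustrative heat‑equation mode, $u = e^{-\nu k^2 t} \sin(kx)$ for $u_t = \nu u_{xx}$. This candidate, the parameter values, and the sampling ranges are assumptions chosen for illustration and are not taken from the paper's benchmarks.

```python
import numpy as np
import sympy as sp

x, t, nu, k = sp.symbols("x t nu k", positive=True)

# Hypothetical candidate formula with explicit parameters nu and k.
u = sp.exp(-nu * k**2 * t) * sp.sin(k * x)

u_t = sp.diff(u, t)
u_xx = sp.diff(u, x, 2)

# Symbolic check: the residual u_t - nu * u_xx should simplify to zero.
symbolic_residual = sp.simplify(u_t - nu * u_xx)

# Numerical check on freshly sampled points, evaluating each derivative separately.
f_t = sp.lambdify((x, t, nu, k), u_t, "numpy")
f_xx = sp.lambdify((x, t, nu, k), u_xx, "numpy")
rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 1.0, 1000)
ts = rng.uniform(0.0, 1.0, 1000)
nu_val, k_val = 0.1, 3.0
max_residual = float(
    np.max(np.abs(f_t(xs, ts, nu_val, k_val) - nu_val * f_xx(xs, ts, nu_val, k_val)))
)

print(symbolic_residual, max_residual)
```

The symbolic check confirms the formula exactly, while the numerical check shows what a residual at the level of floating‑point round‑off looks like on points that were not used during the search.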
Practical Implications
- Rapid prototyping of analytical models: Engineers can feed a PDE description and let SymPlex suggest closed‑form solutions, accelerating the design of control laws, material models, or fluid dynamics approximations.
- Explainable AI for scientific computing: Unlike black‑box neural PDE solvers that output discretized fields, SymPlex yields formulas that can be inspected, differentiated, and embedded into larger symbolic pipelines (e.g., symbolic optimization, theorem proving).
- Parameter‑sensitive design: Because the output retains explicit dependence on physical parameters, developers can perform sensitivity analysis or embed the solution directly into simulation code without re‑training.
- Integration with existing toolchains: The generated expressions are standard mathematical syntax, making them compatible with CAS (Mathematica, SymPy) and automatic code generators for high‑performance computing.
- Educational use: Students and researchers can use SymPlex as a “symbolic assistant” to verify hand‑derived solutions or explore alternative forms.
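As a rough illustration of the parameter‑sensitivity and toolchain points above, a recovered formula can be differentiated with respect to a physical parameter and exported with SymPy. The formula `u` below is a hypothetical stand‑in for a SymPlex output, not a result from the paper.

```python
import numpy as np
import sympy as sp

x, t, nu = sp.symbols("x t nu", positive=True)

# Hypothetical recovered formula with explicit dependence on the parameter nu.
u = sp.exp(-nu * t) * sp.sin(x)

# Sensitivity of the solution to nu, obtained symbolically (no re-training).
du_dnu = sp.diff(u, nu)

# Fast NumPy callables for embedding in simulation code.
u_fn = sp.lambdify((x, t, nu), u, "numpy")
sens_fn = sp.lambdify((x, t, nu), du_dnu, "numpy")

# C source for the same expression, e.g. for a compiled solver.
c_source = sp.ccode(u)

xs = np.linspace(0.0, np.pi, 5)
values = u_fn(xs, 1.0, 0.5)
sensitivity = sens_fn(xs, 1.0, 0.5)
print(du_dnu)
print(c_source)
```

Because the expression stays symbolic, changing `nu` only means calling `u_fn` with a different value.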
Limitations & Future Work
- Scalability to high‑dimensional PDEs: Current experiments are limited to 1‑D/2‑D problems; extending to 3‑D or systems with many coupled fields will require more efficient tree search and possibly hierarchical decomposition.
- Reward evaluation cost: Computing PDE residuals for complex expressions can be expensive; smarter surrogate rewards or adaptive sampling could reduce overhead.
- Grammar expressiveness: The predefined grammar restricts the operator set (e.g., no special functions like Bessel or hypergeometric). Future work could learn or expand grammars dynamically.
- Generalization across PDE families: While the encoder learns to condition on a single PDE, transferring knowledge to entirely new equation families remains an open challenge.
- Robustness to noisy or approximate boundary data: Real‑world scenarios often involve measurement noise; incorporating uncertainty handling is a promising direction.
Authors
- Yesom Park
- Annie C. Lu
- Shao‑Ching Huang
- Qiyang Hu
- Y. Sungtaek Ju
- Stanley Osher
Paper Information
- arXiv ID: 2602.03816v1
- Categories: cs.LG
- Published: February 3, 2026