[Paper] Think like a Scientist: Physics-guided LLM Agent for Equation Discovery

Published: February 12, 2026 at 01:49 PM EST
4 min read
Source: arXiv - 2602.12259v1

Overview

The paper presents KeplerAgent, a physics‑guided AI agent that mimics how scientists discover equations: first uncovering hidden physical properties (e.g., symmetries, conservation laws) and then using that insight to steer symbolic regression toward the correct formula. By coupling large language models (LLMs) with domain‑specific tools, the authors achieve markedly better equation‑discovery performance—especially when data are noisy—than pure‑LLM or classic regression approaches.

Key Contributions

  • Agentic reasoning pipeline that separates structure inference (symmetry, dimensional analysis, invariants) from symbolic regression.
  • Integration of LLMs with physics‑based toolkits (e.g., dimensional analysis libraries, invariance detectors) to generate priors for downstream regression engines.
  • Dynamic configuration of symbolic regression back‑ends (PySINDy, PySR) – the agent automatically selects function libraries and imposes structural constraints based on the inferred physics.
  • Comprehensive benchmark suite covering classic mechanics, thermodynamics, and electromagnetism problems, showing large gains in symbolic accuracy and noise robustness.
  • Open‑source implementation that demonstrates how to orchestrate LLM calls, tool execution, and regression in a reproducible workflow.
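The third contribution, dynamic back-end configuration, can be illustrated with a small sketch. The property names and config schema below are hypothetical (the paper's actual interface to PySINDy/PySR is not reproduced here); the point is the mapping from an inferred physical property to a restricted function library:

```python
# Hypothetical sketch: translating inferred physical properties into a
# symbolic-regression configuration. Property names and the config
# schema are illustrative, not the paper's actual API.

def build_sr_config(properties):
    """Map a set of inferred physics properties to a candidate function
    library for a regression back-end such as PySINDy or PySR."""
    library = {
        "polynomial_degrees": [1, 2, 3, 4],
        "unary_ops": ["sin", "cos", "exp", "log"],
    }

    if "even_symmetry" in properties:
        # f(-x) = f(x): drop odd powers and odd functions like sin.
        library["polynomial_degrees"] = [
            d for d in library["polynomial_degrees"] if d % 2 == 0
        ]
        library["unary_ops"] = [op for op in library["unary_ops"] if op != "sin"]

    if "periodic" in properties:
        # Periodic dynamics: keep trig terms, drop unbounded exp/log.
        library["unary_ops"] = [
            op for op in library["unary_ops"] if op in ("sin", "cos")
        ]

    return library

config = build_sr_config({"even_symmetry"})
print(config["polynomial_degrees"])  # [2, 4]
```

Pruning the library this way is what shrinks the regression search space, which the paper credits for both the accuracy and efficiency gains reported below.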

Methodology

  1. Problem Setup – Given a dataset of input variables X and observed outputs y, the goal is to recover an interpretable symbolic expression f(X).
  2. Scientific Reasoning Loop
    • LLM Prompting – The LLM is asked to hypothesize physical properties (e.g., “Is the system invariant under rotation?”).
    • Physics‑Based Tools – Specialized modules (dimensional analysis, symmetry detectors, conserved‑quantity calculators) verify or refine these hypotheses, producing concrete constraints such as “the equation must be homogeneous of degree 2 in length”.
    • Constraint Synthesis – The agent translates the constraints into a configuration for a symbolic regression engine: selecting candidate functions (e.g., sin, cos, polynomial terms) and adding algebraic restrictions (e.g., no odd powers).
  3. Symbolic Regression – The configured engine (PySINDy or PySR) searches the constrained space for the best‑fitting symbolic model, using standard sparsity‑promoting or evolutionary strategies.
  4. Iterative Refinement – If the resulting expression fails validation (e.g., violates a discovered invariant), the loop repeats, allowing the LLM to propose alternative hypotheses.
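The physics-based tools in step 2 can be surprisingly lightweight. A dimensional-analysis check, for instance, reduces to bookkeeping over unit exponents; the sketch below (a toy stand-in, not the paper's implementation, with an illustrative unit table) verifies that every term of a candidate equation carries the same dimension:

```python
# Toy dimensional-analysis tool: each quantity's units are exponents
# over (length, time, mass); a candidate equation is homogeneous only
# if all of its terms share one dimension. The unit table is illustrative.

DIM = {
    "x": (1, 0, 0),   # position: L
    "v": (1, -1, 0),  # velocity: L T^-1
    "t": (0, 1, 0),   # time: T
    "m": (0, 0, 1),   # mass: M
}

def term_dimension(term):
    """Dimension of a product term given as {variable: power}."""
    dim = (0, 0, 0)
    for var, power in term.items():
        dim = tuple(d + e * power for d, e in zip(dim, DIM[var]))
    return dim

def is_homogeneous(terms):
    """All terms in a candidate sum must share one dimension."""
    return len({term_dimension(t) for t in terms}) == 1

# x = v*t is homogeneous (both sides are lengths) ...
assert is_homogeneous([{"x": 1}, {"v": 1, "t": 1}])
# ... but x = v + t mixes lengths with velocities and times.
assert not is_homogeneous([{"x": 1}, {"v": 1}, {"t": 1}])
```

A check like this is what lets the agent emit hard constraints such as "the equation must be homogeneous of degree 2 in length" rather than relying on the LLM's unverified guess.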

The entire pipeline is orchestrated by a lightweight “agent” that tracks state, decides when to call tools, and aggregates evidence before committing to a final equation.
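The orchestration just described can be sketched as a short loop. Every function here is a hypothetical stub (standing in for the LLM call, the physics toolkit, and the regression engine); only the control flow, propose, verify, constrain, regress, validate, retry, reflects the paper's pipeline:

```python
# Minimal sketch of the agent's control flow. All function bodies are
# hypothetical stubs; only the loop structure mirrors the paper.

def propose_hypothesis(data, rejected):
    """Stub for the LLM call: suggest an untried physical property."""
    for hyp in ("even_symmetry", "periodic"):
        if hyp not in rejected:
            return hyp
    return None

def verify_with_tools(hypothesis, data):
    """Stub for the physics toolkit; here, pretend only periodicity holds."""
    return hypothesis == "periodic"

def run_regression(constraint, data):
    """Stub for the configured PySINDy/PySR call."""
    return f"model constrained by {constraint}"

def validates(model, data):
    """Stub: check the fitted model against discovered invariants."""
    return True

def agent_loop(data, max_rounds=5):
    rejected = set()
    for _ in range(max_rounds):
        hyp = propose_hypothesis(data, rejected)
        if hyp is None:
            return run_regression("no constraint", data)
        if not verify_with_tools(hyp, data):
            rejected.add(hyp)  # tool refuted the LLM's hypothesis
            continue
        model = run_regression(hyp, data)
        if validates(model, data):
            return model
        rejected.add(hyp)      # invariant violated: try another hypothesis
    return None

print(agent_loop([]))  # model constrained by periodic
```

Keeping the rejected-hypothesis set in the loop state is what allows the agent to "aggregate evidence" rather than re-propose refuted ideas.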

Results & Findings

| Benchmark | Baseline (LLM‑only) | Traditional SR (PySINDy) | KeplerAgent | Noise Level (σ) |
|---|---|---|---|---|
| Simple harmonic oscillator | 62 % exact | 78 % exact | 94 % exact | 0.01 |
| Pendulum (large‑angle) | 48 % exact | 71 % exact | 88 % exact | 0.05 |
| Heat diffusion | 55 % exact | 69 % exact | 90 % exact | 0.10 |
| Maxwell‑type system | 41 % exact | 63 % exact | 85 % exact | 0.08 |
  • Symbolic accuracy (percentage of runs recovering the ground‑truth formula) improves by 15‑30 % over the strongest non‑agent baselines.
  • Noise robustness: performance degrades gracefully; KeplerAgent maintains >80 % accuracy even when Gaussian noise amplitude is doubled, whereas baselines drop below 50 %.
  • Search efficiency: By pruning the candidate space, the regression step converges 2‑3× faster, reducing compute time from several minutes to under a minute on a single CPU core.

Practical Implications

  • Accelerated scientific modeling – Engineers can feed experimental data into KeplerAgent to obtain first‑principles‑like models without hand‑crafting feature libraries.
  • Embedded diagnostics – In control systems (e.g., robotics, aerospace), the agent can continuously infer governing dynamics from sensor streams, enabling adaptive controllers that respect physical constraints.
  • Reduced data‑hunger – Because the agent leverages physics priors, it needs fewer samples to converge, which is valuable in domains where data collection is expensive (e.g., material testing, biomedical experiments).
  • Tool‑chain extensibility – The modular design lets teams plug in domain‑specific analyzers (e.g., thermodynamic potentials, quantum symmetries), making the approach reusable across disciplines.
  • Explainability for AI‑augmented products – The symbolic output is human‑readable, facilitating regulatory compliance and stakeholder trust in AI‑driven decision systems.

Limitations & Future Work

  • Dependence on LLM quality – The agent’s initial hypotheses are only as good as the underlying LLM; mis‑identified symmetries can misguide the regression step.
  • Scalability to high‑dimensional systems – Current experiments focus on ≤5 variables; extending to large‑scale PDE discovery will require more sophisticated constraint propagation.
  • Tool integration overhead – Adding new physics modules entails custom wrappers; a standardized API for “physics‑as‑a‑service” would streamline adoption.
  • Future directions include (1) training domain‑adapted LLMs that are explicitly aware of physical units, (2) incorporating Bayesian uncertainty quantification into the constraint loop, and (3) applying the framework to real‑world industrial datasets (e.g., fluid‑flow diagnostics, battery degradation modeling).

Authors

  • Jianke Yang
  • Ohm Venkatachalam
  • Mohammad Kianezhad
  • Sharvaree Vadgama
  • Rose Yu

Paper Information

  • arXiv ID: 2602.12259v1
  • Categories: cs.AI, cs.LG
  • Published: February 12, 2026
  • PDF: Download PDF
