[Paper] PhysicsAgentABM: Physics-Guided Generative Agent-Based Modeling
Source: arXiv - 2602.06030v1
Overview
PhysicsAgentABM combines the interpretability of classical agent‑based models (ABMs) with the expressive power of large language model (LLM) agents. By moving the heavy inference work from individual entities to behaviorally coherent clusters, the authors achieve simulations that are both scalable and well calibrated, a long‑standing pain point for developers building large‑scale, time‑aligned simulations in domains such as public health, finance, and social science.
Key Contributions
- Cluster‑centric inference: Introduces symbolic “state‑specialized” agents that encode mechanistic priors, letting the system reason about groups of agents instead of every single entity.
- Neuro‑symbolic transition model: A multimodal neural network learns temporal and interaction dynamics, while symbolic priors inject domain knowledge, yielding calibrated transition distributions.
- Epistemic fusion layer: Merges neural predictions with symbolic priors in an uncertainty‑aware manner, improving confidence estimates for each cluster’s next state.
- ANCHOR clustering algorithm: An LLM‑driven, contrastive‑loss‑based method that groups agents by their cross‑contextual behavioral responses, cutting LLM API calls by 6–8×.
- Broad empirical validation: Demonstrates superior event‑time accuracy and calibration on public‑health outbreak modeling, financial market simulations, and social‑behavior studies compared to pure mechanistic, pure neural, and pure LLM baselines.
Methodology
- Define Symbolic Cluster Agents – For each high‑level state (e.g., “susceptible”, “infected”, “trader‑bullish”), a lightweight symbolic agent stores a mechanistic transition prior (think of a simple rule‑based probability table).
- Learn a Multimodal Neural Transition Model – A neural network ingests time‑series data, interaction graphs, and any available textual/contextual cues to predict how clusters evolve over the next timestep.
- Epistemic Fusion – The neural output and symbolic prior are combined using a Bayesian‑style fusion that accounts for each source’s uncertainty, producing a calibrated probability distribution for the cluster’s next state.
- Stochastic Realization at the Individual Level – Once the cluster‑level distribution is set, each individual agent samples its own transition while respecting local constraints (e.g., capacity limits, resource availability).
- ANCHOR Clustering – An LLM is queried once per clustering round to generate behavioral descriptors for agents. A contrastive loss aligns agents with similar descriptors, forming clusters without repeatedly calling the LLM for every timestep.
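The epistemic fusion step (step 3 above) can be sketched as a confidence‑weighted blend of the two sources. This is an illustrative assumption standing in for the paper's Bayesian‑style fusion; the function and variable names (`fuse`, `prior_conf`, `neural_conf`, the example state set) are hypothetical.

```python
import numpy as np

def fuse(prior_probs, prior_conf, neural_probs, neural_conf):
    """Uncertainty-aware fusion of a symbolic prior and a neural
    prediction over the same discrete next-state distribution.
    Each source is weighted by its relative confidence."""
    w_prior = prior_conf / (prior_conf + neural_conf)
    fused = w_prior * prior_probs + (1.0 - w_prior) * neural_probs
    return fused / fused.sum()  # renormalize against rounding drift

# Symbolic prior for an "infected" cluster: stay / recover / hospitalize
prior = np.array([0.70, 0.25, 0.05])
# Neural transition model's output for the same cluster
neural = np.array([0.55, 0.35, 0.10])

# The neural model is more confident here, so it dominates the blend
fused = fuse(prior, prior_conf=0.3, neural_probs=neural, neural_conf=0.7)
print(fused)  # leans toward the neural estimate
```

In a real system the confidences would themselves come from the model's epistemic uncertainty estimates rather than fixed constants.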
The overall pipeline decouples population‑level inference (expensive but done sparsely) from entity‑level variability (cheap stochastic sampling), dramatically lowering compute costs.
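The cheap entity‑level side of that decoupling (step 4) might look like the following minimal sketch; the hospital‑capacity constraint and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
STATES = ["infected", "recovered", "hospitalized"]

def realize_cluster(n_agents, cluster_dist, hospital_capacity):
    """Sample an individual next state for each agent from the
    cluster-level distribution, then enforce a local constraint:
    overflow beyond hospital capacity remains 'infected'."""
    draws = rng.choice(len(STATES), size=n_agents, p=cluster_dist)
    hospitalized = np.flatnonzero(draws == 2)
    overflow = hospitalized[hospital_capacity:]  # indices past capacity
    draws[overflow] = 0  # constraint violated: stay infected instead
    return [STATES[i] for i in draws]

# Fused cluster-level distribution, realized over 1,000 agents
next_states = realize_cluster(1000, np.array([0.595, 0.32, 0.085]),
                              hospital_capacity=50)
print(next_states.count("hospitalized"))  # at most 50
```

The expensive fusion runs once per cluster per timestep, while this per‑agent sampling is a trivial vectorized draw, which is where the compute savings come from.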
Results & Findings
| Domain | LLM‑only baseline (RAE) | Neural baseline (RAE) | PhysicsAgentABM (RAE) | Brier score (ours vs. best baseline) |
|---|---|---|---|---|
| Epidemic spread (COVID‑19) | 0.71 RAE | 0.68 RAE | 0.54 RAE | 0.12 (vs. 0.27) |
| Stock‑price shock simulation | 0.63 RAE | 0.59 RAE | 0.48 RAE | 0.15 (vs. 0.31) |
| Social protest diffusion | 0.68 RAE | 0.64 RAE | 0.51 RAE | 0.13 (vs. 0.28) |
- Event‑time accuracy (RAE = Relative Absolute Error) improves by ~15–20% over the strongest baselines.
- Calibration (lower Brier score) shows the model’s probability estimates are far more reliable, a crucial factor for decision‑making systems.
- LLM call reduction: ANCHOR cuts the number of expensive LLM API calls by 6–8× without sacrificing clustering quality.
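Both reported metrics are standard and easy to compute. The sketch below assumes one common definition of event‑time RAE (mean of |predicted − true| / true); the paper's exact normalization may differ.

```python
import numpy as np

def relative_absolute_error(pred_times, true_times):
    """Mean relative absolute error of predicted event times.
    (One common normalization; the paper's may differ.)"""
    pred = np.asarray(pred_times, dtype=float)
    true = np.asarray(true_times, dtype=float)
    return np.mean(np.abs(pred - true) / np.abs(true))

def brier_score(pred_probs, outcomes):
    """Mean squared difference between forecast probabilities and
    binary outcomes; lower means better-calibrated forecasts."""
    p = np.asarray(pred_probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    return np.mean((p - y) ** 2)

# Toy example: two event times and three probabilistic forecasts
print(relative_absolute_error([12, 30], [10, 40]))   # 0.225
print(brier_score([0.9, 0.2, 0.8], [1, 0, 1]))       # 0.03
```

A Brier score of 0.12 versus a baseline's 0.27 therefore means the model's stated probabilities track observed outcomes far more closely, not merely that its point predictions are better.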
Practical Implications
- Scalable Simulations – Developers can now run large‑scale ABMs (e.g., city‑wide disease models or market‑wide trader simulations) on modest cloud resources because the heavy LLM inference is limited to cluster updates.
- Better Decision Support – Calibrated probabilities mean risk‑aware systems (e.g., public‑health dashboards, automated trading bots) can trust the model’s confidence levels, leading to more robust alerts and actions.
- Rapid Prototyping – The ANCHOR workflow lets product teams experiment with LLM‑enhanced behavior definitions without incurring prohibitive costs, accelerating the iteration cycle for policy simulations or “what‑if” analyses.
- Hybrid AI Architecture – The neuro‑symbolic fusion pattern can be transplanted into other domains (robotics, IoT, game AI) where domain rules coexist with data‑driven dynamics.
Limitations & Future Work
- Cluster Granularity Trade‑off – Over‑aggressive clustering may hide important heterogeneity; the paper notes a need for adaptive granularity mechanisms.
- Domain Knowledge Dependency – Symbolic priors require expert input; automating prior extraction remains an open challenge.
- Real‑World Deployment Tests – Experiments are limited to benchmark datasets; field trials (e.g., live epidemic forecasting) are suggested for future validation.
- Extending to Continuous State Spaces – Current formulation focuses on discrete states; extending the fusion framework to continuous dynamics is a promising direction.
Authors
- Kavana Venkatesh
- Yinhan He
- Jundong Li
- Jiaming Cui
Paper Information
- arXiv ID: 2602.06030v1
- Categories: cs.MA, cs.LG
- Published: February 5, 2026