[Paper] Curiosity is Knowledge: Self-Consistent Learning and No-Regret Optimization with Active Inference
Source: arXiv - 2602.06029v1
Overview
The paper “Curiosity is Knowledge: Self‑Consistent Learning and No‑Regret Optimization with Active Inference” shows that the same curiosity‑driven objective that powers modern reinforcement‑learning agents can also guarantee two seemingly opposite desiderata: statistically sound learning (the posterior converges to the true model) and efficient decision‑making (cumulative regret stays bounded). By proving a single “sufficient curiosity” condition, the authors bridge the gap between active inference, Bayesian experimental design, and Bayesian optimization, offering a unified theory that is both mathematically rigorous and practically useful for developers building autonomous systems.
Key Contributions
- First theoretical guarantee that minimizing Expected Free Energy (EFE) yields both Bayesian posterior consistency and bounded cumulative regret under a single curiosity‑strength condition.
- Formal characterization of how the curiosity coefficient interacts with initial uncertainty, model identifiability, and alignment between learning and task objectives.
- Unified framework linking active inference to classical Bayesian experimental design (information‑maximizing queries) and Bayesian optimization (regret‑minimizing decisions).
- Practical design guidelines for tuning the epistemic‑pragmatic trade‑off in hybrid learning‑optimization pipelines.
- Empirical validation on real‑world benchmarks (e.g., robotic manipulation, hyper‑parameter tuning) confirming that the theory predicts performance trends.
Methodology
- Problem Setup – The authors consider a sequential decision problem in which an agent selects actions (a_t) that generate observations (o_t) from an unknown probabilistic model (\theta). The goal is twofold: (i) learn (\theta) (the learning objective) and (ii) maximize a task‑specific reward (the optimization objective).
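This setup can be made concrete with a toy instance. A minimal sketch, assuming a three‑action problem with Gaussian observations and reward r(o) = o (all illustrative choices, not the paper's benchmarks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of the setup: three actions, observations
# o_t ~ N(theta*[a_t], 1), reward r(o) = o.
true_theta = np.array([0.2, 1.0, 0.5])   # the unknown model parameter theta*

def environment(action: int) -> float:
    """Draw an observation from the unknown model p(o | a, theta*)."""
    return float(rng.normal(true_theta[action], 1.0))

# One interaction round per action: the agent picks a_t, sees o_t, and must
# balance (i) learning theta* and (ii) collecting reward r(o_t).
trajectory = [(a, environment(a)) for a in (0, 1, 2)]
```

The agent never sees `true_theta` directly; everything it learns must come through the `(a_t, o_t)` stream.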
- Active Inference Objective – At each step the agent minimizes the Expected Free Energy
  \[
  \text{EFE}(a) = -\,\beta \underbrace{\mathbb{E}_{p(o \mid a)}\big[ D_{\text{KL}}\big(p(\theta \mid o, a) \,\|\, p(\theta)\big)\big]}_{\text{Epistemic (curiosity) term}} \;-\; \underbrace{\mathbb{E}_{p(o \mid a)}\big[ r(o) \big]}_{\text{Pragmatic (reward) term}},
  \]
  where (\beta) is the curiosity coefficient and the expectations are taken under the posterior‑predictive distribution (p(o \mid a)).
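For a discrete (particle) posterior, the two EFE terms can be estimated by Monte Carlo. A minimal sketch, assuming a Gaussian observation model o ~ N(theta[a], 1) and reward r(o) = o (illustrative assumptions, not the paper's setup):

```python
import numpy as np

def gauss_pdf(o, mu):
    """N(o; mu, 1) density."""
    return np.exp(-0.5 * (o - mu) ** 2) / np.sqrt(2 * np.pi)

def efe(action, particles, weights, beta, n_samples=500, rng=None):
    """Monte-Carlo estimate of EFE(a) = -beta * E[KL(posterior || prior)] - E[r(o)],
    with the outer expectation under the posterior-predictive p(o | a)."""
    if rng is None:
        rng = np.random.default_rng(0)
    mus = particles[:, action]                    # model means for this action
    info_gain, reward = 0.0, 0.0
    for _ in range(n_samples):
        j = rng.choice(len(weights), p=weights)   # theta ~ current posterior
        o = rng.normal(mus[j], 1.0)               # o ~ p(o | a, theta)
        post = weights * gauss_pdf(o, mus)        # one-step Bayes update
        post /= post.sum()
        nz = post > 0
        info_gain += float(np.sum(post[nz] * np.log(post[nz] / weights[nz])))
        reward += o
    return -beta * info_gain / n_samples - reward / n_samples
```

Lower EFE is better: an action scores well if its observation is expected to shift the posterior (epistemic term), deliver reward (pragmatic term), or both.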
- Sufficient Curiosity Condition – They define a lower bound (\beta_{\min}) that depends on (a) the prior entropy of (\theta), (b) the minimal KL‑divergence needed to distinguish any two plausible models (identifiability), and (c) the Lipschitz constant linking reward to model parameters.
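In code, the threshold is a function of those three quantities. The closed form below is an assumption for illustration (monotone in prior entropy and the Lipschitz constant, inversely proportional to the identifiability gap), not the paper's exact bound:

```python
def beta_min_estimate(prior_entropy: float, kl_gap: float, lipschitz: float) -> float:
    """Illustrative sufficient-curiosity threshold (assumed form, not the
    paper's exact bound): more initial uncertainty or reward sensitivity
    raises it; a larger identifiability gap lowers it."""
    if kl_gap <= 0:
        raise ValueError("model must be identifiable (kl_gap > 0)")
    return lipschitz * prior_entropy / kl_gap

# e.g. beta_min_estimate(2.0, 0.5, 1.0) -> 4.0; set beta just above this value
```

The guard on `kl_gap` mirrors the identifiability assumption: if two plausible models produce indistinguishable observations, no finite curiosity level can separate them.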
- Theoretical Analysis –
  - Self‑Consistent Learning: Using martingale concentration and Bayesian consistency theorems, they prove that if (\beta \ge \beta_{\min}) the posterior (p(\theta \mid \mathcal{D}_t)) converges almost surely to the true (\theta^*).
  - No‑Regret Optimization: By casting EFE minimization as an instance of online convex optimization, they bound the cumulative regret (R_T = \sum_{t=1}^T (r^* - r_t)) by (O(\log T)) when the curiosity condition holds.
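The regret quantity itself is simple bookkeeping. A minimal helper, assuming per‑step rewards r_t and a best achievable per‑step reward r* (names are illustrative):

```python
import numpy as np

def cumulative_regret(rewards, r_star):
    """R_T = sum_{t<=T} (r_star - r_t) for every horizon T: the running
    shortfall against the best achievable per-step reward r_star."""
    return np.cumsum(r_star - np.asarray(rewards, dtype=float))
```

Under the sufficient‑curiosity condition the paper bounds the final entry of this running sum by O(log T); without it, the low‑curiosity regime in the experiments shows the sum growing much faster.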
- Algorithmic Translation – The theory is turned into a concrete algorithm: (i) maintain a particle‑based posterior, (ii) compute EFE for candidate actions, (iii) select the action with minimal EFE, (iv) adapt (\beta) online using a simple schedule that respects (\beta_{\min}).
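The four steps can be sketched end to end. A minimal sketch, assuming a two‑action Gaussian instance (illustrative, not the paper's benchmarks), with posterior variance standing in for the epistemic term and a fixed (\beta) in place of the paper's online schedule:

```python
import numpy as np

rng = np.random.default_rng(1)

true_theta = np.array([0.0, 1.0])   # unknown; o ~ N(theta*[a], 1), r(o) = o
K, T, beta = 300, 150, 0.5          # particles, horizon, curiosity coefficient

# (i) particle-based posterior over theta
particles = rng.normal(0.0, 2.0, size=(K, 2))
weights = np.full(K, 1.0 / K)

def gauss(o, mu):                   # N(o; mu, 1) density
    return np.exp(-0.5 * (o - mu) ** 2) / np.sqrt(2 * np.pi)

for _ in range(T):
    # (ii) EFE per action: posterior variance as a cheap epistemic proxy,
    # posterior-mean reward as the pragmatic term
    means = weights @ particles
    variances = weights @ (particles - means) ** 2
    scores = -beta * variances - means
    # (iii) act greedily on minimal EFE, then observe
    a = int(np.argmin(scores))
    o = rng.normal(true_theta[a], 1.0)
    # Bayes update of the particle weights
    weights = weights * gauss(o, particles[:, a])
    weights /= weights.sum()
    # (iv) the paper adapts beta online subject to beta >= beta_min;
    # it is held fixed here for brevity
```

The variance proxy keeps the loop cheap; swapping in a full Monte‑Carlo EFE estimate changes only the `scores` line.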
- Experiments – Real‑world tasks (a 6‑DoF robot arm learning contact dynamics, and automated hyper‑parameter search for deep nets) compare three regimes: low curiosity (myopic), optimal curiosity (theoretically derived (\beta)), and high curiosity (exploratory). Metrics include posterior KL‑divergence, regret, and wall‑clock time.
Results & Findings
| Setting | Posterior KL to True Model | Cumulative Regret (after 500 steps) | Observation |
|---|---|---|---|
| Low curiosity ((\beta < \beta_{\min})) | 1.84 nats | 23.7 | Agent quickly settles on a sub‑optimal policy, never resolves key uncertainties. |
| Optimal curiosity ((\beta = \beta_{\min})) | 0.12 nats | 3.1 | Learns the true dynamics and achieves near‑optimal reward; regret grows only logarithmically. |
| High curiosity ((\beta \gg \beta_{\min})) | 0.08 nats | 5.4 | Slightly better model estimate but extra exploratory actions inflate regret. |
Key take‑aways
- Sufficient curiosity is enough to guarantee both learning and low regret; excessive curiosity yields diminishing returns.
- The empirical (\beta_{\min}) matches the theoretical prediction within a 10 % margin across domains.
- The approach outperforms standard Bayesian optimization (EI, UCB) and classic RL exploration strategies (ε‑greedy, Thompson sampling) on the same tasks.
Practical Implications
- Robotics & Autonomous Systems – Engineers can embed a single EFE‑based controller that simultaneously learns system dynamics and fulfills task objectives without hand‑crafting separate exploration schedules.
- AutoML & Hyper‑parameter Tuning – The curiosity coefficient becomes a principled knob to balance model‑search (exploration) and validation performance (exploitation), reducing the need for costly trial‑and‑error.
- Edge‑AI & Resource‑Constrained Devices – Because the sufficient curiosity bound is data‑driven, devices can compute a safe (\beta) on‑the‑fly, guaranteeing that limited interaction budgets still yield statistically sound models.
- Safety‑Critical Applications – The no‑regret guarantee provides a formal safety envelope: even while exploring, the cumulative performance loss is provably bounded, a valuable property for medical decision support or finance.
In short, developers now have a theoretically backed recipe: set (\beta) just above the calculated (\beta_{\min}), run the EFE minimizer, and enjoy both reliable learning and competitive performance.
Limitations & Future Work
- Assumptions on Identifiability – The guarantee requires that the true model be distinguishable from alternatives via the observation channel; highly noisy or partially observable settings may violate this.
- Computational Overhead – Exact EFE evaluation scales with the number of posterior particles and candidate actions; approximate methods (e.g., variational EFE) are needed for very high‑dimensional action spaces.
- Static Curiosity Coefficient – While the paper proposes an online schedule, the analysis assumes a fixed (\beta). Extending the theory to adaptive curiosity that reacts to real‑time uncertainty could further improve efficiency.
- Broader Benchmarks – Experiments focus on robotics and hyper‑parameter tuning; applying the framework to large‑scale recommendation systems or multi‑agent environments remains an open avenue.
Future research directions include relaxing the identifiability requirement via hierarchical priors, integrating amortized inference for faster EFE computation, and exploring multi‑objective extensions where several pragmatic rewards compete.
Authors
- Yingke Li
- Anjali Parashar
- Enlu Zhou
- Chuchu Fan
Paper Information
- arXiv ID: 2602.06029v1
- Categories: cs.LG
- Published: February 5, 2026