[Paper] Cold-Start Personalization via Training-Free Priors from Structured World Models
Source: arXiv - 2602.15012v1
Overview
Cold‑start personalization tackles the classic “no data” problem: how can a system quickly learn what a new user cares about when it has no prior interaction history? The paper “Cold‑Start Personalization via Training‑Free Priors from Structured World Models” introduces Pep (Preference Elicitation with Priors), a lightweight framework that first learns the hidden structure of preference data offline and then uses Bayesian inference at run‑time to ask the right questions and predict the full preference profile—without any costly reinforcement‑learning (RL) training.
Key Contributions
- Structured World Model: Learns a probabilistic graph of how different preference dimensions (e.g., medical symptoms, math problem‑solving styles) correlate from a dataset of complete user profiles.
- Training‑Free Online Inference: Uses the learned model as a prior and updates a simple belief state with each user answer, selecting the next most informative question on the fly.
- Modular Design: Pep works as a plug‑in front‑end for any downstream decision‑making system (e.g., recommendation engine, dialogue agent).
- Parameter Efficiency: Achieves comparable or better performance with ~10 K parameters versus billions required by RL‑based elicitation policies.
- Empirical Gains: Demonstrates 80.8 % alignment with user‑stated preferences (vs. 68.5 % for RL) while needing 3–5× fewer interaction turns across four diverse domains.
Methodology
Offline Phase – Learning the Prior
- Collect a corpus of complete preference profiles (e.g., a survey where users answer every possible question).
- Fit a factor graph / Bayesian network that captures conditional dependencies among preference dimensions (e.g., “interest in algebra” ↔ “preference for symbolic reasoning”).
- The resulting model provides a joint probability distribution $P(\mathbf{p})$ over all preference variables $\mathbf{p}$.
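The offline fit can be sketched with a toy stand-in: instead of the paper's factor graph, a smoothed empirical joint distribution over binary preference dimensions. Everything here (the binary encoding, the `fit_joint_prior` name, the Laplace smoothing) is an illustrative assumption, not the paper's implementation:

```python
import itertools
from collections import Counter

def fit_joint_prior(profiles, n_dims, alpha=1.0):
    """Empirical joint distribution P(p) over binary preference profiles,
    with Laplace smoothing so unseen profiles keep nonzero mass.
    A toy stand-in for the paper's structured world model."""
    counts = Counter(tuple(p) for p in profiles)
    support = list(itertools.product([0, 1], repeat=n_dims))
    total = len(profiles) + alpha * len(support)
    return {p: (counts[p] + alpha) / total for p in support}

# Toy corpus of complete profiles: dims 0 and 1 always agree (correlated).
profiles = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1), (1, 1, 0)]
prior = fit_joint_prior(profiles, n_dims=3)
```

The learned `prior` is exactly the $P(\mathbf{p})$ used to initialize the belief in the online phase; correlated dimensions (here 0 and 1) end up sharing probability mass, which is what later lets one answer inform the other.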
Online Phase – Bayesian Elicitation
- Initialize a belief $B_0 = P(\mathbf{p})$.
- Select Question: For each candidate question $q$, compute the expected information gain (i.e., the expected reduction in posterior entropy) under the current belief. Pick the highest‑gain question.
- Observe Answer: Update the belief via Bayes' rule: $B_{t+1}(\mathbf{p}) \propto B_t(\mathbf{p}) \cdot P(\text{answer} \mid q, \mathbf{p})$.
- Iterate until a budget of turns is exhausted or the belief confidence exceeds a threshold.
- Predict Full Profile: Use the final posterior to infer unasked dimensions, feeding them to any downstream solver (e.g., a medical diagnosis model).
The entire online loop is training‑free: it only requires evaluating simple probability updates, making it fast and interpretable.
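The online loop above can be sketched in a few dozen lines. This is an illustrative toy (binary dimensions, a simple flip-noise answer model, exhaustive enumeration of profiles), not the paper's implementation; all function names and the noise parameter are assumptions:

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a belief over profiles."""
    return -sum(b * math.log2(b) for b in belief.values() if b > 0)

def likelihood(answer, q, profile, noise=0.1):
    """P(answer | q, profile): the user reports dimension q, flipped w.p. `noise`."""
    return 1 - noise if answer == profile[q] else noise

def update(belief, q, answer, noise=0.1):
    """Bayes' rule: B_{t+1}(p) is proportional to B_t(p) * P(answer | q, p)."""
    post = {p: b * likelihood(answer, q, p, noise) for p, b in belief.items()}
    z = sum(post.values())
    return {p: v / z for p, v in post.items()}

def expected_info_gain(belief, q, noise=0.1):
    """Expected reduction in posterior entropy from asking question q."""
    h = entropy(belief)
    gain = 0.0
    for answer in (0, 1):
        p_ans = sum(b * likelihood(answer, q, p, noise) for p, b in belief.items())
        if p_ans > 0:
            gain += p_ans * (h - entropy(update(belief, q, answer, noise)))
    return gain

def elicit(prior, true_profile, budget, noise=0.1):
    """Greedy info-gain elicitation; returns (predicted profile, questions asked)."""
    belief, asked = dict(prior), []
    n = len(true_profile)
    for _ in range(budget):
        q = max((d for d in range(n) if d not in asked),
                key=lambda d: expected_info_gain(belief, d, noise))
        asked.append(q)
        belief = update(belief, q, true_profile[q], noise)  # noiseless oracle answer
    # Predict every dimension (asked or not) from the posterior marginals.
    marginals = [sum(b for p, b in belief.items() if p[d] == 1) for d in range(n)]
    return [int(m > 0.5) for m in marginals], asked

# Toy prior: dims 0 and 1 perfectly correlated, dim 2 independent of both.
prior = {(1, 1, 0): 0.25, (1, 1, 1): 0.25,
         (0, 0, 0): 0.25, (0, 0, 1): 0.25}
pred, asked = elicit(prior, true_profile=(1, 1, 1), budget=2)
```

With a budget of two questions the loop recovers all three dimensions while never asking the redundant dimension 1: once dimension 0 is answered, the correlated prior pins dimension 1 down and its information gain collapses, which is the adaptive follow-up behavior the paper reports.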
Results & Findings
| Domain | Interaction Budget (questions) | Alignment (Pep) | Alignment (RL) | Turns Saved |
|---|---|---|---|---|
| Medical reasoning | 5 | 80.8 % | 68.5 % | 3–5× |
| Mathematical problem‑solving | 4 | 78.2 % | 65.1 % | 4× |
| Social preference surveys | 6 | 81.5 % | 70.3 % | 3× |
| Commonsense reasoning | 5 | 79.9 % | 66.7 % | 5× |
- Adaptive Follow‑ups: When two users answer the same question differently, Pep changes its next question 39–62 % of the time, compared to 0–28 % for RL policies that tend to follow a static script.
- Parameter Footprint: Pep’s belief model uses ~10 K parameters, while the RL baselines require up to 8 B parameters to achieve comparable performance.
- Robustness: Even with noisy or contradictory answers, the Bayesian update gracefully de‑weights unlikely hypotheses, preserving overall alignment.
Practical Implications
- Rapid Onboarding: SaaS platforms can personalize UI layouts, feature toggles, or content recommendations after only a handful of user prompts, reducing churn caused by generic experiences.
- Low‑Compute Edge Deployment: Because Pep’s online inference is lightweight, it can run on mobile devices or embedded systems where GPU‑heavy RL policies are infeasible.
- Explainable Personalization: The structured world model makes it easy to surface why a particular question was asked (high expected information gain), improving user trust.
- Domain‑Agnostic Plug‑In: Any system that already has a historical dataset of full profiles (e.g., e‑learning platforms, health apps) can train the prior once and reuse it across millions of new users.
- Cost Savings: Fewer interaction turns translate directly into reduced support tickets, lower data‑collection costs, and faster A/B testing cycles.
Limitations & Future Work
- Dependence on Complete Profiles: Pep requires an initial corpus of fully answered preference surveys; domains lacking such data may need synthetic generation or transfer learning.
- Static Prior Assumption: The offline world model is fixed during deployment; it may become stale as user preferences evolve or new dimensions emerge.
- Scalability of the Prior: While the online inference is cheap, learning a high‑dimensional factor graph can become computationally intensive for thousands of preference variables.
- Future Directions:
  - Incremental updating of the world model with streaming user data.
  - Hybrid approaches that combine Pep’s priors with lightweight reinforcement signals for truly novel preference dimensions.
  - Exploration of richer belief representations (e.g., neural variational inference) to capture non‑linear dependencies without exploding parameter counts.
Authors
- Avinandan Bose
- Shuyue Stella Li
- Faeze Brahman
- Pang Wei Koh
- Simon Shaolei Du
- Yulia Tsvetkov
- Maryam Fazel
- Lin Xiao
- Asli Celikyilmaz
Paper Information
- arXiv ID: 2602.15012v1
- Categories: cs.CL, cs.AI, cs.LG
- Published: February 16, 2026