[Paper] Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations
Source: arXiv - 2601.00647v1
Overview
The paper introduces Physio‑DPO, a new alignment technique that teaches large protein language models (PLMs) to respect the underlying physics of protein folding. By directly incorporating thermodynamic stability into the training objective, the method dramatically cuts down “structural hallucinations” – sequences that look plausible in language space but would collapse into unstable shapes in reality.
Key Contributions
- Physics‑aware alignment: Extends Direct Preference Optimization (DPO) with a magnitude‑aware loss that scales updates based on the energy gap between a native fold and a physics‑perturbed hard negative.
- Hard‑negative generation pipeline: Uses fast, differentiable energy calculators (e.g., Rosetta, AlphaFold‑lite) to synthesize destabilizing variants that serve as realistic counter‑examples during training.
- Empirical superiority: Beats strong baselines (SFT, PPO, vanilla DPO) on multiple metrics—self‑consistency RMSD drops to 1.28 Å, and the proportion of designs that fold correctly rises to 92.8 %.
- Interpretability gains: Qualitative analyses reveal restored biophysical patterns such as hydrophobic core packing and coherent hydrogen‑bond networks, directly linking model outputs to known protein chemistry.
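The hard-negative pipeline can be sketched in a few lines: mutate a native sequence, score candidates with an energy estimator, and keep the most destabilizing variant. This is a minimal illustration, not the paper's implementation; the `toy_energy` function is a stand-in assumption (the authors use fast physics-based scorers such as those listed above), and all function names here are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_energy(seq: str) -> float:
    """Stand-in for a fast energy estimator: rewards hydrophobic residues.
    Purely illustrative; the paper relies on real physics-based scorers."""
    hydrophobic = set("AVILMFWC")
    return -sum(1.0 for aa in seq if aa in hydrophobic)

def mutate(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Apply n random point mutations to a sequence."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_mutations):
        chars[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != chars[pos]])
    return "".join(chars)

def hardest_negative(native: str, n_candidates: int = 16,
                     n_mutations: int = 2, seed: int = 0):
    """Generate perturbed variants and keep the most destabilizing one,
    i.e. the candidate with the largest energy increase over the native."""
    rng = random.Random(seed)
    e_native = toy_energy(native)
    candidates = [mutate(native, n_mutations, rng) for _ in range(n_candidates)]
    negative = max(candidates, key=lambda s: toy_energy(s) - e_native)
    return negative, toy_energy(negative) - e_native
```

The returned energy gap ΔE is exactly the quantity the magnitude-aware loss later uses to weight gradient updates.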
Methodology
- Base PLM – Starts from a pretrained protein language model (e.g., ESM‑2) that already captures sequence‑level statistics.
- Energy‑based hard negatives – For each training sequence, the authors generate a perturbed version by applying small random mutations and then scoring both the original and perturbed sequences with a fast energy estimator. The perturbed sequence that yields the largest energy increase (i.e., most unstable) becomes the hard negative.
- Magnitude‑aware DPO loss – Traditional DPO treats preference as a binary label (preferred vs. not). Physio‑DPO augments this by weighting the loss with the energy gap ΔE = E(negative) – E(native). Larger gaps produce stronger gradient signals, encouraging the model to push unstable sequences farther away in probability space.
- Training loop – The model is fine‑tuned with a standard Adam optimizer. The loss combines the usual cross‑entropy term (to retain language fluency) with the physics‑aware DPO term, balancing linguistic plausibility against thermodynamic realism.
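The steps above can be condensed into a scalar sketch of the objective. The DPO log-sigmoid term and the cross-entropy mixing follow the description in the bullets; the exact weighting scheme (multiplying the loss by a clipped ΔE) and the parameter names are assumptions for illustration, not the authors' published formulation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def physio_dpo_loss(logp_native: float, logp_negative: float,
                    ref_logp_native: float, ref_logp_negative: float,
                    delta_e: float, beta: float = 0.1) -> float:
    """Magnitude-aware DPO term: the standard DPO log-sigmoid loss,
    scaled by the energy gap delta_e = E(negative) - E(native).
    Larger gaps (more unstable negatives) yield stronger gradients."""
    margin = ((logp_native - ref_logp_native)
              - (logp_negative - ref_logp_negative))
    weight = max(delta_e, 0.0)  # assumed weighting scheme
    return -weight * math.log(sigmoid(beta * margin))

def total_loss(ce_loss: float, dpo_loss: float, lam: float = 1.0) -> float:
    """Combined objective: cross-entropy keeps sequence fluency, the
    physics-aware DPO term enforces thermodynamic realism (lam balances
    the two; its value here is a placeholder)."""
    return ce_loss + lam * dpo_loss
```

With fixed log-probabilities, doubling ΔE doubles the penalty on the same preference margin, which is the mechanism that pushes strongly destabilized sequences farther down in probability.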
Results & Findings
| Metric | SFT | PPO | DPO (vanilla) | Physio‑DPO |
|---|---|---|---|---|
| Self‑consistency RMSD (Å) | 2.34 | 2.01 | 1.71 | 1.28 |
| Foldability (percent of designs with high pLDDT) | 78.4 % | 81.2 % | 86.5 % | 92.8 % |
| Average ΔE (kcal/mol) improvement | – | – | +1.9 | +3.6 |
- Reduced hallucinations: The RMSD drop indicates that generated sequences now adopt conformations much closer to the intended target structures.
- Higher stability: The increase in ΔE shows that the model learns to assign lower probabilities to energetically unfavorable variants.
- Biophysical fidelity: Visual inspection of top‑ranked designs reveals well‑packed hydrophobic cores and realistic hydrogen‑bond patterns that were missing in baseline outputs.
Practical Implications
- More reliable generative design pipelines – Engineers can feed Physio‑DPO‑tuned models directly into downstream structure prediction tools (AlphaFold, RoseTTAFold) with far fewer wasted candidates, cutting compute costs.
- Accelerated therapeutic protein engineering – By ensuring thermodynamic viability early, the method speeds up the iteration loop for antibody affinity maturation, enzyme redesign, and de novo scaffold creation.
- Integration with existing tooling – The magnitude‑aware loss is a drop‑in replacement for standard DPO in libraries like 🤗 Transformers, meaning teams can adopt it without rewriting data pipelines.
- Safety & interpretability – Aligning model outputs with physical laws reduces the risk of generating toxic or aggregation‑prone sequences, a key concern for biotech deployments.
Limitations & Future Work
- Energy estimator fidelity – The current pipeline relies on fast approximations; higher‑accuracy physics engines could further improve alignment but at a higher computational cost.
- Scalability to ultra‑large PLMs – Experiments were performed on models up to ~1 B parameters; extending to >10 B‑parameter PLMs may require gradient‑checkpointing or distributed training tricks.
- Generalization to non‑globular proteins – The study focuses on soluble, globular folds. Membrane proteins, intrinsically disordered regions, and multi‑domain assemblies remain challenging.
- Future directions – The authors suggest coupling Physio‑DPO with reinforcement learning for multi‑objective optimization (e.g., activity + stability) and exploring curriculum learning where the hardness of negatives gradually increases.
Authors
- QiWei Meng
Paper Information
- arXiv ID: 2601.00647v1
- Categories: cs.CL, cs.CE, q-bio.QM
- Published: January 2, 2026