[Paper] Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching
Source: arXiv - 2602.13136v1
Overview
A new paper tackles retrosynthesis—the problem of figuring out how to make a target molecule—from a fresh angle: the order in which atoms are presented to a neural network matters. By deliberately placing the atoms that form the reaction center at the front of the input sequence, the authors turn implicit chemical knowledge into a simple positional cue that modern graph transformers can exploit. The result is a template‑free system that reaches state‑of‑the‑art accuracy while needing far fewer inference steps than previous diffusion‑based models.
Key Contributions
- Positional inductive bias: Introduces a “reaction‑center‑first” atom ordering that makes the most chemically relevant substructure easy for the model to spot.
- RetroDiT backbone: A graph transformer equipped with rotary position embeddings that directly consumes the ordered atom sequence.
- Discrete flow matching: Decouples training from sampling, allowing the model to generate retrosynthetic routes in 20‑50 steps (vs. ~500 steps for earlier diffusion approaches).
- Strong empirical results: Sets new top‑1 accuracy records on USPTO‑50K (61.2 %) and USPTO‑Full (51.3 %) with predicted reaction centers; jumps to 71.1 % / 63.4 % when oracle centers are supplied.
- Efficiency over scale: Shows that a 280 K‑parameter model with the ordering trick matches the performance of a 65 M‑parameter model lacking it, highlighting the power of structural priors over brute‑force scaling.
Methodology
- Two‑stage view of a reaction – First, identify the reaction center (atoms whose bonds change); second, reconstruct the full precursor molecules.
- Atom ordering as a bias – The authors reorder the graph’s node list so that reaction‑center atoms appear at the beginning of the sequence fed to the transformer. This turns “where the chemistry happens” into a simple positional pattern.
- RetroDiT architecture – A graph transformer that processes the ordered node list, using rotary position embeddings to preserve relative order information without sacrificing permutation invariance of the rest of the graph.
- Discrete flow matching – Rather than learning a continuous diffusion process, the model learns discrete transition probabilities that map a fully noised latent graph to a valid precursor graph. Because training is decoupled from the sampling schedule, inference can step through a short, fixed number of discrete transitions (20‑50) to produce a candidate set of precursors.
- Reaction‑center prediction – A lightweight classifier predicts the reaction center from the target molecule; its output guides the ordering for the main generator.
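The reordering step described above can be sketched in a few lines. The function names and the boolean `center_mask` input are illustrative assumptions, not the paper's actual implementation; the idea is simply a permutation that moves predicted reaction-center atoms to the front.

```python
# Sketch: reorder a molecular graph so reaction-center atoms come first.
# `center_mask[i]` is True when atom i belongs to the predicted reaction
# center. All names here are illustrative, not the paper's code.

def center_first_order(num_atoms, center_mask):
    """Return a permutation placing reaction-center atoms at the front."""
    center = [i for i in range(num_atoms) if center_mask[i]]
    rest = [i for i in range(num_atoms) if not center_mask[i]]
    return center + rest

def apply_order(perm, node_feats, adj):
    """Permute node features (a list) and a dense adjacency matrix together."""
    n = len(perm)
    new_feats = [node_feats[i] for i in perm]
    new_adj = [[adj[perm[a]][perm[b]] for b in range(n)] for a in range(n)]
    return new_feats, new_adj

# Atoms 1 and 3 form the center, so they move to the front of the sequence:
perm = center_first_order(5, [False, True, False, True, False])
# perm == [1, 3, 0, 2, 4]
```

Because the bias is just a permutation of the input sequence, it composes with any downstream sequence model without architectural changes.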
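Rotary position embeddings make attention scores depend on relative positions, which is what lets the "center-first" ordering act as a soft cue. A minimal sketch of the standard RoPE rotation follows, using the conventional base of 10000; the exact configuration in RetroDiT may differ.

```python
import math

# Minimal sketch of rotary position embeddings (RoPE): consecutive dimension
# pairs of a query/key vector are rotated by a position-dependent angle, so
# dot products between rotated queries and keys depend only on their relative
# positions. The base of 10000 is the common convention, assumed here.

def rope(vec, pos, base=10000.0):
    """Rotate consecutive dimension pairs of `vec` by angles scaled by `pos`."""
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out
```

The key property: shifting both positions by the same amount leaves the query-key dot product unchanged, so only relative order within the atom sequence matters.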
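The short sampling loop of discrete flow matching can be sketched with a mask-to-data schedule, a common formulation for discrete generation. Everything below is an assumption-laden toy: `predict_target` stands in for the trained denoiser, the vocabulary is a handful of atom types, and the unmasking rate `dt / (1 - t)` is one standard choice, not necessarily the paper's.

```python
import random

# Toy sketch of discrete flow-matching sampling: every node starts as "MASK"
# and is progressively unmasked toward a model-predicted atom type over a
# small, fixed number of steps. `predict_target` is a placeholder denoiser.

def predict_target(tokens):
    # Stand-in for the trained model: a fixed distribution over atom types.
    # A real denoiser would condition on the target molecule and the ordered
    # atom sequence.
    return [{"C": 0.9, "N": 0.05, "O": 0.05} for _ in tokens]

def sample(num_nodes, num_steps=20, seed=0):
    rng = random.Random(seed)
    tokens = ["MASK"] * num_nodes          # fully noised start state
    for step in range(num_steps):
        t = step / num_steps
        dt = 1.0 / num_steps
        probs = predict_target(tokens)
        for i, tok in enumerate(tokens):
            if tok != "MASK":
                continue                    # already committed, keep it
            # Chance of unmasking node i during [t, t + dt); reaches 1.0 at
            # the final step, so no node is left masked.
            rate = dt / (1.0 - t)
            if rng.random() < rate:
                cats, weights = zip(*probs[i].items())
                tokens[i] = rng.choices(cats, weights=weights)[0]
    return tokens
```

Because the number of steps is a free inference-time knob, the same trained model can trade a few points of accuracy for speed, which is what enables the 20-50 step budget reported in the paper.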
Results & Findings
| Dataset | Setting | Top‑1 Accuracy |
|---|---|---|
| USPTO‑50K | Predicted centers | 61.2 % |
| USPTO‑Full | Predicted centers | 51.3 % |
| USPTO‑50K | Oracle (ground‑truth) centers | 71.1 % |
| USPTO‑Full | Oracle centers | 63.4 % |
- Speed: Generation requires only 20‑50 discrete flow steps, a 10×‑25× speed‑up over prior diffusion‑based retrosynthesis models that needed ~500 steps.
- Parameter efficiency: A 0.28 M‑parameter RetroDiT matches a 65 M‑parameter baseline that lacks the ordering bias, confirming that the structural prior is more valuable than sheer model size.
- Data efficiency: The approach outperforms large foundation models trained on 10 B reactions, despite using only the standard USPTO datasets (≈1 M reactions).
Practical Implications
- Faster AI‑assisted synthesis planning: Chemists can obtain candidate routes in seconds rather than minutes, enabling tighter integration into interactive design tools and automated lab workflows.
- Reduced compute costs: The discrete flow matching scheme and small model size lower GPU memory and inference time, making deployment feasible on on‑premise servers or even high‑end workstations.
- Better generalization with limited data: By encoding domain knowledge as a simple ordering, companies with proprietary reaction databases (often far smaller than public corpora) can train competitive retrosynthesis models without needing massive data collection.
- Plug‑and‑play reaction‑center predictor: The modular design lets developers swap in a more sophisticated center‑prediction model (e.g., a graph‑based classifier fine‑tuned on a specific chemistry domain) to further boost accuracy.
- Potential for downstream automation: The short, deterministic generation pipeline is well‑suited for coupling with robotic synthesis platforms that require rapid, reliable route suggestions.
Limitations & Future Work
- Reliance on accurate reaction‑center prediction: If the center classifier errs, the ordering cue can mislead the generator, degrading performance.
- Template‑free but still heuristic: although the model uses no explicit reaction templates, the discrete flow schedule is hand‑designed; fully learned flow dynamics could yield further gains.
- Scalability to exotic chemistries: The benchmarks focus on patent reactions; extending to organometallic or biocatalytic transformations may require additional domain‑specific priors.
- Integration with multi‑step planning: The paper evaluates single‑step retrosynthesis; future work could embed the method into a recursive planner that assembles multi‑step synthetic routes.
Authors
- Chenguang Wang
- Zihan Zhou
- Lei Bai
- Tianshu Yu
Paper Information
- arXiv ID: 2602.13136v1
- Categories: cs.LG
- Published: February 13, 2026