[Paper] SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems
Source: arXiv - 2602.24235v1
Overview
The paper introduces SafeGen‑LLM, a new class of large language models that are explicitly trained to generate safe, constraint‑aware task plans for robots. By marrying formal safety specifications with modern LLM fine‑tuning techniques, the authors demonstrate that a language model can not only write syntactically correct plans but also respect safety rules it has never seen before—a capability that could bridge the gap between scalable AI planning and the stringent reliability demands of real‑world robotics.
Key Contributions
- Multi‑domain safety benchmark: A comprehensive PDDL3 (Planning Domain Definition Language 3) suite covering several robotic domains, each annotated with explicit safety constraints.
- Two‑stage post‑training pipeline:
  - Supervised Fine‑Tuning (SFT) on a curated dataset of constraint‑compliant plans to teach the model the syntax and semantics of planning.
  - Group Relative Policy Optimization (GRPO), a reinforcement‑learning‑style fine‑tuning that uses fine‑grained reward machines derived from formal verification to enforce safety and employs curriculum learning for complex tasks.
- Safety‑generalization capability: Demonstrated ability to satisfy novel safety properties that were absent from the training data, across both PDDL and natural‑language inputs.
- Empirical superiority: Outperforms leading proprietary baselines (e.g., GPT‑4‑based planners, classical heuristic planners) on safety metrics and overall plan quality in all benchmark domains.
Methodology
- Benchmark Construction – The authors built a diverse set of planning problems (e.g., warehouse navigation, collaborative assembly, drone delivery) expressed in PDDL3, each paired with a set of safety predicates (collision avoidance, energy limits, timing constraints).
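To make the benchmark setup concrete, safety predicates such as collision avoidance and energy limits can be modeled as checks evaluated over every state a plan visits. The sketch below is illustrative only: the `State` fields, predicate names, and the flat-list trajectory representation are assumptions, not the paper's PDDL3 encoding.

```python
# Illustrative sketch: safety predicates checked against each state of a plan.
# State fields and predicate names are hypothetical, not from the paper.
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class State:
    robot_pos: tuple       # (x, y) grid cell
    obstacle_pos: tuple    # (x, y) grid cell of a known obstacle
    battery: float         # remaining charge, percent

def no_collision(s: State) -> bool:
    """Collision-avoidance predicate: robot never shares a cell with an obstacle."""
    return s.robot_pos != s.obstacle_pos

def battery_above(limit: float) -> Callable[[State], bool]:
    """Energy-limit predicate factory: battery must stay above `limit` percent."""
    return lambda s: s.battery > limit

def plan_is_safe(trajectory: List[State], predicates: list) -> bool:
    """A plan is safe iff every predicate holds in every visited state."""
    return all(p(s) for s in trajectory for p in predicates)

trajectory = [
    State((0, 0), (2, 2), 90.0),
    State((1, 0), (2, 2), 80.0),
    State((1, 1), (2, 2), 70.0),
]
print(plan_is_safe(trajectory, [no_collision, battery_above(20.0)]))  # True
```

In PDDL3 these predicates would instead appear as trajectory constraints (e.g. `always`-style conditions) inside the problem definition; the Python form above just shows the semantics of checking them against a candidate plan.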
- Supervised Fine‑Tuning (SFT) – A large pre‑trained LLM (e.g., Llama‑2) is fine‑tuned on a dataset of safe plans. This stage teaches the model the grammar of PDDL and the typical structure of feasible robot actions.
- Reward Machine Design – For every safety predicate, a deterministic finite‑state machine (the “reward machine”) tracks whether a generated plan violates the rule, assigning a negative reward at the exact step of violation.
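The reward-machine idea can be sketched as a two-state automaton per safety rule: it stays in a "safe" state while the rule holds and transitions to an absorbing "violated" state, emitting a negative reward, at the exact step where the rule first breaks. The transition structure and reward values below are assumptions for illustration; the paper derives its machines from formal verification of each safety predicate.

```python
# Hedged sketch of a reward machine as a deterministic finite-state machine.
# Reward values and the two-state layout are illustrative assumptions.

class RewardMachine:
    """Tracks one safety rule over a plan, step by step.

    States: 'safe' (no violation yet) and 'violated' (absorbing).
    A negative reward is emitted exactly at the step where the rule breaks.
    """

    def __init__(self, predicate, violation_reward=-1.0):
        self.predicate = predicate
        self.violation_reward = violation_reward
        self.state = "safe"

    def step(self, plan_state) -> float:
        if self.state == "violated":
            return 0.0  # already penalized once; remain absorbed
        if not self.predicate(plan_state):
            self.state = "violated"
            return self.violation_reward
        return 0.0

# Example: penalize the first step whose battery level drops below 20%.
rm = RewardMachine(lambda s: s["battery"] > 20)
rewards = [rm.step(s) for s in [{"battery": 90}, {"battery": 15}, {"battery": 10}]]
print(rewards)  # [0.0, -1.0, 0.0]
```

Because the penalty lands at the violating step rather than at the end of the episode, the learning signal localizes the error, which is what makes this reward shape useful for policy optimization.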
- Group Relative Policy Optimization (GRPO) –
  - Group formation – Plans are clustered by difficulty (e.g., number of constraints).
  - Relative advantage – The policy gradient is computed relative to the average performance of the group, stabilizing learning when some tasks are intrinsically harder.
  - Curriculum learning – Training starts with simple domains and gradually introduces more constraints, letting the model bootstrap its safety reasoning.
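The group-relative advantage above can be sketched as follows: each sampled plan's reward is normalized against the mean (and spread) of its own difficulty group, so intrinsically harder groups do not dominate the gradient. The exact normalization (mean/std with a small epsilon) is an assumption here, not the paper's implementation.

```python
# Minimal sketch of a group-relative advantage, GRPO-style: normalize each
# sample's reward within its difficulty group. Normalization details assumed.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Return per-sample advantages normalized within one difficulty group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four plans sampled on the same group of tasks.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 2) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

Note that a group where every plan earns the same reward yields near-zero advantages, so the update concentrates on groups where the policy's behavior actually varies.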
- Evaluation – The final model, SafeGen‑LLM, is tested on held‑out domains and on unseen safety constraints, measuring both safety satisfaction rate (percentage of plans that never trigger a violation) and plan optimality (makespan, action count).
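The two headline metrics can be made precise with a short sketch, under the assumption that each evaluated plan is summarized by a record of whether it violated any constraint, its length, and the optimal length (the record format is hypothetical):

```python
# Sketch of the evaluation metrics; the per-plan record format is an assumption.

def safety_satisfaction_rate(results):
    """Fraction of plans that never trigger a safety violation."""
    return sum(1 for r in results if not r["violated"]) / len(results)

def mean_length_ratio(results):
    """Average plan length relative to the optimal plan, over safe plans only."""
    safe = [r for r in results if not r["violated"]]
    return sum(r["length"] / r["optimal_length"] for r in safe) / len(safe)

results = [
    {"violated": False, "length": 11, "optimal_length": 10},
    {"violated": False, "length": 10, "optimal_length": 10},
    {"violated": True,  "length": 12, "optimal_length": 10},
    {"violated": False, "length": 13, "optimal_length": 10},
]
print(safety_satisfaction_rate(results))          # 0.75
print(round(mean_length_ratio(results), 3))       # 1.133
```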
Results & Findings
| Metric | SafeGen‑LLM | GPT‑4 Planner | Classical Heuristic Planner |
|---|---|---|---|
| Safety satisfaction (seen constraints) | 96.8 % | 71.2 % | 84.5 % |
| Safety satisfaction (unseen constraints) | 89.3 % | 42.7 % | 61.4 % |
| Average plan length (steps) | 1.07 × optimal | 1.23 × optimal | 1.15 × optimal |
| Inference latency (per problem) | ~0.45 s | ~0.38 s | ~0.12 s |
Key takeaways
- Safety generalization: SafeGen‑LLM retains high safety compliance even when the safety rule is novel, confirming that the GRPO stage successfully internalizes the principle of safety rather than memorizing specific constraints.
- Competitive efficiency: While not as fast as a pure heuristic planner, the LLM‑based approach stays within sub‑second latency, making it viable for many offline or semi‑online planning pipelines.
- Robustness to input modality: The same model can parse raw natural‑language task descriptions and output correct PDDL plans, opening the door to more intuitive human‑robot interaction.
Practical Implications
- Safer autonomous fleets – Warehouse robots or delivery drones can rely on a single LLM service to generate task schedules that automatically respect newly added safety policies (e.g., a temporary no‑fly zone) without retraining the whole system.
- Rapid prototyping – Engineers can describe a new robotic task in plain English, get a safety‑checked plan instantly, and focus on low‑level control rather than hand‑crafting domain‑specific planners.
- Regulatory compliance – Formal safety constraints encoded in reward machines provide an audit trail; developers can trace exactly which rule a plan satisfies or violates, simplifying certification processes.
- Hybrid planning architectures – SafeGen‑LLM can serve as a high‑level planner that feeds safe sub‑goals to existing motion‑planning or RL controllers, combining the scalability of LLMs with the precision of low‑level controllers.
Limitations & Future Work
- Scalability to extremely large domains – The current benchmark caps at a few dozen actions per problem; scaling to hundreds of actions may increase inference time and memory footprint.
- Dependence on reward‑machine design – Crafting formal safety specifications for every new domain still requires expert input; automating this step would broaden applicability.
- Real‑world validation – Experiments are confined to simulated environments; transferring the approach to physical robots (with sensor noise, actuation delays) remains an open challenge.
- Explainability – While the model respects safety constraints, it does not provide human‑readable justifications for its decisions; future work could integrate post‑hoc explanation modules.
Bottom line: SafeGen‑LLM showcases that large language models, when guided by formal safety rewards and curriculum learning, can become trustworthy planners for safety‑critical robotics—an exciting step toward more reliable, AI‑driven automation.
Authors
- Jialiang Fan
- Weizhe Xu
- Mengyu Liu
- Oleg Sokolsky
- Insup Lee
- Fangxin Kong
Paper Information
- arXiv ID: 2602.24235v1
- Categories: cs.RO, cs.AI
- Published: February 27, 2026