[Paper] Interactive LLM-assisted Curriculum Learning for Multi-Task Evolutionary Policy Search

Published: February 11, 2026 at 09:21 AM EST
5 min read
Source: arXiv - 2602.10891v1

Overview

The paper introduces a novel framework that lets large language models (LLMs) act as interactive curriculum designers for multi‑task evolutionary policy search. By feeding the LLM real‑time feedback from the optimizer, the system can dynamically generate training scenarios that steadily push a robot’s policy toward better generalisation—something that previously required hand‑crafted curricula or static, offline LLM suggestions.

Key Contributions

  • Interactive LLM‑assisted curriculum generation – a loop where the LLM receives live metrics, plots, and visualisations from the evolutionary algorithm and instantly proposes new training cases.
  • Feedback‑modality study – systematic comparison of numeric‑only feedback vs. multimodal feedback (numeric + progression plots + behavior visualisations) on the LLM’s ability to craft useful curricula.
  • Empirical validation on a 2‑D robot navigation task – using genetic programming as the policy optimiser, the authors benchmark interactive curricula against static LLM‑generated and human‑expert curricula.
  • Performance parity with expert‑designed curricula – multimodal interactive feedback yields results on par with hand‑crafted curricula, demonstrating that LLMs can approximate domain expertise.
  • Open‑ended design recipe – the framework is agnostic to the underlying optimisation algorithm, suggesting easy adaptation to other embodied‑AI or evolutionary‑robotics problems.

Methodology

  1. Problem setting – Multi‑task policy search via evolutionary algorithms (genetic programming) where each “task” is a navigation scenario in a 2‑D world.
  2. Curriculum loop
    • The evolutionary optimizer runs for a short interval and produces feedback (e.g., success rate, fitness curves, trajectory snapshots).
    • This feedback is packaged and sent to a large language model (e.g., GPT‑4).
    • The LLM, prompted with a description of the current policy performance and the desired learning goal, generates new training cases (obstacle layouts, start/goal positions, difficulty parameters).
    • The new cases are fed back into the optimizer, and the cycle repeats (a minimal sketch of this loop follows the list below).
  3. Feedback modalities
    • Numeric only: raw scores and scalar metrics.
    • Numeric + plots: fitness curves, success‑rate over generations.
    • Numeric + plots + visualisations: trajectory videos or rendered snapshots of robot behaviour.
  4. Baselines
    • Static LLM curriculum: a one‑shot LLM generation before optimisation starts.
    • Expert curriculum: manually designed progression of tasks by a robotics researcher.
  5. Evaluation metrics – final success rate on a held‑out test set, learning speed (generations to reach a success threshold), and curriculum “smoothness” (how gradually difficulty increases); the second sketch below illustrates these metrics.
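
The loop in step 2 is compact enough to sketch. Below is a minimal, illustrative Python version, not the authors’ code: `Task`, `run_optimizer`, and the `llm` callable are all hypothetical stand‑ins, and the optimizer is stubbed with placeholder metrics.

```python
"""Minimal sketch of the interactive curriculum loop (step 2 above).
Task, run_optimizer, and the llm callable are hypothetical stand-ins,
not the paper's implementation."""
import json
import random
from dataclasses import dataclass


@dataclass
class Task:
    """One 2-D navigation scenario proposed as a training case."""
    start: tuple
    goal: tuple
    obstacle_density: float  # difficulty knob the LLM can turn


def run_optimizer(population, tasks, generations=20):
    """Stand-in for a short genetic-programming run on the current tasks.
    Returns the evolved population plus feedback for the LLM."""
    fitness_curve = [random.random() for _ in range(generations)]  # placeholder
    feedback = {
        "success_rate": fitness_curve[-1],
        "fitness_curve": fitness_curve,
        # In the multimodal condition, progression plots and rendered
        # trajectories would also be attached for a vision-capable model.
    }
    return population, feedback


def ask_llm_for_tasks(llm, feedback, n_tasks=5):
    """Package optimizer feedback into a prompt and parse proposed tasks."""
    prompt = (
        "You design training curricula for a 2-D navigation policy.\n"
        f"Latest optimizer feedback: {json.dumps(feedback)}\n"
        f"Reply with a JSON list of {n_tasks} tasks, each with keys "
        "start, goal, obstacle_density."
    )
    reply = llm(prompt)  # any text-in/text-out LLM client works here
    return [Task(**t) for t in json.loads(reply)]


def curriculum_loop(llm, rounds=10):
    population = []
    tasks = [Task(start=(0.0, 0.0), goal=(5.0, 5.0), obstacle_density=0.1)]
    for _ in range(rounds):
        population, feedback = run_optimizer(population, tasks)
        tasks = ask_llm_for_tasks(llm, feedback)  # LLM adapts the curriculum
    return population
```

The metrics from step 5 can be made concrete in the same spirit. The smoothness formula is not spelled out here, so the second helper assumes one plausible reading: the mean absolute jump in difficulty between consecutive rounds.

```python
def generations_to_threshold(success_by_generation, threshold=0.8):
    """Learning speed: first generation whose success rate reaches the threshold."""
    for gen, rate in enumerate(success_by_generation):
        if rate >= threshold:
            return gen
    return None  # threshold never reached


def curriculum_smoothness(difficulties):
    """Assumed smoothness proxy: mean absolute jump in difficulty between
    consecutive rounds (smaller = a more gradual ramp)."""
    jumps = [abs(b - a) for a, b in zip(difficulties, difficulties[1:])]
    return sum(jumps) / len(jumps) if jumps else 0.0
```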

Results & Findings

| Curriculum type | Test‑set success ↑ | Generations to 80% success ↓ | Qualitative notes |
| --- | --- | --- | --- |
| Expert‑designed | 92% | 45 | Smooth difficulty ramp, intuitive obstacles |
| Interactive (multimodal) | 90% | 48 | LLM quickly learns to increase obstacle density after seeing failure patterns |
| Interactive (numeric‑only) | 78% | 62 | Curriculum becomes erratic; LLM lacks visual context |
| Static LLM | 71% | 70 | No adaptation to optimizer’s actual struggles |
| No curriculum (random tasks) | 55% | 120 | Policy fails to generalise |

  • Multimodal feedback (numbers + plots + visuals) gave the LLM enough context to propose curricula that are almost as effective as those crafted by human experts.
  • Numeric‑only feedback led to noisy curricula, confirming that visual cues are crucial for the LLM to understand the shape of the problem space.
  • The interactive loop consistently outperformed the static LLM baseline, highlighting the value of online adaptation.

Practical Implications

  • Rapid prototyping of training regimes – Developers can replace time‑consuming manual curriculum design with an LLM that tailors tasks on the fly, cutting down iteration cycles for embodied‑AI projects.
  • Scalable to diverse domains – Because feedback is conveyed in natural language and generic plots, the same pattern can be applied to simulated drones, manipulators, or even non‑robotic optimisation problems (e.g., game‑level generation).
  • Lower barrier to entry – Small teams without deep domain expertise can achieve near‑expert performance by leveraging an LLM as a “curriculum consultant.”
  • Tooling opportunities – IDE‑style plugins could surface the LLM‑generated tasks directly in simulation environments (e.g., Unity, ROS Gazebo), allowing developers to inspect and approve curricula before deployment.
  • Cost‑effective training – By focusing the evolutionary search on progressively harder yet tractable tasks, compute budgets shrink, which is attractive for cloud‑based RL pipelines.

Limitations & Future Work

  • Domain specificity of prompts – The LLM still needs carefully crafted prompts and a well‑structured feedback format; a generic “plug‑and‑play” solution is not yet available.
  • Scalability to high‑dimensional tasks – The study used a simple 2‑D navigation benchmark; it remains unclear how the approach scales to 3‑D robotics or tasks with richer sensory inputs.
  • Reliance on visualisation quality – Poorly rendered trajectories can mislead the LLM; robust visual pipelines are required.
  • Potential for hallucination – The LLM may suggest impossible or unsafe scenarios; a verification layer is needed before tasks reach the optimizer (a minimal filter of this kind is sketched after this list).
  • Future directions suggested by the authors include extending the framework to other evolutionary algorithms (CMA‑ES, NEAT), testing on real‑world robots, and exploring reinforcement‑learning‑style reward shaping as an additional feedback channel.
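
As a rough illustration of such a verification layer, the filter below applies simple feasibility checks to LLM‑proposed tasks before they reach the optimizer. It reuses the hypothetical `Task` fields from the methodology sketch, and all bounds are arbitrary assumptions.

```python
# Hypothetical sanity filter for LLM-proposed tasks, in the spirit of the
# verification layer discussed above. Reuses the Task fields (start, goal,
# obstacle_density) from the earlier sketch; all bounds are illustrative.

WORLD_SIZE = 10.0  # assumed side length of the 2-D world


def is_valid(task):
    """Reject out-of-bounds, degenerate, or likely-unsolvable scenarios."""
    in_bounds = all(0.0 <= c <= WORLD_SIZE for c in (*task.start, *task.goal))
    distinct_endpoints = tuple(task.start) != tuple(task.goal)
    solvable_density = 0.0 <= task.obstacle_density <= 0.8  # leave room for a path
    return in_bounds and distinct_endpoints and solvable_density


def filter_tasks(tasks):
    """Keep only verified tasks; an empty result should trigger a re-prompt."""
    return [t for t in tasks if is_valid(t)]
```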

Authors

  • Berfin Sakallioglu
  • Giorgia Nadizar
  • Eric Medvet

Paper Information

  • arXiv ID: 2602.10891v1
  • Categories: cs.NE, cs.AI
  • Published: February 11, 2026