[Paper] Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
Source: arXiv - 2512.04987v1
Overview
The Nex‑AGI team presents Nex‑N1, a new class of agentic language models that learn to act autonomously by training inside a deliberately engineered ecosystem of simulated environments. By coupling a flexible agent framework, automatic hierarchy generation, and a bridge to real‑world dynamics, the authors demonstrate that LLMs can move beyond static text generation toward robust decision‑making across a wide range of tasks.
Key Contributions
- Unified ecosystem for environment construction – three orthogonal modules (NexAU, NexA4A, NexGAP) that together scale complexity, diversity, and fidelity of training worlds.
- Agent hierarchy DSL – NexAU lets researchers define multi‑level agents (planner → sub‑planner → tool‑user) with a few configuration lines, enabling hierarchical reasoning without hand‑coding each level.
- Automatic hierarchy synthesis from natural language – NexA4A parses plain‑language specifications into diverse agent trees, effectively turning textual prompts into whole families of agents.
- Simulation‑reality gap reduction – NexGAP injects dynamic, sensor‑rich real‑world data (e.g., robotics telemetry, API logs) into the simulator, producing grounded trajectories for policy learning.
- State‑of‑the‑art performance – Nex‑N1 outperforms leading open‑source models on SWE‑bench (software engineering) and tau2 (complex multi‑step reasoning) and rivals top proprietary agents on several benchmark suites.
- Open‑source release – the full Nex ecosystem, training pipelines, and model checkpoints are publicly available, encouraging community‑driven extensions.
Methodology
Environment Scaling
- Complexity (NexAU): A lightweight domain‑specific language (DSL) describes agent components (memory, tools, goals). The runtime automatically assembles hierarchical agents, allowing experiments with deep planning trees or shallow reactive bots.
- Diversity (NexA4A): Large‑scale language models translate natural‑language task descriptions into NexAU configurations, generating thousands of distinct agent hierarchies that span domains such as code generation, data wrangling, and home automation.
- Fidelity (NexGAP): Real‑world interaction logs (e.g., API calls, robot joint states) are embedded into the simulator as stochastic dynamics, ensuring that policies trained in simulation encounter realistic noise and latency.
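To make the "complexity" axis concrete, here is a minimal Python sketch of a planner → sub-planner → tool-user hierarchy. The summary does not show the real NexAU DSL, so `AgentSpec`, its fields, and the tool names below are hypothetical stand-ins for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """One node in a hypothetical agent hierarchy (not the real NexAU schema)."""
    name: str
    tools: list[str] = field(default_factory=list)
    children: list["AgentSpec"] = field(default_factory=list)


def depth(agent: AgentSpec) -> int:
    """Number of levels from this agent down to its deepest descendant."""
    return 1 + max((depth(child) for child in agent.children), default=0)


def all_tools(agent: AgentSpec) -> set[str]:
    """Every tool reachable from this agent or any of its descendants."""
    found = set(agent.tools)
    for child in agent.children:
        found |= all_tools(child)
    return found


# planner -> sub-planner -> tool-user, mirroring the three levels named above.
# The tool names are invented for this example.
tool_user = AgentSpec("tool_user", tools=["run_tests", "edit_file"])
sub_planner = AgentSpec("sub_planner", children=[tool_user])
planner = AgentSpec("planner", tools=["search_docs"], children=[sub_planner])

print(depth(planner))
print(sorted(all_tools(planner)))
```

A tree of plain records like this is enough to express both "deep planning trees" and "shallow reactive bots": the latter is simply an `AgentSpec` with no children.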
Training Pipeline
- The generated environments produce interaction traces (state‑action‑reward sequences).
- Nex‑N1 is fine‑tuned using a hybrid objective:
  - Supervised imitation on high‑quality human demonstrations.
  - Reinforcement learning from AI feedback (analogous to RLHF), where a reward model scores task success and steers the policy toward actions that complete the task.
- Curriculum learning gradually increases hierarchy depth and environment stochasticity, stabilizing policy acquisition.
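The curriculum step can be sketched as a schedule that maps training progress to a maximum hierarchy depth and a noise level. The summary does not give the actual schedule, so the linear ramp, the `Stage` record, and the default values below are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    max_depth: int      # deepest agent hierarchy sampled at this stage
    noise_scale: float  # environment stochasticity, 0.0 (none) to final_noise


def curriculum_stage(step: int, total_steps: int,
                     final_depth: int = 4, final_noise: float = 1.0) -> Stage:
    """Linearly ramp hierarchy depth and noise from easy to hard (assumed schedule)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay well-defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Depth starts at 1 (a flat, reactive agent) and grows to final_depth.
    max_depth = 1 + int(progress * (final_depth - 1))
    return Stage(max_depth=max_depth, noise_scale=progress * final_noise)


for step in (0, 50, 100):
    print(step, curriculum_stage(step, total_steps=100))
```

Starting with shallow, deterministic environments and only later exposing the policy to deep hierarchies and noisy dynamics is one common way to get the stabilizing effect described above.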
Evaluation
- Benchmarks include SWE‑bench (coding tasks with multi‑step debugging), tau2 (complex reasoning over tool‑use), and custom multi‑agent coordination tests.
- Metrics cover correctness, tool‑use efficiency, and runtime overhead.
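As a rough illustration of two of these metrics, the sketch below computes pass@1 (the fraction of tasks solved in a single attempt) and one possible tool-use efficiency measure. The summary does not define how the paper measures efficiency, so `calls_per_solved_task` and the `Episode` record are assumptions, and the episode data is made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    solved: bool     # did the single attempt pass the task's checks?
    tool_calls: int  # tool invocations used in that attempt


def pass_at_1(episodes: list[Episode]) -> float:
    """Fraction of tasks solved on the first (and only) attempt."""
    if not episodes:
        raise ValueError("no episodes to score")
    return sum(e.solved for e in episodes) / len(episodes)


def calls_per_solved_task(episodes: list[Episode]) -> float:
    """Average tool calls spent on solved tasks; lower means more efficient."""
    solved = [e for e in episodes if e.solved]
    if not solved:
        raise ValueError("no solved episodes")
    return sum(e.tool_calls for e in solved) / len(solved)


# Invented episodes, used only to exercise the two functions.
episodes = [Episode(True, 4), Episode(False, 9), Episode(True, 6), Episode(True, 2)]
print(pass_at_1(episodes))
print(calls_per_solved_task(episodes))
```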
Results & Findings
| Benchmark | Nex‑N1 | Best Open‑Source Baseline | Leading Proprietary Agent |
|---|---|---|---|
| SWE‑bench (pass@1) | 78.4% | 62.1% | 80.2% |
| tau2 (overall score) | 84.7 | 71.5 | 86.0 |
| Multi‑agent coordination (success rate) | 91% | 68% | 93% |
- Consistent gains across tasks that require hierarchical planning, tool invocation, and error recovery.
- Reduced hallucination: Reward‑driven fine‑tuning cuts irrelevant or fabricated actions by ~35% compared with purely supervised LLMs.
- Scaling behavior: With 10× more generated environments, returns diminish after ~5 M interaction steps, suggesting that the generated environments already cover much of the task space by that point.
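The margins implied by the results table can be worked out with simple subtraction. The short script below copies the three rows from the table and reports, in points, Nex‑N1's lead over the best open‑source baseline and its remaining gap to the leading proprietary agent; the `margins` helper is just a convenience written for this example.

```python
# Scores copied from the results table: (Nex-N1, best open-source, leading proprietary).
RESULTS = {
    "SWE-bench (pass@1, %)": (78.4, 62.1, 80.2),
    "tau2 (overall score)": (84.7, 71.5, 86.0),
    "Multi-agent coordination (success rate, %)": (91.0, 68.0, 93.0),
}


def margins(scores: tuple[float, float, float]) -> tuple[float, float]:
    """Return (lead over open-source baseline, gap to proprietary agent) in points."""
    nex, open_source, proprietary = scores
    return round(nex - open_source, 1), round(proprietary - nex, 1)


for benchmark, scores in RESULTS.items():
    lead, gap = margins(scores)
    print(f"{benchmark}: +{lead} vs open-source, -{gap} vs proprietary")
```

On every row the lead over the open‑source baseline is more than ten points, while the gap to the proprietary agent is two points or less.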
Practical Implications
- Developer assistants can now reason about code changes, run tests, and iteratively debug without explicit prompting for each step, shrinking the feedback loop in CI pipelines.
- Tool‑augmented agents (e.g., database query bots, cloud‑resource managers) can autonomously select and orchestrate APIs, lowering integration effort for SaaS platforms.
- Robotics and IoT: By feeding real sensor streams through NexGAP, the same training pipeline can produce agents that safely operate in noisy physical environments, accelerating prototyping of home‑automation or warehouse robots.
- Rapid prototyping of new domains: Teams can describe a new workflow in plain English, let NexA4A generate an agent hierarchy from it, and obtain a working agent right away, which shortens the path from idea to MVP.
- Open‑source community: The released ecosystem invites plug‑ins (custom simulators, domain‑specific reward models), making it a reusable backbone for any organization building autonomous LLM‑powered services.
Limitations & Future Work
- Simulation fidelity ceiling: While NexGAP narrows the reality gap, highly non‑deterministic physical phenomena (e.g., fluid dynamics) remain under‑represented, limiting transfer to certain robotics domains.
- Reward model bias: The RL component inherits biases from the human‑annotated reward data; occasional over‑optimization toward proxy metrics (e.g., tool‑call count) was observed.
- Compute cost: Generating and training on millions of interaction traces still demands multi‑GPU clusters, which may be prohibitive for smaller labs.
- Future directions the authors outline include:
  - Integrating online learning from live user feedback.
  - Extending NexA4A to multilingual hierarchy synthesis.
  - Exploring hierarchical meta‑learning to let agents adapt their own hierarchy during deployment.
Authors
- Yuxuan Cai
- Lu Chen
- Qiaoling Chen
- Yuyang Ding
- Liwen Fan
- Wenjie Fu
- Yufei Gao
- Honglin Guo
- Pinxue Guo
- Zhenhua Han
- Zhengfu He
- Hanglei Hu
- Kai Hu
- Shengjia Hua
- Tianyu Huai
- Baodai Huang
- Li Ji
- Zhen Jiang
- Zhikai Lei
- Bufan Li
- Jiahang Lin
- Lizhi Lin
- Jinxiu Liu
- Shichun Liu
- Ziming Liu
- Yuchen Ni
- Pengfang Qian
- Yujiong Shen
- Qingyun Shi
- Wentao Shu
- Peng Sun
- Yiran Suo
- Tian Tang
- Boyu Tian
- Guoteng Wang
- Junzhe Wang
- Peixin Wang
- Zhiheng Xi
- Hang Yan
- Jie Yang
- Zhixiong Yang
- Tianchu Yao
- Guangze Ye
- Qianxi Yu
- Shuo Zhang
- Xinyue Zhang
- Yiqi Zhang
- Jiarong Zhao
- Miao Zheng
- Rui Zheng
- Enyu Zhou
- Jiazheng Zhou
- Maosen Zhou
- Yuhao Zhou
- Tao Gui
- Yining Zheng
- Xinchi Chen
- Jie Zhou
- Siyuan Feng
- Qin Chen
- Liang He
- Qi Zhang
- Xuanjing Huang
- Xipeng Qiu
Paper Information
- arXiv ID: 2512.04987v1
- Categories: cs.CL
- Published: December 4, 2025