[Paper] Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Published: December 4, 2025 at 11:57 AM EST
4 min read
Source: arXiv - 2512.04987v1

Overview

The Nex‑AGI team presents Nex‑N1, a new class of agentic language models that learn to act autonomously by training inside a deliberately engineered ecosystem of simulated environments. By coupling a flexible agent framework, automatic hierarchy generation, and a bridge to real‑world dynamics, the authors demonstrate that LLMs can move beyond static text generation toward robust decision‑making across a wide range of tasks.

Key Contributions

  • Unified ecosystem for environment construction – three orthogonal modules (NexAU, NexA4A, NexGAP) that together scale complexity, diversity, and fidelity of training worlds.
  • Agent hierarchy DSL – NexAU lets researchers define multi‑level agents (planner → sub‑planner → tool‑user) with a few configuration lines, enabling hierarchical reasoning without hand‑coding each level.
  • Automatic hierarchy synthesis from natural language – NexA4A parses plain‑language specifications into diverse agent trees, effectively turning textual prompts into whole families of agents.
  • Simulation‑reality gap reduction – NexGAP injects dynamic, sensor‑rich real‑world data (e.g., robotics telemetry, API logs) into the simulator, producing grounded trajectories for policy learning.
  • State‑of‑the‑art performance – Nex‑N1 outperforms leading open‑source models on SWE‑bench (software engineering) and tau2 (multi‑step tool‑use reasoning) and rivals top proprietary agents on several benchmark suites.
  • Open‑source release – the full Nex ecosystem, training pipelines, and model checkpoints are publicly available, encouraging community‑driven extensions.

Methodology

Environment Scaling

  • Complexity (NexAU): A lightweight domain‑specific language (DSL) describes agent components (memory, tools, goals). The runtime automatically assembles hierarchical agents, allowing experiments with deep planning trees or shallow reactive bots (see the hierarchy sketch after this list).
  • Diversity (NexA4A): Large‑scale language models translate natural‑language task descriptions into NexAU configurations, generating thousands of distinct agent hierarchies that span domains such as code generation, data wrangling, and home automation.
  • Fidelity (NexGAP): Real‑world interaction logs (e.g., API calls, robot joint states) are embedded into the simulator as stochastic dynamics, ensuring that policies trained in simulation encounter realistic noise and latency.
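
The paper describes NexAU's DSL only at a high level, so the snippet below is a minimal Python sketch of the kind of declarative hierarchy it implies; every class, field, and tool name here is illustrative, not the actual NexAU API.

```python
# Illustrative only: dataclasses standing in for a NexAU-style
# declarative hierarchy (planner -> sub-planner -> tool-user).
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentSpec:
    role: str                                   # e.g., "planner", "tool-user"
    tools: List[str] = field(default_factory=list)
    memory: str = "episodic"                    # memory component per the paper
    children: List["AgentSpec"] = field(default_factory=list)

# A three-level hierarchy like the one named in Key Contributions.
hierarchy = AgentSpec(
    role="planner",
    children=[AgentSpec(
        role="sub-planner",
        children=[AgentSpec(role="tool-user", tools=["shell", "python_repl"])],
    )],
)

def depth(spec: AgentSpec) -> int:
    """Planning-tree depth, the knob the curriculum later increases."""
    return 1 + max((depth(c) for c in spec.children), default=0)

assert depth(hierarchy) == 3
```

The value of such a declarative form is that the runtime, not the researcher, assembles the agents: switching from a deep planning tree to a shallow reactive bot is a configuration change, and NexA4A can emit thousands of such configurations from natural‑language task descriptions.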

Training Pipeline

  1. The generated environments produce interaction traces (state‑action‑reward sequences).
  2. Nex‑N1 is fine‑tuned using a hybrid objective (sketched in code after this list):
    • Supervised imitation on high‑quality human demonstrations.
    • Reinforcement learning from AI feedback (RLAIF, following the RLHF recipe), where a learned reward model scores task success, steering the policy toward reward‑driven decision making.
  3. Curriculum learning gradually increases hierarchy depth and environment stochasticity, stabilizing policy acquisition.
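
The paper describes this objective qualitatively, so the following is a hedged sketch of one standard way to combine the two signals: supervised cross‑entropy on demonstrations plus a REINFORCE‑style term weighted by reward‑model scores. The function name, tensor shapes, and loss weights are assumptions, not the authors' implementation.

```python
# Hedged sketch of a hybrid imitation + reward-model objective.
import torch
import torch.nn.functional as F

def hybrid_loss(policy_logits, demo_actions, sampled_logprobs, rewards,
                imitation_weight=1.0, rl_weight=0.5):
    """policy_logits:   (T, A) logits over actions on demonstration states
    demo_actions:     (T,)   expert actions for supervised imitation
    sampled_logprobs: (N,)   log-probs of actions the agent sampled itself
    rewards:          (N,)   scores assigned by the learned reward model"""
    # 1) Supervised imitation on high-quality human demonstrations.
    imitation = F.cross_entropy(policy_logits, demo_actions)
    # 2) Policy gradient against reward-model scores (RLAIF-style);
    #    subtracting the mean reward is a simple variance-reducing baseline.
    advantage = rewards - rewards.mean()
    rl = -(sampled_logprobs * advantage.detach()).mean()
    return imitation_weight * imitation + rl_weight * rl
```

Under this reading, the curriculum in step 3 amounts to scheduling which environments feed the loss: shallow, near‑deterministic hierarchies first, with depth and stochasticity ramped up as the policy stabilizes.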

Evaluation

  • Benchmarks include SWE‑bench (coding tasks with multi‑step debugging), tau2 (complex reasoning over tool‑use), and custom multi‑agent coordination tests.
  • Metrics cover correctness (e.g., pass@1), tool‑use efficiency, and runtime overhead; a minimal metric sketch follows.
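
For concreteness, pass@1 (the SWE‑bench metric reported below) reduces to the fraction of tasks whose single attempted solution passes the held‑out tests. A minimal sketch, assuming the evaluation harness has already produced per‑task pass flags:

```python
# pass@1 for single-attempt evaluation: the share of tasks whose one
# sampled solution passes its test suite.
def pass_at_1(passed: list[bool]) -> float:
    return sum(passed) / len(passed) if passed else 0.0

print(pass_at_1([True, False, True, True]))  # 0.75
```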

Results & Findings

| Benchmark | Nex‑N1 | Best Open‑Source Baseline | Leading Proprietary Agent |
| --- | --- | --- | --- |
| SWE‑bench (pass@1) | 78.4% | 62.1% | 80.2% |
| tau2 (overall score) | 84.7 | 71.5 | 86.0 |
| Multi‑agent coordination (success rate) | 91% | 68% | 93% |

  • Consistent gains across tasks that require hierarchical planning, tool invocation, and error recovery.
  • Reduced hallucination: Reward‑driven fine‑tuning reduces irrelevant or fabricated actions by ~35% compared to purely supervised LLMs.
  • Scaling behavior: Training on 10× more generated environments yields diminishing returns after ~5 M interaction steps, suggesting the ecosystem efficiently covers the task space.

Practical Implications

  • Developer assistants can now reason about code changes, run tests, and iteratively debug without explicit prompting for each step, shrinking the feedback loop in CI pipelines.
  • Tool‑augmented agents (e.g., database query bots, cloud‑resource managers) can autonomously select and orchestrate APIs, lowering integration effort for SaaS platforms.
  • Robotics and IoT: By feeding real sensor streams through NexGAP, the same training pipeline can produce agents that safely operate in noisy physical environments, accelerating prototyping of home‑automation or warehouse robots.
  • Rapid prototyping of new domains: Teams can describe a new workflow in plain English, let NexA4A spin up a hierarchy, and immediately obtain a functional agent—dramatically cutting the time from idea to MVP.
  • Open‑source community: The released ecosystem invites plug‑ins (custom simulators, domain‑specific reward models), making it a reusable backbone for any organization building autonomous LLM‑powered services.

Limitations & Future Work

  • Simulation fidelity ceiling: While NexGAP narrows the reality gap, highly non‑deterministic physical phenomena (e.g., fluid dynamics) remain under‑represented, limiting transfer to certain robotics domains.
  • Reward model bias: The RL component inherits biases from the human‑annotated reward data; occasional over‑optimization toward proxy metrics (e.g., tool‑call count) was observed.
  • Compute cost: Generating and training on millions of interaction traces still demands multi‑GPU clusters, which may be prohibitive for smaller labs.
  • Future directions the authors outline include:
    1. Integrating online learning from live user feedback.
    2. Extending NexA4A to multilingual hierarchy synthesis.
    3. Exploring hierarchical meta‑learning to let agents adapt their own hierarchy during deployment.

Authors

  • Yuxuan Cai
  • Lu Chen
  • Qiaoling Chen
  • Yuyang Ding
  • Liwen Fan
  • Wenjie Fu
  • Yufei Gao
  • Honglin Guo
  • Pinxue Guo
  • Zhenhua Han
  • Zhengfu He
  • Hanglei Hu
  • Kai Hu
  • Shengjia Hua
  • Tianyu Huai
  • Baodai Huang
  • Li Ji
  • Zhen Jiang
  • Zhikai Lei
  • Bufan Li
  • Jiahang Lin
  • Lizhi Lin
  • Jinxiu Liu
  • Shichun Liu
  • Ziming Liu
  • Yuchen Ni
  • Pengfang Qian
  • Yujiong Shen
  • Qingyun Shi
  • Wentao Shu
  • Peng Sun
  • Yiran Suo
  • Tian Tang
  • Boyu Tian
  • Guoteng Wang
  • Junzhe Wang
  • Peixin Wang
  • Zhiheng Xi
  • Hang Yan
  • Jie Yang
  • Zhixiong Yang
  • Tianchu Yao
  • Guangze Ye
  • Qianxi Yu
  • Shuo Zhang
  • Xinyue Zhang
  • Yiqi Zhang
  • Jiarong Zhao
  • Miao Zheng
  • Rui Zheng
  • Enyu Zhou
  • Jiazheng Zhou
  • Maosen Zhou
  • Yuhao Zhou
  • Tao Gui
  • Yining Zheng
  • Xinchi Chen
  • Jie Zhou
  • Siyuan Feng
  • Qin Chen
  • Liang He
  • Qi Zhang
  • Xuanjing Huang
  • Xipeng Qiu

Paper Information

  • arXiv ID: 2512.04987v1
  • Categories: cs.CL
  • Published: December 4, 2025