[Paper] Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Published: December 4, 2025 at 11:57 AM EST
4 min read
Source: arXiv - 2512.04987v1

Overview

The Nex‑AGI team presents Nex‑N1, a new class of agentic language models that learn to act autonomously by training inside a deliberately engineered ecosystem of simulated environments. By coupling a flexible agent framework, automatic hierarchy generation, and a bridge to real‑world dynamics, the authors demonstrate that LLMs can move beyond static text generation toward robust decision‑making across a wide range of tasks.

Key Contributions

  • Unified ecosystem for environment construction – three orthogonal modules (NexAU, NexA4A, NexGAP) that together scale complexity, diversity, and fidelity of training worlds.
  • Agent hierarchy DSL – NexAU lets researchers define multi‑level agents (planner → sub‑planner → tool‑user) with a few configuration lines, enabling hierarchical reasoning without hand‑coding each level.
  • Automatic hierarchy synthesis from natural language – NexA4A parses plain‑language specifications into diverse agent trees, effectively turning textual prompts into whole families of agents.
  • Simulation‑reality gap reduction – NexGAP injects dynamic, sensor‑rich real‑world data (e.g., robotics telemetry, API logs) into the simulator, producing grounded trajectories for policy learning.
  • State‑of‑the‑art performance – Nex‑N1 outperforms leading open‑source models on SWE‑bench (software engineering) and tau2 (multi‑step tool‑use reasoning) and rivals top proprietary agents on several benchmark suites.
  • Open‑source release – the full Nex ecosystem, training pipelines, and model checkpoints are publicly available, encouraging community‑driven extensions.

Methodology

Environment Scaling

  • Complexity (NexAU): A lightweight domain‑specific language (DSL) describes agent components (memory, tools, goals). The runtime automatically assembles hierarchical agents, allowing experiments with deep planning trees or shallow reactive bots (see the hierarchy sketch after this list).
  • Diversity (NexA4A): Large‑scale language models translate natural‑language task descriptions into NexAU configurations, generating thousands of distinct agent hierarchies that span domains such as code generation, data wrangling, and home automation.
  • Fidelity (NexGAP): Real‑world interaction logs (e.g., API calls, robot joint states) are embedded into the simulator as stochastic dynamics, ensuring that policies trained in simulation encounter realistic noise and latency.
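
The paper describes NexAU's DSL only at a high level, so the snippet below is a minimal Python sketch of the kind of declarative hierarchy it implies; every class, field, and tool name here is illustrative, not the actual NexAU API.

```python
# Illustrative only: dataclasses standing in for a NexAU-style
# declarative hierarchy (planner -> sub-planner -> tool-user).
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentSpec:
    role: str                                   # e.g., "planner", "tool-user"
    tools: List[str] = field(default_factory=list)
    memory: str = "episodic"                    # memory component per the paper
    children: List["AgentSpec"] = field(default_factory=list)

# A three-level hierarchy like the one named in Key Contributions.
hierarchy = AgentSpec(
    role="planner",
    children=[AgentSpec(
        role="sub-planner",
        children=[AgentSpec(role="tool-user", tools=["shell", "python_repl"])],
    )],
)

def depth(spec: AgentSpec) -> int:
    """Planning-tree depth, the knob the curriculum later increases."""
    return 1 + max((depth(c) for c in spec.children), default=0)

assert depth(hierarchy) == 3
```

The value of such a declarative form is that the runtime, not the researcher, assembles the agents: switching from a deep planning tree to a shallow reactive bot is a configuration change, and NexA4A can emit thousands of such configurations from natural‑language task descriptions.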

Training Pipeline

  1. The generated environments produce interaction traces (state‑action‑reward sequences).
  2. Nex‑N1 is fine‑tuned using a hybrid objective (sketched in code after this list):
    • Supervised imitation on high‑quality human demonstrations.
    • Reinforcement learning from AI feedback (RLAIF, following the RLHF recipe), where a learned reward model scores task success, steering the policy toward reward‑driven decision making.
  3. Curriculum learning gradually increases hierarchy depth and environment stochasticity, stabilizing policy acquisition.
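
The paper describes this objective qualitatively, so the following is a hedged sketch of one standard way to combine the two signals: supervised cross‑entropy on demonstrations plus a REINFORCE‑style term weighted by reward‑model scores. The function name, tensor shapes, and loss weights are assumptions, not the authors' implementation.

```python
# Hedged sketch of a hybrid imitation + reward-model objective.
import torch
import torch.nn.functional as F

def hybrid_loss(policy_logits, demo_actions, sampled_logprobs, rewards,
                imitation_weight=1.0, rl_weight=0.5):
    """policy_logits:   (T, A) logits over actions on demonstration states
    demo_actions:     (T,)   expert actions for supervised imitation
    sampled_logprobs: (N,)   log-probs of actions the agent sampled itself
    rewards:          (N,)   scores assigned by the learned reward model"""
    # 1) Supervised imitation on high-quality human demonstrations.
    imitation = F.cross_entropy(policy_logits, demo_actions)
    # 2) Policy gradient against reward-model scores (RLAIF-style);
    #    subtracting the mean reward is a simple variance-reducing baseline.
    advantage = rewards - rewards.mean()
    rl = -(sampled_logprobs * advantage.detach()).mean()
    return imitation_weight * imitation + rl_weight * rl
```

Under this reading, the curriculum in step 3 amounts to scheduling which environments feed the loss: shallow, near‑deterministic hierarchies first, with depth and stochasticity ramped up as the policy stabilizes.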

Evaluation

  • Benchmarks include SWE‑bench (coding tasks with multi‑step debugging), tau2 (complex reasoning over tool‑use), and custom multi‑agent coordination tests.
  • Metrics cover correctness (e.g., pass@1), tool‑use efficiency, and runtime overhead; a minimal metric sketch follows.
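
For concreteness, pass@1 (the SWE‑bench metric reported below) reduces to the fraction of tasks whose single attempted solution passes the held‑out tests. A minimal sketch, assuming the evaluation harness has already produced per‑task pass flags:

```python
# pass@1 for single-attempt evaluation: the share of tasks whose one
# sampled solution passes its test suite.
def pass_at_1(passed: list[bool]) -> float:
    return sum(passed) / len(passed) if passed else 0.0

print(pass_at_1([True, False, True, True]))  # 0.75
```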

Results & Findings

| Benchmark | Nex‑N1 | Best Open‑Source Baseline | Leading Proprietary Agent |
| --- | --- | --- | --- |
| SWE‑bench (pass@1) | 78.4% | 62.1% | 80.2% |
| tau2 (overall score) | 84.7 | 71.5 | 86.0 |
| Multi‑agent coordination (success rate) | 91% | 68% | 93% |

  • Consistent gains across tasks that require hierarchical planning, tool invocation, and error recovery.
  • Reduced hallucination: Reward‑driven fine‑tuning reduces irrelevant or fabricated actions by ~35% compared to purely supervised LLMs.
  • Scaling behavior: Training on 10× more generated environments yields diminishing returns after ~5 M interaction steps, suggesting the ecosystem efficiently covers the task space.

Practical Implications

  • Developer assistants can now reason about code changes, run tests, and iteratively debug without explicit prompting for each step, shrinking the feedback loop in CI pipelines.
  • Tool‑augmented agents (e.g., database query bots, cloud‑resource managers) can autonomously select and orchestrate APIs, lowering integration effort for SaaS platforms.
  • Robotics and IoT: By feeding real sensor streams through NexGAP, the same training pipeline can produce agents that safely operate in noisy physical environments, accelerating prototyping of home‑automation or warehouse robots.
  • Rapid prototyping of new domains: Teams can describe a new workflow in plain English, let NexA4A spin up a hierarchy, and immediately obtain a functional agent—dramatically cutting the time from idea to MVP.
  • Open‑source community: The released ecosystem invites plug‑ins (custom simulators, domain‑specific reward models), making it a reusable backbone for any organization building autonomous LLM‑powered services.

Limitations & Future Work

  • Simulation fidelity ceiling: While NexGAP narrows the reality gap, highly non‑deterministic physical phenomena (e.g., fluid dynamics) remain under‑represented, limiting transfer to certain robotics domains.
  • Reward model bias: The RL component inherits biases from the human‑annotated reward data; occasional over‑optimization toward proxy metrics (e.g., tool‑call count) was observed.
  • Compute cost: Generating and training on millions of interaction traces still demands multi‑GPU clusters, which may be prohibitive for smaller labs.
  • Future directions the authors outline include:
    1. Integrating online learning from live user feedback.
    2. Extending NexA4A to multilingual hierarchy synthesis.
    3. Exploring hierarchical meta‑learning to let agents adapt their own hierarchy during deployment.

Authors

  • Yuxuan Cai
  • Lu Chen
  • Qiaoling Chen
  • Yuyang Ding
  • Liwen Fan
  • Wenjie Fu
  • Yufei Gao
  • Honglin Guo
  • Pinxue Guo
  • Zhenhua Han
  • Zhengfu He
  • Hanglei Hu
  • Kai Hu
  • Shengjia Hua
  • Tianyu Huai
  • Baodai Huang
  • Li Ji
  • Zhen Jiang
  • Zhikai Lei
  • Bufan Li
  • Jiahang Lin
  • Lizhi Lin
  • Jinxiu Liu
  • Shichun Liu
  • Ziming Liu
  • Yuchen Ni
  • Pengfang Qian
  • Yujiong Shen
  • Qingyun Shi
  • Wentao Shu
  • Peng Sun
  • Yiran Suo
  • Tian Tang
  • Boyu Tian
  • Guoteng Wang
  • Junzhe Wang
  • Peixin Wang
  • Zhiheng Xi
  • Hang Yan
  • Jie Yang
  • Zhixiong Yang
  • Tianchu Yao
  • Guangze Ye
  • Qianxi Yu
  • Shuo Zhang
  • Xinyue Zhang
  • Yiqi Zhang
  • Jiarong Zhao
  • Miao Zheng
  • Rui Zheng
  • Enyu Zhou
  • Jiazheng Zhou
  • Maosen Zhou
  • Yuhao Zhou
  • Tao Gui
  • Yining Zheng
  • Xinchi Chen
  • Jie Zhou
  • Siyuan Feng
  • Qin Chen
  • Liang He
  • Qi Zhang
  • Xuanjing Huang
  • Xipeng Qiu

Paper Information

  • arXiv ID: 2512.04987v1
  • Categories: cs.CL
  • Published: December 4, 2025