[Paper] CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
Source: arXiv - 2602.24142v1
Overview
The paper introduces CoME (Channel‑of‑Mobile‑Experts), a new architecture for mobile AI agents that follows user instructions by chaining together several reasoning capabilities: summarizing the screen, planning subtasks, deciding actions, and executing functions. By structuring these capabilities as separate "experts" that are activated only when needed, CoME gains modular upgradability without sacrificing tight integration, a trade-off that has been a long-standing bottleneck for current mobile agents.
Key Contributions
- Channel‑of‑Mobile‑Experts (CoME) architecture: four dedicated experts (Screen‑Summarizer, Planner, Action‑Decider, Function‑Executor) that are invoked via an output‑oriented activation mechanism.
- Progressive training pipeline:
  - Expert‑FT – fine‑tunes each expert independently, allowing targeted capability upgrades.
  - Router‑FT – learns to route the conversation to the right expert at each reasoning stage.
  - CoT‑FT – fine‑tunes the whole chain as a "chain‑of‑thought", encouraging smooth hand‑offs between experts.
- InfoGain‑Driven DPO (Info‑DPO): a reinforcement‑style fine‑tuning step that scores intermediate steps by their information gain, reducing error propagation and nudging the agent toward more informative reasoning traces.
- Empirical validation: CoME outperforms dense (single‑model) mobile agents and existing mixture‑of‑experts (MoE) baselines on two mobile‑agent benchmark suites, AITZ (Android in the Zoo) and AMEX (Android Multi‑annotation EXpo).
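The output‑oriented activation mechanism named above can be sketched as follows. This is a toy illustration, not the paper's implementation: the stage markers, expert names, and canned expert outputs are all assumptions standing in for fine‑tuned language‑model experts.

```python
# Hypothetical sketch of CoME-style output-oriented activation.
# Each "expert" here is a stub returning a fixed token list that ends
# in a stage marker; the router watches the token stream and hands
# control to the next expert when a marker appears.

END_MARKERS = {
    "<END_SUMMARY>": "planner",
    "<END_PLAN>": "decider",
    "<END_ACTION>": "executor",
    "<END_EXEC>": None,  # channel finished
}

# Toy experts (illustrative stand-ins for the four fine-tuned models).
EXPERTS = {
    "summarizer": lambda ctx: ["screen", "shows", "login", "<END_SUMMARY>"],
    "planner":    lambda ctx: ["step1:", "tap", "login", "<END_PLAN>"],
    "decider":    lambda ctx: ["TAP(login_button)", "<END_ACTION>"],
    "executor":   lambda ctx: ["tap(120,340)", "<END_EXEC>"],
}

def run_channel(instruction):
    """Run the expert channel: the router inspects emitted tokens and
    activates the next expert when an end-of-stage marker is produced."""
    active, trace = "summarizer", []
    context = [instruction]
    while active is not None:
        for token in EXPERTS[active](context):
            context.append(token)
            if token in END_MARKERS:       # output-oriented switch
                trace.append(active)
                active = END_MARKERS[token]
                break
    return trace, context
```

Because switching is driven by the experts' own outputs rather than a fixed schedule, the same loop would let a future expert emit a different marker (e.g., to re‑plan) without changing the router.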
Methodology
- Modular Expert Design – The agent is split into four lightweight language models, each specialized for a distinct sub‑task of the overall instruction.
- Output‑Oriented Activation – Instead of a static pipeline, CoME watches the token stream; when a token signals the end of a stage (e.g., a summary is complete), the router switches to the next expert.
- Progressive Fine‑Tuning:
  - Expert‑FT: Each expert is fine‑tuned on a curated dataset that isolates its function (e.g., screen‑summaries from UI screenshots).
  - Router‑FT: A small classifier learns to predict the correct expert given the current dialogue context.
  - CoT‑FT: The whole system is then fine‑tuned end‑to‑end using chain‑of‑thought prompts, encouraging the experts to produce compatible intermediate outputs.
- Info‑DPO: During reinforcement‑style fine‑tuning, the system computes the information gain of each intermediate step (how much it reduces uncertainty about the final answer). Steps with higher gain receive larger rewards, steering the model away from redundant or misleading reasoning.
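The information‑gain idea behind Info‑DPO can be made concrete with a small sketch. The paper's exact formulation is not reproduced here; this assumes gain is measured as the entropy reduction over a distribution of candidate final answers before and after a reasoning step.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(dist_before, dist_after):
    """Information gain of one intermediate step: how much it reduces
    uncertainty about the final answer (assumed formulation)."""
    return entropy(dist_before) - entropy(dist_after)

# Toy example with four candidate answers: a useful step sharpens the
# distribution, a redundant step leaves uncertainty unchanged.
uniform   = [0.25, 0.25, 0.25, 0.25]   # before: 2 bits of uncertainty
sharpened = [0.85, 0.05, 0.05, 0.05]   # after a high-information step
redundant = [0.25, 0.25, 0.25, 0.25]   # after a low-information step

good_step_reward = info_gain(uniform, sharpened)   # positive
bad_step_reward  = info_gain(uniform, redundant)   # zero

def trace_score(step_gains):
    """Hypothetical trace-level score: preference pairs in Info-DPO-style
    training would favor traces accumulating more information gain."""
    return sum(step_gains)
```

Under this reading, redundant or misleading steps earn no reward, so preference optimization pushes the model toward traces whose intermediate outputs actually narrow down the final answer.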
Results & Findings
| Metric | Dense Mobile Agent | MoE Baseline | CoME (full) |
|---|---|---|---|
| Success Rate (AITZ) | 68.2 % | 71.5 % | 78.9 % |
| Task Completion (AMEX) | 62.4 % | 66.1 % | 74.3 % |
| Avg. Reasoning Steps | 12.4 | 11.8 | 9.6 |
| Info‑Gain Score (higher is better) | 0.41 | 0.45 | 0.58 |
- Higher success rates: CoME solves more tasks across both benchmarks, especially in complex multi‑step scenarios.
- Fewer steps: The router’s stage‑aware activation cuts down unnecessary back‑and‑forth, making the reasoning trace shorter and more interpretable.
- Improved robustness: Info‑DPO reduces error cascades; because training penalizes low‑information steps, later experts learn to recover when an early stage makes a mistake.
Practical Implications
- Developer‑friendly modular upgrades – Teams can improve a single capability (e.g., UI summarization) without retraining the whole agent, accelerating iteration cycles.
- Lower on‑device compute – Because each expert is lightweight and only one runs at a time, the memory footprint is smaller than that of a monolithic large language model, making CoME suitable for smartphones, wearables, and edge devices.
- Better debugging & observability – The explicit stage boundaries give developers clear logs (“summary generated”, “plan chosen”), simplifying troubleshooting of mobile assistants.
- Potential for plug‑and‑play ecosystems – Third‑party developers could ship specialized experts (e.g., for a new app’s API) that the router can invoke, fostering an extensible marketplace of mobile‑agent capabilities.
Limitations & Future Work
- Dataset bias – The training data for each expert comes from curated UI logs; performance may degrade on novel app designs or heavily customized interfaces.
- Router mis‑routing – Although Router‑FT improves stage prediction, occasional mis‑classifications still force the wrong expert to act, leading to failure cascades.
- Scalability of expert count – Adding more specialized experts (e.g., for voice input, AR overlays) could increase routing complexity; the paper leaves optimal scaling strategies for future research.
- User‑privacy considerations – On‑device fine‑tuning with personal UI data is promising but requires robust privacy‑preserving mechanisms, which are not explored in depth.
Overall, CoME demonstrates that a thoughtfully modular, stage‑aware architecture can deliver more capable and efficient mobile agents, opening a path toward truly assistant‑level AI on everyday devices.
Authors
- Yuxuan Liu
- Weikai Xu
- Kun Huang
- Changyu Chen
- Jiankun Zhao
- Pengzhi Gao
- Wei Liu
- Jian Luan
- Shuo Shang
- Bo Du
- Ji-Rong Wen
- Rui Yan
Paper Information
- arXiv ID: 2602.24142v1
- Categories: cs.CL, cs.AI
- Published: February 27, 2026