[Paper] From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

Published: May 5, 2026 at 01:08 PM EDT

Source: arXiv - 2605.03986v1

Overview

The paper presents an end‑to‑end framework that automatically builds multi‑agent workflows from a high‑level user intent. By replacing the traditionally manual steps of planning, agent selection, and execution‑graph construction with a set of coordinated software modules, the authors demonstrate a more scalable way to spin up task‑specific AI applications.

Key Contributions

LLM‑driven planner that translates natural‑language intents into a structured sequence of tasks.
Two‑stage agent recommender (fast vector retriever + LLM re‑ranker) that selects the most suitable agents from local and global registries.
Dynamic call‑graph generator that assembles the selected agents into an executable workflow.
Critique agent that reviews the whole plan and can trigger revisions to improve recall and robustness.
Comprehensive empirical evaluation of embedder/re‑ranker choices, description enrichment, and the impact of the critique step, showing state‑of‑the‑art recall performance.

Methodology

Intent → Task Decomposition
- An LLM (e.g., GPT‑4) receives a user’s natural‑language goal and outputs an ordered list of atomic tasks.
Agent Retrieval
- Stage 1: A dense‑vector retriever (e.g., FAISS + sentence‑transformer embeddings) quickly pulls a shortlist of candidate agents whose metadata matches each task.
- Stage 2: A smaller LLM re‑ranks the shortlist using richer contextual cues (task description, agent capabilities, past performance).
Workflow Assembly
- The system builds a dynamic call graph that wires the chosen agents according to task dependencies, forming an executable DAG (directed acyclic graph).
Critique Loop
- A dedicated critique agent inspects the full plan and the selected agents, checks for gaps or mismatches, and can request re‑planning or alternative agents.
Execution
- The orchestrator invokes each agent in topological order, passing intermediate results downstream until the overall intent is satisfied.
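The two-stage agent retrieval step described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: a bag-of-words cosine similarity stands in for the dense-vector retriever, a simple word-overlap function stands in for the LLM re-ranker, and the registry contents and all function names (`embed`, `retrieve`, `rerank`, `recommend`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical agent registry: agent name -> free-text description.
REGISTRY = {
    "csv_cleaner": "clean csv data remove duplicate rows fix missing values",
    "translator": "translate text between languages",
    "code_writer": "generate python code from a specification",
    "summarizer": "summarize long text documents",
}

def embed(text):
    """Toy stand-in for a sentence embedder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(task, registry, k=2):
    """Stage 1: cheap similarity search returning the top-k candidate names."""
    query = embed(task)
    ranked = sorted(
        registry,
        key=lambda name: cosine(query, embed(registry[name])),
        reverse=True,
    )
    return ranked[:k]

def overlap_scorer(task, description):
    """Placeholder for the LLM re-ranker: share of task words in the description."""
    words = set(task.lower().split())
    return len(words & set(description.lower().split())) / len(words)

def rerank(task, candidates, registry, scorer=overlap_scorer):
    """Stage 2: reorder only the shortlist with a more expensive scorer."""
    return sorted(candidates, key=lambda name: scorer(task, registry[name]), reverse=True)

def recommend(task, registry=REGISTRY, k=2):
    """Return the best agent name for one task."""
    shortlist = retrieve(task, registry, k)
    return rerank(task, shortlist, registry)[0]
```

The point of the split is that the expensive scorer only ever sees `k` candidates per task, so its cost does not grow with the size of the registry.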

All components are modular, allowing developers to plug in their own LLMs, embedding models, or custom agents.
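As a rough illustration of workflow assembly and execution with pluggable agents, the sketch below runs a small DAG in topological order using Python's standard `graphlib` module. The task names, the agent callables, and the `run_workflow` helper are assumptions made for this example; the paper's orchestrator may pass results between agents differently.

```python
from graphlib import TopologicalSorter

def run_workflow(dependencies, agents, intent):
    """Run agents in dependency order, feeding each one its upstream outputs.

    dependencies: task -> set of tasks it depends on (must form a DAG)
    agents:       task -> callable(intent, upstream_results) -> result
    """
    results = {}
    # static_order() yields every task after all of its dependencies.
    for task in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(task, ())}
        results[task] = agents[task](intent, upstream)
    return results

# Hypothetical three-step workflow: fetch -> clean -> report.
dependencies = {"fetch": set(), "clean": {"fetch"}, "report": {"clean"}}

# Each agent is a plain callable, so any of them can be swapped independently.
agents = {
    "fetch": lambda intent, up: [3, 1, 1, 2],
    "clean": lambda intent, up: sorted(set(up["fetch"])),
    "report": lambda intent, up: f"{intent}: {up['clean']}",
}

results = run_workflow(dependencies, agents, "unique values")
```

If the dependency mapping contains a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which gives a cheap check that the assembled call graph really is acyclic.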

Results & Findings

Aspect	Metric	Outcome
Recall of correct agents	% of tasks matched with an appropriate agent	~15 % higher than prior baselines (e.g., single‑stage retrieval).
Scalability	Time to retrieve agents for a 100‑task workflow	Linear growth; the fast retriever keeps latency low (< 200 ms per task).
Critique impact	Recall after critique vs. before	+4–6 % absolute gain, confirming the value of a holistic review step.
Robustness	Success rate under noisy intent phrasing	Maintained > 90 % task completion, whereas baselines dropped below 70 %.
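
To make the recall metric in the table concrete, here is a minimal sketch of how it could be computed before and after a critique pass. The task names, agent names, and the resulting numbers are invented for illustration and are not the paper's data; the paper's exact definition of recall may also differ from this per-task version.

```python
def recall(selected, acceptable):
    """Share of labeled tasks whose selected agent is one of the acceptable agents."""
    if not acceptable:
        raise ValueError("need at least one labeled task")
    hits = sum(1 for task, ok in acceptable.items() if selected.get(task) in ok)
    return hits / len(acceptable)

# Hypothetical ground truth: task -> set of agents that would be appropriate.
acceptable = {
    "load data": {"csv_loader"},
    "clean data": {"csv_cleaner", "table_fixer"},
    "translate report": {"translator"},
    "write summary": {"summarizer"},
}

# Hypothetical selections before and after the critique agent requests a revision.
before_critique = {
    "load data": "csv_loader",
    "clean data": "translator",
    "translate report": "translator",
    "write summary": "code_writer",
}
after_critique = dict(before_critique, **{"clean data": "table_fixer"})

gain = recall(after_critique, acceptable) - recall(before_critique, acceptable)
```

In this toy data the critique pass fixes one of the two wrong selections, so recall rises from 0.5 to 0.75.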

The experiments also showed that enriching agent descriptions (adding example inputs/outputs) significantly improves the re‑ranker’s ability to pick the right tool.
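
One way to picture description enrichment is as a registry entry that carries example input/output pairs alongside its description, with both concatenated into the text that the embedder or re-ranker sees. The `AgentCard` structure, its field names, and the text layout below are assumptions for illustration, not the format used in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    """Hypothetical registry entry; field names are illustrative only."""
    name: str
    description: str
    examples: list = field(default_factory=list)  # list of (input, output) pairs

def enriched_text(card):
    """Build the text given to the embedder or re-ranker: description plus examples."""
    lines = [f"{card.name}: {card.description}"]
    for example_input, example_output in card.examples:
        lines.append(f"Example input: {example_input}")
        lines.append(f"Example output: {example_output}")
    return "\n".join(lines)

card = AgentCard(
    name="translator",
    description="Translates text between languages.",
    examples=[("Hello (en -> fr)", "Bonjour")],
)
text = enriched_text(card)
```

An agent with no examples falls back to its plain description, so enriched and unenriched entries can coexist in the same registry.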

Practical Implications

Rapid prototyping of AI‑powered services: Developers can describe a new workflow in plain English and obtain a ready‑to‑run multi‑agent pipeline without hand‑crafting glue code.
Marketplace integration: SaaS platforms that host a catalog of specialized agents (e.g., data cleaning, translation, code generation) can use the recommender to auto‑match client requests to the best‑fit services.
Enterprise automation: Business process automation teams can replace brittle RPA scripts with adaptive agent chains that self‑select the most capable tool for each step.
Extensibility: Because the framework is modular, teams can swap in domain‑specific LLMs or embedder models to tailor performance for niche verticals (finance, healthcare, etc.).

In short, the approach lowers the barrier to building sophisticated, composable AI systems, turning “intent → execution” into a repeatable engineering pattern.

Limitations & Future Work

Dependency on high‑quality agent metadata: The recommender’s success hinges on well‑structured, descriptive registries; sparse or noisy descriptions degrade performance.
LLM cost and latency: Using large models for planning and re‑ranking can be expensive for very large workflows; future work could explore distilled models or caching strategies.
Evaluation scope: Benchmarks focus on recall and synthetic intents; real‑world deployments with complex error handling and security constraints remain to be tested.
Dynamic adaptation: The current system assumes a static agent pool; extending it to discover or train new agents on the fly is an open research direction.

Overall, the paper lays a solid foundation for automated multi‑agent composition while highlighting practical challenges that the community can address next.

Authors

Kishan Athrey
Ramin Pishehvar
Brian Riordan
Mahesh Viswanathan

Paper Information

arXiv ID: 2605.03986v1
Categories: cs.AI
Published: May 5, 2026
