[Paper] Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks

Published: February 26, 2026
Source: arXiv (2602.23330v1)

Overview

The paper introduces a multi‑agent trading framework powered by large language models (LLMs) that breaks down the investment workflow into a series of fine‑grained, well‑defined tasks instead of handing each agent a vague, high‑level instruction. Tested on Japanese equities with a rigorously controlled back‑test, the approach delivers markedly better risk‑adjusted returns and offers clearer insight into why the system makes its decisions.

Key Contributions

  • Task‑level decomposition: Formalizes investment analysis as a pipeline of specific subtasks (e.g., earnings‑statement parsing, news sentiment extraction, macro‑factor scoring) rather than a single “analyze‑and‑trade” prompt.
  • Multi‑agent LLM architecture: Assigns dedicated LLM agents to each subtask, enabling specialization and easier debugging.
  • Leakage‑controlled backtesting: Uses a realistic data‑slicing protocol that prevents future information from contaminating training or inference, ensuring results are comparable to real‑world deployment.
  • Empirical validation on Japanese markets: Demonstrates that fine‑grained agents outperform coarse‑grained baselines on a diverse dataset (price series, financial statements, news, macro indicators).
  • Portfolio‑level optimization: Shows that combining outputs from several fine‑grained systems—exploiting low correlation with the market index—further boosts performance.
  • Interpretability insight: Finds that alignment between intermediate analytical outputs and downstream decision preferences is a primary driver of success.
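The leakage-controlled backtesting idea can be made concrete with a rolling-window splitter. This is a minimal sketch, not the paper's exact protocol: the window lengths and the integer stand-ins for dates are illustrative. The invariant it enforces is the one the paper describes — every row an agent may read ends strictly before the first row it is evaluated on.

```python
def rolling_splits(dates, train_len=252, test_len=21):
    """Yield (train, test) slices over a sorted date index.
    Agents may only read rows in `train`, which end strictly
    before the first row of `test` -- no look-ahead leakage."""
    start = 0
    while start + train_len + test_len <= len(dates):
        train = dates[start : start + train_len]
        test = dates[start + train_len : start + train_len + test_len]
        yield train, test
        start += test_len  # roll forward by one test window

# Toy usage with integers standing in for trading days.
days = list(range(300))
splits = list(rolling_splits(days, train_len=100, test_len=50))
print(len(splits))  # 4 rolling windows over 300 days
```

Each test window is evaluated exactly once, and rolling forward by `test_len` keeps the evaluation periods non-overlapping, which is what makes the backtest comparable to live deployment.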

Methodology

  1. Data Ingestion – The system pulls four data streams for each stock:

    • (a) historical price/volume
    • (b) structured financial‑statement fields
    • (c) news headlines/articles
    • (d) macro‑economic indicators
  2. Task Decomposition – The overall investment decision is split into discrete prompts:

    • Fundamental Analyzer: extracts key ratios (e.g., ROE, debt‑to‑equity) from filings.
    • News Sentiment Agent: classifies recent headlines as positive/negative/neutral and quantifies sentiment strength.
    • Macro‑Factor Agent: scores the current macro environment (interest rates, GDP growth) for relevance to the sector.
    • Signal Synthesizer: combines the numeric outputs from the previous agents into a trading signal (long/short/hold) using a simple rule‑based or lightweight ML model.
  3. LLM Choice & Fine‑Tuning – Each agent uses a state‑of‑the‑art LLM (e.g., GPT‑4‑Turbo) with domain‑specific few‑shot examples to guide the expected output format. No full model fine‑tuning is required, keeping the pipeline lightweight.

  4. Backtesting Framework – A “leakage‑controlled” split ensures that any information used by an agent at time t is strictly limited to data available up to t‑1. The authors run a rolling‑window evaluation over several years of Japanese market data.

  5. Portfolio Construction – Signals from multiple independent fine‑grained systems are fed into a mean‑variance optimizer that accounts for each system’s predicted variance and its correlation with the benchmark index.
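The decomposition in steps 2–5 can be sketched as plain functions with structured intermediate outputs. Everything below is illustrative — the field names, thresholds, and the rule-based synthesizer are assumptions standing in for the paper's actual prompts and models — but it shows the shape of the pipeline: each agent emits a typed artifact, and the Signal Synthesizer combines them into a discrete action.

```python
from dataclasses import dataclass

@dataclass
class AgentOutputs:
    # Structured intermediate artifacts, one field per fine-grained agent.
    roe: float              # Fundamental Analyzer: return on equity
    debt_to_equity: float   # Fundamental Analyzer: leverage
    sentiment: float        # News Sentiment Agent: -1 (negative) .. +1 (positive)
    macro_score: float      # Macro-Factor Agent: -1 (headwind) .. +1 (tailwind)

def synthesize_signal(out: AgentOutputs) -> str:
    """Toy rule-based Signal Synthesizer: fold agent outputs
    into long/short/hold. Weights and thresholds are illustrative."""
    score = 0.0
    score += 1.0 if out.roe > 0.10 else -0.5      # reward strong profitability
    score += -0.5 if out.debt_to_equity > 2.0 else 0.0  # penalize high leverage
    score += out.sentiment                         # news tilt
    score += 0.5 * out.macro_score                 # half-weight macro context
    if score > 1.0:
        return "long"
    if score < -1.0:
        return "short"
    return "hold"

print(synthesize_signal(AgentOutputs(0.15, 0.8, 0.6, 0.4)))  # long
```

Because every agent's output is a named, human-readable field rather than free text, the synthesizer's decision can be audited field by field — the interpretability property the paper highlights.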

Results & Findings

Metric                   Fine‑grained Multi‑Agent   Coarse‑grained Baseline   Naïve Buy‑and‑Hold
Annualized Return        12.4 %                     8.1 %                     6.3 %
Sharpe Ratio             1.45                       0.87                      0.55
Max Drawdown             −9.2 %                     −14.8 %                   −18.5 %
Correlation with Index   0.31                       0.58                      1.00
  • Fine‑grained decomposition lifts the Sharpe ratio from 0.87 to 1.45 — a roughly two‑thirds improvement over the coarse‑grained setup.
  • The intermediate outputs (e.g., sentiment scores) that most closely match the final trading direction are the strongest predictors of success, confirming the importance of alignment across the pipeline.
  • Portfolio‑level optimization that mixes several fine‑grained agents reduces overall volatility and further lifts the Sharpe ratio to 1.62.
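The table's metrics follow their standard definitions, which the sketch below implements from scratch (the paper's exact risk-free rate and annualization convention are not specified here; a zero risk-free rate and 252 trading days are assumed).

```python
import math

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a per-period return series
    (risk-free rate assumed zero for simplicity)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve,
    returned as a negative fraction (e.g. -0.092 for -9.2 %)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

# Toy daily return series, purely for demonstration.
rets = [0.01, -0.004, 0.007, -0.012, 0.009]
print(round(max_drawdown(rets), 4))
```

Max drawdown is computed on the compounded equity curve rather than on raw returns, which is why a single −50 % day dominates any recovery that follows it.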

Practical Implications

  • Modular Design for Production: Developers can treat each LLM agent as a microservice with a clear API (input data → structured output), making the system easier to monitor, version, and scale.
  • Transparency & Auditing: Because each subtask produces a human‑readable artifact (e.g., a table of extracted ratios), compliance teams can trace why a trade was generated—a major hurdle for “black‑box” AI trading bots.
  • Rapid Prototyping: Fine‑tuning is unnecessary; swapping in a newer LLM or adding a new data source only requires updating the prompt examples, accelerating iteration cycles.
  • Risk Management: The low correlation of the fine‑grained signals with the market index provides a natural hedge, which can be leveraged in multi‑strategy funds or as a supplemental alpha source for existing systematic portfolios.
  • Cross‑Market Applicability: While the study focuses on Japanese equities, the same task decomposition (fundamentals, news, macro) maps cleanly to other asset classes (e.g., US equities, commodities, crypto), allowing teams to reuse the architecture with minimal changes.
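The "clear API" framing above can be pinned down with a schema check at each agent boundary. The contract below is hypothetical — the field names, label vocabulary, and the Toyota-style ticker are made up for illustration — but validating every intermediate artifact this way is what lets a malformed LLM response fail loudly instead of silently reaching the trading step.

```python
import json

# Hypothetical microservice contract for the News Sentiment Agent:
# it must return a JSON object with exactly these typed fields, so
# downstream services (and compliance auditors) can check every artifact.
REQUIRED_FIELDS = {"ticker": str, "sentiment": float, "label": str}

def validate_agent_output(payload: str) -> dict:
    """Parse and schema-check one agent's JSON response; raising
    early keeps a malformed LLM output out of the trading pipeline."""
    obj = json.loads(payload)
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(obj.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    if obj["label"] not in {"positive", "negative", "neutral"}:
        raise ValueError("label out of vocabulary")
    return obj

good = '{"ticker": "7203.T", "sentiment": 0.62, "label": "positive"}'
print(validate_agent_output(good)["label"])  # positive
```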

Limitations & Future Work

  • Prompt Sensitivity: The performance gains hinge on well‑crafted prompts; the paper notes that small wording changes can cause noticeable output drift, suggesting a need for systematic prompt engineering tools.
  • Latency Concerns: Running several LLM calls per decision window adds inference latency, which may be problematic for high‑frequency strategies. Optimizations such as model distillation or caching intermediate results were not explored.
  • Data Quality & Coverage: The study uses high‑quality Japanese financial disclosures and news feeds; applying the framework to markets with less structured reporting may require additional preprocessing pipelines.
  • Future Directions: The authors propose (1) integrating reinforcement learning to let the Synthesizer adapt its weighting of sub‑signals over time, (2) exploring hierarchical agent structures where higher‑level agents dynamically allocate tasks, and (3) benchmarking against fully end‑to‑end LLM agents trained on raw market data.

Authors

  • Kunihiro Miyazaki
  • Takanobu Kawahara
  • Stephen Roberts
  • Stefan Zohren

Paper Information

  • arXiv ID: 2602.23330v1
  • Categories: cs.AI, q-fin.TR
  • Published: February 26, 2026