[Paper] Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

Published: March 6, 2026
Source: arXiv - 2603.06503v1

Overview

The paper introduces Beyond Rows to Reasoning (BRTR), a new agentic framework that lets large language models (LLMs) work with massive, real‑world spreadsheets in a truly interactive way. By swapping the traditional single‑shot retrieval step for an iterative “tool‑calling” loop, BRTR can fetch, reason about, and edit data across dozens of sheets while preserving the fine‑grained context that enterprise users need.

Key Contributions

  • Agentic Retrieval Loop – Replaces one‑shot retrieval with a multi‑step, planner‑driven tool‑calling cycle that can pull rows, charts, and formulas on demand.
  • Multimodal Embedding Evaluation – Benchmarks five embedding models on mixed tabular‑visual data; NVIDIA NeMo Retriever 1B emerges as the best performer.
  • State‑of‑the‑Art Benchmarks – Sets new records on three hard spreadsheet‑understanding suites (FRTR‑Bench +25 pts, SpreadsheetLLM +7 pts, FINCH +32 pts).
  • Comprehensive Human Evaluation – Over 200 hours of expert assessment confirm the system’s reliability and auditability.
  • Cost‑Efficiency Analysis – Shows that GPT‑5.2 delivers the optimal balance of accuracy and API cost for the iterative workflow.

Methodology

  1. Planner Layer – A lightweight LLM decides what information is needed next (e.g., “fetch rows 120‑150 from Sheet 3”).
  2. Retriever Module – Uses a multimodal embedding index (NeMo Retriever 1B) to locate the exact cells, embedded images, or formulas that match the planner’s query.
  3. Tool‑Calling Loop – The planner issues a series of API‑style calls (fetch, compute, edit). After each call, the LLM receives the returned data and can refine its next request, enabling multi‑step reasoning.
  4. Execution Engine – A thin wrapper around Excel‑compatible libraries (e.g., openpyxl, pandas) carries out the actual data extraction or modification.
  5. Audit Trail – Every tool call and its result are logged, giving developers a transparent, reproducible trace of the reasoning process.
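The five layers above can be sketched as a minimal tool-calling loop with an audit trail. This is an illustrative reconstruction, not the paper's code: the toy workbook, the `fetch`/`edit` tool names, and the fixed plan are all assumptions (the paper's planner refines each call based on the previous result rather than executing a static plan).

```python
import json
from typing import Any

# Toy workbook: sheet name -> list of rows. A stand-in for a real file
# opened with openpyxl or pandas in the execution engine.
WORKBOOK = {
    "Sheet3": [["id", "revenue"]] + [[i, i * 10] for i in range(1, 201)],
}

def fetch(sheet: str, start: int, end: int) -> list:
    """Tool: return rows start..end (inclusive, 0-based) from a sheet."""
    return WORKBOOK[sheet][start : end + 1]

def edit(sheet: str, row: int, col: int, value: Any) -> str:
    """Tool: overwrite a single cell and report the change."""
    WORKBOOK[sheet][row][col] = value
    return f"set {sheet}!R{row}C{col} = {value}"

TOOLS = {"fetch": fetch, "edit": edit}

def run_agent(plan: list) -> list:
    """Execute a planner-issued sequence of tool calls, logging every
    call and its result so the reasoning trace is reproducible."""
    audit_trail = []
    for call in plan:
        result = TOOLS[call["tool"]](**call["args"])
        audit_trail.append({"call": call, "result": result})
    return audit_trail

# A hypothetical two-step plan: fetch rows 120-150, then fix one cell.
plan = [
    {"tool": "fetch", "args": {"sheet": "Sheet3", "start": 120, "end": 150}},
    {"tool": "edit", "args": {"sheet": "Sheet3", "row": 1, "col": 1, "value": 99}},
]
trail = run_agent(plan)
print(json.dumps(trail[-1], default=str))
```

In the real system the planner LLM would inspect each logged result before issuing the next call; the logged `audit_trail` is what makes every edit traceable after the fact.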

The whole pipeline runs within the LLM’s context window because only the relevant slice of the spreadsheet is streamed in at each step, avoiding the massive token blow‑up of naïve full‑context injection.
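The token savings from streaming only the relevant slice can be made concrete with a rough sketch (the 100,000-row toy sheet and the requested row range are illustrative, not figures from the paper):

```python
import csv
import io

# A 100,000-row sheet stands in for one tab of a large workbook.
rows = [["month", "sales"]] + [[m, m * 7] for m in range(100_000)]

def to_csv_text(rows: list) -> str:
    """Serialize rows as CSV, as they might appear in an LLM prompt."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Naive full-context injection: the entire sheet goes into the prompt.
full_text = to_csv_text(rows)

# Agentic retrieval: only the slice the planner requested is streamed in
# (header plus rows 120-150), keeping each step inside the context window.
slice_text = to_csv_text([rows[0]] + rows[121:152])

print(f"full: {len(full_text)} chars, slice: {len(slice_text)} chars")
```

Even in this toy case the per-step payload shrinks by orders of magnitude, which is what keeps the iterative loop affordable.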

Results & Findings

| Benchmark      | Prior SOTA | BRTR (best config) | Gain    |
|----------------|-----------:|-------------------:|--------:|
| FRTR‑Bench     | 58 %       | 83 %               | +25 pts |
| SpreadsheetLLM | 71 %       | 78 %               | +7 pts  |
| FINCH          | 44 %       | 76 %               | +32 pts |
  • Retrieval quality: NeMo Retriever 1B achieved a 12 % higher recall on mixed tabular‑visual queries than the next best model.
  • Ablation: Removing the planner drops performance by ~15 pts; skipping iterative retrieval costs another ~10 pts, confirming each component’s necessity.
  • Cost: Using GPT‑5.2 for the planner and reasoning costs ~0.45 ¢ per spreadsheet interaction, roughly half the expense of GPT‑4 while delivering higher accuracy.

Practical Implications

  • Enterprise Automation: Companies can embed BRTR into internal bots that answer ad‑hoc financial queries, generate KPI dashboards, or audit spreadsheets without exposing the entire workbook to the LLM.
  • Developer Tooling: SDKs can expose the planner‑retriever API, letting developers build custom “spreadsheet assistants” that fetch only the rows they need, dramatically reducing latency and token usage.
  • Regulatory Compliance: The explicit tool‑call trace satisfies audit requirements for financial reporting, making AI‑augmented spreadsheet edits defensible in regulated industries.
  • Cross‑Modal Insight: Because the retriever handles embedded charts and images, analysts can ask questions like “What trend does the sales chart in Sheet 2 show?” and receive a data‑driven answer, opening up new UI possibilities for BI platforms.
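To illustrate the developer-tooling point, here is a sketch of what a slice-fetching "spreadsheet assistant" built on such an SDK might look like. The `SpreadsheetAssistant` class and its method names are hypothetical, invented for this example rather than taken from the paper:

```python
class SpreadsheetAssistant:
    """Hypothetical SDK surface: answers queries by fetching only the
    row slices it needs, never the whole workbook."""

    def __init__(self, workbook: dict):
        self.workbook = workbook  # sheet name -> list of rows

    def fetch(self, sheet: str, start: int, end: int) -> list:
        """Retrieve only the requested row slice, not the whole sheet."""
        return self.workbook[sheet][start:end]

    def ask_total(self, sheet: str, col: int, start: int, end: int) -> int:
        """Answer an ad-hoc aggregate query over a bounded slice."""
        rows = self.fetch(sheet, start, end)
        return sum(r[col] for r in rows)

assistant = SpreadsheetAssistant({"Q1": [[i, i * 2] for i in range(1, 11)]})
print(assistant.ask_total("Q1", col=1, start=0, end=5))  # 2+4+6+8+10 = 30
```

Because only five rows ever leave the workbook, latency and token cost stay proportional to the question, not to the spreadsheet's size.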

Limitations & Future Work

  • Scalability to Billion‑Cell Workbooks: While BRTR handles millions of cells, the current index construction still requires offline preprocessing; real‑time indexing of truly massive workbooks remains an open challenge.
  • Domain‑Specific Knowledge: The planner relies on general‑purpose LLMs; specialized financial or scientific vocabularies sometimes cause sub‑optimal retrieval queries. Fine‑tuning the planner on domain corpora is a promising direction.
  • Tooling Ecosystem: The prototype integrates with Python‑based Excel libraries; extending support to cloud‑native spreadsheet services (Google Sheets, Office 365) will broaden applicability.

Bottom line: BRTR shows that moving from a “single‑shot fetch‑and‑answer” mindset to an agentic, iterative retrieval paradigm can unlock reliable, audit‑ready AI assistance for the complex spreadsheets that power modern enterprises. Developers interested in building smarter data‑centric assistants should keep an eye on this emerging workflow.

Authors

  • Anmol Gulati
  • Sahil Sen
  • Waqar Sarguroh
  • Kevin Paul

Paper Information

  • arXiv ID: 2603.06503v1
  • Categories: cs.CL
  • Published: March 6, 2026