[Paper] Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

Published: March 6, 2026
Source: arXiv - 2603.06503v1

Overview

The paper introduces Beyond Rows to Reasoning (BRTR), a new agentic framework that lets large language models (LLMs) work with massive, real‑world spreadsheets in a truly interactive way. By swapping the traditional single‑shot retrieval step for an iterative “tool‑calling” loop, BRTR can fetch, reason about, and edit data across dozens of sheets while preserving the fine‑grained context that enterprise users need.

Key Contributions

  • Agentic Retrieval Loop – Replaces one‑shot retrieval with a multi‑step, planner‑driven tool‑calling cycle that can pull rows, charts, and formulas on demand.
  • Multimodal Embedding Evaluation – Benchmarks five embedding models on mixed tabular‑visual data; NVIDIA NeMo Retriever 1B emerges as the best performer.
  • State‑of‑the‑Art Benchmarks – Sets new records on three hard spreadsheet‑understanding suites (FRTR‑Bench +25 pts, SpreadsheetLLM +7 pts, FINCH +32 pts).
  • Comprehensive Human Evaluation – Over 200 hours of expert assessment confirm the system’s reliability and auditability.
  • Cost‑Efficiency Analysis – Shows that GPT‑5.2 delivers the optimal balance of accuracy and API cost for the iterative workflow.

Methodology

  1. Planner Layer – A lightweight LLM decides what information is needed next (e.g., “fetch rows 120‑150 from Sheet 3”).
  2. Retriever Module – Uses a multimodal embedding index (NeMo Retriever 1B) to locate the exact cells, embedded images, or formulas that match the planner’s query.
  3. Tool‑Calling Loop – The planner issues a series of API‑style calls (fetch, compute, edit). After each call, the LLM receives the returned data and can refine its next request, enabling multi‑step reasoning.
  4. Execution Engine – A thin wrapper around Excel‑compatible libraries (e.g., openpyxl, pandas) carries out the actual data extraction or modification.
  5. Audit Trail – Every tool call and its result are logged, giving developers a transparent, reproducible trace of the reasoning process.
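The five layers above can be sketched as a minimal tool-calling loop with an audit trail. This is an illustrative reconstruction, not the paper's code: the toy workbook, the `fetch`/`edit` tool names, and the fixed plan are all assumptions (the paper's planner refines each call based on the previous result rather than executing a static plan).

```python
import json
from typing import Any

# Toy workbook: sheet name -> list of rows. A stand-in for a real file
# opened with openpyxl or pandas in the execution engine.
WORKBOOK = {
    "Sheet3": [["id", "revenue"]] + [[i, i * 10] for i in range(1, 201)],
}

def fetch(sheet: str, start: int, end: int) -> list:
    """Tool: return rows start..end (inclusive, 0-based) from a sheet."""
    return WORKBOOK[sheet][start : end + 1]

def edit(sheet: str, row: int, col: int, value: Any) -> str:
    """Tool: overwrite a single cell and report the change."""
    WORKBOOK[sheet][row][col] = value
    return f"set {sheet}!R{row}C{col} = {value}"

TOOLS = {"fetch": fetch, "edit": edit}

def run_agent(plan: list) -> list:
    """Execute a planner-issued sequence of tool calls, logging every
    call and its result so the reasoning trace is reproducible."""
    audit_trail = []
    for call in plan:
        result = TOOLS[call["tool"]](**call["args"])
        audit_trail.append({"call": call, "result": result})
    return audit_trail

# A hypothetical two-step plan: fetch rows 120-150, then fix one cell.
plan = [
    {"tool": "fetch", "args": {"sheet": "Sheet3", "start": 120, "end": 150}},
    {"tool": "edit", "args": {"sheet": "Sheet3", "row": 1, "col": 1, "value": 99}},
]
trail = run_agent(plan)
print(json.dumps(trail[-1], default=str))
```

In the real system the planner LLM would inspect each logged result before issuing the next call; the logged `audit_trail` is what makes every edit traceable after the fact.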

The whole pipeline runs within the LLM’s context window because only the relevant slice of the spreadsheet is streamed in at each step, avoiding the massive token blow‑up of naïve full‑context injection.
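The token savings from streaming only the relevant slice can be made concrete with a rough sketch (the 100,000-row toy sheet and the requested row range are illustrative, not figures from the paper):

```python
import csv
import io

# A 100,000-row sheet stands in for one tab of a large workbook.
rows = [["month", "sales"]] + [[m, m * 7] for m in range(100_000)]

def to_csv_text(rows: list) -> str:
    """Serialize rows as CSV, as they might appear in an LLM prompt."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Naive full-context injection: the entire sheet goes into the prompt.
full_text = to_csv_text(rows)

# Agentic retrieval: only the slice the planner requested is streamed in
# (header plus rows 120-150), keeping each step inside the context window.
slice_text = to_csv_text([rows[0]] + rows[121:152])

print(f"full: {len(full_text)} chars, slice: {len(slice_text)} chars")
```

Even in this toy case the per-step payload shrinks by orders of magnitude, which is what keeps the iterative loop affordable.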

Results & Findings

| Benchmark      | Prior SOTA | BRTR (best config) | Gain    |
|----------------|-----------:|-------------------:|--------:|
| FRTR‑Bench     | 58 %       | 83 %               | +25 pts |
| SpreadsheetLLM | 71 %       | 78 %               | +7 pts  |
| FINCH          | 44 %       | 76 %               | +32 pts |
  • Retrieval quality: NeMo Retriever 1B achieved a 12 % higher recall on mixed tabular‑visual queries than the next best model.
  • Ablation: Removing the planner drops performance by ~15 pts; skipping iterative retrieval costs another ~10 pts, confirming each component’s necessity.
  • Cost: Using GPT‑5.2 for the planner and reasoning costs ~0.45 ¢ per spreadsheet interaction, roughly half the expense of GPT‑4 while delivering higher accuracy.

Practical Implications

  • Enterprise Automation: Companies can embed BRTR into internal bots that answer ad‑hoc financial queries, generate KPI dashboards, or audit spreadsheets without exposing the entire workbook to the LLM.
  • Developer Tooling: SDKs can expose the planner‑retriever API, letting developers build custom “spreadsheet assistants” that fetch only the rows they need, dramatically reducing latency and token usage.
  • Regulatory Compliance: The explicit tool‑call trace satisfies audit requirements for financial reporting, making AI‑augmented spreadsheet edits defensible in regulated industries.
  • Cross‑Modal Insight: Because the retriever handles embedded charts and images, analysts can ask questions like “What trend does the sales chart in Sheet 2 show?” and receive a data‑driven answer, opening up new UI possibilities for BI platforms.
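To illustrate the developer-tooling point, here is a sketch of what a slice-fetching "spreadsheet assistant" built on such an SDK might look like. The `SpreadsheetAssistant` class and its method names are hypothetical, invented for this example rather than taken from the paper:

```python
class SpreadsheetAssistant:
    """Hypothetical SDK surface: answers queries by fetching only the
    row slices it needs, never the whole workbook."""

    def __init__(self, workbook: dict):
        self.workbook = workbook  # sheet name -> list of rows

    def fetch(self, sheet: str, start: int, end: int) -> list:
        """Retrieve only the requested row slice, not the whole sheet."""
        return self.workbook[sheet][start:end]

    def ask_total(self, sheet: str, col: int, start: int, end: int) -> int:
        """Answer an ad-hoc aggregate query over a bounded slice."""
        rows = self.fetch(sheet, start, end)
        return sum(r[col] for r in rows)

assistant = SpreadsheetAssistant({"Q1": [[i, i * 2] for i in range(1, 11)]})
print(assistant.ask_total("Q1", col=1, start=0, end=5))  # 2+4+6+8+10 = 30
```

Because only five rows ever leave the workbook, latency and token cost stay proportional to the question, not to the spreadsheet's size.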

Limitations & Future Work

  • Scalability to Billion‑Cell Workbooks: While BRTR handles millions of cells, the current index construction still requires offline preprocessing; real‑time indexing of truly massive workbooks remains an open challenge.
  • Domain‑Specific Knowledge: The planner relies on general‑purpose LLMs; specialized financial or scientific vocabularies sometimes cause sub‑optimal retrieval queries. Fine‑tuning the planner on domain corpora is a promising direction.
  • Tooling Ecosystem: The prototype integrates with Python‑based Excel libraries; extending support to cloud‑native spreadsheet services (Google Sheets, Office 365) will broaden applicability.

Bottom line: BRTR shows that moving from a “single‑shot fetch‑and‑answer” mindset to an agentic, iterative retrieval paradigm can unlock reliable, audit‑ready AI assistance for the complex spreadsheets that power modern enterprises. Developers interested in building smarter data‑centric assistants should keep an eye on this emerging workflow.

Authors

  • Anmol Gulati
  • Sahil Sen
  • Waqar Sarguroh
  • Kevin Paul

Paper Information

  • arXiv ID: 2603.06503v1
  • Categories: cs.CL
  • Published: March 6, 2026