[Paper] ESG Reporting Lifecycle Management with Large Language Models and AI Agents

Published: March 11, 2026 at 07:05 AM EDT

Source: arXiv - 2603.10646v1

Overview

The paper proposes an agentic ESG lifecycle framework that injects large‑language‑model (LLM) powered AI agents into every stage of ESG (Environmental, Social, Governance) reporting—from data discovery to continuous improvement. By turning a traditionally static, manual process into an automated, feedback‑driven workflow, the authors aim to make ESG compliance faster, more consistent, and easier to audit for modern enterprises.

Key Contributions

End‑to‑end agentic workflow that maps AI agents to the five canonical ESG stages: identification, measurement, reporting, engagement, and improvement.
Formal definition of technical requirements and quality attributes (e.g., traceability, verifiability, scalability) for four core ESG tasks: report validation, multi‑report comparison, report generation, and knowledge‑base maintenance.
Three architectural blueprints—single‑model, single‑agent, and multi‑agent—illustrating how to trade off simplicity, modularity, and robustness when deploying LLMs for ESG automation.
Open‑source prototype (code & data) that demonstrates each architecture on real‑world ESG datasets, enabling reproducibility and community extensions.

Methodology

Task Decomposition – The authors break the ESG lifecycle into discrete, AI‑amenable tasks (e.g., extracting carbon‑emission figures from PDFs, cross‑checking governance disclosures against regulatory taxonomies).
Agent Design – For each task, an AI agent is instantiated. Agents are built on off‑the‑shelf LLMs (e.g., GPT‑4‑style models) and are equipped with custom prompts, tool‑calling APIs (for data retrieval, calculations), and a lightweight verification layer that flags low‑confidence outputs.
Architectural Variants
- Single‑model: One monolithic LLM handles all tasks via dynamic prompting.
- Single‑agent: A dedicated “ESG‑assistant” agent orchestrates sub‑tasks internally.
- Multi‑agent: A suite of specialized agents (e.g., “Carbon‑Extractor”, “Governance‑Validator”) communicate through a shared knowledge base, enabling parallelism and fine‑grained error handling.
Evaluation – The prototype is run on a benchmark set of publicly available ESG reports (annual sustainability statements, CSR disclosures). Metrics include extraction accuracy, report‑generation coherence, validation precision, and runtime overhead across the three architectures.
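The multi‑agent variant described above can be sketched roughly as follows. This is a minimal illustration, not the authors' prototype: the `Finding` and `KnowledgeBase` classes, the 0.7 confidence threshold, the fact keys, and the regex and keyword checks that stand in for LLM calls are all assumptions made for the example. Only the agent names come from the summary.

```python
from dataclasses import dataclass, field
import re


@dataclass
class Finding:
    """One agent output, kept with its source text for traceability."""
    agent: str
    key: str
    value: object
    confidence: float
    source: str


@dataclass
class KnowledgeBase:
    """Shared store that agents write to; low-confidence items are set aside."""
    threshold: float = 0.7  # assumed cut-off, not taken from the paper
    accepted: dict = field(default_factory=dict)
    flagged: list = field(default_factory=list)

    def submit(self, finding: Finding) -> None:
        # Verification layer: anything below the threshold is flagged for review.
        if finding.confidence >= self.threshold:
            self.accepted[finding.key] = finding
        else:
            self.flagged.append(finding)


def carbon_extractor(text: str, kb: KnowledgeBase) -> None:
    """Stand-in for an LLM extraction agent: a regex plays the model's role."""
    match = re.search(r"([\d,]+(?:\.\d+)?)\s*tCO2e", text)
    if match:
        value = float(match.group(1).replace(",", ""))
        kb.submit(Finding("Carbon-Extractor", "scope1_tco2e", value, 0.9, match.group(0)))


def governance_validator(text: str, kb: KnowledgeBase) -> None:
    """Stand-in for a validation agent: checks that a disclosure is present."""
    present = "board independence" in text.lower()
    # A missing disclosure is reported with low confidence so that it gets flagged.
    confidence = 0.85 if present else 0.4
    kb.submit(Finding("Governance-Validator", "board_independence_disclosed",
                      present, confidence, text[:60]))


def run_pipeline(text: str) -> KnowledgeBase:
    """Run each specialized agent against the same shared knowledge base."""
    kb = KnowledgeBase()
    for agent in (carbon_extractor, governance_validator):
        agent(text, kb)
    return kb


report = "Scope 1 emissions were 12,500 tCO2e in 2024. No governance section."
kb = run_pipeline(report)
print(kb.accepted["scope1_tco2e"].value)
print([f.key for f in kb.flagged])
```

Keeping the source snippet on every `Finding` is one simple way to support the traceability attribute listed under Key Contributions. In a real deployment the agents would run in parallel and call an LLM rather than a regex.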

Results & Findings

Architecture	Extraction F1	Validation Precision	Report Generation BLEU*	Avg. Latency (s)
Single‑model	0.78	0.71	0.62	12
Single‑agent	0.84	0.78	0.68	15
Multi‑agent	0.89	0.84	0.73	18

*BLEU is used as a proxy for textual similarity to human‑written ESG sections.
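To make the trade‑off in the table concrete, the short script below computes each architecture's change relative to the single‑model baseline. The numbers are copied from the table; the `relative_change` helper and the dictionary layout are just one convenient way to do the arithmetic.

```python
# Values copied from the results table above.
results = {
    "single-model": {"f1": 0.78, "precision": 0.71, "bleu": 0.62, "latency_s": 12},
    "single-agent": {"f1": 0.84, "precision": 0.78, "bleu": 0.68, "latency_s": 15},
    "multi-agent":  {"f1": 0.89, "precision": 0.84, "bleu": 0.73, "latency_s": 18},
}


def relative_change(arch: str, metric: str, baseline: str = "single-model") -> float:
    """Percentage change of `metric` for `arch` relative to `baseline`."""
    base = results[baseline][metric]
    return round(100 * (results[arch][metric] - base) / base, 1)


for arch in ("single-agent", "multi-agent"):
    print(arch, {m: relative_change(arch, m) for m in results[arch]})
```

For the multi‑agent row this works out to roughly +14% extraction F1, +18% validation precision, and +18% BLEU, in exchange for 50% higher average latency.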

Higher accuracy with specialization: The multi‑agent setup scored highest on all three quality metrics (for example, extraction F1 of 0.89 versus 0.78 for single‑model), a result consistent with the idea that task‑specific prompts and verification loops reduce hallucinations.
Acceptable performance trade‑off: The most accurate pipeline, multi‑agent, averaged 18 seconds per report, 6 seconds more than single‑model, which remains practical for quarterly reporting cycles.
Feedback loop effectiveness: When simulated “post‑report” outcome data (e.g., actual emissions) were fed back, the agents updated the knowledge base and regenerated the affected sections without manual intervention, demonstrating the “continuous improvement” stage of the lifecycle.
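The feedback loop in the last finding could look something like the sketch below. It assumes a hypothetical `ReportState` class that records which knowledge‑base facts each section cites, so that only the affected sections are regenerated when a fact changes. Format‑string templates stand in for the LLM generation step, and all names and values are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ReportState:
    facts: dict = field(default_factory=dict)       # knowledge base: key -> value
    depends_on: dict = field(default_factory=dict)  # section -> set of fact keys
    sections: dict = field(default_factory=dict)    # section -> rendered text
    templates: dict = field(default_factory=dict)   # section -> format string

    def add_section(self, name: str, template: str, keys: set) -> None:
        """Register a section and render it from the current facts."""
        self.templates[name] = template
        self.depends_on[name] = set(keys)
        self.sections[name] = template.format(**self.facts)

    def feed_back(self, key: str, value) -> list:
        """Update one fact and regenerate only the sections that cite it."""
        if self.facts.get(key) == value:
            return []  # nothing changed, nothing to regenerate
        self.facts[key] = value
        stale = [s for s, keys in self.depends_on.items() if key in keys]
        for name in stale:
            self.sections[name] = self.templates[name].format(**self.facts)
        return stale


# Illustrative values only.
state = ReportState(facts={"emissions_t": 12000, "board_size": 9})
state.add_section("environment", "Reported emissions: {emissions_t} tCO2e.", {"emissions_t"})
state.add_section("governance", "The board has {board_size} members.", {"board_size"})

changed = state.feed_back("emissions_t", 12500)
print(changed)
print(state.sections["environment"])
```

Tracking dependencies explicitly keeps regeneration cheap: the governance section is left untouched because it does not cite the updated emissions figure.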

Practical Implications

Speed up compliance – Companies can generate draft ESG disclosures in minutes rather than days, freeing sustainability teams to focus on strategy rather than data wrangling.
Reduce audit risk – Automated verification flags inconsistencies early, helping firms avoid costly follow‑up requests from regulators.
Enable comparative analytics – The multi‑report comparison module can instantly benchmark a firm against peers, supporting investors and ESG rating agencies.
Plug‑and‑play integration – The three architectural patterns give CTOs flexibility: start with a single‑model prototype for a pilot, then evolve to a multi‑agent microservice architecture as data volume grows.
Open‑source foundation – The released GitLab repo provides ready‑to‑deploy Docker images and API specs, lowering the barrier for internal tooling or SaaS products.
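As a rough idea of what the comparative‑analytics use case involves once metrics have been extracted, the sketch below ranks one firm against its peers on a single metric. The `peer_benchmark` function, the firm names, and the intensity values are all made up for illustration; the summary does not describe the comparison module's actual interface.

```python
def peer_benchmark(firm: str, metric_by_firm: dict, lower_is_better: bool = True) -> dict:
    """Rank one firm against its peers on a single extracted metric."""
    ordered = sorted(metric_by_firm, key=metric_by_firm.get, reverse=not lower_is_better)
    rank = ordered.index(firm) + 1
    peers = [value for name, value in metric_by_firm.items() if name != firm]
    peer_mean = sum(peers) / len(peers)
    return {
        "rank": rank,
        "of": len(ordered),
        "gap_to_peer_mean": metric_by_firm[firm] - peer_mean,
    }


# Made-up emission intensities (tCO2e per $M revenue), for illustration only.
intensity = {"FirmA": 42.0, "FirmB": 35.0, "FirmC": 58.0, "FirmD": 47.0}
summary = peer_benchmark("FirmA", intensity)
print(summary)
```

Here FirmA ranks second of four, with an intensity below the mean of its peers. A production module would also need to normalize units and reporting periods across reports before comparing them.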

Limitations & Future Work

Domain‑specific data scarcity – The prototype relies on publicly available ESG reports; performance may degrade on niche industries with proprietary terminology.
LLM hallucination risk – Although verification layers improve reliability, occasional factual errors still surface, especially in nuanced governance disclosures.
Scalability to enterprise scale – The current evaluation covers a few dozen reports; handling thousands of filings per quarter will require more robust orchestration and cost‑optimisation strategies.
Future directions suggested by the authors include: incorporating retrieval‑augmented generation (RAG) for up‑to‑date regulatory texts, extending the framework to ESG‑related risk modeling, and exploring human‑in‑the‑loop interfaces for expert review.

Authors

Thong Hoang
Mykhailo Klymenko
Xiwei Xu
Shidong Pan
Yi Ding
Xushuo Tang
Zhengyi Yang
Jieke Shi
David Lo

Paper Information

arXiv ID: 2603.10646v1
Categories: cs.SE
Published: March 11, 2026
