[Paper] ESG Reporting Lifecycle Management with Large Language Models and AI Agents

Published: March 11, 2026 at 07:05 AM EDT
Source: arXiv - 2603.10646v1

Overview

The paper proposes an agentic ESG lifecycle framework that embeds AI agents powered by large language models (LLMs) into every stage of ESG (Environmental, Social, Governance) reporting, from data discovery to continuous improvement. By turning a traditionally static, manual process into an automated, feedback‑driven workflow, the authors aim to make ESG compliance faster, more consistent, and easier to audit for modern enterprises.

Key Contributions

  • End‑to‑end agentic workflow that maps AI agents to the five canonical ESG stages: identification, measurement, reporting, engagement, and improvement.
  • Formal definition of technical requirements and quality attributes (e.g., traceability, verifiability, scalability) for four core ESG tasks: report validation, multi‑report comparison, report generation, and knowledge‑base maintenance.
  • Three architectural blueprints—single‑model, single‑agent, and multi‑agent—illustrating how to trade off simplicity, modularity, and robustness when deploying LLMs for ESG automation.
  • Open‑source prototype (code & data) that demonstrates each architecture on real‑world ESG datasets, enabling reproducibility and community extensions.

Methodology

  1. Task Decomposition – The authors break the ESG lifecycle into discrete, AI‑amenable tasks (e.g., extracting carbon‑emission figures from PDFs, cross‑checking governance disclosures against regulatory taxonomies).
  2. Agent Design – For each task, an AI agent is instantiated. Agents are built on off‑the‑shelf LLMs (e.g., GPT‑4‑style models) and are equipped with custom prompts, tool‑calling APIs (for data retrieval, calculations), and a lightweight verification layer that flags low‑confidence outputs.
  3. Architectural Variants
    • Single‑model: One monolithic LLM handles all tasks via dynamic prompting.
    • Single‑agent: A dedicated “ESG‑assistant” agent orchestrates sub‑tasks internally.
    • Multi‑agent: A suite of specialized agents (e.g., “Carbon‑Extractor”, “Governance‑Validator”) communicate through a shared knowledge base, enabling parallelism and fine‑grained error handling.
  4. Evaluation – The prototype is run on a benchmark set of publicly available ESG reports (annual sustainability statements, CSR disclosures). Metrics include extraction accuracy, report‑generation coherence, validation precision, and runtime overhead across the three architectures.
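
The multi‑agent variant and its verification layer can be illustrated with a minimal sketch. This is not the authors' implementation: the `Finding` and `KnowledgeBase` classes, the agent function names, the 0.7 threshold, and the confidence heuristic are all hypothetical, and a regular expression stands in for the LLM call so the example runs on its own.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One agent output plus the provenance needed for traceability."""
    agent: str
    key: str
    value: object
    confidence: float  # placeholder score; a real agent would derive this from the model
    source: str        # text span or store the value came from


@dataclass
class KnowledgeBase:
    """Shared store through which the agents communicate (hypothetical design)."""
    findings: dict = field(default_factory=dict)
    flagged: list = field(default_factory=list)

    def publish(self, finding: Finding, threshold: float = 0.7) -> None:
        # Verification layer: low-confidence outputs are held back for review
        # instead of entering the shared state.
        if finding.confidence < threshold:
            self.flagged.append(finding)
        else:
            self.findings[finding.key] = finding


def carbon_extractor(text: str, kb: KnowledgeBase) -> None:
    """Stand-in for an LLM extraction agent; a regex replaces the model call."""
    match = re.search(r"([\d,]+(?:\.\d+)?)\s*tCO2e", text)
    if match is None:
        return
    value = float(match.group(1).replace(",", ""))
    # Assumed heuristic: trust the figure only if the scope is named explicitly.
    confidence = 0.9 if "scope 1" in text.lower() else 0.5
    kb.publish(Finding("carbon-extractor", "scope1_tco2e", value, confidence, match.group(0)))


def disclosure_validator(kb: KnowledgeBase) -> None:
    """Stand-in for a validation agent; it reads only from the shared store."""
    required = ["scope1_tco2e"]
    missing = [key for key in required if key not in kb.findings]
    kb.publish(Finding("disclosure-validator", "required_metrics_present",
                       not missing, 1.0, "knowledge base"))


def run_pipeline(text: str) -> KnowledgeBase:
    kb = KnowledgeBase()
    carbon_extractor(text, kb)
    disclosure_validator(kb)
    return kb


ok = run_pipeline("Scope 1 emissions were 12,500 tCO2e.")
vague = run_pipeline("We emitted roughly 9,000 tCO2e last year.")
print(ok.findings["scope1_tco2e"].value)   # 12500.0
print([f.key for f in vague.flagged])      # ['scope1_tco2e']
```

Because the validator reads only from the shared store, a figure that the verification layer holds back also shows up downstream as a missing required metric, which is one way the fine‑grained error handling described above could surface.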

Results & Findings

Architecture   Extraction F1   Validation Precision   Report Generation BLEU*   Avg. Latency (s)
Single‑model   0.78            0.71                   0.62                      12
Single‑agent   0.84            0.78                   0.68                      15
Multi‑agent    0.89            0.84                   0.73                      18

*BLEU is used as a proxy for textual similarity to human‑written ESG sections.
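
For readers unfamiliar with the extraction and validation metrics, the sketch below shows how precision, recall, and F1 are commonly computed. The summary does not say how the authors match predictions to ground truth, so exact matching on (field, value) pairs is an assumption, and the sample data is invented for illustration.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Score a set of extracted (field, value) pairs against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and agent output for a single report.
gold = {
    ("scope1_tco2e", 12500.0),
    ("scope2_tco2e", 8300.0),
    ("water_m3", 410000.0),
    ("board_women_pct", 40.0),
}
predicted = {
    ("scope1_tco2e", 12500.0),
    ("scope2_tco2e", 8300.0),
    ("water_m3", 401000.0),   # transposed digits, so this counts as an error
    ("board_women_pct", 40.0),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Under exact matching, a value with a single wrong digit counts as both a false positive and a false negative, which is why one mistake in four fields lowers precision and recall together.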

  • Higher accuracy with specialization: The multi‑agent setup scored highest on all three quality metrics (for example, an extraction F1 of 0.89 versus 0.78 for single‑model), suggesting that task‑specific prompts and verification loops reduce hallucinations.
  • Acceptable performance trade‑off: The multi‑agent pipeline, the most accurate of the three, averaged 18 seconds per report (versus 12 seconds for single‑model), which remains viable for quarterly reporting cycles.
  • Feedback loop effectiveness: When simulated “post‑report” outcome data (e.g., actual emissions) were fed back, the agents automatically updated the knowledge base and regenerated the affected sections, demonstrating true “continuous improvement.”
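
The feedback loop can be sketched as dependency tracking between knowledge‑base facts and report sections. This is an assumed design rather than the paper's: the `Report` class, its field names, and the string template that stands in for LLM generation are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    facts: dict            # knowledge-base entries, keyed by metric name
    section_inputs: dict   # which facts each section reads (assumed to be tracked)
    sections: dict = field(default_factory=dict)

    def render(self, name: str) -> str:
        # Placeholder for an LLM generation call.
        values = ", ".join(f"{key}={self.facts[key]}" for key in self.section_inputs[name])
        return f"[{name}] {values}"

    def generate_all(self) -> None:
        for name in self.section_inputs:
            self.sections[name] = self.render(name)

    def apply_feedback(self, updates: dict) -> list:
        """Update facts, then regenerate only the sections that depend on them."""
        changed = {key for key, value in updates.items() if self.facts.get(key) != value}
        self.facts.update(updates)
        stale = [name for name, inputs in self.section_inputs.items()
                 if changed & set(inputs)]
        for name in stale:
            self.sections[name] = self.render(name)
        return stale


report = Report(
    facts={"scope1_tco2e": 12500.0, "board_women_pct": 40.0},
    section_inputs={"climate": ["scope1_tco2e"], "governance": ["board_women_pct"]},
)
report.generate_all()

# Simulated post-report outcome data: measured emissions differ from the draft.
stale = report.apply_feedback({"scope1_tco2e": 13100.0})
print(stale)                       # ['climate']
print(report.sections["climate"])  # [climate] scope1_tco2e=13100.0
```

Recording which facts each section reads keeps regeneration limited to the affected sections, so untouched text such as the governance section stays unchanged.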

Practical Implications

  • Speed up compliance – Companies can generate draft ESG disclosures in minutes rather than days, freeing sustainability teams to focus on strategy rather than data wrangling.
  • Reduce audit risk – Automated verification flags inconsistencies early, helping firms avoid costly regulator re‑requests.
  • Enable comparative analytics – The multi‑report comparison module can instantly benchmark a firm against peers, supporting investors and ESG rating agencies.
  • Plug‑and‑play integration – The three architectural patterns give CTOs flexibility: start with a single‑model prototype for a pilot, then evolve to a multi‑agent microservice architecture as data volume grows.
  • Open‑source foundation – The released GitLab repo provides ready‑to‑deploy Docker images and API specs, lowering the barrier for internal tooling or SaaS products.
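
As an illustration of the comparative‑analytics use case, the sketch below benchmarks one firm against peers on emissions intensity. The firm names, figures, and the choice of metric (tCO2e per million USD of revenue) are invented for the example and do not come from the paper or its prototype.

```python
import statistics

# Hypothetical values already extracted from three reports.
reports = {
    "FirmA": {"scope1_tco2e": 12500.0, "revenue_musd": 500.0},
    "FirmB": {"scope1_tco2e": 30000.0, "revenue_musd": 800.0},
    "FirmC": {"scope1_tco2e": 9000.0, "revenue_musd": 600.0},
}


def benchmark(reports: dict, firm: str) -> dict:
    """Rank one firm by emissions intensity (lower is better) against its peers."""
    intensity = {name: r["scope1_tco2e"] / r["revenue_musd"] for name, r in reports.items()}
    ranking = sorted(intensity, key=intensity.get)
    peers = [value for name, value in intensity.items() if name != firm]
    return {
        "intensity": intensity[firm],
        "rank": ranking.index(firm) + 1,
        "of": len(ranking),
        "peer_median": statistics.median(peers),
    }


result = benchmark(reports, "FirmA")
print(result)
# {'intensity': 25.0, 'rank': 2, 'of': 3, 'peer_median': 26.25}
```

Normalizing by revenue before ranking keeps firms of different sizes comparable. A real comparison module would also need to reconcile reporting boundaries and units across reports.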

Limitations & Future Work

  • Domain‑specific data scarcity – The prototype relies on publicly available ESG reports; performance may degrade on niche industries with proprietary terminology.
  • LLM hallucination risk – Although verification layers improve reliability, occasional factual errors still surface, especially in nuanced governance disclosures.
  • Scalability to enterprise scale – The current evaluation covers a few dozen reports; handling thousands of filings per quarter will require more robust orchestration and cost‑optimisation strategies.
  • Future directions suggested by the authors include: incorporating retrieval‑augmented generation (RAG) for up‑to‑date regulatory texts, extending the framework to ESG‑related risk modeling, and exploring human‑in‑the‑loop interfaces for expert review.

Authors

  • Thong Hoang
  • Mykhailo Klymenko
  • Xiwei Xu
  • Shidong Pan
  • Yi Ding
  • Xushuo Tang
  • Zhengyi Yang
  • Jieke Shi
  • David Lo

Paper Information

  • arXiv ID: 2603.10646v1
  • Categories: cs.SE
  • Published: March 11, 2026