[Paper] ESG Reporting Lifecycle Management with Large Language Models and AI Agents
Source: arXiv - 2603.10646v1
Overview
The paper proposes an agentic ESG lifecycle framework that embeds AI agents powered by large language models (LLMs) into every stage of ESG (Environmental, Social, Governance) reporting, from data discovery to continuous improvement. By turning a traditionally static, manual process into an automated, feedback‑driven workflow, the authors aim to make ESG compliance faster, more consistent, and easier to audit for modern enterprises.
Key Contributions
- End‑to‑end agentic workflow that maps AI agents to the five canonical ESG stages: identification, measurement, reporting, engagement, and improvement.
- Formal definition of technical requirements and quality attributes (e.g., traceability, verifiability, scalability) for four core ESG tasks: report validation, multi‑report comparison, report generation, and knowledge‑base maintenance.
- Three architectural blueprints (single‑model, single‑agent, and multi‑agent) illustrating how to trade off simplicity, modularity, and robustness when deploying LLMs for ESG automation.
- Open‑source prototype (code & data) that demonstrates each architecture on real‑world ESG datasets, enabling reproducibility and community extensions.
Methodology
- Task Decomposition – The authors break the ESG lifecycle into discrete, AI‑amenable tasks (e.g., extracting carbon‑emission figures from PDFs, cross‑checking governance disclosures against regulatory taxonomies).
- Agent Design – For each task, an AI agent is instantiated. Agents are built on off‑the‑shelf LLMs (e.g., GPT‑4‑style models) and are equipped with custom prompts, tool‑calling APIs (for data retrieval, calculations), and a lightweight verification layer that flags low‑confidence outputs.
- Architectural Variants:
  - Single‑model: One monolithic LLM handles all tasks via dynamic prompting.
  - Single‑agent: A dedicated “ESG‑assistant” agent orchestrates sub‑tasks internally.
  - Multi‑agent: A suite of specialized agents (e.g., “Carbon‑Extractor”, “Governance‑Validator”) communicate through a shared knowledge base, enabling parallelism and fine‑grained error handling.
- Evaluation – The prototype is run on a benchmark set of publicly available ESG reports (annual sustainability statements, CSR disclosures). Metrics include extraction accuracy, report‑generation coherence, validation precision, and runtime overhead across the three architectures.
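The multi‑agent variant and its verification layer can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Finding` and `KnowledgeBase` classes, the 0.7 confidence threshold, and the regex and keyword checks standing in for LLM calls are all assumptions made for illustration. Only the agent names come from the summary above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One agent output plus a self-reported confidence (hypothetical schema)."""
    agent: str
    key: str
    value: str
    confidence: float


@dataclass
class KnowledgeBase:
    """Shared store the agents write to (hypothetical, not the paper's schema)."""
    findings: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def submit(self, finding, threshold=0.7):
        # Verification layer: low-confidence outputs are flagged for review
        # instead of being accepted into the knowledge base.
        if finding.confidence < threshold:
            self.review_queue.append(finding)
        else:
            self.findings.append(finding)


def carbon_extractor(text):
    """Stand-in for an LLM call: pull an emissions figure with a regex."""
    match = re.search(r"([\d,]+)\s*tCO2e", text)
    if match is None:
        return Finding("Carbon-Extractor", "scope1_emissions", "", 0.0)
    value = match.group(1).replace(",", "")
    return Finding("Carbon-Extractor", "scope1_emissions", value, 0.9)


def governance_validator(text):
    """Stand-in for an LLM call: check whether board independence is disclosed."""
    disclosed = "independent directors" in text.lower()
    status = "disclosed" if disclosed else "missing"
    return Finding("Governance-Validator", "board_independence", status,
                   0.8 if disclosed else 0.4)


def run_pipeline(text, agents):
    """Run each specialized agent and route its output through the shared store."""
    kb = KnowledgeBase()
    for agent in agents:
        kb.submit(agent(text))
    return kb


report = "Scope 1 emissions were 12,400 tCO2e. The board has a chair and a CEO."
kb = run_pipeline(report, [carbon_extractor, governance_validator])
print("accepted:", kb.findings)
print("flagged for review:", kb.review_queue)
```

In this toy run the emissions figure is accepted, while the governance check lands in the review queue because its confidence falls below the threshold. In a single‑model design the same two checks would share one prompt, which makes this kind of per‑task routing harder to express.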
Results & Findings
| Architecture | Extraction F1 | Validation Precision | Report Generation BLEU* | Avg. Latency (s) |
|---|---|---|---|---|
| Single‑model | 0.78 | 0.71 | 0.62 | 12 |
| Single‑agent | 0.84 | 0.78 | 0.68 | 15 |
| Multi‑agent | 0.89 | 0.84 | 0.73 | 18 |
*BLEU is used as a proxy for textual similarity to human‑written ESG sections.
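To make the accuracy and latency trade‑off in the table concrete, the short Python sketch below computes each architecture's percentage change relative to the single‑model baseline. The numbers are copied from the table; the dictionary layout and the `relative_change` helper are illustrative choices, not part of the paper.

```python
# Figures copied from the results table above.
results = {
    "single-model": {"f1": 0.78, "precision": 0.71, "bleu": 0.62, "latency_s": 12},
    "single-agent": {"f1": 0.84, "precision": 0.78, "bleu": 0.68, "latency_s": 15},
    "multi-agent": {"f1": 0.89, "precision": 0.84, "bleu": 0.73, "latency_s": 18},
}


def relative_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


baseline = results["single-model"]
summary = {
    name: {
        metric: round(relative_change(row[metric], baseline[metric]), 1)
        for metric in row
    }
    for name, row in results.items()
}

for name, changes in summary.items():
    print(name, changes)
```

Read this way, the multi‑agent setup gains roughly 14% in extraction F1 over the single‑model baseline at the cost of 50% more average latency.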
- Higher accuracy with specialization: The multi‑agent setup scored highest on all three quality metrics (e.g., extraction F1 of 0.89 versus 0.78 for single‑model and 0.84 for single‑agent), which suggests that task‑specific prompts and verification loops reduce hallucinations.
- Acceptable performance trade‑off: The most accurate pipeline (multi‑agent) averaged 18 seconds per report, 6 seconds more than the single‑model baseline, which remains practical for quarterly reporting cycles.
- Feedback loop effectiveness: When simulated “post‑report” outcome data (e.g., actual emissions) were fed back, the agents automatically updated the knowledge base and regenerated the affected sections, demonstrating the “continuous improvement” stage in a simulated setting.
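The feedback loop described in the last finding can be sketched as a small dependency check: each report section declares which knowledge‑base facts it uses, and only sections touching a changed fact are regenerated. The section templates, fact names, and figures below are hypothetical, and plain string templates stand in for LLM‑based generation.

```python
# Each section maps to (template, set of knowledge-base keys it depends on).
# Templates and keys are hypothetical stand-ins for LLM-generated sections.
SECTIONS = {
    "emissions": ("Scope 1 emissions: {scope1_tco2e} tCO2e.", {"scope1_tco2e"}),
    "workforce": ("Employees: {headcount}.", {"headcount"}),
}


def generate_report(kb):
    """Render every section from the current knowledge base."""
    return {name: tpl.format(**kb) for name, (tpl, _) in SECTIONS.items()}


def apply_feedback(kb, outcome, report):
    """Merge outcome data into the knowledge base and regenerate only the
    sections that depend on a fact whose value actually changed."""
    changed = {key for key, value in outcome.items() if kb.get(key) != value}
    kb.update(outcome)
    regenerated = []
    for name, (tpl, deps) in SECTIONS.items():
        if deps & changed:
            report[name] = tpl.format(**kb)
            regenerated.append(name)
    return regenerated


kb = {"scope1_tco2e": 12400, "headcount": 530}
report = generate_report(kb)

# Simulated post-report outcome: measured emissions differ from the estimate.
regenerated = apply_feedback(kb, {"scope1_tco2e": 11950}, report)
print("regenerated sections:", regenerated)
print(report)
```

Tracking dependencies per section keeps regeneration cheap: an updated emissions figure leaves the workforce section untouched.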
Practical Implications
- Speed up compliance – Companies can generate draft ESG disclosures in minutes rather than days, freeing sustainability teams to focus on strategy rather than data wrangling.
- Reduce audit risk – Automated verification flags inconsistencies early, helping firms avoid costly regulator re‑requests.
- Enable comparative analytics – The multi‑report comparison module can instantly benchmark a firm against peers, supporting investors and ESG rating agencies.
- Plug‑and‑play integration – The three architectural patterns give CTOs flexibility: start with a single‑model prototype for a pilot, then evolve to a multi‑agent microservice architecture as data volume grows.
- Open‑source foundation – The released GitLab repo provides ready‑to‑deploy Docker images and API specs, lowering the barrier for internal tooling or SaaS products.
Limitations & Future Work
- Domain‑specific data scarcity – The prototype relies on publicly available ESG reports; performance may degrade on niche industries with proprietary terminology.
- LLM hallucination risk – Although verification layers improve reliability, occasional factual errors still surface, especially in nuanced governance disclosures.
- Scalability to enterprise scale – The current evaluation covers a few dozen reports; handling thousands of filings per quarter will require more robust orchestration and cost‑optimisation strategies.
- Future directions suggested by the authors include: incorporating retrieval‑augmented generation (RAG) for up‑to‑date regulatory texts, extending the framework to ESG‑related risk modeling, and exploring human‑in‑the‑loop interfaces for expert review.
Authors
- Thong Hoang
- Mykhailo Klymenko
- Xiwei Xu
- Shidong Pan
- Yi Ding
- Xushuo Tang
- Zhengyi Yang
- Jieke Shi
- David Lo
Paper Information
- arXiv ID: 2603.10646v1
- Categories: cs.SE
- Published: March 11, 2026