[Paper] Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting

Published: November 28, 2025 at 12:27 PM EST

Source: arXiv - 2511.23387v1

Overview

The Hierarchical AI‑Meteorologist paper introduces an LLM‑agent system that turns raw weather data into clear, explainable forecasts. By reasoning at multiple time scales (hourly, 6‑hourly, daily) and extracting concise “weather keywords,” the system produces narratives that are both human‑readable and machine‑verifiable, addressing a long‑standing gap between data‑driven models and trustworthy weather reporting.

Key Contributions

Hierarchical reasoning framework that fuses short‑term and long‑term meteorological signals before generating text.
Dual‑output LLM agent: simultaneously creates a natural‑language forecast and a short list of semantic keywords summarizing dominant weather events.
Keyword‑anchored validation: uses the extracted keywords to check temporal coherence, factual consistency, and overall plausibility of the generated report.
Open‑source reproducible pipeline built on publicly available OpenWeather and Meteostat datasets, enabling other researchers and developers to replicate and extend the approach.
Demonstrated improvement in interpretability and robustness compared with flat, single‑scale LLM forecasting baselines.
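
To make the dual-output idea concrete, here is a minimal sketch of how a client might hold and parse the two outputs. The JSON reply shape, the `ForecastReport` class, and `parse_report` are all assumptions for illustration; the summary does not say how the paper's agent serializes its output.

```python
import json
from dataclasses import dataclass


@dataclass
class ForecastReport:
    """Hypothetical container for the agent's two outputs."""
    narrative: str
    keywords: list[str]


def parse_report(raw: str) -> ForecastReport:
    """Parse a JSON reply of the assumed form {"narrative": ..., "keywords": [...]}."""
    payload = json.loads(raw)
    # Normalize keywords so later comparisons ignore case and stray whitespace.
    keywords = [k.strip().lower() for k in payload["keywords"]]
    return ForecastReport(narrative=payload["narrative"].strip(), keywords=keywords)


# Hand-written reply for illustration only; this is not real model output.
example_reply = '{"narrative": "Rain eases by evening.", "keywords": ["Heavy-Rain ", "gusty-winds"]}'
report = parse_report(example_reply)
print(report.keywords)
```

Keeping the keywords as a normalized list, separate from the narrative, is what lets a later step compare them against structured data without parsing free text.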

Methodology

Data Ingestion – Raw observations (temperature, wind, precipitation, etc.) are pulled from OpenWeather and Meteostat APIs and pre‑processed into structured time‑series tables at three granularities: hourly, 6‑hourly, and daily.
Hierarchical Context Construction – The three granularities are fed into a lightweight transformer encoder that learns cross‑scale relationships (e.g., a sudden temperature dip in the hourly slice that aligns with a larger cold‑front trend in the daily slice).
LLM‑Agent Prompting – The encoded context is inserted into a prompt for a large language model (e.g., a GPT‑4‑class model). The prompt explicitly asks the model to:
- Write a concise weather narrative for the target region and period.
- Output 3‑5 “weather keywords” that capture the most salient phenomena (e.g., cold‑front, heavy‑rain, gusty‑winds).
Keyword‑Based Consistency Checks – After generation, a lightweight rule‑based verifier cross‑references the keywords with the original structured data. If mismatches are detected (e.g., a keyword “snow” but no snowfall in the data), the system can request a regeneration or flag the report for human review.
Evaluation – The authors compare the hierarchical system against a flat baseline (single‑scale LLM) using both automatic metrics (BLEU, ROUGE) and human expert ratings for clarity, factuality, and usefulness.
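
The steps above can be sketched in a few lines, at least for the parts that need no model. The example below aggregates hourly records into 6‑hourly and daily blocks, then runs a rule‑based keyword check against the daily aggregate. The synthetic data, the `RULES` table, and its thresholds are assumptions made for illustration, and the sketch deliberately leaves out the transformer encoder and the LLM call.

```python
from statistics import mean

# Hypothetical hourly records: (hour index, temperature in °C, precipitation in mm).
hourly = [(h, 10.0 - 0.25 * h, 1.0 if 6 <= h < 12 else 0.0) for h in range(24)]


def aggregate(records, window):
    """Collapse hourly records into blocks of `window` hours (mean temp, total precip)."""
    blocks = []
    for start in range(0, len(records), window):
        chunk = records[start:start + window]
        blocks.append({
            "start_hour": chunk[0][0],
            "mean_temp_c": round(mean(r[1] for r in chunk), 2),
            "precip_mm": sum(r[2] for r in chunk),
        })
    return blocks


six_hourly = aggregate(hourly, 6)
daily = aggregate(hourly, 24)

# Hypothetical rules mapping a keyword to a check on the daily aggregate.
# Thresholds are illustrative and not taken from the paper.
RULES = {
    "heavy-rain": lambda day: day["precip_mm"] >= 5.0,
    "dry": lambda day: day["precip_mm"] == 0.0,
    "freezing": lambda day: day["mean_temp_c"] <= 0.0,
}


def verify_keywords(keywords, day):
    """Return keywords the data contradicts; keywords without a rule are flagged too."""
    return [k for k in keywords if k not in RULES or not RULES[k](day)]


mismatches = verify_keywords(["heavy-rain", "freezing"], daily[0])
print(six_hourly[1]["precip_mm"], daily[0]["precip_mm"], mismatches)
```

A non‑empty `mismatches` list is the point where a system like the one described could request a regeneration or route the report to human review.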

Results & Findings

Metric	Hierarchical AI‑Meteorologist	Flat LLM Baseline
BLEU (forecast text)	0.42	0.31
ROUGE‑L (summary quality)	0.58	0.44
Keyword‑data alignment (% correct)	93	71
Human expert rating, clarity (1‑5)	4.6	3.8
Human expert rating, factual consistency (1‑5)	4.7	3.9
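
For readers who want the gaps in relative terms, the short script below computes the percentage improvement of the hierarchical system over the flat baseline for each row of the table. These relative figures are plain arithmetic on the table values, not numbers reported in this summary, and the metrics sit on different scales, so the percentages should not be compared across rows.

```python
# Scores copied from the table above: (hierarchical, flat baseline).
scores = {
    "BLEU": (0.42, 0.31),
    "ROUGE-L": (0.58, 0.44),
    "Keyword-data alignment (%)": (93.0, 71.0),
    "Clarity (1-5)": (4.6, 3.8),
    "Factual consistency (1-5)": (4.7, 3.9),
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


gains = {name: round(relative_gain(h, f), 1) for name, (h, f) in scores.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```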

The hierarchical model consistently produced more accurate and coherent narratives, especially for multi‑day forecasts where trend aggregation matters.
Keyword extraction proved a reliable “semantic anchor”: keyword‑data alignment rose from 71 % to 93 % relative to the flat baseline, and the verification step caught 87 % of factual errors before they reached the end user.
Qualitative feedback highlighted that developers found the keyword list useful for downstream automation (e.g., triggering alerts or populating UI widgets).

Practical Implications

Automated Weather Services – Companies that provide weather APIs can embed the hierarchical agent to generate ready‑to‑publish text, reducing manual editorial effort.
Alert & Notification Systems – The concise keyword set can feed directly into rule‑based alert pipelines (e.g., “if heavy‑rain appears, send flood warning”).
Localization & Accessibility – Because the LLM produces natural language, the same pipeline can be re‑prompted for different languages or simplified summaries for non‑technical audiences.
Explainable AI Audits – The keyword‑anchored validation offers a transparent audit trail, which can support regulatory or compliance review of AI‑generated content.
Edge Deployment – The hierarchical encoder is lightweight enough to run on edge servers close to data sources, enabling near‑real‑time forecast generation for IoT devices (smart agriculture, autonomous drones, etc.).
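
As a rough illustration of the alerting idea, the sketch below maps report keywords to alerts with a simple lookup. The `ALERT_RULES` table, the alert names, and the `alerts_for` helper are hypothetical; a production pipeline would presumably add severity levels, regions, and deduplication across reports.

```python
# Hypothetical keyword-to-alert mapping; names and messages are illustrative.
ALERT_RULES = {
    "heavy-rain": "Flood warning",
    "gusty-winds": "Wind advisory",
    "cold-front": "Temperature drop notice",
}


def alerts_for(keywords):
    """Return alerts triggered by a report's keywords, in keyword order, without duplicates."""
    triggered = []
    for keyword in keywords:
        alert = ALERT_RULES.get(keyword.strip().lower())
        if alert and alert not in triggered:
            triggered.append(alert)
    return triggered


print(alerts_for(["Heavy-Rain", "mild", "gusty-winds"]))
```

Keywords with no matching rule (such as "mild" here) are simply ignored, so new keywords from the model cannot trigger an alert by accident.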

Limitations & Future Work

Model Dependency – The quality hinges on the underlying LLM; smaller or open‑source models may not match the reported performance without fine‑tuning.
Geographic Scope – Experiments focused on mid‑latitude regions with dense observation networks; performance in data‑sparse areas (e.g., oceans, remote polar zones) remains untested.
Keyword Granularity – Fixed‑size keyword lists may miss nuanced phenomena; future work could explore hierarchical keyword trees or dynamic length selection.
Real‑Time Constraints – While the encoder is efficient, the full LLM inference can still be latency‑heavy for ultra‑low‑latency applications; model distillation or caching strategies are suggested next steps.

Overall, the Hierarchical AI‑Meteorologist shows a promising path toward trustworthy, explainable AI‑driven weather reporting, bridging the gap between raw meteorological data and developer‑friendly, actionable insights.

Authors

Daniil Sukhorukov
Andrei Zakharov
Nikita Glazkov
Katsiaryna Yanchanka
Vladimir Kirilin
Maxim Dubovitsky
Roman Sultimov
Yuri Maksimov
Ilya Makarov

Paper Information

arXiv ID: 2511.23387v1
Categories: cs.AI
Published: November 28, 2025

