[Paper] Meteorological data and Sky Images meets Neural Models for Photovoltaic Power Forecasting

Published: February 17, 2026 at 01:14 PM EST
5 min read

Source: arXiv - 2602.15782v1

Overview

The paper presents a hybrid deep‑learning framework that fuses sky‑camera images, historical photovoltaic (PV) output, and conventional meteorological measurements to boost short‑ and long‑term solar power forecasts. By tackling the notoriously erratic “ramp” events that occur under cloudy skies, the authors show how a multimodal data strategy can make PV predictions more reliable for grid operators and renewable‑energy developers.

Key Contributions

  • Multimodal forecasting architecture that jointly processes sky images, PV time‑series, and weather variables (e.g., surface long‑wave radiation, wind speed, solar position).
  • Two‑stage experimental design covering both nowcasting (minutes‑ahead) and day‑ahead forecasting, demonstrating the approach works across time horizons.
  • Empirical evidence that adding specific meteorological features, especially downward long‑wave radiation and a combined wind + solar‑position vector, markedly improves ramp‑event detection on cloudy days.
  • Interpretability analysis showing how each data modality contributes to the final prediction, helping practitioners understand model behavior.
  • Open‑source implementation (released with the paper) that can be adapted to other PV sites with minimal re‑training.

Methodology

1. Data Collection

  • Sky images captured every few minutes by a fisheye camera installed at the PV plant.
  • PV power history (5‑minute resolution) from the inverter.
  • Meteorological variables from a nearby weather station: surface long‑wave radiation, short‑wave radiation, temperature, wind speed/direction, and an analytically computed solar‑position vector (azimuth/elevation).
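The solar‑position vector mentioned above can be computed analytically from time and location alone. The summary does not give the authors' exact formulas; the sketch below uses a standard low‑precision approximation (Cooper‑style declination plus the usual spherical‑trigonometry elevation/azimuth relations), which is sufficient to illustrate the feature.

```python
# Hedged sketch: analytic solar-position vector (elevation/azimuth) from day,
# local solar time, and latitude. Formula choice is an assumption, not the
# paper's exact method.
import math

def solar_position(day_of_year: int, solar_hour: float, latitude_deg: float):
    """Return (elevation_deg, azimuth_deg); azimuth is clockwise from north."""
    # Solar declination in degrees (simple seasonal approximation)
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    # Hour angle: 0 deg at solar noon, 15 deg per hour
    hour_angle = 15.0 * (solar_hour - 12.0)

    phi, d, h = map(math.radians, (latitude_deg, decl, hour_angle))
    sin_el = math.sin(phi) * math.sin(d) + math.cos(phi) * math.cos(d) * math.cos(h)
    elevation = math.asin(max(-1.0, min(1.0, sin_el)))

    cos_az = (math.sin(d) - math.sin(elevation) * math.sin(phi)) / (
        math.cos(elevation) * math.cos(phi)
    )
    azimuth = math.acos(max(-1.0, min(1.0, cos_az)))
    if hour_angle > 0:  # afternoon: sun has passed due south (N hemisphere)
        azimuth = 2 * math.pi - azimuth
    return math.degrees(elevation), math.degrees(azimuth)
```

At solar noon near the equinox at 40° latitude, this gives an elevation near 50° and an azimuth near 180° (due south), as expected.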

2. Pre‑processing

  • Images are resized and normalized; a convolutional backbone (ResNet‑18) extracts visual features.
  • Time‑series data are windowed (e.g., past 30 min) and fed into a 1‑D CNN/LSTM stack.
  • Meteorological variables are concatenated as a dense feature vector.
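The windowing step above can be sketched as follows: at 5‑minute resolution, a 30‑minute lookback window is 6 samples, and the target is the reading one horizon step ahead. Function and parameter names here are illustrative, not taken from the paper's code.

```python
# Hedged sketch of time-series windowing: past `window` PV samples -> one
# training input, with the value `horizon` steps ahead as the target.
import numpy as np

def make_windows(series: np.ndarray, window: int = 6, horizon: int = 1):
    """Slice a 1-D power series into (inputs, targets) pairs."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t : t + window])            # past `window` samples
        y.append(series[t + window + horizon - 1])  # value `horizon` steps ahead
    return np.stack(X), np.array(y)

power = np.arange(10, dtype=float)  # toy 5-minute PV series
X, y = make_windows(power, window=6, horizon=1)
print(X.shape, y.shape)  # (4, 6) (4,)
```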

3. Model Fusion

  • Feature vectors from the three streams are merged via a fully‑connected “fusion” layer.
  • The fused representation passes through a few dense layers to output either:
    • Nowcasting – PV power for the next 5–30 minutes.
    • Forecasting – PV power for the next 1–24 hours.
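The late‑fusion idea can be illustrated with a minimal NumPy sketch: three per‑stream feature vectors are concatenated and pushed through a fusion layer and a small dense head. The feature dimensions, layer sizes, and single‑output head below are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of the fusion stage: concatenate image, time-series, and
# weather features, then apply dense layers to predict PV power.
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

# Per-stream feature vectors (e.g., from the image backbone, the 1-D
# CNN/LSTM stack, and the meteorological feature vector); batch of 4
img_feat = rng.normal(size=(4, 128))
ts_feat = rng.normal(size=(4, 64))
met_feat = rng.normal(size=(4, 16))

fused = np.concatenate([img_feat, ts_feat, met_feat], axis=1)  # shape (4, 208)

# Fusion layer + dense head producing one PV-power value per sample
w1, b1 = rng.normal(size=(208, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
power_pred = dense(relu(dense(fused, w1, b1)), w2, b2)  # shape (4, 1)
```

In a real implementation the same structure would be expressed in a deep‑learning framework so the three encoders and the fusion head train end to end.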

4. Training & Evaluation

  • Loss: Mean Absolute Error (MAE) with an auxiliary ramp‑event loss that penalizes missed rapid changes.
  • Baselines: persistence model, pure weather‑only neural net, and image‑only CNN.
  • Metrics: MAE, Root‑Mean‑Square Error (RMSE), and a custom “Ramp Detection Score”.
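One plausible reading of the training objective is MAE plus an auxiliary term that re‑penalizes errors at ramp steps, i.e., timesteps where the true power changes faster than a threshold. The threshold and weight below are illustrative assumptions; the summary does not specify the paper's exact auxiliary loss.

```python
# Hedged sketch of an MAE loss with an auxiliary ramp-event penalty.
# `ramp_threshold` (kW per step) and `ramp_weight` are assumed values.
import numpy as np

def ramp_aware_mae(y_true, y_pred, ramp_threshold=1.0, ramp_weight=2.0):
    err = np.abs(y_true - y_pred)
    mae = err.mean()
    # Flag a timestep as a ramp event if true power jumps more than the threshold
    is_ramp = np.abs(np.diff(y_true, prepend=y_true[0])) > ramp_threshold
    aux = err[is_ramp].mean() if is_ramp.any() else 0.0
    return mae + ramp_weight * aux

y_true = np.array([5.0, 5.1, 2.0, 1.9])  # kW; sharp cloud-induced drop
y_pred = np.array([5.0, 5.0, 4.5, 2.0])  # forecast that misses the ramp
print(ramp_aware_mae(y_true, y_pred))
```

Because the auxiliary term averages error only over ramp steps, a forecast that tracks calm periods well but misses the drop is penalized much more heavily than under plain MAE.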

The pipeline is deliberately modular, allowing developers to swap in different image encoders or weather sensors without redesigning the whole system.

Results & Findings

| Scenario | Baseline | Hybrid Model | % Improvement |
|---|---|---|---|
| Nowcasting (15 min), MAE (kW) | 2.8 | 2.1 | 25 % |
| Day‑ahead (6 h), MAE (kW) | 5.4 | 4.2 | 22 % |
| Ramp‑event detection, F1 | 0.61 | 0.78 | 28 % |

  • Cloudy days: The hybrid model reduced error by up to 35 % compared with image‑only or weather‑only baselines, confirming that the combined data sources capture complementary cues (e.g., cloud motion from images + thermal inertia from long‑wave radiation).
  • Feature importance (via SHAP analysis): Surface long‑wave radiation and the wind + solar‑position vector were the top contributors, especially for forecasts beyond 1 hour.
  • Robustness: When a subset of sensors failed (e.g., missing wind data), performance degraded gracefully, indicating the model can fall back on the remaining modalities.
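The ramp‑detection F1 reported in the table can be computed once ramp events are defined, for example by thresholding the step‑to‑step power change in both the true and predicted series. The threshold and this exact event definition are assumptions; the paper's "Ramp Detection Score" may differ in detail.

```python
# Hedged sketch of a ramp-detection F1: flag ramps where |step change| exceeds
# a threshold, then compare true vs. predicted flags.
import numpy as np

def ramp_f1(y_true, y_pred, threshold=1.0):
    true_ramp = np.abs(np.diff(y_true)) > threshold
    pred_ramp = np.abs(np.diff(y_pred)) > threshold
    tp = np.sum(true_ramp & pred_ramp)
    fp = np.sum(~true_ramp & pred_ramp)
    fn = np.sum(true_ramp & ~pred_ramp)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = np.array([5.0, 5.0, 2.0, 2.0, 4.5, 4.5])  # two true ramps
y_pred = np.array([5.0, 4.8, 2.2, 2.1, 4.4, 4.3])  # both ramps predicted
print(ramp_f1(y_true, y_pred))
```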

Practical Implications

  • Grid operators can rely on more accurate ramp‑event forecasts to schedule reserve generation, reducing the need for costly spinning reserves.
  • PV plant owners gain better insight into expected output, enabling smarter market bidding and storage‑dispatch strategies.
  • Developers of solar‑forecasting APIs can adopt the multimodal fusion pattern to differentiate their services, especially in regions with frequent cloud cover.
  • Edge deployment: Because the image encoder can be pruned to run on low‑power devices (e.g., NVIDIA Jetson), the whole pipeline can be hosted on‑site, delivering near‑real‑time predictions without heavy bandwidth usage.
  • Data‑driven planning: The interpretability layer helps engineers pinpoint which sensors add the most value, guiding future instrumentation investments (e.g., adding a long‑wave radiometer may be cheaper than installing more cameras).

Limitations & Future Work

  • Site specificity: The model was trained on a single Mediterranean‑climate PV plant; transferability to vastly different climates (e.g., desert or high‑latitude) needs validation.
  • Image quality dependency: Heavy rain or fog can obscure sky cameras, reducing the visual signal; the authors suggest integrating satellite imagery as a backup.
  • Computational cost: While feasible on modern edge hardware, the full fusion model still requires GPU acceleration for optimal latency, which may be a barrier for very low‑cost installations.

Future Directions

  • Extending the framework to probabilistic forecasts (prediction intervals).
  • Exploring self‑supervised pre‑training on large unlabeled sky‑image datasets to reduce the need for site‑specific labeled data.
  • Incorporating numerical weather prediction (NWP) outputs to further stretch the forecasting horizon beyond 24 hours.

Bottom line: By weaving together what the sky looks like, what the weather says, and how the plant has behaved, this research delivers a concrete recipe for more dependable solar power forecasts—an advance that could translate into smoother grid operations and better economics for the renewable‑energy sector.

Authors

  • Ines Montoya‑Espinagosa
  • Antonio Agudo

Paper Information

  • arXiv ID: 2602.15782v1
  • Categories: cs.CV
  • Published: February 17, 2026