[Paper] Reservoir property image slices from the Groningen gas field for image translation and segmentation
Source: arXiv - 2605.03942v1
Overview
The authors release a high‑resolution, open‑source image dataset extracted from the Groningen gas field’s static geological model. By turning 3‑D reservoir grids into aligned 2‑D PNG slices of facies, porosity, permeability, and water saturation, they give the ML and geoscience communities a ready‑to‑use benchmark for image‑to‑image translation, segmentation, and other deep‑learning tasks.
Key Contributions
- Comprehensive image corpus – more than 10,000 PNG slices covering four core reservoir properties, all spatially aligned and ready for pixel‑wise analysis.
- Reproducible processing pipeline – archived scripts (Python/NumPy, GDAL, PyTorch) for data augmentation, mask creation, and paired‑image generation.
- Baseline experiments – reference implementations for semantic segmentation (U‑Net) and image‑to‑image translation (Pix2Pix) with reported performance metrics.
- Cross‑domain research enablement – the dataset supports studies of how one property (e.g., porosity) can be inferred from another (e.g., facies) using generative models.
- Open licensing & documentation – fully documented dataset and code under a permissive license, encouraging community contributions and reproducibility.
Methodology
- Data extraction – The authors start from the 3‑D Groningen static model (grid cells with numerical property values).
- Slice generation – For each slicing plane (X‑Y, X‑Z, Y‑Z), they rasterize the grid into 2‑D images, mapping scalar values to grayscale and facies classes to a categorical colormap.
- Alignment & formatting – All property images share identical pixel dimensions and coordinate systems, enabling direct pixel‑wise operations.
- Augmentation & pairing – Using a provided workflow, they apply rotations, flips, and noise, split the images into training and validation sets, and construct paired datasets (e.g., facies ↔ porosity).
- Baseline modeling – A simple CNN‑based U‑Net for segmentation and a conditional GAN (Pix2Pix) for translation are trained on the paired data to demonstrate feasibility and to set performance baselines.
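The slice‑generation step can be illustrated with a minimal NumPy sketch. This is not the authors' archived script: the array `porosity`, its shape, the value range, and the helper names `to_uint8` and `slices_along` are all assumptions chosen for illustration, and the sketch covers only continuous properties (facies would need a categorical colormap instead).

```python
import numpy as np

def to_uint8(values, vmin, vmax):
    """Linearly map scalar values in [vmin, vmax] to 8-bit grayscale."""
    scaled = (np.clip(values, vmin, vmax) - vmin) / (vmax - vmin)
    return np.round(scaled * 255).astype(np.uint8)

def slices_along(volume, axis):
    """Yield every 2-D slice of a 3-D array perpendicular to `axis`."""
    for index in range(volume.shape[axis]):
        yield np.take(volume, index, axis=axis)

# Hypothetical stand-in for one property grid, shape (nx, ny, nz).
rng = np.random.default_rng(0)
porosity = rng.uniform(0.0, 0.35, size=(8, 6, 4))

# Assumed fixed value range, shared by all slices so gray levels stay comparable.
vmin, vmax = 0.0, 0.35

xy_slices = [to_uint8(s, vmin, vmax) for s in slices_along(porosity, axis=2)]
xz_slices = [to_uint8(s, vmin, vmax) for s in slices_along(porosity, axis=1)]
yz_slices = [to_uint8(s, vmin, vmax) for s in slices_along(porosity, axis=0)]
```

Using one value range per property, rather than rescaling each slice separately, keeps a given gray level tied to the same physical value across the whole corpus. Each resulting array could then be written to disk as a PNG, for example with Pillow's `Image.fromarray`.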
The pipeline is deliberately modular: developers can swap in their own models, add new augmentations, or extend the dataset with additional properties.
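As an example of the kind of augmentation that could be swapped in, the sketch below applies one random rotation and flip to both images of a pair, so the pixel‑wise alignment described above is preserved. It is a hedged illustration rather than the released workflow: the function name `augment_pair`, the `noise_std` parameter, and the toy arrays are assumptions.

```python
import numpy as np

def augment_pair(source, target, rng, noise_std=0.0):
    """Apply the same random rotation/flip to both images of a pair.

    Optional Gaussian noise is added to the source image only, and only
    makes sense when the source is a continuous property, not a class map.
    """
    k = int(rng.integers(0, 4))       # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))   # whether to mirror left-right
    augmented = []
    for image in (source, target):
        image = np.rot90(image, k)
        if flip:
            image = np.fliplr(image)
        augmented.append(image)
    source_aug, target_aug = augmented
    if noise_std > 0:
        noisy = source_aug.astype(np.float32) + rng.normal(0.0, noise_std, source_aug.shape)
        source_aug = np.clip(np.round(noisy), 0, 255).astype(np.uint8)
    return source_aug, target_aug

# Toy pair: identical images, so matching outputs show the transform is shared.
rng = np.random.default_rng(42)
facies_img = np.arange(16, dtype=np.uint8).reshape(4, 4)
porosity_img = facies_img.copy()
facies_aug, porosity_aug = augment_pair(facies_img, porosity_img, rng)
```

Drawing the rotation and flip once per pair, instead of once per image, is what keeps a facies pixel and its porosity counterpart at the same location after augmentation.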
Results & Findings
- Segmentation – The U‑Net baseline achieved ~85% mean Intersection‑over‑Union (mIoU) on facies classification, indicating that the image quality and labeling are sufficient for modern deep‑learning models.
- Image‑to‑Image Translation – Pix2Pix predicted porosity maps from facies with a structural similarity index (SSIM) of ~0.78, indicating that cross‑property relationships are learnable from the data.
- Reproducibility – All experiments could be re‑run from the archived scripts, producing identical metrics, which validates the integrity of the dataset and workflow.
These results serve as reference points; more sophisticated architectures (e.g., Transformers, diffusion models) are expected to push performance further.
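For readers unfamiliar with the two metrics, the following sketch shows how they can be computed with NumPy. The summary does not state the exact evaluation settings, so this is an assumption‑laden illustration: `mean_iou` skips classes absent from both maps, and `global_ssim` is a simplified single‑window variant using the commonly cited constants 0.01 and 0.03, whereas standard SSIM averages over local windows (as in scikit‑image's `structural_similarity`).

```python
import numpy as np

def mean_iou(pred, truth, num_classes):
    """Mean Intersection-over-Union over classes present in pred or truth."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

def global_ssim(x, y, data_range=255.0):
    """Simplified SSIM from whole-image statistics (one window, no sliding)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2   # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizer for the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

# Toy facies maps with two classes.
truth = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, truth, num_classes=2)   # class IoUs are 1/2 and 2/3

# Toy grayscale image compared with itself.
image = np.array([[10, 50], [200, 255]], dtype=np.uint8)
ssim_same = global_ssim(image, image)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12; an image compared with itself gives an SSIM of 1.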
Practical Implications
- Rapid prototyping – Reservoir engineers and data scientists can skip the time‑consuming step of building a geological model and move straight to ML model development and hyper‑parameter tuning.
- Benchmarking – The dataset offers a common ground for comparing segmentation, super‑resolution, or generative approaches across the geoscience and computer‑vision communities.
- Cross‑property inference – Companies can explore AI‑driven “property completion” (e.g., estimating permeability where only facies data exist), potentially reducing the need for expensive well logs or core analysis.
- Educational tool – Universities can use the dataset in courses on geostatistics, reservoir simulation, or deep learning, giving students hands‑on experience with realistic subsurface data.
- Integration with existing workflows – Because the images are in standard PNG format and the code uses widely adopted libraries, they can be plugged into commercial reservoir‑modeling platforms (e.g., Schlumberger Petrel, Halliburton Landmark) via simple import scripts.
Limitations & Future Work
- Static model only – The dataset reflects a single static snapshot; dynamic properties (e.g., pressure changes over time) are not captured.
- Resolution constraints – While high for 2‑D slices, the pixel size may still be coarse for fine‑scale heterogeneity studies.
- Geographical specificity – The data originates from the Groningen field; transferability to other basins with different geology may require domain adaptation.
- Future extensions – The authors suggest adding time‑lapse (4‑D) data, incorporating seismic attributes, and expanding the benchmark to include uncertainty quantification and physics‑informed neural networks.
Overall, this open dataset lowers the barrier for applying state‑of‑the‑art computer‑vision techniques to reservoir characterization, opening new avenues for AI‑enhanced oil and gas exploration and production.
Authors
- Abdulrahman Al-Fakih
- Nabil Sariah
- Ardiansyah Koeshidayatullah
- SanLinn I. Kaka
Paper Information
- arXiv ID: 2605.03942v1
- Categories: cs.CV, cs.DB, physics.geo-ph
- Published: May 5, 2026