[Paper] PRISM-CAFO: Prior-conditioned Remote-sensing Infrastructure Segmentation and Mapping for CAFOs
Source: arXiv - 2601.11451v1
Overview
The paper introduces PRISM‑CAFO, an end‑to‑end, explainable pipeline that automatically discovers and characterises Concentrated Animal Feeding Operations (CAFOs) from high‑resolution aerial and satellite images. By combining a domain‑tuned YOLOv8 detector with SAM‑2 segmentation and a lightweight cross‑attention classifier, the authors report state‑of‑the‑art accuracy while also providing visual attributions that tie each prediction back to the underlying infrastructure (barns, lagoons, silos, etc.). The work is especially relevant as the number of large livestock facilities grows and regulators, insurers, and NGOs need scalable, trustworthy mapping tools.
Key Contributions
- Infrastructure‑first detection: A domain‑tuned YOLOv8 model first spots candidate CAFO structures (e.g., barns, feedlots, manure lagoons).
- Segmentation‑guided refinement: SAM‑2 masks are generated from the detector boxes and filtered using component‑specific geometric rules (area, orientation, spatial relations).
- Hybrid feature fusion: Structured descriptors (counts, areas, relative positions) are fused with deep visual embeddings via a lightweight spatial cross‑attention classifier.
- Explainability by design: The system outputs mask‑level attribution maps that explicitly link classification decisions to detected infrastructure elements.
- Performance boost: When paired with a Swin‑B backbone, PRISM‑CAFO outperforms the strongest baseline by up to 15 % on a nationwide CAFO benchmark.
- Domain‑prior analysis: Gradient‑activation studies quantify how much the engineered priors (e.g., “barns are rectangular”) contribute to the final predictions.
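The segmentation‑guided refinement idea can be illustrated with a minimal sketch. The `MaskStats` record, the class names, and every threshold below are hypothetical values chosen for illustration; the paper's actual rules are not reproduced here, and the spatial‑relation checks it mentions are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class MaskStats:
    """Summary of one segmented component (hypothetical record)."""
    label: str       # e.g. "barn" or "lagoon"
    area_m2: float   # mask area in square metres
    width_m: float   # short side of the mask's oriented bounding box
    length_m: float  # long side of the mask's oriented bounding box


# Illustrative per-class rules: (min area, min aspect ratio, max aspect ratio).
# These numbers are assumptions, not values reported by the authors.
RULES = {
    "barn": (200.0, 2.0, 20.0),    # barns: elongated rectangles
    "lagoon": (1000.0, 1.0, 3.0),  # lagoons: large, compact blobs
}


def aspect_ratio(m: MaskStats) -> float:
    """Long side divided by short side."""
    return m.length_m / m.width_m if m.width_m > 0 else float("inf")


def keep_mask(m: MaskStats) -> bool:
    """Return True if the mask satisfies the geometric rule for its class."""
    rule = RULES.get(m.label)
    if rule is None:
        return False  # unknown component types are discarded in this sketch
    min_area, min_ar, max_ar = rule
    return m.area_m2 >= min_area and min_ar <= aspect_ratio(m) <= max_ar


def filter_masks(masks):
    """Keep only the masks that pass their class-specific rule."""
    return [m for m in masks if keep_mask(m)]


candidates = [
    MaskStats("barn", 900.0, 15.0, 60.0),      # aspect ratio 4: kept
    MaskStats("barn", 50.0, 5.0, 10.0),        # too small: dropped
    MaskStats("lagoon", 5000.0, 50.0, 100.0),  # aspect ratio 2: kept
    MaskStats("lagoon", 5000.0, 10.0, 500.0),  # aspect ratio 50: dropped
]
kept = filter_masks(candidates)
print([m.label for m in kept])  # prints ['barn', 'lagoon']
```

Keeping the rules in a plain lookup table makes each prior easy to audit and adjust per component type, which fits the explainability goal described above.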
Methodology
- Candidate detection – A YOLOv8 detector, fine‑tuned on a curated CAFO imagery set, scans each image and outputs bounding boxes for likely infrastructure pieces.
- Mask generation & filtering – For every box, the Segment Anything Model v2 (SAM‑2) produces a pixel‑accurate mask. Simple rule‑based filters (minimum area, aspect ratio, proximity to other masks) prune false positives and enforce domain knowledge (e.g., lagoons are large, low‑aspect‑ratio blobs).
- Descriptor extraction – From the surviving masks, the pipeline computes a set of structured features:
  - Counts of each infrastructure type
  - Area and perimeter statistics
  - Orientation (principal axis)
  - Spatial relations (e.g., distance between barn and lagoon)
- Feature fusion & classification – A Swin‑B transformer extracts a global visual embedding from the whole image. This embedding is combined with the structured descriptors through a spatial cross‑attention module that lets the model attend to the most relevant infrastructure when deciding the CAFO class (e.g., dairy, swine, poultry).
- Explainable output – The cross‑attention weights are visualised as attribution masks, highlighting which barns, lagoons, or silos drove the final label.
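The fusion and attribution steps above can be sketched as a single‑head scaled dot‑product cross‑attention, with the image embedding as the query and one token per mask as keys and values. This is a toy illustration under stated assumptions, not the authors' architecture: learned projection matrices are omitted, and the 4‑dimensional vectors and mask names are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, tokens):
    """Single-head scaled dot-product cross-attention.

    query  : one image-level embedding (list of d floats)
    tokens : one descriptor token per mask (list of lists of d floats)
    Returns (fused vector, attention weights). Projection matrices are
    treated as identity to keep the sketch short.
    """
    d = len(query)
    scores = [dot(query, t) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    fused = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return fused, weights


# Toy inputs (made-up values): a 4-d image embedding and three mask tokens.
image_embedding = [1.0, 0.0, 1.0, 0.0]
mask_tokens = {
    "barn_1": [0.9, 0.1, 0.8, 0.0],
    "lagoon_1": [2.0, 0.0, 2.0, 0.1],
    "silo_1": [0.0, 1.0, 0.0, 1.0],
}

fused, weights = cross_attend(image_embedding, list(mask_tokens.values()))

# The attention weights double as per-mask attribution scores.
attribution = dict(zip(mask_tokens, weights))
top = max(attribution, key=attribution.get)
print(top)  # prints lagoon_1
```

Because the weights form a probability distribution over masks, painting each mask with its weight gives the kind of attribution overlay the summary describes.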
Results & Findings
- Accuracy: PRISM‑CAFO (Swin‑B backbone) achieves a mean Average Precision (mAP) of 0.84 against 0.73 for the previous best, a gain of 0.11 points or roughly 15 % in relative terms, on imagery spanning diverse U.S. regions.
- Robustness: Performance remains stable when tested on imagery from different sensors (e.g., PlanetScope vs. Maxar) and varying resolutions (30 cm–1 m).
- Ablation: Removing the structured descriptors drops mAP by ~6 %, confirming that domain priors add measurable value beyond raw pixels.
- Explainability: Gradient‑activation maps show that the classifier consistently focuses on the correct infrastructure masks (e.g., lagoons for swine CAFOs), providing a transparent audit trail.
- Scalability: The end‑to‑end pipeline processes a 1 km² tile in ~2.5 seconds on a single GPU, making continent‑scale mapping feasible.
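To put the scalability figure in perspective, the back‑of‑the‑envelope calculation below turns the quoted ~2.5 seconds per 1 km² tile into wall‑clock hours. The 100,000 km² region and the GPU counts are hypothetical, and the sketch assumes perfectly linear multi‑GPU scaling with no data‑loading or I/O overhead, which real deployments will not achieve.

```python
def mapping_hours(area_km2, seconds_per_km2=2.5, num_gpus=1):
    """Estimated wall-clock hours to process a region.

    Assumes one tile per square kilometre, the per-tile time quoted in the
    summary, and ideal linear scaling across GPUs (an optimistic assumption).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_seconds = area_km2 * seconds_per_km2 / num_gpus
    return total_seconds / 3600.0


# Hypothetical region of 100,000 km^2.
hours_single = mapping_hours(100_000)
hours_eight = mapping_hours(100_000, num_gpus=8)
print(f"{hours_single:.1f} h on 1 GPU, {hours_eight:.1f} h on 8 GPUs")
# prints: 69.4 h on 1 GPU, 8.7 h on 8 GPUs
```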
Practical Implications
- Regulatory monitoring – Agencies can automate the detection of unregistered or non‑compliant CAFOs, reducing the need for costly field inspections.
- Risk assessment – Insurers and public‑health officials can overlay CAFO locations with disease‑outbreak or flood‑risk maps to prioritize mitigation.
- Environmental impact studies – Researchers can quickly quantify manure‑lagoon surface area or barn density to model nutrient runoff and greenhouse‑gas emissions.
- Supply‑chain transparency – Food‑industry auditors can verify that suppliers’ facilities are correctly classified and located, supporting sustainability certifications.
- Open‑source tooling – Because the pipeline relies on widely available models (YOLOv8, SAM‑2, Swin‑B) and a modest amount of custom code, it can be adapted to other infrastructure‑mapping tasks (e.g., solar farms, mining sites).
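As an illustration of the environmental‑impact use case, the sketch below converts a binary lagoon mask into surface area and computes barn density for a tile. The function names, the tiny mask, and the ground sampling distance (GSD) are hypothetical; the calculation assumes square pixels and ignores map‑projection distortion.

```python
def mask_area_m2(mask, gsd_m):
    """Area of a binary mask: foreground pixel count times pixel footprint.

    mask  : 2-D list of 0/1 values (1 = lagoon pixel)
    gsd_m : ground sampling distance in metres per pixel (square pixels assumed)
    """
    pixels = sum(1 for row in mask for value in row if value)
    return pixels * gsd_m ** 2


def barn_density_per_km2(num_barns, tile_side_m):
    """Number of barns per square kilometre for a square tile."""
    tile_km2 = (tile_side_m / 1000.0) ** 2
    return num_barns / tile_km2


# Toy mask with 8 foreground pixels at 0.5 m per pixel.
lagoon_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
area = mask_area_m2(lagoon_mask, gsd_m=0.5)           # 8 * 0.25 = 2.0 m^2
density = barn_density_per_km2(6, tile_side_m=500.0)  # 6 / 0.25 = 24.0 per km^2
print(area, density)  # prints 2.0 24.0
```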
Limitations & Future Work
- Label scarcity – The approach still depends on a manually annotated training set of CAFO components; expanding this dataset to cover more geographic and seasonal variations would improve generalisation.
- Complex mixed‑use sites – Facilities that combine multiple animal types or have atypical layouts sometimes confuse the classifier, suggesting a need for richer relational modeling.
- Temporal dynamics – The current pipeline processes single snapshots; incorporating time‑series imagery could detect seasonal changes (e.g., temporary feedlots) and improve robustness to cloud cover.
- Edge deployment – While inference is fast on a GPU, running the full pipeline on edge devices or in low‑bandwidth environments remains an open challenge.
Overall, PRISM‑CAFO demonstrates how combining deep vision models with domain‑specific priors can deliver both high performance and interpretability, an approach that could carry over to other remote‑sensing applications.
Authors
- Oishee Bintey Hoque
- Nibir Chandra Mandal
- Kyle Luong
- Amanda Wilson
- Samarth Swarup
- Madhav Marathe
- Abhijin Adiga
Paper Information
- arXiv ID: 2601.11451v1
- Categories: cs.CV, cs.AI, cs.LG
- Published: January 16, 2026