[Paper] SortWaste: A Densely Annotated Dataset for Object Detection in Industrial Waste Sorting

Published: January 5, 2026 at 12:34 PM EST
3 min read
Source: arXiv - 2601.02299v1

Overview

The paper introduces SortWaste, a large‑scale, densely annotated dataset of real‑world waste images captured inside a Material Recovery Facility (MRF). By pairing the dataset with a new “hardness” metric called ClutterScore, the authors expose how current object‑detection models struggle with the chaotic visual conditions typical of industrial sorting lines—an insight that could steer the next generation of AI‑driven recycling solutions.

Key Contributions

  • SortWaste dataset: ≈ 30 k high‑resolution images with over 400 k bounding‑box annotations covering 13 common waste categories (plastics, metals, paper, etc.).
  • ClutterScore metric: Quantifies scene difficulty using object count, class‑entropy, size‑entropy, and spatial overlap, enabling systematic analysis of model performance across clutter levels.
  • Comprehensive benchmark: Evaluation of several state‑of‑the‑art detectors (Faster R‑CNN, YOLOv8, DETR, etc.) on both the full dataset and a “plastic‑only” subset, reporting mAP, recall, and ClutterScore‑conditioned results.
  • Open‑source release: Dataset, annotation tools, and evaluation scripts are publicly available under a permissive license, encouraging reproducibility and community contributions.
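
The summary does not specify the released annotation format. As a minimal sketch, assuming the labels ship as COCO‑style JSON (the file path below is a placeholder, not the actual name from the SortWaste release), per‑class box counts could be inspected with pycocotools:

```python
# Assumed layout: a COCO-style JSON annotation file; path is a placeholder.
from collections import Counter

from pycocotools.coco import COCO

coco = COCO("annotations/sortwaste_train.json")  # hypothetical path

# Map category ids to the 13 material names (plastics, metals, paper, ...).
cat_names = {c["id"]: c["name"] for c in coco.loadCats(coco.getCatIds())}

# Count bounding boxes per class to gauge class balance.
box_counts = Counter(cat_names[a["category_id"]] for a in coco.loadAnns(coco.getAnnIds()))

print(f"{len(coco.getImgIds())} images, {sum(box_counts.values())} boxes")
for name, count in box_counts.most_common():
    print(f"{name:>12s}: {count}")
```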

Methodology

  1. Data collection – Cameras were mounted over a conveyor belt inside an operational MRF, capturing continuous streams of mixed waste under realistic lighting and motion blur.
  2. Annotation pipeline – Trained annotators used a custom labeling UI to draw tight bounding boxes and assign one of the predefined material classes. Overlap handling was enforced to ensure dense coverage.
  3. ClutterScore design – Four proxies are computed per image:
    • Object count (more objects → higher score)
    • Class entropy (diverse material mix)
    • Size entropy (wide range of object scales)
    • Spatial overlap (degree of occlusion)
      These proxies are normalized and combined into a single scalar ranging from 0 (very clean) to 1 (extremely cluttered); a minimal computation sketch follows this list.
  4. Model training & evaluation – Standard training recipes (COCO‑style augmentation, AdamW optimizer) were applied to each detector. Performance was measured with mean Average Precision (mAP) and stratified by ClutterScore bins (low, medium, high).
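
The paper's exact normalization and weighting for ClutterScore are not spelled out in this summary, so the sketch below only illustrates one plausible way to turn the four proxies into a 0–1 score; the object‑count cap, the log‑area binning, the IoU threshold, and the equal‑weight average are all assumptions.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU matrix for axis-aligned boxes given as [x1, y1, x2, y2]."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = areas[:, None] + areas[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def clutter_score(boxes, labels, num_classes=13, max_objects=100):
    """Toy ClutterScore: equal-weight mean of four normalized proxies.

    boxes:  (N, 4) float array of [x1, y1, x2, y2]
    labels: (N,)   int array of class ids in [0, num_classes)
    """
    n = len(boxes)
    if n == 0:
        return 0.0

    # 1) Object count, capped and scaled to [0, 1] (cap is an assumed constant).
    count_term = min(n, max_objects) / max_objects

    # 2) Class entropy, normalized by the maximum entropy log(num_classes).
    p = np.bincount(labels, minlength=num_classes) / n
    p = p[p > 0]
    class_term = float(-(p * np.log(p)).sum() / np.log(num_classes))

    # 3) Size entropy over log-area bins (the 8-bin histogram is an assumption).
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    hist, _ = np.histogram(np.log(areas + 1e-6), bins=8)
    q = hist[hist > 0] / n
    size_term = float(-(q * np.log(q)).sum() / np.log(8))

    # 4) Spatial overlap: fraction of box pairs with IoU above a small threshold.
    pairs = n * (n - 1) / 2
    overlap_term = float((np.triu(pairwise_iou(boxes), 1) > 0.1).sum() / pairs) if pairs else 0.0

    return float(np.mean([count_term, class_term, size_term, overlap_term]))
```

With per‑image scores in hand, images can then be bucketed into the low/medium/high bins used in the evaluation below.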

Results & Findings

| Detector | Overall mAP (all classes) | Plastic‑only mAP | mAP (high ClutterScore) |
| --- | --- | --- | --- |
| Faster R‑CNN (ResNet‑50) | 48.2 % | 59.7 % | 31.4 % |
| YOLOv8 (large) | 51.5 % | 62.3 % | 34.0 % |
| DETR (ResNet‑101) | 45.9 % | 57.1 % | 28.7 % |
  • Performance drops sharply as clutter rises: the best‑performing model loses ~30 pp of mAP between low‑ and high‑clutter scenes (see the stratified‑evaluation sketch after this list).
  • Plastic detection is easier than the full multi‑class task, likely because plastics dominate the visual texture and exhibit less intra‑class variation.
  • Error analysis shows most failures stem from heavy occlusion and small objects (< 30 px), confirming the relevance of the ClutterScore components.
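
As a rough illustration of the stratified reporting above, per‑image results can be grouped by ClutterScore bin. The 0.33/0.66 bin edges, the placeholder numbers, and the per‑image AP averaging are assumptions; a proper mAP would be computed jointly over all detections within each bin.

```python
import numpy as np

# Hypothetical per-image records: (clutter_score, per_image_AP); values are placeholders.
records = [(0.12, 0.71), (0.25, 0.66), (0.48, 0.55), (0.55, 0.49), (0.81, 0.33), (0.90, 0.28)]

scores = np.array([r[0] for r in records])
aps = np.array([r[1] for r in records])

# Assumed bin edges: low < 0.33 <= medium < 0.66 <= high.
bin_idx = np.digitize(scores, [0.33, 0.66])

for name, b in zip(["low", "medium", "high"], range(3)):
    mask = bin_idx == b
    if mask.any():
        print(f"{name:>6s} clutter: mean AP = {aps[mask].mean():.3f} over {mask.sum()} images")
```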

Practical Implications

  • Robotics & automation: Companies building robotic arms for sorting can use SortWaste to pre‑train perception modules that are already exposed to realistic occlusions and size variations, reducing the “simulation‑to‑real” gap.
  • Edge deployment: The benchmark highlights which architectures retain acceptable accuracy under high clutter while staying within typical edge‑device constraints (e.g., YOLOv8‑large on NVIDIA Jetson).
  • Process optimization: Facility managers can compute ClutterScore on live camera feeds to trigger adaptive sorting strategies—e.g., slowing the belt or invoking a secondary inspection station when the score exceeds a threshold (a minimal control‑loop sketch follows this list).
  • Regulatory compliance & reporting: Accurate material classification supports automated reporting for waste‑diversion targets, helping firms meet ESG (Environmental, Social, Governance) mandates.
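
A minimal control‑loop sketch of the process‑optimization idea above, assuming the `clutter_score` helper from the Methodology sketch; the camera/detector/belt interfaces, the 0.7 threshold, and the polling interval are all hypothetical, not an API from the paper or its release.

```python
import time

# Both the threshold and the polling interval are illustrative values.
CLUTTER_THRESHOLD = 0.7
CHECK_INTERVAL_S = 2.0


def control_loop(camera, detector, belt):
    """Slow the belt / request inspection when the live ClutterScore spikes.

    `camera`, `detector`, and `belt` are placeholder interfaces, not a real API;
    `clutter_score` is the sketch defined in the Methodology section above.
    """
    while True:
        frame = camera.read()
        boxes, labels = detector.predict(frame)      # hypothetical detector call
        score = clutter_score(boxes, labels)
        if score > CLUTTER_THRESHOLD:
            belt.set_speed(0.5)                      # slow the line (placeholder)
            belt.request_inspection(frame, score)    # secondary station (placeholder)
        else:
            belt.set_speed(1.0)
        time.sleep(CHECK_INTERVAL_S)
```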

Limitations & Future Work

  • Geographic bias – All images come from a single MRF in Portugal; waste composition may differ in other regions (e.g., higher glass content).
  • Static camera viewpoint – The dataset does not cover multi‑angle or 3‑D sensing (depth, LiDAR), which could aid occlusion handling.
  • Class granularity – Some categories (e.g., “plastic”) are broad; finer sub‑classes (PET vs. HDPE) are not distinguished, limiting recycling‑specific decisions.
  • Future directions suggested by the authors include expanding the dataset to multiple facilities, integrating depth sensors, and exploring transformer‑based detectors that explicitly model object‑object interactions to mitigate clutter‑induced errors.

Authors

  • Sara Inácio
  • Hugo Proença
  • João C. Neves

Paper Information

  • arXiv ID: 2601.02299v1
  • Categories: cs.CV
  • Published: January 5, 2026
  • PDF: Download PDF