[Paper] SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation

Published: March 6, 2026 at 01:59 PM EST
4 min read
Source: arXiv

Overview

The paper presents SCOPE, a plug‑and‑play framework that markedly improves incremental few‑shot (IFS) segmentation for 3‑D point clouds. By re‑using “background” points that already exist in the base‑training scenes, SCOPE enriches class prototypes without retraining the backbone, delivering state‑of‑the‑art accuracy while keeping catastrophic forgetting in check.

Key Contributions

  • Background‑guided prototype enrichment: extracts high‑confidence pseudo‑instances from unlabeled background regions to build a reusable prototype pool.
  • Plug‑and‑play design: works with any prototype‑based 3‑D segmentation model; no extra parameters or backbone fine‑tuning required.
  • Incremental few‑shot learning: when a new class arrives with only a handful of annotated points, SCOPE fuses its few‑shot prototypes with relevant background prototypes, yielding richer class representations.
  • Strong empirical gains: on the ScanNet and S3DIS datasets, novel‑class IoU improves by up to +6.98 percentage points and mean IoU by up to +2.25, with minimal forgetting of base classes.
  • Open‑source implementation: code released at https://github.com/Surrey-UP-Lab/SCOPE, facilitating reproducibility and adoption.

Methodology

  1. Base training – A standard prototype‑based 3‑D segmentation network (e.g., PointNet++, KPConv) is trained on a set of base categories using full supervision.
  2. Background mining – After base training, a class‑agnostic segmentation head runs over the same scenes, flagging high‑confidence regions that were originally labeled as “background”. These regions are clustered into pseudo‑instances and each instance is turned into a background prototype. All prototypes are stored in a lightweight pool.
  3. Few‑shot adaptation – When a novel class appears, the developer provides only a few annotated point clouds. The model extracts few‑shot prototypes from these samples.
  4. Prototype enrichment – For each novel class, SCOPE queries the background pool for prototypes that are geometrically or semantically similar (e.g., using cosine similarity). The retrieved background prototypes are merged (e.g., weighted averaging) with the few‑shot prototypes, producing an enriched prototype that captures both the scarce labeled data and the richer context already observed in the scene.
  5. Inference – The enriched prototypes replace the original few‑shot prototypes in the classifier head; the backbone remains frozen, so inference speed and memory footprint stay unchanged.
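Steps 2–4 above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors’ released code: the function name, the `top_k` retrieval size, and the blend weight `alpha` are assumptions; the paper describes only cosine‑similarity retrieval and a weighted merge in general terms.

```python
import numpy as np

def enrich_prototype(fs_proto, pool, top_k=3, alpha=0.7):
    """Merge a few-shot prototype with its nearest background
    prototypes from the pool (hypothetical helper, not the
    paper's exact implementation)."""
    fs = np.asarray(fs_proto, dtype=np.float64)
    pool = np.asarray(pool, dtype=np.float64)
    # Cosine similarity between the few-shot prototype and every
    # background prototype stored in the pool.
    sims = pool @ fs / (
        np.linalg.norm(pool, axis=1) * np.linalg.norm(fs) + 1e-8
    )
    # Retrieve the top-k most similar background prototypes.
    idx = np.argsort(sims)[-top_k:]
    retrieved = pool[idx]
    # Similarity-weighted average of the retrieved prototypes.
    weights = sims[idx] / (sims[idx].sum() + 1e-8)
    bg_proto = (weights[:, None] * retrieved).sum(axis=0)
    # Blend: alpha keeps the scarce labeled evidence dominant while
    # the background context fills in the rest.
    return alpha * fs + (1.0 - alpha) * bg_proto
```

Because the enrichment is a lookup plus a weighted average, adding a novel class costs only one pass over the pool, which matches the constant‑time‑lookup claim in the results below.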

Results & Findings

Dataset   Metric                  Baseline (no SCOPE)   SCOPE
ScanNet   Novel‑class IoU         48.3%                 55.3% (+6.98)
ScanNet   Mean IoU (all classes)  61.2%                 63.5% (+2.25)
S3DIS     Novel‑class IoU         42.1%                 45.7% (+3.61)
S3DIS     Mean IoU                58.4%                 60.1% (+1.70)
  • Low forgetting: Base‑class IoU drops by less than 1% compared to the fully trained baseline, confirming that freezing the backbone and enriching prototypes does not erode previously learned knowledge.
  • Scalability: Adding new classes incurs only a small constant‑time lookup in the prototype pool; the method scales linearly with the number of novel categories.
  • Robustness: Experiments with varying numbers of few‑shot samples (1‑5) show consistent gains, indicating that the background pool compensates for extreme label scarcity.

Practical Implications

  • Rapid product updates: Robotics or AR/VR platforms can incorporate new object categories on‑device with just a few annotated scans, avoiding costly full‑retraining pipelines.
  • Edge deployment: Since SCOPE does not modify the backbone or increase model size, it fits comfortably on GPUs/NPUs with limited memory, making it suitable for autonomous drones, handheld LiDAR scanners, or smart glasses.
  • Data‑efficient pipelines: Developers can leverage existing scene datasets (e.g., indoor scans) as a “free” source of background prototypes, reducing the need for exhaustive labeling of every possible object.
  • Modular integration: Any existing prototype‑based 3‑D segmentation codebase can be upgraded by adding the SCOPE module, accelerating adoption in open‑source projects and commercial SDKs.
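To make the plug‑and‑play point concrete, here is a minimal sketch of a standard prototype‑based classification head, assuming per‑point features from a frozen backbone. The function name and shapes are illustrative assumptions, not an API from the paper; the key observation is that swapping enriched prototypes in for the originals touches only the `prototypes` array.

```python
import numpy as np

def classify_points(point_feats, prototypes):
    """Assign each point feature to its nearest prototype by cosine
    similarity -- the generic prototype-based head that SCOPE-style
    enrichment plugs into. Illustrative sketch, not the paper's code."""
    # L2-normalize features and prototypes so the dot product
    # equals cosine similarity.
    f = point_feats / (
        np.linalg.norm(point_feats, axis=1, keepdims=True) + 1e-8
    )
    p = prototypes / (
        np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8
    )
    # Each point gets the label of its most similar prototype.
    return np.argmax(f @ p.T, axis=1)
```

Since the backbone and this head are untouched, upgrading an existing codebase amounts to replacing the prototype array before inference, which is why no extra parameters or fine‑tuning are needed.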

Limitations & Future Work

  • Dependence on background quality: The enrichment relies on the class‑agnostic model’s ability to generate reliable pseudo‑instances; noisy background prototypes could degrade performance in highly cluttered scenes.
  • Prototype similarity metric: Current cosine‑similarity retrieval may miss subtle semantic cues; learning a more expressive similarity function could further boost enrichment.
  • Extension beyond indoor scans: The paper focuses on indoor datasets (ScanNet, S3DIS). Applying SCOPE to outdoor LiDAR (e.g., autonomous driving) may require handling larger scale variations and dynamic objects.
  • Continual learning beyond few‑shot: Future work could explore how to update the background prototype pool incrementally as new scenes are collected, enabling truly lifelong learning without manual re‑mining.

Authors

  • Vishal Thengane
  • Zhaochong An
  • Tianjin Huang
  • Son Lam Phung
  • Abdesselam Bouzerdoum
  • Lu Yin
  • Na Zhao
  • Xiatian Zhu

Paper Information

  • arXiv ID: 2603.06572v1
  • Categories: cs.CV, cs.LG
  • Published: March 6, 2026