[Paper] SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation

Published: March 6, 2026 at 01:59 PM EST
4 min read
Source: arXiv

Overview

The paper presents SCOPE, a plug‑and‑play framework that markedly improves incremental few‑shot (IFS) segmentation for 3‑D point clouds. By re‑using “background” points that already exist in the base‑training scenes, SCOPE enriches class prototypes without retraining the backbone, delivering state‑of‑the‑art accuracy while keeping catastrophic forgetting in check.

Key Contributions

  • Background‑guided prototype enrichment: extracts high‑confidence pseudo‑instances from unlabeled background regions to build a reusable prototype pool.
  • Plug‑and‑play design: works with any prototype‑based 3‑D segmentation model; no extra parameters or backbone fine‑tuning required.
  • Incremental few‑shot learning: when a new class arrives with only a handful of annotated points, SCOPE fuses its few‑shot prototypes with relevant background prototypes, yielding richer class representations.
  • Strong empirical gains: on the ScanNet and S3DIS datasets, novel‑class IoU improves by up to +6.98 percentage points and mean IoU by up to +2.25, with minimal forgetting of base classes.
  • Open‑source implementation: code released at https://github.com/Surrey-UP-Lab/SCOPE, facilitating reproducibility and adoption.

Methodology

  1. Base training – A standard prototype‑based 3‑D segmentation network (e.g., PointNet++, KPConv) is trained on a set of base categories using full supervision.
  2. Background mining – After base training, a class‑agnostic segmentation head runs over the same scenes, flagging high‑confidence regions that were originally labeled as “background”. These regions are clustered into pseudo‑instances and each instance is turned into a background prototype. All prototypes are stored in a lightweight pool.
  3. Few‑shot adaptation – When a novel class appears, the developer provides only a few annotated point clouds. The model extracts few‑shot prototypes from these samples.
  4. Prototype enrichment – For each novel class, SCOPE queries the background pool for prototypes that are geometrically or semantically similar (e.g., using cosine similarity). The retrieved background prototypes are merged (e.g., weighted averaging) with the few‑shot prototypes, producing an enriched prototype that captures both the scarce labeled data and the richer context already observed in the scene.
  5. Inference – The enriched prototypes replace the original few‑shot prototypes in the classifier head; the backbone remains frozen, so inference speed and memory footprint stay unchanged.
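Steps 2–4 above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors’ released code: the function name, the `top_k` retrieval size, and the blend weight `alpha` are assumptions; the paper describes only cosine‑similarity retrieval and a weighted merge in general terms.

```python
import numpy as np

def enrich_prototype(fs_proto, pool, top_k=3, alpha=0.7):
    """Merge a few-shot prototype with its nearest background
    prototypes from the pool (hypothetical helper, not the
    paper's exact implementation)."""
    fs = np.asarray(fs_proto, dtype=np.float64)
    pool = np.asarray(pool, dtype=np.float64)
    # Cosine similarity between the few-shot prototype and every
    # background prototype stored in the pool.
    sims = pool @ fs / (
        np.linalg.norm(pool, axis=1) * np.linalg.norm(fs) + 1e-8
    )
    # Retrieve the top-k most similar background prototypes.
    idx = np.argsort(sims)[-top_k:]
    retrieved = pool[idx]
    # Similarity-weighted average of the retrieved prototypes.
    weights = sims[idx] / (sims[idx].sum() + 1e-8)
    bg_proto = (weights[:, None] * retrieved).sum(axis=0)
    # Blend: alpha keeps the scarce labeled evidence dominant while
    # the background context fills in the rest.
    return alpha * fs + (1.0 - alpha) * bg_proto
```

Because the enrichment is a lookup plus a weighted average, adding a novel class costs only one pass over the pool, which matches the constant‑time‑lookup claim in the results below.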

Results & Findings

Dataset   Metric                  Baseline (no SCOPE)   SCOPE
ScanNet   Novel‑class IoU         48.3%                 55.3% (+6.98)
ScanNet   Mean IoU (all classes)  61.2%                 63.5% (+2.25)
S3DIS     Novel‑class IoU         42.1%                 45.7% (+3.61)
S3DIS     Mean IoU                58.4%                 60.1% (+1.70)
  • Low forgetting: Base‑class IoU drops by less than 1% compared to the fully trained baseline, confirming that freezing the backbone and enriching prototypes does not erode previously learned knowledge.
  • Scalability: Adding new classes incurs only a small constant‑time lookup in the prototype pool; the method scales linearly with the number of novel categories.
  • Robustness: Experiments with varying numbers of few‑shot samples (1‑5) show consistent gains, indicating that the background pool compensates for extreme label scarcity.

Practical Implications

  • Rapid product updates: Robotics or AR/VR platforms can incorporate new object categories on‑device with just a few annotated scans, avoiding costly full‑retraining pipelines.
  • Edge deployment: Since SCOPE does not modify the backbone or increase model size, it fits comfortably on GPUs/NPUs with limited memory, making it suitable for autonomous drones, handheld LiDAR scanners, or smart glasses.
  • Data‑efficient pipelines: Developers can leverage existing scene datasets (e.g., indoor scans) as a “free” source of background prototypes, reducing the need for exhaustive labeling of every possible object.
  • Modular integration: Any existing prototype‑based 3‑D segmentation codebase can be upgraded by adding the SCOPE module, accelerating adoption in open‑source projects and commercial SDKs.
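To make the plug‑and‑play point concrete, here is a minimal sketch of a standard prototype‑based classification head, assuming per‑point features from a frozen backbone. The function name and shapes are illustrative assumptions, not an API from the paper; the key observation is that swapping enriched prototypes in for the originals touches only the `prototypes` array.

```python
import numpy as np

def classify_points(point_feats, prototypes):
    """Assign each point feature to its nearest prototype by cosine
    similarity -- the generic prototype-based head that SCOPE-style
    enrichment plugs into. Illustrative sketch, not the paper's code."""
    # L2-normalize features and prototypes so the dot product
    # equals cosine similarity.
    f = point_feats / (
        np.linalg.norm(point_feats, axis=1, keepdims=True) + 1e-8
    )
    p = prototypes / (
        np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8
    )
    # Each point gets the label of its most similar prototype.
    return np.argmax(f @ p.T, axis=1)
```

Since the backbone and this head are untouched, upgrading an existing codebase amounts to replacing the prototype array before inference, which is why no extra parameters or fine‑tuning are needed.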

Limitations & Future Work

  • Dependence on background quality: The enrichment relies on the class‑agnostic model’s ability to generate reliable pseudo‑instances; noisy background prototypes could degrade performance in highly cluttered scenes.
  • Prototype similarity metric: Current cosine‑similarity retrieval may miss subtle semantic cues; learning a more expressive similarity function could further boost enrichment.
  • Extension beyond indoor scans: The paper focuses on indoor datasets (ScanNet, S3DIS). Applying SCOPE to outdoor LiDAR (e.g., autonomous driving) may require handling larger scale variations and dynamic objects.
  • Continual learning beyond few‑shot: Future work could explore how to update the background prototype pool incrementally as new scenes are collected, enabling truly lifelong learning without manual re‑mining.

Authors

  • Vishal Thengane
  • Zhaochong An
  • Tianjin Huang
  • Son Lam Phung
  • Abdesselam Bouzerdoum
  • Lu Yin
  • Na Zhao
  • Xiatian Zhu

Paper Information

  • arXiv ID: 2603.06572v1
  • Categories: cs.CV, cs.LG
  • Published: March 6, 2026