[Paper] AnchorOPT: Towards Optimizing Dynamic Anchors for Adaptive Prompt Learning
Source: arXiv
Abstract
Existing prompt learning methods built upon CLIP models leverage textual tokens as anchors to guide the learnable soft tokens, improving CLIP's generalization. However, these anchors are static in both value and position, and thus lack the flexibility to adapt across tasks and training stages. To address this limitation, we propose AnchorOPT, a dynamic anchor-based prompt learning framework. AnchorOPT introduces dynamism in two key dimensions:
- Anchor values are no longer handcrafted explicit textual tokens (e.g., "shape", "color"); instead, they are learned dynamically from task-specific data.
- The positional relationship between anchor and soft tokens is no longer fixed but is adaptively optimized via a learnable position matrix conditioned on the training stage and task context.
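The learnable position matrix can be pictured as a row-stochastic mixing matrix that decides where each anchor and soft token lands in the assembled prompt. The sketch below is an illustrative assumption, not the paper's exact parameterization: the function name `assemble_prompt` and the softmax-over-slots scheme are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def assemble_prompt(anchors, soft_tokens, pos_logits):
    """Place anchor and soft tokens into prompt slots via a learnable
    position matrix (a sketch; the paper may parameterize this differently).

    anchors:     (A, D) learned anchor tokens
    soft_tokens: (S, D) learnable soft tokens
    pos_logits:  (A+S, A+S) learnable position-matrix logits
    Returns an (A+S, D) prompt where each slot is a softmax-weighted
    combination of all tokens, so token placement stays differentiable."""
    tokens = np.concatenate([anchors, soft_tokens], axis=0)  # (A+S, D)
    P = softmax(pos_logits, axis=-1)                          # rows sum to 1
    return P @ tokens
```

With sharply peaked logits each row of `P` approaches a one-hot vector, recovering a hard (discrete) token ordering; softer logits let gradients explore intermediate placements during training.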
Training Procedure
AnchorOPT is trained in two stages:
- Stage 1: Learn the anchor tokens.
- Stage 2: Freeze the learned anchors, transfer them, and optimize the soft tokens and the position matrix.
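The two-stage schedule amounts to swapping which parameter group the optimizer sees. A minimal sketch of that grouping, assuming NumPy arrays as stand-ins for the model's parameters (the class and function names here are ours, not the paper's):

```python
import numpy as np

class AnchorOPTParams:
    """Illustrative parameter container for the two-stage schedule."""
    def __init__(self, num_anchors, num_soft, dim, seed=0):
        rng = np.random.default_rng(seed)
        n = num_anchors + num_soft
        self.anchors = rng.normal(size=(num_anchors, dim))  # learned in Stage 1
        self.soft = rng.normal(size=(num_soft, dim))        # learned in Stage 2
        self.pos = rng.normal(size=(n, n))                  # learned in Stage 2

def trainable_params(params, stage):
    """Stage 1 updates only the anchor tokens; Stage 2 freezes them
    and optimizes the soft tokens plus the position matrix."""
    if stage == 1:
        return {"anchors": params.anchors}
    return {"soft": params.soft, "pos": params.pos}
```

In a PyTorch implementation the same effect would typically be achieved by toggling `requires_grad` on the anchor parameters between stages and rebuilding the optimizer with the active group.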
Experiments
Extensive experiments demonstrate that using only a simple learnable anchor and position matrix achieves performance comparable to or exceeding that of methods incorporating additional learnable modules or regularization techniques. As a plug-and-play module, AnchorOPT integrates seamlessly into existing frameworks, yielding consistent performance gains across diverse datasets.
Resources
Submitted to arXiv on 26 Nov 2025 (v1).