[Paper] A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

Published: June 9, 2026 at 01:59 PM EDT

Source: arXiv - 2606.11189v1

Overview

Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior, so strictly fitting this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowledge prior. In this work, we reinterpret SFT as target distribution design: instead of studying only the loss objective, we analyze the token-level target that the loss drives the model to match. We introduce the Q-target framework, which decomposes SFT supervision into two explicit choices: (1) how strongly to rely on the observed token, and (2) how to allocate the remaining probability mass over alternatives. This perspective unifies many existing SFT variants as implicit choices of the target distribution Q.

Building on this view, we propose Target-SFT, which constructs the training objective directly from the desired target distribution. This method consistently comes out ahead across the ten reasoning dataset-model settings evaluated, showing the effectiveness of the target-based approach. Overall, our formulation reveals a more fundamental design principle for SFT training and opens a broader search space for SFT objectives.
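The two choices named in the overview can be made concrete with a small sketch. The code below is an illustration only, not the paper's implementation: the helper `build_q_target`, the parameter names `alpha` and `rest_weights`, and the toy logits are all assumptions introduced here. It builds a token-level target Q by placing mass `alpha` on the observed token and spreading the remaining `1 - alpha` over the alternatives, then scores the model's prediction against Q with cross-entropy.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_q_target(observed, alpha, rest_weights):
    """Hypothetical helper (not from the paper).

    Choice 1: put probability `alpha` on the observed token.
    Choice 2: split the remaining 1 - alpha over the other tokens
              in proportion to `rest_weights`.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # The observed token gets its mass from alpha only, so zero its weight here.
    weights = [0.0 if i == observed else w for i, w in enumerate(rest_weights)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("rest_weights must give positive mass to some alternative")
    q = [(1.0 - alpha) * w / total for w in weights]
    q[observed] = alpha
    return q


def cross_entropy(q, p):
    """H(q, p) = -sum_i q_i * log(p_i), skipping entries where q_i is zero."""
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0.0)


logits = [2.0, 1.0, 0.1, -1.0]  # toy model outputs over a 4-token vocabulary
observed = 0                     # index of the demonstrated token
p = softmax(logits)

uniform = [1.0] * len(logits)
q_one_hot = build_q_target(observed, 1.0, uniform)  # standard SFT: all mass on the observed token
q_smooth = build_q_target(observed, 0.9, uniform)   # remaining mass spread uniformly
q_prior = build_q_target(observed, 0.9, p)          # remaining mass follows the model's own probabilities

for name, q in [("one-hot", q_one_hot), ("uniform rest", q_smooth), ("prior rest", q_prior)]:
    print(f"{name:>12}: loss = {cross_entropy(q, p):.4f}")
```

With `alpha = 1.0` the target collapses to the one-hot case, and the loss reduces to the usual negative log-likelihood of the observed token. Ordinary label smoothing with smoothing ε over a vocabulary of size V can be read in the same terms, as `alpha = 1 - ε + ε/V` with a uniform allocation over alternatives. How Target-SFT actually selects its target is described in the full paper, not here.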

Key Contributions

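The abstract highlights three contributions:

A reinterpretation of SFT as target distribution design, which analyzes the token-level target that the loss drives the model to match rather than only the loss objective.
The Q-target framework, which decomposes SFT supervision into how strongly to rely on the observed token and how to allocate the remaining probability mass over alternatives, and which unifies many existing SFT variants as implicit choices of the target distribution Q.
Target-SFT, a method that constructs the training objective directly from the desired target distribution, evaluated on ten reasoning dataset-model settings.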

Methodology

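The abstract describes the approach only at a high level: each token's supervision is expressed as a target distribution Q, defined by the weight placed on the observed token and the allocation of the remaining probability mass over alternatives, and Target-SFT builds its training objective directly from that target. Please refer to the full paper for the detailed methodology.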

Practical Implications

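The abstract argues that strictly fitting a one-hot target can be suboptimal when the observed token is non-unique, noisy, or misaligned with the model prior, and that treating the target distribution as an explicit design choice opens a broader search space for SFT objectives.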

Authors

Tong Xie
Yuanhao Ban
Yunqi Hong
Sohyun An
Yihang Chen
Cho-Jui Hsieh

Paper Information

arXiv ID: 2606.11189v1
Categories: cs.LG, cs.AI, cs.CL
Published: June 9, 2026
