[Paper] Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans

Published: December 29, 2025 at 01:51 PM EST
4 min read
Source: arXiv - 2512.23693v1

Overview

A new study from Sky CH‑Wang, Justin Svegliato, Helen Appel, and Jason Eisner introduces a more granular way to fine‑tune large language models (LLMs) using human feedback. Instead of asking annotators to pick a whole response as “better,” they mark specific text spans they like or dislike and explain why, letting the model iteratively rewrite only the problematic parts. The authors show that this fine‑grained, step‑by‑step supervision yields better alignment than traditional A/B preference ranking or full‑sentence rewrites.

Key Contributions

  • Fine‑grained feedback format: Annotators label “liked” and “disliked” spans and provide brief rationales, turning a single response into a chain of targeted edits (a minimal data‑structure sketch follows this list).
  • Improvement‑chain dataset: A new dataset of revision chains where each step is a minimal rewrite of the previous step, enabling direct preference pairs between adjacent revisions.
  • Preference‑pair construction from edits: Instead of global A/B comparisons, the method creates preference pairs from each incremental edit, giving the model clearer learning signals.
  • Empirical advantage: Experiments demonstrate that models trained on these localized edits outperform baselines trained on standard A/B rankings or full‑sentence contrastive rewrites.
  • Open‑source resources: The authors release the annotation schema, dataset, and training scripts to encourage reproducibility and further research.
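
To make the feedback format concrete, here is a minimal sketch of how one annotated response could be represented. This is our illustration rather than the authors’ released schema; the class and field names (SpanFeedback, AnnotatedResponse, rationale, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical schema; the authors' released annotation format may differ.
@dataclass
class SpanFeedback:
    start: int                               # character offset where the span begins
    end: int                                 # character offset where the span ends (exclusive)
    sentiment: Literal["liked", "disliked"]  # annotator's judgment of this span
    rationale: str = ""                      # short comment, e.g. "incorrect fact"

@dataclass
class AnnotatedResponse:
    prompt: str
    response: str
    spans: List[SpanFeedback] = field(default_factory=list)

    def disliked_spans(self) -> List[SpanFeedback]:
        # Disliked spans drive the improvement chain, processed left to right.
        return sorted(
            (s for s in self.spans if s.sentiment == "disliked"),
            key=lambda s: s.start,
        )
```

Each disliked span, together with its rationale, becomes the instruction for one rewrite step in the improvement chain described under Methodology.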

Methodology

  1. Collecting feedback: Human annotators read a model‑generated answer and highlight text spans they like or dislike. For each disliked span they write a short comment describing the issue (e.g., “incorrect fact,” “awkward phrasing”).
  2. Generating improvement chains: Starting from the original answer, the base LLM rewrites the first disliked span according to the annotator’s comment, then moves left‑to‑right through the remaining spans, producing a sequence of progressively improved drafts.
  3. Creating preference pairs: Each adjacent pair in the chain (original → first edit, first edit → second edit, etc.) forms a binary preference: the later version is “better” for the specific edited region.
  4. Training objective: The model is fine‑tuned with a standard pairwise preference loss (e.g., a Bradley‑Terry objective, optionally combined with a KL penalty toward the base model), applied to these localized pairs so that it learns to reproduce the targeted edits (see the sketch after this list).
  5. Baseline comparisons: The authors also train models using conventional A/B preference data (whole‑response ranking) and full‑sentence contrastive rewrites to benchmark performance.
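
The core of steps 3 and 4, converting an improvement chain into adjacent preference pairs and training on them with a pairwise loss, can be sketched briefly. This is a minimal PyTorch illustration under our own assumptions: the function names are ours, the scalar scores stand in for whatever the model assigns to each draft (reward‑model outputs or policy‑vs‑reference log‑probability ratios), and the paper’s exact objective may differ.

```python
import torch
import torch.nn.functional as F

def chain_to_preference_pairs(chain):
    """Turn an improvement chain [draft_0, ..., draft_k] into adjacent
    (worse, better) preference pairs: (draft_i, draft_{i+1})."""
    return [(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]

def bradley_terry_loss(score_better, score_worse):
    """Pairwise Bradley-Terry loss: -log sigmoid(s_better - s_worse),
    averaged over a batch of localized preference pairs."""
    return -F.logsigmoid(score_better - score_worse).mean()

# Toy usage: three drafts in one chain, with made-up scalar scores.
chain_scores = torch.tensor([0.1, 0.4, 0.9])    # draft_0, draft_1, draft_2
pairs = chain_to_preference_pairs([0, 1, 2])    # [(0, 1), (1, 2)]
worse = torch.stack([chain_scores[i] for i, _ in pairs])
better = torch.stack([chain_scores[j] for _, j in pairs])
print(bradley_terry_loss(better, worse))        # low loss: the chain ordering is respected
```

Because the two drafts in each pair differ only in one edited region, the comparison is concentrated on that span rather than spread over the whole response, which is the clearer learning signal the paper attributes to this construction.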

Results & Findings

  • Higher alignment scores: Models fine‑tuned on fine‑grained edit pairs achieve a ~7–10% lift in preference‑ranking accuracy over A/B‑trained baselines on held‑out evaluation sets.
  • Faster convergence: Because each training example focuses on a small edit, the loss decreases more quickly, requiring fewer epochs to reach peak performance.
  • Better factual consistency: The fine‑grained approach reduces hallucinations in the evaluated tasks, as annotators can directly flag incorrect facts and the model learns to correct them locally.
  • Human evaluation: Independent judges rated the fine‑grained‑trained model’s outputs as more fluent and relevant in 68% of cases compared to the A/B‑trained counterpart.

Practical Implications

  • More efficient fine‑tuning pipelines: Developers can collect cheaper, higher‑signal feedback by asking annotators to highlight problem areas instead of writing full alternative responses.
  • Targeted model debugging: The improvement‑chain format doubles as a diagnostic tool—seeing which spans are repeatedly edited can surface systematic weaknesses (e.g., date handling, code syntax).
  • Rapid iteration for product features: Teams building chat assistants, code generators, or summarizers can integrate this workflow to iteratively polish model outputs with minimal human effort.
  • Reduced annotation cost: Since each feedback instance yields multiple training pairs (one per edit), the data‑to‑model‑performance ratio improves, lowering the total cost of alignment.
  • Potential for UI integration: Front‑end tools could let users directly highlight problematic text in a model’s reply, feeding those signals back into a continuous‑learning loop.

Limitations & Future Work

  • Annotation overhead: While cheaper than full rewrites, the process still requires annotators to understand the model’s output well enough to pinpoint and comment on specific spans.
  • Scope of edits: The method focuses on local textual changes; large‑scale structural revisions (e.g., reorganizing an entire answer) may not be captured effectively.
  • Generalization to other modalities: The study is limited to text; extending fine‑grained feedback to code, tables, or multimodal outputs remains an open question.
  • Scalability of improvement chains: Very long chains could introduce noise if earlier edits affect later context; future work could explore hierarchical or attention‑based mechanisms to maintain coherence.

Overall, the paper offers a practical, data‑efficient recipe for aligning LLMs with human preferences, opening the door to more responsive and trustworthy AI assistants.

Authors

  • Sky CH-Wang
  • Justin Svegliato
  • Helen Appel
  • Jason Eisner

Paper Information

  • arXiv ID: 2512.23693v1
  • Categories: cs.CL
  • Published: December 29, 2025
