[Paper] Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans

Published: December 29, 2025 at 01:51 PM EST
4 min read
Source: arXiv - 2512.23693v1

Overview

A new study from Sky CH‑Wang, Justin Svegliato, Helen Appel, and Jason Eisner introduces a more granular way to fine‑tune large language models (LLMs) using human feedback. Instead of asking annotators to pick a whole response as “better,” they mark specific text spans they like or dislike and explain why, letting the model iteratively rewrite only the problematic parts. The authors show that this fine‑grained, step‑by‑step supervision yields better alignment than traditional A/B preference ranking or full‑sentence rewrites.

Key Contributions

  • Fine‑grained feedback format: Annotators label “liked” and “disliked” spans and provide brief rationales, turning a single response into a chain of targeted edits (a minimal data‑structure sketch follows this list).
  • Improvement‑chain dataset: A new dataset of revision chains where each step is a minimal rewrite of the previous step, enabling direct preference pairs between adjacent revisions.
  • Preference‑pair construction from edits: Instead of global A/B comparisons, the method creates preference pairs from each incremental edit, giving the model clearer learning signals.
  • Empirical advantage: Experiments demonstrate that models trained on these localized edits outperform baselines trained on standard A/B rankings or full‑sentence contrastive rewrites.
  • Open‑source resources: The authors release the annotation schema, dataset, and training scripts to encourage reproducibility and further research.
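
To make the feedback format concrete, here is a minimal sketch of how one annotated response could be represented. This is our illustration rather than the authors’ released schema; the class and field names (SpanFeedback, AnnotatedResponse, rationale, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical schema; the authors' released annotation format may differ.
@dataclass
class SpanFeedback:
    start: int                               # character offset where the span begins
    end: int                                 # character offset where the span ends (exclusive)
    sentiment: Literal["liked", "disliked"]  # annotator's judgment of this span
    rationale: str = ""                      # short comment, e.g. "incorrect fact"

@dataclass
class AnnotatedResponse:
    prompt: str
    response: str
    spans: List[SpanFeedback] = field(default_factory=list)

    def disliked_spans(self) -> List[SpanFeedback]:
        # Disliked spans drive the improvement chain, processed left to right.
        return sorted(
            (s for s in self.spans if s.sentiment == "disliked"),
            key=lambda s: s.start,
        )
```

Each disliked span, together with its rationale, becomes the instruction for one rewrite step in the improvement chain described under Methodology.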

Methodology

  1. Collecting feedback: Human annotators read a model‑generated answer and highlight text spans they like or dislike. For each disliked span they write a short comment describing the issue (e.g., “incorrect fact,” “awkward phrasing”).
  2. Generating improvement chains: Starting from the original answer, the base LLM rewrites the first disliked span according to the annotator’s comment, then moves left‑to‑right through the remaining spans, producing a sequence of progressively improved drafts.
  3. Creating preference pairs: Each adjacent pair in the chain (original → first edit, first edit → second edit, etc.) forms a binary preference: the later version is “better” for the specific edited region.
  4. Training objective: The model is fine‑tuned with a standard pairwise preference loss (e.g., a Bradley‑Terry objective, optionally combined with a KL penalty toward the base model), applied to these localized pairs so that it learns to reproduce the targeted edits (see the sketch after this list).
  5. Baseline comparisons: The authors also train models using conventional A/B preference data (whole‑response ranking) and full‑sentence contrastive rewrites to benchmark performance.
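
The core of steps 3 and 4, converting an improvement chain into adjacent preference pairs and training on them with a pairwise loss, can be sketched briefly. This is a minimal PyTorch illustration under our own assumptions: the function names are ours, the scalar scores stand in for whatever the model assigns to each draft (reward‑model outputs or policy‑vs‑reference log‑probability ratios), and the paper’s exact objective may differ.

```python
import torch
import torch.nn.functional as F

def chain_to_preference_pairs(chain):
    """Turn an improvement chain [draft_0, ..., draft_k] into adjacent
    (worse, better) preference pairs: (draft_i, draft_{i+1})."""
    return [(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]

def bradley_terry_loss(score_better, score_worse):
    """Pairwise Bradley-Terry loss: -log sigmoid(s_better - s_worse),
    averaged over a batch of localized preference pairs."""
    return -F.logsigmoid(score_better - score_worse).mean()

# Toy usage: three drafts in one chain, with made-up scalar scores.
chain_scores = torch.tensor([0.1, 0.4, 0.9])    # draft_0, draft_1, draft_2
pairs = chain_to_preference_pairs([0, 1, 2])    # [(0, 1), (1, 2)]
worse = torch.stack([chain_scores[i] for i, _ in pairs])
better = torch.stack([chain_scores[j] for _, j in pairs])
print(bradley_terry_loss(better, worse))        # low loss: the chain ordering is respected
```

Because the two drafts in each pair differ only in one edited region, the comparison is concentrated on that span rather than spread over the whole response, which is the clearer learning signal the paper attributes to this construction.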

Results & Findings

  • Higher alignment scores: Models fine‑tuned on fine‑grained edit pairs achieve a ~7–10% lift in preference‑ranking accuracy over A/B‑trained baselines on held‑out evaluation sets.
  • Faster convergence: Because each training example focuses on a small edit, the loss decreases more quickly, requiring fewer epochs to reach peak performance.
  • Better factual consistency: The fine‑grained approach reduces hallucinations in the evaluated tasks, as annotators can directly flag incorrect facts and the model learns to correct them locally.
  • Human evaluation: Independent judges rated the fine‑grained‑trained model’s outputs as more fluent and relevant in 68% of cases compared to the A/B‑trained counterpart.

Practical Implications

  • More efficient fine‑tuning pipelines: Developers can collect cheaper, higher‑signal feedback by asking annotators to highlight problem areas instead of writing full alternative responses.
  • Targeted model debugging: The improvement‑chain format doubles as a diagnostic tool—seeing which spans are repeatedly edited can surface systematic weaknesses (e.g., date handling, code syntax).
  • Rapid iteration for product features: Teams building chat assistants, code generators, or summarizers can integrate this workflow to iteratively polish model outputs with minimal human effort.
  • Reduced annotation cost: Since each feedback instance yields multiple training pairs (one per edit), the data‑to‑model‑performance ratio improves, lowering the total cost of alignment.
  • Potential for UI integration: Front‑end tools could let users directly highlight problematic text in a model’s reply, feeding those signals back into a continuous‑learning loop.

Limitations & Future Work

  • Annotation overhead: While cheaper than full rewrites, the process still requires annotators to understand the model’s output well enough to pinpoint and comment on specific spans.
  • Scope of edits: The method focuses on local textual changes; large‑scale structural revisions (e.g., reorganizing an entire answer) may not be captured effectively.
  • Generalization to other modalities: The study is limited to text; extending fine‑grained feedback to code, tables, or multimodal outputs remains an open question.
  • Scalability of improvement chains: Very long chains could introduce noise if earlier edits affect later context; future work could explore hierarchical or attention‑based mechanisms to maintain coherence.

Overall, the paper offers a practical, data‑efficient recipe for aligning LLMs with human preferences, opening the door to more responsive and trustworthy AI assistants.

Authors

  • Sky CH-Wang
  • Justin Svegliato
  • Helen Appel
  • Jason Eisner

Paper Information

  • arXiv ID: 2512.23693v1
  • Categories: cs.CL
  • Published: December 29, 2025
