[Paper] SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support

Published: December 12, 2025 at 01:05 PM EST
4 min read

Source: arXiv - 2512.11755v1

Overview

The paper introduces SUMFORU, a new framework that uses large language models (LLMs) to generate product‑review summaries tailored to an individual shopper’s preferences. By aligning the summarizer with explicit user personas, the system aims to cut through the noise of thousands of online reviews and deliver concise, personally relevant decision‑support content.

Key Contributions

  • Persona‑aware summarization pipeline that steers LLM outputs toward a user’s stated preferences (e.g., “budget‑conscious”, “eco‑friendly”).
  • Two‑stage alignment strategy:
    1. Supervised Fine‑Tuning (SFT) with asymmetric knowledge distillation to inject persona signals into the model.
    2. Reinforcement Learning with AI Feedback (RLAIF) that leverages a learned preference estimator to fine‑tune the model on subtle, persona‑specific cues.
  • High‑quality data construction from the Amazon 2023 Review Dataset, including automatic cleaning, deduplication, and persona annotation.
  • Comprehensive evaluation across rule‑based metrics (consistency, grounding), LLM‑based judges, and human assessments, showing consistent gains over generic baselines.
  • Demonstrated generalization to product categories not seen during training, indicating robustness of the alignment approach.

Methodology

  1. Data Pipeline – The authors draw on the Amazon 2023 Review Dataset, filter out low‑quality and duplicate entries, and automatically generate persona tags (e.g., “price‑sensitive”, “performance‑oriented”) using a combination of keyword heuristics and a small seed classifier (a rough sketch of this tagging step follows the list).
  2. Stage‑1: Persona‑aware SFT – A base LLM (e.g., LLaMA‑2) is fine‑tuned on the cleaned review‑summary pairs. Asymmetric knowledge distillation transfers knowledge from a larger “teacher” model to the smaller “student” while injecting persona embeddings, so the model learns to condition its output on the user profile.
  3. Stage‑2: RLAIF – A separate preference estimator (trained on a modest set of human‑rated persona‑summary pairs) predicts how well a generated summary matches a given persona. This estimator provides a reward signal for reinforcement learning, allowing the model to adjust its generation policy toward higher persona alignment without needing costly human feedback loops (a minimal reward sketch also follows the list).
  4. Inference – At runtime, a developer supplies a persona vector (or a textual description) alongside the product ID. The model produces a concise, grounded summary that highlights aspects most relevant to that persona.
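
To make the tagging step in item 1 concrete, below is a minimal sketch of how keyword heuristics can be combined with a small seed classifier. The keyword lists, persona labels, and classifier interface are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative persona tagging: keyword heuristics with a seed-classifier fallback.
# Keyword lists and persona labels are assumptions; the paper's tag set may differ.
from typing import Callable, Optional

PERSONA_KEYWORDS = {
    "price-sensitive": ["cheap", "affordable", "overpriced", "value for money", "budget"],
    "performance-oriented": ["fast", "powerful", "lag", "benchmark", "speed"],
    "eco-friendly": ["sustainable", "recyclable", "eco", "energy efficient"],
}

def tag_persona(review_text: str,
                seed_classifier: Optional[Callable[[str], str]] = None) -> str:
    """Return a persona tag for a review: keyword match first, classifier fallback."""
    text = review_text.lower()
    # Count keyword hits per persona and keep the best-scoring one.
    scores = {
        persona: sum(kw in text for kw in keywords)
        for persona, keywords in PERSONA_KEYWORDS.items()
    }
    best_persona, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score > 0:
        return best_persona
    # No keyword evidence: defer to the small seed classifier, if available.
    if seed_classifier is not None:
        return seed_classifier(review_text)
    return "unspecified"
```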
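Similarly, the RLAIF stage in item 3 can be pictured as a scalar reward produced by the preference estimator. The sketch below assumes the estimator returns a persona‑alignment score in [0, 1] and adds a simple length penalty; both choices are assumptions for illustration, not details from the paper.

```python
# Illustrative RLAIF reward: a learned preference estimator scores persona fit,
# and the scalar score is used as the reward for policy optimization (e.g., PPO).
# The estimator interface and reward shaping below are assumptions, not the paper's code.
from typing import Callable

def persona_reward(persona: str,
                   summary: str,
                   preference_estimator: Callable[[str, str], float],
                   length_penalty: float = 0.01) -> float:
    """Reward = estimated persona alignment minus a mild length penalty."""
    alignment = preference_estimator(persona, summary)  # assumed to return a score in [0, 1]
    penalty = length_penalty * max(0, len(summary.split()) - 60)  # discourage rambling summaries
    return alignment - penalty

# During RL fine-tuning, each sampled summary is scored like this and the
# resulting scalar replaces a human rating in the policy-gradient update.
```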

Results & Findings

Evaluation                                            Baseline (generic)   SUMFORU (SFT + RLAIF)
Consistency (rule‑based)                              71.2 %               84.9 %
Grounding (facts from reviews)                        68.5 %               81.3 %
Persona Preference Alignment (LLM judge, BLEU‑like)   0.62                 0.78
Human Preference Score (1‑5)                          3.4                  4.3

  • Consistency & grounding improve because the two‑stage alignment forces the model to stay faithful to source reviews while respecting persona constraints.
  • Preference alignment jumps significantly, confirming that the RLAIF stage captures fine‑grained user signals that SFT alone misses.
  • Cross‑category tests (e.g., training on electronics, testing on home‑goods) show only a ~3 % drop, indicating the approach generalizes well.

Practical Implications

  • E‑commerce platforms can embed SUMFORU as a plug‑in to generate “personalized highlight reels” for each shopper, reducing decision fatigue and potentially increasing conversion rates.
  • Developer APIs: The framework can be exposed as a micro‑service where developers send a product ID and a JSON‑encoded persona; the service returns a 2‑3 sentence summary (see the API sketch after this list). This fits neatly into recommendation pipelines or chat‑bot assistants.
  • Reduced reliance on manual curation – marketers no longer need to write multiple persona‑specific copy blocks; the model auto‑generates them on demand.
  • Better accessibility – concise, persona‑aligned summaries help users with limited time or cognitive load (e.g., seniors, neurodiverse users) make informed purchases.
  • Data‑driven personalization – because the preference estimator is trained on real user feedback, the system can evolve as consumer priorities shift (e.g., increased focus on sustainability).
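
As a rough illustration of the micro‑service interface described in the “Developer APIs” bullet, the sketch below shows one possible request schema and endpoint. The endpoint path, field names, persona schema, and the placeholder model call are hypothetical; the paper does not specify an API.

```python
# Hypothetical micro-service wrapper around a persona-aware summarizer.
# Endpoint path, request schema, and run_model() are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SummaryRequest(BaseModel):
    product_id: str
    persona: dict  # e.g., {"budget": "high priority", "sustainability": "medium"}

def run_model(prompt: str) -> str:
    """Placeholder for the persona-conditioned LLM call."""
    return f"(model output for prompt: {prompt[:60]}...)"

@app.post("/summarize")
def summarize_reviews(req: SummaryRequest) -> dict:
    # A real deployment would fetch reviews for req.product_id and condition the
    # model on the persona; here we only show the interface shape.
    persona_text = ", ".join(f"{k}: {v}" for k, v in req.persona.items())
    prompt = (
        f"Summarize the reviews of product {req.product_id} in 2-3 sentences "
        f"for a shopper with these priorities: {persona_text}."
    )
    return {"product_id": req.product_id, "summary": run_model(prompt)}
```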

Limitations & Future Work

  • Persona definition granularity – The current approach relies on a predefined set of persona tags; overly coarse personas may miss niche preferences.
  • Feedback loop cost – While RLAIF avoids expensive human labeling, training the preference estimator still needs a curated dataset, which may be a barrier for smaller vendors.
  • Potential bias – The model inherits biases present in the Amazon review corpus; future work should incorporate bias‑mitigation techniques and fairness audits.
  • Real‑time adaptation – Extending the framework to update persona embeddings on‑the‑fly (e.g., based on a shopper’s browsing history) is an open research direction.

SUMFORU showcases how steerable LLM alignment can move review summarization from a one‑size‑fits‑all utility to a truly personalized decision‑support tool, opening new avenues for smarter, user‑centric e‑commerce experiences.

Authors

  • Yuming Feng
  • Xinrui Jiang

Paper Information

  • arXiv ID: 2512.11755v1
  • Categories: cs.CL
  • Published: December 12, 2025