[Paper] Sentiment-Aware Extractive and Abstractive Summarization for Unstructured Text Mining
Source: arXiv - 2512.20404v1
Overview
The paper by Junyi Liu and Stanley Kok tackles a growing pain point for developers and data engineers: turning noisy, emotion‑laden user‑generated content (tweets, reviews, forum posts) into short, meaningful summaries that still convey the underlying sentiment. By weaving sentiment signals directly into both extractive and abstractive summarization pipelines, the authors demonstrate a practical way to surface “what people are saying” and “how they feel” in a single, compact output—something traditional news‑oriented summarizers often miss.
Key Contributions
- Sentiment‑aware extractive ranking: Extends the classic TextRank algorithm with sentiment‑weighted edges, so sentences that carry strong emotional cues rise higher in the ranking.
- Sentiment‑infused abstractive generation: Modifies the UniLM (Unified Language Model) decoder to condition on sentiment embeddings, enabling the generator to produce summaries that explicitly reflect positive, negative, or mixed tones.
- Dual‑pipeline framework: Offers a plug‑and‑play architecture that can be toggled between extractive, abstractive, or hybrid modes depending on latency or resource constraints.
- Comprehensive evaluation on real‑world UGC datasets: Shows consistent gains over baseline summarizers on ROUGE‑L, sentiment‑preservation metrics, and human preference scores.
- Open‑source implementation & reproducibility kit: Provides code, pretrained models, and a small benchmark suite for the community to experiment with sentiment‑aware summarization out of the box.
Methodology
- Data Preparation – The authors collected three public corpora of short, informal texts (a Twitter sentiment dataset, Amazon product reviews, and Reddit discussion threads). Each document was paired with a human‑written summary and a sentiment label (positive, negative, neutral).
- Sentiment‑augmented TextRank – Standard TextRank builds a graph where nodes are sentences and edge weights reflect lexical similarity. Liu & Kok inject a sentiment similarity term (a minimal code sketch of this weighting appears after this list):

  \[ w_{ij} = \alpha \cdot \text{cosine}(s_i, s_j) + (1-\alpha) \cdot \text{sent\_sim}(s_i, s_j) \]

  where sent_sim is high when two sentences share the same sentiment polarity. The resulting scores prioritize emotionally salient sentences.
- UniLM with Sentiment Conditioning – The pretrained UniLM encoder‑decoder is fine‑tuned on the same datasets, but the decoder receives an additional sentiment embedding (learned from a simple sentiment classifier) concatenated to each token’s hidden state. This guides the generation process to reflect the overall sentiment of the source text (see the conditioning sketch after this list).
- Hybrid Fusion (optional) – For the best of both worlds, the top‑k extractive sentences are fed as a “pseudo‑source” to the abstractive decoder, allowing the model to rewrite sentiment‑rich extracts into a smoother narrative (sketched after this list).
- Evaluation – Standard ROUGE‑1/2/L scores measure content overlap, while a Sentiment Preservation Score (SPS) checks whether the polarity of the summary matches that of the source, using a separate sentiment classifier (a minimal SPS sketch follows this list). Human judges also rated readability, informativeness, and sentiment fidelity.
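
The extractive step is easiest to picture in code. The sketch below is a minimal illustration of the sentiment‑weighted edge formula above, assuming TF‑IDF cosine similarity for the lexical term, a toy polarity lookup standing in for the sentiment classifier, and networkx’s PageRank in place of the TextRank iteration; `ALPHA`, `toy_polarity`, and `sentiment_rank` are hypothetical names, not the authors’ implementation.

```python
# Minimal sketch of sentiment-weighted TextRank edges (not the authors' code).
# Assumes: TF-IDF cosine similarity for the lexical term, a toy polarity
# function for sent_sim, and networkx PageRank in place of the TextRank loop.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ALPHA = 0.7  # hypothetical mixing weight between lexical and sentiment similarity

def toy_polarity(sentence: str) -> int:
    """Stand-in sentiment classifier: +1 positive, -1 negative, 0 neutral."""
    pos, neg = {"great", "love", "good"}, {"bad", "hate", "broken"}
    words = sentence.lower().split()
    score = sum(w in pos for w in words) - sum(w in neg for w in words)
    return (score > 0) - (score < 0)

def sentiment_rank(sentences: list[str], top_k: int = 2) -> list[str]:
    tfidf = TfidfVectorizer().fit_transform(sentences)
    lexical = cosine_similarity(tfidf)                 # cosine(s_i, s_j)
    polarity = [toy_polarity(s) for s in sentences]

    graph = nx.Graph()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            sent_sim = 1.0 if polarity[i] == polarity[j] else 0.0
            w = ALPHA * lexical[i, j] + (1 - ALPHA) * sent_sim
            graph.add_edge(i, j, weight=w)

    scores = nx.pagerank(graph, weight="weight")       # TextRank-style scoring
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]       # keep original order

if __name__ == "__main__":
    docs = [
        "The battery life is great and I love the screen.",
        "Shipping took two weeks.",
        "The charger arrived broken and support was bad.",
    ]
    print(sentiment_rank(docs))
```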
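For the abstractive side, the paper describes concatenating a learned sentiment embedding to each decoder token’s hidden state before generation. The PyTorch sketch below shows only that conditioning step on dummy tensors; it is not the authors’ UniLM fine‑tuning code, and `SentimentConditionedHead` and all sizes are illustrative assumptions.

```python
# Minimal PyTorch sketch of sentiment conditioning (not the authors' UniLM code).
# Idea from the paper: concatenate a learned sentiment embedding to each decoder
# token's hidden state before projecting to the vocabulary. Names and sizes here
# are hypothetical.
import torch
import torch.nn as nn

class SentimentConditionedHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int,
                 num_sentiments: int = 3, sent_dim: int = 32):
        super().__init__()
        # One embedding per sentiment class (positive / negative / neutral).
        self.sent_embed = nn.Embedding(num_sentiments, sent_dim)
        # Project [hidden_state ; sentiment_embedding] to vocabulary logits.
        self.proj = nn.Linear(hidden_size + sent_dim, vocab_size)

    def forward(self, decoder_hidden: torch.Tensor,
                sentiment_id: torch.Tensor) -> torch.Tensor:
        # decoder_hidden: (batch, seq_len, hidden_size)
        # sentiment_id:   (batch,) document-level sentiment label
        sent = self.sent_embed(sentiment_id)                     # (batch, sent_dim)
        sent = sent.unsqueeze(1).expand(-1, decoder_hidden.size(1), -1)
        conditioned = torch.cat([decoder_hidden, sent], dim=-1)  # concat per token
        return self.proj(conditioned)                            # vocab logits

# Usage with dummy tensors in place of real decoder states.
head = SentimentConditionedHead(hidden_size=768, vocab_size=30522)
hidden = torch.randn(2, 16, 768)         # fake decoder hidden states
sentiment = torch.tensor([0, 2])         # e.g. 0 = negative, 2 = positive
logits = head(hidden, sentiment)         # shape (2, 16, 30522)
```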
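The hybrid mode can be approximated by feeding the top‑ranked extractive sentences as a pseudo‑source to any sequence‑to‑sequence summarizer. In this sketch an off‑the‑shelf BART checkpoint stands in for the fine‑tuned UniLM described in the paper; the model name and generation settings are assumptions.

```python
# Minimal sketch of the hybrid mode (not the authors' pipeline): top-k
# sentiment-ranked extractive sentences become a "pseudo-source" that an
# abstractive model rewrites into a fluent summary.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def hybrid_summarize(top_sentences: list[str],
                     model_name: str = "facebook/bart-large-cnn") -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    pseudo_source = " ".join(top_sentences)  # sentiment-rich extractive output
    inputs = tokenizer(pseudo_source, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=60, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example: rewrite two sentiment-ranked sentences into a smoother narrative.
print(hybrid_summarize([
    "The battery life is great and I love the screen.",
    "The charger arrived broken and support was bad.",
]))
```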
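Finally, a Sentiment Preservation Score of the kind reported in the results table can be computed as the share of source/summary pairs whose predicted polarity matches. The sketch below uses an off‑the‑shelf classifier as a stand‑in for the paper’s separate sentiment model.

```python
# Minimal sketch of a Sentiment Preservation Score (SPS): the percentage of
# source/summary pairs whose predicted polarity agrees. The default
# sentiment-analysis pipeline is an illustrative stand-in for the separate
# classifier used in the paper.
from transformers import pipeline

def sentiment_preservation_score(sources: list[str], summaries: list[str]) -> float:
    clf = pipeline("sentiment-analysis")
    src_labels = [r["label"] for r in clf(sources, truncation=True)]
    sum_labels = [r["label"] for r in clf(summaries, truncation=True)]
    matches = sum(a == b for a, b in zip(src_labels, sum_labels))
    return 100.0 * matches / len(sources)
```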
Results & Findings
| Model | ROUGE‑L ↑ | SPS ↑ | Human Preference (%) |
|---|---|---|---|
| Vanilla TextRank | 31.2 | 71.4 | 38 |
| Vanilla UniLM (abstractive) | 34.8 | 73.1 | 42 |
| Sentiment‑aware TextRank | 36.5 | 81.9 | 55 |
| Sentiment‑aware UniLM | 38.2 | 84.6 | 62 |
| Hybrid (extractive + abstractive) | 39.1 | 86.3 | 68 |
- Adding sentiment signals boosted ROUGE‑L by roughly 3–5 points and lifted the Sentiment Preservation Score by 10–13 points over the corresponding baselines.
- Human evaluators preferred the sentiment‑aware outputs in a majority of cases, rising to 68% for the hybrid model, citing clearer emotional tone and better relevance to the original posts.
- The hybrid approach achieved the highest overall performance while still running within acceptable latency for batch processing (≈ 0.8 s per 200‑word document on a single GPU).
Practical Implications
- Brand & Reputation Monitoring: Companies can automatically generate daily digests of social chatter that not only summarize key topics but also flag rising negative sentiment, enabling faster PR response.
- Customer Support Automation: Chatbot pipelines can surface concise sentiment‑aware summaries of ticket histories, helping agents prioritize angry or frustrated users.
- Market Research Dashboards: Analysts can ingest streams of product reviews and receive short, sentiment‑labeled briefs, reducing manual reading time dramatically.
- Content Moderation: Moderators get a quick emotional snapshot of a thread, aiding decisions about escalation or removal.
- Low‑Resource Deployment: Because the extractive component is lightweight, developers can run sentiment‑aware summarization on edge devices or within serverless functions for real‑time alerts.
Limitations & Future Work
- Domain Sensitivity: The sentiment classifier was trained on English‑only, mostly Western‑centric data; performance may degrade on multilingual or culturally nuanced texts.
- Short‑Text Bias: While effective for ≤ 300‑word inputs, the framework shows diminishing returns on longer articles where discourse structure becomes more complex.
- Fine‑Grained Emotions: The current polarity (positive/negative/neutral) ignores subtler emotions (e.g., sarcasm, disappointment). The authors suggest extending the sentiment signal to a multi‑dimensional affective space.
- Real‑Time Constraints: The abstractive UniLM decoder remains the bottleneck for high‑throughput streaming scenarios; future work could explore distilled or quantized models to cut latency.
Overall, Liu and Kok’s sentiment‑aware summarization pipeline bridges a critical gap between raw user‑generated content and actionable business intelligence, offering a ready‑to‑integrate toolset for developers looking to add emotional awareness to their text‑analytics stacks.
Authors
- Junyi Liu
- Stanley Kok
Paper Information
- arXiv ID: 2512.20404v1
- Categories: cs.CL
- Published: December 23, 2025