[Paper] Contrastive Learning with Narrative Twins for Modeling Story Salience
Source: arXiv - 2601.07765v1
Overview
The paper introduces a contrastive‑learning framework that teaches a model to identify the most salient events in a story. By pairing each narrative with a “twin” that tells the same plot in different wording, the system learns to separate plot‑relevant content from incidental surface detail. The authors show that these story embeddings outperform standard masked‑language‑model baselines on both short (ROCStories) and long (Wikipedia plot) texts, and they evaluate four simple operations (deletion, shifting, disruption, summarization) for extracting salient sentences.
Key Contributions
- Narrative‑Twin Contrastive Objective: A training scheme that forces the model to differentiate a story from its twin (same plot, different surface form) and from a distractor (similar surface features, different plot).
- Salience‑Inference Operations: Formalization and empirical evaluation of four narratologically inspired manipulations (deletion, shifting, disruption, summarization) to probe which sentences the model deems important.
- Empirical Gains: Demonstrated that contrastively learned story embeddings beat a strong masked‑language‑model baseline on salience detection across two datasets of differing length and genre.
- Twin‑Generation Strategies: Showed that when curated twins are unavailable, random token dropout can approximate twins, and that effective distractors can be sourced from LLM‑generated alternatives or intra‑story segments.
Methodology
Data Preparation
- Narrative Twins: For each story, a twin is created that preserves the underlying plot but rewrites the language. In the ROCStories setting, twins are manually curated; for longer Wikipedia plots, twins are generated by prompting large language models (LLMs).
- Distractors: Two types are used: (a) surface‑similar but plot‑different texts (LLM‑generated) and (b) different sections of the same long narrative.
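When curated twins are unavailable, the paper notes that random token dropout can approximate a twin: most tokens survive, so the plot skeleton stays intact while the surface form changes. A minimal sketch of that idea follows; the `drop_prob` value and the example story are illustrative assumptions, not values from the paper.

```python
import random

def dropout_twin(story: str, drop_prob: float = 0.15, seed: int = 0) -> str:
    """Approximate a narrative twin by randomly dropping surface tokens.

    A cheap stand-in for a plot-preserving rewrite: enough tokens remain
    to carry the plot, while the exact wording differs from the original.
    `drop_prob` is an assumed hyperparameter, not one reported in the paper.
    """
    rng = random.Random(seed)
    tokens = story.split()
    kept = [t for t in tokens if rng.random() > drop_prob]
    # Fall back to the full story if everything was (unluckily) dropped.
    return " ".join(kept) if kept else story

story = "Anna lost her keys. She retraced her steps. She found them at the cafe."
twin = dropout_twin(story)
```

In practice the dropout would be applied at the subword level of the encoder's tokenizer; whitespace splitting here just keeps the sketch self-contained.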
Contrastive Learning Setup
- A transformer encoder (e.g., RoBERTa) maps each story into a fixed‑size embedding.
- The loss pushes the embedding of the original story closer to its twin and farther from the distractor, using a standard InfoNCE formulation.
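The InfoNCE objective above can be sketched with a single positive (the twin) and a set of negatives (the distractors); the temperature value and the toy 2‑d embeddings are illustrative assumptions, whereas the paper's encoder produces high‑dimensional transformer embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE: -log( exp(s_pos/t) / (exp(s_pos/t) + sum_i exp(s_neg_i/t)) ).

    Low when the anchor sits close to its twin and far from all
    distractors; minimizing it pulls twins together and pushes
    distractors away.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    negs = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + negs))

# Loss is small when the twin is near the anchor and distractors are far...
low = info_nce([1.0, 0.0], [0.99, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
# ...and large when a distractor is nearer than the twin.
high = info_nce([1.0, 0.0], [0.0, 1.0], [[0.99, 0.1], [-1.0, 0.0]])
```

In a real training loop the `cosine` scores would be computed in a batched fashion over encoder outputs, and the loss backpropagated through the transformer.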
Salience Probing Operations
- Deletion: Remove a sentence and measure the drop in similarity to the original embedding.
- Shifting: Move a sentence to a different position and observe the embedding change.
- Disruption: Replace a sentence with a random one and compute the impact.
- Summarization: Replace the story with an automatically generated summary and compare embeddings.
- For each operation, the larger the resulting embedding shift relative to the original story, the more salient the affected content is judged to be.
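The deletion operation, the simplest of the four, can be sketched as follows. The bag‑of‑words `encode` function is a hypothetical stand‑in for the paper's trained transformer encoder; only the probing loop itself reflects the described procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def deletion_salience(story, encode):
    """Score each sentence by how far the story embedding moves
    when that sentence is deleted (the deletion operation)."""
    full = encode(story)
    scores = []
    for i in range(len(story)):
        ablated = encode(story[:i] + story[i + 1:])
        scores.append(1.0 - cosine(full, ablated))
    return scores

# Toy bag-of-words encoder standing in for the trained transformer.
story = ["anna lost her keys",
         "it was a sunny day",
         "she found the keys at the cafe"]
vocab = sorted({w for s in story for w in s.split()})

def encode(sentences):
    counts = dict.fromkeys(vocab, 0)
    for s in sentences:
        for w in s.split():
            counts[w] += 1
    return [counts[w] for w in vocab]

scores = deletion_salience(story, encode)  # one score per sentence
```

Shifting and disruption follow the same template, swapping the ablation inside the loop; summarization compares the full story against an embedding of its generated summary instead.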
Evaluation
- Human annotations of salient sentences serve as the gold standard.
- Model predictions are compared against these annotations using precision, recall, and F1.
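Scoring predicted salient sentences against the gold annotations is standard set‑overlap evaluation; a minimal sketch, treating predictions and gold labels as sets of sentence indices (the example indices are made up):

```python
def prf1(predicted: set, gold: set):
    """Precision, recall, and F1 over sets of sentence indices
    flagged as salient."""
    tp = len(predicted & gold)  # correctly flagged sentences
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Model flags sentences 0, 2, 4; annotators marked 2, 4, 5.
p, r, f = prf1({0, 2, 4}, {2, 4, 5})
```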
Results & Findings
| Dataset | Baseline (MLM) | Contrastive Model | Best Operation |
|---|---|---|---|
| ROCStories (5‑sentence) | F1 = 0.42 | F1 = 0.58 | Summarization |
| Wikipedia Plot (≈30 sentences) | F1 = 0.35 | F1 = 0.51 | Summarization |
- Summarization consistently outperformed the other three operations, indicating that the model’s embedding is most sensitive to the removal of globally important content.
- Random dropout twins still yielded improvements over the baseline, confirming that perfect twins are not strictly required.
- Distractors generated by LLMs were as effective as human‑crafted ones, simplifying data creation for new domains.
Practical Implications
- Automated Story Editing: Tools can flag or suggest removal of low‑salience sentences, helping writers tighten narratives or generate concise plot outlines.
- Content Summarization: Embedding‑based salience detection can feed into downstream summarizers that prioritize plot‑critical events, improving story‑aware summarization for media, gaming, or legal case briefs.
- Narrative‑Driven Recommendation: Platforms (e.g., interactive fiction engines, video‑game dialogue systems) can use salience scores to surface the most impactful story branches to users.
- Dataset Creation for NLP: The twin‑generation recipe (LLM prompting or dropout) offers a low‑cost way to bootstrap contrastive datasets for any genre where plot alignment matters (e.g., news articles, product reviews).
Limitations & Future Work
- Twin Quality Dependency: While random dropout works, the best performance still relies on high‑quality twins; generating truly plot‑preserving rewrites for very long or complex narratives remains challenging.
- Domain Generalization: Experiments focus on short fiction and Wikipedia plots; it’s unclear how the approach scales to dialogue‑heavy scripts, multimodal stories, or non‑English corpora.
- Interpretability: The contrastive embeddings are opaque; future work could explore attention‑based visualizations to make salience decisions more transparent to authors.
- Integration with Generation Models: Combining the salience detector with controllable text generation (e.g., prompting LLMs to produce high‑salience continuations) is an open avenue for richer storytelling AI.
Authors
- Igor Sterner
- Alex Lascarides
- Frank Keller
Paper Information
- arXiv ID: 2601.07765v1
- Categories: cs.CL
- Published: January 12, 2026