[Paper] MetFuse: Figurative Fusion between Metonymy and Metaphor
Source: arXiv - 2604.12919v1
Overview
The paper “MetFuse: Figurative Fusion between Metonymy and Metaphor” tackles a surprisingly common linguistic phenomenon: sentences that blend two types of figurative language, metonymy and metaphor. While most NLP research treats the two separately, the authors build a unified framework that turns a literal sentence into three figurative versions (metonymic, metaphoric, and a hybrid of both) and release a high‑quality dataset (MetFuse) of 1,000 meaning‑aligned quadruplets (4,000 sentences total). Their experiments show that adding this data consistently boosts the performance of metonymy and metaphor classifiers across a range of benchmarks.
Key Contributions
- Unified transformation framework that generates metonymic, metaphoric, and hybrid variants from a literal sentence.
- MetFuse dataset: 1,000 human‑verified quadruplets (literal + metonymic + metaphoric + hybrid), the first resource dedicated to studying figurative fusion.
- Empirical validation: Augmenting eight existing metonymy/metaphor benchmarks with MetFuse improves classification accuracy, especially for metonymy when hybrid examples are added.
- Cross‑figurative analysis: Demonstrates that the presence of a metaphor makes a metonymic noun easier for both humans and large language models (LLMs) to detect.
- Open‑source release: Dataset and code are publicly available, encouraging further research on multi‑figurative language understanding.
Methodology
Sentence Construction
- Start with a literal sentence (e.g., “The monarchy announced new tax reforms”).
- Apply a set of linguistic rules and crowd‑sourced rewrites to produce:
  - a metonymic version (where an associated entity stands in for the intended referent, e.g., “The monarchy announced…” → “The crown announced…”),
  - a metaphoric version (where one concept is described in terms of another, e.g., “The monarchy announced…” → “The kingdom’s head announced…”), and
  - a hybrid version that combines both transformations.
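One way to picture a meaning-aligned quadruplet is as a small record with four fields. The sketch below is only an illustration: the class name, field names, and example sentences are assumptions made here and do not reflect the released dataset's actual schema or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quadruplet:
    """One meaning-aligned group of four sentences (field names are illustrative)."""
    literal: str
    metonymic: str
    metaphoric: str
    hybrid: str

    def variants(self):
        """Return (variant_name, sentence) pairs in a fixed order."""
        return [
            ("literal", self.literal),
            ("metonymic", self.metonymic),
            ("metaphoric", self.metaphoric),
            ("hybrid", self.hybrid),
        ]


# Invented example sentences, not taken from the MetFuse dataset.
example = Quadruplet(
    literal="The administration announced new sanctions.",
    metonymic="The White House announced new sanctions.",
    metaphoric="The administration unleashed new sanctions.",
    hybrid="The White House unleashed new sanctions.",
)

for name, sentence in example.variants():
    print(f"{name:>10}: {sentence}")
```

Here the metonymic variant swaps the subject for an associated place name, the metaphoric variant swaps the verb, and the hybrid applies both changes to the same literal base.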
Human Verification
- Each quadruplet is reviewed by multiple annotators to ensure the intended figurative meaning is preserved and aligned across the four sentences.
Dataset Integration & Evaluation
- The MetFuse quadruplets are mixed into the training sets of eight public metonymy/metaphor classification benchmarks.
- Standard classifiers (BERT, RoBERTa, etc.) are fine‑tuned on the augmented data.
- Performance is measured with accuracy/F1 and compared against baselines trained without MetFuse.
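The augmentation step can be sketched as flattening each quadruplet into labeled sentences and mixing them into an existing training set. This is a minimal sketch under stated assumptions: the label scheme (a sentence counts as positive for metonymy if it is the metonymic or hybrid variant), the function names, and the toy data are all hypothetical and are not the authors' documented setup.

```python
import random

# Assumed label scheme for a binary metonymy classifier (not taken from the paper):
# a sentence is positive (1) if it contains a metonymy.
METONYMY_LABELS = {"literal": 0, "metonymic": 1, "metaphoric": 0, "hybrid": 1}


def flatten(quadruplets, label_map):
    """Turn each quadruplet dict into four (sentence, label) pairs."""
    return [
        (quad[variant], label)
        for quad in quadruplets
        for variant, label in label_map.items()
    ]


def augment(train_set, quadruplets, label_map, seed=0):
    """Return a shuffled copy of train_set with the quadruplet sentences mixed in."""
    combined = list(train_set) + flatten(quadruplets, label_map)
    random.Random(seed).shuffle(combined)
    return combined


# Toy data, invented for illustration.
benchmark_train = [
    ("She read the letter twice.", 0),
    ("Wall Street panicked after the report.", 1),
]
quadruplets = [
    {
        "literal": "The administration announced new sanctions.",
        "metonymic": "The White House announced new sanctions.",
        "metaphoric": "The administration unleashed new sanctions.",
        "hybrid": "The White House unleashed new sanctions.",
    }
]

augmented = augment(benchmark_train, quadruplets, METONYMY_LABELS)
print(len(augmented), "training examples after augmentation")
```

The augmented list would then be passed to whatever fine-tuning routine the benchmark normally uses. A metaphor classifier would need a different label map.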
Analysis of Figurative Interaction
- Conduct probing experiments where models (and human annotators) are asked to label the figurative type of sentences that are either pure metonymy, pure metaphor, or hybrid.
- Compare detection rates to quantify the “boost” effect of one figurative type on the other.
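The comparison of detection rates can be illustrated with a few lines of bookkeeping. The sketch below assumes each judgment is recorded as a (condition, detected) pair. The condition names and the toy records are invented here, and the summary does not say how the authors store or aggregate their annotations.

```python
from collections import defaultdict


def detection_rates(records):
    """Map each condition to the fraction of its items where metonymy was detected."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, detected in records:
        totals[condition] += 1
        hits[condition] += int(detected)
    return {condition: hits[condition] / totals[condition] for condition in totals}


# Toy judgments, invented for illustration: (condition, was the metonymy detected?)
records = [
    ("metonymy_only", True),
    ("metonymy_only", False),
    ("metonymy_only", False),
    ("metonymy_only", True),
    ("hybrid", True),
    ("hybrid", True),
    ("hybrid", True),
    ("hybrid", False),
]

rates = detection_rates(records)
# The "boost" is the difference in detection rate between the two conditions.
boost = rates["hybrid"] - rates["metonymy_only"]
print(rates, boost)
```

The same function works for model predictions or human annotations, as long as both are reduced to a detected/not-detected flag per sentence.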
Results & Findings
| Task | Baseline (no MetFuse) | + MetFuse (Hybrid) | Gain (F1 points) |
|---|---|---|---|
| Metonymy classification (4 benchmarks) | 78.2 % F1 | 82.7 % F1 | +4.5 |
| Metaphor classification (4 benchmarks) | 81.5 % F1 | 84.1 % F1 | +2.6 |
- Hybrid examples deliver the biggest lift for metonymy tasks, which suggests that the metaphorical context clarifies the metonymic cue.
- Human annotators identified metonymy correctly in 71 % of hybrid sentences vs. 58 % in metonymy‑only sentences.
- LLMs (GPT‑4, Llama‑2) showed the same trend, with a 6‑point F1 improvement on hybrid inputs.
- Error analysis revealed that most remaining mistakes involve rare proper nouns or domain‑specific jargon, suggesting that further lexical coverage could help.
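The gains reported above are absolute differences in percentage points, which is not the same as a relative percent improvement. The short calculation below makes the distinction explicit using the figures quoted in this section. The function and variable names are illustrative only.

```python
def gains(reference, comparison):
    """Return (absolute gain in points, relative gain in percent of the reference)."""
    absolute = comparison - reference
    relative = 100.0 * absolute / reference
    return absolute, relative


# Figures quoted in this section: (reference score, comparison score).
reported = {
    "metonymy F1": (78.2, 82.7),
    "metaphor F1": (81.5, 84.1),
    "human metonymy detection": (58.0, 71.0),
}

for name, (reference, comparison) in reported.items():
    absolute, relative = gains(reference, comparison)
    print(f"{name}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

For the metonymy row, for example, a gain of 4.5 F1 points over a baseline of 78.2 corresponds to a relative improvement of roughly 5.8 %.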
Practical Implications
- Better figurative language handling in downstream apps – chatbots, voice assistants, and content moderation tools can more reliably interpret statements like “The White House announced…” when a metaphor is also present.
- Improved data augmentation pipelines – developers can automatically generate hybrid figurative variants to enrich training data for any task that benefits from nuanced meaning (e.g., sentiment analysis, intent detection).
- Enhanced LLM prompting – prompting strategies that explicitly ask the model to consider both metonymic and metaphorical cues can yield more accurate explanations or paraphrases.
- Cross‑domain transfer – the framework can be adapted to domain‑specific corpora (legal, medical) where metonymic shorthand (“the bench” for judges) often co‑occurs with metaphorical language, leading to more robust domain‑adapted models.
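As a rough illustration of the prompting idea, a prompt can ask about metonymy and metaphor as separate, explicit steps. The wording below is a hypothetical template written for this summary, not a prompt from the paper, and it has not been evaluated.

```python
def build_figurative_prompt(sentence):
    """Build a prompt that asks about metonymy and metaphor as separate steps."""
    return (
        "Analyze the figurative language in the sentence below.\n"
        "1. Metonymy: does any noun stand in for a related entity? "
        "If so, name the noun and what it refers to.\n"
        "2. Metaphor: is any word used in a non-literal, cross-domain sense? "
        "If so, name the word and its literal meaning.\n"
        "3. Paraphrase the sentence in fully literal terms.\n\n"
        f"Sentence: {sentence}"
    )


prompt = build_figurative_prompt("The White House unleashed new sanctions.")
print(prompt)
```

The resulting string can be sent to any chat or completion endpoint. Whether this ordering of questions helps in practice would need to be tested.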
Limitations & Future Work
- Scope of lexical items – MetFuse focuses mainly on nouns that are classic metonymic targets; extending to verbs and adjectives remains an open challenge.
- Cultural and language diversity – The dataset is English‑centric; figurative fusion behaves differently in other languages and cultures, so multilingual extensions are needed.
- Model size dependence – Gains were more pronounced for mid‑size transformers; very large LLMs already capture some figurative cues, reducing the marginal benefit.
- Future directions proposed by the authors include:
- Scaling the framework to automatically generate larger corpora.
- Exploring joint multi‑task learning for metonymy, metaphor, and other figurative devices (irony, sarcasm).
- Integrating the dataset into evaluation suites for LLMs’ figurative reasoning abilities.
Authors
- Saptarshi Ghosh
- Tianyu Jiang
Paper Information
- arXiv ID: 2604.12919v1
- Categories: cs.CL
- Published: April 14, 2026