[Paper] Context-Aware Pragmatic Metacognitive Prompting for Sarcasm Detection
Source: arXiv - 2511.21066v1
Overview
Detecting sarcasm in text remains a hard problem for NLP systems, even with powerful pre‑trained language models (PLMs) and large language models (LLMs). This paper builds on a recent prompting technique called Pragmatic Metacognitive Prompting (PMP) and shows how adding contextual knowledge, retrieved from the web or elicited from the model's own parametric knowledge, can substantially improve sarcasm‑detection performance across several benchmark datasets.
Key Contributions
- Context‑aware prompting: Introduces a retrieval‑aware extension to PMP that supplies external background information when the model lacks the needed cultural or domain knowledge.
- Self‑knowledge awareness: Proposes a “self‑knowledge” strategy that asks the LLM to surface relevant facts it already knows, reducing reliance on external retrieval.
- Empirical gains: Achieves up to +9.87 macro‑F1 percentage points on an Indonesian Twitter sarcasm dataset and consistent improvements (≈3–4 points) on English‑language benchmarks (SemEval‑2018 Task 3, MUStARD).
- Open‑source pipeline: Releases code and data‑handling scripts, enabling reproducibility and easy integration into existing sarcasm‑detection workflows.
Methodology
- Base Prompt (PMP): The authors start with the existing Pragmatic Metacognitive Prompt, which frames sarcasm detection as a metacognitive reasoning task—asking the model to first consider the literal meaning, then the pragmatic (sarcastic) intent.
- Retrieval‑aware augmentation:
- Non‑parametric (web) retrieval: For each input sentence, a lightweight search engine fetches the top‑k web snippets that contain potentially relevant slang, cultural references, or obscure entities. These snippets are concatenated to the prompt as “background knowledge.”
- Self‑knowledge retrieval: The LLM is first queried with a meta‑prompt (“What facts do you know that could help interpret this sentence?”). Its own generated knowledge is then fed back into the main sarcasm‑detection prompt.
- Prompt composition: The final prompt consists of three parts: (a) the original PMP instruction, (b) the retrieved knowledge block, and (c) the target sentence; see the sketch after this list.
- Evaluation: Experiments run on three public sarcasm corpora using GPT‑3.5‑style LLMs via the OpenAI API. Macro‑F1 is the primary metric, reflecting balanced performance across sarcastic and non‑sarcastic classes.
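To make the self‑knowledge variant concrete, here is a minimal sketch assuming the official `openai` Python client and a GPT‑3.5‑style chat model; the prompt wording, function names, and label mapping are illustrative paraphrases, not the authors' released templates.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo"  # a GPT-3.5-style chat model, as used in the paper

def ask(prompt: str) -> str:
    """Single-turn helper around the chat completions endpoint."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def elicit_self_knowledge(sentence: str) -> str:
    """Step 1 (self-knowledge retrieval): meta-prompt the model for facts it already knows."""
    return ask(
        "What facts do you know (slang, cultural references, named entities) "
        f"that could help interpret this sentence?\n\nSentence: {sentence}"
    )

def compose_prompt(sentence: str, knowledge: str) -> str:
    """Step 2 (prompt composition): PMP instruction + knowledge block + target sentence."""
    pmp_instruction = (  # paraphrase of the PMP instruction, not the authors' exact template
        "Analyse the sentence below for sarcasm. First state its literal meaning, "
        "then reason about the speaker's pragmatic intent, and finally answer "
        "'sarcastic' or 'not sarcastic'."
    )
    return f"{pmp_instruction}\n\nBackground knowledge:\n{knowledge}\n\nSentence: {sentence}"

def detect_sarcasm(sentence: str) -> str:
    """Step 3: run the composed prompt and crudely map the answer to a binary label."""
    answer = ask(compose_prompt(sentence, elicit_self_knowledge(sentence))).lower()
    return "not sarcastic" if "not sarcastic" in answer else "sarcastic"

if __name__ == "__main__":
    print(detect_sarcasm("Oh great, another Monday. Exactly what I needed."))
```

Swapping `elicit_self_knowledge` for a web‑search step that returns the top‑k snippets yields the non‑parametric variant; macro‑F1 over the resulting predictions can then be computed with, e.g., `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`.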
Results & Findings
| Dataset | Baseline PMP (macro‑F1) | + Non‑parametric retrieval | + Self‑knowledge retrieval |
|---|---|---|---|
| Twitter Indonesia Sarcastic | 62.3 % | 72.2 % (+9.87 pts) | – |
| SemEval‑2018 Task 3 | 78.1 % | – | 81.4 % (+3.29 pts) |
| MUStARD | 71.5 % | – | 75.6 % (+4.08 pts) |
- Context matters: Adding web‑sourced background dramatically helps when the text contains region‑specific slang or references unknown to the LLM.
- Self‑knowledge is complementary: Even without external retrieval, prompting the model to surface its own facts yields consistent gains, especially on English datasets where the LLM already has broader coverage.
- Error analysis: Remaining failures often involve multi‑turn sarcasm or heavily ambiguous humor that requires deeper discourse modeling beyond single‑sentence context.
Practical Implications
- Better moderation tools: Social‑media platforms can integrate the retrieval‑aware PMP pipeline to flag sarcastic or potentially toxic content more reliably, reducing false positives caused by literal‑meaning misinterpretations.
- Cross‑cultural chatbots: Customer‑service bots deployed in multilingual markets (e.g., Indonesia) can use the web‑retrieval component to stay up‑to‑date with local slang, improving user experience and avoiding miscommunication.
- Low‑resource adaptation: Since the approach relies on plug‑and‑play retrieval rather than fine‑tuning massive models, developers can retrofit existing LLM‑based pipelines with minimal compute overhead.
- Explainability: The retrieved snippets are visible to developers, offering a transparent “why” behind a sarcasm prediction—useful for audit trails and compliance.
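As one way to act on this, a deployment could log each prediction together with the snippets that were fed to the model; the record layout below is a purely hypothetical illustration (field names and format are not from the paper).

```python
import json
from datetime import datetime, timezone

def audit_record(sentence: str, label: str, snippets: list[str]) -> str:
    """Bundle a prediction with the retrieved evidence so reviewers can see why it was made."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sentence": sentence,
            "prediction": label,
            "evidence": snippets,  # the background snippets that were included in the prompt
        },
        ensure_ascii=False,
        indent=2,
    )
```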
Limitations & Future Work
- Retrieval‑quality dependence: Noisy or irrelevant web snippets can hurt performance; the current system uses a simple BM25 ranker without sophisticated relevance feedback (see the sketch after this list).
- Latency overhead: Real‑time applications must balance the extra API calls for retrieval against response time constraints.
- Scope of evaluation: Experiments focus on three datasets; broader testing on multi‑turn dialogues and other languages is needed.
- Future directions: The authors plan to explore neural re‑ranking of retrieved documents, adaptive prompt length control, and integration with multi‑modal cues (e.g., emojis, images) to capture sarcasm that spans text and visual context.
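Since the first limitation above concerns the BM25 ranking step, here is a minimal sketch of how fetched web snippets might be ranked against an input sentence; the `rank_bm25` library, the whitespace/regex tokenisation, and the example snippets are illustrative assumptions rather than the authors' implementation.

```python
import re
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; good enough for a sketch."""
    return re.findall(r"\w+", text.lower())

def top_k_snippets(query: str, snippets: list[str], k: int = 3) -> list[str]:
    """Rank candidate web snippets against the input sentence with BM25 and keep the top-k."""
    bm25 = BM25Okapi([tokenize(s) for s in snippets])
    return bm25.get_top_n(tokenize(query), snippets, n=k)

# Example: keep the two snippets most relevant to a sentence containing Indonesian slang.
candidates = [
    "Mantap jiwa is Indonesian slang roughly meaning awesome.",
    "BM25 is a bag-of-words ranking function used in information retrieval.",
    "Macet is Indonesian for a traffic jam.",
]
print(top_k_snippets("mantap jiwa, macet lagi tiap pagi", candidates, k=2))
```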
Authors
- Michael Iskandardinata
- William Christian
- Derwin Suhartono
Paper Information
- arXiv ID: 2511.21066v1
- Categories: cs.CL, cs.AI
- Published: November 26, 2025
- PDF: https://arxiv.org/pdf/2511.21066v1