[Paper] Extending a Parliamentary Corpus with MPs' Tweets: Automatic Annotation and Evaluation Using MultiParTweet
Source: arXiv - 2512.11567v1
Overview
The authors introduce MultiParTweet, a multilingual corpus that links German parliamentarians’ tweets on X (formerly Twitter) with the existing GerParCor parliamentary debate dataset. By automatically annotating the tweets with emotion, sentiment, topics, and visual content, the resource enables developers and researchers to compare politicians’ online discourse with their formal legislative speech.
Key Contributions
- MultiParTweet corpus: 39,546 tweets (19,056 containing media) aligned with GerParCor, covering multiple languages.
- Rich automatic annotations: 9 text‑based NLP models + 1 vision‑language model (VLM) provide emotion, sentiment, and topic tags for each tweet and its attached images.
- Human‑validated benchmark: A manually annotated subset is used to evaluate the quality of the automatic labels.
- TTLABTweetCrawler: An open‑source, configurable Python tool for large‑scale X data collection, reusable for other political or domain‑specific tweet harvesting tasks.
- Cross‑model predictability analysis: Demonstrates that the outputs of different models can be predicted from one another, highlighting both redundancy and complementarity among the annotation layers.
- Multimodal preference insight: Human annotators favored VLM‑derived labels over pure‑text labels, suggesting multimodal cues better capture human interpretation.
Methodology
- Data acquisition – Using TTLABTweetCrawler, the team harvested all publicly available tweets from a curated list of German MPs (including retweets and quoted tweets) and matched them to the corresponding speakers in GerParCor.
- Pre‑processing – Tweets were cleaned, language‑identified, and media URLs resolved. Images were downloaded for visual analysis.
- Automatic annotation –
  - Text models: Sentiment analysis (e.g., a BERT‑based polarity classifier), emotion detection (e.g., a multilingual RoBERTa fine‑tuned on affective corpora), and topic classification (e.g., a zero‑shot transformer with a political taxonomy).
  - Vision‑language model: A CLIP‑style VLM that jointly encodes an image and its caption to infer emotion and topic from visual content.
- Human validation – A stratified sample (≈2 % of tweets) was manually labeled for sentiment, emotion, and topic. Inter‑annotator agreement was measured (Cohen’s κ ≈ 0.71).
- Evaluation – Automatic labels were compared against the human gold standard using F1, accuracy, and macro‑averaged metrics.
- Predictability experiment – Linear regression and gradient‑boosted trees were trained to predict one model’s output from the others, quantifying the information shared among the annotation streams.
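The pre‑processing step above (cleaning tweets and separating out media URLs for visual analysis) can be sketched with standard‑library tools. The field names, media hosts, and cleaning rules here are illustrative assumptions, not the authors’ actual pipeline:

```python
import re

# Illustrative cleaning rules (assumptions, not the paper's exact pipeline):
# strip URLs and @-mentions, collect media links for the visual-analysis stage.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
MEDIA_HOSTS = ("pbs.twimg.com", "video.twimg.com")  # assumed media hosts

def preprocess(tweet: dict) -> dict:
    """Return cleaned text plus any media URLs found in the raw tweet."""
    text = tweet["text"]
    media = [u for u in URL_RE.findall(text)
             if any(h in u for h in MEDIA_HOSTS)]
    text = URL_RE.sub("", text)      # drop all links from the text
    text = MENTION_RE.sub("", text)  # drop @-mentions
    text = re.sub(r"\s+", " ", text).strip()
    return {"clean_text": text, "media_urls": media}
```

A cleaned tweet then feeds the text models, while its `media_urls` go to the image download stage.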
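The human‑validation and evaluation steps can be reproduced with scikit‑learn. The labels below are toy data for illustration only; the paper reports κ ≈ 0.71 on its own stratified sample:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Toy sentiment labels from two human annotators and one automatic model
# (illustrative values only, not the paper's annotations).
annotator_a = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg"]
annotator_b = ["pos", "neg", "neu", "neg", "neg", "pos", "pos", "neg"]
model_pred  = ["pos", "neg", "pos", "pos", "neg", "pos", "neu", "neg"]

# Inter-annotator agreement (chance-corrected), as in the validation step
kappa = cohen_kappa_score(annotator_a, annotator_b)

# Automatic labels vs. the human gold standard, macro-averaged over classes
acc = accuracy_score(annotator_a, model_pred)
macro_f1 = f1_score(annotator_a, model_pred, average="macro")

print(f"kappa={kappa:.2f} accuracy={acc:.2f} macro-F1={macro_f1:.2f}")
```

Macro averaging weights each class equally, which matters for rare emotion classes in a skewed political corpus.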
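The predictability experiment can be sketched as follows: train a gradient‑boosted model to predict one annotation stream from the others and report held‑out R². The synthetic continuous scores below are an assumption standing in for real model outputs; the paper’s exact setup may differ:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, correlated scores standing in for three models' outputs
# (assumption: continuous scores; not the paper's data).
rng = np.random.default_rng(0)
n = 2000
model_a = rng.normal(size=n)                              # e.g. polarity score
model_b = 0.6 * model_a + rng.normal(scale=0.5, size=n)   # correlated model
target = 0.5 * model_a + 0.3 * model_b + rng.normal(scale=0.3, size=n)

X = np.column_stack([model_a, model_b])
X_tr, X_te, y_tr, y_te = train_test_split(X, target, random_state=0)

# Predict one annotation stream from the others; held-out R^2 quantifies
# how much information the streams share.
gbt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = r2_score(y_te, gbt.predict(X_te))
print(f"R^2 = {r2:.2f}")
```

A high R² between two streams suggests one model could be dropped from the pipeline with little information loss.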
Results & Findings
| Annotation type | Automatic F1 (vs. human) |
|---|---|
| Sentiment | 0.84 |
| Emotion (text) | 0.78 |
| Topic (text) | 0.81 |
| Emotion (VLM) | 0.86 (highest) |
| Topic (VLM) | 0.83 |
- VLM superiority: The vision‑language model outperformed pure‑text emotion detection, confirming that images add discriminative cues.
- Cross‑model predictability: Predictive R² scores ranged from 0.62 to 0.78, indicating that a substantial portion of one model’s output can be inferred from the others.
- Corpus quality: Over 92 % of tweets were successfully linked to a GerParCor speaker, enabling seamless side‑by‑side analysis of parliamentary speeches and social media posts.
Practical Implications
- Sentiment‑aware political dashboards: Developers can build real‑time monitoring tools that juxtapose MPs’ tweet sentiment with their parliamentary voting records, spotting divergences or alignment.
- Multimodal content moderation: The VLM annotations provide a ready‑made signal for detecting emotionally charged or potentially polarizing imagery in political communication.
- Training data for downstream models: MultiParTweet serves as a labeled dataset for fine‑tuning multilingual sentiment/emotion classifiers, especially in the under‑explored domain of political micro‑texts.
- Rapid corpus extension: TTLABTweetCrawler can be repurposed to collect tweets from other legislatures, NGOs, or corporate spokespersons, accelerating the creation of domain‑specific corpora.
- Explainable AI research: The predictability analysis suggests that multimodal and textual signals are partially redundant; developers can design lightweight pipelines that drop less‑informative models without sacrificing performance.
Limitations & Future Work
- Language coverage: While multilingual, the bulk of the data is German; extending to other EU parliaments will require additional language‑specific models.
- Temporal bias: The snapshot reflects a specific political period; longitudinal studies are needed to assess how annotation quality evolves over election cycles.
- Manual annotation size: The human‑validated subset is relatively small, which may limit the robustness of evaluation metrics for rare emotions or niche topics.
- Model transparency: The VLM’s decision process remains a black box; future work could integrate attention visualizations to improve interpretability for analysts.
Bottom line: MultiParTweet bridges the gap between formal parliamentary discourse and the fast‑paced world of social media, offering developers a ready‑to‑use, richly annotated resource for building smarter political analytics tools.
Authors
- Mevlüt Bagci
- Ali Abusaleh
- Daniel Baumartz
- Giuseppe Abrami
- Maxim Konca
- Alexander Mehler
Paper Information
- arXiv ID: 2512.11567v1
- Categories: cs.CL, cs.MM
- Published: December 12, 2025