[Paper] CitiLink-Summ: Summarization of Discussion Subjects in European Portuguese Municipal Meeting Minutes
Source: arXiv - 2602.16607v1
Overview
The paper introduces CitiLink‑Summ, the first publicly available corpus of European Portuguese municipal meeting minutes paired with thousands of manually written subject‑level summaries. By releasing this resource together with baseline experiments using modern summarization models, the authors open a new avenue for NLP research on dense administrative texts that are otherwise hard for citizens to digest.
Key Contributions
- New Dataset: 100 municipal meeting minutes (≈ 2 M words) annotated with 2,322 high‑quality, hand‑crafted summaries, each aligned to a specific discussion subject.
- First Benchmark: Establishes the inaugural evaluation suite for subject‑level summarization in European Portuguese municipal documents.
- Baseline Experiments: Fine‑tunes and tests state‑of‑the‑art generative models (BART, PRIMERA) and large language models (LLMs) on the corpus.
- Comprehensive Evaluation: Reports results using lexical (ROUGE, BLEU, METEOR) and semantic (BERTScore) metrics, highlighting the gap between current models and human performance.
- Open‑Source Release: Publishes the corpus, preprocessing scripts, and training checkpoints under a permissive license to encourage reproducibility and community contributions.
Methodology
- Data Collection & Annotation
- Minutes were sourced from several Portuguese municipalities and digitized.
- Legal and linguistic experts manually extracted each discussion subject and wrote a concise, self‑contained summary (≈ 30–50 words).
- Pre‑processing
- Texts were cleaned, tokenized with a Portuguese‑specific tokenizer, and split into document → subject → summary triples.
- A train/validation/test split (80/10/10) was created, preserving subject distribution across municipalities.
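The split described above can be sketched as follows. This is a minimal illustration, not the authors' released preprocessing script: the `municipality` field name and the per‑municipality grouping strategy are assumptions about how one might preserve the subject distribution across splits.

```python
import random

def split_triples(triples, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Split (document, subject, summary) triples 80/10/10, grouping by
    municipality so each split keeps a similar municipal distribution.
    Field names are illustrative, not the paper's actual schema."""
    by_muni = {}
    for t in triples:
        by_muni.setdefault(t["municipality"], []).append(t)

    train, val, test = [], [], []
    rng = random.Random(seed)
    for group in by_muni.values():
        rng.shuffle(group)
        n = len(group)
        n_train = int(ratios[0] * n)
        n_val = int(ratios[1] * n)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Splitting within each municipality (rather than globally) is what keeps the distribution stable: a global shuffle could by chance put all of a small municipality's subjects into one split.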
- Model Fine‑tuning
- BART‑base and PRIMERA (a multi‑document summarizer) were fine‑tuned on the training set for 3 epochs, using the standard cross‑entropy loss.
- For LLMs, zero‑shot and few‑shot prompting were performed with GPT‑3.5‑turbo and LLaMA‑13B, feeding the full minute and a short instruction to “summarize each discussion subject”.
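The zero‑shot and few‑shot setups can be illustrated with a simple prompt builder. The exact instruction wording and demonstration format used by the authors are not reproduced here; the template below is an assumption for illustration only.

```python
def build_prompt(minute_text, examples=None):
    """Build a zero-shot (examples=None) or few-shot prompt asking an LLM
    to summarize each discussion subject in a municipal meeting minute.
    The instruction wording is illustrative, not the paper's exact prompt."""
    parts = []
    if examples:  # few-shot: prepend (minute, summaries) demonstrations
        for minute, summaries in examples:
            parts.append(f"Minute:\n{minute}\nSummaries:\n{summaries}\n")
    parts.append(
        "Summarize each discussion subject in the following municipal "
        "meeting minute, in roughly 30-50 words per subject.\n"
        f"Minute:\n{minute_text}\nSummaries:"
    )
    return "\n".join(parts)
```

The same builder serves both settings: zero‑shot passes only the target minute, while few‑shot prepends one or more (minute, summaries) demonstration pairs before it.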
- Evaluation
- Generated summaries were compared against the human references using ROUGE‑1/2/L, BLEU, METEOR, and BERTScore (F1).
- Statistical significance was assessed with paired bootstrap resampling.
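Paired bootstrap resampling can be sketched generically as below: given per‑example scores (e.g., per‑summary ROUGE) for two systems, resample the example indices with replacement and count how often system A's mean beats system B's. This is a generic implementation, not the authors' exact test configuration.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Paired bootstrap test: resample example indices with replacement
    and return the fraction of samples in which system A does NOT
    outperform system B (an approximate one-sided p-value)."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return 1.0 - wins / n_samples
```

Because the same resampled indices are used for both systems, the test accounts for the fact that both are scored on the same examples, which makes it more sensitive than comparing corpus‑level means alone.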
Results & Findings
| Model | ROUGE‑1 | ROUGE‑2 | ROUGE‑L | BERTScore‑F1 |
|---|---|---|---|---|
| BART‑base (fine‑tuned) | 38.7 | 15.2 | 35.9 | 71.4 |
| PRIMERA (fine‑tuned) | 41.3 | 17.0 | 38.2 | 73.1 |
| GPT‑3.5‑turbo (zero‑shot) | 32.5 | 11.8 | 30.1 | 66.2 |
| LLaMA‑13B (few‑shot) | 35.0 | 13.4 | 32.8 | 68.9 |
| Human reference (upper bound) | 100 | 100 | 100 | 100 |
- PRIMERA achieved the best lexical scores, indicating it can capture the salient phrases of a subject more effectively than a standard encoder‑decoder model.
- LLMs lag behind fine‑tuned models, especially on ROUGE‑2, suggesting they struggle with precise phrase overlap in this niche domain.
- All automatic scores are still far from the human upper bound, highlighting the difficulty of summarizing dense administrative language.
Practical Implications
- Civic Tech Platforms: Developers can integrate PRIMERA‑based pipelines to auto‑generate subject‑level digests, making minutes searchable and citizen‑friendly.
- Transparency & Accountability: Municipal websites could automatically publish concise summaries alongside full minutes, lowering the barrier for public oversight.
- Multilingual Extension: The dataset and codebase can serve as a template for building similar resources in other low‑resource languages (e.g., Galician, Catalan).
- Workflow Automation: City clerks can use the model to pre‑populate draft summaries, reducing manual effort and standardizing documentation.
- Search & Retrieval: Summaries improve indexing, enabling developers to build smarter Q&A bots that answer citizen queries like “What decisions were made about waste collection in March?” without scanning entire PDFs.
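The retrieval use case can be illustrated with a minimal inverted index over generated summaries. The function names and whitespace tokenization below are simplifying assumptions; a production system would use a proper Portuguese analyzer or embedding-based retrieval.

```python
from collections import defaultdict

def build_index(summaries):
    """Map each lowercase token to the ids of summaries containing it."""
    index = defaultdict(set)
    for sid, text in summaries.items():
        for token in text.lower().split():
            index[token].add(sid)
    return index

def search(index, query):
    """Return ids of summaries matching every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = index.get(tokens[0], set()).copy()
    for t in tokens[1:]:
        result &= index.get(t, set())
    return result
```

Indexing short subject summaries instead of full minutes keeps the index small and the matches focused on decisions rather than procedural boilerplate.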
Limitations & Future Work
- Size & Diversity: Only 100 minutes from a limited set of municipalities were annotated; scaling to more regions and longer time spans is needed for broader generalization.
- Subject Granularity: Summaries target pre‑identified subjects; automatic subject detection (topic segmentation) remains an open challenge.
- Evaluation Scope: Metrics focus on n‑gram overlap; human evaluation (readability, factual correctness) is required to assess real‑world utility.
- Model Adaptation: Exploring domain‑adapted LLMs (e.g., fine‑tuning GPT‑NeoX on Portuguese legal text) could narrow the performance gap.
- Cross‑Lingual Transfer: Investigating whether models trained on CitiLink‑Summ can help summarize minutes in related Romance languages via multilingual transfer learning.
Authors
- Miguel Marques
- Ana Luísa Fernandes
- Ana Filipa Pacheco
- Rute Rebouças
- Inês Cantante
- José Isidro
- Luís Filipe Cunha
- Alípio Jorge
- Nuno Guimarães
- Sérgio Nunes
- António Leal
- Purificação Silvano
- Ricardo Campos
Paper Information
- arXiv ID: 2602.16607v1
- Categories: cs.CL
- Published: February 18, 2026