[Paper] Tackling a Challenging Corpus for Early Detection of Gambling Disorder: UNSL at MentalRiskES 2025

Published: November 28, 2025 at 11:26 AM EST
4 min read
Source: arXiv


Overview

The paper reports on UNSL’s second‑place entry in the MentalRiskES 2025 challenge, where the goal is to flag social‑media users who are at high risk of developing a gambling disorder. By combining lightweight pattern‑based classifiers with large language models, the authors show that high predictive accuracy and fast decision‑making, two criteria that are often at odds in real‑time mental‑health monitoring systems, can be achieved together.

Key Contributions

  • Hybrid CPI + DMC framework that treats predictive performance and decision latency as separate optimization objectives.
  • Three concrete models:
    1. SS3 – a transparent, token‑level classifier that can explain why a post is risky.
    2. BERT‑extended – a fine‑tuned BERT model with a custom gambling‑specific vocabulary.
    3. SBERT – a sentence‑embedding model used for similarity‑based risk scoring.
  • Decision policies that aggregate a user’s historical posts, allowing the system to trigger an alert as soon as enough evidence accumulates.
  • Top‑2 placement on the official leaderboard, including first place on the decision‑speed metric.
  • Error analysis that surfaces the intrinsic difficulty of separating borderline (low‑risk) from truly high‑risk users, highlighting data quality issues.

Methodology

  1. Corpus & Task – The challenge provides a Spanish‑language corpus of Reddit‑style posts labeled high‑risk or low‑risk for gambling disorder.
  2. CPI (Classification‑Performance‑Indicator) + DMC (Decision‑Making‑Cost) – Instead of optimizing a single loss, the authors track two separate scores:
    • CPI: traditional F1/accuracy on the validation set.
    • DMC: average number of posts examined before the system makes a prediction.
  3. Model Stack
    • SS3: builds a hierarchical word‑frequency map that can be inspected to see which terms contributed to a risk label.
    • BERT‑extended: the base BERT tokenizer is enriched with gambling‑related slang, emojis, and domain‑specific hashtags, then fine‑tuned on the training split.
    • SBERT: encodes each post into a dense vector; similarity to known high‑risk prototypes drives a risk score.
  4. Decision Policies – For each user, the system updates a cumulative risk score after every new post. When the score crosses a pre‑defined threshold, an alert is emitted. Two thresholds are explored: a conservative one (high precision, slower) and an aggressive one (higher recall, faster).
  5. Evaluation – The challenge’s official metrics combine classification quality (macro‑F1) with decision latency (average posts needed). The authors report both per‑model and ensemble results.
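
The similarity‑based scoring in the SBERT step can be sketched as follows. This is a minimal illustration, not the authors' code: the tiny 3‑d vectors and the `cosine`/`risk_score` helpers are hypothetical stand‑ins for real sentence embeddings and a learned prototype set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def risk_score(post_vec, prototypes):
    """Similarity-based risk: closeness to the nearest high-risk prototype."""
    return max(cosine(post_vec, p) for p in prototypes)

# Toy 3-d "embeddings"; a real pipeline would encode posts with a
# sentence-embedding model and mine prototypes from labeled high-risk users.
high_risk_prototypes = [[0.9, 0.1, 0.0], [0.7, 0.7, 0.1]]
score = risk_score([0.8, 0.2, 0.1], high_risk_prototypes)
```

A post whose embedding sits close to any known high‑risk prototype gets a score near 1, which then feeds the per‑user decision policy.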
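
The decision policy in step 4 amounts to a running sum with a threshold. A minimal sketch, assuming per‑post risk scores in [0, 1] from any of the classifiers (the `run_policy` helper and the example scores are illustrative, not the paper's implementation):

```python
def run_policy(post_scores, threshold):
    """Return (alert_emitted, posts_seen) for one user's post stream.

    post_scores: per-post risk scores in [0, 1].
    threshold:   cumulative score at which the alert is emitted.
    """
    cumulative = 0.0
    for i, score in enumerate(post_scores, start=1):
        cumulative += score
        if cumulative >= threshold:
            return True, i          # alert after i posts (low latency)
    return False, len(post_scores)  # stream ended without an alert

# The same stream under the two threshold regimes discussed above:
stream = [0.2, 0.4, 0.5, 0.6]
aggressive = run_policy(stream, threshold=0.5)    # fires early: higher recall
conservative = run_policy(stream, threshold=1.5)  # waits longer: higher precision
```

The threshold is the tunable knob: lowering it trades precision for faster alerts, which is exactly the accuracy‑versus‑latency trade‑off the CPI/DMC framework makes explicit.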
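
The two evaluation axes can be made concrete with a small sketch. The function names here are assumptions for illustration; the challenge's official scorer may weight or combine these differently.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (the CPI-style quality score)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def avg_posts_to_decision(posts_seen_per_user):
    """Mean posts read before deciding (the DMC-style latency score)."""
    return sum(posts_seen_per_user) / len(posts_seen_per_user)
```

Reporting both numbers side by side, as in the results table below, is what lets a model trade a little macro‑F1 for a faster decision (or vice versa).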

Results & Findings

| Model | Macro‑F1 | Avg. Posts to Decision |
|---|---|---|
| SS3 | 0.78 | 4.2 |
| BERT‑extended | 0.81 | 5.1 |
| SBERT | 0.76 | 3.8 |
| Ensemble (best‑of‑three) | 0.84 | 4.0 |
  • The ensemble secured 2nd place overall and 1st place on the decision‑speed sub‑metric.
  • Error analysis revealed that many misclassifications involved users whose language hovered around gambling terminology without clear signs of pathological behavior, underscoring the semantic ambiguity in the data.
  • The transparent SS3 model, while slightly less accurate than BERT, offered valuable interpretability that could be crucial for clinical hand‑off.

Practical Implications

  • Real‑time monitoring tools: The decision‑policy approach enables platforms (e.g., forums, streaming chat) to flag at‑risk users after only a handful of posts, supporting timely interventions.
  • Explainable AI for mental health: SS3’s token‑level explanations can be presented to moderators or clinicians, fostering trust and facilitating manual review.
  • Domain‑adapted language models: Extending BERT’s vocabulary with gambling‑specific slang improves detection—an approach that can be replicated for other behavioral addictions (e.g., gaming, shopping).
  • Resource‑constrained deployment: Because SS3 and SBERT are lightweight compared to full BERT, a hybrid pipeline can run on edge devices or low‑cost servers, making the solution viable for NGOs or smaller platforms.
  • Policy design: The dual‑objective framework (accuracy vs. speed) gives product teams a tunable knob to align the system with their risk tolerance—e.g., stricter thresholds for platforms with higher legal liability.

Limitations & Future Work

  • Data quality: The corpus contains noisy labels and limited contextual information (e.g., no user demographics), which hampers the model’s ability to disambiguate borderline cases.
  • Generalization: All experiments are confined to the provided Reddit‑style dataset; cross‑platform validation (Twitter, Discord, etc.) is still an open question.
  • Ethical safeguards: The paper notes the need for transparent consent mechanisms and bias audits before deploying such detectors in the wild.
  • Future directions suggested by the authors include: augmenting the training set with clinician‑annotated examples, exploring multimodal signals (e.g., images, emojis), and integrating causal inference methods to better understand the progression from low‑ to high‑risk behavior.

Authors

  • Horacio Thompson
  • Marcelo Errecalde

Paper Information

  • arXiv ID: 2511.23325v1
  • Categories: cs.CL
  • Published: November 28, 2025
