[Paper] OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations

Published: December 10, 2025 at 11:18 AM EST
4 min read

Source: arXiv - 2512.09804v1

Overview

The paper introduces OnCoCo 1.0, a publicly released dataset that captures ~2,800 individual messages from online counseling sessions and annotates them with a fine‑grained taxonomy (38 counselor‑type labels + 28 client‑type labels). By moving beyond the narrow, interview‑style coding schemes that dominate the field, the authors provide a resource that can power more nuanced NLP models for mental‑health chatbots, analytics dashboards, and therapist‑assist tools.

Key Contributions

  • A new coding scheme covering 66 distinct utterance types (38 counselor, 28 client), designed specifically for text‑based online counseling.
  • OnCoCo 1.0 dataset: 2,800 manually labeled messages drawn from real‑world counseling conversations, released under an open license.
  • Baseline models: Fine‑tuned transformer classifiers (BERT, RoBERTa, etc.) benchmarked on the dataset, with code and trained checkpoints made publicly available.
  • Comprehensive analysis showing how the new taxonomy captures conversational dynamics that traditional Motivational Interviewing (MI) codes miss.
  • Resource package (data, annotation guidelines, and scripts) that can be directly plugged into existing NLP pipelines for mental‑health applications.

Methodology

  1. Taxonomy design – The authors surveyed existing counseling coding systems (e.g., MI, CBT) and identified gaps for asynchronous, text‑only sessions. They iteratively merged, split, and refined categories with input from clinical psychologists, ending up with 66 fine‑grained labels.
  2. Data collection – Anonymous chat logs from a licensed online counseling platform were sampled, de‑identified, and segmented into single‑utterance messages.
  3. Annotation process – Two trained annotators labeled each message; inter‑annotator agreement (Cohen’s κ ≈ 0.78) was achieved after a pilot phase and periodic adjudication.
  4. Model training – Standard pretrained language models (BERT‑base, RoBERTa‑large) were fine‑tuned on the 66‑class classification task using a stratified 80/10/10 train/validation/test split. Hyperparameters were kept simple (learning rate 2e‑5, batch size 16, 3 epochs) to showcase baseline performance (a minimal fine‑tuning sketch follows this list).
  5. Evaluation – Accuracy, macro‑F1, and per‑class confusion matrices were reported, alongside ablation experiments that test the impact of the full taxonomy versus a collapsed MI‑style label set.
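The summary does not reproduce the authors' training script, but step 4 can be approximated with the Hugging Face transformers Trainer. The sketch below is illustrative: the CSV file names, the "text"/"label" column names, and the uncased BERT variant are assumptions, while the hyperparameters match those reported above.

```python
# Minimal sketch of the baseline fine-tuning setup (step 4).
# Assumptions: split files and "text"/"label" columns are illustrative;
# labels are integer ids in [0, 65].
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

NUM_LABELS = 66  # 38 counselor + 28 client utterance types

data = load_dataset(
    "csv",
    data_files={"train": "oncoco_train.csv", "validation": "oncoco_val.csv"},
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)

args = TrainingArguments(
    output_dir="oncoco-bert-base",
    learning_rate=2e-5,              # as reported in the paper
    per_device_train_batch_size=16,  # batch size 16
    num_train_epochs=3,              # 3 epochs
)

Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,  # enables dynamic padding of batches
).train()
```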

Results & Findings

Model                            Accuracy   Macro‑F1
BERT‑base                        71.4 %     0.68
RoBERTa‑large                    73.9 %     0.71
MI‑only baseline (10 classes)    62.1 %     0.55
  • The fine‑grained taxonomy yields a substantially higher macro‑F1 than a conventional MI‑style label set (0.71 vs. 0.55 for RoBERTa‑large), indicating better discrimination of subtle counseling moves.
  • Error analysis shows that most confusion occurs between semantically adjacent labels (e.g., “reflective listening” vs. “affirmation”), suggesting that richer contextual modeling (dialogue history) could push performance further; a sketch of the evaluation metrics follows this list.
  • The public release of pretrained checkpoints enables developers to plug‑and‑play the classifier into downstream applications without retraining from scratch.
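The reported metrics (accuracy, macro‑F1, per‑class confusion matrices) are standard and easy to reproduce with scikit‑learn once the classifier's predictions are in hand. The arrays below are toy placeholders, not figures from the paper:

```python
# Sketch of the reported evaluation: accuracy, macro-F1, and a per-class
# confusion matrix over the 66 labels. The arrays are placeholders; in
# practice they are the gold test labels and the classifier's argmax.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = np.array([0, 3, 3, 17, 42])   # gold label ids (toy example)
y_pred = np.array([0, 3, 17, 17, 42])  # predicted label ids (toy example)

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))

# Off-diagonal mass between semantically adjacent labels (e.g.,
# "reflective listening" vs. "affirmation") is where the paper
# locates most of the remaining errors.
cm = confusion_matrix(y_true, y_pred, labels=np.arange(66))
print(cm.shape)  # (66, 66)
```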

Practical Implications

  • Chatbot enhancement – Developers building mental‑health conversational agents can use the classifier to detect specific therapist strategies (e.g., “open‑ended question”, “validation”) and adapt responses in real time, leading to more empathetic and effective interactions.
  • Quality assurance for tele‑therapy platforms – Automated tagging of counselor and client utterances supports compliance monitoring, therapist training, and outcome analytics without manual chart review.
  • Research‑ready benchmark – OnCoCo 1.0 offers a ready‑made testbed for experimenting with multi‑label, hierarchical, or few‑shot learning techniques in the mental‑health domain.
  • Integration with existing pipelines – Because the dataset and models are released in standard Hugging Face format, they can be dropped into pipelines that already use BERT/RoBERTa for sentiment analysis, intent detection, or dialogue act classification. A loading sketch follows this list.
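Since the checkpoints follow the standard Hugging Face format, inference reduces to a few lines. The model id below is hypothetical; substitute the actual checkpoint name from the authors' release.

```python
# Loading the released classifier for downstream use. The model id
# "oncoco/bert-base-oncoco" is hypothetical; use the checkpoint name
# from the paper's resource package.
from transformers import pipeline

classify = pipeline("text-classification", model="oncoco/bert-base-oncoco")

message = "What do you think made this week feel so overwhelming?"
print(classify(message))
# e.g. [{"label": "open-ended question", "score": 0.91}]  (illustrative)
```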

Limitations & Future Work

  • Scope of data – The corpus contains only ~2.8 k messages from a single counseling service, which may limit generalizability across cultural contexts or different therapeutic modalities.
  • Single‑utterance focus – Labels are assigned per message without explicit modeling of dialogue history; future work could explore sequential models (e.g., Transformers with memory, RNNs) to capture turn‑taking dynamics.
  • Class imbalance – Some fine‑grained categories appear only a handful of times, leading to lower per‑class performance; techniques like data augmentation or hierarchical classification are suggested (an inverse‑frequency loss‑weighting alternative is sketched after this list).
  • Ethical considerations – While the data are anonymized, deploying automated classifiers in mental‑health settings raises privacy and bias concerns that need careful governance.
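Beyond the augmentation and hierarchical strategies the authors suggest, one common mitigation for the long tail of rare utterance types is to reweight the cross‑entropy loss by inverse class frequency. This is an assumption on our part, not the paper's method:

```python
# Inverse-frequency class weighting, a common mitigation for rare
# classes (not taken from the paper).
import torch
from collections import Counter

def class_weights(train_labels, num_labels=66):
    """Weight each class by total / (num_labels * count); unseen -> 1.0."""
    counts = Counter(train_labels)
    total = len(train_labels)
    return torch.tensor([
        total / (num_labels * counts[c]) if counts[c] else 1.0
        for c in range(num_labels)
    ])

weights = class_weights([0, 0, 1, 3, 3, 3])        # toy label list
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```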

The authors plan to expand OnCoCo with multilingual extensions, richer metadata (e.g., session outcomes), and to benchmark newer architectures such as instruction‑tuned LLMs.

Authors

  • Jens Albrecht
  • Robert Lehmann
  • Aleksandra Poltermann
  • Eric Rudolph
  • Philipp Steigerwald
  • Mara Stieler

Paper Information

  • arXiv ID: 2512.09804v1
  • Categories: cs.CL, cs.LG
  • Published: December 10, 2025
  • PDF: https://arxiv.org/pdf/2512.09804v1