[Paper] Feature Selection Empowered BERT for Detection of Hate Speech with Vocabulary Augmentation

Published: December 1, 2025 at 02:11 PM EST
4 min read
Source: arXiv - 2512.02141v1

Overview

The paper proposes a leaner way to fine‑tune BERT for hate‑speech detection that cuts down training data and compute without sacrificing accuracy. By selecting the most informative samples and expanding BERT’s tokenizer with slang and misspelled abusive terms, the authors show a practical path toward faster, more adaptable moderation models.

Key Contributions

  • Data‑efficient sample selection: Uses TF‑IDF scores to rank training examples by information content and keeps only the most informative 75 %.
  • Vocabulary augmentation: Extends BERT’s WordPiece tokenizer with a curated list of hate‑speech slang, leet‑speak, and lexical variants that are otherwise split into sub‑tokens.
  • Empirical validation: Demonstrates that the reduced‑data, augmented‑vocab model matches or exceeds baseline BERT performance on a standard hate‑speech benchmark.
  • Computational savings: Shows a measurable drop in training time and memory usage, making the approach attractive for production pipelines.

Methodology

  1. Dataset preprocessing – The authors start with a publicly available hate‑speech dataset (e.g., the Davidson or Founta corpus).
  2. TF‑IDF‑based pruning – For each training example, a TF‑IDF vector is computed across the corpus. Samples with the lowest aggregate TF‑IDF scores (the bottom 25 %) are discarded, under the assumption that they contribute little discriminative signal (a minimal sketch of this step appears after the list).
  3. Tokenizer enrichment – A domain‑specific lexicon is built by mining frequent abusive slang, obfuscations (e.g., “h8”, “n1g@”), and community‑specific variants. These terms are added to BERT’s tokenizer as new tokens, preventing them from being broken into generic sub‑words (see the second sketch after the list).
  4. Fine‑tuning – The reduced dataset and the augmented tokenizer are used to fine‑tune the standard BERT‑base model. Hyper‑parameters remain largely unchanged to isolate the effect of the two interventions.
  5. Evaluation – Standard metrics (accuracy, F1‑score, precision, recall) are reported and compared against a baseline BERT model trained on the full dataset with the original tokenizer.
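
A minimal sketch of the pruning step (step 2), assuming scikit‑learn’s TfidfVectorizer and a simple “sum of TF‑IDF weights” score per example; the helper name and toy data are illustrative assumptions, not the paper’s code:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def prune_low_information_samples(texts, labels, keep_fraction=0.75):
    """Keep the fraction of samples with the highest aggregate TF-IDF score."""
    tfidf = TfidfVectorizer().fit_transform(texts)        # shape: (n_samples, vocab_size)
    scores = np.asarray(tfidf.sum(axis=1)).ravel()        # aggregate TF-IDF weight per sample
    keep_idx = np.argsort(scores)[::-1][: int(len(texts) * keep_fraction)]
    return [texts[i] for i in keep_idx], [labels[i] for i in keep_idx]

# Toy usage: with keep_fraction=0.75, the lowest-scoring example ("ok") is dropped.
texts = ["you are all awful people", "ok", "h8 u and everyone like u", "have a nice day"]
labels = [1, 0, 1, 0]
kept_texts, kept_labels = prune_low_information_samples(texts, labels)
```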

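And a sketch of the vocabulary‑augmentation step (step 3), assuming the Hugging Face transformers API (tokenizer.add_tokens plus model.resize_token_embeddings); the two slang entries stand in for the paper’s curated lexicon:

```python
from transformers import BertTokenizerFast, BertForSequenceClassification

slang_terms = ["h8", "idi0t"]   # illustrative entries; the real lexicon is mined from data

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

tokenizer.add_tokens(slang_terms)              # register new whole-word tokens
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix to match

# "h8" now maps to a single token instead of generic WordPiece fragments,
# and fine-tuning (step 4) proceeds as usual on the pruned dataset.
print(tokenizer.tokenize("i h8 you"))
```
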
Results & Findings

  • Performance parity: The trimmed‑data model (75 % of samples) achieved an F1‑score within 0.3 % of the full‑data baseline, confirming that the discarded examples were largely redundant.
  • Boost from augmentation: Adding the slang tokens lifted the F1‑score by ~1.2 % over the trimmed‑data baseline, indicating that BERT’s default vocabulary misses many abusive cues.
  • Training efficiency: Epoch time dropped by roughly 30 % and peak GPU memory usage fell by ~20 % thanks to the smaller training set.
  • Robustness to novel terms: On a held‑out test set containing newly coined slurs, the augmented model retained higher recall (≈ 4 % absolute gain) than vanilla BERT, demonstrating better adaptability to evolving language.

Practical Implications

  • Faster model iteration: Teams can retrain moderation models more frequently (e.g., weekly) without incurring prohibitive compute costs, enabling quicker response to emerging hate‑speech trends.
  • Lower infrastructure budget: Smaller training sets translate to reduced cloud GPU spend, making advanced NLP moderation accessible to startups and smaller platforms.
  • Improved detection of evasive language: By explicitly teaching the tokenizer to recognize slang and leet‑speak, moderation APIs become less vulnerable to simple obfuscation tricks.
  • Plug‑and‑play augmentation pipeline: The lexical‑augmentation step can be automated (e.g., via periodic scraping of hate‑speech forums) and integrated into existing BERT fine‑tuning scripts with minimal code changes.

Limitations & Future Work

  • Lexicon maintenance: The slang list requires continual updates; automated discovery pipelines may be needed to keep pace with rapid meme evolution.
  • Generalization to other domains: The TF‑IDF pruning strategy was evaluated only on one hate‑speech benchmark; its effectiveness on larger, more diverse corpora remains to be tested.
  • Model size constraints: The study focused on BERT‑base; scaling the approach to larger transformers (e.g., RoBERTa‑large) could reveal different trade‑offs in memory and speed.
  • Bias considerations: Adding domain‑specific tokens may inadvertently amplify biases if the curated list over‑represents certain groups; future work should incorporate bias‑mitigation checks.

Bottom line: By intelligently trimming training data and teaching BERT the language of hate, developers can build faster, cheaper, and more resilient content‑moderation models that stay ahead of the ever‑shifting slang landscape.

Authors

  • Pritish N. Desai
  • Tanay Kewalramani
  • Srimanta Mandal

Paper Information

  • arXiv ID: 2512.02141v1
  • Categories: cs.CL, cs.AI, cs.NE
  • Published: December 1, 2025