[Paper] Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators

Published: May 5, 2026 at 12:52 PM EDT

Source: arXiv - 2605.03969v1

Overview

Detecting AI‑generated text is becoming a critical security and quality‑control task as large language models (LLMs) proliferate across blogs, news, code comments, and more. This paper introduces a feature‑augmented transformer detector that stays reliable even when the text comes from unseen domains or from different generation pipelines. By fusing classic linguistic cues (readability scores, vocabulary richness, etc.) with a modern DeBERTa backbone, the authors achieve strong cross‑domain performance while using a single, fixed decision threshold—making the system far more practical for real‑world deployment.

Key Contributions

Feature‑augmented architecture: Combines attention‑based linguistic feature fusion (FeatAttn) with DeBERTa‑v3‑base, improving robustness to distribution shift.
Fixed‑threshold evaluation protocol: Calibrates one balanced‑accuracy‑optimal threshold on a validation set and re‑uses it across all test domains, exposing realistic error asymmetries.
Comprehensive cross‑domain benchmarks: In‑domain (HC3 PLUS), cross‑dataset (M4 benchmark), and external (AI‑Text‑Detection‑Pile) evaluations reveal brittleness of vanilla transformers and the gains from feature augmentation.
State‑of‑the‑art results: The DeBERTa‑v3‑base+FeatAttn model reaches 85.9 % balanced accuracy on the challenging M4 benchmark, outperforming strong zero‑shot baselines by up to 7.22 pp.
Ablation insights: Readability and vocabulary‑level features drive most of the robustness improvement, guiding future feature‑engineering efforts.
Stability analysis: Multi‑seed experiments show low variance, confirming that the approach is not a lucky run but a reproducible improvement.

Methodology

Data & Baselines
- Primary training set: HC3 PLUS, a large collection of human‑written and AI‑generated passages spanning multiple topics.
- Baseline models: vanilla BERT, RoBERTa, and DeBERTa transformers trained as binary classifiers.
Feature Extraction
- Compute a suite of linguistic descriptors per text fragment:
  - Readability indices (Flesch‑Kincaid, Gunning Fog, etc.)
  - Vocabulary richness (type‑token ratio, hapax‑legomena count)
  - Surface‑level statistics (sentence length, punctuation density)
- Feed these features into a lightweight attention module that learns how to weight each cue relative to the transformer’s contextual embeddings.
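
The descriptors listed above can be approximated with the standard library alone. The following is a minimal sketch, not the authors' extraction code: the names `count_syllables` and `linguistic_features` are hypothetical, the syllable counter is a crude vowel-group heuristic, and the exact feature set and tokenization used in the paper are not given in this summary.

```python
import re
import string
from collections import Counter


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of consecutive-vowel groups (assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def linguistic_features(text: str) -> dict:
    """Compute a few readability, vocabulary, and surface statistics for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    counts = Counter(w.lower() for w in words)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    punctuation = sum(1 for ch in text if ch in string.punctuation)

    return {
        # Standard Flesch-Kincaid grade-level formula.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * (syllables / len(words))
        - 15.59,
        # Distinct word forms divided by total words.
        "type_token_ratio": len(counts) / len(words),
        # Words that occur exactly once.
        "hapax_count": sum(1 for c in counts.values() if c == 1),
        "mean_sentence_length": words_per_sentence,
        # Punctuation characters per character of text.
        "punctuation_density": punctuation / len(text),
    }


features = linguistic_features("The cat sat. The dog ran fast!")
print(features["type_token_ratio"], features["hapax_count"])
```

In a real pipeline these values would typically be standardized before being passed to the fusion module, since they live on very different scales.
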
Training & Calibration
- Train the combined model end‑to‑end on HC3 PLUS.
- On a held‑out validation split, sweep thresholds to maximize balanced accuracy (average of true‑positive and true‑negative rates).
- Fix this threshold for every downstream test set—no per‑domain tuning.
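
The calibration step can be sketched as a simple sweep over candidate thresholds. This is an illustration under assumptions: the function names are hypothetical, scores are taken to be the model's probability of "AI‑generated" (label 1), and candidate thresholds are the distinct validation scores, which may differ from the grid the authors used.

```python
def balanced_accuracy(scores, labels, threshold):
    """Mean of true-positive and true-negative rates at one threshold.

    labels: 1 = AI-generated, 0 = human-written.
    A score >= threshold is predicted as AI-generated.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return 0.5 * (tp / positives + tn / negatives)


def calibrate_threshold(val_scores, val_labels):
    """Return the validation score that maximizes balanced accuracy.

    Ties are resolved in favor of the lowest candidate threshold.
    """
    candidates = sorted(set(val_scores))
    return max(
        candidates,
        key=lambda t: balanced_accuracy(val_scores, val_labels, t),
    )


# Toy validation data, for illustration only.
val_scores = [0.1, 0.2, 0.7, 0.9]
val_labels = [0, 0, 1, 1]
threshold = calibrate_threshold(val_scores, val_labels)
print(threshold, balanced_accuracy(val_scores, val_labels, threshold))
```

The returned value is then stored and reused unchanged on every test set.
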
Evaluation Protocol
- In‑domain: HC3 PLUS test split (near‑ceiling performance).
- Cross‑domain: M4 benchmark (covers news, scientific, social media, etc.) and AI‑Text‑Detection‑Pile (external, unseen generators).
- Compare against zero‑shot LLM detectors (e.g., GPT‑4‑based classifiers) and earlier BERT/RoBERTa baselines.
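
Evaluating every domain at one frozen threshold makes error asymmetries visible, because the true‑positive and true‑negative rates can diverge under shift even when balanced accuracy looks acceptable. The sketch below is an assumption‑laden illustration: `rates_at_threshold` and `evaluate_domains` are hypothetical helpers, and the toy scores are invented for the example, not taken from the paper.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return TPR, TNR, and balanced accuracy at a fixed threshold.

    labels: 1 = AI-generated, 0 = human-written.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    tpr = tp / positives
    tnr = tn / negatives
    return {"tpr": tpr, "tnr": tnr, "balanced_accuracy": 0.5 * (tpr + tnr)}


def evaluate_domains(domains, threshold):
    """Apply the same threshold to each domain; no per-domain tuning."""
    return {
        name: rates_at_threshold(scores, labels, threshold)
        for name, (scores, labels) in domains.items()
    }


# Invented toy data: the shifted domain misses half of the AI-generated texts.
domains = {
    "in_domain": ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    "shifted": ([0.2, 0.3, 0.4, 0.9], [0, 0, 1, 1]),
}
for name, result in evaluate_domains(domains, threshold=0.5).items():
    print(name, result)
```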

Results & Findings

Dataset	Model	Balanced Accuracy
HC3 PLUS (in‑domain)	DeBERTa‑v3‑base+FeatAttn	99.5 %
M4 (cross‑domain)	DeBERTa‑v3‑base+FeatAttn	85.9 %
M4 (cross‑domain)	RoBERTa‑base (no features)	~78 %
AI‑Text‑Detection‑Pile	DeBERTa‑v3‑base+FeatAttn	~82 %
–	Zero‑shot GPT‑4 detector	~78 %

In‑domain performance is near perfect for all modern transformers, confirming that the task is easy when training and test distributions match.
Under shift, vanilla models drop sharply (≈70‑78 % BA), while the feature‑augmented DeBERTa holds at 85.9 % on M4 and ~82 % on AI‑Text‑Detection‑Pile, demonstrating better transferability.
Ablation shows that removing readability or vocabulary features reduces cross‑domain BA by ~4‑5 pp, whereas other features (e.g., punctuation) have marginal impact.
Stability: Across 5 random seeds, the spread in BA for the DeBERTa‑v3‑base+FeatAttn model stays below 0.6 pp, indicating robust training dynamics.
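
A multi‑seed stability check of this kind reduces to summarizing one balanced‑accuracy value per seed. This summary does not say whether the reported figure is a standard deviation or a max‑min range, so the hypothetical helper below reports both; the per‑seed values are made‑up placeholders, not the paper's numbers.

```python
import statistics


def seed_stability(ba_per_seed):
    """Summarize balanced accuracy (in percent) across training seeds."""
    if len(ba_per_seed) < 2:
        raise ValueError("need at least two seeds")
    return {
        "mean": statistics.mean(ba_per_seed),
        # Sample standard deviation, in percentage points.
        "std": statistics.stdev(ba_per_seed),
        # Max-min spread, in percentage points.
        "range": max(ba_per_seed) - min(ba_per_seed),
    }


# Placeholder values for illustration only.
print(seed_stability([80.0, 81.0, 82.0]))
```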

Practical Implications

Deployable detector: With a single calibrated threshold, developers can embed the model into content‑moderation pipelines, plagiarism checkers, or API services without per‑client tuning.
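Domain‑agnostic security: The approach offers some protection against AI‑generated spam that originates from new LLMs or niche domains (e.g., technical documentation, code comments), although cross‑domain balanced accuracy in the 82‑86 % range still leaves a meaningful error rate.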
Feature‑driven interpretability: Because readability and lexical richness drive decisions, engineers can surface these cues to users (e.g., “text flagged due to unusually low readability”), aiding transparency.
Cost‑effective scaling: DeBERTa‑v3‑base has a 12‑layer, 768‑dimensional backbone comparable to BERT‑base, so inference latency remains acceptable for real‑time moderation services.
Benchmarking standard: The fixed‑threshold protocol offers a more realistic evaluation metric for any future AI‑text detector, encouraging the community to report performance under genuine distribution shift.

Limitations & Future Work

Generator coverage: Although the model generalizes across many LLMs, it may still struggle with future architectures that deliberately mimic human linguistic patterns (e.g., models trained with adversarial readability objectives).
Feature engineering overhead: Computing readability scores adds a small preprocessing cost; integrating these cues directly into the transformer (e.g., via token‑level embeddings) could streamline the pipeline.
Binary focus: The study treats detection as a hard yes/no problem; extending to a calibrated confidence score or multi‑class “human / AI‑generated / mixed” could provide richer signals.
Broader modalities: Text often appears alongside code, tables, or images. Future work could explore multimodal fusion (e.g., combining code syntax features with language cues).

Bottom line: By marrying classic linguistic diagnostics with a cutting‑edge DeBERTa transformer, the authors deliver a detector that not only excels when the data is familiar but also holds its ground when the AI‑text landscape shifts—a valuable asset for any developer tasked with safeguarding the integrity of user‑generated content.

Authors

Mohamed Mady
Johannes Reschke
Björn Schuller

Paper Information

arXiv ID: 2605.03969v1
Categories: cs.CL, cs.AI
Published: May 5, 2026
