[Paper] TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs
Source: arXiv - 2511.21624v1
Overview
The paper introduces TAGFN, a new, large‑scale text‑attributed graph dataset built specifically for fake‑news detection. By coupling rich textual content with graph structure (e.g., social‑media interactions, article citations), TAGFN gives researchers a realistic benchmark to test both classic graph‑based outlier detectors and the newest Large Language Model (LLM)‑enhanced approaches.
Key Contributions
- A first‑of‑its‑kind dataset for graph‑outlier detection in the fake‑news domain, containing millions of nodes and edges along with high‑quality annotations.
- Unified evaluation framework that supports traditional graph algorithms, graph neural networks (GNNs), and LLM‑augmented models under the same experimental protocol.
- Fine‑tuning pipeline for adapting LLMs (e.g., GPT‑4, LLaMA) to the fake‑news detection task using the graph’s textual attributes.
- Open‑source release of the dataset (via Hugging Face) and accompanying code, encouraging reproducibility and community contributions (a loading sketch follows this list).
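To get started, here is a minimal loading sketch using the Hugging Face `datasets` library. The repository id below is a placeholder, since this summary does not state the official one; substitute the id from the actual release.

```python
from datasets import load_dataset

# Hypothetical repository id -- replace with the id from the official release.
ds = load_dataset("tagfn/TAGFN")

print(ds)                     # lists the available splits
print(ds["train"][0].keys())  # inspect the per-article fields
```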
Methodology
- Data collection – The authors harvested news articles, their metadata, and the social graph of user interactions from multiple public platforms (e.g., Twitter, Reddit). Each article becomes a node with a text attribute (the article body) and metadata attributes (publisher, timestamp, etc.). Edges capture relationships such as “shared by the same user,” “cites,” or “replies to.”
- Annotation – Articles were labeled as real or fake using verified fact‑checking sources (e.g., PolitiFact, Snopes). The labeling process was semi‑automated and then manually audited to ensure high precision.
- Graph construction – A heterogeneous graph is built where different edge types are preserved, enabling models to learn from both structural patterns (e.g., echo‑chamber clusters) and textual cues (a construction sketch follows this list).
- Benchmark design – The dataset is split into training/validation/test sets for supervised learning and also provides an unsupervised outlier‑detection split where only a small fraction of nodes are labeled.
- Baseline implementations – The authors evaluate classic outlier detectors (e.g., LOF, Isolation Forest), GNN‑based methods (e.g., GraphSAGE, GAT), and LLM‑enhanced pipelines that concatenate node embeddings from a frozen LLM with graph embeddings (see the fusion sketch after this list).
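To make the graph‑construction step concrete, below is a minimal sketch of assembling a heterogeneous graph in PyTorch Geometric. The node types, edge types, feature dimensions, and counts are illustrative assumptions, not TAGFN's actual schema.

```python
import torch
from torch_geometric.data import HeteroData

# Illustrative schema -- TAGFN's real field names and sizes may differ.
data = HeteroData()

# Article nodes carry text embeddings (e.g., from BERT) plus real/fake labels.
data["article"].x = torch.randn(1000, 768)        # [num_articles, embed_dim]
data["article"].y = torch.randint(0, 2, (1000,))  # 0 = real, 1 = fake

# User nodes with simple profile features.
data["user"].x = torch.randn(5000, 32)

# Distinct edge types are preserved rather than merged into one relation.
data["user", "shares", "article"].edge_index = torch.stack([
    torch.randint(0, 5000, (20000,)),  # source user ids
    torch.randint(0, 1000, (20000,)),  # target article ids
])
data["article", "cites", "article"].edge_index = torch.randint(0, 1000, (2, 3000))

print(data)
```

Keeping relations separate lets heterogeneous GNNs weight user‑share and citation edges differently, which matters given the findings below.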
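And a sketch of the LLM‑enhanced baseline pattern: GraphSAGE embeddings are concatenated with precomputed frozen‑LLM text embeddings before classification. The dimensions and layer sizes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class LLMGraphFusion(torch.nn.Module):
    """Concatenates frozen-LLM text embeddings with GraphSAGE embeddings."""
    def __init__(self, feat_dim=768, llm_dim=4096, hidden=256, num_classes=2):
        super().__init__()
        self.conv1 = SAGEConv(feat_dim, hidden)
        self.conv2 = SAGEConv(hidden, hidden)
        # The classifier sees [graph embedding ; LLM embedding].
        self.head = torch.nn.Linear(hidden + llm_dim, num_classes)

    def forward(self, x, edge_index, llm_emb):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        return self.head(torch.cat([h, llm_emb], dim=-1))

# llm_emb is computed once, offline, by running each article's text through
# a frozen LLM; only the GNN layers and the linear head are trained.
```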
Results & Findings
| Model | Setting | ROC‑AUC | Precision@100 | Comment |
|---|---|---|---|---|
| Isolation Forest (features only) | Unsupervised | 0.71 | 0.42 | Struggles without graph context |
| GraphSAGE | Supervised | 0.84 | 0.68 | Gains from structural cues |
| GAT + Text Embedding (BERT) | Supervised | 0.88 | 0.73 | Attention over neighbors helps |
| LLM‑Fine‑Tuned (LLaMA‑7B) + GraphSAGE | Supervised | 0.92 | 0.81 | LLM provides richer semantic signals |
| LLM‑Zero‑Shot Prompting | Unsupervised | 0.78 | 0.55 | Competitive without any fine‑tuning |
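For reference, the two reported metrics are straightforward to compute with scikit‑learn and NumPy. Precision@100 is interpreted here as precision over the 100 highest‑scoring nodes, which is an assumption about the paper's exact definition.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def precision_at_k(y_true, scores, k=100):
    """Precision over the k predictions with the highest outlier scores."""
    top_k = np.argsort(scores)[::-1][:k]
    return y_true[top_k].mean()

# Toy data: 1 = fake, scores = model outlier scores.
y_true = np.random.randint(0, 2, 1000)
scores = np.random.rand(1000)

print("ROC-AUC:      ", roc_auc_score(y_true, scores))
print("Precision@100:", precision_at_k(y_true, scores))
```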
- LLM‑augmented models consistently outperform pure graph or pure text baselines, confirming that large‑scale language understanding adds value to graph‑based outlier detection.
- Unsupervised LLM prompting (e.g., “Is this article likely fake?”) already beats many classic detectors, showing promise for low‑resource scenarios (a prompting sketch follows these findings).
- The heterogeneous edge types (user‑share vs. citation) contribute differently; user‑share edges are the strongest signal for clustering fake news.
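As a sketch of that zero‑shot setup, one might query an instruction‑tuned model directly. The checkpoint, prompt wording, and answer parsing are illustrative choices, not the paper's exact protocol.

```python
from transformers import pipeline

# Any instruction-tuned checkpoint works; this one is an illustrative choice.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

article = "..."  # article body taken from a TAGFN node
prompt = (
    "Is this article likely fake? Answer with 'fake' or 'real'.\n\n"
    f"Article: {article}\n\nAnswer:"
)

out = generator(prompt, max_new_tokens=5)[0]["generated_text"]
answer = out[len(prompt):].lower()  # keep only the generated completion
print("fake" if "fake" in answer else "real")
```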
Practical Implications
- Misinformation pipelines: Companies building real‑time fact‑checking tools can plug TAGFN‑trained models into their content moderation stacks, leveraging both social‑graph dynamics and article semantics.
- LLM fine‑tuning for domain‑specific safety: The provided fine‑tuning scripts let developers adapt any open‑source LLM to detect fake news with minimal labeled data, reducing reliance on costly human annotation (a parameter‑efficient sketch follows this list).
- Graph‑aware recommendation systems: Platforms can use the outlier scores to down‑rank or flag suspicious content before it spreads, improving user trust.
- Benchmark for research & product teams: TAGFN offers a reproducible testbed for evaluating new GNN architectures, contrastive learning on graphs, or prompt‑engineering strategies for LLMs in the misinformation space.
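To illustrate the fine‑tuning point above, here is a minimal parameter‑efficient sketch with the `peft` library. The base checkpoint, LoRA hyperparameters, and sequence‑classification framing are assumptions, not the authors' released scripts.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative base model; the released scripts may target another checkpoint.
name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# LoRA freezes the base weights and trains small low-rank adapters instead.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="SEQ_CLS")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a tiny fraction of the 7B parameters

# From here, train with transformers.Trainer on TAGFN's labeled article texts.
```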
Limitations & Future Work
- Temporal bias: The dataset captures a snapshot of news from a specific period; models may degrade as topics and manipulation tactics evolve.
- Platform coverage: While Twitter and Reddit are well represented, other channels (e.g., private messaging apps) are missing, limiting generalizability.
- Label noise: Even with fact‑checking sources, some borderline cases remain ambiguous, potentially affecting supervised training.
- Scalability of LLM fine‑tuning: Fine‑tuning large models (≥13B parameters) still demands substantial GPU resources, which may be prohibitive for smaller teams.
Future directions suggested by the authors include extending TAGFN with temporal edges, incorporating multilingual news, and exploring prompt‑tuning techniques that need less compute while retaining LLM‑level performance.
If you’re interested in experimenting with TAGFN, the dataset and code are ready to clone from Hugging Face and GitHub. Dive in, and you might be the next to push the frontier of trustworthy AI in the fight against fake news.
Authors
- Kay Liu
- Yuwei Han
- Haoyan Xu
- Henry Peng Zou
- Yue Zhao
- Philip S. Yu
Paper Information
- arXiv ID: 2511.21624v1
- Categories: cs.SI, cs.CL
- Published: November 26, 2025
- PDF: https://arxiv.org/pdf/2511.21624v1