[Paper] TEGRA: Text Encoding With Graph and Retrieval Augmentation for Misinformation Detection

Published: February 11, 2026 at 01:21 PM EST
4 min read
Source: arXiv - 2602.11106v1

Overview

The paper presents TEGRA, a new framework for spotting misinformation that blends traditional text encoding with graph‑based knowledge retrieval. By turning a document into a structured graph and pulling in relevant facts from external knowledge bases, the authors show that classifiers can make more informed decisions than with plain language models alone.

Key Contributions

  • Hybrid Text‑Graph Representation (TEG): Introduces a pipeline that extracts entities and relations from a document, builds a lightweight knowledge graph, and jointly encodes the raw text and the graph.
  • Retrieval‑Augmented Extension (TEGRA): Enhances TEG with a domain‑specific knowledge base lookup, injecting retrieved facts directly into the graph before classification.
  • Empirical Validation: Extensive experiments on benchmark misinformation datasets demonstrate consistent gains over strong language‑model baselines (e.g., BERT, RoBERTa).
  • Modular Design: The approach can plug into any transformer encoder and any graph encoder, making it adaptable to different languages and domains.
  • Open‑Source Implementation: The authors release code and pre‑trained components, facilitating reproducibility and downstream adoption.

Methodology

  1. Document Parsing → Graph Construction

    • Named‑entity recognition and relation extraction turn a news article or social‑media post into a set of triples (subject‑predicate‑object).
    • These triples form a directed, labeled graph where nodes are entities/concepts and edges are the extracted relations.
  2. Dual Encoding

    • Text Encoder: A standard transformer (e.g., BERT) processes the raw token sequence, producing contextual embeddings.
    • Graph Encoder: A Graph Neural Network (GNN) (typically a Graph Attention Network) consumes the graph structure, yielding node‑level embeddings that capture relational context.
  3. Fusion & Classification

    • Node embeddings are pooled (e.g., mean or attention‑based) and concatenated with the [CLS] token embedding from the text encoder.
    • The fused vector passes through a simple feed‑forward classifier to predict “misinformation” vs. “reliable”.
  4. Retrieval Augmentation (TEGRA)

    • For each entity, the system queries a domain‑specific knowledge base (e.g., a fact‑checked claims repository).
    • Retrieved facts are added as extra nodes/edges, enriching the graph before the GNN step.
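Steps 1 and 4 above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the triples stand in for NER/relation-extraction output, and the `FACT_KB` dictionary is a hypothetical stand-in for a fact-checked knowledge base keyed by entity.

```python
def build_graph(triples):
    """Turn (subject, predicate, object) triples into a directed labeled graph."""
    nodes, edges = set(), []
    for subj, pred, obj in triples:
        nodes.update([subj, obj])
        edges.append((subj, pred, obj))
    return nodes, edges

def augment_with_kb(nodes, edges, kb):
    """For each entity with a KB entry, attach retrieved facts as extra nodes/edges."""
    for entity in list(nodes):
        for pred, obj in kb.get(entity, []):
            nodes.add(obj)
            edges.append((entity, pred, obj))
    return nodes, edges

# Toy example: one extracted claim, one retrieved counter-fact.
triples = [("vaccine_x", "causes", "condition_y")]
FACT_KB = {"vaccine_x": [("fact_checked_as", "no_causal_link_to_condition_y")]}

nodes, edges = build_graph(triples)
nodes, edges = augment_with_kb(nodes, edges, FACT_KB)
print(sorted(nodes))
print(edges)
```

Because the KB is a plain lookup table here, it can be refreshed independently of the model, which mirrors the paper's point that only the retrieval index needs updating.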

The whole pipeline is end‑to‑end trainable; only the retrieval step relies on an external index that can be updated independently.
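The fusion step (pool node embeddings, concatenate with the [CLS] vector, classify) can be illustrated with random tensors. The dimensions and weights below are arbitrary placeholders, not the paper's trained components; real runs would take `cls_embedding` from a transformer and `node_embeddings` from a GNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the paper's): 4-dim embeddings, 3 graph nodes.
d_text, d_graph, n_nodes = 4, 4, 3

cls_embedding = rng.normal(size=d_text)                # [CLS] vector from the text encoder
node_embeddings = rng.normal(size=(n_nodes, d_graph))  # node vectors from the GNN

# Mean-pool the node embeddings, then concatenate with the [CLS] vector.
graph_vector = node_embeddings.mean(axis=0)
fused = np.concatenate([cls_embedding, graph_vector])  # shape: (d_text + d_graph,)

# A single feed-forward layer standing in for the classifier head.
W = rng.normal(size=(2, d_text + d_graph))
logits = W @ fused
probs = np.exp(logits) / np.exp(logits).sum()          # softmax over {reliable, misinfo}
print(fused.shape, probs)
```

Attention-based pooling would replace the `mean` with learned per-node weights, but the fused-vector shape and the classifier interface stay the same.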

Results & Findings

| Model | Accuracy | F1 (misinfo) | Relative Gain |
| --- | --- | --- | --- |
| BERT (baseline) | 78.4% | 0.71 | – |
| RoBERTa | 80.1% | 0.73 | – |
| TEG (text + graph) | 83.6% | 0.78 | +4.5% acc, +5.5% F1 |
| TEGRA (with retrieval) | 85.2% | 0.81 | +6.8% acc, +8.5% F1 |

  • Gains are consistent across multiple datasets (political news, health rumors, COVID‑19 claims).
  • Ablation studies show that both the graph encoder and the retrieval component contribute roughly equally to the improvement.
  • Error analysis reveals that the model especially excels at detecting subtle misinformation that hinges on factual inconsistencies rather than overt sensational language.

Practical Implications

  • Fact‑Checking Automation: Platforms can integrate TEGRA to pre‑screen user‑generated content, flagging posts that contradict known facts before they go viral.
  • Domain‑Specific Deployments: Because the retrieval component can point to any curated knowledge base (e.g., product specifications, regulatory guidelines), the same architecture can be repurposed for fraud detection, compliance monitoring, or even code review (detecting misleading documentation).
  • Explainability: The graph structure provides a natural “reasoning trace” – developers can surface which entities and retrieved facts drove the decision, aiding transparency and user trust.
  • Scalability: The graph construction and retrieval steps are lightweight (entity extraction + key‑value lookup), making it feasible to run in near‑real‑time pipelines alongside existing transformer‑based classifiers.
  • Extensibility: Teams can swap in more powerful GNNs, multilingual entity extractors, or domain‑specific KBs without redesigning the whole system.

Limitations & Future Work

  • Knowledge Base Dependence: Performance hinges on the coverage and freshness of the external KB; niche topics with sparse facts may see limited gains.
  • Entity Extraction Errors: Mis‑identified entities propagate errors into the graph, occasionally degrading classification.
  • Computational Overhead: Adding a GNN and retrieval step increases latency compared to a pure transformer model, which may be problematic for ultra‑low‑latency applications.
  • Future Directions: The authors suggest exploring dynamic graph construction (e.g., using LLM‑generated relations), multi‑hop retrieval for deeper reasoning, and lightweight graph encoders to reduce inference time.

TL;DR: TEGRA shows that enriching text with a simple, structured graph and pulling in verified facts can noticeably boost misinformation detection. For developers building moderation tools or any system that needs to verify claims against known knowledge, the approach offers a modular, explainable upgrade to pure language‑model pipelines.

Authors

  • Géraud Faye
  • Wassila Ouerdane
  • Guillaume Gadek
  • Céline Hudelot

Paper Information

  • arXiv ID: 2602.11106v1
  • Categories: cs.CL
  • Published: February 11, 2026