[Paper] Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning
Source: arXiv - 2511.21416v1
Overview
The paper introduces Odin, a novel architecture that fuses textual information and graph structure without relying on traditional multi‑hop message passing. By inserting graph‑aware modules at carefully chosen depths inside a Transformer, Odin delivers richer node representations while sidestepping the over‑smoothing problems that plague many Graph Neural Networks (GNNs). A lightweight variant, Light Odin, brings the same design principles to resource‑constrained settings.
Key Contributions
- Oriented Dual‑module Integration: A mechanism that injects graph structure into a Transformer at low, middle, and high layers, aligning structural abstraction with the model’s semantic hierarchy.
- Hop‑free Design: Eliminates the need for explicit multi‑hop diffusion; multi‑hop context is captured implicitly through the layered integration.
- Theoretical Expressiveness: Proves that Odin’s representational power strictly subsumes that of pure Transformers and standard GNNs.
- Light Odin: A streamlined version that retains the layered structural abstraction while dramatically reducing compute and memory footprints.
- State‑of‑the‑Art Empirics: Sets new accuracy records on several benchmark text‑rich graphs, and Light Odin achieves comparable results with far lower cost.
- Open‑Source Release: Full implementation and pretrained models are publicly available on GitHub.
Methodology
- Base Transformer Backbone – Starts from a standard pre‑trained language model (e.g., BERT) that processes each node’s textual attribute as a sequence and produces a global [CLS] token representing the node.
- Dual‑module Blocks – At selected Transformer layers, a graph module runs in parallel with the usual self‑attention (see the code sketch at the end of this section).
  - Structure Encoder: Takes the adjacency information (or a learned edge embedding) and aggregates neighbor information only on the [CLS] token using a lightweight attention‑style operation.
  - Orientation Mechanism: Controls when and how the graph signal is merged, allowing early layers to capture local topology, middle layers to blend medium‑range patterns, and deep layers to encode high‑level structural cues.
- Fusion Strategy – The graph output is added (or concatenated) to the Transformer hidden state before the next self‑attention block, preserving the original language modeling capacity while enriching it with topology.
- Light Odin Optimizations – Replaces full attention with linearized attention, reduces hidden dimensions in the graph encoder, and shares parameters across layers to cut down FLOPs.
The overall pipeline remains end‑to‑end differentiable, so the model can be fine‑tuned on downstream tasks such as node classification, link prediction, or graph‑level classification.
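The summary above leaves the block internals unspecified, so the following is a minimal PyTorch sketch of one plausible dual‑module block rather than the authors’ released implementation. The class names (`StructureEncoder`, `DualModuleBlock`), the sigmoid gate standing in for the orientation mechanism, and the additive fusion restricted to the [CLS] position are all illustrative assumptions.

```python
# Minimal sketch of a dual-module block (hypothetical names, not the authors' code).
# Assumption: the graph module only updates each node's [CLS] state using
# adjacency-masked attention, and the result is added back before the next layer.
import torch
import torch.nn as nn


class StructureEncoder(nn.Module):
    """Aggregates neighbor [CLS] states with a lightweight attention-style operation."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, cls_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # cls_states: (num_nodes, hidden_dim); adj: (num_nodes, num_nodes), 1 = edge
        q, k, v = self.query(cls_states), self.key(cls_states), self.value(cls_states)
        scores = q @ k.t() / (cls_states.size(-1) ** 0.5)
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend to neighbors only
        attn = torch.nan_to_num(torch.softmax(scores, dim=-1))  # isolated nodes -> zero update
        return attn @ v


class DualModuleBlock(nn.Module):
    """Runs a structure encoder alongside a Transformer layer and fuses on [CLS]."""

    def __init__(self, transformer_layer: nn.Module, hidden_dim: int):
        super().__init__()
        # The wrapped layer is assumed to map (nodes, seq_len, hidden) -> same shape.
        self.transformer_layer = transformer_layer
        self.structure_encoder = StructureEncoder(hidden_dim)
        self.gate = nn.Parameter(torch.zeros(1))  # orientation-style gate (assumption)

    def forward(self, hidden: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # hidden: (num_nodes, seq_len, hidden_dim); position 0 holds the [CLS] token
        hidden = self.transformer_layer(hidden)
        graph_update = self.structure_encoder(hidden[:, 0, :], adj)
        fused_cls = hidden[:, 0, :] + torch.sigmoid(self.gate) * graph_update  # additive fusion
        return torch.cat([fused_cls.unsqueeze(1), hidden[:, 1:, :]], dim=1)


# Toy usage: 4 nodes, sequences of length 8, hidden size 32.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
block = DualModuleBlock(layer, hidden_dim=32)
hidden = torch.randn(4, 8, 32)
adj = (torch.rand(4, 4) > 0.5).float()
out = block(hidden, adj)  # shape (4, 8, 32)
```

In this reading, over‑smoothing is limited because only the [CLS] state is mixed across neighbors at a few chosen depths, while token‑level representations are left untouched by the graph module.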
Results & Findings
| Dataset (text‑rich) | Baseline (GNN) | Baseline (Transformer) | Odin | Light Odin |
|---|---|---|---|---|
| Cora‑Text | 78.3 % | 81.1 % | 84.7 % | 84.2 % |
| PubMed‑Abstract | 81.5 % | 83.0 % | 86.9 % | 86.4 % |
| Amazon‑Reviews | 73.2 % | 75.6 % | 79.8 % | 79.3 % |
| Ogbn‑Arxiv (full‑text) | 71.4 % | 73.9 % | 77.5 % | 77.0 % |
- Accuracy Gains: Odin consistently outperforms both baselines, improving on the Transformer baselines by roughly 3.5–4 percentage points and on the GNN baselines by about 5–7 points.
- Training Efficiency: Light Odin reduces training time by ~40 % and memory usage by ~35 % while staying within 0.5 % of Odin’s accuracy.
- Ablation: Removing the oriented integration (i.e., injecting graph info at every layer) hurts performance, confirming the importance of hierarchical placement.
- Expressiveness Test: On synthetic graphs designed to separate GNN and Transformer capabilities, Odin solves all cases that either model alone can, demonstrating the claimed strict superset property.
Practical Implications
- Unified Text‑Graph Pipelines: Developers can now use a single model for tasks that previously required a two‑stage pipeline (language model + GNN), simplifying codebases and deployment.
- Scalable Knowledge Graph Enrichment: Because Odin does not depend on costly multi‑hop message passing, it scales better to massive knowledge graphs where neighbor explosion is a bottleneck.
- Low‑Resource Scenarios: Light Odin makes it feasible to run sophisticated text‑graph reasoning on edge devices or in latency‑critical services (e.g., recommendation engines that need to incorporate product descriptions and co‑purchase graphs in real time).
- Fine‑tuning Flexibility: The architecture can be dropped into existing Transformer‑based code (e.g., Hugging Face Transformers) with minimal changes: add the dual‑module layers and feed adjacency matrices (a sketch of this integration follows this list).
- Better Generalization: By aligning structural abstraction with semantic depth, Odin reduces over‑smoothing, leading to more robust node embeddings that retain discriminative power even in dense graphs.
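As a rough illustration of the drop‑in claim above, the sketch below wraps a Hugging Face encoder and refines the [CLS] embedding with a few graph stages. Everything here is an assumption for illustration: the wrapper name (`OdinStyleClassifier`), the number of stages, the mean‑aggregation stand‑in for the structure encoder, the toy data, and the post‑hoc (rather than interleaved) fusion. It is not the paper’s released code.

```python
# Hypothetical fine-tuning wrapper around a Hugging Face encoder (illustrative only).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class GraphRefiner(nn.Module):
    """Lightweight neighbor aggregation applied only to the [CLS] embedding."""

    def __init__(self, hidden: int):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, cls: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # guard isolated nodes
        return self.proj((adj @ cls) / deg)                 # projected neighbor mean


class OdinStyleClassifier(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased",
                 num_graph_stages: int = 3, num_classes: int = 7):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One refinement stage per abstraction level (low / middle / high).
        self.graph_stages = nn.ModuleList(
            [GraphRefiner(hidden) for _ in range(num_graph_stages)]
        )
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask, adj):
        # A faithful implementation would interleave the graph modules inside the
        # encoder layers; refining the final [CLS] state is a simplification here.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0, :]     # (num_nodes, hidden)
        for stage in self.graph_stages:
            cls = cls + stage(cls, adj)          # additive fusion per stage
        return self.classifier(cls)


# Usage: tokenize each node's text, build a dense adjacency matrix, predict labels.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["Paper about graph learning.", "Paper about language models."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
adj = torch.tensor([[0.0, 1.0], [1.0, 0.0]])     # toy two-node graph
model = OdinStyleClassifier(num_classes=3)
logits = model(batch["input_ids"], batch["attention_mask"], adj)  # shape (2, 3)
```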
Limitations & Future Work
- Adjacency Requirement: Odin still needs an explicit graph structure; it cannot infer latent connections from text alone.
- Fixed Integration Points: The current design uses manually chosen layers for graph injection; learning optimal insertion points could further boost performance.
- Edge Feature Simplicity: The paper treats edges as binary or simple embeddings; richer edge attributes (timestamps, weights) are not fully explored.
- Benchmark Diversity: Experiments focus on academic citation and product‑review graphs; applying Odin to heterogeneous graphs (e.g., multimodal social networks) remains an open direction.
Future research may address adaptive layer selection, richer edge modeling, and extensions to dynamic or streaming graphs, potentially widening Odin’s applicability across more real‑world AI systems.
Authors
- Kaifeng Hong
- Yinglong Zhang
- Xiaoying Hong
- Xuewen Xia
- Xing Xu
Paper Information
- arXiv ID: 2511.21416v1
- Categories: cs.CL, cs.LG
- Published: November 26, 2025