[Paper] Enhancing Imbalanced Node Classification via Curriculum-Guided Feature Learning and Three-Stage Attention Network
Source: arXiv - 2602.03808v1
Overview
Imbalanced node classification—where some classes dominate the graph while others are scarce—remains a major obstacle for Graph Neural Networks (GNNs). The paper introduces CL3AN‑GNN, a curriculum‑guided, three‑stage attention architecture that mimics how humans learn from easy to hard concepts, dramatically improving performance on skewed graph data.
Key Contributions
- Curriculum‑guided learning for GNNs: A systematic “easy‑to‑hard” training schedule that first focuses on simple, local patterns before tackling complex, multi‑hop relationships.
- Three‑stage attention mechanism (Engage → Enact → Embed)
- Engage – isolates easy features (1‑hop neighborhoods, low‑degree nodes, class‑separable pairs).
- Enact – adaptively re‑weights harder signals (multi‑step connections, heterophilic edges, minority‑class fringe nodes).
- Embed – consolidates all learned representations via iterative message passing and curriculum‑aligned loss weighting.
- Curriculum‑aligned loss weighting: Dynamically adjusts the contribution of each stage to the overall loss, stabilizing training under severe label skew.
- Extensive empirical validation: Tested on eight Open Graph Benchmark (OGB) datasets covering social, biological, and citation networks, achieving consistent gains in accuracy, macro‑F1, and AUC over the latest baselines.
- Interpretability tools: Gradient‑stability and attention‑correlation visualizations that expose how the model’s focus shifts across curriculum stages.
Methodology
Feature Pre‑selection (Engage)
- Compute initial node embeddings with a shallow GCN and a GAT.
- Identify “easy” nodes: those with low degree, strong local homophily, and clear class separation (via cosine similarity of embeddings).
- Feed only these easy features into the first attention block, allowing the network to learn a stable base representation.
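The easy-node criteria above can be sketched in plain Python. This is an illustrative heuristic, not the paper's exact selection rule: the function name, the thresholds (`max_degree`, `min_homophily`, `min_margin`), and the use of per-class mean embeddings are all assumptions made for the sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_easy_nodes(adj, labels, emb, class_means,
                      max_degree=3, min_homophily=0.8, min_margin=0.2):
    """Pick 'easy' nodes: low degree, a homophilous 1-hop neighborhood,
    and an embedding clearly closer to its own class mean than to any
    other class mean (the thresholds here are illustrative)."""
    easy = []
    for v, nbrs in adj.items():
        if not nbrs or len(nbrs) > max_degree:
            continue
        homophily = sum(labels[u] == labels[v] for u in nbrs) / len(nbrs)
        if homophily < min_homophily:
            continue
        own = cosine(emb[v], class_means[labels[v]])
        other = max(cosine(emb[v], m)
                    for c, m in class_means.items() if c != labels[v])
        if own - other >= min_margin:
            easy.append(v)
    return easy
```

Only the nodes returned here would be fed into the first attention block; the rest are deferred to the Enact stage.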
Adaptive Hard‑Example Emphasis (Enact)
- Introduce a second attention layer that assigns higher weights to:
- Multi‑hop neighborhoods (capturing long‑range dependencies).
- Heterophilic edges (links between different classes).
- Nodes on the periphery of minority classes (often mis‑classified).
- The attention scores are learned jointly with the node embeddings, enabling the model to “focus” where it matters most.
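A minimal sketch of the Enact re-weighting idea follows. In the paper the attention scores are learned jointly with the embeddings; here the coefficients `alpha`, `beta`, and `gamma` are fixed hypothetical hyperparameters standing in for learned parameters, and the function name is invented for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def enact_weights(center, neighbors, labels, hops, minority,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Attention-style weights over a node's (multi-hop) neighborhood that
    emphasise 'hard' signals: heterophilic edges, longer-range neighbors,
    and minority-class neighbors. alpha/beta/gamma mimic learned scores."""
    scores = []
    for u in neighbors:
        s = 0.0
        if labels[u] != labels[center]:
            s += alpha                 # heterophilic edge bonus
        s += beta * (hops[u] - 1)      # multi-hop distance bonus
        if labels[u] in minority:
            s += gamma                 # minority-class neighbor bonus
        scores.append(s)
    return softmax(scores)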
Iterative Consolidation (Embed)
- A final attention‑driven message‑passing stage aggregates the refined features from Engage and Enact.
- The loss function is split into stage‑specific components, each multiplied by a curriculum weight that gradually shifts emphasis from Engage → Enact → Embed as training progresses.
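The stage-split objective can be written as a weighted sum. The sketch below assumes three scalar stage losses and a weight dictionary; the normalisation by the total weight is an assumption added to keep the loss scale comparable across epochs, and the function name is hypothetical.

```python
def curriculum_loss(l_engage, l_enact, l_embed, w):
    """Combine the three stage-specific losses with curriculum weights.
    `w` maps stage names to non-negative weights; the result is the
    weight-normalised convex combination of the stage losses."""
    total_w = w["engage"] + w["enact"] + w["embed"]
    return (w["engage"] * l_engage
            + w["enact"] * l_enact
            + w["embed"] * l_embed) / total_w
```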
Training Pipeline
- Early epochs: high weight on Engage loss → stable convergence on easy patterns.
- Mid epochs: increase Enact weight → the model starts to correct hard examples.
- Late epochs: dominant Embed loss → fine‑tune the full representation for final classification.
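The three-phase schedule above can be sketched as a piecewise-linear ramp. The exact pacing function in the paper is not specified in this summary, so the linear fade-out/fade-in below is an illustrative assumption.

```python
def curriculum_weights(epoch, total_epochs):
    """Piecewise-linear curriculum pacing: early epochs emphasise Engage,
    mid epochs Enact, late epochs Embed. Returns weights summing to 1."""
    t = epoch / max(1, total_epochs - 1)    # training progress in [0, 1]
    w_engage = max(0.0, 1.0 - 2.0 * t)      # fades out by mid-training
    w_embed = max(0.0, 2.0 * t - 1.0)       # fades in from mid-training
    w_enact = 1.0 - w_engage - w_embed      # peaks at mid-training
    return {"engage": w_engage, "enact": w_enact, "embed": w_embed}
```

At epoch 0 all weight sits on Engage, at the midpoint on Enact, and at the final epoch on Embed, matching the early/mid/late phases described above.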
The overall pipeline is lightweight (no extra parameters beyond standard GNN layers) and can be dropped into existing GNN stacks.
Results & Findings
| Dataset (OGB) | Baseline (e.g., GraphSMOTE) | CL3AN‑GNN | Δ Accuracy | Δ Macro‑F1 |
|---|---|---|---|---|
| ogbn‑arxiv | 71.4 % | 74.9 % | +3.5 % | +4.2 % |
| ogbn‑products | 62.1 % | 66.0 % | +3.9 % | +5.0 % |
| ogbn‑proteins | 68.7 % | 71.5 % | +2.8 % | +3.6 % |
| … (5 more) | … | … | … | … |
- Consistent gains across all eight benchmarks in accuracy, macro‑F1, and AUC.
- Faster convergence: CL3AN‑GNN reaches 90 % of its final performance in ~30 % fewer epochs compared to end‑to‑end baselines.
- Robustness to unseen imbalance: When the class distribution is artificially skewed further, the curriculum‑trained model degrades far less than competing methods.
- Interpretability: Attention heatmaps show a clear transition from focusing on local neighborhoods (early stage) to long‑range, heterophilic edges (later stage), matching the curriculum design.
Practical Implications
- Better minority‑class detection for applications such as fraud detection, rare‑disease gene prediction, and niche recommendation systems, without needing costly oversampling or synthetic node generation.
- Plug‑and‑play upgrade: Since CL3AN‑GNN builds on standard GCN/GAT layers, developers can integrate it into existing PyTorch‑Geometric or DGL pipelines with a few lines of code.
- Reduced training time: The curriculum schedule stabilizes early learning, meaning fewer epochs and lower GPU hours—valuable for large‑scale industrial graphs.
- Explainable GNN decisions: Stage‑wise attention visualizations can be exposed to end users or auditors to justify why a model flagged a node as belonging to a rare class.
- Transferability: The curriculum framework can be adapted to other graph tasks (link prediction, graph classification) where data imbalance is a concern.
Limitations & Future Work
- Curriculum design heuristics: The current “easy‑to‑hard” criteria (degree, 1‑hop homophily, embedding separability) are hand‑crafted; learning these criteria automatically could further improve adaptability.
- Scalability to billion‑node graphs: While the method adds minimal overhead, the extra attention passes may still be a bottleneck for ultra‑large graphs; distributed implementations are needed.
- Heterophily beyond two hops: The Enact stage focuses on multi‑step connections up to a fixed radius; extending to dynamic radii or graph‑level reasoning is an open direction.
- Broader curriculum schedules: Exploring non‑linear or reinforcement‑learning‑driven curriculum pacing could yield even faster convergence.
Overall, CL3AN‑GNN offers a compelling, developer‑friendly recipe for tackling class imbalance in graph‑structured data, marrying curriculum learning principles with modern attention‑based GNNs.
Authors
- Abdul Joseph Fofanah
- Lian Wen
- David Chen
- Shaoyang Zhang
Paper Information
- arXiv ID: 2602.03808v1
- Categories: cs.LG, cs.AI
- Published: February 3, 2026