[Paper] Enhancing Imbalanced Node Classification via Curriculum-Guided Feature Learning and Three-Stage Attention Network
Source: arXiv - 2602.03808v1
Overview
Imbalanced node classification—where some classes dominate the graph while others are scarce—remains a major obstacle for Graph Neural Networks (GNNs). The paper introduces CL3AN‑GNN, a curriculum‑guided, three‑stage attention architecture that mimics how humans learn from easy to hard concepts, dramatically improving performance on skewed graph data.
Key Contributions
- Curriculum‑guided learning for GNNs: A systematic “easy‑to‑hard” training schedule that first focuses on simple, local patterns before tackling complex, multi‑hop relationships.
- Three‑stage attention mechanism (Engage → Enact → Embed)
- Engage – isolates easy features (1‑hop neighborhoods, low‑degree nodes, class‑separable pairs).
- Enact – adaptively re‑weights harder signals (multi‑step connections, heterophilic edges, minority‑class fringe nodes).
- Embed – consolidates all learned representations via iterative message passing and curriculum‑aligned loss weighting.
- Curriculum‑aligned loss weighting: Dynamically adjusts the contribution of each stage to the overall loss, stabilizing training under severe label skew.
- Extensive empirical validation: Tested on eight Open Graph Benchmark (OGB) datasets covering social, biological, and citation networks, achieving consistent gains in accuracy, macro‑F1, and AUC over the latest baselines.
- Interpretability tools: Gradient‑stability and attention‑correlation visualizations that expose how the model’s focus shifts across curriculum stages.
Methodology
Feature Pre‑selection (Engage)
- Compute initial node embeddings with a shallow GCN and a GAT.
- Identify “easy” nodes: those with low degree, strong local homophily, and clear class separation (via cosine similarity of embeddings).
- Feed only these easy features into the first attention block, allowing the network to learn a stable base representation.
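The easy-node criteria above can be sketched in plain Python. This is an illustrative heuristic, not the paper's exact selection rule: the function name, the thresholds (`max_degree`, `min_homophily`, `min_margin`), and the use of per-class mean embeddings are all assumptions made for the sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_easy_nodes(adj, labels, emb, class_means,
                      max_degree=3, min_homophily=0.8, min_margin=0.2):
    """Pick 'easy' nodes: low degree, a homophilous 1-hop neighborhood,
    and an embedding clearly closer to its own class mean than to any
    other class mean (the thresholds here are illustrative)."""
    easy = []
    for v, nbrs in adj.items():
        if not nbrs or len(nbrs) > max_degree:
            continue
        homophily = sum(labels[u] == labels[v] for u in nbrs) / len(nbrs)
        if homophily < min_homophily:
            continue
        own = cosine(emb[v], class_means[labels[v]])
        other = max(cosine(emb[v], m)
                    for c, m in class_means.items() if c != labels[v])
        if own - other >= min_margin:
            easy.append(v)
    return easy
```

Only the nodes returned here would be fed into the first attention block; the rest are deferred to the Enact stage.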
Adaptive Hard‑Example Emphasis (Enact)
- Introduce a second attention layer that assigns higher weights to:
- Multi‑hop neighborhoods (capturing long‑range dependencies).
- Heterophilic edges (links between different classes).
- Nodes on the periphery of minority classes (often mis‑classified).
- The attention scores are learned jointly with the node embeddings, enabling the model to “focus” where it matters most.
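A minimal sketch of the Enact re-weighting idea follows. In the paper the attention scores are learned jointly with the embeddings; here the coefficients `alpha`, `beta`, and `gamma` are fixed hypothetical hyperparameters standing in for learned parameters, and the function name is invented for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def enact_weights(center, neighbors, labels, hops, minority,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Attention-style weights over a node's (multi-hop) neighborhood that
    emphasise 'hard' signals: heterophilic edges, longer-range neighbors,
    and minority-class neighbors. alpha/beta/gamma mimic learned scores."""
    scores = []
    for u in neighbors:
        s = 0.0
        if labels[u] != labels[center]:
            s += alpha                 # heterophilic edge bonus
        s += beta * (hops[u] - 1)      # multi-hop distance bonus
        if labels[u] in minority:
            s += gamma                 # minority-class neighbor bonus
        scores.append(s)
    return softmax(scores)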
Iterative Consolidation (Embed)
- A final attention‑driven message‑passing stage aggregates the refined features from Engage and Enact.
- The loss function is split into stage‑specific components, each multiplied by a curriculum weight that gradually shifts emphasis from Engage → Enact → Embed as training progresses.
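The stage-split objective can be written as a weighted sum. The sketch below assumes three scalar stage losses and a weight dictionary; the normalisation by the total weight is an assumption added to keep the loss scale comparable across epochs, and the function name is hypothetical.

```python
def curriculum_loss(l_engage, l_enact, l_embed, w):
    """Combine the three stage-specific losses with curriculum weights.
    `w` maps stage names to non-negative weights; the result is the
    weight-normalised convex combination of the stage losses."""
    total_w = w["engage"] + w["enact"] + w["embed"]
    return (w["engage"] * l_engage
            + w["enact"] * l_enact
            + w["embed"] * l_embed) / total_w
```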
Training Pipeline
- Early epochs: high weight on Engage loss → stable convergence on easy patterns.
- Mid epochs: increase Enact weight → the model starts to correct hard examples.
- Late epochs: dominant Embed loss → fine‑tune the full representation for final classification.
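The three-phase schedule above can be sketched as a piecewise-linear ramp. The exact pacing function in the paper is not specified in this summary, so the linear fade-out/fade-in below is an illustrative assumption.

```python
def curriculum_weights(epoch, total_epochs):
    """Piecewise-linear curriculum pacing: early epochs emphasise Engage,
    mid epochs Enact, late epochs Embed. Returns weights summing to 1."""
    t = epoch / max(1, total_epochs - 1)    # training progress in [0, 1]
    w_engage = max(0.0, 1.0 - 2.0 * t)      # fades out by mid-training
    w_embed = max(0.0, 2.0 * t - 1.0)       # fades in from mid-training
    w_enact = 1.0 - w_engage - w_embed      # peaks at mid-training
    return {"engage": w_engage, "enact": w_enact, "embed": w_embed}
```

At epoch 0 all weight sits on Engage, at the midpoint on Enact, and at the final epoch on Embed, matching the early/mid/late phases described above.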
The overall pipeline is lightweight (no extra parameters beyond standard GNN layers) and can be dropped into existing GNN stacks.
Results & Findings
| Dataset (OGB) | Baseline (e.g., GraphSMOTE) | CL3AN‑GNN | Δ Accuracy | Δ Macro‑F1 |
|---|---|---|---|---|
| ogbn‑arxiv | 71.4 % | 74.9 % | +3.5 % | +4.2 % |
| ogbn‑products | 62.1 % | 66.0 % | +3.9 % | +5.0 % |
| ogbn‑proteins | 68.7 % | 71.5 % | +2.8 % | +3.6 % |
| … (5 more) | … | … | … | … |
- Consistent gains across all eight benchmarks in accuracy, macro‑F1, and AUC.
- Faster convergence: CL3AN‑GNN reaches 90 % of its final performance in ~30 % fewer epochs compared to end‑to‑end baselines.
- Robustness to unseen imbalance: When the class distribution is artificially skewed further, the curriculum‑trained model degrades far less than competing methods.
- Interpretability: Attention heatmaps show a clear transition from focusing on local neighborhoods (early stage) to long‑range, heterophilic edges (later stage), matching the curriculum design.
Practical Implications
- Better minority‑class detection for applications such as fraud detection, rare‑disease gene prediction, and niche recommendation systems, without needing costly oversampling or synthetic node generation.
- Plug‑and‑play upgrade: Since CL3AN‑GNN builds on standard GCN/GAT layers, developers can integrate it into existing PyTorch‑Geometric or DGL pipelines with a few lines of code.
- Reduced training time: The curriculum schedule stabilizes early learning, meaning fewer epochs and lower GPU hours—valuable for large‑scale industrial graphs.
- Explainable GNN decisions: Stage‑wise attention visualizations can be exposed to end users or auditors to justify why a model flagged a node as belonging to a rare class.
- Transferability: The curriculum framework can be adapted to other graph tasks (link prediction, graph classification) where data imbalance is a concern.
Limitations & Future Work
- Curriculum design heuristics: The current “easy‑to‑hard” criteria (degree, 1‑hop homophily, embedding separability) are hand‑crafted; learning these criteria automatically could further improve adaptability.
- Scalability to billion‑node graphs: While the method adds minimal overhead, the extra attention passes may still be a bottleneck for ultra‑large graphs; distributed implementations are needed.
- Heterophily beyond two hops: The Enact stage focuses on multi‑step connections up to a fixed radius; extending to dynamic radii or graph‑level reasoning is an open direction.
- Broader curriculum schedules: Exploring non‑linear or reinforcement‑learning‑driven curriculum pacing could yield even faster convergence.
Overall, CL3AN‑GNN offers a compelling, developer‑friendly recipe for tackling class imbalance in graph‑structured data, marrying curriculum learning principles with modern attention‑based GNNs.
Authors
- Abdul Joseph Fofanah
- Lian Wen
- David Chen
- Shaoyang Zhang
Paper Information
- arXiv ID: 2602.03808v1
- Categories: cs.LG, cs.AI
- Published: February 3, 2026