[Paper] G-Loss: Graph-Guided Fine-Tuning of Language Models

Published: April 28, 2026 at 12:55 PM EDT

Source: arXiv - 2604.25853v1

Overview

The paper introduces G‑Loss, a new loss function for fine‑tuning large language models (LLMs) such as BERT. By weaving a graph that reflects global document‑level similarities into the training objective, G‑Loss helps models learn embeddings that respect the broader semantic landscape—something traditional losses (cross‑entropy, contrastive, etc.) overlook.

Key Contributions

  • Graph‑guided loss formulation that integrates semi‑supervised label propagation directly into the fine‑tuning objective.
  • Document‑similarity graph construction from the embedding space, capturing global semantic relations across the whole training corpus.
  • Empirical validation on five diverse text classification benchmarks (MR, R8, R52, Ohsumed, 20NG), showing faster convergence and higher accuracy versus standard loss functions.
  • Visualization and analysis of the learned embedding spaces, demonstrating improved semantic coherence and class separability.

Methodology

  1. Base Model – Start with a pre‑trained transformer (e.g., BERT) and obtain initial token/CLS embeddings for every document in the fine‑tuning set.
  2. Graph Construction – Compute pairwise cosine similarities between document embeddings and keep the top‑k nearest neighbors for each node, forming an undirected similarity graph \(G = (V, E)\).
  3. Label Propagation – Treat the available class labels as seeds and run a semi‑supervised propagation algorithm (e.g., personalized PageRank) on \(G\) to generate soft pseudo‑labels for the unlabeled nodes.
  4. G‑Loss Definition – Combine the standard supervised loss (cross‑entropy) with a graph‑regularization term that penalizes discrepancies between a node’s predicted class distribution and the propagated distributions of its neighbors. Formally:

\[
\mathcal{L}_{\text{G-Loss}} = \mathcal{L}_{\text{sup}} + \lambda \sum_{(i,j)\in E} w_{ij}\, \text{KL}\big(p_i \,\|\, p_j\big)
\]

where \(w_{ij}\) are edge weights, \(p_i\) are the model’s predicted class distributions, and \(\lambda\) balances the two terms.

  5. Fine‑tuning Loop – Optimize the combined loss end‑to‑end; the graph is recomputed periodically (e.g., every epoch) to reflect the evolving embedding space. A sketch of the graph construction and loss term follows this list.
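
The sketch below is a minimal PyTorch rendering of the graph‑regularization term for a single batch: it builds the top‑k cosine‑similarity graph over the batch’s document embeddings and applies the edge‑weighted KL penalty from the equation above. It omits label propagation, and all names (`graph_regularizer`, `embeddings`, `logits`, `top_k`, `lam`) are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def graph_regularizer(embeddings, logits, top_k=10, lam=0.1):
    """Graph term of G-Loss over one batch: build a top-k cosine-similarity
    graph between document embeddings and penalize KL divergence between the
    predicted class distributions of neighboring documents."""
    # 1. Cosine-similarity graph (assumes batch size > top_k)
    normed = F.normalize(embeddings, dim=-1)        # (N, d)
    sim = normed @ normed.t()                       # (N, N)
    sim.fill_diagonal_(float("-inf"))               # exclude self-edges
    weights, neighbors = sim.topk(top_k, dim=-1)    # edge weights / indices, both (N, k)

    # 2. Predicted class distributions p_i
    log_p = F.log_softmax(logits, dim=-1)           # (N, C)
    p = log_p.exp()

    # 3. Edge-weighted KL(p_i || p_j) over the top-k neighbor edges
    kl = (p.unsqueeze(1) * (log_p.unsqueeze(1) - log_p[neighbors])).sum(-1)  # (N, k)
    graph_term = (weights.clamp(min=0.0) * kl).mean()
    return lam * graph_term

# Combined objective: supervised cross-entropy plus the graph term
# loss = F.cross_entropy(logits, labels) + graph_regularizer(cls_embeddings, logits)
```

In the paper the graph covers the whole training corpus and is refreshed periodically; the per‑batch version above is only the simplest runnable approximation.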

Results & Findings

| Dataset | Baseline (Cross‑Entropy) | G‑Loss | Δ Accuracy | Convergence (epochs) |
|---|---|---|---|---|
| MR (sentiment) | 88.2 % | 90.5 % | +2.3 % | 3 → 2 |
| R8 (topic) | 94.1 % | 95.6 % | +1.5 % | 4 → 2 |
| R52 (topic) | 92.8 % | 94.3 % | +1.5 % | 5 → 3 |
| Ohsumed (medical) | 78.4 % | 81.0 % | +2.6 % | 6 → 4 |
| 20NG (news) | 84.7 % | 86.9 % | +2.2 % | 5 → 3 |

  • Faster convergence: G‑Loss typically reaches its peak performance 30–50 % earlier than the baseline.
  • Richer embeddings: t‑SNE visualizations show tighter intra‑class clusters and clearer inter‑class margins.
  • Robustness to label scarcity: When only 20 % of training labels are retained, G‑Loss degrades only ~1 % versus ~3 % for the baseline, highlighting the benefit of the graph’s semi‑supervised signal.

Practical Implications

  • Improved downstream classifiers: Developers can plug G‑Loss into existing fine‑tuning pipelines (PyTorch, Hugging Face Transformers) to boost accuracy on any text classification task without architectural changes (see the Trainer sketch after this list).
  • Reduced training time: Faster convergence translates to lower GPU hours, which is attractive for production environments where model updates are frequent.
  • Better handling of noisy or sparse labels: The graph‑based regularization acts as a “semantic smoothing” layer, making models more tolerant to mislabeled data—a common pain point in real‑world corpora.
  • Potential for retrieval & clustering: Since G‑Loss yields embeddings that respect global similarity, the same fine‑tuned model can be reused for semantic search, duplicate detection, or topic clustering with minimal extra work.
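
As an example of that drop‑in integration, one common pattern is to subclass the Hugging Face `Trainer` and override `compute_loss`. This is a sketch under the assumption that the `graph_regularizer` helper from the Methodology section is in scope and that the model is a standard sequence‑classification transformer; the hyper‑parameter values are illustrative.

```python
import torch.nn.functional as F
from transformers import Trainer

class GLossTrainer(Trainer):
    """Trainer that adds the graph-regularization term to cross-entropy
    (sketch; assumes the graph_regularizer helper defined earlier)."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs, output_hidden_states=True)
        logits = outputs.logits
        # Final-layer [CLS] embedding as the document representation.
        cls_embeddings = outputs.hidden_states[-1][:, 0]

        loss = F.cross_entropy(logits, labels) + graph_regularizer(
            cls_embeddings, logits, top_k=10, lam=0.1
        )
        return (loss, outputs) if return_outputs else loss
```

The subclass is used exactly like the standard `Trainer`, so datasets, tokenizer, and `TrainingArguments` stay unchanged.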

Limitations & Future Work

  • Graph construction overhead: Building and updating the similarity graph can be costly for very large datasets; the authors suggest approximate nearest‑neighbor methods as a mitigation (a sketch follows this list).
  • Hyper‑parameter sensitivity: The balance factor \(\lambda\) and the number of neighbors \(k\) need careful tuning; default values work well on the benchmarks but may need adjustment for domain‑specific data.
  • Scope limited to classification: Experiments focus on supervised classification; extending G‑Loss to generation‑oriented tasks (e.g., QA, summarization) remains an open question.
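
To illustrate the approximate nearest‑neighbor mitigation, the top‑k graph can be built with an ANN index such as FAISS instead of an exact all‑pairs similarity matrix. A minimal sketch, assuming corpus embeddings arrive as a NumPy array; the HNSW parameters are illustrative defaults, not values from the paper.

```python
import numpy as np
import faiss  # approximate nearest-neighbor search library

def build_knn_graph(embeddings: np.ndarray, k: int = 10):
    """Build a top-k cosine-similarity graph with an HNSW index rather than
    an exact O(N^2) similarity matrix (sketch; parameters are illustrative)."""
    x = np.ascontiguousarray(embeddings.astype(np.float32))
    faiss.normalize_L2(x)  # after L2 normalization, inner product == cosine similarity
    index = faiss.IndexHNSWFlat(x.shape[1], 32, faiss.METRIC_INNER_PRODUCT)
    index.add(x)
    sims, neighbors = index.search(x, k + 1)  # nearest hit is (almost always) the point itself
    return sims[:, 1:], neighbors[:, 1:]      # drop the self-match
```

The returned similarities and neighbor indices play the role of the edge weights \(w_{ij}\) and the neighbor sets used by the loss.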

Overall, G‑Loss offers a pragmatic way to inject global semantic awareness into language model fine‑tuning, promising tangible gains for developers building robust NLP services.

Authors

  • Aditya Sharma
  • Vinti Agarwal
  • Rajesh Kumar

Paper Information

  • arXiv ID: 2604.25853v1
  • Categories: cs.CL, cs.AI, cs.LG
  • Published: April 28, 2026
  • PDF: Download PDF