[Paper] Bridging Code Graphs and Large Language Models for Better Code Understanding
Source: arXiv - 2512.07666v1
Overview
Large Language Models (LLMs) have become the go‑to tool for many code‑related tasks, but they still treat source code as a flat string of tokens. This ignores the rich graph‑like structure that compilers and developers rely on (e.g., abstract syntax trees, data‑flow graphs). The paper introduces CGBridge, a plug‑and‑play “bridge” that injects code‑graph knowledge into any frozen LLM, yielding noticeably better performance on summarization, translation, and other code‑understanding benchmarks while keeping inference fast.
Key Contributions
- Graph‑aware pre‑training: Trains a dedicated code‑graph encoder on 270 K real‑world code graphs to capture structural semantics.
- Cross‑modal bridge module: Aligns code tokens, graph embeddings, and natural‑language prompts via cross‑attention, producing structure‑enriched prompts without touching the LLM’s weights.
- Plug‑and‑play design: Works with any off‑the‑shelf instruction‑following LLM (e.g., GPT‑3.5, LLaMA) – no architectural changes or massive fine‑tuning required.
- Empirical gains: Shows up to 16 % relative improvement on LLM‑as‑a‑Judge for code summarization and up to 39 % boost in execution accuracy for code translation.
- Efficiency: Inference is > 4× faster than LoRA‑based fine‑tuning, because only the lightweight bridge runs at inference time.
Methodology
1. Code Graph Encoder
- Constructs graph representations (AST, control‑flow, data‑flow) for each source file.
- Trains the encoder with self‑supervised objectives (node masking, edge prediction) on a large corpus of 270 K graphs, so it learns to embed structural patterns.
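The AST portion of such a code graph can be built with Python's standard `ast` module alone. The sketch below covers only parent-child AST edges; the paper's graphs additionally layer control-flow and data-flow edges on top, which are not shown here.

```python
import ast

def ast_to_graph(source: str):
    """Build a simple (nodes, edges) graph from a Python AST.

    Only AST parent-child edges are produced; control-flow and
    data-flow edges would be added separately in a full pipeline.
    """
    tree = ast.parse(source)
    nodes, index = [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)   # node label = AST node type
    edges = []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

nodes, edges = ast_to_graph("def add(a, b):\n    return a + b")
print(nodes[0])      # the root node type is always "Module"
print(len(edges))    # a tree: exactly len(nodes) - 1 edges
```

In a real pipeline these node labels and edge lists would feed the self-supervised encoder (node masking, edge prediction) rather than being consumed directly.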
2. Bridge Module
- Takes three inputs: (a) the original tokenized code, (b) the graph encoder’s embeddings, and (c) the natural‑language task prompt.
- Uses a cross‑modal attention layer to let each modality “talk” to the others, producing a structure‑informed prompt that succinctly encodes graph semantics.
3. Integration with a Frozen LLM
- The LLM’s parameters stay frozen. The bridge‑generated prompt is concatenated to the user’s prompt and fed to the LLM.
- Only the bridge is fine‑tuned on downstream tasks (e.g., summarization, translation), dramatically reducing training cost.
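At inference time, the integration amounts to prepending the bridge's output to the user prompt before calling an unchanged LLM. The function names and prompt markers below are hypothetical, sketched from the description above.

```python
def build_prompt(bridge_tokens, user_prompt):
    """Concatenate bridge-generated structure tokens with the user's prompt.

    `bridge_tokens` stands in for the structure-enriched prefix the bridge
    emits; the markers [STRUCTURE]/[TASK] are illustrative, not the paper's.
    """
    prefix = " ".join(bridge_tokens)
    return f"[STRUCTURE] {prefix}\n[TASK] {user_prompt}"

def frozen_llm(prompt: str) -> str:
    # placeholder for a call to an off-the-shelf LLM API; the model
    # itself is never fine-tuned, only queried
    return f"<response to {len(prompt)} chars>"

prompt = build_prompt(["<loop>", "<dataflow:a->b>"],
                      "Summarize this function.")
print(frozen_llm(prompt))
```

Because only `build_prompt`'s prefix changes per task, downstream fine-tuning touches the bridge alone, which is what keeps training cost low.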
4. Evaluation
- Benchmarks include code summarization (using LLM‑as‑a‑Judge) and code translation (measured by execution accuracy).
- Baselines: vanilla LLM, graph‑augmented prompting (simple concatenation of graph tokens), and LoRA‑based fine‑tuning.
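An LLM-as-a-Judge setup typically means prompting a strong LLM to score each summary and parsing the numeric reply. The rubric and prompt wording below are hypothetical; the paper does not specify its exact judge template.

```python
import re

def judge_prompt(code: str, summary: str) -> str:
    """Hypothetical judge prompt asking for a 1-10 quality score."""
    return ("Rate the summary of the code below from 1 to 10.\n"
            f"Code:\n{code}\n\nSummary:\n{summary}\n\nScore:")

def parse_score(reply: str) -> int:
    """Take the first integer in the judge's reply, clamped to 1..10."""
    m = re.search(r"\d+", reply)
    return max(1, min(10, int(m.group()))) if m else 1

print(parse_score("Score: 8 - clear and accurate"))  # -> 8
```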
Results & Findings
| Task | Graph‑augmented Prompt (vs. vanilla LLM) | CGBridge (vs. vanilla LLM) |
|---|---|---|
| Code Summarization (LLM‑as‑a‑Judge) | +9.12 % | +16.19 % |
| Code Translation (Execution Accuracy) | +9.84 % | +38.87 % |
| Inference Speed | — | >4× faster than LoRA fine‑tuning |
What this means: The bridge not only injects useful structural cues but does so more efficiently than traditional parameter‑efficient fine‑tuning. The larger gains on translation suggest that structural correctness (e.g., preserving control flow) benefits heavily from graph knowledge.
Practical Implications
- Developer tools: IDE plugins that rely on LLMs for code suggestions, documentation generation, or automated refactoring can become more accurate without needing to retrain the massive underlying model.
- CI/CD pipelines: Automated code translation (e.g., migrating Python 2 → 3 or Java → Kotlin) can achieve higher success rates, reducing manual bug‑fixing effort.
- Low‑resource environments: Since only a lightweight bridge is trained and the LLM stays frozen, companies can leverage existing LLM APIs (OpenAI, Anthropic) while adding a custom graph encoder locally for proprietary codebases.
- Security & compliance: Graph‑aware prompts can help LLMs better understand data‑flow, making static analysis or vulnerability detection more reliable when combined with generative models.
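As a concrete illustration of why data-flow awareness matters here, a tiny def-use pass over a Python AST can already flag flows out of an untrusted source. This is a toy sketch, far simpler than the cross-scope data-flow graphs the paper builds on; `input` as the taint source is an illustrative choice.

```python
import ast

def tainted_uses(source: str, taint_call: str = "input"):
    """Flag variables assigned from a tainted call and their later uses.

    A toy def-use pass: finds names assigned from `taint_call(...)` and
    reports (name, line) for each later read of those names.
    """
    tree = ast.parse(source)
    tainted, uses = set(), []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            v = node.value
            if (isinstance(v, ast.Call) and isinstance(v.func, ast.Name)
                    and v.func.id == taint_call):
                tainted.update(t.id for t in node.targets
                               if isinstance(t, ast.Name))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id in tainted:
                uses.append((node.id, node.lineno))
    return sorted(uses)

code = "x = input()\nprint(x)\ny = 1\nprint(y)"
print(tainted_uses(code))  # only the tainted variable's use is reported
```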
Limitations & Future Work
- Graph construction cost: Building AST/Data‑flow graphs for large codebases adds preprocessing overhead, which may be non‑trivial for languages without mature parsers.
- Domain specificity: The encoder is trained on a generic corpus; specialized domains (e.g., embedded C, hardware description languages) might need additional fine‑tuning.
- Prompt length still bounded: Although the bridge compresses graph information, very large source modules could still produce prompts that exceed token limits in some LLM APIs.
- Future directions: The authors suggest exploring dynamic graph‑selection (only the most relevant sub‑graph per query), extending the bridge to multi‑modal inputs (e.g., test cases, documentation), and evaluating on more diverse tasks such as bug fixing or security audit.
Authors
- Zeqi Chen
- Zhaoyang Chu
- Yi Gui
- Feng Guo
- Yao Wan
- Chuan Shi
Paper Information
- arXiv ID: 2512.07666v1
- Categories: cs.CL, cs.SE
- Published: December 8, 2025