[Paper] Bridging Code Graphs and Large Language Models for Better Code Understanding
Source: arXiv - 2512.07666v1
Overview
Large Language Models (LLMs) have become the go‑to tool for many code‑related tasks, but they still treat source code as a flat string of tokens. This ignores the rich graph‑like structure that compilers and developers rely on (e.g., abstract syntax trees, data‑flow graphs). The paper introduces CGBridge, a plug‑and‑play “bridge” that injects code‑graph knowledge into any frozen LLM, yielding noticeably better performance on summarization, translation, and other code‑understanding benchmarks while keeping inference fast.
Key Contributions
- Graph‑aware pre‑training: Trains a dedicated code‑graph encoder on 270 K real‑world code graphs to capture structural semantics.
- Cross‑modal bridge module: Aligns code tokens, graph embeddings, and natural‑language prompts via cross‑attention, producing structure‑enriched prompts without touching the LLM’s weights.
- Plug‑and‑play design: Works with any off‑the‑shelf instruction‑following LLM (e.g., GPT‑3.5, LLaMA) – no architectural changes or massive fine‑tuning required.
- Empirical gains: Shows up to 16 % relative improvement on LLM‑as‑a‑Judge for code summarization and up to 39 % boost in execution accuracy for code translation.
- Efficiency: Inference is > 4× faster than LoRA‑based fine‑tuning, because only the lightweight bridge runs at inference time.
Methodology
1. Code Graph Encoder
- Constructs graph representations (AST, control‑flow, data‑flow) for each source file.
- Trains the encoder with self‑supervised objectives (node masking, edge prediction) on a large corpus of 270 K graphs, so it learns to embed structural patterns.
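The AST portion of such a code graph can be built with Python's standard `ast` module alone. The sketch below covers only parent-child AST edges; the paper's graphs additionally layer control-flow and data-flow edges on top, which are not shown here.

```python
import ast

def ast_to_graph(source: str):
    """Build a simple (nodes, edges) graph from a Python AST.

    Only AST parent-child edges are produced; control-flow and
    data-flow edges would be added separately in a full pipeline.
    """
    tree = ast.parse(source)
    nodes, index = [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)   # node label = AST node type
    edges = []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

nodes, edges = ast_to_graph("def add(a, b):\n    return a + b")
print(nodes[0])      # the root node type is always "Module"
print(len(edges))    # a tree: exactly len(nodes) - 1 edges
```

In a real pipeline these node labels and edge lists would feed the self-supervised encoder (node masking, edge prediction) rather than being consumed directly.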
2. Bridge Module
- Takes three inputs: (a) the original tokenized code, (b) the graph encoder’s embeddings, and (c) the natural‑language task prompt.
- Uses a cross‑modal attention layer to let each modality “talk” to the others, producing a structure‑informed prompt that succinctly encodes graph semantics.
3. Integration with a Frozen LLM
- The LLM’s parameters stay frozen. The bridge‑generated prompt is concatenated to the user’s prompt and fed to the LLM.
- Only the bridge is fine‑tuned on downstream tasks (e.g., summarization, translation), dramatically reducing training cost.
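At inference time, the integration amounts to prepending the bridge's output to the user prompt before calling an unchanged LLM. The function names and prompt markers below are hypothetical, sketched from the description above.

```python
def build_prompt(bridge_tokens, user_prompt):
    """Concatenate bridge-generated structure tokens with the user's prompt.

    `bridge_tokens` stands in for the structure-enriched prefix the bridge
    emits; the markers [STRUCTURE]/[TASK] are illustrative, not the paper's.
    """
    prefix = " ".join(bridge_tokens)
    return f"[STRUCTURE] {prefix}\n[TASK] {user_prompt}"

def frozen_llm(prompt: str) -> str:
    # placeholder for a call to an off-the-shelf LLM API; the model
    # itself is never fine-tuned, only queried
    return f"<response to {len(prompt)} chars>"

prompt = build_prompt(["<loop>", "<dataflow:a->b>"],
                      "Summarize this function.")
print(frozen_llm(prompt))
```

Because only `build_prompt`'s prefix changes per task, downstream fine-tuning touches the bridge alone, which is what keeps training cost low.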
4. Evaluation
- Benchmarks include code summarization (using LLM‑as‑a‑Judge) and code translation (measured by execution accuracy).
- Baselines: vanilla LLM, graph‑augmented prompting (simple concatenation of graph tokens), and LoRA‑based fine‑tuning.
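An LLM-as-a-Judge setup typically means prompting a strong LLM to score each summary and parsing the numeric reply. The rubric and prompt wording below are hypothetical; the paper does not specify its exact judge template.

```python
import re

def judge_prompt(code: str, summary: str) -> str:
    """Hypothetical judge prompt asking for a 1-10 quality score."""
    return ("Rate the summary of the code below from 1 to 10.\n"
            f"Code:\n{code}\n\nSummary:\n{summary}\n\nScore:")

def parse_score(reply: str) -> int:
    """Take the first integer in the judge's reply, clamped to 1..10."""
    m = re.search(r"\d+", reply)
    return max(1, min(10, int(m.group()))) if m else 1

print(parse_score("Score: 8 - clear and accurate"))  # -> 8
```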
Results & Findings
| Task | Graph‑augmented Prompt (vs. vanilla LLM) | CGBridge (vs. vanilla LLM) |
|---|---|---|
| Code Summarization (LLM‑as‑a‑Judge) | +9.12 % | +16.19 % |
| Code Translation (Execution Accuracy) | +9.84 % | +38.87 % |
| Inference Speed | — | >4× faster than LoRA fine‑tuning |
What this means: The bridge not only injects useful structural cues but does so more efficiently than traditional parameter‑efficient fine‑tuning. The larger gains on translation suggest that structural correctness (e.g., preserving control flow) benefits heavily from graph knowledge.
Practical Implications
- Developer tools: IDE plugins that rely on LLMs for code suggestions, documentation generation, or automated refactoring can become more accurate without needing to retrain the massive underlying model.
- CI/CD pipelines: Automated code translation (e.g., migrating Python 2 → 3 or Java → Kotlin) can achieve higher success rates, reducing manual bug‑fixing effort.
- Low‑resource environments: Since only a lightweight bridge is trained and the LLM stays frozen, companies can leverage existing LLM APIs (OpenAI, Anthropic) while adding a custom graph encoder locally for proprietary codebases.
- Security & compliance: Graph‑aware prompts can help LLMs better understand data‑flow, making static analysis or vulnerability detection more reliable when combined with generative models.
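As a concrete illustration of why data-flow awareness matters here, a tiny def-use pass over a Python AST can already flag flows out of an untrusted source. This is a toy sketch, far simpler than the cross-scope data-flow graphs the paper builds on; `input` as the taint source is an illustrative choice.

```python
import ast

def tainted_uses(source: str, taint_call: str = "input"):
    """Flag variables assigned from a tainted call and their later uses.

    A toy def-use pass: finds names assigned from `taint_call(...)` and
    reports (name, line) for each later read of those names.
    """
    tree = ast.parse(source)
    tainted, uses = set(), []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            v = node.value
            if (isinstance(v, ast.Call) and isinstance(v.func, ast.Name)
                    and v.func.id == taint_call):
                tainted.update(t.id for t in node.targets
                               if isinstance(t, ast.Name))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id in tainted:
                uses.append((node.id, node.lineno))
    return sorted(uses)

code = "x = input()\nprint(x)\ny = 1\nprint(y)"
print(tainted_uses(code))  # only the tainted variable's use is reported
```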
Limitations & Future Work
- Graph construction cost: Building AST/Data‑flow graphs for large codebases adds preprocessing overhead, which may be non‑trivial for languages without mature parsers.
- Domain specificity: The encoder is trained on a generic corpus; specialized domains (e.g., embedded C, hardware description languages) might need additional fine‑tuning.
- Prompt length still bounded: Although the bridge compresses graph information, very large source modules could still produce prompts that exceed token limits in some LLM APIs.
- Future directions: The authors suggest exploring dynamic graph‑selection (only the most relevant sub‑graph per query), extending the bridge to multi‑modal inputs (e.g., test cases, documentation), and evaluating on more diverse tasks such as bug fixing or security audit.
Authors
- Zeqi Chen
- Zhaoyang Chu
- Yi Gui
- Feng Guo
- Yao Wan
- Chuan Shi
Paper Information
- arXiv ID: 2512.07666v1
- Categories: cs.CL, cs.SE
- Published: December 8, 2025