[Paper] Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding

Published: December 4, 2025 at 02:37 AM EST
4 min read
Source: arXiv - 2512.04538v1

Overview

The paper introduces CoCo (Completion by Comprehension), a new framework that boosts code‑completion models by feeding them a richer, multi‑granular understanding of the surrounding codebase. By extracting and structuring static analysis information from functions, files, and whole projects, CoCo turns raw source code into precise natural‑language prompts that guide generation, delivering a noticeable jump in accuracy over existing retrieval‑augmented approaches.

Key Contributions

  • Multi‑granularity context extraction: Static analysis is used to harvest structured information at the function, file, and project levels, preserving control‑flow and dependency semantics.
  • Graph‑based context selector: A lightweight graph model filters out redundant or noisy snippets, ensuring the prompt contains only the most relevant context.
  • Unified natural‑language prompting: The selected structural data is converted into a consistent textual format that can be directly appended to any code‑completion model.
  • Structure‑aware re‑ranking: After generation, a re‑ranker evaluates candidates against both semantic meaning and code structure, selecting the most plausible completion.
  • Model‑agnostic integration: CoCo can wrap around any existing LLM‑based code generator (a minimal wrapper sketch follows this list), delivering up to 20.2 % absolute exact‑match (EM) improvement on benchmark suites (CrossCodeEval, RepoEval).
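
Because CoCo only enriches the prompt and post‑processes candidates, the whole pipeline can be expressed as a thin wrapper around any completion backend. Here is a minimal sketch of that wrapper shape; the function names and the "prompt in, candidate completions out" callable signatures are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of CoCo as a model-agnostic wrapper. All names here
# (build_prompt, generate, rerank) are illustrative assumptions: CoCo only
# enriches the prompt and re-ranks outputs, so any backend that maps a
# prompt to candidate completions can be plugged in unchanged.
from typing import Callable, List

def coco_complete(snippet: str,
                  build_prompt: Callable[[str], str],
                  generate: Callable[[str], List[str]],
                  rerank: Callable[[List[str]], List[str]]) -> str:
    prompt = build_prompt(snippet)   # steps 1-3: analyze, select, linearize
    candidates = generate(prompt)    # any off-the-shelf completion model
    return rerank(candidates)[0]     # step 4: structure-aware re-ranking
```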

Methodology

  1. Static Code Analysis – The system parses the target repository with a language‑specific analyzer (e.g., JavaParser, tree‑sitter) and extracts the following (a minimal sketch follows this list):

    • Function‑level: signatures, local variable types, called APIs.
    • File‑level: imported modules, class hierarchies, global constants.
    • Project‑level: build scripts, dependency graphs, cross‑file call relations.
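
The paper's analyzers target Java‑style code; as a rough analogue, the hedged Python sketch below uses the standard‑library ast module to harvest the function‑ and file‑level facts listed above (project‑level extraction, e.g. dependency graphs, is omitted). The extract_facts name and the output shape are assumptions for illustration.

```python
# Hedged sketch of multi-granularity fact extraction using Python's ast
# module as a stand-in for the paper's JavaParser / tree-sitter analyzers.
import ast

def extract_facts(source: str) -> dict:
    """Harvest function- and file-level facts from one source file."""
    tree = ast.parse(source)
    facts = {"imports": [], "functions": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            facts["imports"] += [alias.name for alias in node.names]  # file-level
        elif isinstance(node, ast.ImportFrom):
            facts["imports"].append(node.module or "")                # file-level
        elif isinstance(node, ast.FunctionDef):
            calls = []
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    f = sub.func
                    calls.append(f.attr if isinstance(f, ast.Attribute)
                                 else getattr(f, "id", "<dynamic>"))
            facts["functions"].append({
                "name": node.name,                           # function-level: signature
                "params": [a.arg for a in node.args.args],
                "calls": sorted(set(calls)),                 # function-level: called APIs
            })
    return facts
```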
  2. Graph Construction & Selection – All extracted entities become nodes in a directed graph where edges encode “uses”, “defines”, or “calls”. A relevance score (based on proximity to the completion point and frequency of use) drives a pruning algorithm that keeps the top‑k most informative nodes while discarding unrelated code.
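
A hedged sketch of that pruning step follows. The Node fields and the scoring weights are assumptions chosen to illustrate the proximity‑plus‑frequency idea, not the paper's exact formula.

```python
# Illustrative relevance-scored pruning over the extracted entity graph.
# The 0.1 frequency weight is an assumption, not the paper's tuned value.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    line: int                                   # where the entity is defined
    uses: int = 0                               # how often it is referenced
    edges: list = field(default_factory=list)   # ("uses" | "defines" | "calls", target)

def select_context(nodes: list, cursor_line: int, k: int = 10) -> list:
    """Keep the top-k nodes by a proximity-plus-frequency relevance score."""
    def score(n: Node) -> float:
        proximity = 1.0 / (1 + abs(n.line - cursor_line))  # closer to the cursor = better
        return proximity + 0.1 * n.uses                    # frequent entities get a boost
    return sorted(nodes, key=score, reverse=True)[:k]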

  3. Prompt Generation – The remaining nodes are linearized into natural‑language statements (e.g., “The function parseJson takes a String and returns a Map<String, Object>”) and concatenated with the original incomplete snippet. This prompt is fed to any off‑the‑shelf code‑completion model.
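
Continuing the sketch, linearization can be as simple as templating each selected fact into a sentence; the sentence templates below are illustrative assumptions, not CoCo's exact wording.

```python
# Illustrative linearization of selected facts (shape matches the extraction
# sketch above); the templates are assumptions, not the paper's exact ones.
def render_prompt(functions: list, snippet: str) -> str:
    lines = []
    for fn in functions:
        params = ", ".join(fn["params"]) or "no arguments"
        calls = ", ".join(fn["calls"]) or "no other APIs"
        lines.append(f"The function {fn['name']} takes {params} and calls {calls}.")
    # Structured context first, then the unfinished code the model must extend.
    return "\n".join(lines) + "\n\n# Complete the following code:\n" + snippet
```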

  4. Structure‑Aware Re‑Ranking – The model may emit several candidate completions. CoCo parses each candidate, checks consistency with the extracted graph (e.g., does it respect variable scopes and type constraints?), and re‑orders them before returning the final answer.
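
A minimal sketch of that consistency check: candidates that fail to parse, or that reference symbols absent from the extracted context, are demoted. The paper's checker is richer (scopes, type constraints); this simplification only counts unknown names.

```python
# Hedged sketch of structure-aware re-ranking. The real checker verifies
# scopes and type constraints; this version just demotes candidates that
# do not parse or that reference names unknown to the context graph.
import ast
import builtins

def rerank(candidates: list, known_names: set) -> list:
    known = known_names | set(dir(builtins))   # don't penalize built-ins
    def consistency(code: str) -> float:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            return 0.0                          # unparsable candidates sink to the bottom
        used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        return 1.0 / (1 + len(used - known))    # fewer unknown symbols = higher rank
    return sorted(candidates, key=consistency, reverse=True)
```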

Results & Findings

  • CrossCodeEval: CoCo + CodeGen‑2B achieved 71.4 % EM, versus 51.2 % for the baseline RAG approach (a 20.2‑point absolute gain).
  • RepoEval: When paired with GPT‑3.5‑Turbo, CoCo lifted EM from 58.7 % to 76.9 %.
  • Ablation studies showed that removing the graph‑based selector drops performance by ~8 %, while skipping the re‑ranker costs another ~5 %, demonstrating that each component contributes meaningfully.
  • The framework remained model‑agnostic: identical gains were observed across three different LLM back‑ends (CodeGen, StarCoder, GPT‑4), confirming that the improvement stems from better context, not from a specific model architecture.

Practical Implications

  • IDE plugins & CI tools – CoCo can be wrapped around existing autocomplete engines (e.g., GitHub Copilot, Tabnine) to provide more accurate suggestions, especially in large monorepos where cross‑file dependencies matter.
  • On‑device code assistants – Because the static analysis and graph pruning are lightweight, they can run locally on developer machines, reducing reliance on costly remote retrieval calls.
  • Automated refactoring & bug‑fix generation – The structure‑aware re‑ranker ensures that generated patches respect type safety and control flow, making the output safer for automated PR bots.
  • Cross‑language portability – While the paper focuses on Java‑like languages, the pipeline (parser → graph → prompt) is language‑agnostic, opening the door for similar gains in Python, TypeScript, Rust, etc.

Limitations & Future Work

  • Static analysis depth – The current implementation stops at syntactic dependencies; deeper semantic analyses (e.g., data‑flow, alias analysis) could capture even richer intent.
  • Scalability to massive repos – Although the graph selector trims noise, processing extremely large codebases may still incur noticeable latency; incremental indexing or caching strategies are needed.
  • Prompt length constraints – Very large contexts can hit token limits of LLM APIs; future work may explore hierarchical prompting or learned compression techniques.
  • Evaluation breadth – The benchmarks used are synthetic; real‑world developer studies are needed to validate the usefulness of CoCo‑enhanced completions in daily workflows.

CoCo demonstrates that “understanding before generating” isn’t just a research curiosity: it’s a practical recipe for making AI‑assisted coding tools smarter, safer, and more developer‑friendly.

Authors

  • Xinkui Zhao
  • Rongkai Liu
  • Yifan Zhang
  • Chen Zhi
  • Lufei Zhang
  • Guanjie Cheng
  • Yueshen Xu
  • Shuiguang Deng
  • Jianwei Yin

Paper Information

  • arXiv ID: 2512.04538v1
  • Categories: cs.SE
  • Published: December 4, 2025