[Paper] AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion
Source: arXiv - 2601.19697v1
Overview
Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository‑specific context and domain knowledge. While retrieval‑augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross‑file context, they suffer from two fundamental problems:
- Misalignment between the query and the target code in the retrieval process.
- Inability of existing retrieval methods to effectively utilize inference information.
To address these challenges, we propose AlignCoder, a repository‑level code completion framework that introduces:
- A query enhancement mechanism that generates multiple candidate completions to construct an enhanced query, bridging the semantic gap between the initial query and the target code.
- A reinforcement‑learning‑based retriever training method that trains an AlignRetriever to leverage inference information in the enhanced query for more accurate retrieval.
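The query enhancement idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `fake_sampler` is a hypothetical stand-in for a code LLM that samples candidate completions, and Jaccard token overlap is a simple stand-in for the trained AlignRetriever. All function names and the toy repository snippets are assumptions made for the example.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Split code into lowercase identifier tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*", text.lower())


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set overlap; a stand-in for a learned retriever score."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_enhanced_query(
    unfinished_code: str,
    sampler: Callable[[str, int], List[str]],
    n: int = 2,
) -> str:
    """Append n sampled candidate completions to the initial query."""
    candidates = sampler(unfinished_code, n)
    return "\n".join([unfinished_code, *candidates])


def retrieve(query: str, snippets: List[str], top_k: int = 1) -> List[str]:
    """Rank repository snippets by similarity to the query."""
    q = tokenize(query)
    ranked = sorted(snippets, key=lambda s: jaccard(q, tokenize(s)), reverse=True)
    return ranked[:top_k]


# Hypothetical sampler: a real system would call a code LLM here.
def fake_sampler(prompt: str, n: int) -> List[str]:
    guesses = [
        "shipping_fee(order.weight, order.zone)",
        "shipping_fee(order.weight, zone)",
    ]
    return guesses[:n]


# Toy cross-file snippets (assumed for illustration).
snippets = [
    "def checkout_banner(order): return render(order)",
    "def shipping_fee(weight, zone): return base_rate(zone) * weight",
]

unfinished = "def checkout(order):\n    fee = "
enhanced = build_enhanced_query(unfinished, fake_sampler)

initial_hit = retrieve(unfinished, snippets)[0]
enhanced_hit = retrieve(enhanced, snippets)[0]
```

In this toy setup the initial query matches the lexically similar but unhelpful `checkout_banner` snippet, while the enhanced query, which carries the candidate completions, retrieves `shipping_fee`. The candidates do not need to be correct; they only need to carry tokens that point toward the target code.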
We evaluate AlignCoder on two widely used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in exact match (EM) score over baselines on the CrossCodeEval benchmark. The results show superior performance and strong generalizability across code LLMs and programming languages.
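For readers unfamiliar with the metric, EM counts a completion as correct only if it matches the reference exactly. The sketch below assumes whitespace-normalized comparison and a percentage scale; the benchmarks' official evaluation scripts may normalize differently, and the abstract does not say whether the reported 18.1% is a relative or an absolute gain.

```python
from typing import Sequence


def exact_match(prediction: str, reference: str) -> bool:
    """True if the two strings are equal after collapsing whitespace (assumed normalization)."""
    return prediction.split() == reference.split()


def em_score(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Made-up example data, for illustration only.
score = em_score(
    ["return a + b", "x = 1"],
    ["return a  + b", "x = 2"],
)
```

Here the first prediction differs from its reference only in spacing and counts as a match, while the second differs in content, so `score` is 50.0.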
Methodology
Please refer to the full paper for detailed methodology.
Practical Implications
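This work targets software engineering tooling (cs.SE). By retrieving cross-file context that better matches the intended completion, the approach could improve repository-level code completion built on code LLMs; the abstract reports gains across five backbone models and multiple programming languages.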
Authors
- Tianyue Jiang
- Yanli Wang
- Yanlin Wang
- Daya Guo
- Ensheng Shi
- Yuchi Ma
- Jiachi Chen
- Zibin Zheng
Paper Information
- arXiv ID: 2601.19697v1
- Categories: cs.SE, cs.AI
- Published: January 27, 2026