[Paper] CodeCompass: Navigating the Navigation Paradox in Agentic Code Intelligence
Source: arXiv - 2602.20048v1
Overview
The paper “CodeCompass: Navigating the Navigation Paradox in Agentic Code Intelligence” uncovers why large‑scale code‑assistant agents still miss the most important files in real‑world projects, even when they can ingest millions of tokens. By separating navigation (finding the right place in a codebase) from retrieval (searching by keywords), the authors show that a graph‑based structural view of a repository dramatically improves task success rates.
Key Contributions
- Navigation Paradox definition – a taxonomy that distinguishes three problem families: semantic‑search, structural, and hidden‑dependency tasks, exposing why lexical retrieval alone fails.
- CodeCompass infrastructure – an open‑source “Model Context Protocol” server that materializes a repository’s dependency graph and serves it to agents via simple tool calls.
- Empirical validation – 258 automated runs on 30 realistic FastAPI benchmark tasks demonstrate a jump from ~76 % to 99.4 % completion when agents use the graph‑based navigation tool.
- Behavioral insight – despite tool availability, 58 % of trials never invoked the graph API, highlighting a gap between tool provision and agent prompting.
- Reproducible evaluation suite – scripts, datasets, and a benchmark harness released for the community to test other navigation or retrieval approaches.
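The summary does not show how CodeCompass materializes the dependency graph; as a rough illustration of the idea, here is a minimal sketch of building a file‑level import graph for a Python repository with the standard `ast` module. All function names are illustrative, not the paper’s actual implementation, and only import edges are tracked (a real tool would also compute call‑graph edges):

```python
import ast
from collections import defaultdict
from pathlib import Path

def build_import_graph(repo_root: str) -> dict[str, set[str]]:
    """Map each module name to the set of top-level modules it imports.

    File-level import edges only; call-graph and attribute-level
    dependencies are out of scope for this sketch.
    """
    graph = defaultdict(set)
    for path in Path(repo_root).rglob("*.py"):
        module = path.stem
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    graph[module].add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module:
                graph[module].add(node.module.split(".")[0])
    return dict(graph)

def importers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Reverse lookup: which modules import `target`?

    This answers the 'hidden dependency' question even when the
    importer shares no tokens with the task prompt.
    """
    return sorted(m for m, deps in graph.items() if target in deps)
```

The reverse lookup is the key operation: it finds files connected to the target structurally, which lexical retrieval misses when there is no token overlap.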
Methodology
- Benchmark creation – The authors curated 30 coding tasks from a production FastAPI codebase, deliberately mixing easy lexical matches, purely structural dependencies, and “hidden‑dependency” cases where the needed file shares no token overlap with the prompt.
- Agent setup – Two baseline agents (a vanilla LLM with a 1 M‑token context window and a BM25 lexical retriever) were compared against the same LLM equipped with the CodeCompass tool.
- CodeCompass server – The repository’s import‑graph, call‑graph, and file‑level dependency edges were pre‑computed and exposed via a lightweight JSON‑RPC endpoint. Agents could query the graph (e.g., “list files that import auth.py”) and receive ranked node lists.
- Prompt engineering – For the tool‑enabled runs, the authors added explicit instructions (“When you suspect a hidden dependency, call the graph_search tool”) to push the model to consider structural context.
- Automation & metrics – Each task was executed 8–10 times with random seeds, and success was measured by whether the agent produced a correct, runnable solution within a fixed time budget.
Results & Findings
| Scenario | Vanilla LLM | BM25 Retrieval | LLM + CodeCompass |
|---|---|---|---|
| Semantic‑search | 92 % | 94 % | 95 % |
| Structural | 71 % | 73 % | 96 % |
| Hidden‑dependency | 76 % | 78 % | 99.4 % |
- Graph navigation outperforms lexical search, especially when the target file has no overlapping identifiers with the query.
- Task completion improves by 23.2 pp on hidden‑dependency tasks, confirming the Navigation Paradox: the bottleneck is how agents look for code, not how much code they can see.
- Adoption gap: Even with the tool available, more than half of the runs never called it, indicating that LLMs need explicit prompting to switch from lexical heuristics to structural reasoning.
Practical Implications
- Tooling for IDEs & CI bots: Embedding a lightweight dependency‑graph service (like CodeCompass) can turn any LLM‑based code assistant into a “structural navigator,” dramatically reducing missed files in large monorepos.
- Prompt design patterns: Developers building custom agents should include clear “when‑to‑use‑graph” cues (e.g., “If you cannot find a function by name, query the import graph”).
- Reduced debugging cycles: By reliably locating hidden dependencies, agents can generate patches that compile and pass tests on the first try, saving developer time in CI pipelines.
- Scalable to other languages: The protocol is language‑agnostic; generating import or module graphs for Java, JavaScript, or Rust would give similar gains.
- Open‑source baseline: The released benchmark lets teams measure the impact of their own navigation tools, fostering a community standard for “structural code intelligence.”
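One way to encode the “when‑to‑use‑graph” cue from the prompt‑design point above is a system message in the agent’s conversation setup. The wording and the role/content message schema below are illustrative (a common chat‑completion convention), not the paper’s exact prompt or any CodeCompass API:

```python
# Illustrative system prompt; the cue mirrors the pattern suggested
# in the paper summary, not its verbatim wording.
SYSTEM_PROMPT = (
    "You are a code-navigation agent. "
    "If you cannot find a function or file by name, or you suspect a "
    "hidden dependency, call the graph_search tool with a query such as "
    "'importers of auth' before falling back to keyword search."
)

def build_messages(task: str) -> list[dict[str, str]]:
    """Assemble a conversation that primes structural navigation."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]
```

Given the 58 % non‑invocation rate reported above, making the trigger condition explicit (“if you cannot find it by name…”) appears to matter as much as providing the tool itself.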
Limitations & Future Work
- Prompt dependence: The current gains hinge on manually crafted prompts; future research should explore automated prompt‑generation or fine‑tuning to internalize the navigation behavior.
- Graph freshness: CodeCompass assumes a static snapshot of the repository; incremental updates for rapidly changing codebases remain an open challenge.
- Generalization beyond FastAPI: While the benchmark is realistic, it is confined to a single Python web framework; broader cross‑language, cross‑domain studies are needed to confirm universality.
- Tool call overhead: The study does not quantify latency introduced by remote graph queries; optimizing the protocol for low‑latency environments is a next step.
Bottom line: By giving agents a map of the code’s architecture rather than just a giant text dump, CodeCompass resolves the “Navigation Paradox” and pushes agentic code intelligence toward reliable, production‑grade assistance. Developers looking to boost the effectiveness of LLM‑powered tooling should consider adding a structural navigation layer and, equally importantly, teach their agents when to use it.
Authors
- Tarakanath Paipuru
Paper Information
- arXiv ID: 2602.20048v1
- Categories: cs.AI, cs.SE
- Published: February 23, 2026