[Paper] Co-Evolution of Types and Dependencies: Towards Repository-Level Type Inference for Python Code

Published: December 25, 2025 at 04:15 AM EST

Source: arXiv - 2512.21591v1

Overview

The paper introduces Co‑Evolution of Types and Dependencies (CoTyDe), a technique that uses large language models (LLMs) to infer types across an entire Python codebase rather than isolated files or functions. By modeling how entities and their type dependencies evolve together, CoTyDe substantially improves the accuracy of repository‑level type annotation, a long‑standing pain point for large Python projects.

Key Contributions

Entity Dependency Graph (EDG): A novel graph representation that captures objects, functions, and their cross‑module type dependencies throughout a repository.
Iterative Co‑evolution Inference: Types and dependencies are refined together in multiple passes, allowing earlier guesses to inform later ones and vice‑versa.
Type‑Checker‑in‑the‑Loop: An integrated static type checker validates each inference step, automatically correcting mistakes and preventing error propagation.
Empirical Validation: Evaluation on 12 real‑world Python repositories shows a 27 % relative improvement in TypeSim (0.89 vs. 0.70) and a 40 % relative improvement in TypeExact (0.84 vs. 0.60) over the strongest prior tool, while eliminating 92.7 % of newly introduced type errors.
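
To make the Entity Dependency Graph idea concrete, here is a minimal sketch using Python's standard `ast` module. It is not the authors' implementation: the in-memory `REPO`, the `build_edg` helper, and the choice to record only direct calls between top-level entities are all assumptions made for illustration. Name collisions across modules are ignored.

```python
import ast
from collections import defaultdict

# Hypothetical two-module repository, held in memory for illustration.
REPO = {
    "models.py": (
        "class User:\n"
        "    def __init__(self, name):\n"
        "        self.name = name\n"
    ),
    "service.py": (
        "from models import User\n"
        "\n"
        "def make_user(name):\n"
        "    return User(name)\n"
        "\n"
        "def greet(name):\n"
        "    user = make_user(name)\n"
        "    return 'hi ' + user.name\n"
    ),
}


def build_edg(repo):
    """Return (nodes, edges); edges map an entity to the entities it calls."""
    trees = {path: ast.parse(src) for path, src in repo.items()}

    # Pass 1: every top-level class or function becomes a node.
    nodes = {}
    for path, tree in trees.items():
        for stmt in tree.body:
            if isinstance(stmt, (ast.FunctionDef, ast.ClassDef)):
                nodes[stmt.name] = path

    # Pass 2: add an edge for each direct call to another known entity.
    edges = defaultdict(set)
    for tree in trees.values():
        for stmt in tree.body:
            if not isinstance(stmt, (ast.FunctionDef, ast.ClassDef)):
                continue
            for node in ast.walk(stmt):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callee = node.func.id
                    if callee in nodes and callee != stmt.name:
                        edges[stmt.name].add(callee)
    return nodes, dict(edges)


nodes, edges = build_edg(REPO)
```

In this toy repository, `greet` depends on `make_user`, which in turn depends on `User` from another module. A cross-module chain like this is what file-local inference would miss.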

Methodology

Graph Construction:
- Parse the whole repository to extract entities (classes, functions, variables).
- Build the EDG where nodes are entities and edges encode “uses”, “inherits”, or “calls” relationships, enriched with any existing type hints.
LLM‑Powered Inference Loop:
- Feed each node (and its local graph context) to a pre‑trained LLM (e.g., GPT‑4) that proposes a candidate type.
- Update the node’s type annotation in the EDG.
Co‑evolution Cycle:
- After a round of LLM predictions, run a static type checker (e.g., mypy) on the partially annotated code.
- The checker reports conflicts; these are fed back to the LLM as corrective prompts so that it revises the problematic nodes.
- Repeat until the graph stabilizes (no new conflicts) or a maximum iteration count is reached.
Final Validation:
- Run a full repository‑wide type check, then score the inferred annotations against ground‑truth types using TypeSim (semantic similarity) and TypeExact (exact match).
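
The propose, check, and revise cycle described in the steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the paper's algorithm. `co_evolve`, `stub_llm`, and `stub_checker` are hypothetical names, and the stubs stand in for a real LLM call and a real type-checker run. The sketch also leaves out the dependency-refinement half of the co-evolution.

```python
from typing import Callable, Dict, List, Optional, Tuple

Annotations = Dict[str, str]  # entity name -> inferred return type
Conflicts = Dict[str, str]    # entity name -> checker message


def co_evolve(
    entities: List[str],
    propose: Callable[[str, Annotations, Optional[str]], str],
    check: Callable[[Annotations], Conflicts],
    max_iters: int = 5,
) -> Tuple[Annotations, int]:
    """Re-propose types for conflicting entities until the checker is quiet."""
    annotations: Annotations = {}
    conflicts: Conflicts = {}
    pending = list(entities)
    for iteration in range(1, max_iters + 1):
        for name in pending:
            # The checker message (if any) is passed back as a corrective hint.
            annotations[name] = propose(name, annotations, conflicts.get(name))
        conflicts = check(annotations)
        if not conflicts:
            return annotations, iteration
        pending = list(conflicts)  # only revisit entities that failed
    return annotations, max_iters


def stub_llm(name: str, annotations: Annotations, hint: Optional[str]) -> str:
    """Stand-in for an LLM: a wrong first guess that is fixed once hinted."""
    if name == "make_user":
        return "User" if hint else "dict"
    return "str"


def stub_checker(annotations: Annotations) -> Conflicts:
    """Stand-in for a type checker: flags one known-bad annotation."""
    if annotations.get("make_user") != "User":
        return {"make_user": "annotated as dict but returns User"}
    return {}


result, rounds = co_evolve(["make_user", "greet"], stub_llm, stub_checker)
```

Here the first round produces a wrong guess for `make_user`, the stub checker flags it, and the second round corrects it. The `max_iters` cap mirrors the maximum iteration count mentioned in the steps above.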

Results & Findings

Metric	CoTyDe	Best Baseline
TypeSim	0.89	0.70
TypeExact	0.84	0.60
New Type Errors Introduced	7.3 % (i.e., 92.7 % removed)	30 %+
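
The headline percentages appear to be relative gains over the baseline scores in the table. A quick check, assuming that reading (`relative_gain` is a helper introduced here for illustration):

```python
def relative_gain(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return (new - base) / base * 100


typesim_gain = relative_gain(0.89, 0.70)    # roughly 27 %
typeexact_gain = relative_gain(0.84, 0.60)  # roughly 40 %
```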

The iterative co‑evolution reduces cascading errors: each correction narrows the search space for subsequent inferences.
The EDG enables the LLM to reason about global relationships (e.g., a class used across many modules) rather than isolated snippets, which accounts for the large performance jump.
Runtime overhead is modest: on average, a 500‑file repository is processed in ~15 minutes on a single GPU, making the approach feasible for CI pipelines.

Practical Implications

Automated Annotation for Legacy Code: Teams can run CoTyDe on existing monoliths to generate high‑quality type hints, unlocking static analysis, IDE autocompletion, and safer refactoring.
CI/CD Integration: Because the tool produces a type‑checker‑validated output, it can be added as a gate in CI pipelines to enforce type‑safety without manual review.
Improved Tooling Ecosystem: IDEs and linters can consume the generated stubs to provide better diagnostics, reducing the “dynamic‑typing surprise” that often leads to runtime crashes.
Facilitates Migration to Typed Python: Projects moving toward strict type checking (e.g., mypy strict mode or Pyright) get a solid starting point, which could cut migration effort considerably.
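
As one way to picture the CI gate mentioned above, the sketch below compares type-checker output captured before and after annotation and reports only the errors that are new. It assumes mypy's default `path:line: error: message` line format. The sample outputs and the `error_keys` and `new_errors` helpers are hypothetical, and a real pipeline would capture the text from an actual checker run.

```python
import re

# Matches lines such as: service.py:7: error: some message  [code]
ERROR_LINE = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+)(?::\d+)?: error: (?P<msg>.*)$",
    re.MULTILINE,
)


def error_keys(checker_output: str) -> set:
    """Identify errors by (file, message); line numbers shift as hints are added."""
    return {
        (m["path"], m["msg"].strip())
        for m in ERROR_LINE.finditer(checker_output)
    }


def new_errors(before: str, after: str) -> list:
    """Errors present after annotation that were absent before it."""
    return sorted(error_keys(after) - error_keys(before))


# Hypothetical checker output, written by hand for this example.
BEFORE = (
    'service.py:7: error: Unsupported operand types for + ("str" and "int")  [operator]\n'
    "Found 1 error in 1 file (checked 2 source files)\n"
)
AFTER = (
    'service.py:8: error: Unsupported operand types for + ("str" and "int")  [operator]\n'
    'models.py:3: error: Incompatible return value type (got "None", expected "str")  [return-value]\n'
    "Found 2 errors in 2 files (checked 2 source files)\n"
)

introduced = new_errors(BEFORE, AFTER)
gate_passes = not introduced  # a CI job could fail when this is False
```

Keying errors on file and message rather than line number is a design choice for this sketch: inserting annotations can move existing errors to different lines without making them new.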

Limitations & Future Work

LLM Dependency: The quality of inferred types hinges on the underlying LLM; smaller or open‑source models may not match the reported gains.
Scalability to Very Large Repos: While 15 minutes is acceptable for medium‑size codebases, repositories with tens of thousands of files may need graph partitioning or distributed inference.
Handling Dynamic Metaprogramming: Heavy use of exec, eval, or runtime attribute injection remains challenging for static graph construction.
Future Directions: The authors plan to (1) explore model‑agnostic prompting strategies to reduce reliance on proprietary LLMs, (2) integrate incremental graph updates for continuous development, and (3) extend the EDG to capture runtime‑generated types via hybrid static‑dynamic analysis.
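
The dynamic metaprogramming limitation is easy to reproduce in miniature. In the sketch below, the `Config` class and its attributes are invented for illustration. An attribute injected with `setattr` exists at runtime but never appears as an assignment in the class body, so a purely static pass over the syntax tree does not see it.

```python
import ast

SOURCE = '''
class Config:
    timeout = 30

for key, value in {"retries": 3}.items():
    setattr(Config, key, value)
'''

# Static view: attribute names assigned directly in the class body.
class_def = ast.parse(SOURCE).body[0]
static_attrs = {
    target.id
    for stmt in class_def.body
    if isinstance(stmt, ast.Assign)
    for target in stmt.targets
    if isinstance(target, ast.Name)
}

# Runtime view: execute the code and inspect the resulting class.
namespace = {}
exec(SOURCE, namespace)
runtime_attrs = {
    name for name in vars(namespace["Config"]) if not name.startswith("__")
}

missed_by_static_pass = runtime_attrs - static_attrs
```

The gap between the two sets is the kind of information a hybrid static-dynamic analysis, as proposed in the future directions, would need to recover.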

Authors

Shuo Sun
Shixin Zhang
Jiwei Yan
Jun Yan
Jian Zhang

Paper Information

arXiv ID: 2512.21591v1
Categories: cs.SE
Published: December 25, 2025
