[Paper] Lightweight Model Editing for LLMs to Correct Deprecated API Recommendations

Published: November 25, 2025 (10:36 PM EST)
4 min read
Source: arXiv - 2511.21022v1

Overview

Large Language Models (LLMs) have become go‑to assistants for code completion, but their knowledge is frozen at the time of training. As third‑party libraries evolve, many APIs become deprecated, and LLMs still suggest the old calls, leading to broken or insecure code. This paper investigates whether lightweight model‑editing techniques can quickly patch LLMs with up‑to‑date API knowledge—without the massive cost of full retraining.

Key Contributions

  • EDAPIBench: a new benchmark containing 70+ deprecated Python APIs from 8 popular libraries, with >3,000 edit instances for systematic evaluation (an illustrative edit instance is sketched after this list).
  • Comprehensive study: applied ten state‑of‑the‑art model‑editing methods to three coder LLMs (Qwen2.5‑Coder, StarCoder2, DeepSeek‑Coder).
  • Best‑performing baseline: identified AdaLoRA (a parameter‑efficient fine‑tuning technique) as the most effective at making the models emit the correct, modern APIs.
  • AdaLoRA‑L: a novel refinement that isolates “Common API Layers” (general knowledge) from “Specific API Layers” (API‑specific knowledge) to improve Specificity—i.e., prevent unintended side effects on unrelated code.
  • Extensive analysis: measured not only accuracy of the edited API calls but also how much the edit “spills over” to unrelated knowledge, providing a more nuanced view of model‑editing safety.
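
To make the benchmark format concrete, here is an illustrative edit instance in the spirit of EDAPIBench. The schema and field names are assumptions (the summary does not show the actual format); the deprecation itself is real: pandas removed DataFrame.append in favor of pandas.concat.

```python
# Illustrative only: the exact EDAPIBench schema is not shown in this summary.
# Each edit instance pairs a prompt that elicits a deprecated call with the
# up-to-date target the edited model should produce instead.
edit_instance = {
    "library": "pandas",
    "deprecated_api": "pandas.DataFrame.append",
    "replacement_api": "pandas.concat",
    "prompt": (
        "import pandas as pd\n"
        "df = pd.DataFrame({'a': [1, 2]})\n"
        "row = pd.DataFrame({'a': [3]})\n"
        "# add `row` to the end of `df`\n"
        "df = "
    ),
    "deprecated_completion": "df.append(row, ignore_index=True)",
    "target_completion": "pd.concat([df, row], ignore_index=True)",
}
```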

Methodology

  1. Benchmark construction (EDAPIBench)

    • Collected deprecated functions/methods from eight widely used Python packages (e.g., NumPy, Pandas, TensorFlow).
    • For each deprecated API, generated a target (the new recommended call) and a set of editing instances (prompt‑completion pairs that would normally trigger the old API).
  2. Model editing techniques

    • Evaluated ten recent methods, ranging from parameter‑efficient low‑rank adapters (LoRA, AdaLoRA) to locate‑then‑edit approaches that directly rewrite targeted weights (ROME, MEMIT).
    • Each technique receives a small “edit dataset” (a few examples of the deprecated → updated mapping) and updates only a tiny fraction of the model’s parameters.
  3. AdaLoRA‑L refinement

    • Ran a layer‑importance analysis (e.g., using gradient‑based saliency) to identify layers that consistently matter for all API predictions → labeled as Common API Layers.
    • Conversely, layers that become important only when a specific API is involved are marked as Specific API Layers.
    • During editing, AdaLoRA‑L freezes the common layers and applies AdaLoRA updates only to the specific ones, aiming to keep the rest of the model untouched (a minimal PEFT‑based sketch follows this list).
  4. Evaluation metrics (a rough sketch of these checks also follows the list)

    • Accuracy: proportion of edited prompts that now produce the up‑to‑date API.
    • Specificity: how well the model’s behavior on unrelated prompts is preserved after the edit (higher is better, i.e., less spill‑over).
    • Generalization: ability to apply the edit to variations of the prompt (different code contexts).
    • Efficiency: compute time and memory overhead compared to full fine‑tuning.
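
The following is a minimal sketch of how the layer split and the restricted update could be wired up with Hugging Face PEFT. It is not the authors’ implementation: the saliency heuristic, the model checkpoint, the chosen layer indices, and the AdaLoRA hyperparameters are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): rank decoder layers by gradient
# saliency on probe prompts, then apply AdaLoRA only to the layers judged
# specific to the API being edited.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import AdaLoraConfig, get_peft_model

MODEL_NAME = "Qwen/Qwen2.5-Coder-7B"  # one of the coder LLMs studied
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def layer_saliency(prompts, targets):
    """Accumulate per-decoder-layer gradient norms over a small probe set."""
    scores = {}
    for prompt, target in zip(prompts, targets):
        ids = tok(prompt + target, return_tensors="pt")
        model.zero_grad()
        model(**ids, labels=ids["input_ids"]).loss.backward()
        for name, param in model.named_parameters():
            if param.grad is not None and ".layers." in name:
                layer = int(name.split(".layers.")[1].split(".")[0])
                scores[layer] = scores.get(layer, 0.0) + param.grad.norm().item()
    return scores

# Layers salient across *all* probe APIs -> "Common API Layers" (left frozen);
# layers salient only for the API under edit -> "Specific API Layers".
specific_layers = [20, 21, 22]  # illustrative indices, not taken from the paper

config = AdaLoraConfig(
    init_r=12,
    target_r=4,
    lora_alpha=32,
    total_step=200,                       # total edit-tuning steps
    target_modules=["q_proj", "v_proj"],  # typical attention projections
    layers_to_transform=specific_layers,  # confine the edit to the specific layers
    task_type="CAUSAL_LM",
)
edited = get_peft_model(model, config)    # then fine-tune `edited` on the edit instances
```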

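And a rough sketch of how the two headline checks could be computed over the benchmark instances; the paper’s exact metric definitions may differ, and the `generate` helper (which returns a model’s completion for a prompt) is assumed.

```python
# Rough sketch of the two headline metrics. `generate(model, prompt)` is an
# assumed helper returning the model's completion; exact definitions in the
# paper may differ.
def accuracy(edited_model, edit_instances, generate):
    """Fraction of edited prompts whose completion uses the up-to-date API."""
    hits = sum(
        inst["replacement_api"].split(".")[-1] in generate(edited_model, inst["prompt"])
        for inst in edit_instances
    )
    return hits / len(edit_instances)

def specificity(base_model, edited_model, unrelated_prompts, generate):
    """Fraction of unrelated prompts whose completion the edit left unchanged
    (higher is better: less spill-over)."""
    unchanged = sum(
        generate(base_model, p) == generate(edited_model, p)
        for p in unrelated_prompts
    )
    return unchanged / len(unrelated_prompts)
```
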
Results & Findings

| Model (coder LLM) | Best baseline (AdaLoRA) accuracy | AdaLoRA‑L accuracy | Accuracy retained | Specificity gain (less spill‑over) |
| --- | --- | --- | --- | --- |
| Qwen2.5‑Coder | 78 % (top‑1) | 77 % | ✔️ | +15 % (dramatic reduction in unintended changes) |
| StarCoder2 | 74 % | 73 % | ✔️ | +12 % |
| DeepSeek‑Coder | 71 % | 70 % | ✔️ | +10 % |

  • AdaLoRA consistently outperformed other editing methods on raw accuracy, confirming that low‑rank, parameter‑efficient fine‑tuning is well‑suited for API updates.
  • However, AdaLoRA’s edits sometimes altered the model’s responses to unrelated code snippets (low Specificity).
  • AdaLoRA‑L reclaimed most of that lost specificity while keeping accuracy virtually unchanged, demonstrating that isolating “general” knowledge layers is an effective safety guard.
  • All methods required orders of magnitude less compute than full retraining (minutes on a single GPU vs. days on a multi‑GPU cluster).

Practical Implications

  • Rapid SDK upgrades: Development teams can push a tiny “edit package” to their internal LLM‑based code assistants whenever a library deprecates a function, avoiding costly model re‑training pipelines.
  • Continuous integration: AdaLoRA‑L edits can be scripted as part of CI/CD, automatically refreshing the model’s API knowledge as part of a release cycle (a sketch of such a step follows this list).
  • Tooling ecosystem: IDE plugins (e.g., VS Code extensions) could download and apply these edits on‑the‑fly, ensuring developers always see up‑to‑date suggestions without waiting for a new model version.
  • Safety & compliance: By preserving specificity, AdaLoRA‑L reduces the risk of unintentionally breaking unrelated code generation, a key concern for enterprises that rely on LLMs for production code.
  • Cost efficiency: The approach needs only a few hundred edit examples per API and runs in minutes, making it feasible for small teams or open‑source projects with limited compute budgets.
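
As an illustration of the CI/CD idea, the snippet below loads a previously trained edit adapter onto a base coder model with PEFT’s PeftModel API and runs a quick smoke test. The model name, adapter path, and prompt are placeholders, not artifacts from the paper.

```python
# Illustrative CI step (not from the paper): apply a saved edit adapter to the
# team's internal code model and smoke-test one known deprecation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B"            # placeholder base model
ADAPTER_PATH = "edits/pandas-2.0-deprecations"  # placeholder output of an edit run

tok = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
edited = PeftModel.from_pretrained(base, ADAPTER_PATH)

prompt = (
    "import pandas as pd\n"
    "df = pd.DataFrame({'a': [1, 2]})\n"
    "row = pd.DataFrame({'a': [3]})\n"
    "# add `row` to the end of `df`\n"
    "df = "
)
ids = tok(prompt, return_tensors="pt")
out = edited.generate(**ids, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))  # expect pd.concat, not df.append
```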

Limitations & Future Work

  • Scope limited to Python: The benchmark focuses on Python libraries; cross‑language applicability (e.g., Java, JavaScript) remains untested.
  • Edit granularity: Some APIs involve complex signature changes or behavioral shifts that a simple token‑level edit may not capture.
  • Long‑term stability: Repeated edits over time could accumulate hidden interactions; the paper suggests periodic “reset” checks but does not provide a systematic solution.
  • Automated layer importance: The current importance analysis is heuristic; future work could explore more robust, possibly learning‑based, methods to delineate common vs. specific layers.
  • User‑facing tooling: Turning AdaLoRA‑L into a plug‑and‑play library for developers (e.g., a pip install llm‑api‑editor) is an obvious next step.

Bottom line: By marrying lightweight model‑editing with a clever layer‑selection strategy, this work shows that we can keep LLM‑powered code assistants current with the fast‑moving world of software libraries—without the heavyweight cost of full model retraining. For developers, that translates into fewer broken suggestions, smoother upgrades, and a more sustainable path for AI‑assisted programming.

Authors

  • Guancheng Lin
  • Xiao Yu
  • Jacky Keung
  • Xing Hu
  • Xin Xia
  • Alex X. Liu

Paper Information

  • arXiv ID: 2511.21022v1
  • Categories: cs.SE
  • Published: November 26, 2025