[Paper] When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation
Source: arXiv - 2604.26686v1
Overview
The paper introduces EVOREC, a novel framework that lets large‑language‑model (LLM)‑based service recommenders stay up‑to‑date as software services evolve. By combining model editing (a lightweight “locate‑then‑edit” operation) with a finite‑automata‑driven constrained decoder, the authors show how to inject fresh service facts and prune invalid or duplicate recommendations without the heavy cost of full model retraining.
Key Contributions
- Evolution‑aware recommendation: A systematic approach to keep LLM‑based recommenders aligned with continuously changing service catalogs.
- Locate‑then‑edit model editing: Efficiently identifies and overwrites stale knowledge in the LLM’s internal weights, avoiding expensive fine‑tuning.
- FA‑based constrained decoding: Guarantees syntactically and semantically valid service sequences while automatically deduplicating outputs.
- Empirical validation: Experiments on real‑world service datasets show a 25.9 % relative Recall@5 gain over a fine‑tuned baseline on a static catalog, and a 22.3 % gain when the catalog evolves between training and inference.
- Open‑source‑ready design: The framework is built on off‑the‑shelf LLMs and standard automata libraries, making it easy to plug into existing recommendation pipelines.
Methodology
- Knowledge Extraction – The system first parses the current service registry (e.g., API catalogs, micro‑service descriptors) into a set of service facts (name, capabilities, version, dependencies).
- Locate Phase – Using a lightweight similarity search (e.g., FAISS) over the LLM’s internal activation patterns, EVOREC pinpoints the neurons/weights that encode each outdated fact.
- Edit Phase – A targeted weight update (often a rank‑1 modification) replaces the stale fact with the new one. Because the edit is localized, the rest of the model’s knowledge remains intact.
- Constrained Decoding – When generating a recommendation list, a finite automaton encodes the permissible service transition rules (e.g., “service A can only be followed by services that depend on A”). The decoder samples from the LLM but rejects any token that would violate the automaton, simultaneously checking a hash‑set to drop duplicates.
- Evaluation Loop – The edited model is tested on a held‑out query set; if recall drops below a threshold, the locate‑edit cycle repeats, enabling continuous adaptation.
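The Edit Phase above can be sketched concretely. The snippet below is a minimal NumPy illustration of a rank‑1 weight update in the locate‑then‑edit style (the paper's actual editing procedure operates on located LLM layers; the 4×4 matrix, `rank_one_edit` helper, and toy vectors here are illustrative assumptions, not the authors' code). The key property is that the edited layer maps the located key to the new value while leaving directions orthogonal to the key untouched, which is why the rest of the model's knowledge survives the edit.

```python
import numpy as np

def rank_one_edit(W, key, new_value):
    """Overwrite the value a linear layer associates with `key`.

    After the edit, W' @ key_unit == new_value, while any vector
    orthogonal to `key` is mapped exactly as before (the update
    has rank 1, so it only touches the `key` direction).
    """
    key_unit = key / np.linalg.norm(key)   # work with a unit-norm key
    residual = new_value - W @ key_unit    # what the layer currently gets wrong
    return W + np.outer(residual, key_unit)  # localized rank-1 correction

# Toy demo: force a random 4x4 "layer" to map `key` to `new_value`.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
key = rng.normal(size=4)
new_value = np.array([1.0, 0.0, -1.0, 0.5])

W_edited = rank_one_edit(W, key, new_value)
assert np.allclose(W_edited @ (key / np.linalg.norm(key)), new_value)
```

Because the correction is the outer product of a residual and a unit key, its effect on any input is proportional to that input's overlap with the key, which is the sense in which the edit stays "localized".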
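The Constrained Decoding step can likewise be sketched as a small state machine plus a hash set. The dependency rules, service names, and greedy candidate scoring below are invented for illustration; the paper's decoder operates on LLM token logits rather than pre-scored candidate lists, but the accept/reject logic is the same idea: a token is emitted only if the automaton permits the transition and the service has not already been recommended.

```python
# Hypothetical dependency rules: service -> services allowed to follow it.
ALLOWED_NEXT = {
    "auth": {"user-profile", "billing"},
    "user-profile": {"billing", "notifications"},
    "billing": {"notifications"},
    "notifications": set(),
}

def constrained_decode(candidates_per_step, start, rules=ALLOWED_NEXT):
    """Greedily extend a sequence: at each step, take the highest-scored
    candidate that (a) the automaton permits from the current state and
    (b) has not been emitted before (hash-set deduplication)."""
    sequence, state, seen = [start], start, {start}
    for candidates in candidates_per_step:  # each step: list of (service, score)
        for service, _score in sorted(candidates, key=lambda c: -c[1]):
            if service in rules.get(state, set()) and service not in seen:
                sequence.append(service)
                seen.add(service)
                state = service
                break  # accept the first legal, non-duplicate candidate
    return sequence

steps = [
    [("billing", 0.9), ("auth", 0.8)],           # "auth" would be a duplicate
    [("billing", 0.7), ("notifications", 0.5)],  # "billing" now illegal + duplicate
]
print(constrained_decode(steps, "auth"))
# ['auth', 'billing', 'notifications']
```

Rejection happens per candidate rather than per sequence, so invalid transitions and duplicates are filtered during generation instead of by post-hoc cleanup, which is what keeps the latency overhead in the table below to a few milliseconds.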
Results & Findings
| Metric | Baseline (Fine‑tuned LLM) | EVOREC | Relative Gain |
|---|---|---|---|
| Recall@5 (static catalog) | 0.62 | 0.78 | +25.9 % |
| Recall@5 (evolving catalog) | 0.55 | 0.67 | +22.3 % |
| Inference latency (ms) | 48 | 52 | ~+8 % (due to FA check) |
| Model size (parameters) | 6 B | 6 B (unchanged) | — |
- Accuracy: EVOREC consistently outperforms both static recommendation baselines and full‑model fine‑tuning, especially when services are added/removed between training and inference.
- Efficiency: Editing a single fact takes < 5 ms on a single GPU, compared to hours of fine‑tuning for the same update.
- Robustness: The FA‑based decoder eliminates > 95 % of duplicate recommendations and prevents illegal service sequences that would otherwise cause runtime errors.
Practical Implications
- Rapid rollout of new APIs – DevOps teams can push updated service metadata to the recommendation engine instantly, without waiting for a retraining window.
- Reduced cloud cost – Since the underlying LLM stays the same size, organizations avoid the compute expense of frequent fine‑tuning cycles.
- Higher developer productivity – IDE plugins or marketplace portals that suggest “next‑best services” stay trustworthy even as the ecosystem evolves, lowering the friction of service discovery.
- Safety‑critical pipelines – The constrained decoder guarantees that generated service chains respect dependency and compatibility rules, which is valuable for automated orchestration or CI/CD pipelines.
- Plug‑and‑play – EVOREC can be layered on top of any LLM that supports weight editing (e.g., LLaMA, Falcon), making it a reusable component for SaaS platforms, internal service catalogs, or API marketplaces.
Limitations & Future Work
- Edit granularity – The locate step relies on similarity heuristics; mis‑localization can lead to over‑editing or missed updates, especially for highly entangled facts.
- Scalability of FA – For extremely large service graphs, the automaton can become complex; future work could explore hierarchical or probabilistic automata to keep decoding fast.
- Cross‑modal knowledge – The current design edits only textual service descriptors. Extending the approach to incorporate code‑level signatures or runtime telemetry remains an open challenge.
- User‑feedback loop – Incorporating real‑time click or usage feedback to prioritize which facts to edit could further improve recommendation relevance.
EVOREC demonstrates that targeted model editing, combined with rule‑based decoding, can turn static LLM recommenders into agile, evolution‑aware assistants. The approach could reshape how we keep AI‑driven tooling in step with fast‑moving software ecosystems.
Authors
- Guodong Fan
- Cuiyun Gao
- Chun Yong Chong
- Lu Zhang
- Jing Li
- Jinglin Zhang
- Shizhan Chen
Paper Information
- arXiv ID: 2604.26686v1
- Categories: cs.SE
- Published: April 29, 2026