When an AI refactor renames things wrong across files
Source: Dev.to
Overview
In one project we leaned on a code-generation assistant to perform a cross-file refactor: rename a domain object and propagate the change through services, tests, and a few utility modules. The assistant produced a plausible patch set that looked consistent at a glance, but when we ran the app, a handful of endpoints started returning 500s. The failure was not a single glaring syntax error; it was mismatched identifiers that the runtime tolerated in some paths and tripped over in others.
At first the changes seemed harmless: similar names, slight capitalization or pluralization differences, or mixed snake_case and camelCase where the codebase already had a convention. Because the assistant delivered per-file diffs and CI only ran a subset of quick checks, none of the obvious gates caught the drift. We documented the incident in our internal postmortem and linked it to the general tool pages we used for multi-turn prompting and automation, such as crompt.ai, to help other teams understand the trade-offs.
How it surfaced during development
The symptoms arrived gradually. Developers opened unrelated feature branches, saw intermittent TypeErrors in the console, and spent hours tracing call stacks that jumped between renamed helpers. The assistant had sometimes replaced userProfile with user_profile and other times with profileUser. Unit tests that referenced mocked interfaces passed because the mocks used the assistant‑provided names selectively; integration tests that exercised serialization failed silently or produced unexpected payloads.
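A stripped-down illustration of that pattern (the names and shapes here are hypothetical, not lifted from our codebase): a consumer that still reads the old camelCase key keeps passing against a hand-written mock, while the path wired to the real serializer throws at runtime.

```typescript
// Hypothetical reduction of the mismatch. The assistant renamed the
// serializer's key to snake_case but left a camelCase read in a path
// the unit tests never exercised.
type Profile = { displayName: string };

function serializeUser(profile: Profile): Record<string, any> {
  return { user_profile: profile }; // new name applied here...
}

// ...while a consumer still expects the old name:
function greet(payload: Record<string, any>): string {
  return `Hello, ${payload.userProfile.displayName}`; // stale reference
}

// A unit test that mocks the payload with the old shape still passes:
console.log(greet({ userProfile: { displayName: "Ada" } })); // "Hello, Ada"

// The integration path wired to the real serializer throws:
console.log(greet(serializeUser({ displayName: "Ada" })));
// => TypeError: Cannot read properties of undefined (reading 'displayName')
```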
We attempted to iterate with the assistant using a multi‑turn chat session to correct the refactor. That helped on a per‑file basis but made the problem worse across the repository: the model didn’t maintain a strict symbol table and treated each file as an isolated transformation. Quick edits fixed visible failures but left latent mismatches in code paths the tests didn’t cover.
Why the inconsistency was subtle
Two factors made this class of bug easy to miss.
- Probabilistic text transformations – Language models optimize for plausible continuations, not for preserving code semantics. Small stylistic shifts (a pluralization, a capitalization change, a reordering of words) all look plausible to the model yet break references at runtime.
- Lack of a canonical symbol map – The assistant’s context window and per‑file prompting meant there was no machine‑readable map of symbol names being enforced across files.
These behaviors compounded: the assistant produced a correct-looking replacement in 70–80% of files, and humans skimmed diffs quickly enough to miss the remaining 20–30%. Because the diffs were syntactically valid, linters and some static checks didn't flag them as errors. We added an external verification step, including a research-oriented pass via deep research, to cross-reference symbols; that pass uncovered several misspelled references that had survived earlier checks.
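The cross-reference pass itself doesn't need to be sophisticated. Below is a rough sketch of the kind of script we mean; the canonical names, stale variants, and `src/` layout are illustrative, not our actual configuration.

```typescript
// Sketch: flag files that still contain a stale variant of a canonical name.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Canonical name -> variants the refactor was supposed to eliminate (hypothetical).
const staleVariants: Record<string, string[]> = {
  accountProfile: ["userProfile", "user_profile", "profileUser"],
};

// Recursively collect TypeScript sources under a directory.
function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) yield* walk(path);
    else if (path.endsWith(".ts")) yield path;
  }
}

for (const file of walk("src")) {
  const text = readFileSync(file, "utf8");
  for (const [canonical, variants] of Object.entries(staleVariants)) {
    for (const variant of variants) {
      if (new RegExp(`\\b${variant}\\b`).test(text)) {
        console.warn(`${file}: found "${variant}", expected "${canonical}"`);
      }
    }
  }
}
```

Run in CI, a check like this turns the "plausible but wrong" variants into explicit warnings instead of latent runtime failures.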
Practical mitigations we adopted
- Generate a symbol table first – List the intended renames and apply them with language-server-aware tooling (e.g., the editor's rename refactor or a codemod that uses the AST), so the compiler's symbol resolution does the work instead of textual heuristics; see the sketch after this list.
- Require full integration test runs before merges – Ensures runtime‑level mismatches are caught.
- Add a short static-analysis job – Checks for unexpected identifier variants across the repository, along the lines of the variant scan sketched above.
- Treat assistant‑generated diffs as drafts – Require an explicit verification pass before acceptance.
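For the first mitigation, the sketch below shows one way to drive renames from an explicit manifest using ts-morph, which delegates to the TypeScript language service so references are resolved by the compiler rather than by text matching. The file paths and symbol names are placeholders; treat this as a sketch, not our exact tooling.

```typescript
import { Node, Project } from "ts-morph";

// Hypothetical rename manifest: exported symbol -> new name, per defining file.
const renames = [
  { file: "src/models/user-profile.ts", from: "UserProfile", to: "AccountProfile" },
  { file: "src/services/profile-service.ts", from: "loadUserProfile", to: "loadAccountProfile" },
];

const project = new Project({ tsConfigFilePath: "tsconfig.json" });

for (const { file, from, to } of renames) {
  const source = project.getSourceFileOrThrow(file);
  // Resolve the symbol from the file's exports instead of matching text.
  const declarations = source.getExportedDeclarations().get(from) ?? [];
  for (const decl of declarations) {
    if (
      Node.isClassDeclaration(decl) ||
      Node.isInterfaceDeclaration(decl) ||
      Node.isFunctionDeclaration(decl)
    ) {
      decl.rename(to); // updates every reference the compiler can resolve
    }
  }
}

project.saveSync();
```

The editor's built-in rename refactor achieves the same thing interactively; the advantage of a manifest-driven script is that the intended renames become reviewable data rather than something reviewers have to reconstruct from a diff.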
These small process changes reduce the chance that minor, probabilistic naming differences cascade into runtime failures.