[Paper] Evolving with AI: A Longitudinal Analysis of Developer Logs

Published: January 15, 2026 at 05:30 AM EST
4 min read

Source: arXiv - 2601.10258v1

Overview

AI‑powered coding assistants are now embedded in many IDEs, but we still don’t know how they reshape developers’ day‑to‑day work over months or years. This paper presents the first large‑scale, two‑year longitudinal study of real developer telemetry combined with a follow‑up survey, shedding light on the subtle ways AI tools influence productivity, code quality, editing habits, reuse, and context switching.

Key Contributions

  • Longitudinal telemetry dataset: Fine‑grained IDE logs from ~800 professional developers collected over 24 months, the longest‑spanning dataset of AI‑assistant usage to date.
  • Mixed‑method analysis: Integration of quantitative log mining with a qualitative survey of 62 developers to triangulate objective behavior with subjective perception.
  • Five‑dimension workflow model: Systematic examination of productivity, code quality, editing patterns, code reuse, and context‑switching under AI assistance.
  • Empirical paradox: AI users write more code yet also delete more, while self‑reports claim productivity gains but little perceived change in other dimensions.
  • Design recommendations: Concrete guidelines for IDE and AI‑assistant designers to mitigate hidden costs (e.g., excessive churn) and amplify real benefits.

Methodology

  1. Telemetry collection – The authors instrumented a popular commercial IDE to capture every edit event (insert, delete, rename, refactor, etc.), file‑level metrics (lines added/removed), and AI‑assistant invocations. Data were anonymized and aggregated per developer.
  2. Cohort definition – Developers were split into "AI users" (≥ 10% of their edits triggered an assistant) and "non‑users"; a cohort‑splitting sketch follows the list below. The study tracked each cohort continuously for two years.
  3. Survey – After the telemetry period, a structured questionnaire was sent to a subset of participants (62 respondents) covering perceived productivity, code quality, reuse, and workflow disruption.
  4. Analysis pipeline
    • Descriptive statistics to compare code volume, deletion rates, and assistant usage frequency.
    • Interrupted time‑series models to detect changes after the first AI interaction.
    • Thematic coding of open‑ended survey responses to surface perceived benefits and pain points.
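
As a concrete illustration of the cohort split in step 2, here is a minimal Python sketch; the event log and column names are invented for the example and do not reflect the paper's actual telemetry schema.

```python
import pandas as pd

# Hypothetical per-edit event log; column names are illustrative,
# not the paper's actual telemetry schema.
events = pd.DataFrame({
    "developer_id":        ["d1", "d1", "d2", "d2", "d2"],
    "assistant_triggered": [True, False, False, False, False],
})

# Fraction of each developer's edits that involved the AI assistant.
assistant_share = events.groupby("developer_id")["assistant_triggered"].mean()

# Apply the paper's 10% threshold to split the cohorts.
ai_users = assistant_share[assistant_share >= 0.10].index.tolist()
non_users = assistant_share[assistant_share < 0.10].index.tolist()

print("AI users:", ai_users)    # ['d1']
print("Non-users:", non_users)  # ['d2']
```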

The mixed‑method design lets the authors cross‑validate objective log trends with developers’ own narratives.
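
To make the interrupted time‑series step concrete, the sketch below fits a standard segmented regression (baseline trend, level shift, and trend change after the first AI interaction) with statsmodels. The monthly data and adoption month are synthetic; this illustrates the modeling idea, not the authors' exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic monthly metric for one developer over the 24-month window.
rng = np.random.default_rng(0)
months = np.arange(24)
first_ai_month = 10  # month of the first assistant interaction (synthetic)
lines_of_code = rng.normal(500, 30, size=24) + np.where(months >= first_ai_month, 80, 0)

df = pd.DataFrame({
    "month": months,
    "lines_of_code": lines_of_code,
    # Level-change indicator: 1 from the first AI interaction onward.
    "post_ai": (months >= first_ai_month).astype(int),
})
# Slope-change term: months elapsed since the first AI interaction.
df["months_since_ai"] = np.clip(df["month"] - first_ai_month, 0, None)

# Segmented regression: baseline trend + level shift + trend change.
model = smf.ols("lines_of_code ~ month + post_ai + months_since_ai", data=df).fit()
print(model.params)
```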

Results & Findings

| Dimension | Telemetry Insight | Survey Perception |
| --- | --- | --- |
| Productivity (code volume) | AI users produce ≈30% more lines of code per month than non‑users. | 78% report "faster development" or "more features delivered". |
| Code quality | No statistically significant difference in static‑analysis warnings; however, AI users show 15% higher deletion churn (more lines added and then removed). | 62% feel code quality is "about the same" or "slightly improved". |
| Editing behavior | AI‑triggered edits are shorter but more frequent; overall edit sessions are 12% longer in duration. | Developers notice "more suggestions, but not always useful". |
| Code reuse | Slight uptick (≈5%) in copy‑paste and library‑import events among AI users. | 48% say the assistant helps them discover existing APIs. |
| Context switching | No measurable increase in window‑focus changes; AI users actually spend 8% less time in external documentation browsers. | 55% report "less need to search Stack Overflow". |

The key paradox is that while developers feel more productive, the logs reveal a hidden cost: a higher rate of code churn, suggesting that many AI‑generated snippets are trialed and discarded.
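
As a rough illustration of how such a churn signal can be computed, the sketch below derives a deletion‑churn ratio from per‑edit line counts; the field names and numbers are hypothetical, not the paper's metrics.

```python
# Hypothetical per-edit line counts for one developer; field names and
# values are invented for illustration, not the paper's telemetry.
ai_user_edits = [
    {"lines_added": 120, "lines_removed": 45},
    {"lines_added": 80,  "lines_removed": 30},
]

def deletion_churn(edits):
    """Share of added lines that were later deleted (higher = more churn)."""
    added = sum(e["lines_added"] for e in edits)
    removed = sum(e["lines_removed"] for e in edits)
    return removed / added if added else 0.0

print(f"Deletion churn: {deletion_churn(ai_user_edits):.2f}")  # 0.38
```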

Practical Implications

  • Tooling designers should surface churn metrics (e.g., "how many suggestions were undone") to help users gauge the net value of AI suggestions; a sketch of such a metric follows this list.
  • IDE integrations can prioritize context‑aware suggestions that align with the current task, reducing unnecessary trial‑and‑error edits.
  • Team leads may want to monitor deletion rates as an early indicator of over‑reliance on low‑quality AI output, balancing speed with maintainability.
  • Developers can adopt a “sandbox” workflow: generate snippets in a temporary file, review, then commit—minimizing noisy deletions in the main codebase.
  • Training data curators for AI assistants should emphasize high‑quality, well‑tested code to lower the delete‑after‑add ratio observed in the wild.
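
Building on the first bullet, here is a minimal sketch of a "suggestions undone" metric: the share of accepted AI suggestions that are deleted again within a time window. The event schema is an assumption made for this example, not an existing IDE telemetry API.

```python
from datetime import datetime, timedelta

# Hypothetical acceptance/removal timestamps for AI suggestions; the schema
# is an assumption for this sketch, not an existing IDE telemetry format.
suggestions = [
    {"id": "s1", "accepted_at": datetime(2026, 1, 5, 10, 0), "removed_at": datetime(2026, 1, 5, 10, 20)},
    {"id": "s2", "accepted_at": datetime(2026, 1, 5, 11, 0), "removed_at": None},
    {"id": "s3", "accepted_at": datetime(2026, 1, 6, 9, 0),  "removed_at": datetime(2026, 1, 8, 9, 0)},
]

def undone_rate(events, window=timedelta(hours=24)):
    """Share of accepted suggestions that were deleted again within `window`."""
    undone = sum(
        1 for e in events
        if e["removed_at"] is not None and e["removed_at"] - e["accepted_at"] <= window
    )
    return undone / len(events) if events else 0.0

print(f"Suggestions undone within 24h: {undone_rate(suggestions):.0%}")  # 33%
```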

Limitations & Future Work

  • Sample bias: Participants were users of a single commercial IDE, which may not represent developers using alternative editors or open‑source tooling.
  • Metric scope: The study relies on line‑based metrics and static analysis warnings; deeper semantic quality (e.g., performance regressions) was not captured.
  • Causality vs. correlation: While the interrupted time‑series design controls for many confounders, unobserved factors (e.g., project deadlines) could influence both AI usage and churn.
  • Future directions: Extending the telemetry to multiple IDE ecosystems, incorporating runtime performance data, and experimenting with real‑time feedback loops that adapt AI suggestion frequency based on observed churn.

Authors

  • Agnia Sergeyuk
  • Eric Huang
  • Dariia Karaeva
  • Anastasiia Serova
  • Yaroslav Golubev
  • Iftekhar Ahmed

Paper Information

  • arXiv ID: 2601.10258v1
  • Categories: cs.SE, cs.HC
  • Published: January 15, 2026