[Paper] Evolving with AI: A Longitudinal Analysis of Developer Logs

Published: January 15, 2026 at 05:30 AM EST
4 min read

Source: arXiv - 2601.10258v1

Overview

AI‑powered coding assistants are now embedded in many IDEs, but we still don’t know how they reshape developers’ day‑to‑day work over months or years. This paper presents the first large‑scale, two‑year longitudinal study of real developer telemetry combined with a follow‑up survey, shedding light on the subtle ways AI tools influence productivity, code quality, editing habits, reuse, and context switching.

Key Contributions

  • Longitudinal telemetry dataset: Fine‑grained IDE logs from ~800 professional developers collected over 24 months, the longest‑spanning dataset of AI‑assistant usage to date.
  • Mixed‑method analysis: Integration of quantitative log mining with a qualitative survey of 62 developers to triangulate objective behavior with subjective perception.
  • Five‑dimension workflow model: Systematic examination of productivity, code quality, editing patterns, code reuse, and context‑switching under AI assistance.
  • Empirical paradox: AI users write more code yet also delete more, while self‑reports claim productivity gains but little perceived change in other dimensions.
  • Design recommendations: Concrete guidelines for IDE and AI‑assistant designers to mitigate hidden costs (e.g., excessive churn) and amplify real benefits.

Methodology

  1. Telemetry collection – The authors instrumented a popular commercial IDE to capture every edit event (insert, delete, rename, refactor, etc.), file‑level metrics (lines added/removed), and AI‑assistant invocations. Data were anonymized and aggregated per developer.
  2. Cohort definition – Developers were split into "AI users" (≥ 10% of their edits triggered an assistant) and "non‑users"; a cohort‑splitting sketch follows the list below. The study tracked each cohort continuously for two years.
  3. Survey – After the telemetry period, a structured questionnaire was sent to a subset of participants (62 respondents) covering perceived productivity, code quality, reuse, and workflow disruption.
  4. Analysis pipeline
    • Descriptive statistics to compare code volume, deletion rates, and assistant usage frequency.
    • Interrupted time‑series models to detect changes after the first AI interaction.
    • Thematic coding of open‑ended survey responses to surface perceived benefits and pain points.
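
As a concrete illustration of the cohort split in step 2, here is a minimal Python sketch; the event log and column names are invented for the example and do not reflect the paper's actual telemetry schema.

```python
import pandas as pd

# Hypothetical per-edit event log; column names are illustrative,
# not the paper's actual telemetry schema.
events = pd.DataFrame({
    "developer_id":        ["d1", "d1", "d2", "d2", "d2"],
    "assistant_triggered": [True, False, False, False, False],
})

# Fraction of each developer's edits that involved the AI assistant.
assistant_share = events.groupby("developer_id")["assistant_triggered"].mean()

# Apply the paper's 10% threshold to split the cohorts.
ai_users = assistant_share[assistant_share >= 0.10].index.tolist()
non_users = assistant_share[assistant_share < 0.10].index.tolist()

print("AI users:", ai_users)    # ['d1']
print("Non-users:", non_users)  # ['d2']
```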

The mixed‑method design lets the authors cross‑validate objective log trends with developers’ own narratives.
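
To make the interrupted time‑series step concrete, the sketch below fits a standard segmented regression (baseline trend, level shift, and trend change after the first AI interaction) with statsmodels. The monthly data and adoption month are synthetic; this illustrates the modeling idea, not the authors' exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic monthly metric for one developer over the 24-month window.
rng = np.random.default_rng(0)
months = np.arange(24)
first_ai_month = 10  # month of the first assistant interaction (synthetic)
lines_of_code = rng.normal(500, 30, size=24) + np.where(months >= first_ai_month, 80, 0)

df = pd.DataFrame({
    "month": months,
    "lines_of_code": lines_of_code,
    # Level-change indicator: 1 from the first AI interaction onward.
    "post_ai": (months >= first_ai_month).astype(int),
})
# Slope-change term: months elapsed since the first AI interaction.
df["months_since_ai"] = np.clip(df["month"] - first_ai_month, 0, None)

# Segmented regression: baseline trend + level shift + trend change.
model = smf.ols("lines_of_code ~ month + post_ai + months_since_ai", data=df).fit()
print(model.params)
```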

Results & Findings

| Dimension | Telemetry Insight | Survey Perception |
| --- | --- | --- |
| Productivity (code volume) | AI users produce ≈30% more lines of code per month than non‑users. | 78% report "faster development" or "more features delivered". |
| Code quality | No statistically significant difference in static‑analysis warnings; however, AI users show 15% higher deletion churn (more lines added and then removed). | 62% feel code quality is "about the same" or "slightly improved". |
| Editing behavior | AI‑triggered edits are shorter but more frequent; overall edit sessions are 12% longer in duration. | Developers notice "more suggestions, but not always useful". |
| Code reuse | Slight uptick (≈5%) in copy‑paste and library‑import events among AI users. | 48% say the assistant helps them discover existing APIs. |
| Context switching | No measurable increase in window‑focus changes; AI users actually spend 8% less time in external documentation browsers. | 55% report "less need to search Stack Overflow". |

The key paradox is that while developers feel more productive, the logs reveal a hidden cost: a higher rate of code churn, suggesting that many AI‑generated snippets are trialed and discarded.
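
As a rough illustration of how such a churn signal can be computed, the sketch below derives a deletion‑churn ratio from per‑edit line counts; the field names and numbers are hypothetical, not the paper's metrics.

```python
# Hypothetical per-edit line counts for one developer; field names and
# values are invented for illustration, not the paper's telemetry.
ai_user_edits = [
    {"lines_added": 120, "lines_removed": 45},
    {"lines_added": 80,  "lines_removed": 30},
]

def deletion_churn(edits):
    """Share of added lines that were later deleted (higher = more churn)."""
    added = sum(e["lines_added"] for e in edits)
    removed = sum(e["lines_removed"] for e in edits)
    return removed / added if added else 0.0

print(f"Deletion churn: {deletion_churn(ai_user_edits):.2f}")  # 0.38
```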

Practical Implications

  • Tooling designers should surface churn metrics (e.g., "how many suggestions were undone") to help users gauge the net value of AI suggestions; a sketch of such a metric follows this list.
  • IDE integrations can prioritize context‑aware suggestions that align with the current task, reducing unnecessary trial‑and‑error edits.
  • Team leads may want to monitor deletion rates as an early indicator of over‑reliance on low‑quality AI output, balancing speed with maintainability.
  • Developers can adopt a “sandbox” workflow: generate snippets in a temporary file, review, then commit—minimizing noisy deletions in the main codebase.
  • Training data curators for AI assistants should emphasize high‑quality, well‑tested code to lower the delete‑after‑add ratio observed in the wild.
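
Building on the first bullet, here is a minimal sketch of a "suggestions undone" metric: the share of accepted AI suggestions that are deleted again within a time window. The event schema is an assumption made for this example, not an existing IDE telemetry API.

```python
from datetime import datetime, timedelta

# Hypothetical acceptance/removal timestamps for AI suggestions; the schema
# is an assumption for this sketch, not an existing IDE telemetry format.
suggestions = [
    {"id": "s1", "accepted_at": datetime(2026, 1, 5, 10, 0), "removed_at": datetime(2026, 1, 5, 10, 20)},
    {"id": "s2", "accepted_at": datetime(2026, 1, 5, 11, 0), "removed_at": None},
    {"id": "s3", "accepted_at": datetime(2026, 1, 6, 9, 0),  "removed_at": datetime(2026, 1, 8, 9, 0)},
]

def undone_rate(events, window=timedelta(hours=24)):
    """Share of accepted suggestions that were deleted again within `window`."""
    undone = sum(
        1 for e in events
        if e["removed_at"] is not None and e["removed_at"] - e["accepted_at"] <= window
    )
    return undone / len(events) if events else 0.0

print(f"Suggestions undone within 24h: {undone_rate(suggestions):.0%}")  # 33%
```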

Limitations & Future Work

  • Sample bias: Participants were users of a single commercial IDE, which may not represent developers using alternative editors or open‑source tooling.
  • Metric scope: The study relies on line‑based metrics and static analysis warnings; deeper semantic quality (e.g., performance regressions) was not captured.
  • Causality vs. correlation: While the interrupted time‑series design controls for many confounders, unobserved factors (e.g., project deadlines) could influence both AI usage and churn.
  • Future directions: Extending the telemetry to multiple IDE ecosystems, incorporating runtime performance data, and experimenting with real‑time feedback loops that adapt AI suggestion frequency based on observed churn.

Authors

  • Agnia Sergeyuk
  • Eric Huang
  • Dariia Karaeva
  • Anastasiia Serova
  • Yaroslav Golubev
  • Iftekhar Ahmed

Paper Information

  • arXiv ID: 2601.10258v1
  • Categories: cs.SE, cs.HC
  • Published: January 15, 2026