[Paper] Pre-Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM-Assisted Programming

Published: November 24, 2025 at 02:42 AM EST
4 min read
Source: arXiv - 2511.18849v1

Overview

Large Language Models (LLMs) are now a staple in modern IDEs, offering on‑the‑fly code completions and whole‑function suggestions. However, a large share of these AI‑generated suggestions is never accepted, wasting compute cycles, adding latency, and cluttering the developer experience.
The paper “Pre‑Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM‑Assisted Programming” proposes a tiny, privacy‑preserving model that decides whether to call the LLM at all, based solely on real‑time editor telemetry (typing speed, cursor moves, file switches, etc.). In a four‑month field study with a VS Code extension, the filter almost doubled suggestion acceptance while cutting 35 % of unnecessary LLM invocations.

Key Contributions

  • Behavior‑only pre‑filter: A lightweight classifier that predicts suggestion acceptance using only non‑code telemetry, never inspecting the source text or the LLM prompt.
  • Production‑scale evaluation: Deployed in a real VS Code plugin used by thousands of developers for four months, providing robust, naturalistic data.
  • Significant UX & efficiency gains: Acceptance rate rose from 18.4 % to 34.2 %; 35 % of low‑value LLM calls were suppressed, reducing latency and cloud compute costs.
  • Privacy‑first design: All features are derived from on‑device interaction signals, keeping user code and intent private.
  • Open‑source reference implementation: The authors release the telemetry collection pipeline and the pre‑filter model for community experimentation.

Methodology

  1. Telemetry collection: The plugin streams lightweight editor events (keystroke timestamps, cursor jumps, file open/close, focus changes) to a local feature extractor. No source code or text snippets leave the developer’s machine.
  2. Feature engineering: Over a sliding 5‑second window, the system computes summary statistics (e.g., average typing speed, pause frequency, navigation entropy) that capture the developer’s current “flow state” (a feature‑extraction sketch follows this list).
  3. Model training: Using historical logs where the downstream LLM suggestion was either accepted or ignored, the authors trained a binary classifier (gradient‑boosted trees) to predict acceptance probability.
  4. Runtime decision: Before each potential LLM call, the filter evaluates the current telemetry window. If the predicted acceptance probability falls below a configurable threshold, the LLM request is skipped; otherwise it proceeds as usual (see the training‑and‑gating sketch after this list).
  5. A/B field study: Two user groups (baseline vs. filtered) were run in parallel for four months. Metrics such as suggestion acceptance, latency, and cloud‑compute usage were logged and statistically compared.
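
To make step 2 concrete, here is a minimal sketch of how behavior‑only features might be computed over the 5‑second window. The event schema, the 1‑second pause threshold, and the exact feature definitions are assumptions for illustration; they follow the paper’s description (keystroke timestamps, cursor jumps, focus changes; typing speed, pause frequency, navigation entropy) but are not the authors’ implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class EditorEvent:
    """One code-free telemetry event (hypothetical schema)."""
    kind: str         # e.g. "keystroke", "cursor_jump", "file_switch", "focus_change"
    timestamp: float  # seconds since session start

def extract_features(events: list[EditorEvent], window_s: float = 5.0) -> dict[str, float]:
    """Summarize the most recent `window_s` seconds of behavior-only telemetry."""
    if not events:
        return {"typing_speed": 0.0, "pause_rate": 0.0, "nav_entropy": 0.0}

    end = events[-1].timestamp
    window = [e for e in events if e.timestamp >= end - window_s]

    # Typing speed: keystrokes per second inside the window.
    keys = [e.timestamp for e in window if e.kind == "keystroke"]
    typing_speed = len(keys) / window_s

    # Pause frequency: inter-keystroke gaps longer than 1 s (threshold is an assumption).
    gaps = [b - a for a, b in zip(keys, keys[1:])]
    pause_rate = sum(1 for g in gaps if g > 1.0) / window_s

    # Navigation entropy: how evenly activity is spread across event kinds;
    # high values suggest churning between files rather than steady typing.
    counts: dict[str, int] = {}
    for e in window:
        counts[e.kind] = counts.get(e.kind, 0) + 1
    total = len(window)
    nav_entropy = -sum(c / total * math.log2(c / total) for c in counts.values())

    return {"typing_speed": typing_speed, "pause_rate": pause_rate, "nav_entropy": nav_entropy}
```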
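
Steps 3 and 4 boil down to a binary classifier plus a probability cutoff in front of the completion request. The sketch below assumes scikit‑learn’s GradientBoostingClassifier as a stand‑in for the paper’s gradient‑boosted trees; the toy training rows, the 0.5 threshold, and the request_completion hook are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Offline (step 3): fit on historical windows labeled by whether the
# suggestion shown in that window was accepted (1) or ignored (0).
# Rows are toy values in [typing_speed, pause_rate, nav_entropy] order.
X_train = np.array([
    [4.2, 0.1, 0.8],
    [0.3, 1.2, 1.9],
    [3.8, 0.2, 0.6],
    [0.5, 0.9, 1.7],
])
y_train = np.array([1, 0, 1, 0])
clf = GradientBoostingClassifier().fit(X_train, y_train)

ACCEPT_THRESHOLD = 0.5  # the configurable cutoff from step 4 (value is an assumption)

def maybe_request_suggestion(features: dict[str, float]) -> bool:
    """Gate the LLM call: only proceed when predicted acceptance is high enough."""
    x = [[features["typing_speed"], features["pause_rate"], features["nav_entropy"]]]
    p_accept = clf.predict_proba(x)[0, 1]
    if p_accept < ACCEPT_THRESHOLD:
        return False  # skip the round-trip entirely
    # request_completion(...)  # hypothetical hook into the LLM backend
    return True
```

In practice the cutoff trades recall for precision, which is exactly the threshold‑tuning question the authors flag under Limitations & Future Work.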

Results & Findings

| Metric | Baseline (no filter) | With pre‑filter |
| --- | --- | --- |
| Suggestion acceptance rate | 18.4 % | 34.2 % |
| LLM calls per hour per user | 12.8 | 8.3 (‑35 %) |
| Average suggestion latency | 420 ms | 310 ms (‑26 %) |
| Cloud compute cost (per 1 k users) | $1,200 | $780 |

  • Higher acceptance stems from showing suggestions only when the developer appears receptive (e.g., steady typing, low navigation churn).
  • Latency reduction is a direct side‑effect of fewer round‑trips to the LLM service.
  • Cost savings are proportional to the drop in API calls, demonstrating a clear business case for large‑scale IDE vendors.
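
As a quick sanity check on these figures, the reductions can be recomputed directly from the table above; the tiny snippet below (values copied from the table) confirms that the cost drop tracks the drop in call volume.

```python
# Values copied from the results table above.
calls_drop   = 1 - 8.3 / 12.8    # ≈ 35 % fewer LLM calls per user-hour
latency_drop = 1 - 310 / 420     # ≈ 26 % lower average suggestion latency
cost_drop    = 1 - 780 / 1200    # = 35 % lower cloud compute cost
accept_gain  = 34.2 / 18.4       # ≈ 1.86x -- the "almost doubled" acceptance rate
print(f"{calls_drop:.0%}, {latency_drop:.0%}, {cost_drop:.0%}, {accept_gain:.2f}x")
```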

Practical Implications

  • IDE vendors can embed a similar telemetry‑based gatekeeper to make AI assistants feel less intrusive while cutting operational expenses.
  • Developers benefit from fewer “pop‑ups” that break concentration, leading to smoother coding sessions and faster feedback loops.
  • Team leads & DevOps can justify the cloud spend on LLM services by showing measurable reductions in unnecessary API usage.
  • Open‑source plugin authors now have a ready‑to‑use pattern for privacy‑preserving adaptation that doesn’t require any code analysis or user‑provided prompts.
  • Future AI‑assisted tools (e.g., test generation, documentation bots) can adopt the same pre‑filtering concept to improve timing and relevance across the software development lifecycle.

Limitations & Future Work

  • Telemetry scope: The model only sees short‑term interaction signals; longer‑term context (project history, developer expertise) might further improve predictions.
  • Generalizability: The study focused on VS Code and a specific LLM backend; results may vary with other editors or model families.
  • Threshold tuning: Choosing the acceptance‑probability cutoff trades off recall vs. precision; adaptive thresholds per user were not explored.
  • User consent & transparency: While privacy‑preserving, the approach still requires clear opt‑in mechanisms and UI cues to avoid “black‑box” behavior.

Future research directions include multi‑modal signals (e.g., eye‑tracking, voice commands), cross‑editor federated learning to personalize filters without sharing raw telemetry, and extending the pre‑filter to other AI‑driven developer aids such as bug‑fix suggestions or refactoring bots.

Authors

  • Mohammad Nour Al Awad
  • Sergey Ivanov
  • Olga Tikhonova

Paper Information

  • arXiv ID: 2511.18849v1
  • Categories: cs.SE, cs.AI, cs.HC
  • Published: November 24, 2025
  • PDF: https://arxiv.org/pdf/2511.18849v1