[Paper] Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse

Published: December 18, 2025 at 12:24 PM EST
4 min read
Source: arXiv - 2512.16790v1

Overview

The paper Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse investigates why large language models (LLMs) often “lean on” source‑code comments when they solve software‑engineering tasks. By probing the hidden representations of LLMs, the authors show that comments are stored as distinct latent concepts—and that toggling these concepts can dramatically boost or cripple the model’s performance on tasks such as code completion, translation, and refinement.

Key Contributions

  • First concept‑level interpretability analysis for SE‑focused LLMs, using Concept Activation Vectors (CAVs), a methodology popularized in computer‑vision interpretability.
  • Discovery that LLMs encode comments as separate latent concepts, and that they further differentiate comment sub‑types (Javadoc, inline, multiline).
  • Controlled activation/deactivation experiments that reveal task‑specific performance swings ranging from a ‑90 % drop to a +67 % gain when comment concepts are manipulated.
  • Comprehensive measurement across 10 SE tasks, showing that code summarization triggers the strongest comment‑concept activation, while code completion is the least sensitive.
  • A practical roadmap for building next‑generation SE tools that can explicitly query, edit, or suppress internal concepts rather than relying solely on prompt engineering.

Methodology

  1. Data & Tasks – The authors collected a balanced corpus of Java snippets annotated with three comment styles (Javadoc, inline, multiline). They evaluated three canonical SE tasks:

    • Code Completion (predict next token)
    • Code Translation (e.g., Java → Python)
    • Code Refinement (bug‑fix or style improvement)
  2. Concept Activation Vectors (CAV) – For each comment subtype, they trained linear classifiers on the model’s intermediate embeddings to produce a CAV that points in the direction of “comment‑ness” (a code sketch of this and the next step appears after this list).

  3. Concept Manipulation – Using the CAVs, they performed two operations:

    • Activation – Adding a scaled version of the CAV to the embedding, effectively “injecting” comment knowledge.
    • Deactivation – Subtracting the CAV, attempting to erase the comment signal.
  4. Evaluation Loop – After each manipulation, the model generated outputs for the three tasks, and standard SE metrics (BLEU, Exact Match, Pass@k, etc.) were recorded (the Pass@k estimator is shown below).

  5. Cross‑Task Survey – A separate experiment prompted the same LLM to perform ten different SE tasks while measuring the magnitude of the comment‑concept activation in its latent space, allowing a comparative sensitivity analysis.
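
To make steps 2, 3, and 5 concrete, the sketch below shows one common way a CAV can be trained on hidden states and then used to inject, suppress, or measure the comment concept. It is a minimal illustration, not the authors’ released code; the probe choice (logistic regression), the `comment_states`/`plain_states` arrays, the layer they are taken from, and the scaling factor `alpha` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cav(comment_states: np.ndarray, plain_states: np.ndarray) -> np.ndarray:
    """Step 2: fit a linear probe separating hidden states of commented vs.
    uncommented snippets; its normalized weight vector is the CAV."""
    X = np.vstack([comment_states, plain_states])
    y = np.concatenate([np.ones(len(comment_states)), np.zeros(len(plain_states))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    w = probe.coef_[0]
    return w / np.linalg.norm(w)

def activate(hidden: np.ndarray, cav: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Step 3a: inject comment knowledge by adding a scaled CAV."""
    return hidden + alpha * cav

def deactivate(hidden: np.ndarray, cav: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Step 3b: attempt to erase the comment signal by subtracting a scaled CAV."""
    return hidden - alpha * cav

def activation_strength(hidden: np.ndarray, cav: np.ndarray) -> float:
    """Step 5: magnitude of the comment concept in a hidden state
    (projection of the hidden state onto the CAV direction)."""
    return float(hidden @ cav)
```

The manipulated hidden states would then continue through the rest of the forward pass before decoding; the sign and scale of `alpha` determine how aggressively the concept is boosted or suppressed.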

All steps are fully reproducible with publicly released code and model checkpoints.
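
On the evaluation side (step 4), BLEU and Exact Match are standard; Pass@k is usually reported with the unbiased estimator from the Codex evaluation protocol, reproduced below for reference (this is the general formula, not code from this paper).

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples passes the tests, given
    n generations of which c are functionally correct (unbiased estimator)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)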

Results & Findings

Effect of activating vs. deactivating the comment concept, per task:

  • Code Completion – Activation: ±5 % (minor; the model already relies on syntax). Deactivation: −30 % to −90 % (dramatic drop).
  • Code Translation – Activation: +22 % to +67 % (significant boost). Deactivation: −15 % to −45 %.
  • Code Refinement – Activation: +12 % to +48 %. Deactivation: −10 % to −35 %.
  • Subtype matters: Javadoc activation helped translation the most, while inline comments were most beneficial for refinement.
  • Task sensitivity ranking (strongest to weakest comment‑concept activation):
    1. Code Summarization
    2. Code Translation
    3. Code Refinement
    4. Code Generation (e.g., scaffolding)
    5. Code Completion

These numbers demonstrate that the hidden “comment” neuron clusters are not just artifacts; they are functional levers that can be pulled to improve downstream SE performance.

Practical Implications

  • Prompt‑aware tooling – IDE plugins could automatically detect when a user’s request aligns with a high‑comment‑activation task (e.g., summarization) and prepend a short synthetic comment to steer the model.
  • Model‑level fine‑tuning – Instead of retraining the whole LLM, developers can fine‑tune only the comment‑concept vectors, achieving large gains with a fraction of compute.
  • Safety & robustness – Deactivating comment concepts can be used to mitigate “hallucinated” documentation or privacy‑leaking comments that inadvertently influence code generation.
  • Custom SE assistants – Companies can embed a lightweight “concept controller” that toggles comment awareness per API call, tailoring the assistant to the specific workflow (e.g., aggressive comment use for documentation bots, minimal use for low‑latency autocomplete); a sketch of such a controller follows this list.
  • Explainability for developers – By visualizing the activation strength of comment concepts, developers gain a transparent view of why a model suggested a particular refactor, fostering trust and easier debugging.
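
To illustrate the “concept controller” idea, here is a hypothetical sketch of a per‑request toggle built on the CAV operations from the methodology section. The `generate_with_hidden_edit` hook, the `mode` values, and the class itself are illustrative assumptions, not an API from the paper or any existing library.

```python
import numpy as np

class CommentConceptController:
    """Hypothetical per-request toggle for the comment concept.

    `model` is assumed to expose a hook that lets us edit hidden states at a
    chosen layer before generation continues (many open-source LLM stacks
    allow this via forward hooks)."""

    def __init__(self, model, cav: np.ndarray, layer: int, alpha: float = 1.0):
        self.model, self.cav, self.layer, self.alpha = model, cav, layer, alpha

    def _edit(self, hidden: np.ndarray, mode: str) -> np.ndarray:
        if mode == "boost":      # e.g. documentation bots
            return hidden + self.alpha * self.cav
        if mode == "suppress":   # e.g. low-latency autocomplete
            return hidden - self.alpha * self.cav
        return hidden            # "off": leave the model untouched

    def generate(self, prompt: str, mode: str = "off") -> str:
        # Assumed hook: apply `_edit` to the hidden states at `self.layer`
        # during the forward pass, then decode as usual.
        return self.model.generate_with_hidden_edit(
            prompt, layer=self.layer, edit_fn=lambda h: self._edit(h, mode)
        )
```

A documentation bot might call `controller.generate(prompt, mode="boost")`, while a latency‑sensitive autocomplete endpoint would use `mode="suppress"` or leave the concept untouched.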

Limitations & Future Work

  • Model scope – Experiments were limited to a handful of open‑source LLMs (≈2‑3 B parameters). Larger commercial models may exhibit different concept structures.
  • Language focus – The study only examined Java; other languages with different commenting conventions (Python docstrings, Rust comments) need separate analysis.
  • Static manipulation – The activation/deactivation was performed post‑hoc on a frozen embedding; integrating concept control directly into the generation loop could yield smoother, more natural outputs.
  • User study missing – While the technical gains are clear, the paper does not evaluate how developers perceive or benefit from concept‑controlled suggestions in real IDE settings.

Future research directions include extending CAV analysis to other SE concepts (e.g., type hints, test cases), building end‑to‑end “concept‑aware” LLM APIs, and conducting longitudinal user studies to quantify productivity impact.

Authors

  • Aaron Imani
  • Mohammad Moshirpour
  • Iftekhar Ahmed

Paper Information

  • arXiv ID: 2512.16790v1
  • Categories: cs.SE
  • Published: December 18, 2025