[Paper] LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints

Published: March 4, 2026 at 11:33 AM EST
4 min read
Source: arXiv - 2603.04245v1

Overview

The paper introduces LikeThis!, a generative‑AI tool that turns a typical, often vague user complaint (e.g., “this screen looks weird”) into concrete UI improvement suggestions. By feeding the user’s comment together with a screenshot, LikeThis! instantly produces several alternative designs, letting the user pick the one that best matches their intent. The authors show that this approach not only yields clearer feedback for developers but also improves the overall quality of the UI suggestions generated by current AI models.

Key Contributions

  • A novel feedback loop that converts raw user complaints into actionable UI redesign options, bridging the gap between end‑users and developers.
  • Benchmarking of image‑generation models on a public UI critique dataset, demonstrating that GPT‑Image‑1 outperforms three leading alternatives in preserving design fidelity while fixing UI issues.
  • Two‑step generation pipeline (specification → sketch) that proves essential for producing coherent and issue‑free UI improvements.
  • Empirical user study with 15 participants across 10 real‑world apps, showing higher understandability and actionability of feedback when augmented with AI‑generated suggestions.
  • Open‑source prototype (LikeThis!) that can be integrated into existing app feedback channels (e.g., in‑app bug reporters, app store reviews).

Methodology

  1. Data Collection – The authors used a publicly available dataset containing UI screenshots paired with expert critiques and improvement sketches.
  2. Model Benchmarking – Four image‑generation models (GPT‑Image‑1, DALL·E 3, Stable Diffusion, and a custom diffusion model) were prompted to produce redesigns based on the critiques. Quality was measured on three axes: issue resolution, visual fidelity, and absence of new problems.
  3. Two‑Step Generation – Instead of asking the model to jump straight to a new UI, LikeThis! first asks the model to output a solution specification (a textual description of the change). This spec is then fed to the image model to render the sketch.
  4. User Study – 15 participants installed a modified version of 10 popular apps that included the LikeThis! widget. They reported UI problems as they normally would, then selected from the AI‑generated alternatives. Developers of those apps later rated each piece of feedback on understandability and actionability, comparing raw comments vs. comments plus AI suggestions.
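The two‑step generation described in step 3 can be sketched as follows. This is a minimal illustration, not the authors' code: `generate_spec` and `render_sketch` are hypothetical stand‑ins for calls to a multimodal language model and an image‑generation model (e.g., GPT‑Image‑1), and the string outputs merely represent the artifacts the real system would produce.

```python
# Hypothetical sketch of the specification-first pipeline:
# complaint + screenshot -> textual spec -> rendered sketches.

def generate_spec(complaint: str, screenshot_id: str) -> dict:
    """Step 1: turn a vague complaint plus a screenshot reference
    into a textual solution specification."""
    return {
        "screenshot": screenshot_id,
        "issue": complaint,
        "change": f"concrete layout fix addressing: {complaint}",
    }

def render_sketch(spec: dict, variant: int) -> str:
    """Step 2: feed the specification to an image model to render
    one candidate redesign (represented here as a label)."""
    return f"variant {variant}: sketch of {spec['screenshot']} with {spec['change']}"

def suggest_alternatives(complaint: str, screenshot_id: str, n: int = 3) -> list[str]:
    """Produce several alternatives for the user to choose from."""
    spec = generate_spec(complaint, screenshot_id)
    return [render_sketch(spec, i) for i in range(n)]
```

Splitting the task this way mirrors the paper's finding that forcing the model to commit to a textual spec before rendering yields more coherent sketches than a single image prompt.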

Results & Findings

  • Model performance: GPT‑Image‑1 achieved a 23 % higher issue‑resolution score than the next best model while maintaining 95 % visual fidelity. The specification‑first pipeline introduced 40 % fewer new issues than a single‑prompt approach.
  • User study outcomes:
    • 87 % of participants said the generated alternatives captured what they meant better than their original text.
    • Developers rated AI‑augmented feedback 1.8 points higher (on a 5‑point Likert scale) for understandability and 2.1 points higher for actionability.
    • The average time to submit feedback dropped from 45 seconds (free‑form text) to 28 seconds (selecting a generated option).
  • Overall impact: The combination of textual critique and visual suggestion creates a feedback artifact that is both human‑readable and machine‑ready for downstream design tools.

Practical Implications

  • In‑app feedback channels can be upgraded with a “Suggest an improvement” button that instantly offers design alternatives, reducing the friction of writing detailed bug reports.
  • Design teams receive richer, visual tickets that can be directly imported into tools like Figma or Sketch, shortening the design‑to‑implementation cycle.
  • Users writing app store reviews could be offered a lightweight version of LikeThis! that turns low‑quality reviews into actionable design tickets, improving the signal‑to‑noise ratio for developers.
  • Automated triage pipelines can prioritize feedback that already includes a concrete UI mockup, allowing AI‑driven bots to auto‑assign tickets or even generate prototype code snippets.
  • Cross‑platform consistency: Because the system works on screenshots, it can be used for both iOS and Android apps without needing platform‑specific instrumentation.
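The triage idea above can be illustrated with a toy ranking rule (the field names and structure here are hypothetical, not part of the paper): feedback that already carries a generated mockup is sorted ahead of free‑form text, since it is closer to being actionable.

```python
# Hypothetical triage: rank feedback items that include an
# AI-generated mockup ahead of plain-text complaints.

def prioritize(feedback: list[dict]) -> list[dict]:
    # sorted() is stable, so items with equal keys keep their
    # original submission order; has_mockup=True sorts first
    # because `not True` is False and False < True.
    return sorted(feedback, key=lambda item: not item.get("has_mockup", False))

tickets = [
    {"id": 1, "text": "this screen looks weird", "has_mockup": False},
    {"id": 2, "text": "move the save button up", "has_mockup": True},
]
# After prioritize(tickets), the ticket with a mockup (id 2) comes first.
```

A production pipeline would likely combine this flag with other signals (severity, affected screen, duplicate count), but the principle is the same: feedback with a concrete visual artifact needs less interpretation before assignment.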

Limitations & Future Work

  • Dataset bias: The benchmark dataset consists of expert‑crafted critiques, which may not fully reflect the diversity of real‑world user language.
  • Scalability of specs: The textual specification step still relies on the model’s ability to understand ambiguous user phrasing; occasional misinterpretations were observed.
  • Design system constraints: The generated sketches ignore app‑specific style guides (colors, typography), so developers must still adapt them to the existing design system.
  • Future directions include integrating style‑guide awareness into the generation pipeline, extending the approach to multi‑screen flows, and evaluating long‑term effects on user satisfaction and development velocity in large‑scale production environments.

Authors

  • Jialiang Wei
  • Ali Ebrahimi Pourasad
  • Walid Maalej

Paper Information

  • arXiv ID: 2603.04245v1
  • Categories: cs.SE, cs.AI, cs.HC
  • Published: March 4, 2026