[Paper] LLMAID: Identifying AI Capabilities in Android Apps with LLMs
Source: arXiv - 2511.19059v1
Overview
The paper introduces LLMAID, a framework that leverages large language models (LLMs) to automatically discover and classify AI capabilities embedded in Android applications. By moving beyond labor‑intensive manual reviews and brittle rule‑based scanners, LLMAID dramatically expands visibility into AI‑powered mobile apps, which matters to developers, security analysts, and regulators alike.
Key Contributions
- LLMAID pipeline: A four‑stage system (candidate extraction, knowledge‑base interaction, AI capability analysis, and service summarization) that harnesses LLM reasoning to pinpoint AI components in app binaries.
- Large‑scale evaluation: Applied to 4,201 real‑world Android apps, LLMAID uncovered 242 % more AI‑enabled apps than the best existing rule‑based tool.
- High accuracy: Achieved > 90 % precision and recall in detecting AI‑related libraries, models, and services.
- Developer‑focused summaries: Generated concise AI service descriptions that were rated more informative than original app store texts in a user study.
- Empirical landscape analysis: Provided the first systematic view of AI functionality distribution on Android, highlighting a dominance of computer‑vision tasks (≈ 55 %) and object detection as the top use case (≈ 25 %).
Methodology
- Candidate Extraction – Static analysis scans the APK for clues (e.g., imported packages, model files, network endpoints) that could indicate AI usage.
- Knowledge‑Base Interaction – The extracted clues are fed to an LLM (e.g., GPT‑4), which queries a curated AI‑service knowledge base (lists of known AI SDKs, cloud APIs, and model formats).
- AI Capability Analysis & Detection – The LLM reasons over the combined evidence, classifying each candidate as a true AI component or a false positive, and tags its functional domain (vision, NLP, speech, etc.).
- AI Service Summarization – For each confirmed AI capability, the LLM produces a short, human‑readable summary (e.g., “uses TensorFlow Lite for on‑device object detection of retail products”).
The pipeline is fully automated, requiring only the APK as input, and can be run at scale across app stores.
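The stages above can be sketched in Python. Everything here is a hypothetical illustration, not the paper's implementation: the package hints, model‑file extensions, and prompt wording are placeholders, and the LLM call itself is left abstract as a prompt string.

```python
import zipfile
from pathlib import Path

# Hypothetical markers; the paper's knowledge base is far larger and curated.
AI_PACKAGE_HINTS = ["org.tensorflow.lite", "com.google.mlkit", "ai.onnxruntime"]
MODEL_EXTENSIONS = {".tflite", ".onnx", ".pb"}

def extract_candidates(apk_path: str) -> dict:
    """Stage 1 (sketch): statically scan the APK (a zip archive) for AI clues."""
    clues = {"packages": set(), "model_files": set()}
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if Path(name).suffix in MODEL_EXTENSIONS:
                clues["model_files"].add(name)
        # Crude byte scan of the bytecode; real DEX files encode type
        # descriptors with slashes (Lorg/tensorflow/lite/...), so check both.
        dex = b"".join(apk.read(n) for n in apk.namelist() if n.endswith(".dex"))
        for pkg in AI_PACKAGE_HINTS:
            if pkg.encode() in dex or pkg.replace(".", "/").encode() in dex:
                clues["packages"].add(pkg)
    return clues

def build_analysis_prompt(clues: dict) -> str:
    """Stages 2-3 (sketch): fold clues and knowledge-base entries into a prompt."""
    return (
        "You are auditing an Android app for AI capabilities.\n"
        f"Detected model files: {sorted(clues['model_files'])}\n"
        f"Detected AI packages: {sorted(clues['packages'])}\n"
        "Classify each candidate as a true AI component or a false positive, "
        "tag its functional domain (vision/NLP/speech/...), and write a short "
        "summary of the AI service it provides."
    )
```

In this sketch the LLM's response would then be parsed for the classification (stage 3) and the summary (stage 4); the paper's actual prompts and knowledge‑base schema are not reproduced here.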
Results & Findings
- Coverage boost: LLMAID identified 1,018 AI‑enabled apps, versus 300 found by the prior rule‑based baseline.
- Precision/Recall: Both metrics exceeded 90 %, confirming that the LLM‑driven reasoning does not sacrifice reliability for breadth.
- Developer feedback: In a study with 30 Android developers, 87 % preferred LLMAID’s generated summaries over the original Play Store descriptions for understanding AI functionality.
- Capability distribution:
- Computer‑vision dominates (54.80 % of AI apps).
- Object detection is the most common task (25.19 %).
- Remaining AI domains (speech, language, recommendation) each account for < 15 % of the total.
These findings suggest that mobile AI is still heavily visual‑centric, likely driven by camera‑based use cases.
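The reported precision and recall follow the standard definitions; a minimal check with hypothetical counts (the summary does not give the paper's raw confusion matrix):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts chosen only to illustrate the "> 90 %" threshold:
precision, recall = precision_recall(tp=95, fp=5, fn=8)
assert precision > 0.9 and recall > 0.9
```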
Practical Implications
- App store vetting: Marketplaces can integrate LLMAID to flag AI‑enabled apps automatically, aiding compliance checks (e.g., privacy policies for on‑device vs. cloud inference).
- Security & privacy audits: Security teams gain a fast way to locate AI libraries that may introduce novel attack surfaces (model extraction, adversarial inputs).
- Developer tooling: IDE plugins could surface LLMAID’s summaries during code review, helping engineers understand third‑party AI dependencies and licensing implications.
- Competitive intelligence: Companies can monitor trends in AI adoption across categories, informing product roadmaps (e.g., “object detection is hot in retail apps”).
- Regulatory reporting: Automated detection eases the burden of answering “does this app use AI?” for compliance with emerging AI transparency regulations.
Limitations & Future Work
- LLM dependence: Accuracy hinges on the underlying language model’s knowledge; newer AI SDKs may be missed until the model is updated.
- Static‑only analysis: Dynamic loading or obfuscated code could evade detection; combining LLMAID with runtime monitoring is a promising direction.
- Knowledge‑base freshness: Maintaining an up‑to‑date repository of AI services and model formats is non‑trivial and requires community effort.
- Cross‑platform extension: The current implementation targets Android; adapting the pipeline to iOS or cross‑platform frameworks (Flutter, React Native) remains open.
Overall, LLMAID demonstrates that LLMs can serve as powerful assistants for large‑scale software intelligence tasks, opening the door to more transparent and secure AI ecosystems on mobile platforms.
Authors
- Pei Liu
- Terry Zhuo
- Jiawei Deng
- Thong James
- Shidong Pan
- Sherry Xu
- Zhenchang Xing
- Qinghua Lu
- Xiaoning Du
- Hongyu Zhang
Paper Information
- arXiv ID: 2511.19059v1
- Categories: cs.SE
- Published: November 24, 2025