[Paper] LLMAID: Identifying AI Capabilities in Android Apps with LLMs
Source: arXiv - 2511.19059v1
Overview
The paper introduces LLMAID, a framework that leverages large language models (LLMs) to automatically discover and classify AI capabilities embedded in Android applications. By moving beyond labor‑intensive manual reviews and brittle rule‑based scanners, LLMAID dramatically expands visibility into AI‑powered mobile apps, which matters to developers, security analysts, and regulators alike.
Key Contributions
- LLMAID pipeline: A four‑stage system (candidate extraction, knowledge‑base interaction, AI capability analysis, and service summarization) that harnesses LLM reasoning to pinpoint AI components in app binaries.
- Large‑scale evaluation: Applied to 4,201 real‑world Android apps, LLMAID uncovered 242 % more AI‑enabled apps than the best existing rule‑based tool.
- High accuracy: Achieved > 90 % precision and recall in detecting AI‑related libraries, models, and services.
- Developer‑focused summaries: Generated concise AI service descriptions that were rated more informative than original app store texts in a user study.
- Empirical landscape analysis: Provided the first systematic view of AI functionality distribution on Android, highlighting a dominance of computer‑vision tasks (≈ 55 %) and object detection as the top use case (≈ 25 %).
Methodology
- Candidate Extraction – Static analysis scans the APK for clues (e.g., imported packages, model files, network endpoints) that could indicate AI usage.
- Knowledge‑Base Interaction – The extracted clues are fed to an LLM (e.g., GPT‑4), which queries a curated AI‑service knowledge base (lists of known AI SDKs, cloud APIs, and model formats).
- AI Capability Analysis & Detection – The LLM reasons over the combined evidence, classifying each candidate as a true AI component or a false positive, and tags its functional domain (vision, NLP, speech, etc.).
- AI Service Summarization – For each confirmed AI capability, the LLM produces a short, human‑readable summary (e.g., “uses TensorFlow Lite for on‑device object detection of retail products”).
The pipeline is fully automated, requiring only the APK as input, and can be run at scale across app stores.
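The stages above can be sketched in Python. Everything here is a hypothetical illustration, not the paper's implementation: the package hints, model‑file extensions, and prompt wording are placeholders, and the LLM call itself is left abstract as a prompt string.

```python
import zipfile
from pathlib import Path

# Hypothetical markers; the paper's knowledge base is far larger and curated.
AI_PACKAGE_HINTS = ["org.tensorflow.lite", "com.google.mlkit", "ai.onnxruntime"]
MODEL_EXTENSIONS = {".tflite", ".onnx", ".pb"}

def extract_candidates(apk_path: str) -> dict:
    """Stage 1 (sketch): statically scan the APK (a zip archive) for AI clues."""
    clues = {"packages": set(), "model_files": set()}
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if Path(name).suffix in MODEL_EXTENSIONS:
                clues["model_files"].add(name)
        # Crude byte scan of the bytecode; real DEX files encode type
        # descriptors with slashes (Lorg/tensorflow/lite/...), so check both.
        dex = b"".join(apk.read(n) for n in apk.namelist() if n.endswith(".dex"))
        for pkg in AI_PACKAGE_HINTS:
            if pkg.encode() in dex or pkg.replace(".", "/").encode() in dex:
                clues["packages"].add(pkg)
    return clues

def build_analysis_prompt(clues: dict) -> str:
    """Stages 2-3 (sketch): fold clues and knowledge-base entries into a prompt."""
    return (
        "You are auditing an Android app for AI capabilities.\n"
        f"Detected model files: {sorted(clues['model_files'])}\n"
        f"Detected AI packages: {sorted(clues['packages'])}\n"
        "Classify each candidate as a true AI component or a false positive, "
        "tag its functional domain (vision/NLP/speech/...), and write a short "
        "summary of the AI service it provides."
    )
```

In this sketch the LLM's response would then be parsed for the classification (stage 3) and the summary (stage 4); the paper's actual prompts and knowledge‑base schema are not reproduced here.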
Results & Findings
- Coverage boost: LLMAID identified 1,018 AI‑enabled apps, versus 300 found by the prior rule‑based baseline.
- Precision/Recall: Both metrics exceeded 90 %, confirming that the LLM‑driven reasoning does not sacrifice reliability for breadth.
- Developer feedback: In a study with 30 Android developers, 87 % preferred LLMAID’s generated summaries over the original Play Store descriptions for understanding AI functionality.
- Capability distribution:
- Computer‑vision dominates (54.80 % of AI apps).
- Object detection is the most common task (25.19 %).
- Remaining AI domains (speech, language, recommendation) each account for < 15 % of the total.
These findings suggest that mobile AI is still heavily visual‑centric, likely driven by camera‑based use cases.
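The reported precision and recall follow the standard definitions; a minimal check with hypothetical counts (the summary does not give the paper's raw confusion matrix):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts chosen only to illustrate the "> 90 %" threshold:
precision, recall = precision_recall(tp=95, fp=5, fn=8)
assert precision > 0.9 and recall > 0.9
```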
Practical Implications
- App store vetting: Marketplaces can integrate LLMAID to flag AI‑enabled apps automatically, aiding compliance checks (e.g., privacy policies for on‑device vs. cloud inference).
- Security & privacy audits: Security teams gain a fast way to locate AI libraries that may introduce novel attack surfaces (model extraction, adversarial inputs).
- Developer tooling: IDE plugins could surface LLMAID’s summaries during code review, helping engineers understand third‑party AI dependencies and licensing implications.
- Competitive intelligence: Companies can monitor trends in AI adoption across categories, informing product roadmaps (e.g., “object detection is hot in retail apps”).
- Regulatory reporting: Automated detection eases the burden of answering “does this app use AI?” for compliance with emerging AI transparency regulations.
Limitations & Future Work
- LLM dependence: Accuracy hinges on the underlying language model’s knowledge; newer AI SDKs may be missed until the model is updated.
- Static‑only analysis: Dynamic loading or obfuscated code could evade detection; combining LLMAID with runtime monitoring is a promising direction.
- Knowledge‑base freshness: Maintaining an up‑to‑date repository of AI services and model formats is non‑trivial and requires community effort.
- Cross‑platform extension: The current implementation targets Android; adapting the pipeline to iOS or cross‑platform frameworks (Flutter, React Native) remains open.
Overall, LLMAID demonstrates that LLMs can serve as powerful assistants for large‑scale software intelligence tasks, opening the door to more transparent and secure AI ecosystems on mobile platforms.
Authors
- Pei Liu
- Terry Zhuo
- Jiawei Deng
- Thong James
- Shidong Pan
- Sherry Xu
- Zhenchang Xing
- Qinghua Lu
- Xiaoning Du
- Hongyu Zhang
Paper Information
- arXiv ID: 2511.19059v1
- Categories: cs.SE
- Published: November 24, 2025