[Paper] Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation
Source: arXiv - 2602.24108v1
Overview
The paper introduces LogiDroid, a two‑stage system that automatically generates functional UI tests for mobile apps by extracting business logic from unstructured requirements and adapting it to the app’s actual GUI. By bridging the gap between textual specifications and concrete UI interactions, LogiDroid dramatically reduces the manual effort traditionally required for functional testing.
Key Contributions
- Business‑logic extraction pipeline that retrieves and fuses relevant requirement cases from a curated dataset, turning vague natural‑language specs into structured logic fragments.
- Context‑aware test generation engine that jointly reasons over the extracted logic and the live GUI hierarchy to produce end‑to‑end test scripts with built‑in assertions.
- Empirical evaluation on two large public benchmarks (FrUITeR & Lin) covering 28 real‑world Android apps and 190 functional requirements, showing up to a 55 % improvement over the strongest baselines.
- Open‑source artifacts (dataset, implementation, and evaluation scripts) to enable reproducibility and further research.
Methodology
LogiDroid works in two sequential phases:
1. Knowledge Retrieval & Fusion
- A searchable repository of previously annotated requirement–test pairs is built.
- Given a new functional requirement, the system uses semantic similarity (e.g., BERT embeddings) to pull the most relevant cases.
- Extracted logic fragments (pre‑conditions, actions, expected outcomes) are merged via a rule‑based fusion step, yielding a concise, structured representation of the intended business behavior.
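The retrieval-and-fusion stage described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the corpus contents, the trivial `embed()` stand-in (a real system would use BERT-style sentence embeddings), and the fusion rules are all assumptions.

```python
import numpy as np

# Hypothetical mini-corpus of annotated requirement–logic pairs.
CORPUS = [
    ("user adds an item to the shopping cart",
     {"pre": ["logged in"], "actions": ["tap add-to-cart"], "expect": ["cart count +1"]}),
    ("user removes an item from the cart",
     {"pre": ["cart non-empty"], "actions": ["tap remove"], "expect": ["cart count -1"]}),
]

def embed(text: str) -> np.ndarray:
    """Stand-in for a sentence encoder (e.g., BERT embeddings).
    Here: a normalized bag-of-letters vector, just to keep the sketch runnable."""
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(requirement: str, k: int = 1):
    """Return the k corpus cases most similar to the new requirement (cosine)."""
    q = embed(requirement)
    scored = [(float(q @ embed(text)), frag) for text, frag in CORPUS]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [frag for _, frag in scored[:k]]

def fuse(fragments):
    """Rule-based fusion: union pre-conditions, actions, and expected outcomes."""
    merged = {"pre": [], "actions": [], "expect": []}
    for frag in fragments:
        for key in merged:
            merged[key] += [x for x in frag[key] if x not in merged[key]]
    return merged

logic = fuse(retrieve("add a product to the cart", k=1))
print(logic)
```

The result is the kind of structured (pre-conditions, actions, expected outcomes) representation the second phase consumes.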
2. Context‑Aware Test Generation
- The structured logic is fed into a graph‑based analyzer that inspects the current app’s UI hierarchy (via UIAutomator/AccessibilityNodeInfo).
- A constraint‑solving module maps abstract actions (e.g., “select a product”) to concrete widget IDs, taking into account runtime state (visibility, enabled/disabled).
- Finally, a test script (e.g., Espresso or Appium) is emitted, embedding verification assertions derived from the expected outcomes in the logic.
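The mapping from abstract actions to concrete widgets can be sketched like this. The UI-hierarchy shape, the substring-matching heuristic, and the emitted Appium-style line are simplified assumptions standing in for the paper's graph-based analyzer and constraint solver.

```python
# Hypothetical snapshot of a UI hierarchy (as UIAutomator might expose it).
UI_TREE = {
    "id": "root", "text": "", "visible": True, "enabled": True,
    "children": [
        {"id": "btn_add_to_cart", "text": "Add to cart",
         "visible": True, "enabled": True, "children": []},
        {"id": "btn_checkout", "text": "Checkout",
         "visible": True, "enabled": False, "children": []},  # disabled: not actionable
    ],
}

def find_widget(node, phrase):
    """Depth-first search for a visible, enabled widget whose text matches
    the abstract action phrase (here: simple substring matching)."""
    if node["visible"] and node["enabled"] and phrase.lower() in node["text"].lower():
        return node
    for child in node["children"]:
        hit = find_widget(child, phrase)
        if hit:
            return hit
    return None

def emit_step(action_phrase, expected):
    """Turn one abstract action plus its expected outcome into an
    Appium-style script line with the assertion noted alongside it."""
    widget = find_widget(UI_TREE, action_phrase)
    if widget is None:
        raise LookupError(f"no actionable widget for: {action_phrase}")
    return (f'driver.find_element(By.ID, "{widget["id"]}").click()  '
            f'# assert: {expected}')

print(emit_step("add to cart", "cart count increases by 1"))
```

Because the widget lookup happens against the runtime hierarchy rather than hard-coded locators, the same abstract action can resolve to different widgets on different screens.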
The design deliberately separates what the app should do (business logic) from how it can be done on a particular screen, allowing the same logic to be reused across UI variations.
Results & Findings
| Dataset | Requirements Covered | Improvement vs. SOTA |
|---|---|---|
| FrUITeR | 40 % (≈ 38/95) | +48 % |
| Lin | 65 % (≈ 62/95) | +55 % |
- Coverage boost: LogiDroid consistently reaches more functional requirements than prior approaches (e.g., EvoSuite‑Mobile, Randoop‑Android).
- Assertion quality: Over 90 % of generated assertions correctly capture the intended post‑conditions, reducing false positives in CI pipelines.
- Runtime overhead: Test generation adds ~2 seconds per requirement, negligible compared with manual test authoring time (minutes to hours).
These numbers indicate that a large portion of functional intent can be automatically turned into runnable tests without hand‑crafting UI locators.
Practical Implications
- Accelerated QA cycles: Teams can feed user stories or requirement tickets directly into LogiDroid and obtain ready‑to‑run UI tests, cutting down the “test‑writing” bottleneck.
- Cross‑device resilience: Because the UI mapping is performed at runtime, generated tests adapt to different screen sizes, orientations, and even minor UI redesigns, lowering maintenance costs.
- Continuous integration ready: The generated Espresso/Appium scripts integrate seamlessly with existing CI pipelines (GitHub Actions, Jenkins), providing immediate regression feedback on functional behavior.
- Domain knowledge reuse: Organizations can seed LogiDroid with their own internal requirement–test corpus, allowing the system to learn company‑specific terminology and patterns, further improving relevance.
Limitations & Future Work
- Requirement quality dependence: The extraction stage assumes reasonably well‑formed natural‑language specs; highly ambiguous or incomplete requirements degrade performance.
- Android‑centric: The current implementation targets Android UI frameworks; extending to iOS or cross‑platform frameworks (Flutter, React Native) would require additional UI adapters.
- Scalability of the knowledge base: As the repository grows, retrieval latency may increase; future work could explore more efficient indexing or incremental learning.
- Dynamic content handling: Tests involving data‑driven screens (e.g., infinite scrolling lists) are only partially supported; integrating model‑based exploration could close this gap.
Overall, LogiDroid demonstrates a promising direction for turning textual business intent into actionable, maintainable functional tests, paving the way for more automated quality assurance in mobile development.
Authors
- Yakun Zhang
- Zihan Wang
- Xinzhi Peng
- Zihao Xie
- Xiaodong Wang
- Xutao Li
- Dan Hao
- Lu Zhang
- Yunming Ye
Paper Information
- arXiv ID: 2602.24108v1
- Categories: cs.SE
- Published: February 27, 2026