[Paper] Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation
Source: arXiv - 2602.24108v1
Overview
The paper introduces LogiDroid, a two‑stage system that automatically generates functional UI tests for mobile apps by extracting business logic from unstructured requirements and adapting it to the app’s actual GUI. By bridging the gap between textual specifications and concrete UI interactions, LogiDroid dramatically reduces the manual effort traditionally required for functional testing.
Key Contributions
- Business‑logic extraction pipeline that retrieves and fuses relevant requirement cases from a curated dataset, turning vague natural‑language specs into structured logic fragments.
- Context‑aware test generation engine that jointly reasons over the extracted logic and the live GUI hierarchy to produce end‑to‑end test scripts with built‑in assertions.
- Empirical evaluation on two large public benchmarks (FrUITeR & Lin) covering 28 real‑world Android apps and 190 functional requirements, showing up to a 55 % improvement over the strongest baselines.
- Open‑source artifacts (dataset, implementation, and evaluation scripts) to enable reproducibility and further research.
Methodology
LogiDroid works in two sequential phases:
1. Knowledge Retrieval & Fusion
- A searchable repository of previously annotated requirement–test pairs is built.
- Given a new functional requirement, the system uses semantic similarity (e.g., BERT embeddings) to pull the most relevant cases.
- Extracted logic fragments (pre‑conditions, actions, expected outcomes) are merged via a rule‑based fusion step, yielding a concise, structured representation of the intended business behavior.
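The retrieval-and-fusion stage described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the corpus contents, the trivial `embed()` stand-in (a real system would use BERT-style sentence embeddings), and the fusion rules are all assumptions.

```python
import numpy as np

# Hypothetical mini-corpus of annotated requirement–logic pairs.
CORPUS = [
    ("user adds an item to the shopping cart",
     {"pre": ["logged in"], "actions": ["tap add-to-cart"], "expect": ["cart count +1"]}),
    ("user removes an item from the cart",
     {"pre": ["cart non-empty"], "actions": ["tap remove"], "expect": ["cart count -1"]}),
]

def embed(text: str) -> np.ndarray:
    """Stand-in for a sentence encoder (e.g., BERT embeddings).
    Here: a normalized bag-of-letters vector, just to keep the sketch runnable."""
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(requirement: str, k: int = 1):
    """Return the k corpus cases most similar to the new requirement (cosine)."""
    q = embed(requirement)
    scored = [(float(q @ embed(text)), frag) for text, frag in CORPUS]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [frag for _, frag in scored[:k]]

def fuse(fragments):
    """Rule-based fusion: union pre-conditions, actions, and expected outcomes."""
    merged = {"pre": [], "actions": [], "expect": []}
    for frag in fragments:
        for key in merged:
            merged[key] += [x for x in frag[key] if x not in merged[key]]
    return merged

logic = fuse(retrieve("add a product to the cart", k=1))
print(logic)
```

The result is the kind of structured (pre-conditions, actions, expected outcomes) representation the second phase consumes.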
2. Context‑Aware Test Generation
- The structured logic is fed into a graph‑based analyzer that inspects the current app’s UI hierarchy (via UIAutomator/AccessibilityNodeInfo).
- A constraint‑solving module maps abstract actions (e.g., “select a product”) to concrete widget IDs, taking into account runtime state (visibility, enabled/disabled).
- Finally, a test script (e.g., Espresso or Appium) is emitted, embedding verification assertions derived from the expected outcomes in the logic.
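The mapping from abstract actions to concrete widgets can be sketched like this. The UI-hierarchy shape, the substring-matching heuristic, and the emitted Appium-style line are simplified assumptions standing in for the paper's graph-based analyzer and constraint solver.

```python
# Hypothetical snapshot of a UI hierarchy (as UIAutomator might expose it).
UI_TREE = {
    "id": "root", "text": "", "visible": True, "enabled": True,
    "children": [
        {"id": "btn_add_to_cart", "text": "Add to cart",
         "visible": True, "enabled": True, "children": []},
        {"id": "btn_checkout", "text": "Checkout",
         "visible": True, "enabled": False, "children": []},  # disabled: not actionable
    ],
}

def find_widget(node, phrase):
    """Depth-first search for a visible, enabled widget whose text matches
    the abstract action phrase (here: simple substring matching)."""
    if node["visible"] and node["enabled"] and phrase.lower() in node["text"].lower():
        return node
    for child in node["children"]:
        hit = find_widget(child, phrase)
        if hit:
            return hit
    return None

def emit_step(action_phrase, expected):
    """Turn one abstract action plus its expected outcome into an
    Appium-style script line with the assertion noted alongside it."""
    widget = find_widget(UI_TREE, action_phrase)
    if widget is None:
        raise LookupError(f"no actionable widget for: {action_phrase}")
    return (f'driver.find_element(By.ID, "{widget["id"]}").click()  '
            f'# assert: {expected}')

print(emit_step("add to cart", "cart count increases by 1"))
```

Because the widget lookup happens against the runtime hierarchy rather than hard-coded locators, the same abstract action can resolve to different widgets on different screens.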
The design deliberately separates what the app should do (business logic) from how it can be done on a particular screen, allowing the same logic to be reused across UI variations.
Results & Findings
| Dataset | Requirements Covered | Improvement vs. SOTA |
|---|---|---|
| FrUITeR | 40 % (≈ 38/95) | +48 % |
| Lin | 65 % (≈ 62/95) | +55 % |
- Coverage boost: LogiDroid consistently reaches more functional requirements than prior approaches (e.g., EvoSuite‑Mobile, Randoop‑Android).
- Assertion quality: Over 90 % of generated assertions correctly capture the intended post‑conditions, reducing false positives in CI pipelines.
- Runtime overhead: Test generation adds ~2 seconds per requirement, negligible compared with manual test authoring time (minutes to hours).
These numbers indicate that a large portion of functional intent can be automatically turned into runnable tests without hand‑crafting UI locators.
Practical Implications
- Accelerated QA cycles: Teams can feed user stories or requirement tickets directly into LogiDroid and obtain ready‑to‑run UI tests, cutting down the “test‑writing” bottleneck.
- Cross‑device resilience: Because the UI mapping is performed at runtime, generated tests adapt to different screen sizes, orientations, and even minor UI redesigns, lowering maintenance costs.
- Continuous integration ready: The generated Espresso/Appium scripts integrate seamlessly with existing CI pipelines (GitHub Actions, Jenkins), providing immediate regression feedback on functional behavior.
- Domain knowledge reuse: Organizations can seed LogiDroid with their own internal requirement–test corpus, allowing the system to learn company‑specific terminology and patterns, further improving relevance.
Limitations & Future Work
- Requirement quality dependence: The extraction stage assumes reasonably well‑formed natural‑language specs; highly ambiguous or incomplete requirements degrade performance.
- Android‑centric: The current implementation targets Android UI frameworks; extending to iOS or cross‑platform frameworks (Flutter, React Native) would require additional UI adapters.
- Scalability of the knowledge base: As the repository grows, retrieval latency may increase; future work could explore more efficient indexing or incremental learning.
- Dynamic content handling: Tests involving data‑driven screens (e.g., infinite scrolling lists) are only partially supported; integrating model‑based exploration could close this gap.
Overall, LogiDroid demonstrates a promising direction for turning textual business intent into actionable, maintainable functional tests, paving the way for more automated quality assurance in mobile development.
Authors
- Yakun Zhang
- Zihan Wang
- Xinzhi Peng
- Zihao Xie
- Xiaodong Wang
- Xutao Li
- Dan Hao
- Lu Zhang
- Yunming Ye
Paper Information
- arXiv ID: 2602.24108v1
- Categories: cs.SE
- Published: February 27, 2026