[Paper] XTrace: A Non-Invasive Dynamic Tracing Framework for Android Applications in Production
Source: arXiv - 2512.21555v1
Overview
Mobile apps are growing more complex while the Android ecosystem fragments across countless device models and OS versions. Static logging and post‑crash analysis cannot capture the fleeting “ghost bugs” that appear only under very specific runtime conditions. XTrace addresses this gap with a non‑invasive, production‑ready dynamic tracing framework that can instrument an Android app on the fly, with no new release, no modification of VM internals, and low overhead (under 7 ms of added startup latency and under 0.01 ms per intercepted method call).
Key Contributions
- Non‑invasive proxying model – intercepts method calls without altering ART’s internal data structures, preserving VM stability.
- Leverages built‑in ART instrumentation – builds on Android’s native tracing hooks, then optimizes them for ultra‑low overhead.
- Production‑grade performance – < 7 ms added startup latency, < 0.01 ms per intercepted method call, validated on devices running Android 5.0 through 15+.
- Large‑scale real‑world evaluation – deployed in a ByteDance app with hundreds of millions of daily active users; A/B tests showed no statistically significant impact on crash or ANR rates.
- Effective root‑cause localization – helped engineers diagnose more than 11 severe crashes and multiple performance bottlenecks, cutting investigation time by > 90 %.
Methodology
- Instrumentation Hook Selection – XTrace taps into ART’s stable “method entry/exit” callbacks that are already part of the VM’s debugging interface.
- Proxy Layer Injection – Instead of rewriting method bodies, XTrace inserts a thin proxy that forwards calls to the original implementation while logging contextual data (thread ID, arguments, timestamps, etc.).
- Dynamic Configuration – Developers specify target classes/methods via a JSON‑based policy that can be updated at runtime; the framework loads these policies without restarting the app.
- Performance Optimizations
  - Batching of trace records to reduce JNI crossing costs.
  - Selective sampling to avoid tracing high‑frequency methods unless explicitly requested.
  - Lock‑free buffers for concurrent writes from multiple threads.
- Safety Guardrails – A watchdog monitors CPU/memory impact; if thresholds are breached, XTrace automatically throttles or disables tracing for the offending component.
The whole pipeline runs entirely in user space, so it can be shipped as an optional SDK or side‑loaded, leaving the original APK untouched.
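The batching, sampling, and guardrail ideas from the Methodology list can be combined in one small buffer, sketched below under several assumptions. Python has no true lock‑free primitives, so per‑thread batches (`threading.local`) stand in for the lock‑free buffers, and a simple records‑per‑window budget stands in for the CPU/memory watchdog. The `TraceBuffer` class and all of its parameters are hypothetical.

```python
import random
import threading


class TraceBuffer:
    """Per-thread batches, sampled recording, and a simple overhead guardrail."""

    def __init__(self, flush, batch_size=64, sample_rate=1.0,
                 max_records_per_window=10_000, rng=None):
        self._flush = flush              # callable that receives a list of records
        self._batch_size = batch_size
        self._sample_rate = sample_rate  # fraction of events to keep, 0.0 to 1.0
        self._max = max_records_per_window
        self._rng = rng or random.Random()
        self._local = threading.local()  # each thread appends to its own list
        self._window_count = 0           # approximate: races only blur the budget
        self.enabled = True

    def record(self, event):
        """Try to record an event; return True if it was buffered."""
        if not self.enabled:
            return False
        if self._rng.random() >= self._sample_rate:
            return False                 # dropped by sampling
        self._window_count += 1
        if self._window_count > self._max:
            self.enabled = False         # guardrail: stop tracing for this window
            return False
        batch = getattr(self._local, "batch", None)
        if batch is None:
            batch = self._local.batch = []
        batch.append(event)
        if len(batch) >= self._batch_size:
            self.flush_current_thread()  # one hand-off per batch, not per event
        return True

    def flush_current_thread(self):
        """Hand the calling thread's pending records to the flush callback."""
        batch = getattr(self._local, "batch", None)
        if batch:
            self._flush(list(batch))
            batch.clear()

    def reset_window(self):
        """Called periodically (for example by a timer) to re-enable tracing."""
        self._window_count = 0
        self.enabled = True
```

Flushing whole batches is what amortizes the cost of crossing an expensive boundary, which in XTrace's case is JNI. A production guardrail would measure actual CPU and memory use rather than counting records.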
Results & Findings
| Metric | Baseline | XTrace‑enabled | Δ |
|---|---|---|---|
| Startup latency | 1.2 s | 1.207 s | +7 ms |
| Per‑method call overhead | 0 ms | 0.009 ms | <0.01 ms |
| Crash User Rate (CUR) | 0.12 % | 0.119 % | p > 0.05 |
| ANR rate | 0.03 % | 0.031 % | p > 0.05 |
| Root‑cause localization time | ~4 h per incident | ~20 min per incident | > 90 % reduction |
- Stability: No statistically significant increase in crash or ANR rates across Android 5.0–15+.
- Coverage: Able to instrument > 95 % of methods in the target app, even on heavily obfuscated code.
- Diagnostic ROI: The severe crashes diagnosed (more than 11) were previously invisible to static logs; fixing them reduced overall crash volume by ~2 %.
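As a quick sanity check on the table, the relative changes can be recomputed from the reported figures. The inputs below are the table's rounded values (with "~4 h" taken as 240 minutes), so the outputs are approximate, and the helper name `relative_change_pct` is only illustrative.

```python
def relative_change_pct(baseline, measured):
    """Percentage change of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline * 100.0


# Startup latency: 1.2 s -> 1.207 s, expressed in milliseconds.
startup_pct = relative_change_pct(1200, 1207)

# Root-cause localization: ~4 h (240 min) -> ~20 min per incident.
localization_pct = relative_change_pct(240, 20)

# How many intercepted calls add up to 1 ms at 0.009 ms per call.
calls_per_added_ms = int(1 / 0.009)

print(f"startup latency change: {startup_pct:+.2f} %")          # +0.58 %
print(f"localization time change: {localization_pct:+.2f} %")   # -91.67 %
print(f"intercepted calls per added ms: {calls_per_added_ms}")  # 111
```

The roughly 92 % drop in localization time is consistent with the "> 90 % reduction" claim, and the startup cost is well under 1 % of the baseline.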
Practical Implications
- Instant observability: Ops teams can turn on tracing for a suspect feature in minutes, without waiting for a new build or OTA rollout.
- Cost‑effective debugging: Avoids heavyweight instrumentation frameworks (e.g., Frida, Xposed) that typically require root access or custom ROMs.
- Continuous performance monitoring: Developers can sample high‑frequency UI paths in production to spot latency spikes before they affect users.
- Compliance & privacy: Since XTrace runs entirely on the device and only logs developer‑specified data, it aligns with GDPR‑style data‑minimization policies.
- SDK integration: The framework can be packaged as a drop‑in library for any Android project, offering a “debug‑mode” toggle that can be remotely activated via feature flags.
In short, XTrace gives engineering teams a production‑grade “debug console” that works across the fragmented Android landscape without sacrificing user experience.
Limitations & Future Work
- Instrumentation Scope: While XTrace can proxy Java/Kotlin methods, native (JNI/C++) calls remain opaque and require separate tooling.
- Policy Management Overhead: Large, complex tracing policies can become hard to maintain; the authors suggest building higher‑level DSLs or UI tools.
- Battery Impact: Continuous high‑frequency tracing could increase wake‑locks; future work includes adaptive sampling based on device power state.
- Security Sandbox: The current design assumes trusted policy delivery; extending the framework with signed policy verification would harden it against malicious injection.
The authors plan to explore deeper integration with Android’s “Dynamic Feature Modules” and to open‑source a lightweight policy editor to broaden community adoption.
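The signed‑policy idea mentioned under Limitations could look roughly like the sketch below. The summary does not say which scheme the authors have in mind, so this is only one possible approach. It uses an HMAC with a shared key from Python's standard library as a stand‑in for a real public‑key signature, and the function names `sign_policy` and `load_verified_policy` are hypothetical.

```python
import hashlib
import hmac
import json


def sign_policy(policy_bytes, key):
    """Compute a hex HMAC-SHA256 tag over the raw policy bytes (server side)."""
    return hmac.new(key, policy_bytes, hashlib.sha256).hexdigest()


def load_verified_policy(policy_bytes, signature_hex, key):
    """Parse the policy only if its tag matches; otherwise refuse to load it."""
    expected = sign_policy(policy_bytes, key)
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("policy signature mismatch; refusing to load")
    return json.loads(policy_bytes)
```

A shared key embedded in an app can be extracted, so a production design would more likely verify an asymmetric signature against a public key shipped with the SDK. The control flow (verify first, parse second) would stay the same.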
Authors
- Qi Hu
- Jiangchao Liu
- Xin Yu
- Lin Zhang
- Edward Jiang
Paper Information
- arXiv ID: 2512.21555v1
- Categories: cs.SE
- Published: December 25, 2025