The Missing Control Plane for Local AI Agents
Source: Dev.to
The Problem with Current Mobile AI Agents
- Model‑first focus: Discussions often center on model size, cost, or context window, assuming the agent already has a way to interact with the OS.
- Platform restrictions:
  - iOS sandboxing blocks one app from controlling another.
  - Android Accessibility Services are heavyweight, require alarming permissions, and offer only limited input-synthesis capabilities.
- Result: Even a powerful on‑device model can’t open Maps, tap “Confirm”, or type a message because it has no hands.
What a Control Plane Provides
A control plane sits underneath the model and handles:
- Observation – capture screen state, UI hierarchy, current activity, foreground app.
- Execution – perform discrete actions such as tap, type, swipe, draw, key event, or launch an app.
- Feedback – report what changed after each action so the model can adjust its next step.
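These three responsibilities can be sketched as a minimal interface. This is a hypothetical illustration, not Drengr's actual API; every name below (`Observation`, `ActionResult`, `ControlPlane`) is an assumption for clarity.

```python
from dataclasses import dataclass

# Hypothetical sketch of a control plane's three responsibilities.
# None of these names come from Drengr itself.

@dataclass
class Observation:
    screenshot_path: str   # captured screen state
    ui_tree: dict          # UI hierarchy / accessibility tree
    foreground_app: str    # current activity

@dataclass
class ActionResult:
    success: bool
    state_diff: str        # feedback: what changed after the action

class ControlPlane:
    def observe(self) -> Observation:
        """Observation: capture screen, UI hierarchy, foreground app."""
        raise NotImplementedError

    def execute(self, action: dict) -> ActionResult:
        """Execution: perform a discrete action (tap, type, swipe, launch)."""
        raise NotImplementedError
```

The key design point is the `state_diff` in the result: execution and feedback travel together, so the model always learns what its last action changed.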
Drengr: One Implementation of the Control Plane
Drengr exposes three simple MCP (Model Context Protocol) tools that any AI client supporting the protocol can use (e.g., Claude Desktop, Cursor, Windsurf).
| Tool | Purpose |
|---|---|
| drengr_look | Observe the current screen + UI tree |
| drengr_do | Execute a tap / type / swipe / … |
| drengr_query | Read structured data (devices, activity, crashes) |
These three verbs replace fragile selectors, XPath gymnastics, and constantly running Appium daemons.
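A tool call in this model is just a small structured payload. The shape below is purely illustrative; the argument names (`action`, `x`, `y`) are assumptions, not Drengr's documented schema.

```python
import json

# Illustrative only: field names here are assumed, not taken from
# Drengr's actual tool schema.
tool_call = {
    "tool": "drengr_do",
    "arguments": {"action": "tap", "x": 540, "y": 1200},
}

# An MCP client would serialize this and send it to the server.
payload = json.dumps(tool_call)
```

Because the surface area is only three verbs, the model never needs to learn app-specific selectors; it describes intent and coordinates against what it just observed.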
Runtime Architecture
- Single static Rust binary – drives the device via native channels (ADB on Android, WDA on iOS simulators).
- Cross‑platform abstraction – the same binary works for both Android and iOS without extra dependencies.
The Agent Loop in Practice
1. Observation (drengr_look) – Drengr captures a screenshot, dumps the UI tree, and builds a compact text description (~300 tokens vs ~100 KB for an image).
2. Decision – The model processes the description and returns a JSON envelope describing the desired action.
3. Execution & Feedback (drengr_do) – Drengr performs the action, generates a situation report (a diff against the previous state), and feeds it back to the model for the next iteration.
The situation report is the part most frameworks miss; without it, the model is blind between steps and may over‑act (e.g., repeatedly tapping a dead button).
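The loop above can be sketched in a few lines. The wrappers `look()`, `do()`, and `ask_model()` are hypothetical stubs standing in for the real tool calls and model request; none of them are actual Drengr or MCP client APIs.

```python
# Observe-decide-act loop with situation-report feedback.
# All three helpers are stubs for illustration, not real Drengr APIs.

def look() -> str:
    """Stub: would call drengr_look and return a ~300-token screen description."""
    return "home screen, Maps icon at (120, 640)"

def ask_model(goal: str, screen: str, sitrep: str) -> dict:
    """Stub: would send goal + observation + last sitrep to the model,
    which replies with a JSON action envelope."""
    return {"type": "done"} if sitrep else {"type": "tap", "x": 120, "y": 640}

def do(action: dict) -> str:
    """Stub: would call drengr_do and return a situation report (state diff)."""
    return f"performed {action['type']}; Maps opened"

def run_agent(goal: str, max_steps: int = 10) -> bool:
    sitrep = ""  # feedback from the previous action; empty on the first step
    for _ in range(max_steps):
        screen = look()                           # 1. Observation
        action = ask_model(goal, screen, sitrep)  # 2. Decision
        if action.get("type") == "done":
            return True
        sitrep = do(action)                       # 3. Execution & feedback
    return False
```

Note that `sitrep` is threaded back into the next model call: without that diff, the model cannot tell whether its last tap worked, which is exactly how agents end up hammering a dead button.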
Why a Local Control Plane Is Essential
| Concern | Why cloud‑only assistants struggle |
|---|---|
| Latency | A two‑second round‑trip feels broken when you’re holding the phone. |
| Privacy | Banking, health, and messaging data should stay on‑device. |
| Network independence | Subways, airplanes, or spotty Wi‑Fi shouldn’t cripple the assistant. |
As on‑device models become ubiquitous, the control plane must also run locally. Drengr’s static binary design reflects this requirement.
Real‑World Use Cases
With the three tools above, an on‑device agent can:
- Open Photos, find recent pictures, and attach them to a WhatsApp message.
- Monitor a flight‑booking app for price drops and automatically rebook.
- Operate a banking app via screen‑sharing for low‑vision users.
- Perform the long tail of tasks people normally ask a human assistant to do on their phone.
These scenarios need hands‑and‑eyes infrastructure, not new model capabilities.
Getting Started with Drengr
Drengr is free to use. Install and verify it in two commands:
```shell
# Install via Claude Code (or run directly)
claude mcp add drengr -- npx -y drengr mcp

# Verify the installation
drengr doctor
```
Point your AI agent at the running Drengr instance, and watch the model act with real hands.
The Rust implementation was a deliberate choice—see the separate post for details.