Building an MCP Server for Linux Desktop GUI Automation on Wayland
Source: Dev.to
When working on AI‑agent tooling, I quickly ran into a hard limit: Wayland provides no clean way to automate GUI interactions. On X11, xdotool and a virtual display (DISPLAY=:99) covered this. Wayland’s security model, by contrast, blocks global input injection and screen capture unless a portal explicitly authorizes them.
To solve this, I created kwin‑mcp, an MCP server that gives AI agents full GUI‑automation capabilities inside an isolated KWin Wayland session.
Problem Statement
- Wayland blocks the automation patterns we took for granted on X11.
- XDG RemoteDesktop portals require interactive user authorization, which is unusable for headless automation.
- Most Wayland compositors expose no input‑injection API.
Architecture
kwin-mcp builds three layers of isolation for each session:
| Layer | Description |
|---|---|
| Private D‑Bus bus (dbus-run-session) | Isolates the session from host services. |
| Virtual Wayland compositor (kwin_wayland --virtual) | Runs a KDE Plasma desktop entirely in memory, with no windows shown on the host display. |
| Scoped input injection (KWin’s EIS D‑Bus interface) | Keeps all input events confined to the isolated session. |
The AI agent never touches the host desktop; it interacts only with the virtual KDE Plasma environment.
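To make the first two layers concrete, here is a minimal sketch of how such a session could be launched. It is not taken from the kwin-mcp source: the helper name `build_session_command`, the default resolution, and the socket name are assumptions for illustration. It only assembles the command line, so it runs on any machine.

```python
import shlex


def build_session_command(width: int = 1280, height: int = 800,
                          socket_name: str = "kwin-mcp-0") -> list[str]:
    """Return argv for a virtual KWin wrapped in a private D-Bus session.

    Hypothetical helper: the defaults are illustrative, not kwin-mcp's.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [
        # Layer 1: dbus-run-session starts a fresh session bus for the child.
        "dbus-run-session", "--",
        # Layer 2: KWin renders to a virtual output instead of real hardware.
        "kwin_wayland", "--virtual",
        "--width", str(width), "--height", str(height),
        # A dedicated Wayland socket keeps clients off the host compositor.
        "--socket", socket_name,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without KWin.
    print(shlex.join(build_session_command()))
```

Passing the resulting list to `subprocess.Popen` on a machine with KWin installed would start the isolated compositor. Applications launched with `WAYLAND_DISPLAY` set to the same socket name would then appear only inside it.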
MCP Tools
The server provides 29 tools grouped by category:
| Category | Tools |
|---|---|
| Mouse | click, drag, scroll, move, button down/up |
| Keyboard | type (ASCII), type unicode, key press, key down/up |
| Touch | tap, swipe, pinch, multi‑finger swipe |
| Screen | screenshot, accessibility tree, find UI elements, wait for element |
| Session | start, stop, launch app, list/focus windows |
| System | clipboard get/set, D‑Bus calls, Wayland protocol info |
Note: Screenshot capture runs at ~30‑70 ms per frame via KWin’s ScreenShot2 D‑Bus interface. Any action tool can accept a screenshot_after_ms parameter for burst‑frame capture without extra round‑trips.
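As a rough illustration of what a client sends, the sketch below builds an MCP `tools/call` request that piggybacks a screenshot on an action. The `tools/call` envelope follows the MCP specification. The tool name `mouse_click`, its `x`/`y` arguments, and the assumption that `screenshot_after_ms` takes a list of delays are guesses for the example, so check the server's tool schema for the real names and types.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids


def tool_call(name, arguments, screenshot_after_ms=None):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if screenshot_after_ms is not None:
        # Assumed shape: delays in milliseconds after the action completes.
        args["screenshot_after_ms"] = screenshot_after_ms
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }


if __name__ == "__main__":
    # Hypothetical tool name and coordinates.
    request = tool_call("mouse_click", {"x": 640, "y": 400},
                        screenshot_after_ms=[0, 100, 300])
    print(json.dumps(request, indent=2))
```

The point of the design is that the click and the follow-up frames travel in one request, so the agent does not pay a separate round‑trip for each screenshot.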
Performance
- Screen capture: ~30‑70 ms per frame.
- Input injection: Directly routed through KWin’s EIS interface, avoiding XDG RemoteDesktop dialogs.
Compatibility
- Compositor: Currently works only with KDE Plasma 6+ (KWin). Other compositors (GNOME, Sway, etc.) do not expose the required EIS interface.
- Keyboard layout: US QWERTY for direct typing (Unicode input via wtype works).
- Accessibility: AT‑SPI2 coverage varies by application.
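Given these constraints, a quick preflight check before starting a session can save some debugging. The sketch below only verifies that the two launcher binaries named in the Architecture section are on `PATH`. It does not confirm the Plasma version or that the EIS interface is available, and the helper name `missing_binaries` is made up for this example.

```python
import shutil
from typing import Callable, Optional

# Binaries named in the Architecture section.
REQUIRED_BINARIES = ("dbus-run-session", "kwin_wayland")


def missing_binaries(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required binaries that cannot be found on PATH.

    `which` is injectable so the check can be exercised without KWin.
    """
    return [name for name in REQUIRED_BINARIES if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Launcher binaries found; Plasma 6+ is still required.")
```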
Installation
pip install kwin-mcp
Or, using uv:
uv tool install kwin-mcp
Claude Code MCP configuration
{
  "mcpServers": {
    "kwin-mcp": {
      "command": "kwin-mcp"
    }
  }
}
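If you already have other MCP servers configured, you will want to merge this entry in rather than overwrite the file. A minimal sketch of that merge, assuming the config is a JSON object with a top‑level `mcpServers` key as shown above (the helper name `add_server` is hypothetical, and the location of the config file depends on your client):

```python
import json


def add_server(config: dict, name: str = "kwin-mcp",
               command: str = "kwin-mcp") -> dict:
    """Return a copy of `config` with one server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers[name] = {"command": command}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other-server" stands in for whatever is already configured.
    existing = {"mcpServers": {"other-server": {"command": "other"}}}
    print(json.dumps(add_server(existing), indent=2))
```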
License
kwin-mcp is released under the MIT License and is available on GitHub and PyPI.