Building an MCP Server for Linux Desktop GUI Automation on Wayland

Published: February 23, 2026 at 11:14 AM EST
Source: Dev.to

When working on AI‑agent tooling, I quickly ran into a hard limit: Wayland provides no clean way to automate GUI interactions. On X11, xdotool and a virtual display (DISPLAY=:99) were enough. Wayland’s security model, by contrast, blocks global input injection and screen capture unless the user explicitly authorizes them through a portal.

To solve this, I created kwin‑mcp, an MCP server that gives AI agents full GUI‑automation capabilities inside an isolated KWin Wayland session.

Problem Statement

  • Wayland blocks the automation patterns we took for granted on X11.
  • XDG RemoteDesktop portals require interactive user authorization, which is unusable for headless automation.
  • Most Wayland compositors expose no input‑injection API.

Architecture

kwin-mcp builds three layers of isolation for each session:

Layer | Description
Private D‑Bus bus (dbus-run-session) | Isolates the session from host services.
Virtual Wayland compositor (kwin_wayland --virtual) | Runs a KDE Plasma desktop entirely in memory, with no windows shown on the host display.
Scoped input injection (KWin’s EIS D‑Bus interface) | Keeps all input events confined to the isolated session.

The AI agent never touches the host desktop; it interacts only with the virtual KDE Plasma environment.
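
The first two layers can be expressed as a single launch command. The following is a minimal sketch, assuming the compositor is started as a child process; the helper name build_session_command, the default resolution, and the socket name are illustrative and not taken from kwin-mcp’s source. dbus-run-session and the kwin_wayland options --virtual, --width, --height, and --socket are existing command-line options.

```python
from typing import List


def build_session_command(
    width: int = 1280,                   # assumed default, not from kwin-mcp
    height: int = 800,                   # assumed default, not from kwin-mcp
    socket_name: str = "wayland-mcp-0",  # hypothetical socket name
) -> List[str]:
    """Return argv for an isolated, headless KWin session."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [
        # Layer 1: private session bus, shut down when the child exits.
        "dbus-run-session", "--",
        # Layer 2: KWin with the virtual (off-screen) backend.
        "kwin_wayland", "--virtual",
        "--width", str(width),
        "--height", str(height),
        # A dedicated socket keeps clients off the host compositor.
        "--socket", socket_name,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe anywhere.
    print(" ".join(build_session_command()))
```

The third layer (input injection over EIS) happens after startup through D‑Bus calls on the private bus, so it does not appear in the command line. Passing the list to subprocess.Popen would start the session on a machine with KDE Plasma 6 installed.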

MCP Tools

The server provides 29 tools grouped by category:

Category | Tools
Mouse | click, drag, scroll, move, button down/up
Keyboard | type (ASCII), type unicode, key press, key down/up
Touch | tap, swipe, pinch, multi‑finger swipe
Screen | screenshot, accessibility tree, find UI elements, wait for element
Session | start, stop, launch app, list/focus windows
System | clipboard get/set, D‑Bus calls, Wayland protocol info

Note: Screenshot capture runs at ~30‑70 ms per frame via KWin’s ScreenShot2 D‑Bus interface. Any action tool can accept a screenshot_after_ms parameter for burst‑frame capture without extra round‑trips.
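
To make the screenshot_after_ms idea concrete, here is a rough sketch of what a tool invocation might look like on the wire. MCP tool calls are JSON-RPC 2.0 requests with the method tools/call. The tool name mouse_click and the x and y arguments are hypothetical placeholders, and passing a list of delays to screenshot_after_ms is an assumption about its accepted type; check the tool schema the server reports for the real names.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "mouse_click", "x", and "y" are hypothetical names for illustration.
# screenshot_after_ms comes from the article; the list-of-delays value
# is an assumption about the type it accepts.
request = make_tool_call(
    1,
    "mouse_click",
    {"x": 640, "y": 400, "screenshot_after_ms": [0, 50, 100]},
)

if __name__ == "__main__":
    print(request)
```

Under these assumptions, one request both performs the click and asks for follow-up frames, which is where the saved round‑trips come from.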

Performance

  • Screen capture: ~30‑70 ms per frame.
  • Input injection: Directly routed through KWin’s EIS interface, avoiding XDG RemoteDesktop dialogs.

Compatibility

  • Compositor: Currently works only with KDE Plasma 6+ (KWin). Other compositors (GNOME, Sway, etc.) do not expose the required EIS interface.
  • Keyboard layout: Direct typing assumes a US QWERTY layout; Unicode input works via wtype.
  • Accessibility: AT‑SPI2 coverage varies by application.
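
The keyboard limitation suggests that a client should route text by content. The sketch below shows one way to do that, assuming that only printable ASCII (plus newline and tab) is safe for the direct typing path; the returned labels type_ascii and type_unicode are hypothetical stand‑ins for the two typing tools, not their confirmed names.

```python
def is_plain_ascii(text: str) -> bool:
    """True if every character is printable ASCII, a newline, or a tab."""
    return all(32 <= ord(ch) < 127 or ch in "\n\t" for ch in text)


def pick_typing_tool(text: str) -> str:
    """Choose a typing path for the given text.

    The returned labels are hypothetical; substitute the tool names
    that the server actually advertises.
    """
    return "type_ascii" if is_plain_ascii(text) else "type_unicode"


if __name__ == "__main__":
    for sample in ("hello world", "naïve", "日本語"):
        print(repr(sample), "->", pick_typing_tool(sample))
```

Checking for printable characters rather than calling str.isascii() keeps control characters such as escape or backspace out of the direct path, since those are not ordinary key presses.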

Installation

pip install kwin-mcp

Or, using uv:

uv tool install kwin-mcp
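
Since the server depends on external programs named earlier in this post, a quick preflight check can save some debugging after installation. This is a small sketch and not part of kwin-mcp; treating kwin_wayland and dbus-run-session as required and wtype as optional is my reading of the sections above, not a documented dependency list.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed classification, based on the tools this article mentions.
REQUIRED = ("kwin_wayland", "dbus-run-session")
OPTIONAL = ("wtype",)


def check_binaries(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Map each binary name to whether it was found on PATH."""
    return {name: which(name) is not None for name in REQUIRED + OPTIONAL}


def missing_required(found: Dict[str, bool]) -> List[str]:
    """List required binaries that were not found."""
    return [name for name in REQUIRED if not found.get(name, False)]


if __name__ == "__main__":
    found = check_binaries()
    for name, present in found.items():
        print(f"{name}: {'found' if present else 'missing'}")
    if missing_required(found):
        raise SystemExit("missing required binaries: " + ", ".join(missing_required(found)))
```

The which parameter exists only so the lookup can be swapped out in tests; by default it uses shutil.which.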

Claude Code MCP configuration

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "kwin-mcp"
    }
  }
}
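
If a configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The following sketch does that merge in Python. The file name .mcp.json in the usage part is an assumption about where a project‑scoped Claude Code configuration lives; adjust the path to wherever your configuration is stored.

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_mcp_server(config: Dict[str, Any], name: str, command: str) -> Dict[str, Any]:
    """Return a copy of `config` with one MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, command: str) -> None:
    """Read, merge, and rewrite a JSON config file (created if missing)."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = merge_mcp_server(config, name, command)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # ".mcp.json" is an assumed location, not taken from the article.
    update_config_file(Path(".mcp.json"), "kwin-mcp", "kwin-mcp")
```

Existing server entries are preserved, and running the script twice leaves the file unchanged.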

License

kwin-mcp is released under the MIT License and is available on GitHub and PyPI.
