A Rust library for headless GUI application testing on Wayland. Launches apps in isolated compositor sessions, interacts with them via AT-SPI accessibility APIs, and captures screenshots and WebM video via PipeWire.
The repo also contains waydriver-mcp, a standalone MCP server binary built on top of the library that lets AI assistants drive GTK4 apps directly — see MCP server below.
The clip below is the full output of crates/waydriver-examples/examples/gnome_calculator.rs, runnable with cargo run -p waydriver-examples --example gnome_calculator. Read the source for the API surface in context — it covers a session lifecycle, AT-SPI button clicks, keyboard chord dispatch (Shift+9/Shift+0 for parens), a typed unit conversion, and per-step result verification via XPath locators. The recording is captured by waydriver itself via PipeWire.
gnome-calculator-demo.webm
Each test session creates an isolated environment with a headless compositor, input injection, and screen capture:
graph TD
subgraph Session["Per-session processes"]
dbus["dbus-daemon (private)"]
dbus --- mutter["Mutter --headless --wayland"]
mutter --- screencast["ScreenCast API (screenshots)"]
mutter --- remotedesktop["RemoteDesktop API (input)"]
dbus --- pipewire["PipeWire (frame capture)"]
dbus --- wireplumber["WirePlumber (PipeWire graph manager)"]
app["Your app (on Mutter's Wayland display)"]
app --- atspi["AT-SPI (accessibility tree, actions)"]
end
The library is backend-agnostic. Three traits define the interface:
- CompositorRuntime: lifecycle of a headless compositor (start, stop, expose Wayland display)
- InputBackend: keyboard and pointer injection
- CaptureBackend: screen capture (start/stop PipeWire streams, grab PNG frames)
Concrete implementations are separate crates. The trait-based design allows backends to be added as sibling crates without changing the core.
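To make the backend split concrete, here is a minimal sketch of the idea with in-memory fakes. The trait names come from the list above, but every method signature, the `Fake*` types, and `press_then_screenshot` are hypothetical simplifications (synchronous, `String` errors); the real traits are async and will differ.

```rust
// Hypothetical, simplified stand-ins for the three backend traits.
pub trait CompositorRuntime {
    fn start(&mut self) -> Result<(), String>;
    fn stop(&mut self) -> Result<(), String>;
    /// Wayland display name, available only while running.
    fn wayland_display(&self) -> Option<&str>;
}

pub trait InputBackend {
    fn press_keysym(&mut self, keysym: u32) -> Result<(), String>;
}

pub trait CaptureBackend {
    fn grab_png(&mut self) -> Result<Vec<u8>, String>;
}

/// The session only holds trait objects, so supporting another compositor
/// means adding new impls, not editing this struct.
pub struct Session {
    compositor: Box<dyn CompositorRuntime>,
    input: Box<dyn InputBackend>,
    capture: Box<dyn CaptureBackend>,
}

impl Session {
    pub fn start(
        mut compositor: Box<dyn CompositorRuntime>,
        input: Box<dyn InputBackend>,
        capture: Box<dyn CaptureBackend>,
    ) -> Result<Self, String> {
        compositor.start()?;
        Ok(Session { compositor, input, capture })
    }

    pub fn display(&self) -> Option<&str> {
        self.compositor.wayland_display()
    }

    /// Inject one key press, then grab a frame.
    pub fn press_then_screenshot(&mut self, keysym: u32) -> Result<Vec<u8>, String> {
        self.input.press_keysym(keysym)?;
        self.capture.grab_png()
    }

    pub fn kill(mut self) -> Result<(), String> {
        self.compositor.stop()
    }
}

// In-memory fakes standing in for a real backend crate.
#[derive(Default)]
pub struct FakeCompositor {
    running: bool,
}

impl CompositorRuntime for FakeCompositor {
    fn start(&mut self) -> Result<(), String> {
        self.running = true;
        Ok(())
    }
    fn stop(&mut self) -> Result<(), String> {
        self.running = false;
        Ok(())
    }
    fn wayland_display(&self) -> Option<&str> {
        if self.running { Some("fake-0") } else { None }
    }
}

#[derive(Default)]
pub struct FakeInput {
    pub pressed: Vec<u32>,
}

impl InputBackend for FakeInput {
    fn press_keysym(&mut self, keysym: u32) -> Result<(), String> {
        self.pressed.push(keysym);
        Ok(())
    }
}

pub struct FakeCapture;

impl CaptureBackend for FakeCapture {
    fn grab_png(&mut self) -> Result<Vec<u8>, String> {
        // First four bytes of the PNG signature; not a full image.
        Ok(vec![0x89, b'P', b'N', b'G'])
    }
}
```

A KWin or Sway backend would follow the same pattern: three impls in sibling crates, passed to the session as boxed trait objects.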
| Feature | Mutter | KWin | Sway |
|---|---|---|---|
| Headless compositor | Yes | — | — |
| Keyboard input | Yes (RemoteDesktop) | — | — |
| Pointer input | Yes (RemoteDesktop) | — | — |
| Screenshots | Yes (ScreenCast + PipeWire) | — | — |
| Video recording (WebM/VP8) | Yes (ScreenCast + PipeWire) | — | — |
| AT-SPI (UI inspection, clicks) | Yes | — | — |
Currently only Mutter is implemented (waydriver-compositor-mutter, waydriver-input-mutter, waydriver-capture-mutter). Each compositor has its own APIs (Mutter uses org.gnome.Mutter.* D-Bus interfaces, KWin has org.kde.KWin.*, Sway uses wlroots Wayland protocols), so each would need its own set of backend crates.
| Crate | Purpose |
|---|---|
| waydriver | Trait definitions, Session, AT-SPI client, keysym helpers, shared GStreamer capture helper |
| waydriver-compositor-mutter | CompositorRuntime impl: manages Mutter, PipeWire, WirePlumber, private D-Bus |
| waydriver-input-mutter | InputBackend impl: keyboard/pointer via Mutter RemoteDesktop |
| waydriver-capture-mutter | CaptureBackend impl: screenshots via Mutter ScreenCast + PipeWire |
| waydriver-mcp | Binary: MCP JSON-RPC server over stdio that exposes the library to AI assistants |
use std::sync::Arc;
use waydriver::{Session, SessionConfig, CompositorRuntime};
use waydriver_compositor_mutter::MutterCompositor;
use waydriver_input_mutter::MutterInput;
use waydriver_capture_mutter::MutterCapture;
let mut compositor = MutterCompositor::new();
compositor.start(None).await?;
// `state()` is `Option`; immediately after a successful `start()` it is
// always `Some` — `expect` documents that invariant locally.
let state = compositor.state().expect("state available after start");
let input = MutterInput::new(state.clone());
let capture = MutterCapture::new(state);
let session = Arc::new(Session::start(
    Box::new(compositor),
    Box::new(input),
    Box::new(capture),
    SessionConfig {
        command: "your-gtk-app".into(),
        args: vec![],
        cwd: None,
        app_name: "your-gtk-app".into(),
        // Record the entire session to a WebM file. Set to `None` to skip.
        video_output: Some("/tmp/session.webm".into()),
        video_bitrate: None, // defaults to waydriver::capture::DEFAULT_VIDEO_BITRATE (2 Mbps)
        video_fps: None, // defaults to waydriver::capture::DEFAULT_VIDEO_FPS (15)
    },
).await?);
// Take a screenshot (returns PNG bytes).
let png = session.take_screenshot().await?;
// Target widgets with XPath selectors over the AT-SPI tree. Actions
// auto-wait for the element to be visible + enabled before firing.
session.locate("//Button[@name='primary-button']").click().await?;
session.locate("//Text[@name='search']").set_text("hello").await?;
// Keyboard input with modifier chords.
session.press_chord("Ctrl+Shift+S").await?;
// Explicit waits when auto-wait isn't enough — e.g. an item appearing
// after some async work.
session.locate("//Label[@name='status']")
    .wait_for_text(|t| t == "ready")
    .await?;
// Inspect the tree while debugging selectors.
let xml = session.dump_tree().await?;
println!("{xml}");
// Tear down the session. `try_unwrap` succeeds only if this is the last
// `Arc` clone; drop any other clones first.
Arc::try_unwrap(session).unwrap().kill().await?;

Session::locate(xpath) returns a lazy Locator. Each action re-snapshots
the AT-SPI tree and re-resolves the selector, so you don't have to worry
about stale element handles. Common methods:
| Method | What it does |
|---|---|
| click() / double_click() / right_click() | Invoke the AT-SPI Action interface (primary, secondary, tertiary actions) |
| hover() / drag_to(target) | Pointer-driven hover and drag; lands on real Wayland input events for repaint |
| focus() / scroll_into_view() | Component::grab_focus and scroll_to/scroll_to_point |
| set_text(s) / fill(s) | Direct EditableText write vs. focus-and-type fallback for widgets without EditableText (e.g. GtkTextView) |
| select_option(by) | Pick a child of a Selection-interface container by label or index |
| text() | Read via the Text interface |
| count() / all() / inspect_all() | Multi-match: count, list of locators, full metadata in one snapshot |
| name() / role() / attribute(k) / attributes() / bounds() | Accessible name, role, AT-SPI attributes, screen-relative bounds |
| is_showing() / is_enabled() | State predicates |
| wait_for_visible() / _hidden() / _enabled() / _count(n) / _text(pred) | Block until state or predicate holds |
| wait_for(pred) / wait_until(pred) / wait_until_async(pred) | General-purpose predicate auto-waits |
| with_timeout(d) | Per-call override of the auto-wait timeout |
| nth(i) / first() / last() / parent() / locate(sub_xpath) | Compose sub-locators |
Single-target actions (click, focus, set_text, text, ...) error with
AmbiguousSelector if the selector matches more than one element. Narrow
with .nth(i) or a more specific XPath.
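The lazy-resolution and ambiguity rules can be illustrated with a small model. This is a sketch under loud assumptions: the tree is a plain `Vec` behind `Rc<RefCell<..>>`, matching is by role and name instead of XPath, and `LiveTree`, `Node`, `LocateError`, and `resolve_one` are hypothetical names, not the library's types.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone, Debug, PartialEq)]
pub struct Node {
    pub role: String,
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub enum LocateError {
    NotFound,
    AmbiguousSelector { matches: usize },
}

/// Stand-in for the live accessibility tree; the "app" may mutate it
/// between actions.
pub type LiveTree = Rc<RefCell<Vec<Node>>>;

/// Holds only the selector, never a resolved element handle.
pub struct Locator {
    tree: LiveTree,
    role: String,
    name: Option<String>,
    nth: Option<usize>,
}

impl Locator {
    pub fn new(tree: LiveTree, role: &str, name: Option<&str>) -> Self {
        Locator {
            tree,
            role: role.to_string(),
            name: name.map(str::to_string),
            nth: None,
        }
    }

    /// Narrow a multi-match selector to its i-th match.
    pub fn nth(mut self, i: usize) -> Self {
        self.nth = Some(i);
        self
    }

    /// Take a fresh snapshot and re-run the selector on every call.
    fn matches(&self) -> Vec<Node> {
        let snapshot = self.tree.borrow().clone();
        snapshot
            .into_iter()
            .filter(|n| n.role == self.role)
            .filter(|n| self.name.as_deref().map_or(true, |want| n.name.as_str() == want))
            .collect()
    }

    pub fn count(&self) -> usize {
        self.matches().len()
    }

    /// Single-target resolution: more than one match is an error unless
    /// the locator was narrowed with `nth`.
    pub fn resolve_one(&self) -> Result<Node, LocateError> {
        let mut found = self.matches();
        if let Some(i) = self.nth {
            return if i < found.len() {
                Ok(found.swap_remove(i))
            } else {
                Err(LocateError::NotFound)
            };
        }
        match found.len() {
            0 => Err(LocateError::NotFound),
            1 => Ok(found.remove(0)),
            n => Err(LocateError::AmbiguousSelector { matches: n }),
        }
    }
}
```

Because the locator stores the selector and not a node, the same `Locator` value gives different results as the tree changes: it resolves cleanly while one button exists and reports an ambiguity once a second one appears.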
waydriver-mcp is a standalone binary that exposes the library over the Model Context Protocol, letting AI assistants (Claude Desktop, Claude Code, etc.) drive GTK4 apps in isolated headless sessions. It speaks JSON-RPC over stdio and constructs the Mutter backends internally — clients only see the high-level tools below.
| Tool | Purpose |
|---|---|
| start_session | Spawn a headless Mutter session and launch a command inside it (optional report_dir, resolution, record_video, video_bitrate overrides per session) |
| list_sessions | List active session ids, app names, and Wayland displays |
| kill_session | Tear down a session and clean up all child processes |
| dump_tree | Dump the AT-SPI accessibility tree as XML; each node carries a _ref you can target with query/click/etc. |
| query | Evaluate an XPath over the tree; returns every match's role, name, attributes, and states |
| click / double_click / right_click | Invoke an element's primary / secondary / tertiary AT-SPI Action. Auto-waits for visibility + enablement. |
| hover | Move the pointer to an element's center; drives a real Wayland motion event so hover-state UI repaints |
| drag_to | Press, move across an element's center, release (full Wayland drag gesture) |
| focus | Give keyboard focus to an element via AT-SPI Component::grab_focus |
| set_text | Replace an editable element's contents via EditableText (fast, requires the interface) |
| fill | Focus + clear + type: fallback for widgets without EditableText (e.g. GtkTextView/GtkEntry). Tries AT-SPI Component::grab_focus first; widgets whose bridge doesn't expose Component (the documented GTK4 case) fall back to a pointer click at the widget's centre to drive focus through the input layer, the same way a user would. Set assume_focused: true to skip the whole focus step when the target is already focused. Supports caret_nav/select_all clear modes. |
| select_option | Pick an entry from a Selection-interface container (combo box, list, …) by label or by index |
| read_text | Read an element's text via the Text interface |
| type_text | Type a string into the currently focused element through the input backend |
| press_key | Press a named key or chord (Return, Ctrl+A, Shift+Tab, Escape, …) |
| move_pointer | Move the pointer by a relative offset in logical pixels |
| pointer_click | Press and release a pointer button (defaults to left click) |
| take_screenshot | Capture a PNG via the keepalive ScreenCast stream and return its path |
Selectors use XPath 1.0 against a snapshot of the AT-SPI tree serialized to XML, with role names normalized to PascalCase (e.g. push button → Button). Example XPaths: //Button[@name='OK'], //Text[@name='search'], //MenuItem[contains(@name, 'Mode')], (//Button)[last()].
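As a rough illustration of the role normalization, the sketch below converts AT-SPI role strings to PascalCase element names. The only mapping confirmed by the text is push button → Button; treating it as a one-off alias, and the function name `normalize_role`, are assumptions, and the real alias table may be larger.

```rust
/// Hypothetical helper: turn an AT-SPI role string into an XML element name.
pub fn normalize_role(raw: &str) -> String {
    // Assumed alias, taken from the single example in the text.
    if raw == "push button" {
        return "Button".to_string();
    }
    // Generic case: split on separators and capitalize each word.
    raw.split(|c: char| c == ' ' || c == '-' || c == '_')
        .filter(|word| !word.is_empty())
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => {
                    let mut out: String = first.to_uppercase().collect();
                    out.push_str(&chars.as_str().to_lowercase());
                    out
                }
                None => String::new(),
            }
        })
        .collect()
}
```

With this sketch, a role such as menu item becomes the element name used in //MenuItem[contains(@name, 'Mode')].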
Each session produces output under a configurable report directory. Screenshots are written as {report_dir}/{session_id}/{session_id}-{n}.png — each session gets its own subdirectory and n increments per take_screenshot call. The base report_dir defaults to /tmp/waydriver and can be overridden with the --report-dir <PATH> CLI flag or the WAYDRIVER_REPORT_DIR environment variable. Individual start_session calls may also pass a report_dir argument to override the server default for that session.
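The path layout can be sketched in a few lines. The function names are hypothetical, and the sketch only models the per-session override over a server default; how the CLI flag and the environment variable rank against each other is not stated here, so it is left out.

```rust
use std::path::{Path, PathBuf};

/// Default base directory named in the text.
pub const DEFAULT_REPORT_DIR: &str = "/tmp/waydriver";

/// A per-session `report_dir` wins over the server-wide value, which in
/// turn wins over the built-in default.
pub fn resolve_report_dir(per_session: Option<&str>, server_default: Option<&str>) -> PathBuf {
    PathBuf::from(per_session.or(server_default).unwrap_or(DEFAULT_REPORT_DIR))
}

/// Build {report_dir}/{session_id}/{session_id}-{n}.png.
pub fn screenshot_path(report_dir: &Path, session_id: &str, n: u32) -> PathBuf {
    report_dir
        .join(session_id)
        .join(format!("{session_id}-{n}.png"))
}
```

For example, `screenshot_path(&resolve_report_dir(None, None), "abc", 3)` yields `/tmp/waydriver/abc/abc-3.png`.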
Alongside the screenshots, each session writes:
- {session_id}.webm: full-session VP8/WebM recording of the display at 15 fps, finalized with a seekhead on kill_session. On by default; disable per-server with --record-video false / WAYDRIVER_RECORD_VIDEO=false, or per-session with start_session's record_video: false. Bitrate via --video-bitrate <bits/sec> / WAYDRIVER_VIDEO_BITRATE (default 2_000_000) or per-session video_bitrate.
- events.jsonl: append-only audit log of every session-scoped tool call (action, params, ok/err status, timestamp) at {report_dir}/{session_id}/events.jsonl.
- events.js: atomic rewrite of the same data as window.__events_update([...]) for consumption by the viewer.
- index.html: styled viewer (Tailwind via the Play CDN) that embeds the recording in a <video> tag when present. Reloads events.js every 2 s via a <script src> swap (which works over file:// unlike fetch), with append-only rendering so expanded <details> stay expanded across refreshes. Written once at session start.
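The two event files follow different write patterns, which the sketch below illustrates with the standard library only. It assumes each event is already a serialized JSON object; the function names, the `events.js.tmp` temp-file name, and the event fields used in any example are hypothetical. Write-to-temp-then-rename is one common way to get an atomic rewrite and may not be what the server does.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append one JSON object as a single line to events.jsonl.
pub fn append_event(dir: &Path, json_line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(dir.join("events.jsonl"))?;
    writeln!(file, "{json_line}")
}

/// Regenerate events.js from the log. The new content is written to a
/// temporary file and renamed into place, so a reader never observes a
/// half-written script.
pub fn rewrite_events_js(dir: &Path) -> io::Result<()> {
    let log = fs::read_to_string(dir.join("events.jsonl"))?;
    let items: Vec<&str> = log.lines().filter(|l| !l.trim().is_empty()).collect();
    let body = format!("window.__events_update([{}]);\n", items.join(","));
    let tmp = dir.join("events.js.tmp");
    fs::write(&tmp, body)?;
    fs::rename(&tmp, dir.join("events.js"))
}
```

Calling `append_event` after each tool call and then `rewrite_events_js` keeps the viewer's script in step with the audit log.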
start_session's response includes a file:// URL to the session viewer — open it directly from the filesystem in any browser. No HTTP server, no ports, no network access required. Multiple waydriver-mcp instances (different Claude Code tabs / projects) can run side by side without conflict.
waydriver-mcp needs ~8 system services at runtime (mutter, pipewire, wireplumber, dbus, AT-SPI, gstreamer). Installing these manually is fragile and distro-specific. Docker solves four problems:
- Security: the MCP server spawns arbitrary processes, interacts with them via D-Bus, and captures their screen. Running this on your host session gives it access to everything your user can do. Inside a container, it only sees what you explicitly mount, with no access to your files, browser sessions, or credentials. Add --network none to block network access entirely (the report viewer is purely static file://, so it works without any network).
- Zero-setup distribution: docker pull and you're running, no system packages to install.
- D-Bus isolation: each container gets its own dbus-daemon, so apps with singleton D-Bus activation don't interfere across concurrent test sessions.
- ABI compatibility: apps built inside the container are guaranteed to link against the same libraries the MCP runtime uses.
Prebuilt images are published to GitHub Container Registry for each release:
| Image | Purpose |
|---|---|
| ghcr.io/bohdantkachenko/waydriver-mcp | Runtime: MCP server with all system deps |
| ghcr.io/bohdantkachenko/waydriver-mcp-builder | Build env: Fedora 42 + Rust + gcc/g++ + meson + cmake + GTK4/GLib dev headers |
docker pull ghcr.io/bohdantkachenko/waydriver-mcp:latest
docker pull ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest

Use the builder image to compile your app in a Fedora environment that matches the runtime. The resulting binary is ABI-compatible with the runtime image. See Testing your app below for language-specific build examples.
MCP client config (e.g. .mcp.json for Claude Code):
{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "sh",
      "args": ["-c", "docker run --rm -i --network none -v \"$PWD:/workspace:ro\" -v /tmp/waydriver:/tmp/waydriver ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}

- $PWD:/workspace:ro mounts the project directory so the MCP can launch your app binaries from /workspace/
- /tmp/waydriver:/tmp/waydriver makes session reports (screenshots, WebM recordings, events.jsonl, index.html) accessible on the host at /tmp/waydriver/. The mount uses the same path on both sides so the file:// URL that start_session returns is openable as-is on the host
- --network none is safe for full isolation: the report viewer is pure static HTML + JS loaded from your local filesystem
For NixOS users, also mount the Nix store so Nix-built binaries work inside the container:
{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "sh",
      "args": ["-c", "docker run --rm -i --network none -v /nix/store:/nix/store:ro -v \"$PWD:/workspace:ro\" -v /tmp/waydriver:/tmp/waydriver ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}

Or build from source:

docker build -t waydriver-mcp .

The MCP server is persistent: it stays up for the entire AI assistant session. You rebuild your app independently, and each start_session call picks up the latest binary from the volume. No MCP restart needed between iterations.
Rust apps — build with the builder image, volume-mount the binary:
docker run --rm -v "$PWD:/src:ro" -v "$PWD/build:/out" \
    ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
    sh -c "cp -r /src /tmp/build && cd /tmp/build && cargo build --release && cp target/release/myapp /out/"

{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "docker",
      "args": ["run", "--rm", "-i",
               "-v", "/path/to/myapp/build:/workspace:ro",
               "ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}

Then call start_session with command: "/workspace/myapp".
C/C++ apps — the builder image includes gcc, g++, meson, ninja-build, cmake, and GTK4/GLib dev headers:
docker run --rm -v "$PWD:/src:ro" -v "$PWD/build:/out" \
    ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
    sh -c "cp -r /src /tmp/build && cd /tmp/build && meson setup _build && meson compile -C _build && cp _build/myapp /out/"

For extra deps (e.g. libadwaita-devel), extend the builder:
FROM ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest
RUN dnf install -y libadwaita-devel

Node/Python apps: extend the runtime image to add the interpreter, and use a named volume for deps:

FROM ghcr.io/bohdantkachenko/waydriver-mcp:latest
RUN dnf install -y nodejs && dnf clean all

Install deps into a named volume (re-run only when the lockfile changes):
docker volume create myapp-nodemods
docker run --rm \
    -v "$PWD/package.json:/app/package.json:ro" \
    -v "$PWD/package-lock.json:/app/package-lock.json:ro" \
    -v "myapp-nodemods:/app/node_modules" \
    -w /app \
    ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
    sh -c "dnf install -y nodejs npm && npm ci --omit=dev"

Mount source + deps: edit source freely, and the MCP picks up changes on the next start_session:

"args": ["run", "--rm", "-i",
         "-v", "/path/to/myapp/src:/app/src:ro",
         "-v", "myapp-nodemods:/app/node_modules:ro",
         "myapp-mcp:latest"]

NixOS users: mount /nix/store so Nix-built binaries just work:

"args": ["run", "--rm", "-i",
         "-v", "/nix/store:/nix/store:ro",
         "-v", "/path/to/myapp:/workspace:ro",
         "ghcr.io/bohdantkachenko/waydriver-mcp:latest"]

For local development without Docker, the Nix app wraps the binary with the required runtime env vars:

nix run .#mcp

Sessions are kept in an in-memory HashMap keyed by id, so multiple apps can run concurrently within one server process.
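A registry of that shape might look like the sketch below. `SessionInfo`, `Registry`, and the `session-N` id scheme are hypothetical; a real server would store the live session object and would typically guard the map with a mutex so concurrent tool calls can share it.

```rust
use std::collections::HashMap;

/// Hypothetical per-session record; the real server keeps a live session here.
pub struct SessionInfo {
    pub app_name: String,
    pub wayland_display: String,
}

#[derive(Default)]
pub struct Registry {
    sessions: HashMap<String, SessionInfo>,
    next_id: u64,
}

impl Registry {
    /// Register a new session and return its id.
    pub fn start(&mut self, app_name: &str, wayland_display: &str) -> String {
        self.next_id += 1;
        let id = format!("session-{}", self.next_id);
        self.sessions.insert(
            id.clone(),
            SessionInfo {
                app_name: app_name.to_string(),
                wayland_display: wayland_display.to_string(),
            },
        );
        id
    }

    /// Ids of all active sessions, sorted for stable output.
    pub fn list(&self) -> Vec<String> {
        let mut ids: Vec<String> = self.sessions.keys().cloned().collect();
        ids.sort();
        ids
    }

    pub fn get(&self, id: &str) -> Option<&SessionInfo> {
        self.sessions.get(id)
    }

    /// Remove a session; returns false if the id was unknown.
    pub fn kill(&mut self, id: &str) -> bool {
        self.sessions.remove(id).is_some()
    }
}
```

Because each entry is independent, killing one session leaves the others untouched, which is what lets several apps run side by side in one server process.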
All dependencies are provided by the Nix flake (nix develop). If not using Nix, you need the following system packages.
| Debian/Ubuntu | Fedora | Arch |
|---|---|---|
| pkg-config | pkg-config | pkg-config |
| libglib2.0-dev | glib2-devel | glib2 |
| libgstreamer1.0-dev | gstreamer1-devel | gstreamer |
| libgstreamer-plugins-base1.0-dev | gstreamer1-plugins-base-devel | gst-plugins-base |
| Debian/Ubuntu | Fedora | Arch |
|---|---|---|
| mutter | mutter | mutter |
| pipewire | pipewire | pipewire |
| wireplumber | wireplumber | wireplumber |
| gstreamer1.0-plugins-base | gstreamer1-plugins-base | gst-plugins-base |
| gstreamer1.0-plugins-good | gstreamer1-plugins-good | gst-plugins-good |
| gstreamer1.0-pipewire | gstreamer1-plugins-pipewire | gst-plugin-pipewire |
| at-spi2-core | at-spi2-core | at-spi2-core |
| dbus | dbus | dbus |
Quick install:
# Debian/Ubuntu
sudo apt install pkg-config libglib2.0-dev libgstreamer1.0-dev \
    libgstreamer-plugins-base1.0-dev mutter pipewire wireplumber \
    gstreamer1.0-plugins-base gstreamer1.0-plugins-good \
    gstreamer1.0-pipewire at-spi2-core dbus
# Fedora
sudo dnf install pkg-config glib2-devel gstreamer1-devel \
    gstreamer1-plugins-base-devel mutter pipewire wireplumber \
    gstreamer1-plugins-base gstreamer1-plugins-good \
    gstreamer1-plugins-pipewire at-spi2-core dbus
# Arch
sudo pacman -S pkg-config glib2 gstreamer gst-plugins-base \
    gst-plugins-good gst-plugin-pipewire mutter pipewire \
    wireplumber at-spi2-core dbus

In headless mode, Mutter only composites (and delivers Wayland frame callbacks) when a ScreenCast consumer is pulling frames. Without an active stream, GTK4 apps render their first frame but never repaint, because the frame clock never ticks.
Session::start opens a persistent ScreenCast stream that stays alive for the session's lifetime. This keeps Mutter compositing continuously so frame callbacks flow and GTK4 apps repaint normally.
Two input paths are available, with different trade-offs:
- RemoteDesktop keyboard/pointer (press_keysym, pointer_button): events go through the full Wayland input pipeline (Mutter -> Wayland protocol -> GDK -> GTK event loop). GTK4 processes them normally and repaints. Use this for interactions that need to produce visible changes.
- AT-SPI actions (Locator::click() / focus() / set_text()): directly invoke widget signal handlers through the accessibility tree, targeted by XPath. Accurate and precise, but they update GTK4's internal model without triggering compositor redraws. Useful for reading the accessibility tree and programmatic activation, but screenshots taken after AT-SPI-only interactions may show stale frames.
Apps are launched with GSETTINGS_BACKEND=keyfile and XDG_CONFIG_HOME pointing to the per-session runtime directory. This bypasses the host dconf daemon entirely, so each session starts with default app state and never reads or writes the user's settings.
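A minimal sketch of that launch environment, using only `std::process::Command`. The helper name and its parameters are hypothetical, and setting `WAYLAND_DISPLAY` to point the app at the session compositor is an assumption about how the app is attached; only the two settings-related variables come from the text.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: build (but do not spawn) an app command with
/// settings isolated from the host.
pub fn isolated_app_command(program: &str, runtime_dir: &Path, wayland_display: &str) -> Command {
    let mut cmd = Command::new(program);
    // Keyfile-backed GSettings stored under the per-session directory,
    // so the host dconf daemon is never consulted.
    cmd.env("GSETTINGS_BACKEND", "keyfile")
        .env("XDG_CONFIG_HOME", runtime_dir)
        // Assumption: the app finds the headless compositor through this.
        .env("WAYLAND_DISPLAY", wayland_display);
    cmd
}
```

Since the runtime directory is created fresh per session, the keyfile store starts empty and the app comes up with default state every time.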
GTK4's built-in AT-SPI backend only registers on the host session bus — it ignores custom DBUS_SESSION_BUS_ADDRESS. So each session uses two D-Bus connections:
- Host session bus: AT-SPI communication with the app
- Private D-Bus: Mutter's ScreenCast and RemoteDesktop APIs (isolated from the host compositor)
graph LR
subgraph Host
host_dbus["Host session bus"]
end
subgraph Session["Per-session"]
private_dbus["Private D-Bus"]
mutter["Mutter"]
app["Your app"]
waydriver["WayDriver"]
end
waydriver -- "AT-SPI" --> host_dbus
app -- "AT-SPI register" --> host_dbus
waydriver -- "ScreenCast\nRemoteDesktop" --> private_dbus
mutter -- "org.gnome.Mutter.*" --> private_dbus
graph LR
screencast["Mutter ScreenCast API"]
monitor["RecordMonitor\n(virtual monitor)"]
pipewire["PipeWire stream\n(keepalive)"]
gst_shot["On-demand GStreamer pipeline\n(pngenc snapshot=true)"]
gst_rec["Long-lived GStreamer pipeline\n(vp8enc + webmmux)"]
png["PNG bytes"]
webm["WebM file"]
screencast --> monitor --> pipewire
pipewire --> gst_shot --> png
pipewire --> gst_rec --> webm
The keepalive PipeWire stream doubles as the capture source for both paths. take_screenshot spins up a transient pngenc pipeline on each call; recording runs a single vp8enc ! webmmux ! filesink pipeline for the session's lifetime, flushed with EOS on Session::kill so the WebM is seekable. Both use the GStreamer Rust bindings (gstreamer + gstreamer-app crates) and only gst-plugins-good (no -bad/-ugly).
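For orientation, the two pipelines could be described in gst-launch syntax roughly as below. This is a sketch, not the library's pipeline: the helper names are hypothetical, and the use of `pipewiresrc path=<node id>`, `videoconvert`, `videorate`, and `deadline=1` are assumptions. Only `pngenc snapshot=true`, `vp8enc`, `webmmux`, and `filesink` are named in the text.

```rust
/// Hypothetical helper: one-shot PNG grab. With snapshot=true, pngenc
/// encodes a single frame and then signals end-of-stream.
pub fn screenshot_pipeline(node_id: u32) -> String {
    format!("pipewiresrc path={node_id} ! videoconvert ! pngenc snapshot=true ! appsink name=sink")
}

/// Hypothetical helper: long-lived VP8/WebM recording at a fixed frame rate.
/// `bitrate` is in bits per second.
pub fn recording_pipeline(node_id: u32, bitrate: u32, fps: u32, output: &str) -> String {
    format!(
        "pipewiresrc path={node_id} ! videoconvert ! videorate ! \
         video/x-raw,framerate={fps}/1 ! vp8enc target-bitrate={bitrate} deadline=1 ! \
         webmmux ! filesink location={output}"
    )
}
```

With the defaults quoted earlier (2 Mbps, 15 fps), `recording_pipeline(42, 2_000_000, 15, "/tmp/session.webm")` produces a description that can be pasted after `gst-launch-1.0` for manual experiments, where 42 stands in for the ScreenCast stream's PipeWire node id.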