OpenGUI

A Windows-native automation service for LLM agents. .NET 9, sub-ms IPC, named-pipe protocol. Full perception stack: UIA → OCR → SendInput with divergence detection and crash recovery.

Quick Start

# 1. Install (copies binaries, starts service, configures auto-start on login)
powershell -ExecutionPolicy Bypass -File dist\OpenGUI\install.ps1

# 2. Verify it's running
python -c "from reliable_bridge import send; print(send('status'))"
# → {"ok": true, "data": {"version": "2.9.3", "platform": "Windows NT 10.0"}}

from reliable_bridge import send

# Open an app
send("open", {"path": "notepad.exe"})

# Type text
send("type", {"text": "Hello from OpenGUI"})

# Click an element by text
send("click-text", {"text": "Save"})

# Detect dialogs
send("detect-modals")
# → {"modal_detected": true, "modals": [{"title": "Save As", ...}]}

# Full picture of the desktop
send("get-actionable")
# → ranked list of interactable elements with bounds, types, confidence

Or talk over the named pipe directly:

// Send over \\.\pipe\OpenGUI.Service
{"action": "status", "args": {}}

// Response
{"ok": true, "data": {"version": "2.9.3", "screen": [1920, 1200], "platform": "Microsoft Windows NT 10.0.26200.0"}}
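For a client that does not go through reliable_bridge, the exchange above might look like the following in Python. This is a minimal sketch under assumptions: the wire framing is not specified here, so one JSON object per line is assumed, and `encode_request`, `decode_response`, and `send_raw` are hypothetical helper names rather than part of OpenGUI.

```python
import json

PIPE_PATH = r"\\.\pipe\OpenGUI.Service"  # pipe name taken from the example above


def encode_request(action, args=None):
    """Serialize one request. The trailing newline is an assumed frame delimiter."""
    message = {"action": action, "args": args or {}}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_response(raw):
    """Parse one response frame (bytes) into a dict."""
    return json.loads(raw.decode("utf-8"))


def send_raw(action, args=None, pipe_path=PIPE_PATH):
    """Open the pipe, write one request, read one line back (Windows only)."""
    with open(pipe_path, "r+b", buffering=0) as pipe:
        pipe.write(encode_request(action, args))
        return decode_response(pipe.readline())
```

With the service running, `send_raw("status")` would be expected to return a dict shaped like the response shown above, provided the framing assumption holds.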

Architecture

flowchart TB
    Agent["LLM Agent / Agent Loop<br/>agent_loop.py"]
    Bridge["Python Client<br/>reliable_bridge.py"]
    subgraph Service["OpenGUI Service"]
        IPC["Named Pipe Server<br/>JSON dispatch<br/>~0.1ms routing"]
        Perception["Perception Stack<br/>• UIA CacheRequest<br/>• Win32 SendInput<br/>• StateEngine diff"]
        Handlers["Handlers<br/>• UiaHandler<br/>• IntelligenceHandler<br/>• EdgeCaseHandler<br/>• Handler"]
        Journal["Runtime Journal<br/>journal-YYYY-MM-DD.jsonl"]
    end
    subgraph OS["Windows OS"]
        UIA["UI Automation API<br/>CacheRequest + TreeWalker"]
        Win32["Win32 API<br/>SendInput, GetForegroundWindow<br/>EnumWindows"]
        GDI["GDI+ / DirectComposition"]
    end

    Agent <-->|NamedPipe JSON| IPC
    Bridge <-->|NamedPipe| IPC
    IPC --> Perception
    IPC --> Handlers
    Perception --> UIA
    Handlers --> Win32
    Handlers --> GDI
    IPC -.-> Journal
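The journal in the diagram is a dated JSONL file, so it can also be inspected offline. The sketch below shows one way to read the last few entries for a given day. The journal directory and the entry fields are not documented here, so `directory` is a caller-supplied assumption and entries are returned as plain dicts; `journal_filename` and `read_recent` are hypothetical names. The `journal-recent` command remains the supported way to query the journal.

```python
import json
from collections import deque
from datetime import date
from pathlib import Path


def journal_filename(day):
    """Build the file name for one day, following journal-YYYY-MM-DD.jsonl."""
    return f"journal-{day.isoformat()}.jsonl"


def read_recent(directory, day, limit=10):
    """Return up to `limit` of the last entries from that day's journal file."""
    path = Path(directory) / journal_filename(day)
    recent = deque(maxlen=limit)  # keeps only the newest `limit` entries
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                recent.append(json.loads(line))
    return list(recent)
```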

Process Model

┌──────────────────────────────┐
│  Agent Loop (agent_loop.py)  │  Observe → Plan → Execute → Verify → Adapt
│  ─────────────────────────   │  One step at a time. Retries on failure.
│  • Loads goal.json           │  Shell action for native commands.
│  • Walks plan steps          │  Reconnects on service death (30s retry).
│  • Calls OpenGUI via pipe    │
└──────────┬───────────────────┘
           │ named pipe: \\.\pipe\OpenGUI.Service
           ▼
┌──────────────────────────────┐
│  OpenGUI Service             │  Single-instance daemon.
│  ─────────────────────────   │  Handles ~20 commands.
│  • SendInput keyboard/mouse  │  Divergence detection.
│  • UIA tree traversal        │  Self-cleaning on kill.
│  • Modal detection           │
│  • Runtime journal           │
└──────────────────────────────┘
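The 30s reconnect window in the diagram can be pictured as a bounded retry around any pipe call. The sketch below is illustrative only: it assumes a failed pipe call surfaces as an `OSError` (for example `FileNotFoundError` while the pipe does not exist), which may differ from what reliable_bridge actually raises, and `with_reconnect` is a hypothetical helper name. The 15 attempts at 2s each follow the retry figures given for the agent loop.

```python
import time


def with_reconnect(call, attempts=15, delay_s=2.0, sleep=time.sleep):
    """Run call(); on a pipe error, wait delay_s and retry up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except OSError as error:  # assumed failure type while the service is down
            last_error = error
            sleep(delay_s)
    raise TimeoutError(f"no response after {attempts} attempts") from last_error
```

Usage would be along the lines of `with_reconnect(lambda: send("status"))`, where `send` comes from reliable_bridge.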

Why OpenGUI

LLMs need to interact with desktop applications — not just browsers, but Notepad, Settings, Word, Excel, file dialogs, installers. OpenGUI provides a unified layer:

Semantic — find elements by text, not coordinates
Reliable — UIA tree primary, coordinates fallback
Crash-safe — reconnects automatically after service restart
Fast — ~3ms for hotkey dispatch, ~11ms for modal scan
LLM-optimized — structured JSON with confidence, diff, divergence metadata
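The "UIA tree primary, coordinates fallback" idea can be mirrored on the client side. The sketch below tries a semantic click first and only uses coordinates when that fails. It assumes responses carry an `ok` flag as in the `status` example, and that `click` accepts `x` and `y` arguments; the exact argument names are not documented here. `click_with_fallback` is a hypothetical helper.

```python
def click_with_fallback(send, text, fallback_xy=None):
    """Try click-text first; fall back to a coordinate click if it fails.

    Returns a (strategy, response) pair, where strategy is one of
    "text", "coordinates", or "failed".
    """
    result = send("click-text", {"text": text})
    if result.get("ok"):
        return "text", result
    if fallback_xy is None:
        return "failed", result
    x, y = fallback_xy
    # Argument names "x" and "y" are an assumption for illustration.
    return "coordinates", send("click", {"x": x, "y": y})
```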

📖 How AI agents connect to OpenGUI → docs/AGENTS.md

Compared to other tools

Feature	OpenGUI	AutoIt/AHK	UiPath	WinAppDriver
LLM-native JSON API	✅	❌	❌	❌
Out-of-process (no injection)	✅	❌	✅	❌
Named pipe IPC (~0.1ms)	✅	N/A	❌	❌
Divergence detection	✅ (v2.7)	❌	Partial	❌
SendInput with timing control	✅	❌	❌	❌
Crash reconnection	✅	❌	❌	❌
Auto-cleanup (orphaned apps)	✅	❌	❌	❌

Commands

Status & Info

Command	Description
`status`	Version, platform, screen size
`get-active-window`	Foreground window identity
`get-actionable`	Ranked interactable elements
`journal-recent`	Recent runtime journal entries

Input

Command	Description	Notes
`click`	Mouse click at (x, y)	SendInput absolute
`type`	Type Unicode text	SendInput unicode method
`key`	Single key press	SendInput
`hotkey`	Modifier+key combo	30ms inter-event delays
`scroll`	Mouse wheel	Positive = down
`hover`	Move cursor to (x, y)
`click-text`	Click element by text	May be ambiguous in save dialogs — prefer `hotkey({"keys": "Alt+S"})` for Notepad
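The note on `click-text` suggests switching to a hotkey when a save dialog is open. One way to sketch that decision is shown below. It relies on the `detect-modals` response shape from the Quick Start and tolerates the payload being either top-level or nested under `data`, since both shapes appear in the examples. `modal_titles` and `press_save` are hypothetical helpers, and matching on the word "Save" in the title is an illustrative heuristic.

```python
def modal_titles(response):
    """Extract dialog titles from a detect-modals response."""
    payload = response.get("data", response)  # accept wrapped or unwrapped shape
    if not payload.get("modal_detected"):
        return []
    return [modal.get("title", "") for modal in payload.get("modals", [])]


def press_save(send):
    """Use the Alt+S hotkey inside a save dialog, click-text otherwise."""
    titles = modal_titles(send("detect-modals"))
    if any("Save" in title for title in titles):
        return send("hotkey", {"keys": "Alt+S"})
    return send("click-text", {"text": "Save"})
```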

App Lifecycle

Command	Description
`open`	Launch executable
`focus`	Bring window to foreground

Perception

Command	Description
`capture-state`	Snapshot current UIA tree
`verify-action`	Check action succeeded (returns ActionTruth)
`wait-stable`	Poll until UI settles
`detect-modals`	Scan for Save, UAC, error dialogs
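To make "poll until UI settles" concrete, here is a client-side sketch of one possible settling rule: the UI counts as stable once a number of consecutive snapshots are identical. The service's actual algorithm for `wait-stable` is not described here, so this is an illustration and not its implementation. The 150 ms interval and two matching polls echo the figures in the Performance section; `capture` is any caller-supplied function returning a comparable snapshot.

```python
import time


def wait_stable(capture, interval_s=0.15, stable_polls=2, timeout_s=5.0,
                sleep=time.sleep, clock=time.monotonic):
    """Return True once `stable_polls` consecutive polls match the prior snapshot."""
    deadline = clock() + timeout_s
    previous = capture()
    matches = 0
    while clock() < deadline:
        sleep(interval_s)
        current = capture()
        # Count consecutive unchanged polls; any change resets the count.
        matches = matches + 1 if current == previous else 0
        if matches >= stable_polls:
            return True
        previous = current
    return False  # timed out while the UI was still changing
```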

Dialog Handling

Command	Description
`resolve-overwrite-dialog`	Handle file replace/rename/cancel
`save-as`	Navigate save dialog
`save-with-overwrite`	Save + overwrite resolution

Performance

Metric	Value
Command dispatch	~3ms (hotkey), ~10ms (detect-modals)
UIA tree walk (detect-modals)	~11ms (CacheRequest)
SendInput hotkey	~120ms (4 events × 30ms delays)
Wait-stable (stable UI)	~500ms (2 polls at 150ms)
Journal query	<1ms (sequential log)
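The ~120ms hotkey figure follows from the event count: a two-key combo such as Alt+S produces four input events (two presses, two releases), each followed by a 30ms delay. The sketch below reproduces that arithmetic. The event ordering (press in order, release in reverse) is a common convention assumed here, not something confirmed for the service, and the function names are hypothetical.

```python
def hotkey_events(keys):
    """Expand 'Alt+S' into key-down events followed by key-up events in reverse."""
    names = [part.strip() for part in keys.split("+") if part.strip()]
    downs = [(name, "down") for name in names]
    ups = [(name, "up") for name in reversed(names)]
    return downs + ups


def hotkey_duration_ms(keys, delay_ms=30):
    """Total inter-event delay, assuming one delay per event."""
    return len(hotkey_events(keys)) * delay_ms
```

For "Alt+S" this gives 4 events and 120 ms, matching the table; a three-key combo would take 180 ms under the same assumption.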

Phase 3 — Closed-Loop Agent

agent_loop.py is a self-contained goal-driven agent that runs on top of OpenGUI.

# Run a single test
python py/agent_loop.py py/test1_happy_path.json

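# Run all 7 Phase 3 tests (skip the companion .plan.json files the glob also matches)
for f in py/test*.json; do
  case "$f" in *.plan.json) continue ;; esac
  python py/agent_loop.py "$f"
done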

Test	Description	Steps	Time
1	Happy path: create & save file	9	~8s
2	Overwrite existing file	12	~8s
3	Focus theft recovery	9	~4s
4	Modal stack: unsaved changes + overwrite	15	~16s
5	Stale identity: close & reopen window	9	~5s
6	Wrong-directory save detection	12	~11s
7	Service kill + restart + reconnect	15	~10s

All 7 tests pass fully automated (no operator input) and are part of the CI workflow.

How the agent loop works

┌──────────┐ ┌──────┐ ┌─────────┐ ┌──────┐ ┌──────┐
│ Observe  │→│ Plan │→│ Execute │→│Verify│→│ Adapt│
└──────────┘ └──────┘ └─────────┘ └──────┘ └──────┘
                                                      │
                                                  ┌───┘
                                                  ▼
                                            (repeat or complete)

Each step:

Observe — capture baseline state, journal sequence ID
Plan — load next step from companion .plan.json file
Execute — send command via named pipe; shell action for native OS commands
Verify — call verify-action on the service; returns ActionTruth with outcome, confidence, divergence
Adapt — handle failures: transient retry (2s × 15 = 30s for service death), fallback chain, abort after 3 consecutive failures
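The five steps above can be sketched as a small driver loop. This is a simplified illustration, not the code in agent_loop.py: the `observe`, `execute`, and `verify` callables are hypothetical stand-ins for the pipe calls, the fallback chain is omitted, and the `"success"` outcome value is an assumption, since only the ActionTruth field names (outcome, confidence, divergence) are given here.

```python
def run_plan(steps, observe, execute, verify, max_consecutive_failures=3):
    """Walk plan steps in order; retry a failed step, abort after repeated failures."""
    completed = 0
    failures = 0
    index = 0
    while index < len(steps):
        step = steps[index]             # Plan: next step from the loaded plan
        baseline = observe()            # Observe: baseline state before acting
        execute(step)                   # Execute: send the command
        truth = verify(step, baseline)  # Verify: ActionTruth-like dict
        if truth.get("outcome") == "success":  # assumed outcome value
            completed += 1
            failures = 0
            index += 1
            continue
        failures += 1                   # Adapt: retry the same step
        if failures >= max_consecutive_failures:
            return {"status": "aborted", "completed": completed, "failed_step": index}
    return {"status": "complete", "completed": completed}
```

Keeping the callables injectable makes the retry and abort logic testable without a running service.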

Test Results

C# Test Suites (TestClient)

Suite	Tests	Passed	Rate
Hardening (UIA, input, edge cases)	14	14	100%
Timeout Governor (ceiling enforcement)	5	5	100%
Stress (20-burst concurrency)	6	6	100%
Singleton Race (dual launch)	5	5	100%
Overlay Lifecycle (crash recovery)	3	3	100%
Focus-Safe Typing (theft detection)	3	3	100%
Phase 2 Total	36	36	100%

Phase 3 Agent Loop Tests

Test	Steps	Time	Status
1: Happy Path	9	8s	✅
2: Overwrite	12	8s	✅
3: Focus Theft	9	4s	✅
4: Modal Stack	15	16s	✅
5: Stale Identity	9	5s	✅
6: Silent Corruption	12	11s	✅
7: Supervisor Stress	15	10s	✅
Total	81	~62s	7/7 ✅

Project Structure

OpenGUI/
├── src/
│   ├── OpenGUI.Service/        # Main service (IPC, perception, handlers, Win32)
│   │   ├── Handlers/           # EdgeCase, Intelligence, Uia, Handler
│   │   ├── Native/             # Win32 P/Invoke (SendInput, UIA, window management)
│   │   ├── Perception/         # ActionTruth, ActionResult, StateEngine, DivergenceEngine
│   │   └── app.manifest        # Windows manifest (asInvoker)
│   ├── OpenGUI.Common/         # Shared types, protocol models
│   ├── OpenGUI.Watchdog/       # External crash recovery (job objects, overlays)
│   ├── OpenGUI.OverlayBroker/  # Overlay window management
│   ├── OpenGUI.TestClient/     # Test harness (hardening, timeout, Phase 2)
│   └── OpenGUI.Control/        # WPF control library
├── py/
│   ├── agent_loop.py           # Phase 3 closed-loop agent
│   ├── reliable_bridge.py      # Python named-pipe client
│   ├── scripts/
│   │   └── restart_service.py  # Kill + restart service helper
│   ├── test*_*.json            # Test goals (1-7)
│   └── test*_*.plan.json       # Test plans (1-7)
├── docs/
│   ├── OPERATING_MODEL.md      # Runtime architecture, IPC, lifecycle, roadmap
│   ├── AGENTS.md               # How AI agents connect and use OpenGUI
│   └── PHASE_*.md              # Phase design docs
├── .github/workflows/
│   └── ci.yml                  # CI: build → start service → Python + C# tests
├── dist/                       # Build artifacts (published binaries)
├── OpenGUI.sln                 # .NET solution
├── README.md
└── CONTRIBUTING.md

Building

# Build Debug
dotnet build OpenGUI.sln

# Publish Release (self-contained)
dotnet publish src/OpenGUI.Service/OpenGUI.Service.csproj -c Release --nologo

# Deploy to dist/
rm -rf dist/OpenGUI/bin
cp -r src/OpenGUI.Service/bin/Release/net9.0-windows/win-x64/publish/ dist/OpenGUI/bin
# py/test*.json also matches the test*.plan.json plan files
cp py/agent_loop.py py/reliable_bridge.py py/test*.json dist/OpenGUI/py/

# Run C# test suites
dotnet run --project src/OpenGUI.TestClient -- --hardening
dotnet run --project src/OpenGUI.TestClient -- --timeout
dotnet run --project src/OpenGUI.TestClient -- --all-phase2

# Run Phase 3 agent loop
python py/agent_loop.py py/test1_happy_path.json

License

MIT
