lessenings-prog/OpenGUI

OpenGUI

A Windows-native automation service for LLM agents. .NET 9, sub-ms IPC, named-pipe protocol. Full perception stack: UIA → OCR → SendInput with divergence detection and crash recovery.


Quick Start

# 1. Install (copies binaries, starts service, configures auto-start on login)
powershell -ExecutionPolicy Bypass -File dist\OpenGUI\install.ps1

# 2. Verify it's running
python -c "from reliable_bridge import send; print(send('status'))"
# → {"ok": true, "data": {"version": "2.9.3", "platform": "Windows NT 10.0"}}

Then drive the desktop from Python:

from reliable_bridge import send

# Open an app
send("open", {"path": "notepad.exe"})

# Type text
send("type", {"text": "Hello from OpenGUI"})

# Click an element by text
send("click-text", {"text": "Save"})

# Detect dialogs
send("detect-modals")
# → {"modal_detected": true, "modals": [{"title": "Save As", ...}]}

# Full picture of the desktop
send("get-actionable")
# → ranked list of interactable elements with bounds, types, confidence

Or talk over the named pipe directly:

// Send over \\.\pipe\OpenGUI.Service
{"action": "status", "args": {}}

// Response
{"ok": true, "data": {"version": "2.9.3", "screen": [1920, 1200], "platform": "Microsoft Windows NT 10.0.26200.0"}}
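For clients that do not go through reliable_bridge.py, the exchange above might be sketched in Python roughly as follows. The pipe name and the {"action", "args"} / {"ok", "data"} shapes come from the example; the newline-delimited framing is an assumption (the README does not state how messages are delimited), and the helper names `encode_request`, `decode_response`, and `send_raw` are hypothetical.

```python
import json

PIPE_PATH = r"\\.\pipe\OpenGUI.Service"  # pipe name from the example above


def encode_request(action, args=None):
    """Serialize one request in the {"action": ..., "args": ...} shape."""
    payload = {"action": action, "args": args or {}}
    # ASSUMPTION: one JSON object per line; the service's real framing may differ.
    return (json.dumps(payload) + "\n").encode("utf-8")


def decode_response(raw):
    """Parse a response and raise if the service reported a failure."""
    reply = json.loads(raw.decode("utf-8"))
    if not reply.get("ok", False):
        raise RuntimeError(f"OpenGUI request failed: {reply}")
    return reply.get("data", {})


def send_raw(action, args=None):
    """Open the pipe, send one request, return the decoded data (Windows only)."""
    # On Windows a named pipe can be opened like a file in binary read/write mode.
    with open(PIPE_PATH, "r+b", buffering=0) as pipe:
        pipe.write(encode_request(action, args))
        return decode_response(pipe.readline())
```

With the service running, `send_raw("status")` would return the `data` object of the response, provided the framing assumption holds.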

Architecture

flowchart TB
    Agent["LLM Agent / Agent Loop<br/>agent_loop.py"]
    Bridge["Python Client<br/>reliable_bridge.py"]
    subgraph Service["OpenGUI Service"]
        IPC["Named Pipe Server<br/>JSON dispatch<br/>~0.1ms routing"]
        Perception["Perception Stack<br/>• UIA CacheRequest<br/>• Win32 SendInput<br/>• StateEngine diff"]
        Handlers["Handlers<br/>• UiaHandler<br/>• IntelligenceHandler<br/>• EdgeCaseHandler<br/>• Handler"]
        Journal["Runtime Journal<br/>journal-YYYY-MM-DD.jsonl"]
    end
    subgraph OS["Windows OS"]
        UIA["UI Automation API<br/>CacheRequest + TreeWalker"]
        Win32["Win32 API<br/>SendInput, GetForegroundWindow<br/>EnumWindows"]
        GDI["GDI+ / DirectComposition"]
    end

    Agent <-->|NamedPipe JSON| IPC
    Bridge <-->|NamedPipe| IPC
    IPC --> Perception
    IPC --> Handlers
    Perception --> UIA
    Handlers --> Win32
    Handlers --> GDI
    IPC -.-> Journal

Process Model

┌──────────────────────────────┐
│  Agent Loop (agent_loop.py)  │  Observe → Plan → Execute → Verify → Adapt
│  ─────────────────────────   │  One step at a time. Retries on failure.
│  • Loads goal.json           │  Shell action for native commands.
│  • Walks plan steps          │  Reconnects on service death (30s retry).
│  • Calls OpenGUI via pipe    │
└──────────┬───────────────────┘
           │ named pipe: \\.\pipe\OpenGUI.Service
           ▼
┌──────────────────────────────┐
│  OpenGUI Service             │  Single-instance daemon.
│  ─────────────────────────   │  Handles ~20 commands.
│  • SendInput keyboard/mouse  │  Divergence detection.
│  • UIA tree traversal        │  Self-cleaning on kill.
│  • Modal detection           │
│  • Runtime journal           │
└──────────────────────────────┘
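The reconnect behaviour in the diagram (retry for about 30 seconds after the service dies) could be approximated on the client side with a wrapper like the one below. This is a sketch, not the code in agent_loop.py: the function name `send_with_reconnect` is hypothetical, and it assumes a dead service surfaces as an `OSError` when the pipe cannot be opened.

```python
import time


def send_with_reconnect(send, action, args=None, retries=15, delay_s=2.0,
                        sleep=time.sleep):
    """Call send(action, args), retrying while the service is unreachable."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return send(action, args or {})
        except OSError as exc:
            # ASSUMPTION: a missing or broken pipe raises OSError in the client.
            last_error = exc
            if attempt < retries:
                sleep(delay_s)  # 15 retries x 2 s = 30 s by default
    raise ConnectionError(
        f"service unreachable after {retries} retries"
    ) from last_error
```

Passing `send` and `sleep` in as parameters keeps the wrapper independent of the bridge module and lets it be exercised without a running service.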

Why OpenGUI

LLMs need to interact with desktop applications — not just browsers, but Notepad, Settings, Word, Excel, file dialogs, installers. OpenGUI provides a unified layer:

  • Semantic — find elements by text, not coordinates
  • Reliable — UIA tree primary, coordinates fallback
  • Crash-safe — reconnects automatically after service restart
  • Fast — ~3ms for hotkey dispatch, ~11ms for modal scan
  • LLM-optimized — structured JSON with confidence, diff, divergence metadata
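The "UIA tree primary, coordinates fallback" idea can be illustrated from the client side. The sketch below assumes `send` returns a parsed dict with an `ok` field (if the bridge returns JSON text, parse it first) and that `click` accepts `x` and `y` keys; neither detail is confirmed by this README, and `click_semantic` is a hypothetical helper name.

```python
def click_semantic(send, text, fallback_xy=None):
    """Try click-text first; fall back to a coordinate click if it fails.

    Returns "uia", "coordinates", or "failed" to record which path was used.
    """
    reply = send("click-text", {"text": text})
    if reply.get("ok"):
        return "uia"
    if fallback_xy is None:
        return "failed"
    x, y = fallback_xy
    # ASSUMPTION: the click command takes "x" and "y" argument keys.
    reply = send("click", {"x": x, "y": y})
    return "coordinates" if reply.get("ok") else "failed"
```

Returning the path taken makes it easy for an agent to log how often it had to leave the semantic route.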

📖 How AI agents connect to OpenGUI: see docs/AGENTS.md

Compared to other tools

Feature OpenGUI AutoIt/AHK UiPath WinAppDriver
LLM-native JSON API
Out-of-process (no injection)
Named pipe IPC (~0.1ms) N/A
Divergence detection ✅ (v2.7) Partial
SendInput with timing control
Crash reconnection
Auto-cleanup (orphaned apps)

Commands

Status & Info

| Command | Description |
| --- | --- |
| status | Version, platform, screen size |
| get-active-window | Foreground window identity |
| get-actionable | Ranked interactable elements |
| journal-recent | Recent runtime journal entries |
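Besides `journal-recent`, the journal can in principle be read straight from disk, since the architecture diagram names the files `journal-YYYY-MM-DD.jsonl`. The sketch below only relies on that naming and on the JSON Lines format implied by the extension; the directory is not documented here, so it is left as a parameter, and no entry fields are assumed.

```python
import datetime
import json
from pathlib import Path


def journal_path(directory, day):
    """Build the journal file path for a given date."""
    # File name pattern taken from the architecture diagram.
    return Path(directory) / f"journal-{day:%Y-%m-%d}.jsonl"


def read_journal(path, limit=10):
    """Return the last `limit` entries of a JSON Lines journal file."""
    entries = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries[-limit:]
```

For example, `read_journal(journal_path(some_dir, datetime.date.today()))` would return up to ten of today's most recent entries, where `some_dir` is wherever the service writes its journal.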

Input

| Command | Description | Notes |
| --- | --- | --- |
| click | Mouse click at (x, y) | SendInput absolute |
| type | Type Unicode text | SendInput unicode method |
| key | Single key press | SendInput |
| hotkey | Modifier+key combo | 30ms inter-event delays |
| scroll | Mouse wheel | Positive = down |
| hover | Move cursor to (x, y) | |
| click-text | Click element by text | May be ambiguous in save dialogs; prefer hotkey({"keys": "Alt+S"}) for Notepad |

App Lifecycle

| Command | Description |
| --- | --- |
| open | Launch executable |
| focus | Bring window to foreground |

Perception

| Command | Description |
| --- | --- |
| capture-state | Snapshot current UIA tree |
| verify-action | Check action succeeded (returns ActionTruth) |
| wait-stable | Poll until UI settles |
| detect-modals | Scan for Save, UAC, error dialogs |
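A caller will usually want to reduce a `detect-modals` reply to something simple before deciding what to do next. The helper below is a sketch built on the `modal_detected` / `modals` / `title` fields shown in the Quick Start; it accepts the payload either bare or wrapped in the `{"ok", "data"}` envelope, since this README shows both styles. The name `modal_titles` is hypothetical.

```python
def modal_titles(reply):
    """Return the titles of detected modals, or [] when none are reported."""
    # Accept both the bare payload and an {"ok": ..., "data": {...}} envelope.
    payload = reply.get("data", reply)
    if not payload.get("modal_detected"):
        return []
    # Titles may be missing on some entries; fall back to an empty string.
    return [modal.get("title", "") for modal in payload.get("modals", [])]
```

An agent could call this after every action and only reach for the dialog-handling commands when the list is non-empty.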

Dialog Handling

| Command | Description |
| --- | --- |
| resolve-overwrite-dialog | Handle file replace/rename/cancel |
| save-as | Navigate save dialog |
| save-with-overwrite | Save + overwrite resolution |

Performance

| Metric | Value |
| --- | --- |
| Command dispatch | ~3ms (hotkey), ~10ms (detect-modals) |
| UIA tree walk (detect-modals) | ~11ms (CacheRequest) |
| SendInput hotkey | ~120ms (4 events × 30ms delays) |
| Wait-stable (stable UI) | ~500ms (2 polls at 150ms) |
| Journal query | <1ms (sequential log) |
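These figures will vary by machine, so it can be worth re-measuring them locally. The sketch below times round trips through any `send`-style callable and also reproduces the hotkey arithmetic from the table (4 events × 30 ms). Both function names are hypothetical, and the measured number includes client-side overhead, so it is an upper bound on the service's own dispatch time.

```python
import statistics
import time


def median_latency_ms(send, action, args=None, repeats=20):
    """Median round-trip time of send(action, args) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        send(action, args or {})
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def hotkey_floor_ms(events=4, delay_ms=30):
    """Lower bound on hotkey duration from the inter-event delays alone."""
    return events * delay_ms
```

The median is used rather than the mean so that a single slow first call (for example, pipe connection setup) does not skew the result.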

Phase 3 — Closed-Loop Agent

agent_loop.py is a self-contained goal-driven agent that runs on top of OpenGUI.

# Run a single test
python py/agent_loop.py py/test1_happy_path.json

# Run all 7 Phase 3 tests
for f in py/test*.json; do
  case "$f" in *.plan.json) continue ;; esac   # skip companion plan files
  python py/agent_loop.py "$f"
done

| Test | Description | Steps | Time |
| --- | --- | --- | --- |
| 1 | Happy path: create & save file | 9 | ~8s |
| 2 | Overwrite existing file | 12 | ~8s |
| 3 | Focus theft recovery | 9 | ~4s |
| 4 | Modal stack: unsaved changes + overwrite | 15 | ~16s |
| 5 | Stale identity: close & reopen window | 9 | ~5s |
| 6 | Wrong-directory save detection | 12 | ~11s |
| 7 | Service kill + restart + reconnect | 15 | ~10s |

All 7 tests pass fully automated (no operator input) and run as part of the CI workflow.

How the agent loop works

┌──────────┐ ┌──────┐ ┌─────────┐ ┌──────┐ ┌──────┐
│ Observe  │→│ Plan │→│ Execute │→│Verify│→│ Adapt│
└──────────┘ └──────┘ └─────────┘ └──────┘ └──────┘
                                                      │
                                                  ┌───┘
                                                  ▼
                                            (repeat or complete)

Each step:

  1. Observe — capture baseline state, journal sequence ID
  2. Plan — load next step from companion .plan.json file
  3. Execute — send command via named pipe; shell action for native OS commands
  4. Verify — call verify-action on the service; returns ActionTruth with outcome, confidence, divergence
  5. Adapt — handle failures: transient retry (2s × 15 = 30s for service death), fallback chain, abort after 3 consecutive failures
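The five steps above can be condensed into a small loop. This is a simplified sketch rather than the code in agent_loop.py: the plan step format (`{"action", "args"}`), the empty arguments to `capture-state` and `verify-action`, and reading success from an `ok` field of the verification reply are all assumptions, and the fallback chain and journal sequence tracking are left out.

```python
def run_plan(send, steps, max_consecutive_failures=3):
    """Run plan steps in order; return (completed, log)."""
    log = []
    failures = 0
    index = 0
    while index < len(steps):
        step = steps[index]                         # Plan: next step
        send("capture-state", {})                   # Observe: baseline snapshot
        send(step["action"], step.get("args", {}))  # Execute
        truth = send("verify-action", {})           # Verify
        # ASSUMPTION: the verification reply carries a boolean "ok" field.
        ok = bool(truth.get("ok"))
        log.append((index, step["action"], ok))
        if ok:
            failures = 0
            index += 1
        else:
            failures += 1                           # Adapt: retry the same step
            if failures >= max_consecutive_failures:
                return False, log                   # abort after repeated failures
    return True, log
```

Keeping `send` as a parameter means the loop can be driven by the real bridge or by a stub that simulates failures.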

Test Results

C# Test Suites (TestClient)

| Suite | Tests | Passed | Rate |
| --- | --- | --- | --- |
| Hardening (UIA, input, edge cases) | 14 | 14 | 100% |
| Timeout Governor (ceiling enforcement) | 5 | 5 | 100% |
| Stress (20-burst concurrency) | 6 | 6 | 100% |
| Singleton Race (dual launch) | 5 | 5 | 100% |
| Overlay Lifecycle (crash recovery) | 3 | 3 | 100% |
| Focus-Safe Typing (theft detection) | 3 | 3 | 100% |
| Phase 2 Total | 36 | 36 | 100% |

Phase 3 Agent Loop Tests

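| Test | Steps | Time | Status |
| --- | --- | --- | --- |
| 1: Happy Path | 9 | 8s | ✅ |
| 2: Overwrite | 12 | 8s | ✅ |
| 3: Focus Theft | 9 | 4s | ✅ |
| 4: Modal Stack | 15 | 16s | ✅ |
| 5: Stale Identity | 9 | 5s | ✅ |
| 6: Silent Corruption | 12 | 11s | ✅ |
| 7: Supervisor Stress | 15 | 10s | ✅ |
| Total | 81 | ~62s | 7/7 ✅ |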

Project Structure

OpenGUI/
├── src/
│   ├── OpenGUI.Service/        # Main service (IPC, perception, handlers, Win32)
│   │   ├── Handlers/           # EdgeCase, Intelligence, Uia, Handler
│   │   ├── Native/             # Win32 P/Invoke (SendInput, UIA, window management)
│   │   ├── Perception/         # ActionTruth, ActionResult, StateEngine, DivergenceEngine
│   │   └── app.manifest        # Windows manifest (asInvoker)
│   ├── OpenGUI.Common/         # Shared types, protocol models
│   ├── OpenGUI.Watchdog/       # External crash recovery (job objects, overlays)
│   ├── OpenGUI.OverlayBroker/  # Overlay window management
│   ├── OpenGUI.TestClient/     # Test harness (hardening, timeout, Phase 2)
│   └── OpenGUI.Control/        # WPF control library
├── py/
│   ├── agent_loop.py           # Phase 3 closed-loop agent
│   ├── reliable_bridge.py      # Python named-pipe client
│   ├── scripts/
│   │   └── restart_service.py  # Kill + restart service helper
│   ├── test*_*.json            # Test goals (1-7)
│   └── test*_*.plan.json       # Test plans (1-7)
├── docs/
│   ├── OPERATING_MODEL.md      # Runtime architecture, IPC, lifecycle, roadmap
│   ├── AGENTS.md               # How AI agents connect and use OpenGUI
│   └── PHASE_*.md              # Phase design docs
├── .github/workflows/
│   └── ci.yml                  # CI: build → start service → Python + C# tests
├── dist/                       # Build artifacts (published binaries)
├── OpenGUI.sln                 # .NET solution
├── README.md
└── CONTRIBUTING.md

Building

# Build Debug
dotnet build OpenGUI.sln

# Publish Release (self-contained)
dotnet publish src/OpenGUI.Service/OpenGUI.Service.csproj -c Release --nologo

# Deploy to dist/
rm -rf dist/OpenGUI/bin
cp -r src/OpenGUI.Service/bin/Release/net9.0-windows/win-x64/publish/ dist/OpenGUI/bin
mkdir -p dist/OpenGUI/py
cp py/agent_loop.py py/reliable_bridge.py py/test*.json dist/OpenGUI/py/   # test*.json also matches the *.plan.json files

# Run C# test suites
dotnet run --project src/OpenGUI.TestClient -- --hardening
dotnet run --project src/OpenGUI.TestClient -- --timeout
dotnet run --project src/OpenGUI.TestClient -- --all-phase2

# Run Phase 3 agent loop
python py/agent_loop.py py/test1_happy_path.json

License

MIT

About

Windows-native desktop automation service for LLM agents — named-pipe IPC, UIA perception, O-P-E-V-R verification loop
