Merged
28 changes: 28 additions & 0 deletions .kiro/steering/product.md
@@ -0,0 +1,28 @@
# AppClaw — Product Overview

AppClaw is an agentic AI layer for mobile automation on Android and iOS. Users describe goals in plain English, and AppClaw orchestrates device interactions through Appium, which it drives over the Model Context Protocol (MCP). It supports multiple LLM providers (Anthropic, OpenAI, Google Gemini, Groq, Ollama) via the Vercel AI SDK.

## Core Modes

- **Agent mode** — LLM-driven goal execution (e.g. `appclaw "Send a WhatsApp message to Mom"`)
- **YAML flows** — declarative, zero-LLM automation steps defined in YAML files
- **Playground** — interactive REPL for building flows live on a device
- **Explorer** — generates YAML test flows from a PRD or app description
- **Record/Replay** — capture and adaptively replay goal executions
- **Report** — Express server serving HTML run reports

## Agent Perception Modes (`AGENT_MODE`)

- `dom` — uses XML page source and accessibility IDs/XPath to locate elements
- `vision` — screenshot-first using Stark (df-vision + Gemini) for element location
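
In `dom` mode, choosing a locator can be sketched as a preference order: use the element's accessibility ID when it has one, otherwise fall back to an XPath built from its attributes. This is a minimal sketch, not AppClaw's code: the `ParsedElement` shape and the `pickLocator` and `xpathLiteral` helpers are hypothetical. Only the strategy names `accessibility id` and `xpath` are standard Appium locator strategies, and the attribute names (`resource-id`, `text`) follow Android UiAutomator2 page sources; iOS XCUITest sources use different attributes.

```typescript
// Hypothetical shape of an element parsed from the XML page source.
interface ParsedElement {
  tag: string; // e.g. 'android.widget.Button'
  accessibilityId?: string; // content-desc on Android, name on iOS
  resourceId?: string; // Android resource-id attribute
  text?: string; // Android text attribute
}

// Appium locator strategy names used by this sketch.
interface Locator {
  strategy: 'accessibility id' | 'xpath';
  selector: string;
}

// Quote a value as an XPath 1.0 string literal. XPath 1.0 has no escape
// sequence, so values containing ' are split and rejoined with concat().
function xpathLiteral(value: string): string {
  if (!value.includes("'")) return `'${value}'`;
  const parts = value.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Hypothetical helper: prefer the accessibility ID (usually more stable than
// XPath), then fall back to XPath on resource-id, then on visible text.
function pickLocator(el: ParsedElement): Locator {
  if (el.accessibilityId) {
    return { strategy: 'accessibility id', selector: el.accessibilityId };
  }
  if (el.resourceId) {
    return {
      strategy: 'xpath',
      selector: `//${el.tag}[@resource-id=${xpathLiteral(el.resourceId)}]`,
    };
  }
  if (el.text) {
    return { strategy: 'xpath', selector: `//${el.tag}[@text=${xpathLiteral(el.text)}]` };
  }
  return { strategy: 'xpath', selector: `//${el.tag}` };
}
```

Putting XPath last reflects common Appium guidance that XPath lookups are slower and more brittle than accessibility IDs.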

## Perception → Reason → Act Loop

Each step: read screen state → send to LLM → execute action (tap/type/swipe/etc.) → repeat until the goal is complete or the step limit (`MAX_STEPS`) is reached.
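
The loop described above can be sketched as follows. This is a minimal sketch, not AppClaw's implementation: the `Action` union, `LoopDeps`, and the `perceive`, `decide`, and `act` callbacks are hypothetical stand-ins for the perception layer, the LLM call, and the Appium MCP client. The real loop in `src/agent/` also handles stuck detection, recovery, and planning. The `maxSteps` default of 30 mirrors the documented `MAX_STEPS` default.

```typescript
// Hypothetical action shape; the real action schemas live in src/llm/.
type Action =
  | { kind: 'tap'; target: string }
  | { kind: 'type'; target: string; text: string }
  | { kind: 'swipe'; direction: 'up' | 'down' | 'left' | 'right' }
  | { kind: 'done'; summary: string };

interface LoopDeps {
  perceive: () => Promise<string>; // read screen state (page source or screenshot)
  decide: (goal: string, screen: string, history: Action[]) => Promise<Action>; // LLM call
  act: (action: Action) => Promise<void>; // execute the action on the device
}

interface LoopResult {
  success: boolean;
  steps: number;
  history: Action[];
}

// Perception → reason → act until the model reports 'done' or the budget runs out.
async function runGoal(goal: string, deps: LoopDeps, maxSteps = 30): Promise<LoopResult> {
  const history: Action[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const screen = await deps.perceive();
    const action = await deps.decide(goal, screen, history);
    history.push(action);
    if (action.kind === 'done') {
      return { success: true, steps: step, history };
    }
    await deps.act(action);
  }
  return { success: false, steps: maxSteps, history };
}
```

Passing the three stages in as callbacks lets the control flow run against fakes with no device attached; the `done` action ends the loop without being executed.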

## Published Artifacts

- **npm package** (`appclaw`) — CLI + SDK
- **VS Code extension** — live multi-device grid view
- **GitHub Action** — CI integration
- **Landing page** — Cloudflare Workers static site
66 changes: 66 additions & 0 deletions .kiro/steering/structure.md
@@ -0,0 +1,66 @@
# Project Structure

## Root Layout

```
appclaw/
├── src/ # All TypeScript source (compiled → dist/)
├── dist/ # Compiled output (mirrors src/, gitignored)
├── tests/ # Test files (vitest)
├── flows/ # Example YAML flow files
├── examples/ # Example flows and PRDs
├── schemas/ # JSON schemas (flow.schema.json, env.schema.json)
├── skills/ # AI agent skill definitions (generate-appclaw-flow, use-appclaw-cli)
├── bin/ # CLI entry point (bin/appclaw.js)
├── docs/ # QA documentation
├── logs/ # Runtime execution logs (gitignored)
├── vscode-extension/ # VS Code extension (separate package.json + tsconfig)
├── github-action/ # GitHub Action definition
├── landing/ # Cloudflare Workers landing page
└── .appclaw/ # Runtime data: guides/, runs/ (recordings, screenshots)
```

## Source Modules (`src/`)

| Module | Responsibility |
| ---------------- | --------------------------------------------------------------------------------- |
| `index.ts` | CLI entry — routes to all 6 modes based on flags |
| `config.ts` | Zod-validated config from `.env` |
| `constants.ts` | Default models, pricing, stuck detection thresholds |
| `agent/` | Core agent loop, stuck detection, recovery, planner, human-in-the-loop |
| `llm/` | Multi-provider LLM integration — provider factory, prompt builder, action schemas |
| `mcp/` | Appium MCP client — tool calling, element finding, screenshots, keyboard |
| `perception/` | Screen parsing — Android/iOS XML parsers, DOM trimmer, screen diff |
| `vision/` | AI vision element location via Stark (df-vision + Gemini) |
| `flow/` | YAML flow parsing and execution, natural language step handling, parallel runner |
| `device/` | Device setup pipeline — platform/device picker, iOS setup, Appium session |
| `memory/` | Episodic memory — trajectory recording, fingerprinting, retrieval |
| `explorer/` | PRD → YAML flow generation, screen crawler |
| `playground/` | Interactive REPL for building flows |
| `recording/` | Session recorder and adaptive replayer |
| `report/` | Run artifact collection, HTML report rendering, Express server |
| `sdk/` | Public SDK — `GoalRunner`, `FlowRunner`, `StepRunner`, config builder |
| `skills/` | Built-in skill implementations (find-and-tap, read-screen, submit-message) |
| `ui/terminal.ts` | Rich terminal output — spinners, boxes, markdown rendering |
| `appguides/` | App-specific interaction guides |
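
The table places stuck detection in `agent/`, with its thresholds in `constants.ts`, but does not say how it works. One plausible approach, sketched here under assumptions: fingerprint each observed screen and flag the agent as stuck when the last few fingerprints are identical. The `StuckDetector` class, the SHA-256 fingerprinting, and the default threshold of 3 are hypothetical, not taken from the source.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stuck detector: flags the agent as stuck once the same screen
// (compared by SHA-256 fingerprint) has been observed `threshold` times in a row.
class StuckDetector {
  private readonly threshold: number;
  private recent: string[] = [];

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Hash the screen so comparisons stay cheap even for large page sources.
  private fingerprint(screen: string): string {
    return createHash('sha256').update(screen).digest('hex');
  }

  // Record one observation; returns true when the last `threshold` screens match.
  observe(screen: string): boolean {
    this.recent.push(this.fingerprint(screen));
    if (this.recent.length > this.threshold) this.recent.shift();
    return (
      this.recent.length === this.threshold && this.recent.every((fp) => fp === this.recent[0])
    );
  }

  // Clear history, e.g. after a recovery action changes the situation.
  reset(): void {
    this.recent = [];
  }
}
```

A real detector would likely normalize volatile parts of the page source (clocks, counters) before hashing; otherwise a ticking status-bar clock keeps every fingerprint distinct.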

## Tests (`tests/`)

```
tests/
├── flow/ # Flow parsing and execution unit tests
├── sdk/ # SDK integration tests
├── e2e/ # End-to-end device tests (require connected device)
├── vision/ # Vision module tests
└── flows/ # YAML flow fixtures used by tests
```

## Key Conventions

- Each `src/` subdirectory typically has an `index.ts` as its public interface
- Types are co-located in `types.ts` within each module
- No barrel re-exports at the root `src/` level — import from specific modules
- The SDK (`src/sdk/`) is the only public API surface; everything else is internal
- YAML flows live in `flows/` (project-level) or `examples/flows/` (examples)
- `.appclaw/runs/` stores per-run artifacts: `manifest.json`, `recording.mp4`, step screenshots
- `.appclaw/guides/` stores per-app interaction guides keyed by bundle/package ID
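
Guide lookup can be sketched as a two-tier resolution: a project file at `.appclaw/guides/<appId>.md` wins, otherwise the built-in guide for that bundle/package ID is used (the landing-page copy in this PR states that custom guides always take priority over built-ins). The helpers below and the in-memory `builtIns` map are hypothetical; `src/appguides/` may store and load built-in guides differently.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { dirname, join } from 'node:path';

// Bundle/package IDs use only these characters; anything else (such as '/')
// could escape the guides directory, so it is rejected.
const APP_ID = /^[A-Za-z0-9._-]+$/;

// Hypothetical helper: path of the project-level guide for an app.
function customGuidePath(appId: string, projectRoot: string): string {
  if (!APP_ID.test(appId)) throw new Error(`Invalid app ID: ${appId}`);
  return join(projectRoot, '.appclaw', 'guides', `${appId}.md`);
}

// Hypothetical helper: "drop" a custom guide into the project.
function writeCustomGuide(appId: string, markdown: string, projectRoot = process.cwd()): void {
  const file = customGuidePath(appId, projectRoot);
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, markdown, 'utf8');
}

// Hypothetical resolver: a custom guide always wins over the built-in one.
function resolveGuide(
  appId: string,
  builtIns: ReadonlyMap<string, string>,
  projectRoot = process.cwd()
): string | undefined {
  const file = customGuidePath(appId, projectRoot);
  return existsSync(file) ? readFileSync(file, 'utf8') : builtIns.get(appId);
}
```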
86 changes: 86 additions & 0 deletions .kiro/steering/tech.md
@@ -0,0 +1,86 @@
# Tech Stack

## Language & Runtime

- **TypeScript** (strict mode, ES2022 modules)
- **Node.js** 18+ runtime
- **tsx** for dev/local execution without compiling

## Build System

- **TypeScript compiler** (`tsc`) — outputs to `dist/`, mirrors `src/` structure
- **No bundler** for the main package — pure tsc compilation
- **Vite** available in node_modules (used by VS Code extension)

## Key Libraries

- **Vercel AI SDK** (`ai`, `@ai-sdk/*`) — multi-provider LLM abstraction
- **appium-mcp** — Appium Model Context Protocol server (stdio or SSE transport)
- **@modelcontextprotocol/sdk** — MCP client
- **Zod** — schema validation (config, LLM responses, flow schemas)
- **yaml** — YAML flow file parsing
- **sharp** — image processing for screenshots
- **df-vision** — Stark vision element location (Gemini-backed)
- **dotenv** — `.env` config loading
- **express** — report server
- **hono** — MCP server HTTP layer
- **vitest** — test runner
- **prettier** — code formatting

## LLM Providers

Supported via Vercel AI SDK: `anthropic`, `openai`, `gemini`, `groq`, `ollama`

## Code Style

- Prettier config: single quotes, semicolons, 100-character print width, 2-space indent, trailing commas (`es5`)
- No DI framework — modules import each other directly
- Zod for all external data validation
- Constants and model pricing centralized in `src/constants.ts`

## Common Commands

```bash
# Development
npm start # run via tsx (no compile)
npm start "goal" # run with a goal
npm run dev # run with file watching

# Build & Type Check
npm run build # tsc → dist/
npm run typecheck # type-check only, no emit
npm run lint # alias for typecheck

# Formatting
npm run format # prettier --write
npm run format:check # prettier --check

# Tests
npm test # vitest run tests/flow tests/sdk
npm run test:e2e # vitest run tests/e2e/
npm run test:e2e:android # android e2e with MCP_DEBUG=1
npm run test:watch # vitest watch mode

# VS Code Extension
npm run build:vsix # build .vsix package

# Landing page
npm run deploy:landing # deploy to Cloudflare Workers
```

## Configuration

All runtime config comes from `.env` and is validated by the Zod schema in `src/config.ts`. Key variables:

| Variable | Default | Description |
| ---------------- | -------- | ------------------------------------------------- |
| `LLM_PROVIDER` | `gemini` | `anthropic`, `openai`, `gemini`, `groq`, `ollama` |
| `LLM_API_KEY` | — | API key for chosen provider |
| `AGENT_MODE` | `dom` | `dom` or `vision` |
| `PLATFORM` | (prompt) | `android` or `ios` |
| `MAX_STEPS` | `30` | Max steps per goal |
| `CLOUD_PROVIDER` | — | `lambdatest` for remote devices |
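
`src/config.ts` validates these variables with Zod. The sketch below hand-rolls equivalent checks instead, so it runs without dependencies; `AppConfig`, `oneOf`, and `loadConfig` are hypothetical names, and only the six variables in the table are covered. Defaults come from the table, and `PLATFORM` stays `undefined` when unset so the CLI can prompt for it.

```typescript
const PROVIDERS = ['anthropic', 'openai', 'gemini', 'groq', 'ollama'] as const;
const AGENT_MODES = ['dom', 'vision'] as const;
const PLATFORMS = ['android', 'ios'] as const;
const CLOUD_PROVIDERS = ['lambdatest'] as const;

type Env = Record<string, string | undefined>;

// Hypothetical validated shape covering only the variables in the table.
interface AppConfig {
  LLM_PROVIDER: (typeof PROVIDERS)[number];
  LLM_API_KEY?: string;
  AGENT_MODE: (typeof AGENT_MODES)[number];
  PLATFORM?: (typeof PLATFORMS)[number]; // unset → CLI prompts for it
  MAX_STEPS: number;
  CLOUD_PROVIDER?: (typeof CLOUD_PROVIDERS)[number];
}

// Return the value if it is one of the allowed literals, undefined if unset,
// and throw on anything else.
function oneOf<T extends string>(
  name: string,
  allowed: readonly T[],
  value: string | undefined
): T | undefined {
  if (value === undefined || value === '') return undefined;
  if ((allowed as readonly string[]).includes(value)) return value as T;
  throw new Error(`${name} must be one of ${allowed.join(', ')}; got "${value}"`);
}

// Hypothetical loader: validates the environment and applies the table defaults.
function loadConfig(env: Env = process.env): AppConfig {
  const rawSteps = env.MAX_STEPS;
  const maxSteps = rawSteps === undefined || rawSteps === '' ? 30 : Number(rawSteps);
  if (!Number.isInteger(maxSteps) || maxSteps <= 0) {
    throw new Error(`MAX_STEPS must be a positive integer; got "${rawSteps}"`);
  }
  return {
    LLM_PROVIDER: oneOf('LLM_PROVIDER', PROVIDERS, env.LLM_PROVIDER) ?? 'gemini',
    LLM_API_KEY: env.LLM_API_KEY || undefined,
    AGENT_MODE: oneOf('AGENT_MODE', AGENT_MODES, env.AGENT_MODE) ?? 'dom',
    PLATFORM: oneOf('PLATFORM', PLATFORMS, env.PLATFORM),
    MAX_STEPS: maxSteps,
    CLOUD_PROVIDER: oneOf('CLOUD_PROVIDER', CLOUD_PROVIDERS, env.CLOUD_PROVIDER),
  };
}
```

Failing fast with a named variable in the error message is the main benefit a schema gives here, whether it is written by hand or with Zod.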

## Release

Automated via **semantic-release** with conventional commits. Config in `.releaserc.json`.
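
With the default release rules of semantic-release's commit analyzer, the commit type decides the version bump: a breaking change means major, `feat` means minor, and `fix` or `perf` means patch; the release uses the largest bump among the commits since the last tag. The CHANGELOG hunk in this PR follows that split ("Features" vs "Bug Fixes"). A minimal sketch of the mapping; `bumpFor` and `releaseType` are hypothetical, the default rules also treat reverts as patches (omitted here), and the `!` shorthand comes from the Conventional Commits spec, so whether it is honored depends on the parser preset configured in `.releaserc.json`.

```typescript
type Bump = 'major' | 'minor' | 'patch' | null;

// Conventional Commits header: type(optional scope)optional-!: subject.
const HEADER = /^(\w+)(?:\([^)]*\))?(!)?: \S/;

// Hypothetical helper: the bump a single commit message implies.
function bumpFor(message: string): Bump {
  const lines = message.split('\n');
  const match = HEADER.exec(lines[0] ?? '');
  if (!match) return null; // not a conventional commit: no release
  const type = match[1];
  const breaking =
    match[2] === '!' || lines.slice(1).some((line) => /^BREAKING[ -]CHANGE: /.test(line));
  if (breaking) return 'major';
  if (type === 'feat') return 'minor';
  if (type === 'fix' || type === 'perf') return 'patch';
  return null; // docs, chore, refactor, etc. trigger no release by default
}

const RANK = { major: 3, minor: 2, patch: 1 } as const;

// The release type for a batch of commits is the largest single bump.
function releaseType(messages: string[]): Bump {
  let best: Bump = null;
  for (const message of messages) {
    const bump = bumpFor(message);
    if (bump !== null && (best === null || RANK[bump] > RANK[best])) best = bump;
  }
  return best;
}
```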
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -2,13 +2,13 @@

### Bug Fixes

* update docs ([d55a5e3](https://github.com/AppiumTestDistribution/AppClaw/commit/d55a5e3117b9628a065fe48c5392ed1be739424d))
- update docs ([d55a5e3](https://github.com/AppiumTestDistribution/AppClaw/commit/d55a5e3117b9628a065fe48c5392ed1be739424d))

## [1.2.0](https://github.com/AppiumTestDistribution/AppClaw/compare/v1.1.0...v1.2.0) (2026-04-24)

### Features

* Appguide support ([#22](https://github.com/AppiumTestDistribution/AppClaw/issues/22)) ([63e366f](https://github.com/AppiumTestDistribution/AppClaw/commit/63e366feeb1f36c22643fff8d015f5b3b1253f6c))
- Appguide support ([#22](https://github.com/AppiumTestDistribution/AppClaw/issues/22)) ([63e366f](https://github.com/AppiumTestDistribution/AppClaw/commit/63e366feeb1f36c22643fff8d015f5b3b1253f6c))

## [1.1.0](https://github.com/AppiumTestDistribution/AppClaw/compare/v1.0.0...v1.1.0) (2026-04-17)

52 changes: 27 additions & 25 deletions landing/index.html
@@ -2056,28 +2056,21 @@ <h2>The agent knows<br />your app before<br />it opens it.</h2>
to reach any setting. No trial-and-error exploration.
</p>
<div class="appguide-apps">
<div class="appguide-app-pill">
<span class="app-icon">✉️</span>Gmail
</div>
<div class="appguide-app-pill">
<span class="app-icon">▶️</span>YouTube
</div>
<div class="appguide-app-pill">
<span class="app-icon">💬</span>WhatsApp
</div>
<div class="appguide-app-pill">
<span class="app-icon">🌐</span>Chrome
</div>
<div class="appguide-app-pill">
<span class="app-icon">⚙️</span>Settings
</div>
<div class="appguide-app-pill"><span class="app-icon">✉️</span>Gmail</div>
<div class="appguide-app-pill"><span class="app-icon">▶️</span>YouTube</div>
<div class="appguide-app-pill"><span class="app-icon">💬</span>WhatsApp</div>
<div class="appguide-app-pill"><span class="app-icon">🌐</span>Chrome</div>
<div class="appguide-app-pill"><span class="app-icon">⚙️</span>Settings</div>
<div class="appguide-app-pill" style="color: var(--text-3); font-style: italic">
+ your app
</div>
</div>
<div class="appguide-hint">
<span class="appguide-hint-icon">📄</span>
<span>Drop <code>.appclaw/guides/&lt;appId&gt;.md</code> to add or override any guide — custom guides always take priority over built-ins.</span>
<span
>Drop <code>.appclaw/guides/&lt;appId&gt;.md</code> to add or override any guide —
custom guides always take priority over built-ins.</span
>
</div>
</div>
<!-- Right: guide card -->
@@ -2091,19 +2084,28 @@ <h2>The agent knows<br />your app before<br />it opens it.</h2>
</div>
<div class="appguide-card-body">
<span class="ag-head">## WhatsApp Navigation</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Bottom tabs: Chats | Updates | Communities | Calls</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">New chat: floating pencil icon (bottom-right)</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Search: magnifying-glass at top of Chats</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Bottom tabs: Chats | Updates | Communities | Calls</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">New chat: floating pencil icon (bottom-right)</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Search: magnifying-glass at top of Chats</span><br />
<br />
<span class="ag-head">## Messaging</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Open a chat → type in message bar → send via arrow</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Attach media: paperclip icon next to message bar</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Voice note: long-press the microphone icon</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Open a chat → type in message bar → send via arrow</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Attach media: paperclip icon next to message bar</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Voice note: long-press the microphone icon</span><br />
<br />
<span class="ag-head">## Common Actions</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Star a message: long-press → star icon</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Forward: long-press → forward arrow</span><br />
<span class="ag-bullet">-</span> <span class="ag-text">Group info: tap the group name at the top</span>
<span class="ag-bullet">-</span>
<span class="ag-text">Star a message: long-press → star icon</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Forward: long-press → forward arrow</span><br />
<span class="ag-bullet">-</span>
<span class="ag-text">Group info: tap the group name at the top</span>
</div>
</div>
</div>