diff --git a/.kiro/steering/product.md b/.kiro/steering/product.md
new file mode 100644
index 0000000..0bd7d6f
--- /dev/null
+++ b/.kiro/steering/product.md
@@ -0,0 +1,28 @@
+# AppClaw — Product Overview
+
+AppClaw is an agentic AI layer for mobile automation on Android and iOS. Users describe goals in plain English and AppClaw orchestrates device interactions through Appium (via MCP). It supports multiple LLM providers (Anthropic, OpenAI, Google Gemini, Groq, Ollama) via the Vercel AI SDK.
+
+## Core Modes
+
+- **Agent mode** — LLM-driven goal execution (e.g. `appclaw "Send a WhatsApp message to Mom"`)
+- **YAML flows** — declarative, zero-LLM automation steps defined in YAML files
+- **Playground** — interactive REPL for building flows live on a device
+- **Explorer** — generates YAML test flows from a PRD or app description
+- **Record/Replay** — capture and adaptively replay goal executions
+- **Report** — Express server serving HTML run reports
+
+## Two Agent Modes
+
+- `dom` — uses XML page source and accessibility IDs/XPath to locate elements
+- `vision` — screenshot-first using Stark (df-vision + Gemini) for element location
+
+## Perception → Reason → Act Loop
+
+Each step: read screen state → send to LLM → execute action (tap/type/swipe/etc.) → repeat until goal complete or max steps reached.
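The loop described above can be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not AppClaw's implementation: `Action`, `Device`, `Reasoner`, and `runGoal` are hypothetical names, and the default cap of 30 steps simply mirrors the `MAX_STEPS` default documented in `tech.md`.

```typescript
// Hypothetical types for illustration only; none of these names come from the AppClaw codebase.
type Action =
  | { kind: 'tap'; target: string }
  | { kind: 'type'; text: string }
  | { kind: 'swipe'; direction: 'up' | 'down' | 'left' | 'right' }
  | { kind: 'done' };

// Assumed device abstraction: one call to perceive, one call to act.
interface Device {
  readScreen(): Promise<string>;
  execute(action: Action): Promise<void>;
}

// Assumed stand-in for the LLM call: picks the next action from goal, screen, and history.
type Reasoner = (goal: string, screen: string, history: Action[]) => Promise<Action>;

interface LoopResult {
  completed: boolean;
  steps: number; // number of actions actually executed on the device
  history: Action[];
}

async function runGoal(
  goal: string,
  device: Device,
  decide: Reasoner,
  maxSteps = 30
): Promise<LoopResult> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screen = await device.readScreen(); // perception
    const action = await decide(goal, screen, history); // reason
    if (action.kind === 'done') {
      return { completed: true, steps: history.length, history };
    }
    await device.execute(action); // act
    history.push(action);
  }
  // Step budget exhausted without the reasoner declaring the goal complete.
  return { completed: false, steps: history.length, history };
}
```

Keeping the reasoner behind a plain function type means the same loop can be exercised with a scripted fake in tests and with a real LLM call in production.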
+
+## Published Artifacts
+
+- **npm package** (`appclaw`) — CLI + SDK
+- **VS Code extension** — live multi-device grid view
+- **GitHub Action** — CI integration
+- **Landing page** — Cloudflare Workers static site
diff --git a/.kiro/steering/structure.md b/.kiro/steering/structure.md
new file mode 100644
index 0000000..1b6bcc3
--- /dev/null
+++ b/.kiro/steering/structure.md
@@ -0,0 +1,66 @@
+# Project Structure
+
+## Root Layout
+
+```
+appclaw/
+├── src/               # All TypeScript source (compiled → dist/)
+├── dist/              # Compiled output (mirrors src/, gitignored)
+├── tests/             # Test files (vitest)
+├── flows/             # Example YAML flow files
+├── examples/          # Example flows and PRDs
+├── schemas/           # JSON schemas (flow.schema.json, env.schema.json)
+├── skills/            # AI agent skill definitions (generate-appclaw-flow, use-appclaw-cli)
+├── bin/               # CLI entry point (bin/appclaw.js)
+├── docs/              # QA documentation
+├── logs/              # Runtime execution logs (gitignored)
+├── vscode-extension/  # VS Code extension (separate package.json + tsconfig)
+├── github-action/     # GitHub Action definition
+├── landing/           # Cloudflare Workers landing page
+└── .appclaw/          # Runtime data: guides/, runs/ (recordings, screenshots)
+```
+
+## Source Modules (`src/`)
+
+| Module | Responsibility |
+| ---------------- | --------------------------------------------------------------------------------- |
+| `index.ts` | CLI entry — routes to all 6 modes based on flags |
+| `config.ts` | Zod-validated config from `.env` |
+| `constants.ts` | Default models, pricing, stuck detection thresholds |
+| `agent/` | Core agent loop, stuck detection, recovery, planner, human-in-the-loop |
+| `llm/` | Multi-provider LLM integration — provider factory, prompt builder, action schemas |
+| `mcp/` | Appium MCP client — tool calling, element finding, screenshots, keyboard |
+| `perception/` | Screen parsing — Android/iOS XML parsers, DOM trimmer, screen diff |
+| `vision/` | AI vision element location via Stark (df-vision + Gemini) |
+| `flow/` | YAML flow parsing and execution, natural language step handling, parallel runner |
+| `device/` | Device setup pipeline — platform/device picker, iOS setup, Appium session |
+| `memory/` | Episodic memory — trajectory recording, fingerprinting, retrieval |
+| `explorer/` | PRD → YAML flow generation, screen crawler |
+| `playground/` | Interactive REPL for building flows |
+| `recording/` | Session recorder and adaptive replayer |
+| `report/` | Run artifact collection, HTML report rendering, Express server |
+| `sdk/` | Public SDK — `GoalRunner`, `FlowRunner`, `StepRunner`, config builder |
+| `skills/` | Built-in skill implementations (find-and-tap, read-screen, submit-message) |
+| `ui/terminal.ts` | Rich terminal output — spinners, boxes, markdown rendering |
+| `appguides/` | App-specific interaction guides |
+
+## Tests (`tests/`)
+
+```
+tests/
+├── flow/    # Flow parsing and execution unit tests
+├── sdk/     # SDK integration tests
+├── e2e/     # End-to-end device tests (require connected device)
+├── vision/  # Vision module tests
+└── flows/   # YAML flow fixtures used by tests
+```
+
+## Key Conventions
+
+- Each `src/` subdirectory typically has an `index.ts` as its public interface
+- Types are co-located in `types.ts` within each module
+- No barrel re-exports at the root `src/` level — import from specific modules
+- The SDK (`src/sdk/`) is the only public API surface; everything else is internal
+- YAML flows live in `flows/` (project-level) or `examples/flows/` (examples)
+- `.appclaw/runs/` stores per-run artifacts: `manifest.json`, `recording.mp4`, step screenshots
+- `.appclaw/guides/` stores per-app interaction guides keyed by bundle/package ID
diff --git a/.kiro/steering/tech.md b/.kiro/steering/tech.md
new file mode 100644
index 0000000..04bb062
--- /dev/null
+++ b/.kiro/steering/tech.md
@@ -0,0 +1,86 @@
+# Tech Stack
+
+## Language & Runtime
+
+- **TypeScript** (strict mode, ES2022 modules)
+- **Node.js** 18+ runtime
+- **tsx** for dev/local execution without compiling
+
+## Build System
+
+- **TypeScript compiler** (`tsc`) — outputs to `dist/`, mirrors `src/` structure
+- **No bundler** for the main package — pure tsc compilation
+- **Vite** available in node_modules (used by VS Code extension)
+
+## Key Libraries
+
+- **Vercel AI SDK** (`ai`, `@ai-sdk/*`) — multi-provider LLM abstraction
+- **appium-mcp** — Appium Model Context Protocol server (stdio or SSE transport)
+- **@modelcontextprotocol/sdk** — MCP client
+- **Zod** — schema validation (config, LLM responses, flow schemas)
+- **yaml** — YAML flow file parsing
+- **sharp** — image processing for screenshots
+- **df-vision** — Stark vision element location (Gemini-backed)
+- **dotenv** — `.env` config loading
+- **express** — report server
+- **hono** — MCP server HTTP layer
+- **vitest** — test runner
+- **prettier** — code formatting
+
+## LLM Providers
+
+Supported via Vercel AI SDK: `anthropic`, `openai`, `gemini`, `groq`, `ollama`
+
+## Code Style
+
+- Prettier config: single quotes, semi, 100 char print width, 2-space indent, trailing commas (ES5)
+- No DI framework — modules import each other directly
+- Zod for all external data validation
+- Constants and model pricing centralized in `src/constants.ts`
+
+## Common Commands
+
+```bash
+# Development
+npm start                 # run via tsx (no compile)
+npm start "goal"          # run with a goal
+npm run dev               # run with file watching
+
+# Build & Type Check
+npm run build             # tsc → dist/
+npm run typecheck         # type-check only, no emit
+npm run lint              # alias for typecheck
+
+# Formatting
+npm run format            # prettier --write
+npm run format:check      # prettier --check
+
+# Tests
+npm test                  # vitest run tests/flow tests/sdk
+npm run test:e2e          # vitest run tests/e2e/
+npm run test:e2e:android  # android e2e with MCP_DEBUG=1
+npm run test:watch        # vitest watch mode
+
+# VS Code Extension
+npm run build:vsix        # build .vsix package
+
+# Landing page
+npm run deploy:landing    # deploy to Cloudflare Workers
+```
+
+## Configuration
+
+All runtime config via `.env`, validated by Zod schema in `src/config.ts`. Key variables:
+
+| Variable | Default | Description |
+| ---------------- | -------- | ------------------------------------------------- |
+| `LLM_PROVIDER` | `gemini` | `anthropic`, `openai`, `gemini`, `groq`, `ollama` |
+| `LLM_API_KEY` | — | API key for chosen provider |
+| `AGENT_MODE` | `dom` | `dom` or `vision` |
+| `PLATFORM` | (prompt) | `android` or `ios` |
+| `MAX_STEPS` | `30` | Max steps per goal |
+| `CLOUD_PROVIDER` | — | `lambdatest` for remote devices |
+
+## Release
+
+Automated via **semantic-release** with conventional commits. Config in `.releaserc.json`.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e7f4120..6f6dc17 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,13 +2,13 @@
 
 ### Bug Fixes
 
-* update docs ([d55a5e3](https://github.com/AppiumTestDistribution/AppClaw/commit/d55a5e3117b9628a065fe48c5392ed1be739424d))
+- update docs ([d55a5e3](https://github.com/AppiumTestDistribution/AppClaw/commit/d55a5e3117b9628a065fe48c5392ed1be739424d))
 
 ## [1.2.0](https://github.com/AppiumTestDistribution/AppClaw/compare/v1.1.0...v1.2.0) (2026-04-24)
 
 ### Features
 
-* Appguide support ([#22](https://github.com/AppiumTestDistribution/AppClaw/issues/22)) ([63e366f](https://github.com/AppiumTestDistribution/AppClaw/commit/63e366feeb1f36c22643fff8d015f5b3b1253f6c))
+- Appguide support ([#22](https://github.com/AppiumTestDistribution/AppClaw/issues/22)) ([63e366f](https://github.com/AppiumTestDistribution/AppClaw/commit/63e366feeb1f36c22643fff8d015f5b3b1253f6c))
 
 ## [1.1.0](https://github.com/AppiumTestDistribution/AppClaw/compare/v1.0.0...v1.1.0) (2026-04-17)
 
diff --git a/landing/index.html b/landing/index.html
index 56ec340..f951677 100644
--- a/landing/index.html
+++ b/landing/index.html
@@ -2056,28 +2056,21 @@
.appclaw/guides/<appId>.md to add or override any guide — custom guides always take priority over built-ins.
+ Drop .appclaw/guides/<appId>.md to add or override any guide —
+ custom guides always take priority over built-ins.
- App Guides (AppGuides) are per-app knowledge snippets injected directly into the
- agent's context window at the start of every automation run. They encode navigation
- patterns, gesture shortcuts, and common action paths for a specific app — so the agent
- never needs to rediscover them by trial and error.
+ App Guides (AppGuides) are per-app knowledge snippets injected directly into the agent's
+ context window at the start of every automation run. They encode navigation patterns,
+ gesture shortcuts, and common action paths for a specific app — so the agent never needs
+ to rediscover them by trial and error.
APP_GUIDE (WhatsApp):
++APP_GUIDE (WhatsApp):
## WhatsApp Navigation
- Bottom tabs: Chats | Updates | Communities | Calls
@@ -3493,14 +3494,15 @@
+- Voice note: long-press the microphone icon
How it works
## Messaging
- Open a chat → type in the message bar at the bottom → send via arrow icon
- Attach media: paperclip icon next to message bar
-- Voice note: long-press the microphone icon
.appclaw/guides/<appId>.md (highest priority, overrides built-ins)
+ Custom guide — .appclaw/guides/<appId>.md (highest
+ priority, overrides built-ins)
 AppClaw ships with guides for the most commonly automated apps on both Android and iOS.
- These activate automatically when AppClaw detects the matching package name or bundle ID.
+ These activate automatically when AppClaw detects the matching package name or bundle
+ ID.
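The priority rule stated in this hunk (a custom guide under `.appclaw/guides/<appId>.md` overrides a built-in guide for the same package name or bundle ID) could be sketched as a simple two-level lookup. This is a hedged illustration only: `resolveGuide`, `ResolvedGuide`, and the in-memory maps are hypothetical and stand in for whatever file loading AppClaw actually performs.

```typescript
// Hypothetical sketch; these names are illustrative and not part of AppClaw's API.
type GuideSource = 'custom' | 'built-in';

interface ResolvedGuide {
  appId: string; // Android package name or iOS bundle ID
  source: GuideSource;
  content: string; // guide text that would be injected into the agent's context
}

// Assumption: guides have already been loaded into maps keyed by app ID.
function resolveGuide(
  appId: string,
  customGuides: ReadonlyMap<string, string>,
  builtInGuides: ReadonlyMap<string, string>
): ResolvedGuide | undefined {
  // Custom guides are checked first, so they always win over built-ins.
  const custom = customGuides.get(appId);
  if (custom !== undefined) {
    return { appId, source: 'custom', content: custom };
  }
  const builtIn = builtInGuides.get(appId);
  if (builtIn !== undefined) {
    return { appId, source: 'built-in', content: builtIn };
  }
  return undefined; // no guide known for this app
}

// Example data: 'com.whatsapp' is used purely as a sample key.
const builtIns = new Map([['com.whatsapp', '## WhatsApp Navigation']]);
const customs = new Map([['com.whatsapp', '## My WhatsApp overrides']]);
const picked = resolveGuide('com.whatsapp', customs, builtIns);
```

Returning the `source` alongside the content makes it easy to log which guide was applied when debugging a run.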