From eb87b7c2bd4af0a0f15bb783d2e10df7aebe4ea8 Mon Sep 17 00:00:00 2001
From: Michael Wang
Date: Sun, 24 May 2026 22:47:24 -0700
Subject: [PATCH 1/2] docs: reflect Phase C shipped and offboarding fix across the md files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

All four "make one employee real" phases (A/B/C/D) are now shipped, plus
the offboard cleanup from #32. Updates the status paragraphs, phase
checklists, PR refs, test counts (125 → 143), and corrects a stale issue
reference (#11 → #16) for the docs-reconciliation work. Closes #16 in
spirit for the four files touched; LOCAL_SETUP.md drift still tracked
there.

Co-Authored-By: Claude Opus 4.7
---
 CLAUDE.md          | 7 ++++---
 PROJECT_CONTEXT.md | 2 +-
 README.md          | 6 +++---
 ROADMAP.md         | 8 ++++----
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index b35ddb8..2237aea 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -5,7 +5,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project
 AgentOS — "Fiverr for OpenClaw." Managed platform that packages OpenClaw instances as specialized, containerized AI employees. See `README.md`, `ROADMAP.md`, and `PROJECT_CONTEXT.md` for product context; `LOCAL_SETUP.md` is the authoritative setup guide.
-**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now also a differentiated, trust-moated, memory-backed employee. Three of the four "make one employee real" phases have shipped (A: template-driven runtime; B: enforced action policy; D: agent memory + work log). Phase C (autonomous PR-watcher) is next. LLM execution is live end-to-end via OpenClaw + Kimi K2.5.
+**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now a differentiated, trust-moated, memory-backed, autonomous employee.
All four "make one employee real" phases have shipped: A (template-driven runtime), B (enforced action policy), C (autonomous PR-watcher), D (agent memory + work log). LLM execution is live end-to-end via OpenClaw + Kimi K2.5.
 ## Architecture (read this before editing)
@@ -22,7 +22,8 @@ Host (Mac)
 - **Each agent container** runs the official OpenClaw gateway as the engine plus a FastAPI sidecar (`backend/agent-runtime/server.py`) on port 8080. The sidecar accepts `POST /task` with a token (`openclaw-internal` by default) and proxies to OpenClaw's OpenAI-compatible `/v1/chat/completions`.
 - **Template-driven container shape (Phase A).** The orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from the template's `system_prompt`, and installs only the skills the template lists. `resource_limits` from the template cap the Docker container.
 - **Agent-side auth + action policy (Phase B).** The orchestrator mints a per-agent bearer token and persists it on `agents.agent_token` while the container runs. Agent-authed gateway endpoints (currently the 4 GitHub ones) use `get_current_agent` (`backend/app/agent_auth.py`) instead of `get_current_user`, and call `require_action` (`backend/app/services/policy.py`) before doing work. Denied-by-default against the role template's `allowed_actions`.
-- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts. Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table Phase C's watcher will read.
+- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts.
Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table the Phase C watcher reads.
+- **Autonomous PR-watcher (Phase C).** FastAPI `lifespan` starts an asyncio poll loop (`backend/app/services/pr_watcher.py`) that scans every running Code Review Engineer's `watched_repos` rows every 120s, lists open PRs via the GitHub API, dedups against `reviewed_prs`, and dispatches a review task per unreviewed PR. Startup staleness gate (~30 min) prevents replaying old PRs; per-(agent, repo) error isolation keeps a bad subscription from breaking others; offboarding releases `watched_repos` rows so a re-hire under the same user can re-take the slot.
 - **LLM** is Kimi (Moonshot AI) — `moonshot/kimi-k2.5` — wired via `openclaw.json` inside the agent image. The chat-completions endpoint must be explicitly enabled in that config.
 - **Persistence** is Supabase only (users, hired employees, encrypted credentials, per-agent memory + action log + reviewed PRs). User credentials are Fernet-encrypted at rest in `backend/app/services/credential_store.py`; agent tokens are stored plaintext (rotated on stop).
 - **OAuth fidelity (hackathon):** GitHub is real OAuth; Slack/Gmail use a simulated consent screen that writes a placeholder token via `POST /credentials`.
@@ -58,7 +59,7 @@ bun run build # tsc -b && vite build
 bun run lint # eslint .
 ```
-Backend tests (125 tests):
+Backend tests (143 tests):
 ```bash
 cd backend
 arch -arm64 .venv/bin/python -m pytest # all
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index 4a1b04b..8076b9e 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -39,7 +39,7 @@ AI employee shows up in their Slack/GitHub/tools
 ```
 ### Platform → Agent Communication (Implemented)
-Each agent container runs a lightweight FastAPI server on port 8080. The platform dispatches tasks to agents via HTTP POST to the container's internal IP on the Docker bridge network (`openclaw-agents`). All containers run on a single VPS.
All containers run on a single VPS. +Each agent container runs a lightweight FastAPI server on port 8080. The platform dispatches tasks to agents via HTTP POST to the container's internal IP on the Docker bridge network (`openclaw-agents`). For the hackathon everything runs locally on Docker Desktop; VPS / cloud deploy is post-hackathon (`#11`). ``` Platform API Agent Container diff --git a/README.md b/README.md index 4c7e621..59918fc 100644 --- a/README.md +++ b/README.md @@ -144,7 +144,7 @@ Full walkthrough is in `LOCAL_SETUP.md`. The short version: ## Tests -Backend has 125 passing unit tests: +Backend has 143 passing unit tests: ```bash cd backend @@ -163,11 +163,11 @@ On Apple Silicon use `arch -arm64 .venv/bin/python -m pytest` instead — see `C - LLM execution inside containers via OpenClaw + Kimi. - Hardening pass merged (real password login, OAuth state signing, rate limiting, CORS, error handling, frontend route guards + minimal Vitest suite). -**Code Review Engineer specialization — 3 of 4 phases shipped** +**Code Review Engineer specialization — all 4 phases shipped** - **Phase A — Template-driven runtime.** The role template shapes the container (system_prompt → SOUL.md, skills filter, resource_limits applied). PR #19. - **Phase B — Enforced action policy.** Per-agent bearer token + denied-by-default `allowed_actions` check at the gateway. The Code Review Engineer is provably unable to merge/close/push. PR #20. - **Phase D — Memory & work log.** Per-agent key/value memory (writes via the `update-memory` skill, reads back via dispatch-time injection), full audit log of every agent-authed call (allow + deny), `reviewed_prs` dedup index. PR #25. -- **Phase C — Autonomous PR-watcher.** Next. Polls watched repos and dispatches reviews within minutes. +- **Phase C — Autonomous PR-watcher.** `lifespan`-driven 120s asyncio poll loop scans every running CRE's `watched_repos`, dedups against `reviewed_prs`, and dispatches a review task per unreviewed open PR. 
PR #29.
 **Backlog (post-hackathon)**
 - AWS deployment via CDK (#11), frontend do-over (#12), agent memory compaction via LLM reflection (#23), other roles brought up to A/B/D parity (#17), hire/offboard UI (#13), CI for backend + frontend tests (#14).
diff --git a/ROADMAP.md b/ROADMAP.md
index 85dcf69..bca8f82 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -241,7 +241,7 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
 - [x] **OpenClaw + Kimi integration verified end-to-end.** Container builds, gateway starts with Kimi K2.5, real LLM responses.
 - [x] **Role definition templates.** Secretary, Code Review Engineer, Customer Support YAMLs.
 - [x] **`GET /roles` + `template_loader` service** feeding the talent directory.
-- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #11) and `start-mac.sh` / `start.sh`.
+- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #16) and `start-mac.sh` / `start.sh` / `start-mac-compose.sh`.
 **Production hardening (PR #2, merged)**
 - [x] Real password-based login with bcrypt + SHA-256 pre-hash (sidesteps 72-byte truncation).
@@ -253,9 +253,9 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
 - [x] **Phase A — Template-driven runtime (#6, PR #19).** Orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from `system_prompt`, installs only the skills the template lists, applies `resource_limits`.
 - [x] **Phase B — Enforced action policy (#7, PR #20).** Per-agent bearer token persisted on `agents.agent_token`; `get_current_agent` dependency; `require_action` denied-by-default against the template's `allowed_actions`. The four GitHub gateway endpoints converted to agent-auth + policy.
 - [x] **Phase D — Memory & work log (#8, PR #25).** `agent_memory` (key/value per agent), `agent_action_log` (allow + deny rows for every agent-authed call), `reviewed_prs` (dedup for Phase C). `update-memory` skill; dispatcher injects memory into `role_context` on every dispatch.
-- [ ] **Phase C — Autonomous PR-watcher (#9).** FastAPI `lifespan` poll loop, `watched_repos` table + endpoint, dispatches reviews within minutes, dedups against `reviewed_prs`.
+- [x] **Phase C — Autonomous PR-watcher (#9, PR #29).** FastAPI `lifespan`-driven 120s asyncio poll loop scans every running CRE's `watched_repos`, lists open PRs via GitHub, dedups against `reviewed_prs`, dispatches a review task per unreviewed PR. Startup staleness gate (~30 min) prevents replaying old PRs; per-(agent, repo) error isolation; offboarding releases `watched_repos` rows (PR #32) so a re-hire under the same user can re-take the slot.
-**Tests:** 125 backend unit tests passing.
+**Tests:** 143 backend unit tests passing.
 ## Backlog (post-hackathon)
@@ -264,7 +264,7 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
 - **Hire & offboard flow** (#13) — `/directory` lists roles but has no hire action; `hireEmployee()` / `fireEmployee()` exist in `app/src/lib/api.ts` but are never called.
 - **CI for backend + frontend tests** (#14) — only Claude review workflows exist today.
 - **`backend/.env.example`** (#15) — `LOCAL_SETUP.md` tells contributors to copy it but the file doesn't exist.
-- **Reconcile docs with reality** (#11) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md` partially addressed; `LOCAL_SETUP.md` still references the old Next.js/Compose setup.
+- **Reconcile docs with reality** (#16) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md`, `CLAUDE.md` reflect the shipped phases; `LOCAL_SETUP.md` still references the old Next.js/Compose setup and is the remaining work.
 - **Bring Customer Support & Secretary to A/B/D parity** (#17) — they still run the generic container with no enforced boundaries.
 - **Stop passing secrets as plaintext Docker env vars** (#18).
 - **Agent memory compaction via LLM reflection** (#23) — the interesting moat candidate. Mechanical eviction → heuristic clustering → LLM-driven consolidation.

From 1aa75adc4d98a128cba915bd3c44fa9f00121336 Mon Sep 17 00:00:00 2001
From: Michael Wang
Date: Sun, 24 May 2026 22:47:32 -0700
Subject: [PATCH 2/2] chore: add backend/.env.example so LOCAL_SETUP.md step 2 actually works
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

LOCAL_SETUP.md instructs contributors to `cp backend/.env.example
backend/.env`, but the file did not exist — setup failed at step 2 (#15).
Adds the file with every key from app/config.py plus the four read via
os.getenv (CORS_ALLOWED_ORIGINS, LOG_LEVEL, RATE_LIMIT_ENABLED,
PR_WATCHER_ENABLED), grouped by section with required keys called out.
Placeholder values only. Adds a !.env.example exception to .gitignore so
the new file isn't caught by the .env* rule.

Co-Authored-By: Claude Opus 4.7
---
 .gitignore           |  1 +
 backend/.env.example | 94 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 backend/.env.example

diff --git a/.gitignore b/.gitignore
index 311d98d..22c2851 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 .env*
+!.env.example
 __pycache__/
 *.pyc
 .venv/
diff --git a/backend/.env.example b/backend/.env.example
new file mode 100644
index 0000000..2f4c815
--- /dev/null
+++ b/backend/.env.example
@@ -0,0 +1,94 @@
+# AgentOS backend environment.
+# Copy this file to backend/.env and fill in the required keys.
+# Required keys are marked REQUIRED; everything else has a sensible default.
+
+
+# ── Supabase (REQUIRED) ───────────────────────────────────────────────────────
+# Create a project at supabase.com (free tier is fine), then grab these from
+# Project Settings → API. Run backend/migrations/schema.sql in the SQL editor
+# before first boot.
+SUPABASE_URL=https://your-project.supabase.co
+SUPABASE_KEY=your-supabase-anon-key
+
+
+# ── Encryption (REQUIRED) ─────────────────────────────────────────────────────
+# Fernet key used to encrypt OAuth tokens at rest in the credentials table.
+# Generate with:
+# python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
+ENCRYPTION_KEY=generate-with-fernet-and-paste-here
+
+
+# ── LLM (REQUIRED) ────────────────────────────────────────────────────────────
+# Kimi K2.5 via Moonshot AI — the model that runs inside every agent container
+# via OpenClaw. Get a key at platform.moonshot.ai.
+LLM_API_KEY=your-moonshot-api-key
+
+# Anthropic Claude — used only by the /chat passthrough router. Optional unless
+# you exercise that endpoint.
+ANTHROPIC_API_KEY=
+
+
+# ── OAuth: GitHub (REQUIRED for the Code Review Engineer) ─────────────────────
+# Register an OAuth app at github.com/settings/developers. For pure-local dev
+# you can use http://localhost:8000/api/auth/github/callback as the redirect
+# URI and leave BASE_URL unset below. For an ngrok-fronted setup, register the
+# ngrok URL and set BASE_URL accordingly.
+GITHUB_CLIENT_ID=
+GITHUB_CLIENT_SECRET=
+
+
+# ── OAuth: Slack / Gmail (optional for the hackathon) ─────────────────────────
+# Customer Support and Secretary currently use a simulated consent screen that
+# writes a placeholder token, so these can stay empty until you wire real OAuth.
+# When you do: Slack hard-requires HTTPS for the redirect URI (use ngrok or
+# similar) and Google's Web client type does too — register the public URL and
+# set BASE_URL.
+SLACK_CLIENT_ID=
+SLACK_CLIENT_SECRET=
+GOOGLE_CLIENT_ID=
+GOOGLE_CLIENT_SECRET=
+
+
+# ── Platform URLs ─────────────────────────────────────────────────────────────
+# Public URL where OAuth providers redirect the user's browser. Leave empty for
+# a fully-local GitHub OAuth setup (defaults to http://localhost:{PLATFORM_PORT})
+# or set to an ngrok URL if your OAuth app is registered against ngrok.
+BASE_URL=
+
+# Where the frontend lives. OAuth callbacks redirect here on completion.
+FRONTEND_URL=http://localhost:5173
+
+# How agent containers reach the platform's gateway. host.docker.internal is
+# correct for Docker Desktop on Mac. Override only if you're on Linux without
+# the host.docker.internal helper.
+PLATFORM_GATEWAY_URL=http://host.docker.internal:8000/gateway
+
+# Bind interface and port for uvicorn.
+PLATFORM_HOST=0.0.0.0
+PLATFORM_PORT=8000
+
+
+# ── Docker ────────────────────────────────────────────────────────────────────
+# The bridge network agent containers are attached to. Matches the network
+# created by start-mac.sh / start.sh / start-mac-compose.sh.
+DOCKER_NETWORK=openclaw-agents
+
+# The image hire flow spawns. Build with:
+# docker build -t openclaw/agent:latest backend/agent-runtime/
+OPENCLAW_AGENT_IMAGE=openclaw/agent:latest
+
+
+# ── Operational toggles ───────────────────────────────────────────────────────
+# Comma-separated list of allowed CORS origins. Default in code allows the
+# Vite dev server; override here for prod or alternate frontends.
+CORS_ALLOWED_ORIGINS=http://localhost:5173
+
+# uvicorn / app log level (DEBUG, INFO, WARNING, ERROR).
+LOG_LEVEL=INFO
+
+# Set to "false" to disable slowapi rate limiting (the test suite sets this).
+RATE_LIMIT_ENABLED=true
+
+# Set to "false" to skip starting the Phase C PR watcher on app startup.
+# Useful if you're iterating on routers and don't want background dispatches.
+PR_WATCHER_ENABLED=true
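
Reviewer note: the operational toggles in backend/.env.example are described as being read via os.getenv, but this series does not show the parsing code. The block below is a minimal sketch of how such values might be interpreted, not the project's actual implementation. The helper names `env_flag` and `env_csv` are hypothetical, and treating only the literal string "false" (case-insensitive) as "off" is an assumption drawn from the comments in the example file.

```python
import os
from typing import Mapping


def env_flag(name: str, default: bool = True,
             environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: False only when the variable is set to "false"."""
    raw = environ.get(name)
    if raw is None or not raw.strip():
        return default  # unset or empty: fall back to the default
    return raw.strip().lower() != "false"


def env_csv(name: str, default: str,
            environ: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical helper: split a comma-separated value, dropping empty entries."""
    raw = environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# A plain dict stands in for the process environment in this example.
example_env = {
    "RATE_LIMIT_ENABLED": "false",
    "CORS_ALLOWED_ORIGINS": "http://localhost:5173, http://localhost:3000",
}
rate_limit_on = env_flag("RATE_LIMIT_ENABLED", environ=example_env)
pr_watcher_on = env_flag("PR_WATCHER_ENABLED", environ=example_env)  # unset, so default
cors_origins = env_csv("CORS_ALLOWED_ORIGINS", "http://localhost:5173",
                       environ=example_env)
```

Passing the environment as a parameter keeps the helpers testable without mutating `os.environ`; real code could equally read the process environment directly.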