From eb87b7c2bd4af0a0f15bb783d2e10df7aebe4ea8 Mon Sep 17 00:00:00 2001
From: Michael Wang <mzw2010@nyu.edu>
Date: Sun, 24 May 2026 22:47:24 -0700
Subject: [PATCH 1/2] docs: reflect Phase C shipped and offboarding fix across
 the md files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

All four "make one employee real" phases (A/B/C/D) are now shipped, plus
the offboard cleanup from #32. Updates the status paragraphs, phase
checklists, PR refs, and test counts (125 → 143); corrects a stale issue
reference (#11 → #16) for the docs-reconciliation work; and notes that the
hackathon setup runs locally on Docker Desktop rather than on a VPS.
Partially addresses #16: the four files touched are reconciled, and
LOCAL_SETUP.md drift is still tracked there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
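
Reviewer note: the Phase C text added below describes a poll loop with
dedup, a startup staleness gate, and per-(agent, repo) error isolation.
The sketch here is only an illustration of that shape, not the code in
backend/app/services/pr_watcher.py. The function names, the in-memory
`reviewed` dict standing in for the `reviewed_prs` table, the PR dict keys,
and the reading of the gate as "skip PRs created more than 30 minutes
before startup" are all assumptions.

```python
import asyncio
from datetime import datetime, timedelta

POLL_INTERVAL_S = 120                     # docs: 120s poll loop
STALENESS_WINDOW = timedelta(minutes=30)  # docs: ~30 min startup gate


def select_prs_to_review(open_prs: list, reviewed: set, started_at: datetime) -> list:
    """Return numbers of open PRs that are neither reviewed nor stale.

    Each PR is a dict with hypothetical keys "number" and "created_at".
    """
    cutoff = started_at - STALENESS_WINDOW
    return [
        pr["number"]
        for pr in open_prs
        if pr["number"] not in reviewed and pr["created_at"] >= cutoff
    ]


async def poll_once(subscriptions, list_open_prs, dispatch, reviewed, started_at):
    """One pass over every (agent, repo) pair; a failure in one pair is
    logged and does not stop the remaining pairs."""
    for agent_id, repo in subscriptions:
        try:
            open_prs = await list_open_prs(repo)
            seen = reviewed.setdefault((agent_id, repo), set())
            for number in select_prs_to_review(open_prs, seen, started_at):
                await dispatch(agent_id, repo, number)
                seen.add(number)  # record so the next pass skips this PR
        except Exception as exc:  # error isolation per (agent, repo)
            print(f"watcher: {agent_id} {repo} failed: {exc}")


async def watch(poll, interval_s: float = POLL_INTERVAL_S):
    """Call `poll` forever, sleeping between passes."""
    while True:
        await poll()
        await asyncio.sleep(interval_s)
```

In this sketch a second pass over the same data dispatches nothing new,
because every dispatched PR number is recorded before the pass ends.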
 CLAUDE.md          | 7 ++++---
 PROJECT_CONTEXT.md | 2 +-
 README.md          | 6 +++---
 ROADMAP.md         | 8 ++++----
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index b35ddb8..2237aea 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -5,7 +5,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project
 AgentOS — "Fiverr for OpenClaw." Managed platform that packages OpenClaw instances as specialized, containerized AI employees. See `README.md`, `ROADMAP.md`, and `PROJECT_CONTEXT.md` for product context; `LOCAL_SETUP.md` is the authoritative setup guide.
 
-**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now also a differentiated, trust-moated, memory-backed employee. Three of the four "make one employee real" phases have shipped (A: template-driven runtime; B: enforced action policy; D: agent memory + work log). Phase C (autonomous PR-watcher) is next. LLM execution is live end-to-end via OpenClaw + Kimi K2.5.
+**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now a differentiated, trust-moated, memory-backed, autonomous employee. All four "make one employee real" phases have shipped: A (template-driven runtime), B (enforced action policy), C (autonomous PR-watcher), D (agent memory + work log). LLM execution is live end-to-end via OpenClaw + Kimi K2.5.
 
 ## Architecture (read this before editing)
 
@@ -22,7 +22,8 @@ Host (Mac)
 - **Each agent container** runs the official OpenClaw gateway as the engine plus a FastAPI sidecar (`backend/agent-runtime/server.py`) on port 8080. The sidecar accepts `POST /task` with a token (`openclaw-internal` by default) and proxies to OpenClaw's OpenAI-compatible `/v1/chat/completions`.
 - **Template-driven container shape (Phase A).** The orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from the template's `system_prompt`, and installs only the skills the template lists. `resource_limits` from the template cap the Docker container.
 - **Agent-side auth + action policy (Phase B).** The orchestrator mints a per-agent bearer token and persists it on `agents.agent_token` while the container runs. Agent-authed gateway endpoints (currently the 4 GitHub ones) use `get_current_agent` (`backend/app/agent_auth.py`) instead of `get_current_user`, and call `require_action` (`backend/app/services/policy.py`) before doing work. Denied-by-default against the role template's `allowed_actions`.
-- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts. Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table Phase C's watcher will read.
+- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts. Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table the Phase C watcher reads.
+- **Autonomous PR-watcher (Phase C).** FastAPI `lifespan` starts an asyncio poll loop (`backend/app/services/pr_watcher.py`) that scans every running Code Review Engineer's `watched_repos` rows every 120s, lists open PRs via the GitHub API, dedups against `reviewed_prs`, and dispatches a review task per unreviewed PR. Startup staleness gate (~30 min) prevents replaying old PRs; per-(agent, repo) error isolation keeps a bad subscription from breaking others; offboarding releases `watched_repos` rows so a re-hire under the same user can re-take the slot.
 - **LLM** is Kimi (Moonshot AI) — `moonshot/kimi-k2.5` — wired via `openclaw.json` inside the agent image. The chat-completions endpoint must be explicitly enabled in that config.
 - **Persistence** is Supabase only (users, hired employees, encrypted credentials, per-agent memory + action log + reviewed PRs). User credentials are Fernet-encrypted at rest in `backend/app/services/credential_store.py`; agent tokens are stored plaintext (rotated on stop).
 - **OAuth fidelity (hackathon):** GitHub is real OAuth; Slack/Gmail use a simulated consent screen that writes a placeholder token via `POST /credentials`.
@@ -58,7 +59,7 @@ bun run build                  # tsc -b && vite build
 bun run lint                   # eslint .
 ```
 
-Backend tests (125 tests):
+Backend tests (143 tests):
 ```bash
 cd backend
 arch -arm64 .venv/bin/python -m pytest                              # all
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index 4a1b04b..8076b9e 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -39,7 +39,7 @@ AI employee shows up in their Slack/GitHub/tools
 ```
 
 ### Platform → Agent Communication (Implemented)
-Each agent container runs a lightweight FastAPI server on port 8080. The platform dispatches tasks to agents via HTTP POST to the container's internal IP on the Docker bridge network (`openclaw-agents`). All containers run on a single VPS.
+Each agent container runs a lightweight FastAPI server on port 8080. The platform dispatches tasks to agents via HTTP POST to the container's internal IP on the Docker bridge network (`openclaw-agents`). For the hackathon everything runs locally on Docker Desktop; VPS / cloud deploy is post-hackathon (#11).
 
 ```
 Platform API                    Agent Container
diff --git a/README.md b/README.md
index 4c7e621..59918fc 100644
--- a/README.md
+++ b/README.md
@@ -144,7 +144,7 @@ Full walkthrough is in `LOCAL_SETUP.md`. The short version:
 
 ## Tests
 
-Backend has 125 passing unit tests:
+Backend has 143 passing unit tests:
 
 ```bash
 cd backend
@@ -163,11 +163,11 @@ On Apple Silicon use `arch -arm64 .venv/bin/python -m pytest` instead — see `C
 - LLM execution inside containers via OpenClaw + Kimi.
 - Hardening pass merged (real password login, OAuth state signing, rate limiting, CORS, error handling, frontend route guards + minimal Vitest suite).
 
-**Code Review Engineer specialization — 3 of 4 phases shipped**
+**Code Review Engineer specialization — all 4 phases shipped**
 - **Phase A — Template-driven runtime.** The role template shapes the container (system_prompt → SOUL.md, skills filter, resource_limits applied). PR #19.
 - **Phase B — Enforced action policy.** Per-agent bearer token + denied-by-default `allowed_actions` check at the gateway. The Code Review Engineer is provably unable to merge/close/push. PR #20.
 - **Phase D — Memory & work log.** Per-agent key/value memory (writes via the `update-memory` skill, reads back via dispatch-time injection), full audit log of every agent-authed call (allow + deny), `reviewed_prs` dedup index. PR #25.
-- **Phase C — Autonomous PR-watcher.** Next. Polls watched repos and dispatches reviews within minutes.
+- **Phase C — Autonomous PR-watcher.** `lifespan`-driven 120s asyncio poll loop scans every running Code Review Engineer's `watched_repos`, dedups against `reviewed_prs`, and dispatches a review task per unreviewed open PR. PR #29.
 
 **Backlog (post-hackathon)**
 - AWS deployment via CDK (#11), frontend do-over (#12), agent memory compaction via LLM reflection (#23), other roles brought up to A/B/D parity (#17), hire/offboard UI (#13), CI for backend + frontend tests (#14).
diff --git a/ROADMAP.md b/ROADMAP.md
index 85dcf69..bca8f82 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -241,7 +241,7 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
 - [x] **OpenClaw + Kimi integration verified end-to-end.** Container builds, gateway starts with Kimi K2.5, real LLM responses.
 - [x] **Role definition templates.** Secretary, Code Review Engineer, Customer Support YAMLs.
 - [x] **`GET /roles` + `template_loader` service** feeding the talent directory.
-- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #11) and `start-mac.sh` / `start.sh`.
+- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #16) and `start-mac.sh` / `start.sh` / `start-mac-compose.sh`.
 
 **Production hardening (PR #2, merged)**
 - [x] Real password-based login with bcrypt + SHA-256 pre-hash (sidesteps 72-byte truncation).
@@ -253,9 +253,9 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
 - [x] **Phase A — Template-driven runtime (#6, PR #19).** Orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from `system_prompt`, installs only the skills the template lists, applies `resource_limits`.
 - [x] **Phase B — Enforced action policy (#7, PR #20).** Per-agent bearer token persisted on `agents.agent_token`; `get_current_agent` dependency; `require_action` denied-by-default against the template's `allowed_actions`. The four GitHub gateway endpoints converted to agent-auth + policy.
 - [x] **Phase D — Memory & work log (#8, PR #25).** `agent_memory` (key/value per agent), `agent_action_log` (allow + deny rows for every agent-authed call), `reviewed_prs` (dedup for Phase C). `update-memory` skill; dispatcher injects memory into `role_context` on every dispatch.
-- [ ] **Phase C — Autonomous PR-watcher (#9).** FastAPI `lifespan` poll loop, `watched_repos` table + endpoint, dispatches reviews within minutes, dedups against `reviewed_prs`.
+- [x] **Phase C — Autonomous PR-watcher (#9, PR #29).** FastAPI `lifespan`-driven 120s asyncio poll loop scans every running CRE's `watched_repos`, lists open PRs via GitHub, dedups against `reviewed_prs`, dispatches a review task per unreviewed PR. Startup staleness gate (~30 min) prevents replaying old PRs; per-(agent, repo) error isolation; offboarding releases `watched_repos` rows (PR #32) so a re-hire under the same user can re-take the slot.
 
-**Tests:** 125 backend unit tests passing.
+**Tests:** 143 backend unit tests passing.
 
 ## Backlog (post-hackathon)
 
@@ -264,7 +264,7 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
 - **Hire & offboard flow** (#13) — `/directory` lists roles but has no hire action; `hireEmployee()` / `fireEmployee()` exist in `app/src/lib/api.ts` but are never called.
 - **CI for backend + frontend tests** (#14) — only Claude review workflows exist today.
 - **`backend/.env.example`** (#15) — `LOCAL_SETUP.md` tells contributors to copy it but the file doesn't exist.
-- **Reconcile docs with reality** (#11) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md` partially addressed; `LOCAL_SETUP.md` still references the old Next.js/Compose setup.
+- **Reconcile docs with reality** (#16) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md`, `CLAUDE.md` reflect the shipped phases; `LOCAL_SETUP.md` still references the old Next.js/Compose setup and is the remaining work.
 - **Bring Customer Support & Secretary to A/B/D parity** (#17) — they still run the generic container with no enforced boundaries.
 - **Stop passing secrets as plaintext Docker env vars** (#18).
 - **Agent memory compaction via LLM reflection** (#23) — the interesting moat candidate. Mechanical eviction → heuristic clustering → LLM-driven consolidation.

From 1aa75adc4d98a128cba915bd3c44fa9f00121336 Mon Sep 17 00:00:00 2001
From: Michael Wang <mzw2010@nyu.edu>
Date: Sun, 24 May 2026 22:47:32 -0700
Subject: [PATCH 2/2] chore: add backend/.env.example so LOCAL_SETUP.md step 2
 actually works
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

LOCAL_SETUP.md instructs contributors to `cp backend/.env.example
backend/.env`, but the file did not exist — setup failed at step 2 (#15).
Adds the file with every key from app/config.py plus the four read via
os.getenv (CORS_ALLOWED_ORIGINS, LOG_LEVEL, RATE_LIMIT_ENABLED,
PR_WATCHER_ENABLED), grouped by section with required keys called out.
Placeholder values only. Adds a !.env.example exception to .gitignore so
the new file isn't caught by the .env* rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
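
Reviewer note: backend/.env.example marks four sections as plain REQUIRED
(Supabase, Encryption, LLM). A contributor could check a copied
backend/.env for keys that are missing, empty, or still set to the
placeholder. The sketch below is a hedged illustration only:
`REQUIRED_KEYS`, `parse_env`, and `unfilled_required` are hypothetical
names, not part of the repo, and the parser handles only plain KEY=VALUE
lines (no quoting or `export` prefixes).

```python
# Keys from the sections marked REQUIRED in backend/.env.example.
REQUIRED_KEYS = ("SUPABASE_URL", "SUPABASE_KEY", "ENCRYPTION_KEY", "LLM_API_KEY")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unfilled_required(env_text: str, example_text: str) -> list:
    """Return required keys that are missing, empty, or still the placeholder."""
    env = parse_env(env_text)
    example = parse_env(example_text)
    return [
        key
        for key in REQUIRED_KEYS
        if not env.get(key) or env[key] == example.get(key)
    ]
```

Passing the contents of backend/.env and backend/.env.example to
`unfilled_required` returns the keys that still need attention; an empty
list means all four are filled in.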
 .gitignore           |  1 +
 backend/.env.example | 94 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 backend/.env.example

diff --git a/.gitignore b/.gitignore
index 311d98d..22c2851 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 .env*
+!.env.example
 __pycache__/
 *.pyc
 .venv/
diff --git a/backend/.env.example b/backend/.env.example
new file mode 100644
index 0000000..2f4c815
--- /dev/null
+++ b/backend/.env.example
@@ -0,0 +1,94 @@
+# AgentOS backend environment.
+# Copy this file to backend/.env and fill in the required keys.
+# Required keys are marked REQUIRED; everything else is optional or has a sensible default.
+
+
+# ── Supabase (REQUIRED) ───────────────────────────────────────────────────────
+# Create a project at supabase.com (free tier is fine), then grab these from
+# Project Settings → API. Run backend/migrations/schema.sql in the SQL editor
+# before first boot.
+SUPABASE_URL=https://your-project.supabase.co
+SUPABASE_KEY=your-supabase-anon-key
+
+
+# ── Encryption (REQUIRED) ─────────────────────────────────────────────────────
+# Fernet key used to encrypt OAuth tokens at rest in the credentials table.
+# Generate with:
+#   python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
+ENCRYPTION_KEY=generate-with-fernet-and-paste-here
+
+
+# ── LLM (REQUIRED) ────────────────────────────────────────────────────────────
+# Kimi K2.5 via Moonshot AI — the model that runs inside every agent container
+# via OpenClaw. Get a key at platform.moonshot.ai.
+LLM_API_KEY=your-moonshot-api-key
+
+# Anthropic Claude — used only by the /chat passthrough router. Optional unless
+# you exercise that endpoint.
+ANTHROPIC_API_KEY=
+
+
+# ── OAuth: GitHub (REQUIRED for the Code Review Engineer) ─────────────────────
+# Register an OAuth app at github.com/settings/developers. For pure-local dev
+# you can use http://localhost:8000/api/auth/github/callback as the redirect
+# URI and leave BASE_URL unset below. For an ngrok-fronted setup, register the
+# ngrok URL and set BASE_URL accordingly.
+GITHUB_CLIENT_ID=
+GITHUB_CLIENT_SECRET=
+
+
+# ── OAuth: Slack / Gmail (optional for the hackathon) ─────────────────────────
+# Customer Support and Secretary currently use a simulated consent screen that
+# writes a placeholder token, so these can stay empty until you wire real OAuth.
+# When you do: Slack requires HTTPS for the redirect URI (use ngrok or similar).
+# Google's Web client type also requires HTTPS, except for localhost redirect
+# URIs. Register the public URL and set BASE_URL.
+SLACK_CLIENT_ID=
+SLACK_CLIENT_SECRET=
+GOOGLE_CLIENT_ID=
+GOOGLE_CLIENT_SECRET=
+
+
+# ── Platform URLs ─────────────────────────────────────────────────────────────
+# Public URL where OAuth providers redirect the user's browser. Leave empty for
+# a fully-local GitHub OAuth setup (defaults to http://localhost:{PLATFORM_PORT})
+# or set to an ngrok URL if your OAuth app is registered against ngrok.
+BASE_URL=
+
+# Where the frontend lives. OAuth callbacks redirect here on completion.
+FRONTEND_URL=http://localhost:5173
+
+# How agent containers reach the platform's gateway. host.docker.internal is
+# correct for Docker Desktop on Mac. Override only if you're on Linux without
+# the host.docker.internal helper.
+PLATFORM_GATEWAY_URL=http://host.docker.internal:8000/gateway
+
+# Bind interface and port for uvicorn.
+PLATFORM_HOST=0.0.0.0
+PLATFORM_PORT=8000
+
+
+# ── Docker ────────────────────────────────────────────────────────────────────
+# The bridge network agent containers are attached to. Matches the network
+# created by start-mac.sh / start.sh / start-mac-compose.sh.
+DOCKER_NETWORK=openclaw-agents
+
+# The image that the hire flow spawns for each agent. Build with:
+#   docker build -t openclaw/agent:latest backend/agent-runtime/
+OPENCLAW_AGENT_IMAGE=openclaw/agent:latest
+
+
+# ── Operational toggles ───────────────────────────────────────────────────────
+# Comma-separated list of allowed CORS origins. Default in code allows the
+# Vite dev server; override here for prod or alternate frontends.
+CORS_ALLOWED_ORIGINS=http://localhost:5173
+
+# uvicorn / app log level (DEBUG, INFO, WARNING, ERROR).
+LOG_LEVEL=INFO
+
+# Set to "false" to disable slowapi rate limiting (the test suite sets this).
+RATE_LIMIT_ENABLED=true
+
+# Set to "false" to skip starting the Phase C PR watcher on app startup.
+# Useful if you're iterating on routers and don't want background dispatches.
+PR_WATCHER_ENABLED=true