Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.env*
!.env.example
__pycache__/
*.pyc
.venv/
7 changes: 4 additions & 3 deletions CLAUDE.md
@@ -5,7 +5,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project
AgentOS — "Fiverr for OpenClaw." Managed platform that packages OpenClaw instances as specialized, containerized AI employees. See `README.md`, `ROADMAP.md`, and `PROJECT_CONTEXT.md` for product context; `LOCAL_SETUP.md` is the authoritative setup guide.

-**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now also a differentiated, trust-moated, memory-backed employee. Three of the four "make one employee real" phases have shipped (A: template-driven runtime; B: enforced action policy; D: agent memory + work log). Phase C (autonomous PR-watcher) is next. LLM execution is live end-to-end via OpenClaw + Kimi K2.5.
+**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now a differentiated, trust-moated, memory-backed, autonomous employee. All four "make one employee real" phases have shipped: A (template-driven runtime), B (enforced action policy), C (autonomous PR-watcher), D (agent memory + work log). LLM execution is live end-to-end via OpenClaw + Kimi K2.5.

## Architecture (read this before editing)

@@ -22,7 +22,8 @@ Host (Mac)
- **Each agent container** runs the official OpenClaw gateway as the engine plus a FastAPI sidecar (`backend/agent-runtime/server.py`) on port 8080. The sidecar accepts `POST /task` with a token (`openclaw-internal` by default) and proxies to OpenClaw's OpenAI-compatible `/v1/chat/completions`.
- **Template-driven container shape (Phase A).** The orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from the template's `system_prompt`, and installs only the skills the template lists. `resource_limits` from the template cap the Docker container.
- **Agent-side auth + action policy (Phase B).** The orchestrator mints a per-agent bearer token and persists it on `agents.agent_token` while the container runs. Agent-authed gateway endpoints (currently the 4 GitHub ones) use `get_current_agent` (`backend/app/agent_auth.py`) instead of `get_current_user`, and call `require_action` (`backend/app/services/policy.py`) before doing work. Denied-by-default against the role template's `allowed_actions`.
-- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts. Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table Phase C's watcher will read.
+- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts. Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table the Phase C watcher reads.
+- **Autonomous PR-watcher (Phase C).** FastAPI `lifespan` starts an asyncio poll loop (`backend/app/services/pr_watcher.py`) that scans every running Code Review Engineer's `watched_repos` rows every 120s, lists open PRs via the GitHub API, dedups against `reviewed_prs`, and dispatches a review task per unreviewed PR. Startup staleness gate (~30 min) prevents replaying old PRs; per-(agent, repo) error isolation keeps a bad subscription from breaking others; offboarding releases `watched_repos` rows so a re-hire under the same user can re-take the slot.
- **LLM** is Kimi (Moonshot AI) — `moonshot/kimi-k2.5` — wired via `openclaw.json` inside the agent image. The chat-completions endpoint must be explicitly enabled in that config.
- **Persistence** is Supabase only (users, hired employees, encrypted credentials, per-agent memory + action log + reviewed PRs). User credentials are Fernet-encrypted at rest in `backend/app/services/credential_store.py`; agent tokens are stored plaintext (rotated on stop).
- **OAuth fidelity (hackathon):** GitHub is real OAuth; Slack/Gmail use a simulated consent screen that writes a placeholder token via `POST /credentials`.
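
The denied-by-default check described in the Phase B bullet above can be illustrated with a minimal sketch. This is not the code in `backend/app/services/policy.py`: the function name `require_action_sketch`, the `PolicyDenied` exception, the dict shape of the template, and the action names such as `github.comment_pr` are all hypothetical and chosen only for illustration.

```python
class PolicyDenied(Exception):
    """Raised when an agent requests an action its role template does not list."""


def require_action_sketch(template: dict, action: str) -> None:
    """Allow `action` only if the template explicitly lists it (hypothetical helper)."""
    # Denied-by-default: a missing or empty allowed_actions list permits nothing.
    allowed = template.get("allowed_actions") or []
    if action not in allowed:
        raise PolicyDenied(f"action {action!r} is not in allowed_actions")


# Hypothetical role template; real templates are YAML files with more fields.
reviewer_template = {"allowed_actions": ["github.read_pr", "github.comment_pr"]}

# A listed action passes silently.
require_action_sketch(reviewer_template, "github.comment_pr")

# An unlisted action is rejected, even though nothing explicitly forbids it.
try:
    require_action_sketch(reviewer_template, "github.merge_pr")
except PolicyDenied as exc:
    print(exc)
```

The point of the allow-list shape is that adding a new gateway endpoint grants no agent access to it until a template names the action.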
@@ -58,7 +59,7 @@ bun run build # tsc -b && vite build
bun run lint # eslint .
```

-Backend tests (125 tests):
+Backend tests (143 tests):
```bash
cd backend
arch -arm64 .venv/bin/python -m pytest # all
2 changes: 1 addition & 1 deletion PROJECT_CONTEXT.md
@@ -39,7 +39,7 @@ AI employee shows up in their Slack/GitHub/tools
```

### Platform → Agent Communication (Implemented)
-Each agent container runs a lightweight FastAPI server on port 8080. The platform dispatches tasks to agents via HTTP POST to the container's internal IP on the Docker bridge network (`openclaw-agents`). All containers run on a single VPS.
+Each agent container runs a lightweight FastAPI server on port 8080. The platform dispatches tasks to agents via HTTP POST to the container's internal IP on the Docker bridge network (`openclaw-agents`). For the hackathon everything runs locally on Docker Desktop; VPS / cloud deploy is post-hackathon (`#11`).

```
Platform API Agent Container
6 changes: 3 additions & 3 deletions README.md
@@ -144,7 +144,7 @@ Full walkthrough is in `LOCAL_SETUP.md`. The short version:

## Tests

-Backend has 125 passing unit tests:
+Backend has 143 passing unit tests:

```bash
cd backend
@@ -163,11 +163,11 @@ On Apple Silicon use `arch -arm64 .venv/bin/python -m pytest` instead — see `C
- LLM execution inside containers via OpenClaw + Kimi.
- Hardening pass merged (real password login, OAuth state signing, rate limiting, CORS, error handling, frontend route guards + minimal Vitest suite).

-**Code Review Engineer specialization — 3 of 4 phases shipped**
+**Code Review Engineer specialization — all 4 phases shipped**
- **Phase A — Template-driven runtime.** The role template shapes the container (system_prompt → SOUL.md, skills filter, resource_limits applied). PR #19.
- **Phase B — Enforced action policy.** Per-agent bearer token + denied-by-default `allowed_actions` check at the gateway. The Code Review Engineer is provably unable to merge/close/push. PR #20.
- **Phase D — Memory & work log.** Per-agent key/value memory (writes via the `update-memory` skill, reads back via dispatch-time injection), full audit log of every agent-authed call (allow + deny), `reviewed_prs` dedup index. PR #25.
-- **Phase C — Autonomous PR-watcher.** Next. Polls watched repos and dispatches reviews within minutes.
+- **Phase C — Autonomous PR-watcher.** `lifespan`-driven 120s asyncio poll loop scans every running CRE's `watched_repos`, dedups against `reviewed_prs`, and dispatches a review task per unreviewed open PR. PR #29.

**Backlog (post-hackathon)**
- AWS deployment via CDK (#11), frontend do-over (#12), agent memory compaction via LLM reflection (#23), other roles brought up to A/B/D parity (#17), hire/offboard UI (#13), CI for backend + frontend tests (#14).
8 changes: 4 additions & 4 deletions ROADMAP.md
@@ -241,7 +241,7 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
- [x] **OpenClaw + Kimi integration verified end-to-end.** Container builds, gateway starts with Kimi K2.5, real LLM responses.
- [x] **Role definition templates.** Secretary, Code Review Engineer, Customer Support YAMLs.
- [x] **`GET /roles` + `template_loader` service** feeding the talent directory.
-- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #11) and `start-mac.sh` / `start.sh`.
+- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #16) and `start-mac.sh` / `start.sh` / `start-mac-compose.sh`.

**Production hardening (PR #2, merged)**
- [x] Real password-based login with bcrypt + SHA-256 pre-hash (sidesteps 72-byte truncation).
@@ -253,9 +253,9 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
- [x] **Phase A — Template-driven runtime (#6, PR #19).** Orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from `system_prompt`, installs only the skills the template lists, applies `resource_limits`.
- [x] **Phase B — Enforced action policy (#7, PR #20).** Per-agent bearer token persisted on `agents.agent_token`; `get_current_agent` dependency; `require_action` denied-by-default against the template's `allowed_actions`. The four GitHub gateway endpoints converted to agent-auth + policy.
- [x] **Phase D — Memory & work log (#8, PR #25).** `agent_memory` (key/value per agent), `agent_action_log` (allow + deny rows for every agent-authed call), `reviewed_prs` (dedup for Phase C). `update-memory` skill; dispatcher injects memory into `role_context` on every dispatch.
-- [ ] **Phase C — Autonomous PR-watcher (#9).** FastAPI `lifespan` poll loop, `watched_repos` table + endpoint, dispatches reviews within minutes, dedups against `reviewed_prs`.
+- [x] **Phase C — Autonomous PR-watcher (#9, PR #29).** FastAPI `lifespan`-driven 120s asyncio poll loop scans every running CRE's `watched_repos`, lists open PRs via GitHub, dedups against `reviewed_prs`, dispatches a review task per unreviewed PR. Startup staleness gate (~30 min) prevents replaying old PRs; per-(agent, repo) error isolation; offboarding releases `watched_repos` rows (PR #32) so a re-hire under the same user can re-take the slot.
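
The selection step of the Phase C watcher (dedup against `reviewed_prs`, then the startup staleness gate) can be sketched as a pure function. This is only an illustration under stated assumptions, not the logic in `backend/app/services/pr_watcher.py`: the name `select_prs_to_review`, the dict shape of a PR, and the choice to measure staleness from a PR's creation time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: mirrors the "~30 min" startup staleness gate described above.
STALENESS_WINDOW = timedelta(minutes=30)


def select_prs_to_review(open_prs, reviewed, started_at, window=STALENESS_WINDOW):
    """Return numbers of open PRs that are unreviewed and newer than the cutoff."""
    cutoff = started_at - window
    selected = []
    for pr in open_prs:
        if pr["number"] in reviewed:
            continue  # dedup: this PR already has a review on record
        if pr["created_at"] < cutoff:
            continue  # staleness gate: do not replay PRs that predate startup
        selected.append(pr["number"])
    return selected


# Hypothetical data for one (agent, repo) subscription.
started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
open_prs = [
    {"number": 1, "created_at": started - timedelta(hours=3)},     # stale
    {"number": 2, "created_at": started - timedelta(minutes=15)},  # already reviewed
    {"number": 3, "created_at": started + timedelta(minutes=5)},   # fresh, unreviewed
]
already_reviewed = {2}

print(select_prs_to_review(open_prs, already_reviewed, started))  # → [3]
```

Keeping the selection free of I/O makes it easy to unit-test separately from the poll loop and the GitHub client.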

-**Tests:** 125 backend unit tests passing.
+**Tests:** 143 backend unit tests passing.

## Backlog (post-hackathon)

@@ -264,7 +264,7 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an
- **Hire & offboard flow** (#13) — `/directory` lists roles but has no hire action; `hireEmployee()` / `fireEmployee()` exist in `app/src/lib/api.ts` but are never called.
- **CI for backend + frontend tests** (#14) — only Claude review workflows exist today.
- **`backend/.env.example`** (#15) — `LOCAL_SETUP.md` tells contributors to copy it but the file doesn't exist.
-- **Reconcile docs with reality** (#11) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md` partially addressed; `LOCAL_SETUP.md` still references the old Next.js/Compose setup.
+- **Reconcile docs with reality** (#16) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md`, `CLAUDE.md` reflect the shipped phases; `LOCAL_SETUP.md` still references the old Next.js/Compose setup and is the remaining work.
- **Bring Customer Support & Secretary to A/B/D parity** (#17) — they still run the generic container with no enforced boundaries.
- **Stop passing secrets as plaintext Docker env vars** (#18).
- **Agent memory compaction via LLM reflection** (#23) — the interesting moat candidate. Mechanical eviction → heuristic clustering → LLM-driven consolidation.
94 changes: 94 additions & 0 deletions backend/.env.example
@@ -0,0 +1,94 @@
# AgentOS backend environment.
# Copy this file to backend/.env and fill in the required keys.
# Required keys are marked REQUIRED; everything else has a sensible default.


# ── Supabase (REQUIRED) ───────────────────────────────────────────────────────
# Create a project at supabase.com (free tier is fine), then grab these from
# Project Settings → API. Run backend/migrations/schema.sql in the SQL editor
# before first boot.
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-anon-key


# ── Encryption (REQUIRED) ─────────────────────────────────────────────────────
# Fernet key used to encrypt OAuth tokens at rest in the credentials table.
# Generate with:
# python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
ENCRYPTION_KEY=generate-with-fernet-and-paste-here


# ── LLM (REQUIRED) ────────────────────────────────────────────────────────────
# Kimi K2.5 via Moonshot AI — the model that runs inside every agent container
# via OpenClaw. Get a key at platform.moonshot.ai.
LLM_API_KEY=your-moonshot-api-key

# Anthropic Claude — used only by the /chat passthrough router. Optional unless
# you exercise that endpoint.
ANTHROPIC_API_KEY=


# ── OAuth: GitHub (REQUIRED for the Code Review Engineer) ─────────────────────
# Register an OAuth app at github.com/settings/developers. For pure-local dev
# you can use http://localhost:8000/api/auth/github/callback as the redirect
# URI and leave BASE_URL unset below. For an ngrok-fronted setup, register the
# ngrok URL and set BASE_URL accordingly.
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=


# ── OAuth: Slack / Gmail (optional for the hackathon) ─────────────────────────
# Customer Support and Secretary currently use a simulated consent screen that
# writes a placeholder token, so these can stay empty until you wire real OAuth.
# When you do: Slack hard-requires HTTPS for the redirect URI (use ngrok or
# similar) and Google's Web client type does too — register the public URL and
# set BASE_URL.
SLACK_CLIENT_ID=
SLACK_CLIENT_SECRET=
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=


# ── Platform URLs ─────────────────────────────────────────────────────────────
# Public URL where OAuth providers redirect the user's browser. Leave empty for
# a fully-local GitHub OAuth setup (defaults to http://localhost:{PLATFORM_PORT})
# or set to an ngrok URL if your OAuth app is registered against ngrok.
BASE_URL=

# Where the frontend lives. OAuth callbacks redirect here on completion.
FRONTEND_URL=http://localhost:5173

# How agent containers reach the platform's gateway. host.docker.internal is
# correct for Docker Desktop on Mac. Override only if you're on Linux without
# the host.docker.internal helper.
PLATFORM_GATEWAY_URL=http://host.docker.internal:8000/gateway

# Bind interface and port for uvicorn.
PLATFORM_HOST=0.0.0.0
PLATFORM_PORT=8000


# ── Docker ────────────────────────────────────────────────────────────────────
# The bridge network agent containers are attached to. Matches the network
# created by start-mac.sh / start.sh / start-mac-compose.sh.
DOCKER_NETWORK=openclaw-agents

# The image the hire flow spawns. Build with:
# docker build -t openclaw/agent:latest backend/agent-runtime/
OPENCLAW_AGENT_IMAGE=openclaw/agent:latest


# ── Operational toggles ───────────────────────────────────────────────────────
# Comma-separated list of allowed CORS origins. Default in code allows the
# Vite dev server; override here for prod or alternate frontends.
CORS_ALLOWED_ORIGINS=http://localhost:5173

# uvicorn / app log level (DEBUG, INFO, WARNING, ERROR).
LOG_LEVEL=INFO

# Set to "false" to disable slowapi rate limiting (the test suite sets this).
RATE_LIMIT_ENABLED=true

# Set to "false" to skip starting the Phase C PR watcher on app startup.
# Useful if you're iterating on routers and don't want background dispatches.
PR_WATCHER_ENABLED=true