From a68a0941f35ad7c6318ee6dc75ff245f175e78b0 Mon Sep 17 00:00:00 2001 From: Michael Wang Date: Sat, 23 May 2026 16:35:56 -0700 Subject: [PATCH 1/2] feat: agent memory & work log (Phase D) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the memory loop and persists the audit trail. Phase B left a deny-only log line in policy.require_action; Phase D promotes that stub to a persisted agent_action_log row covering both allow and deny. Adds a per-agent key/value memory store the agent writes via a new update-memory skill and reads back as injected role_context on every dispatch — knowledge now survives container restarts. Pulls Phase C's reviewed_prs dedup index forward so the autonomous PR-review loop can build on a stable base. Schema (migration 004 + schema.sql): - agent_memory (agent_id, key, value, updated_at) — unique(agent_id, key) - agent_action_log (agent_id, action, outcome, metadata jsonb, created_at) with a CHECK constraint pinning outcome to allowed | denied - reviewed_prs (agent_id, owner, repo, pr_number, reviewed_at) with unique(agent_id, owner, repo, pr_number) - RLS closed-by-default policies, mirrored across both files Models: - AgentMemoryModel.get / upsert / list_by_agent / delete - ActionLogModel.record / list_by_agent - ReviewedPRModel.exists / record / list_by_agent Policy: - require_action persists allow + deny rows. Audit writes are best-effort — a DB hiccup logs locally and lets the request proceed, trading audit completeness for availability; the policy decision itself never depends on the DB. Dispatcher: - dispatch_task now loads the calling agent's memory and injects it into role_context (best-effort; a memory-load failure does not block dispatch). Gateway: - GET / POST /gateway/memory — agent-authed, policy-gated via agent.memory.{read,write}. The agent's own row only — memory is scoped per agent_id. 
- POST /github/review now inserts a reviewed_prs row on success, guarded by exists() so re-reviews don't double-write. Skill + template: - New update-memory skill — agent POSTs key/value pairs via the agent bearer token. Server-side persistence is what lets preferences survive container restart (SOUL.md cannot). - code-review-engineer.yaml: skills += update-memory; allowed_actions += agent.memory.read, agent.memory.write. Tests: 124/124 pass. 11 new — policy audit allow + deny + audit-failure swallowed; memory write/read/auth/policy/empty-key; reviewed_prs records on success + skips on dup; dispatcher injects memory + best-effort failure. Live smoke test against the running backend (12/12 green): write/read round-trip, upsert idempotent on key, 401 on bad/missing token, 403 when a secretary role hits /memory, agent_action_log carries the allow + deny rows with role metadata. Closes #8. Co-Authored-By: Claude Opus 4.7 --- .../templates/code-review-engineer.yaml | 3 + .../skills/update-memory/SKILL.md | 45 ++++++++ backend/app/models/action_log.py | 52 +++++++++ backend/app/models/agent_memory.py | 70 ++++++++++++ backend/app/models/reviewed_pr.py | 59 ++++++++++ backend/app/routers/gateway.py | 44 +++++++- backend/app/services/dispatcher.py | 19 +++- backend/app/services/policy.py | 27 ++++- .../migrations/004_code_review_engineer.sql | 55 ++++++++++ backend/migrations/schema.sql | 45 ++++++++ backend/tests/test_dispatcher.py | 46 ++++++++ backend/tests/test_gateway.py | 102 ++++++++++++++++-- backend/tests/test_policy.py | 41 +++++++ 13 files changed, 597 insertions(+), 11 deletions(-) create mode 100644 backend/agent-runtime/skills/update-memory/SKILL.md create mode 100644 backend/app/models/action_log.py create mode 100644 backend/app/models/agent_memory.py create mode 100644 backend/app/models/reviewed_pr.py diff --git a/backend/agent-config/templates/code-review-engineer.yaml b/backend/agent-config/templates/code-review-engineer.yaml index 3e1ee6a..60cf411
100644 --- a/backend/agent-config/templates/code-review-engineer.yaml +++ b/backend/agent-config/templates/code-review-engineer.yaml @@ -7,6 +7,7 @@ required_tools: skills: - github-list-prs - github-pr-review + - update-memory system_prompt: | You are a senior code review engineer. Your responsibilities: @@ -24,6 +25,8 @@ allowed_actions: - github.pr.comment - github.review.submit - github.repo.read + - agent.memory.read + - agent.memory.write resource_limits: mem_limit: "512m" diff --git a/backend/agent-runtime/skills/update-memory/SKILL.md b/backend/agent-runtime/skills/update-memory/SKILL.md new file mode 100644 index 0000000..c2190e7 --- /dev/null +++ b/backend/agent-runtime/skills/update-memory/SKILL.md @@ -0,0 +1,45 @@ +--- +name: update_memory +description: Persist a preference or learned fact across sessions so future tasks can use it. +metadata: + { "openclaw": { "requires": { "bins": ["curl"] } } } +--- + +# Update Memory + +Use this skill to save something you have learned about how the user wants you to work — a style preference, a project convention, a person's role, anything you would want to remember next time. The value is written to the platform's memory store, scoped to you, and will be injected back into your context on the next task. SOUL.md cannot persist across container restarts; this skill is how you carry knowledge forward. + +## Save a memory + +``` +exec curl -s -X POST "${PLATFORM_GATEWAY_URL}/memory" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer ${AGENT_TOKEN}" \ + -d '{"key": "KEY", "value": "VALUE"}' +``` + +## Read your stored memory + +``` +exec curl -s "${PLATFORM_GATEWAY_URL}/memory" \ + -H "Authorization: Bearer ${AGENT_TOKEN}" +``` + +## Parameters +- `KEY`: a stable, kebab-case-ish identifier (e.g. `style.tone`, `repos.acme-frontend.lang`, `people.alice.role`). Reusing a key overwrites the previous value. +- `VALUE`: plain text. Keep it short and self-contained — one sentence to a short paragraph. 
+ +## When to use +- A user corrected your style — save the corrected style so you don't repeat the mistake. +- You learned a project-level convention (preferred review tone, files to skip, urgency rules). +- A user named a person, repo, or system you didn't know about. + +## When not to use +- Per-task scratchpad details (those live only in the current task). +- Anything secret or sensitive — memory is stored in the platform DB, not encrypted at rest. +- Information you can re-derive trivially from the task input. + +## Important +- Choose stable keys. Bad: `note-from-2026-05-23`. Good: `style.review.tone`. +- Prefer overwriting an existing key to creating a near-duplicate one. +- If memory grows large, the platform may consolidate it on your behalf — write atomic, well-scoped facts so consolidation can do something useful with them. diff --git a/backend/app/models/action_log.py b/backend/app/models/action_log.py new file mode 100644 index 0000000..ce013ed --- /dev/null +++ b/backend/app/models/action_log.py @@ -0,0 +1,52 @@ +"""Agent action log — the append-only audit stream the gateway writes a row +to for every agent-authed call. Allows are logged alongside denials so the +log is a full work history, not just a violation list. + +Phase B left a `logger.warning` stub on the deny path; Phase D promotes it +to persisted rows here, and extends coverage to the allow path too. The +work-log surface and the future LLM reflection (issue #23) both read from +this table. +""" + +from app.database import get_supabase + +TABLE = "agent_action_log" + + +class ActionLogModel: + @staticmethod + def record( + agent_id: str, + action: str, + outcome: str, + metadata: dict | None = None, + ) -> dict: + """Insert one audit row. + + `outcome` is "allowed" or "denied" (matches the DB check constraint). + `metadata` carries free-form context — e.g. the role at the time, the + endpoint, request shape. Kept jsonb so the schema doesn't churn as we + add fields. 
+ """ + data = { + "agent_id": agent_id, + "action": action, + "outcome": outcome, + "metadata": metadata or {}, + } + result = get_supabase().table(TABLE).insert(data).execute() + return result.data[0] + + @staticmethod + def list_by_agent(agent_id: str, limit: int = 100) -> list[dict]: + """Return recent rows for an agent, newest first.""" + result = ( + get_supabase() + .table(TABLE) + .select("*") + .eq("agent_id", agent_id) + .order("created_at", desc=True) + .limit(limit) + .execute() + ) + return result.data diff --git a/backend/app/models/agent_memory.py b/backend/app/models/agent_memory.py new file mode 100644 index 0000000..668b6f0 --- /dev/null +++ b/backend/app/models/agent_memory.py @@ -0,0 +1,70 @@ +"""Per-agent memory — the key/value store the agent writes via the +update-memory skill and reads back as injected role_context at dispatch. + +Rows are scoped to (agent_id, key). Last-write-wins on the same key +(updated_at refreshes); the row count grows with the number of distinct +preferences, not with task volume. Compaction strategies are tracked in +issue #23. 
+""" + +from datetime import datetime, timezone + +from app.database import get_supabase + +TABLE = "agent_memory" + + +class AgentMemoryModel: + @staticmethod + def list_by_agent(agent_id: str) -> list[dict]: + """Return every memory row for an agent, newest write first.""" + result = ( + get_supabase() + .table(TABLE) + .select("*") + .eq("agent_id", agent_id) + .order("updated_at", desc=True) + .execute() + ) + return result.data + + @staticmethod + def get(agent_id: str, key: str) -> dict | None: + result = ( + get_supabase() + .table(TABLE) + .select("*") + .eq("agent_id", agent_id) + .eq("key", key) + .execute() + ) + return result.data[0] if result.data else None + + @staticmethod + def upsert(agent_id: str, key: str, value: str) -> dict: + """Set the value for (agent_id, key), refreshing updated_at.""" + data = { + "agent_id": agent_id, + "key": key, + "value": value, + "updated_at": datetime.now(timezone.utc).isoformat(), + } + result = ( + get_supabase() + .table(TABLE) + .upsert(data, on_conflict="agent_id,key") + .execute() + ) + return result.data[0] + + @staticmethod + def delete(agent_id: str, key: str) -> bool: + result = ( + get_supabase() + .table(TABLE) + .delete() + .eq("agent_id", agent_id) + .eq("key", key) + .execute() + ) + return len(result.data) > 0 diff --git a/backend/app/models/reviewed_pr.py b/backend/app/models/reviewed_pr.py new file mode 100644 index 0000000..22b2252 --- /dev/null +++ b/backend/app/models/reviewed_pr.py @@ -0,0 +1,59 @@ +"""Dedup index for PRs an agent has already reviewed. + +Phase C's poll loop reads from this table to skip PRs it has already +dispatched a review for, so the watcher's natural 120s tick doesn't +re-review the same PR every cycle. Written server-side by the gateway +on a successful POST /github/review — never by the watcher, never by +a skill. Insert-only; rows persist as the audit trail of what was +reviewed (read: "this is the agent's PR history"). 
+""" + +from app.database import get_supabase + +TABLE = "reviewed_prs" + + +class ReviewedPRModel: + @staticmethod + def exists(agent_id: str, owner: str, repo: str, pr_number: int) -> bool: + result = ( + get_supabase() + .table(TABLE) + .select("id") + .eq("agent_id", agent_id) + .eq("owner", owner) + .eq("repo", repo) + .eq("pr_number", pr_number) + .execute() + ) + return bool(result.data) + + @staticmethod + def record(agent_id: str, owner: str, repo: str, pr_number: int) -> dict: + """Insert a row marking a PR as reviewed for this agent. + + The (agent_id, owner, repo, pr_number) unique constraint rejects + duplicate inserts at the DB level, so this method assumes the + caller has already checked exists() for the same row. + """ + data = { + "agent_id": agent_id, + "owner": owner, + "repo": repo, + "pr_number": pr_number, + } + result = get_supabase().table(TABLE).insert(data).execute() + return result.data[0] + + @staticmethod + def list_by_agent(agent_id: str, limit: int = 100) -> list[dict]: + result = ( + get_supabase() + .table(TABLE) + .select("*") + .eq("agent_id", agent_id) + .order("reviewed_at", desc=True) + .limit(limit) + .execute() + ) + return result.data diff --git a/backend/app/routers/gateway.py b/backend/app/routers/gateway.py index d6de63c..9af216d 100644 --- a/backend/app/routers/gateway.py +++ b/backend/app/routers/gateway.py @@ -5,6 +5,8 @@ from pydantic import BaseModel from app.auth import get_current_user from app.agent_auth import get_current_agent +from app.models.agent_memory import AgentMemoryModel +from app.models.reviewed_pr import ReviewedPRModel from app.services.gateway import GatewayService from app.services.policy import require_action from app.services.credential_store import CredentialStore @@ -52,6 +54,11 @@ class DigestRequest(BaseModel): channel: str = "#agentos" +class MemoryWriteRequest(BaseModel): + key: str + value: str + + # ── Write endpoints ────────────────────────────────────────────────────────────
@router.post("/email/send") @@ -117,13 +124,24 @@ async def create_pr_review( ): require_action(agent, "github.review.submit") try: - return await GatewayService.create_pr_review( + result = await GatewayService.create_pr_review( agent["user_id"], payload.owner, payload.repo, payload.pull_number, payload.body, payload.event, ) except ValueError as e: raise HTTPException(400, str(e)) + # The review landed on GitHub — record dedup so Phase C's watcher + # doesn't re-review this PR on the next tick. Guarded by exists(); the + # (agent_id, owner, repo, pr_number) unique constraint rejects duplicates. + if not ReviewedPRModel.exists( + agent["id"], payload.owner, payload.repo, payload.pull_number + ): + ReviewedPRModel.record( + agent["id"], payload.owner, payload.repo, payload.pull_number + ) + return result + @router.post("/github/review/comment") async def create_pr_review_comment( @@ -140,6 +158,30 @@ async def create_pr_review_comment( raise HTTPException(400, str(e)) +# ── Agent memory (agent-token auth + action policy) ──────────────────────────── +# The agent reads its own key/value store (injected into role_context at dispatch +# by dispatch_task) and writes via the update-memory skill. Memory rows are scoped to +# the calling agent_id — an agent cannot see or write another agent's memory.
+ +@router.get("/memory") +async def list_memory(agent: dict = Depends(get_current_agent)): + require_action(agent, "agent.memory.read") + rows = AgentMemoryModel.list_by_agent(agent["id"]) + return {"memory": [{"key": r["key"], "value": r["value"], "updated_at": r["updated_at"]} for r in rows]} + + +@router.post("/memory") +async def write_memory( + payload: MemoryWriteRequest, + agent: dict = Depends(get_current_agent), +): + require_action(agent, "agent.memory.write") + if not payload.key: + raise HTTPException(400, "key is required") + row = AgentMemoryModel.upsert(agent["id"], payload.key, payload.value) + return {"key": row["key"], "value": row["value"], "updated_at": row["updated_at"]} + + @router.post("/discord/message") async def send_discord_message( payload: DiscordRequest, diff --git a/backend/app/services/dispatcher.py b/backend/app/services/dispatcher.py index afe7370..ff85927 100644 --- a/backend/app/services/dispatcher.py +++ b/backend/app/services/dispatcher.py @@ -1,14 +1,29 @@ """Platform-side service that sends tasks to agent containers over HTTP.""" +import logging import uuid import httpx from docker.errors import NotFound from app.config import get_settings +from app.models.agent_memory import AgentMemoryModel from app.services.orchestrator import Orchestrator AGENT_PORT = 8080 +logger = logging.getLogger(__name__) + + +def _load_memory(agent_id: str) -> dict: + """Return the agent's persisted memory as a {key: value} dict for injection + into role_context. Compaction strategies (LRU / LLM reflection) land here + later — see issue #23. 
Best-effort: a DB hiccup must not block dispatch.""" + try: + rows = AgentMemoryModel.list_by_agent(agent_id) + except Exception as exc: # noqa: BLE001 — best-effort + logger.warning("dispatcher: memory load failed for agent=%s: %s", agent_id, exc) + return {} + return {row["key"]: row["value"] for row in rows} class Dispatcher: @@ -20,10 +35,12 @@ async def dispatch_task( self, agent_id: str, instruction: str, metadata: dict | None = None ) -> dict: container_ip = self._orch.get_container_ip(agent_id) + # Inject the agent's persisted memory into role_context so it sees + # back what update-memory wrote on previous tasks. task_payload = { "task_id": str(uuid.uuid4()), "instruction": instruction, - "role_context": {}, + "role_context": {"memory": _load_memory(agent_id)}, "metadata": metadata or {}, } diff --git a/backend/app/services/policy.py b/backend/app/services/policy.py index 48a92f6..5d38411 100644 --- a/backend/app/services/policy.py +++ b/backend/app/services/policy.py @@ -13,17 +13,38 @@ from fastapi import HTTPException +from app.models.action_log import ActionLogModel from app.services.template_loader import load_template logger = logging.getLogger(__name__) +def _audit(agent_id: str | None, action: str, outcome: str, role: str) -> None: + """Persist one agent_action_log row; never raise into the request path. + + The audit write must not break the underlying request — a DB hiccup + should log locally and let the gateway response proceed normally. + """ + if not agent_id: + return + try: + ActionLogModel.record( + agent_id=agent_id, + action=action, + outcome=outcome, + metadata={"role": role}, + ) + except Exception as exc: # noqa: BLE001 — best-effort audit + logger.warning("policy: audit write failed (%s): %s", outcome, exc) + + def require_action(agent: dict, action: str) -> None: """Raise HTTP 403 unless the agent's role template permits ``action``. 
Denied-by-default: anything not explicitly listed in the role template's - ``allowed_actions`` is refused. A denial is recorded as an audit log line - — Phase D promotes this stub to a persisted ``agent_action_log`` row. + ``allowed_actions`` is refused. Both outcomes write a row to + ``agent_action_log`` — Phase D promoted Phase B's deny-only log line to + a persisted audit stream that covers allow + deny. """ role = agent.get("role", "") try: @@ -37,6 +58,7 @@ def require_action(agent: dict, action: str) -> None: "policy: DENY agent=%s role=%s action=%s (not in allowed_actions)", agent.get("id"), role, action, ) + _audit(agent.get("id"), action, "denied", role) raise HTTPException( 403, f"Action '{action}' is not permitted for role '{role}'" ) @@ -45,3 +67,4 @@ def require_action(agent: dict, action: str) -> None: "policy: ALLOW agent=%s role=%s action=%s", agent.get("id"), role, action, ) + _audit(agent.get("id"), action, "allowed", role) diff --git a/backend/migrations/004_code_review_engineer.sql b/backend/migrations/004_code_review_engineer.sql index a194e18..590daff 100644 --- a/backend/migrations/004_code_review_engineer.sql +++ b/backend/migrations/004_code_review_engineer.sql @@ -10,3 +10,58 @@ alter table agents add column if not exists agent_token text; create unique index if not exists idx_agents_agent_token on agents(agent_token); + +-- ── Phase D — agent memory & work log ──────────────────────────────────────── +-- Three tables added together. agent_memory is the per-agent key/value store +-- the update-memory skill writes to, injected into role_context at dispatch. +-- agent_action_log is the append-only audit stream every agent-authed gateway +-- call writes a row to (allow + deny). reviewed_prs is Phase C's dedup index, +-- written server-side on a successful POST /github/review. 
+ +create table if not exists agent_memory ( + id uuid primary key default gen_random_uuid(), + agent_id uuid references agents(id) on delete cascade not null, + key text not null, + value text not null, + updated_at timestamptz default now(), + unique(agent_id, key) +); + +create index if not exists idx_agent_memory_agent_id on agent_memory(agent_id); + +create table if not exists agent_action_log ( + id uuid primary key default gen_random_uuid(), + agent_id uuid references agents(id) on delete cascade not null, + action text not null, + outcome text not null check (outcome in ('allowed', 'denied')), + metadata jsonb default '{}'::jsonb, + created_at timestamptz default now() +); + +create index if not exists idx_agent_action_log_agent_id on agent_action_log(agent_id); +create index if not exists idx_agent_action_log_created_at on agent_action_log(created_at desc); + +create table if not exists reviewed_prs ( + id uuid primary key default gen_random_uuid(), + agent_id uuid references agents(id) on delete cascade not null, + owner text not null, + repo text not null, + pr_number int not null, + reviewed_at timestamptz default now(), + unique(agent_id, owner, repo, pr_number) +); + +create index if not exists idx_reviewed_prs_agent_id on reviewed_prs(agent_id); + +-- RLS — same closed-by-default pattern as the existing tables. 
+alter table agent_memory enable row level security; +alter table agent_action_log enable row level security; +alter table reviewed_prs enable row level security; + +drop policy if exists "Service role full access on agent_memory" on agent_memory; +drop policy if exists "Service role full access on agent_action_log" on agent_action_log; +drop policy if exists "Service role full access on reviewed_prs" on reviewed_prs; + +create policy "Service role full access on agent_memory" on agent_memory for all using (true); +create policy "Service role full access on agent_action_log" on agent_action_log for all using (true); +create policy "Service role full access on reviewed_prs" on reviewed_prs for all using (true); diff --git a/backend/migrations/schema.sql b/backend/migrations/schema.sql index 03ccbce..f80cab0 100644 --- a/backend/migrations/schema.sql +++ b/backend/migrations/schema.sql @@ -46,17 +46,62 @@ alter table credentials drop constraint if exists credentials_service_check; alter table credentials add constraint credentials_service_check check (service in ('gmail', 'slack', 'discord', 'github', 'hubspot')); +-- ── Agent memory, action log, reviewed PRs ────────────────────────────────── +create table if not exists agent_memory ( + id uuid primary key default gen_random_uuid(), + agent_id uuid references agents(id) on delete cascade not null, + key text not null, + value text not null, + updated_at timestamptz default now(), + unique(agent_id, key) +); + +create index if not exists idx_agent_memory_agent_id on agent_memory(agent_id); + +create table if not exists agent_action_log ( + id uuid primary key default gen_random_uuid(), + agent_id uuid references agents(id) on delete cascade not null, + action text not null, + outcome text not null check (outcome in ('allowed', 'denied')), + metadata jsonb default '{}'::jsonb, + created_at timestamptz default now() +); + +create index if not exists idx_agent_action_log_agent_id on agent_action_log(agent_id); +create 
index if not exists idx_agent_action_log_created_at on agent_action_log(created_at desc); + +create table if not exists reviewed_prs ( + id uuid primary key default gen_random_uuid(), + agent_id uuid references agents(id) on delete cascade not null, + owner text not null, + repo text not null, + pr_number int not null, + reviewed_at timestamptz default now(), + unique(agent_id, owner, repo, pr_number) +); + +create index if not exists idx_reviewed_prs_agent_id on reviewed_prs(agent_id); + -- ── Row Level Security ─────────────────────────────────────────────────────── -- The backend uses the service-role key, which bypasses RLS. These policies -- keep anon/authenticated access closed by default. alter table users enable row level security; alter table agents enable row level security; alter table credentials enable row level security; +alter table agent_memory enable row level security; +alter table agent_action_log enable row level security; +alter table reviewed_prs enable row level security; drop policy if exists "Service role full access on users" on users; drop policy if exists "Service role full access on agents" on agents; drop policy if exists "Service role full access on credentials" on credentials; +drop policy if exists "Service role full access on agent_memory" on agent_memory; +drop policy if exists "Service role full access on agent_action_log" on agent_action_log; +drop policy if exists "Service role full access on reviewed_prs" on reviewed_prs; create policy "Service role full access on users" on users for all using (true); create policy "Service role full access on agents" on agents for all using (true); create policy "Service role full access on credentials" on credentials for all using (true); +create policy "Service role full access on agent_memory" on agent_memory for all using (true); +create policy "Service role full access on agent_action_log" on agent_action_log for all using (true); +create policy "Service role full access on reviewed_prs" 
on reviewed_prs for all using (true); diff --git a/backend/tests/test_dispatcher.py b/backend/tests/test_dispatcher.py index 02d2436..0668e91 100644 --- a/backend/tests/test_dispatcher.py +++ b/backend/tests/test_dispatcher.py @@ -40,6 +40,52 @@ async def test_dispatch_task(self, mock_orchestrator, mock_httpx_client): call_url = mock_httpx_client.post.call_args[0][0] assert "172.18.0.5:8080/task" in call_url + @pytest.mark.asyncio + async def test_dispatch_task_injects_memory(self, mock_orchestrator, mock_httpx_client): + """The agent's persisted memory rides into the container as role_context + so the agent sees back what update-memory wrote on previous tasks.""" + from app.services.dispatcher import Dispatcher + + mock_resp = MagicMock() + mock_resp.json.return_value = {"accepted": True, "task_id": "t-1"} + mock_resp.raise_for_status = MagicMock() + mock_httpx_client.post = AsyncMock(return_value=mock_resp) + + with patch("app.services.dispatcher.AgentMemoryModel") as mock_mem: + mock_mem.list_by_agent.return_value = [ + {"key": "style.tone", "value": "concise"}, + {"key": "repos.acme.lang", "value": "TypeScript"}, + ] + dispatcher = Dispatcher(orchestrator=mock_orchestrator) + await dispatcher.dispatch_task("agent-001", "Review PR #42") + + body = mock_httpx_client.post.call_args.kwargs["json"] + assert body["role_context"]["memory"] == { + "style.tone": "concise", + "repos.acme.lang": "TypeScript", + } + + @pytest.mark.asyncio + async def test_dispatch_task_memory_failure_does_not_block( + self, mock_orchestrator, mock_httpx_client + ): + """A memory-load failure must not block dispatch — best-effort.""" + from app.services.dispatcher import Dispatcher + + mock_resp = MagicMock() + mock_resp.json.return_value = {"accepted": True, "task_id": "t-1"} + mock_resp.raise_for_status = MagicMock() + mock_httpx_client.post = AsyncMock(return_value=mock_resp) + + with patch("app.services.dispatcher.AgentMemoryModel") as mock_mem: + mock_mem.list_by_agent.side_effect = 
RuntimeError("db down") + dispatcher = Dispatcher(orchestrator=mock_orchestrator) + result = await dispatcher.dispatch_task("agent-001", "Do X") + + assert result["accepted"] is True + body = mock_httpx_client.post.call_args.kwargs["json"] + assert body["role_context"] == {"memory": {}} + class TestGetAgentTaskStatus: @pytest.mark.asyncio diff --git a/backend/tests/test_gateway.py b/backend/tests/test_gateway.py index 79ed293..49b7f88 100644 --- a/backend/tests/test_gateway.py +++ b/backend/tests/test_gateway.py @@ -113,12 +113,40 @@ def test_list_pull_requests(self, agent_client): resp = client.get("/gateway/github/pulls/owner/repo") assert resp.status_code == 200 - def test_create_pr_review(self, agent_client): + def test_create_pr_review_records_dedup(self, agent_client): + """On a successful review, the gateway inserts a reviewed_prs row so + Phase C's watcher will not re-review this PR on the next tick.""" client, agent, fake_sb = agent_client with patch("app.services.gateway.CredentialStore") as mock_cs, patch( "app.services.gateway.httpx.AsyncClient" - ) as mock_httpx: + ) as mock_httpx, patch("app.routers.gateway.ReviewedPRModel") as mock_rp: + mock_cs.get.return_value = {"service": "github", "token": "ghp_test", "scopes": []} + mock_resp = MagicMock(status_code=200) + mock_resp.json.return_value = {"id": 1} + mock_httpx.return_value.__aenter__ = AsyncMock( + return_value=MagicMock(request=AsyncMock(return_value=mock_resp)) + ) + mock_httpx.return_value.__aexit__ = AsyncMock(return_value=False) + mock_rp.exists.return_value = False + + resp = client.post( + "/gateway/github/review", + json={ + "owner": "acme", "repo": "api", "pull_number": 42, + "body": "LGTM", "event": "APPROVE", + }, + ) + assert resp.status_code == 200 + mock_rp.record.assert_called_once_with(agent["id"], "acme", "api", 42) + + def test_create_pr_review_skips_dedup_when_already_recorded(self, agent_client): + """If reviewed_prs already has the row, the gateway does not re-insert.""" + 
client, agent, fake_sb = agent_client + + with patch("app.services.gateway.CredentialStore") as mock_cs, patch( + "app.services.gateway.httpx.AsyncClient" + ) as mock_httpx, patch("app.routers.gateway.ReviewedPRModel") as mock_rp: mock_cs.get.return_value = {"service": "github", "token": "ghp_test", "scopes": []} mock_resp = MagicMock(status_code=200) mock_resp.json.return_value = {"id": 1} @@ -126,18 +154,17 @@ def test_create_pr_review(self, agent_client): return_value=MagicMock(request=AsyncMock(return_value=mock_resp)) ) mock_httpx.return_value.__aexit__ = AsyncMock(return_value=False) + mock_rp.exists.return_value = True resp = client.post( "/gateway/github/review", json={ - "owner": "acme", - "repo": "api", - "pull_number": 42, - "body": "LGTM", - "event": "APPROVE", + "owner": "acme", "repo": "api", "pull_number": 42, + "body": "LGTM", "event": "APPROVE", }, ) assert resp.status_code == 200 + mock_rp.record.assert_not_called() def test_github_no_credential(self, agent_client): client, agent, fake_sb = agent_client @@ -174,3 +201,64 @@ def test_github_action_denied_for_wrong_role(self, client, fake_supabase): headers={"Authorization": "Bearer at_secretary"}, ) assert resp.status_code == 403 + + +class TestMemory: + """The /gateway/memory endpoints let an agent persist key/value preferences + across container restarts. 
They are agent-authed and policy-gated, so the + agent's role template must list agent.memory.{read,write}.""" + + def test_write_memory(self, agent_client): + client, agent, fake_sb = agent_client + with patch("app.routers.gateway.AgentMemoryModel") as mock_mem: + mock_mem.upsert.return_value = { + "key": "style.tone", "value": "concise", + "updated_at": "2026-05-23T12:00:00+00:00", + } + resp = client.post( + "/gateway/memory", + json={"key": "style.tone", "value": "concise"}, + ) + assert resp.status_code == 200 + mock_mem.upsert.assert_called_once_with(agent["id"], "style.tone", "concise") + + def test_read_memory(self, agent_client): + client, agent, fake_sb = agent_client + with patch("app.routers.gateway.AgentMemoryModel") as mock_mem: + mock_mem.list_by_agent.return_value = [ + {"key": "style.tone", "value": "concise", "updated_at": "t"}, + {"key": "repos.acme.lang", "value": "TypeScript", "updated_at": "t"}, + ] + resp = client.get("/gateway/memory") + assert resp.status_code == 200 + body = resp.json() + assert len(body["memory"]) == 2 + assert {m["key"] for m in body["memory"]} == {"style.tone", "repos.acme.lang"} + mock_mem.list_by_agent.assert_called_once_with(agent["id"]) + + def test_memory_requires_token(self, client): + """Missing Authorization → 401.""" + resp = client.get("/gateway/memory") + assert resp.status_code == 401 + + def test_memory_denied_without_template_permission(self, client, fake_supabase): + """A role whose allowed_actions lack agent.memory.* gets 403, not 200.""" + agent = { + "id": "agent-x", "user_id": "user-001", "role": "customer-support", + "status": "running", "agent_token": "at_cs", + } + fake_supabase.get_table("agents").set_select_result([agent]) + resp = client.get( + "/gateway/memory", + headers={"Authorization": "Bearer at_cs"}, + ) + assert resp.status_code == 403 + + def test_memory_write_requires_key(self, agent_client): + client, agent, fake_sb = agent_client + with patch("app.routers.gateway.AgentMemoryModel"): 
+ resp = client.post( + "/gateway/memory", + json={"key": "", "value": "v"}, + ) + assert resp.status_code == 400 diff --git a/backend/tests/test_policy.py b/backend/tests/test_policy.py index 2601aff..dce21b2 100644 --- a/backend/tests/test_policy.py +++ b/backend/tests/test_policy.py @@ -4,6 +4,8 @@ resolves *who* is calling, require_action enforces *what* they may do. """ +from unittest.mock import patch + import pytest from fastapi import HTTPException @@ -66,3 +68,42 @@ def test_valid_token_resolves_agent(self, fake_supabase): result = get_current_agent("Bearer at_good") assert result["id"] == "a1" assert result["user_id"] == "u1" + + +class TestActionLogAudit: + """Phase D promotes Phase B's deny-only log line to a persisted row, and + extends coverage to the allow path too — every require_action call leaves + an audit trail.""" + + def test_allow_writes_audit_row(self): + agent = {"id": "a1", "role": "code-review-engineer"} + with patch("app.services.policy.ActionLogModel") as mock_log: + require_action(agent, "github.review.submit") + mock_log.record.assert_called_once() + kwargs = mock_log.record.call_args.kwargs + assert kwargs["agent_id"] == "a1" + assert kwargs["action"] == "github.review.submit" + assert kwargs["outcome"] == "allowed" + assert kwargs["metadata"]["role"] == "code-review-engineer" + + def test_deny_writes_audit_row(self): + agent = {"id": "a1", "role": "code-review-engineer"} + with patch("app.services.policy.ActionLogModel") as mock_log: + with pytest.raises(HTTPException): + require_action(agent, "github.pr.merge") + mock_log.record.assert_called_once() + kwargs = mock_log.record.call_args.kwargs + assert kwargs["outcome"] == "denied" + assert kwargs["action"] == "github.pr.merge" + + def test_audit_failure_does_not_break_request(self): + """A DB hiccup on the audit write must not block the policy check.""" + agent = {"id": "a1", "role": "code-review-engineer"} + with patch("app.services.policy.ActionLogModel") as mock_log: + 
mock_log.record.side_effect = RuntimeError("db down") + # Allow path still returns cleanly. + require_action(agent, "github.review.submit") + # Deny path still raises the policy 403, not the audit error. + with pytest.raises(HTTPException) as exc: + require_action(agent, "github.pr.merge") + assert exc.value.status_code == 403 From 7b1d1885b33146c142af372f215ba26b15f9320e Mon Sep 17 00:00:00 2001 From: Michael Wang Date: Sat, 23 May 2026 22:32:33 -0700 Subject: [PATCH 2/2] docs: reflect phases A/B/D across the md files CLAUDE.md, README.md, ROADMAP.md, PROJECT_CONTEXT.md, LOCAL_SETUP.md all now describe what's actually built rather than the original hackathon-only scope. The drift was substantial because the trust moat (Phase B) and the memory layer (Phase D) were being described in PROJECT_CONTEXT.md and ROADMAP.md as defensible-layer ambitions when they are now real for the Code Review Engineer. CLAUDE.md gains a Status line that names the three shipped phases, architecture bullets for template-driven runtime / agent-token auth / memory + audit log, an updated backend layout including policy.py and agent_auth.py and the new agent-scoped models, the actual backend test count (97 -> 125), a note on migration 004, and two new conventions sections (trust moat, memory) so future edits stay inside the established patterns. README.md updates the tagline to mention the depth work, refreshes the routers/services/models trees, rewrites the agent-runtime section around the template-shaped container, bumps the test count (78 -> 125), and rewrites Scope and Status as Foundation + Hardening + 3-of-4 phases shipped with issue and PR references. ROADMAP.md annotates each of the Enforced Specialization five layers with built / partial / not-built status, rewrites What's Been Built as Foundation / Hardening / Code Review Engineer epic with PR numbers, and replaces What Needs Doing Next with the current backlog (referencing #11, #12, #13, #14, #15, #17, #18, #23). 
PROJECT_CONTEXT.md annotates the OAuth gateway bullets with which steps Phases B + D made real, and marks each Defensible Layer as partial with a pointer to what is and isn't built. LOCAL_SETUP.md gets a drift notice flagging the stale Next.js + Compose sections (tracked in #11), switches the migration step to schema.sql (the consolidated fresh-install snapshot), strengthens the rebuild-image guidance after skill changes, and bumps the test count. Also removes AGENT_SYSTEM_PROMPT.md and HANDOFF.md (now superseded by SOUL.md per-template and git history respectively) and scrubs their references from README.md and CLAUDE.md. Co-Authored-By: Claude Opus 4.7 --- AGENT_SYSTEM_PROMPT.md | 113 ------------------------- CLAUDE.md | 21 +++-- HANDOFF.md | 184 ----------------------------------------- LOCAL_SETUP.md | 10 ++- PROJECT_CONTEXT.md | 18 ++-- README.md | 54 +++++++----- ROADMAP.md | 75 +++++++++-------- 7 files changed, 105 insertions(+), 370 deletions(-) delete mode 100644 AGENT_SYSTEM_PROMPT.md delete mode 100644 HANDOFF.md diff --git a/AGENT_SYSTEM_PROMPT.md b/AGENT_SYSTEM_PROMPT.md deleted file mode 100644 index 5a08349..0000000 --- a/AGENT_SYSTEM_PROMPT.md +++ /dev/null @@ -1,113 +0,0 @@ -# Claude Agent System Prompt — AI Dashboard + Marketplace - -## Overview - -You are the core intelligence behind an AI Agent Dashboard. - -Your job is to: -- Read data from multiple integrations (Slack, Gmail, GitHub) -- Reason across them -- Generate useful, actionable outputs -- Simulate intelligent agent behavior (not just summarization) - ---- - -## Core Principles - -1. You are NOT a chatbot. -2. You are an AI agent that: - - analyzes - - connects information - - suggests actions -3. You prioritize usefulness over verbosity. 
- ---- - -## Available Integrations (Mocked) - -You may receive structured data from: - -### Gmail -- emails -- threads -- sender, subject, body - -### Slack -- messages -- channels -- timestamps - -### GitHub -- pull requests -- comments -- issues - ---- - -## Your Responsibilities - -### 1. Cross-Integration Reasoning - -You must: -- connect information across tools - -Example: -- email mentions a task -- Slack discusses it -- GitHub has related PR - -→ You unify this into a coherent understanding - ---- - -### 2. Agent Behavior - -Each agent has a purpose. You must behave accordingly. - -Examples: - -#### Inbox Agent -- summarize emails -- detect priority -- draft replies - -#### Slack Agent -- summarize discussions -- extract action items - -#### GitHub Agent -- summarize PRs -- suggest improvements - ---- - -### 3. Action-Oriented Outputs - -DO NOT just summarize. - -Always include: -- recommended actions -- suggested next steps - ---- - -### 4. Structured Thinking - -Internally, follow this process: - -1. Identify key information -2. Detect relationships -3. Identify user intent -4. Generate useful output - ---- - -## Linkup Integration (External Context) - -If external knowledge is needed: - -You may receive: -```json -{ - "linkup_results": [...] -} \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md index aabf3fb..b35ddb8 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -5,7 +5,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## Project AgentOS — "Fiverr for OpenClaw." Managed platform that packages OpenClaw instances as specialized, containerized AI employees. See `README.md`, `ROADMAP.md`, and `PROJECT_CONTEXT.md` for product context; `LOCAL_SETUP.md` is the authoritative setup guide. -**Status: hackathon mode.** Demo bar is "hired and running." Hire flow is the v1 frontend scope. LLM execution is live end-to-end via OpenClaw + Kimi K2.5. 
+**Status: hackathon mode, post-hackathon depth in progress.** Demo bar was "hired and running"; the Code Review Engineer is now also a differentiated, trust-moated, memory-backed employee. Three of the four "make one employee real" phases have shipped (A: template-driven runtime; B: enforced action policy; D: agent memory + work log). Phase C (autonomous PR-watcher) is next. LLM execution is live end-to-end via OpenClaw + Kimi K2.5. ## Architecture (read this before editing) @@ -20,17 +20,22 @@ Host (Mac) - **Platform → agent dispatch** is HTTP POST to the container's internal IP on the `openclaw-agents` Docker bridge network. The platform finds the IP via the Docker SDK. There is no message bus. - **Each agent container** runs the official OpenClaw gateway as the engine plus a FastAPI sidecar (`backend/agent-runtime/server.py`) on port 8080. The sidecar accepts `POST /task` with a token (`openclaw-internal` by default) and proxies to OpenClaw's OpenAI-compatible `/v1/chat/completions`. +- **Template-driven container shape (Phase A).** The orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from the template's `system_prompt`, and installs only the skills the template lists. `resource_limits` from the template cap the Docker container. +- **Agent-side auth + action policy (Phase B).** The orchestrator mints a per-agent bearer token and persists it on `agents.agent_token` while the container runs. Agent-authed gateway endpoints (currently the 4 GitHub ones) use `get_current_agent` (`backend/app/agent_auth.py`) instead of `get_current_user`, and call `require_action` (`backend/app/services/policy.py`) before doing work. Denied-by-default against the role template's `allowed_actions`. 
+- **Memory + audit log (Phase D).** Per-agent key/value memory (`agent_memory` table) the agent writes via the `update-memory` skill and reads back as injected `role_context` on every dispatch — survives container restarts. Every agent-authed gateway call writes a row to `agent_action_log` (allow + deny). `reviewed_prs` is the dedup table Phase C's watcher will read. - **LLM** is Kimi (Moonshot AI) — `moonshot/kimi-k2.5` — wired via `openclaw.json` inside the agent image. The chat-completions endpoint must be explicitly enabled in that config. -- **Persistence** is Supabase only (users, hired employees, encrypted credentials). Credentials are Fernet-encrypted at rest in `backend/app/services/credential_store.py`. +- **Persistence** is Supabase only (users, hired employees, encrypted credentials, per-agent memory + action log + reviewed PRs). User credentials are Fernet-encrypted at rest in `backend/app/services/credential_store.py`; agent tokens are stored plaintext (rotated on stop). - **OAuth fidelity (hackathon):** GitHub is real OAuth; Slack/Gmail use a simulated consent screen that writes a placeholder token via `POST /credentials`. - **Frontend → backend** is via the Vite dev proxy: `/api/*` → `http://localhost:8000/*` (see `app/vite.config.ts`). Do not bake `BACKEND_URL` into the build. 
Backend layout under `backend/app/`: - `routers/` — `users`, `auth` (+ `compat_router`), `agents`, `roles`, `credentials`, `tasks`, `gateway`, `chat` -- `services/` — `orchestrator` (Docker spawn/teardown), `dispatcher` (task routing), `credential_store` (Fernet vault), `gateway` (OAuth URL build + token exchange), `template_loader` (YAML role templates) -- `models/` — Supabase data access +- `services/` — `orchestrator` (Docker spawn/teardown), `dispatcher` (task routing + memory injection), `credential_store` (Fernet vault), `gateway` (OAuth URL build + token exchange), `template_loader` (YAML role templates), `policy` (action-policy check + audit write) +- `models/` — Supabase data access. User-scoped: `user`, `agent`, `credential`. Agent-scoped (Phase D): `agent_memory`, `action_log`, `reviewed_pr` - `schemas/` — Pydantic request/response models -- Role templates live in `backend/agent-config/templates/` (`secretary`, `code-review-engineer`, `customer-support`). `AGENT_SYSTEM_PROMPT.md` is the base system prompt mounted into containers. +- `auth.py` — user-side `get_current_user` (X-Api-Key). `agent_auth.py` — agent-side `get_current_agent` (Bearer agent token, Phase B) +- Role templates live in `backend/agent-config/templates/` (`secretary`, `code-review-engineer`, `customer-support`). The template's `skills` list selects which skill folders from `backend/agent-runtime/skills/` install into the container (Phase A); `allowed_actions` are enforced server-side (Phase B). The agent's `SOUL.md` is written from each template's `system_prompt` at container boot — there is no shared base prompt. +- Migrations live in `backend/migrations/`. `001`–`003` are the original user/credentials/password schema. `004_code_review_engineer.sql` is shared across Phases B (agent_token column) and D (memory + action log + reviewed_prs tables) of the Code Review Engineer epic. `schema.sql` is the consolidated fresh-install snapshot. 
## Common commands @@ -53,7 +58,7 @@ bun run build # tsc -b && vite build bun run lint # eslint . ``` -Backend tests (97 tests): +Backend tests (125 tests): ```bash cd backend arch -arm64 .venv/bin/python -m pytest # all @@ -82,6 +87,10 @@ API docs: `http://localhost:8000/docs`. Health: `GET /health`. **Scope discipline (hackathon).** Post-hire surfaces (work log, team page, performance review), billing/Stripe, and VPS deploy are explicitly post-hackathon. Don't scaffold them unless asked. The full post-hackathon candidate pool of 10 employees lives in `PROJECT_CONTEXT.md` — the MVP ships 2 (Code Review Engineer, Customer Support) plus `secretary.yaml` as a reference template. +**Trust moat conventions (Phase B+).** Gateway endpoints that an agent skill calls must (1) use `agent: dict = Depends(get_current_agent)` instead of `user: dict = Depends(get_current_user)`, and (2) call `require_action(agent, "")` before doing work — denied-by-default against the role template's `allowed_actions`. The action id is the audit-log row's `action` field; choose stable, dot-namespaced strings (e.g. `github.review.submit`, `agent.memory.write`). Skills authenticate with `Authorization: Bearer ${AGENT_TOKEN}`, never `X-Api-Key`. + +**Memory conventions (Phase D+).** Agent-side memory persistence lives in `agent_memory` and is exposed via `GET`/`POST /gateway/memory`. The dispatcher injects all of an agent's memory keys into `role_context` on every dispatch — compaction strategies (LRU / LLM reflection) are tracked in issue #23, not yet built. Keep memory keys stable and namespaced; the agent's `update-memory` skill is the *write* path, the dispatcher is the *read* path. + **Git.** Commit after every meaningful fix. Keep messages short and reflective of intent. **Docs.** When behavior or setup changes, update the relevant md (`README.md`, `LOCAL_SETUP.md`, `ROADMAP.md`, this file) in the same change. 
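Reviewer sketch (not part of the patch): the trust moat convention in the CLAUDE.md diff above, together with the behavior pinned down by `TestActionLogAudit`, can be illustrated with a minimal, stdlib-only model. Everything here is an assumption made for illustration: `ALLOWED_ACTIONS`, `PolicyDenied`, and `AuditLog` are hypothetical stand-ins for the role template's `allowed_actions`, the HTTP 403, and the `agent_action_log` table. The real `require_action` in `backend/app/services/policy.py` takes `(agent, action)` and may differ in its details.

```python
from dataclasses import dataclass, field

# Hypothetical role -> allowed action ids. In the patch these come from the
# role template's allowed_actions, not from a hard-coded dict.
ALLOWED_ACTIONS = {
    "code-review-engineer": {
        "github.review.submit",
        "agent.memory.read",
        "agent.memory.write",
    },
}


class PolicyDenied(Exception):
    """Stand-in for the HTTP 403 the gateway raises on a denied action."""


@dataclass
class AuditLog:
    """In-memory stand-in for the agent_action_log table."""

    rows: list = field(default_factory=list)
    fail: bool = False  # set True to simulate a database outage

    def record(self, agent_id, action, outcome, metadata):
        if self.fail:
            raise RuntimeError("db down")
        self.rows.append(
            {"agent_id": agent_id, "action": action,
             "outcome": outcome, "metadata": metadata}
        )


def require_action(agent, action, audit_log):
    """Deny by default, then audit both outcomes on a best-effort basis."""
    # Unknown roles get an empty set, so every action is denied for them.
    allowed = action in ALLOWED_ACTIONS.get(agent["role"], set())
    outcome = "allowed" if allowed else "denied"
    try:
        audit_log.record(agent_id=agent["id"], action=action, outcome=outcome,
                         metadata={"role": agent["role"]})
    except Exception:
        # Audit failure is swallowed here; the decision never depends on the
        # audit store. (The patch describes logging the failure locally.)
        pass
    if not allowed:
        raise PolicyDenied(f"{agent['role']} may not perform {action}")
```

The ordering is the design point: the decision is computed before the audit write, and the write sits inside its own `try`, so an outage in the audit store can neither turn a deny into an allow nor replace the policy error with a database error.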
diff --git a/HANDOFF.md b/HANDOFF.md deleted file mode 100644 index 407c534..0000000 --- a/HANDOFF.md +++ /dev/null @@ -1,184 +0,0 @@ -# Backend Handoff - -What's been built, how it works, and what's left. - ---- - -## What's Done - -### Platform Backend (FastAPI) -Fully working backend with these API endpoints: - -| Endpoint | Method | What it does | -|---|---|---| -| `/health` | GET | Platform health check | -| `/users` | POST | Create a user (returns API key) | -| `/users` | GET | List users | -| `/users/{id}` | GET/DELETE | Get or delete a user | -| `/agents` | POST | Hire an agent (spins up a Docker container) | -| `/agents` | GET | List your agents | -| `/agents/{id}` | GET | Get agent status | -| `/agents/{id}` | DELETE | Fire an agent (stops + removes container) | -| `/agents/{id}/tasks` | POST | Assign a task to an agent | -| `/agents/{id}/tasks/status` | GET | Check task progress | -| `/agents/{id}/tasks/cancel` | POST | Cancel a running task | -| `/credentials` | POST | Store an OAuth token (encrypted) | -| `/credentials` | GET | List stored credentials | -| `/credentials/{service}` | DELETE | Remove a credential | -| `/gateway/email/send` | POST | Send email via stored Gmail token | -| `/gateway/slack/message` | POST | Send Slack message via stored token | -| `/gateway/discord/message` | POST | Send Discord message via stored token | - -All endpoints require `X-Api-Key` header (except `/users` POST and `/health`). - -### Agent Runtime (OpenClaw + Kimi) -Each agent container runs: -1. **OpenClaw gateway** (port 18789) — the open-source AI agent engine, configured with Kimi K2.5 as the LLM -2. 
**Task server** (port 8080) — our FastAPI sidecar that receives tasks from the platform and forwards them to OpenClaw - -The `entrypoint.sh` bootstraps everything: -- Writes `openclaw.json` with Moonshot/Kimi provider config (enables `/v1/chat/completions` endpoint, token auth) -- Generates `SOUL.md` (agent persona) and `AGENTS.md` (operating instructions) from the role env vars -- Starts OpenClaw via `node openclaw.mjs gateway --allow-unconfigured` in background, waits for health check, then starts the task server - -**LLM calls are verified end-to-end.** Container builds, gateway starts with `moonshot/kimi-k2.5`, and tasks produce real Kimi responses via the OpenAI-compatible chat completions API. - -### Test Suite -78 tests, all passing. Run with: -```bash -cd backend -pip install -e ".[dev]" -pytest tests/ -v -``` - -Covers: all routers, orchestrator, dispatcher, schemas, crypto, agent runtime, OpenClaw integration. - ---- - -## Architecture - -``` -User → Platform API (:8000) - │ - ├── Supabase (users, agents, credentials tables) - ├── Docker socket (create/stop agent containers) - │ - └── Agent Container (on openclaw-agents network) - ├── OpenClaw Gateway (:18789) → Kimi K2.5 API - └── Task Server (:8080) ← platform dispatches here -``` - -All communication between platform and agents happens over the Docker bridge network. No agent ports are exposed externally. 
- ---- - -## Key Files - -| File | Purpose | -|---|---| -| `backend/app/main.py` | FastAPI app entry point, includes all routers | -| `backend/app/services/orchestrator.py` | Creates/stops Docker containers, resolves container IPs | -| `backend/app/services/dispatcher.py` | Sends tasks to agent containers via HTTP | -| `backend/app/routers/tasks.py` | Task assignment API endpoints | -| `backend/app/routers/agents.py` | Agent hire/fire/list API endpoints | -| `backend/app/services/gateway.py` | Proxies requests to Gmail/Slack/Discord APIs | -| `backend/app/services/credential_store.py` | Encrypts and stores OAuth tokens | -| `backend/agent-runtime/server.py` | Task server inside each agent container | -| `backend/agent-runtime/entrypoint.sh` | Bootstraps OpenClaw config + starts both services | -| `backend/agent-runtime/Dockerfile` | Extends official OpenClaw image with our sidecar | -| `backend/agent-config/templates/secretary.yaml` | Role definition for the secretary agent | -| `backend/docker-compose.yml` | Platform service definition | - ---- - -## How to Run Locally - -See `LOCAL_SETUP.md` for full instructions. Quick version: - -```bash -# 1. Make sure Docker Desktop is running - -# 2. Build the agent image -docker build -t openclaw/agent:latest backend/agent-runtime/ - -# 3. Set up env -cp backend/.env.example backend/.env -# Edit backend/.env with your Supabase + Kimi API keys - -# 4. Run Supabase migration (SQL editor or psql) - -# 5. Start the platform -cd backend -pip install -e . -uvicorn app.main:app --reload --port 8000 - -# 6. 
Test it -curl http://localhost:8000/health -``` - ---- - -## Environment Variables - -Set in `backend/.env`: - -| Var | What | -|---|---| -| `SUPABASE_URL` | Your Supabase project URL | -| `SUPABASE_KEY` | Supabase service role key | -| `ENCRYPTION_KEY` | Fernet key for encrypting OAuth tokens | -| `LLM_API_KEY` | Kimi/Moonshot API key (passed to agent containers as `MOONSHOT_API_KEY`) | -| `PLATFORM_GATEWAY_URL` | How agent containers reach the platform (`http://host.docker.internal:8000/gateway` locally) | - ---- - -## What's Left - -### Backend -- [ ] Add `code-review-engineer.yaml` and `customer-support.yaml` role templates -- [ ] Add `GET /roles` endpoint so the frontend can list available roles dynamically -- [ ] Wire real GitHub OAuth redirect/callback flow -- [ ] Deploy to VPS (see `VPS_SETUP.md`) - -### Frontend -- [ ] Hire flow: landing → talent directory → employee profile → hire wizard → confirmation -- [ ] Wire to backend API (create user, hire agent, connect credentials) - -### Post-Hackathon -- [ ] Billing (Stripe subscriptions per agent) -- [ ] Enforcement layer (tool lockdown, output validation, scoped memory) -- [ ] More agent roles from the 10-employee list in `PROJECT_CONTEXT.md` -- [ ] Work log / performance review UI - ---- - -## Quick API Test Flow - -```bash -# Create user -curl -s -X POST http://localhost:8000/users \ - -H "Content-Type: application/json" \ - -d '{"email":"test@test.com","name":"Test"}' | jq . - -# Save the api_key, then: -export API_KEY="oc_..." - -# Hire an agent -curl -s -X POST http://localhost:8000/agents \ - -H "Content-Type: application/json" \ - -H "X-Api-Key: $API_KEY" \ - -d '{"role":"secretary"}' | jq . - -# Save the agent id, then: -export AGENT_ID="..." - -# Assign a task -curl -s -X POST http://localhost:8000/agents/$AGENT_ID/tasks \ - -H "Content-Type: application/json" \ - -H "X-Api-Key: $API_KEY" \ - -d '{"instruction":"Draft a welcome email for new team members"}' | jq . 
- -# Check status -curl -s http://localhost:8000/agents/$AGENT_ID/tasks/status \ - -H "X-Api-Key: $API_KEY" | jq . -``` diff --git a/LOCAL_SETUP.md b/LOCAL_SETUP.md index 345f805..cc387cc 100644 --- a/LOCAL_SETUP.md +++ b/LOCAL_SETUP.md @@ -4,6 +4,8 @@ Run the full OpenClaw platform, agent containers, and hire-flow frontend on your > Retiring note: `VPS_SETUP.md` is retired as of 2026-04-12. VPS deploy is post-hackathon. +> **Drift notice.** Sections of this file still describe the pre-migration Next.js + Docker Compose setup. The frontend has since moved to Vite (`app/`, port 5173) and the backend runs natively via `start-mac.sh` / `start.sh`. Migration in flight in issue #11. Treat `CLAUDE.md` and `start-mac.sh` as the current source of truth. + --- ## What runs where @@ -73,7 +75,9 @@ python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key(). ## 3. Create the Supabase tables -In your Supabase project → **SQL Editor** → **New query** → paste the contents of `backend/migrations/001_initial_schema.sql` → **Run**. +In your Supabase project → **SQL Editor** → **New query** → paste the contents of `backend/migrations/schema.sql` → **Run**. This single file is the consolidated fresh-install snapshot covering migrations 001 (users / agents / credentials), 002 (GitHub credentials), 003 (password hash), and 004 (per-agent bearer token + memory + action log + reviewed PRs). + +If you're upgrading an existing DB rather than starting fresh, run the individual `00X_*.sql` migrations in order. All migrations are idempotent (`create table if not exists`, `alter table ... add column if not exists`). If you skip this, `POST /users` fails on first call with a `relation "users" does not exist` error. 
@@ -83,7 +87,7 @@ If you skip this, `POST /users` fails on first call with a `relation "users" doe docker build -t openclaw/agent:latest backend/agent-runtime/ ``` -**Do not skip this.** If this image is missing, `docker compose up` still starts the platform, but the first `POST /agents` fails at runtime with `No such image: openclaw/agent:latest`. +**Do not skip this.** If this image is missing, the first `POST /agents` fails at runtime with `No such image: openclaw/agent:latest`. Rebuild after any change under `backend/agent-runtime/` (Dockerfile, `entrypoint.sh`, or any skill folder — e.g., after editing or adding a `SKILL.md`). ## 5. Start the platform @@ -210,7 +214,7 @@ pip install -e ".[dev]" pytest ``` -All 68 tests run with mocked Supabase and Docker — no live infra required. +All 125 tests run with mocked Supabase and Docker — no live infra required. On Apple Silicon, run with `arch -arm64 .venv/bin/python -m pytest` instead, or `pydantic-core` / other native wheels fail to import. --- diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md index 1887bf3..4a1b04b 100644 --- a/PROJECT_CONTEXT.md +++ b/PROJECT_CONTEXT.md @@ -73,13 +73,13 @@ When a task arrives, the task server sends the instruction to OpenClaw's local ` - Customer clicks "Connect GitHub" -> standard OAuth consent flow -> platform stores tokens - OpenClaw containers NEVER hold raw tokens - Auth gateway sits between container and external APIs: - 1. Validates request - 2. Checks role permissions ("can comment but NOT merge") - 3. Injects OAuth token - 4. Logs action (audit trail) + 1. Validates request — **✅ Phase B: agent-side `get_current_agent` resolves the calling agent by its bearer token (`backend/app/agent_auth.py`).** + 2. Checks role permissions ("can comment but NOT merge") — **✅ Phase B: `require_action` denied-by-default against the template's `allowed_actions` for the GitHub surface; other surfaces tracked in #17.** + 3. 
Injects OAuth token — ✅ user's stored credential is fetched server-side by agent_id → user_id. + 4. Logs action (audit trail) — **✅ Phase D: `agent_action_log` row written for every allow and deny.** 5. Forwards to API -- Configurable autonomy: gateway can hold actions for customer approval or pass through -- Offboarding = revoke token, one click +- Configurable autonomy: gateway can hold actions for customer approval or pass through *(not yet built)* +- Offboarding = revoke token, one click — partially: `stop_agent` clears the per-agent bearer token; user-side OAuth revocation not yet wired. ### Per-Employee Isolation - One container per employee per customer @@ -131,9 +131,9 @@ Preserved from the pre-hackathon brainstorm. Optimized for visible impact within 4. **LLM costs vs low price point tension** — container + API calls + memory = $30-50/month floor per employee ## Defensible Layers (in order of priority) -1. **Skill marketplace** — curated, tested skill packages that specialize raw OpenClaw into roles -2. **Trust infrastructure** — permissions, audit, approval flows -3. **Memory/coaching layer** — employees that get better at their specific job over time, cross-customer learnings +1. **Skill marketplace** — curated, tested skill packages that specialize raw OpenClaw into roles. *Partial: the template's `skills` list now drives which skills install per role (Phase A); the broader marketplace UX is post-hackathon.* +2. **Trust infrastructure** — permissions, audit, approval flows. *Partial: permissions enforced for the GitHub surface, audit log persisted for every agent-authed call (Phases B + D); approval flows not yet built; #17 brings other surfaces to parity.* +3. **Memory/coaching layer** — employees that get better at their specific job over time, cross-customer learnings. 
*Partial: per-agent persistent memory with dispatch-time injection (Phase D); LLM-driven memory compaction / "coaching" is the moat candidate tracked in #23; cross-customer learning is not yet built.* ## Target Customer 20-80 person teams (Series A-C startups). Big enough to have real pain, small enough to not have dedicated specialists for every function, culturally open to trying new tools. diff --git a/README.md b/README.md index 6401f53..4c7e621 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ A managed platform for hiring, onboarding, and running specialized AI employees. Think "Fiverr for OpenClaw" — each employee is a containerized OpenClaw instance with a defined role, persistent memory, and scoped access to your tools (GitHub, Slack, Gmail). -Built as a hackathon MVP. The current demo bar is "hired and running": a user can browse the talent directory, onboard an employee through a 4-step hire flow, and dispatch tasks that execute inside a live Docker container backed by Kimi K2.5 through the OpenClaw gateway. +Built as a hackathon MVP, now growing depth. The original demo bar was "hired and running"; the Code Review Engineer is now a genuinely differentiated AI employee with a role-shaped container, a server-side action policy (provably unable to merge/push), and persistent memory that survives container restarts. A user can browse the talent directory, onboard an employee through a 4-step hire flow, and dispatch tasks that execute inside a live Docker container backed by Kimi K2.5 through the OpenClaw gateway. --- @@ -20,7 +20,9 @@ Host (your Mac) - Platform backend dispatches tasks to agent containers over the Docker bridge network via HTTP POST. - Each agent container runs the official OpenClaw gateway alongside a FastAPI task server (`backend/agent-runtime/server.py`) on port 8080. - LLM inference uses Kimi (Moonshot AI) via OpenClaw's OpenAI-compatible `/v1/chat/completions` endpoint. 
-- Supabase is the only external dependency — it holds users, hired employees, and encrypted credentials. +- The role template shapes the container: `system_prompt` becomes the agent's `SOUL.md`, the `skills` list filters which skills install, and `resource_limits` caps the container. +- Skills that need to call the platform back (e.g., GitHub review, memory writes) authenticate with a per-agent bearer token; a server-side action policy enforces the role template's `allowed_actions` (denied-by-default) and writes an audit row per call. +- Supabase is the only external dependency — it holds users, hired employees, encrypted credentials, per-agent memory, the action log, and the reviewed-PR dedup index. --- @@ -31,14 +33,17 @@ AgentOS/ ├── backend/ FastAPI platform API │ ├── app/ │ │ ├── routers/ users, agents, tasks, credentials, gateway, chat, auth, roles -│ │ ├── services/ orchestrator, dispatcher, credential_store, gateway, template_loader -│ │ ├── models/ Supabase data access +│ │ ├── services/ orchestrator, dispatcher, credential_store, gateway, template_loader, policy +│ │ ├── models/ Supabase data access (users, agents, credentials, agent_memory, action_log, reviewed_pr) │ │ ├── schemas/ Pydantic request/response models -│ │ └── utils/ crypto helpers +│ │ ├── utils/ crypto helpers +│ │ ├── auth.py user-side auth (X-Api-Key) +│ │ └── agent_auth.py agent-side auth (Bearer agent token) │ ├── agent-runtime/ OpenClaw + task server sidecar (Dockerfile, entrypoint, server.py) +│ │ └── skills/ github-list-prs, github-pr-review, update-memory, send-email, send-slack-message │ ├── agent-config/ │ │ └── templates/ Role templates (secretary, code-review-engineer, customer-support) -│ ├── migrations/ Supabase SQL migrations +│ ├── migrations/ Supabase SQL migrations (001–004 + schema.sql) │ └── tests/ Pytest unit tests │ ├── app/ Vite + React 19 frontend (started by start.sh) @@ -48,8 +53,7 @@ AgentOS/ ├── LOCAL_SETUP.md Authoritative local setup guide ├── ROADMAP.md 
Hackathon scope + post-hackathon phases ├── PROJECT_CONTEXT.md Product brainstorming + design decisions -├── AGENT_SYSTEM_PROMPT.md System prompt used by agent containers -└── HANDOFF.md Frontend build brief +└── CLAUDE.md Conventions for Claude Code collaborators ``` --- @@ -71,11 +75,12 @@ FastAPI app mounted at `backend/app/main.py`. Router surface: Key services: -- `orchestrator` — provisions and tears down agent containers via the Docker SDK. -- `dispatcher` — routes tasks from platform to agent container internal IPs. +- `orchestrator` — provisions and tears down agent containers via the Docker SDK; base64-encodes the role template into the container env; mints and persists the per-agent bearer token. +- `dispatcher` — routes tasks from platform to agent container internal IPs; injects each agent's persisted memory into `role_context` at dispatch. - `credential_store` — Fernet-encrypted credential vault. - `template_loader` — reads YAML role templates from `agent-config/templates/`. - `gateway` — builds OAuth URLs and handles token exchange. +- `policy` — server-side action-policy check (`require_action`). Denied-by-default against the role template's `allowed_actions`; persists allow + deny rows to `agent_action_log` (best-effort). Stack: Python 3.12, FastAPI, Supabase, Docker SDK, cryptography (Fernet), httpx, PyYAML. @@ -95,7 +100,7 @@ Each hired employee runs as a Docker container built from `backend/agent-runtime - Sidecar: a FastAPI task server (`server.py`) on port 8080 that accepts `POST /task` from the platform. - Config: `openclaw.json` wires the gateway to Kimi (`moonshot/kimi-k2.5`) and enables the OpenAI-compatible chat completions endpoint. - Auth: task server is protected by a token (`openclaw-internal` by default) so only the platform can dispatch. -- Role: the role template (e.g., `code-review-engineer.yaml`) is mounted in and used to build the system prompt. +- Role: the resolved role template is passed in as a base64-encoded env var. 
The entrypoint decodes it and writes `SOUL.md` from `system_prompt`, installs only the skills the template lists, and applies its `resource_limits`. Skills that hit the platform's gateway authenticate with the per-agent bearer token. Containers are spawned on demand by the platform orchestrator and attached to the `openclaw-agents` bridge network. @@ -139,7 +144,7 @@ Full walkthrough is in `LOCAL_SETUP.md`. The short version: ## Tests -Backend has 78 passing unit tests: +Backend has 125 passing unit tests: ```bash cd backend @@ -147,18 +152,27 @@ source .venv/bin/activate pytest ``` +On Apple Silicon use `arch -arm64 .venv/bin/python -m pytest` instead — see `CLAUDE.md`. + --- ## Scope and status -- Platform backend scaffold: done, with task dispatch wired end-to-end. -- LLM execution inside containers: live via OpenClaw + Kimi. -- Hire flow: in-progress v1 scope for the hackathon. -- Post-hire surfaces (work log, team page, performance review): post-hackathon. -- Billing / Stripe / payment gating: post-hackathon. -- VPS deployment: post-hackathon. The MVP runs entirely on local Docker Desktop. +**Platform foundation — done** +- Platform backend scaffold with task dispatch wired end-to-end. +- LLM execution inside containers via OpenClaw + Kimi. +- Hardening pass merged (real password login, OAuth state signing, rate limiting, CORS, error handling, frontend route guards + minimal Vitest suite). + +**Code Review Engineer specialization — 3 of 4 phases shipped** +- **Phase A — Template-driven runtime.** The role template shapes the container (system_prompt → SOUL.md, skills filter, resource_limits applied). PR #19. +- **Phase B — Enforced action policy.** Per-agent bearer token + denied-by-default `allowed_actions` check at the gateway. The Code Review Engineer is provably unable to merge/close/push. PR #20. 
+- **Phase D — Memory & work log.** Per-agent key/value memory (writes via the `update-memory` skill, reads back via dispatch-time injection), full audit log of every agent-authed call (allow + deny), `reviewed_prs` dedup index. PR #25. +- **Phase C — Autonomous PR-watcher.** Next. Polls watched repos and dispatches reviews within minutes. + +**Backlog (post-hackathon)** +- AWS deployment via CDK (#11), frontend do-over (#12), agent memory compaction via LLM reflection (#23), other roles brought up to A/B/D parity (#17), hire/offboard UI (#13), CI for backend + frontend tests (#14). -See `ROADMAP.md` for the full phase plan. +See `ROADMAP.md` for the longer phase plan. --- @@ -178,6 +192,4 @@ Avoid: agents, marketplace, configuration, prompt, dashboard, teardown. - `LOCAL_SETUP.md` — authoritative local setup and networking notes - `ROADMAP.md` — hackathon scope and post-hackathon phases - `PROJECT_CONTEXT.md` — product brainstorming and design decisions -- `AGENT_SYSTEM_PROMPT.md` — system prompt the containers run with -- `HANDOFF.md` — backend handoff notes - `CLAUDE.md` — conventions for Claude Code collaborators diff --git a/ROADMAP.md b/ROADMAP.md index 7900310..85dcf69 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -62,13 +62,13 @@ Non-technical buyers don't want to configure automation. They want to hire teamm An AI employee is not a workflow with a friendlier name. It's a specialized, boundaried identity. Raw OpenClaw specialization is a suggestion (a prompt). Ours is structural. Five layers of enforcement: -1. **Tool Lockdown** — each employee's container ships with only the skills its role needs. No shell, no browser, no file access unless the role calls for it. The employee can't go off-script because the tools to go off-script don't exist in its environment. -2. **Action Gateway** — every API call flows through a central gateway that checks the action against the role's policy (read/write/delete per resource). 
CRM Cleanup can update contacts, not delete them. Content Repurposer can draft, not publish. -3. **Output Schema Validation** — each role has a defined output shape. Responses are validated before any action is taken. Invalid outputs trigger retries. This prevents role drift. -4. **Scoped Memory** — each employee has its own typed memory store with role-specific fields. The employee literally cannot accumulate context outside its role because there's nowhere to store it. -5. **Input Filtering** — employees only receive data relevant to their job. The Support Ticket Router never sees engineering Slack. The Competitor Monitor never sees internal data. +1. **Tool Lockdown** — each employee's container ships with only the skills its role needs. No shell, no browser, no file access unless the role calls for it. The employee can't go off-script because the tools to go off-script don't exist in its environment. **✅ Built (Phase A) — template `skills` list filters skill install at container boot.** +2. **Action Gateway** — every API call flows through a central gateway that checks the action against the role's policy (read/write/delete per resource). CRM Cleanup can update contacts, not delete them. Content Repurposer can draft, not publish. **✅ Built for the GitHub surface (Phase B) — `require_action` denied-by-default against the template's `allowed_actions`, plus every call audit-logged to `agent_action_log` (Phase D). Code Review Engineer is provably unable to merge/close/push. Other gateway endpoints (email/Slack/Discord) still need the same treatment — tracked in #17.** +3. **Output Schema Validation** — each role has a defined output shape. Responses are validated before any action is taken. Invalid outputs trigger retries. This prevents role drift. *(Not yet built.)* +4. **Scoped Memory** — each employee has its own typed memory store with role-specific fields. The employee literally cannot accumulate context outside its role because there's nowhere to store it. 
**✅ Built (Phase D) — `agent_memory` is per-`agent_id` and the dispatcher injects only the calling agent's memory into `role_context`. Compaction (LRU / LLM reflection) tracked in #23.** +5. **Input Filtering** — employees only receive data relevant to their job. The Support Ticket Router never sees engineering Slack. The Competitor Monitor never sees internal data. *(Not yet built.)* -**This is the moat.** The UI and the hiring metaphor are the wedge. The enforcement layer is why trust exists. +**This is the moat.** The UI and the hiring metaphor are the wedge. The enforcement layer is why trust exists. Three of the five layers are now real for the Code Review Engineer; bringing other roles to parity is tracked in #17. --- @@ -233,34 +233,41 @@ Copied from `CLAUDE.md` — enforce these consistently in product copy, docs, an ## What's Been Built -- [x] **Platform backend scaffold.** FastAPI backend with user management, agent lifecycle (hire/fire), credential vault, and auth gateway -- [x] **Container orchestration.** Docker-based agent containers with per-agent isolation, run locally on Docker Desktop for the hackathon -- [x] **Platform → agent task dispatch.** HTTP-based task assignment, status checking, and cancellation between platform and agent containers -- [x] **Agent runtime.** Lightweight FastAPI server inside each container that receives and executes tasks -- [x] **OpenClaw integration.** Agent containers run the official OpenClaw gateway with Kimi (Moonshot AI) as the backend LLM. Tasks are forwarded to OpenClaw's OpenAI-compatible `/v1/chat/completions` endpoint. -- [x] **LLM calls verified end-to-end.** Container builds, OpenClaw gateway starts with Kimi K2.5, tasks produce real LLM responses. -- [x] **Role definition templates.** Secretary, Code Review Engineer, and Customer Support YAMLs with allowed actions, required tools, system prompts, and OpenClaw model settings. 
-- [x] **`GET /roles` endpoint + `template_loader` service** feeding the frontend talent directory. -- [x] **Unit test suite.** 78 tests covering all backend modules (routers, services, schemas, agent runtime, OpenClaw integration). -- [x] **Local deploy guide.** `LOCAL_SETUP.md` covers the full happy-path run on Docker Desktop. -- [x] **Start script.** `start.sh` launches Docker image build, backend, and frontend in one command. - -## What Needs Doing Next (Hackathon) - -**Backend track:** -- [x] **Role templates** — `code-review-engineer.yaml` and `customer-support.yaml` live in `backend/agent-config/templates/`. -- [x] **`GET /roles` endpoint** — lists templates via shared `template_loader` service. -- [x] **Real agent logic** — agent runtime forwards tasks to the local OpenClaw gateway, which uses Kimi K2.5. -- [x] **Local deploy path** — `LOCAL_SETUP.md` covers the full happy-path run on Docker Desktop. -- [ ] **Register a real GitHub OAuth App.** Kevin owns registration; client ID/secret land in `app/.env.local` as `GITHUB_OAUTH_CLIENT_ID` / `GITHUB_OAUTH_CLIENT_SECRET`. The frontend owns the OAuth dance; backend only stores encrypted credentials. - -**Frontend track:** -- [ ] Build the hire flow in `app/`. Landing → talent directory → employee profile → 4-step hire wizard → confirmation. - -**Post-hackathon questions (not blocking):** -- Build the auth layer ourselves or use Nango from day one? -- How do we handle offboarding memory — archive, delete, or keep for re-hiring? -- What's our story if OpenClaw Cloud launches? +**Platform foundation** +- [x] **Platform backend scaffold.** FastAPI with user management, agent lifecycle (hire/fire), credential vault, auth. +- [x] **Container orchestration.** Docker-based agent containers, one per hired employee, local Docker Desktop for the hackathon. +- [x] **Platform → agent task dispatch.** HTTP-based task assignment, status checking, and cancellation. 
+- [x] **Agent runtime.** Lightweight FastAPI sidecar inside each container that receives and executes tasks. +- [x] **OpenClaw + Kimi integration verified end-to-end.** Container builds, gateway starts with Kimi K2.5, real LLM responses. +- [x] **Role definition templates.** Secretary, Code Review Engineer, Customer Support YAMLs. +- [x] **`GET /roles` + `template_loader` service** feeding the talent directory. +- [x] **Local deploy guide + start script.** `LOCAL_SETUP.md` (parts now stale post-Vite migration; tracked in #11) and `start-mac.sh` / `start.sh`. + +**Production hardening (PR #2, merged)** +- [x] Real password-based login with bcrypt + SHA-256 pre-hash (sidesteps 72-byte truncation). +- [x] Signed HMAC OAuth state tokens. +- [x] slowapi rate limiting; FastAPI global exception handlers; CORS lockdown. +- [x] Frontend route guards, error boundary, friendly errors, minimal Vitest suite (PR #4). + +**Code Review Engineer specialization** (the "make one employee real" epic, #10) +- [x] **Phase A — Template-driven runtime (#6, PR #19).** Orchestrator base64-encodes the resolved role template into `AGENT_TEMPLATE_B64`; `entrypoint.sh` decodes it, writes `SOUL.md` from `system_prompt`, installs only the skills the template lists, applies `resource_limits`. +- [x] **Phase B — Enforced action policy (#7, PR #20).** Per-agent bearer token persisted on `agents.agent_token`; `get_current_agent` dependency; `require_action` denied-by-default against the template's `allowed_actions`. The four GitHub gateway endpoints converted to agent-auth + policy. +- [x] **Phase D — Memory & work log (#8, PR #25).** `agent_memory` (key/value per agent), `agent_action_log` (allow + deny rows for every agent-authed call), `reviewed_prs` (dedup for Phase C). `update-memory` skill; dispatcher injects memory into `role_context` on every dispatch. 
+- [ ] **Phase C — Autonomous PR-watcher (#9).** FastAPI `lifespan` poll loop, `watched_repos` table + endpoint, dispatches reviews within minutes, dedups against `reviewed_prs`. + +**Tests:** 125 backend unit tests passing. + +## Backlog (post-hackathon) + +- **AWS deployment via CDK** (#11) — EC2 + Docker (Fargate ruled out by the Docker-socket spawn model), ECR + Secrets Manager + ALB/ACM + S3/CloudFront. +- **Frontend do-over** (#12) — rebuild `app/` as a coherent product UI around hire → onboard → monitor → review. +- **Hire & offboard flow** (#13) — `/directory` lists roles but has no hire action; `hireEmployee()` / `fireEmployee()` exist in `app/src/lib/api.ts` but are never called. +- **CI for backend + frontend tests** (#14) — only Claude review workflows exist today. +- **`backend/.env.example`** (#15) — `LOCAL_SETUP.md` tells contributors to copy it but the file doesn't exist. +- **Reconcile docs with reality** (#11) — `README.md`, `ROADMAP.md`, `PROJECT_CONTEXT.md` partially addressed; `LOCAL_SETUP.md` still references the old Next.js/Compose setup. +- **Bring Customer Support & Secretary to A/B/D parity** (#17) — they still run the generic container with no enforced boundaries. +- **Stop passing secrets as plaintext Docker env vars** (#18). +- **Agent memory compaction via LLM reflection** (#23) — the interesting moat candidate. Mechanical eviction → heuristic clustering → LLM-driven consolidation. ---