feat: coordination primitives — blocking locks, append, JSON, task DAG, worktree DB fix#5
… source Introduce an in-place ALTER TABLE migration step (`_migrate`/`_ensure_column`) so an existing database upgrades itself with no data loss and no version table, and add the `task_deps` table for task ordering.

Also add building blocks used by the new CLI surface:
- `Lock.kind`, `Message.reply_to`/`acked_at`, and `as_dict()` on the dataclasses.
- `identity_source()` so `whoami` can report how the agent id was resolved.
- `stale_after()`/`offline_after()` reading `AGENT_SYNC_STALE_MINUTES` / `AGENT_SYNC_OFFLINE_MINUTES`, defaulting to the existing 15/120 minutes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`acquire_lock_blocking` polls inside the CLI process until a busy lock frees (holder unlocks, goes stale, or the TTL expires) or a deadline passes, in which case it still raises LockConflict (exit 2) so the fail-closed contract holds. This lets an agent wait with a single blocking call instead of busy-retrying. `acquire_lock` now also takes a `kind`, so a lock can key an arbitrary named resource (e.g. db-migrations) rather than a file path; both share the locks table and resource locks never interfere with file-edit checks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
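The polling contract above can be sketched as a small loop. `wait_for_lock`, its parameters, and the injectable `clock`/`sleep` are hypothetical names chosen for this illustration; only `LockConflict` and the fail-closed behaviour are taken from the commit message.

```python
import time


class LockConflict(Exception):
    """The lock was still held at the deadline (the CLI maps this to exit 2)."""


def wait_for_lock(try_acquire, wait_seconds, poll_interval=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll `try_acquire` until it succeeds or `wait_seconds` have elapsed.

    `try_acquire` is any callable returning True once the lock is ours,
    for example because the holder unlocked, went stale, or its TTL expired.
    """
    deadline = clock() + wait_seconds
    while True:
        if try_acquire():
            return
        remaining = deadline - clock()
        if remaining <= 0:
            # Fail closed: never pretend we hold a lock we did not get.
            raise LockConflict("lock still held at deadline")
        sleep(min(poll_interval, remaining))
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments, and raising at the deadline preserves the same error path a non-waiting caller would see.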
Tasks can declare `depends_on` edges. "Blocked by a dependency" is computed from the dependency's status, so `claim-next` skips a blocked task, `claim_task` refuses it unless forced, and completing a dependency unblocks its dependents with no extra write. `dependents_unblocked_by` surfaces what just became claimable. `lock_task_files` best-effort locks a claimed task's files (normalized to the form the PreToolUse hook checks), warning on conflicts instead of failing the claim — closing the window between owning a task and owning its files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
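To make "blocked is computed, not stored" concrete, here is a minimal in-memory sketch. The `Task` dataclass, the status strings, and the function signatures are assumptions for illustration; the PR keeps this state in SQLite and its real signatures may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    id: str
    status: str = "open"  # assumed statuses: "open", "claimed", "done"
    owner: Optional[str] = None
    depends_on: List[str] = field(default_factory=list)


def is_blocked(task: Task, tasks: Dict[str, Task]) -> bool:
    """A task is blocked while any dependency is not yet done."""
    return any(tasks[dep].status != "done" for dep in task.depends_on)


def claim_next(tasks: Dict[str, Task], agent: str) -> Optional[Task]:
    """Claim the first open task whose dependencies are all done."""
    for task in tasks.values():
        if task.status == "open" and not is_blocked(task, tasks):
            task.status, task.owner = "claimed", agent
            return task
    return None


def dependents_unblocked_by(done_id: str, tasks: Dict[str, Task]) -> List[str]:
    """List open tasks that depend on `done_id` and are now claimable."""
    return [
        t.id for t in tasks.values()
        if done_id in t.depends_on and t.status == "open" and not is_blocked(t, tasks)
    ]
```

Since nothing records a "blocked" flag, marking a dependency done is the only write needed; the next `claim_next` call sees the dependent as claimable.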
`send_message` accepts a `reply_to` parent (validated to exist) to thread a reply, and `ack_message` records an `acked_at` so a sender can confirm a message was handled — distinct from `read_at`, which only marks it seen. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
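The difference between threading, reading, and acknowledging can be sketched against a toy table. The schema and the function signatures below are assumptions for illustration; only the `reply_to`, `read_at`, and `acked_at` column names come from the commit message.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema, reduced to the columns the commit message mentions.
SCHEMA = """
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL,
    body      TEXT NOT NULL,
    reply_to  INTEGER REFERENCES messages(id),
    read_at   TEXT,
    acked_at  TEXT
)
"""


def send_message(conn, sender, recipient, body, reply_to=None):
    """Insert a message; a reply must point at an existing parent."""
    if reply_to is not None:
        found = conn.execute(
            "SELECT 1 FROM messages WHERE id = ?", (reply_to,)
        ).fetchone()
        if found is None:
            raise ValueError(f"reply_to {reply_to} does not exist")
    cur = conn.execute(
        "INSERT INTO messages (sender, recipient, body, reply_to)"
        " VALUES (?, ?, ?, ?)",
        (sender, recipient, body, reply_to),
    )
    return cur.lastrowid


def ack_message(conn, message_id):
    """Record that the message was handled; read_at is left untouched."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE messages SET acked_at = ? WHERE id = ?", (now, message_id)
    )
```

Keeping `acked_at` separate from `read_at` lets a sender tell "the recipient saw it" apart from "the recipient acted on it".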
A linked worktree's `.git` is a file, so `repo_root` resolved each worktree to its own `.claude/coordination/state.sqlite` and agents on different worktrees could not see each other — breaking the worktree workflow the skill recommends. Resolve a worktree to its main worktree via `git rev-parse --git-common-dir` (falling back to the old behaviour when git is unavailable) so all worktrees of one repo share a single database. `AGENT_SYNC_ROOT` still overrides everything. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
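A rough sketch of the resolution step follows, assuming a non-bare repository whose common directory is named `.git`. The function names are hypothetical and the PR's own helper may handle more cases; `git rev-parse --git-common-dir` is the only detail taken from the commit message.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def main_root_from_common_dir(checkout: Path, raw: str) -> Optional[Path]:
    """Map git's common-dir output to the main worktree root.

    git may print the path relative to the directory it ran in, so relative
    output is anchored at `checkout` first.
    """
    common = Path(raw)
    if not common.is_absolute():
        common = checkout / common
    common = Path(os.path.normpath(common))
    # Assumption: the main worktree is the parent of a directory named .git.
    return common.parent if common.name == ".git" else None


def resolve_main_root(checkout: Path) -> Optional[Path]:
    """Ask git for the shared .git directory; None means "use the old behaviour"."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--git-common-dir"],
            cwd=checkout, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing or not a repository: caller falls back
    return main_root_from_common_dir(checkout, result.stdout.strip())
```

Returning `None` instead of raising keeps the fallback path explicit: the caller can continue with its previous marker-based lookup whenever git cannot answer.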
Run `gc_agents`/`gc_locks` at the start of every session (inside the existing fail-open guard) so a crashed agent's expired locks never block the next session until their TTL, removing the need to remember a manual `gc`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surface the new coordination primitives through the CLI:
- `append <file>`: atomic lock -> append -> unlock for files many agents write (body from --content or stdin), honoring --wait.
- `lock --wait[=SEC]` and `lock/unlock --resource KEY`.
- `--json` for status/locks/inbox/tasks (plus render helpers) so agents decide from structure, not prose.
- `whoami [--json]` to report the resolved agent id and its source.
- `claim-task/claim-next --lock`, `claim-task --force`, `create-task --depends-on`, `send --reply-to`, and a new `ack` command.
- `log` message is now positional (the `--message` flag still works).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
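The lock -> append -> unlock sequence behind `append` can be sketched as below. This is an in-process illustration only: the `locks` dict stands in for the shared locks table, and `held_lock` and `append_line` are hypothetical names, not the PR's functions.

```python
from contextlib import contextmanager


class LockConflict(Exception):
    """Another agent holds the lock for this path."""


@contextmanager
def held_lock(locks, path, agent):
    """Hold the lock on `path` for the duration of the with-block."""
    holder = locks.get(path)
    if holder is not None and holder != agent:
        raise LockConflict(f"{path} is locked by {holder}")
    already_held = holder == agent
    locks[path] = agent
    try:
        yield
    finally:
        # Release only a lock this call took, not one the agent held before.
        if not already_held:
            locks.pop(path, None)


def append_line(locks, path, agent, body):
    """Append `body` as one line while holding the lock on `path`."""
    line = body if body.endswith("\n") else body + "\n"
    with held_lock(locks, path, agent):
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
```

The `finally` clause is what makes the primitive safe to call from many agents: the lock is released even if the write fails, so a failed append cannot leave the file locked.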
SKILL.md: add a TL;DR loop, a concrete lock-conflict protocol (wait -> message -> other work, covering append too), a liveness/heartbeat section, the file-lock vs append distinction, subagent-identity guidance with whoami, and reply/ack usage. README/SECURITY: document append (and that it writes the working tree), --json, whoami, named/resource locks, task dependencies, and the new tuning env vars. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: acd49cb7b5
    if marker == ".git" and marker_path.is_file():
        shared = _worktree_main_root(candidate)
        if shared is not None:
            return shared
Keep linked-worktree paths relative to the active checkout
When this branch returns the main worktree as repo_root, every caller that is trying to address files in the current linked worktree also starts using the main checkout as its root. In a linked worktree, agent-sync append shared.txt now writes to <main>/shared.txt, and the PreToolUse hook normalizes Claude's absolute path /linked/shared.txt to that absolute string rather than shared.txt, so it will not see a lock that was taken on the relative path. This breaks the advertised shared-DB worktree flow by bypassing file locks/writing to the wrong tree; the DB location should be decoupled from file path normalization/current worktree paths.
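The mismatch described in this comment can be reproduced with a small path-normalization sketch. `normalize` is a hypothetical stand-in for the hook's logic, written to mirror the behaviour the review describes (a path outside the root stays absolute); the `/main` and `/linked` paths are the review's own placeholders.

```python
from pathlib import PurePosixPath


def normalize(path: str, root: str) -> str:
    """Return `path` relative to `root`, or unchanged if it lies outside it."""
    p = PurePosixPath(path)
    try:
        return p.relative_to(root).as_posix()
    except ValueError:
        # Outside the root: the key stays absolute and will not match a
        # lock that was recorded under a relative path.
        return p.as_posix()


# Normalizing against the main worktree leaves the linked path absolute.
key_with_main_root = normalize("/linked/shared.txt", "/main")

# Normalizing against the active checkout yields the key a lock on the
# relative path would use.
key_with_checkout_root = normalize("/linked/shared.txt", "/linked")
```

Under these assumptions the two keys are `/linked/shared.txt` and `shared.txt`, which is why the review asks for the database location to be resolved separately from the root used for file paths.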
Context
A real multi-agent Claude Code test session surfaced gaps where `agent-sync` forced agents to improvise instead of giving them a primitive or a documented protocol, most acutely the headline scenarios "N agents → one file" and "wait for a busy lock", which had no primitive and no protocol. This PR adds the missing primitives, fixes one real bug, and makes `SKILL.md` prescribe what to do on conflict.

What's in it
Tier 1: the things the test hit directly
- `lock <file> --wait[=SEC]`: the CLI subprocess blocks/polls until the lock frees (or the deadline passes → still exit 2, fail-closed preserved). One blocking call; no agent-side busy-retry.
- `agent-sync append <file>`: the missing N→1 primitive, an atomic lock→append→unlock (body from `--content` or stdin), honoring `--wait`.
- `--json` for `status`/`locks`/`inbox`/`tasks`: structured state so agents decide from data, not by parsing prose.
- `SKILL.md` rewritten with a concrete conflict protocol and an explicit subagent-identity warning (give each parallel subagent a distinct `AGENT_SYNC_ID`).

Worktree DB fix (real bug): a linked worktree's `.git` is a file, so each worktree got its own `state.sqlite` and agents couldn't see each other. `repo_root` now resolves a worktree to its main worktree via `git rev-parse --git-common-dir`, so all worktrees share one DB.

Quick wins: `log "msg"` positional (keeps `--message`); `gc` runs automatically on `SessionStart`; `claim-task`/`claim-next --lock` auto-locks a task's files.

Larger design: named/resource locks (`lock --resource KEY`); task dependency DAG (`--depends-on`, dependency-aware `claim-next`, `--force`, auto-unblock on completion); message `--reply-to` threading and `ack`.

From dogfooding the skill: `agent-sync whoami` (shows resolved id + source); configurable staleness via `AGENT_SYNC_STALE_MINUTES`/`AGENT_SYNC_OFFLINE_MINUTES`; SKILL.md TL;DR loop, heartbeat/liveness section, and append-vs-lock guidance.

Infra: additive `ALTER TABLE` migration layer upgrades existing DBs in place with no data loss; new `task_deps` table; model fields + `as_dict()` serializers.

Verification
- `ruff check` clean, `scripts/dev-smoke-test.py` passes.
- … `AGENT_SYNC_ID`s) raced concurrently → 15 intact ledger lines via `append` (no torn writes), correct dependency-aware task distribution, `run-tests` refused while blocked then auto-unblocked, reply-threading + ack, and `lock --wait` fail-closed.
- … `SKILL.md`, independently chose `whoami` → `claim-next --lock` → `append --wait` → `complete-task`, confirming the docs guide agents to the new primitives.

Note: `CHANGELOG.md` is intentionally not edited; it is generated automatically from these Conventional Commits by the release workflow.

🤖 Generated with Claude Code