feat: coordination primitives — blocking locks, append, JSON, task DAG, worktree DB fix by denfry · Pull Request #5 · denfry/agent-sync

denfry · 2026-06-16T06:25:57Z

Context

A real multi-agent Claude Code test session surfaced gaps where agent-sync forced agents to improvise instead of giving them a primitive or a documented protocol. The most acute were the headline scenarios "N agents → one file" and "wait for a busy lock", which had neither a primitive nor a protocol. This PR adds the missing primitives, fixes one real bug, and makes SKILL.md prescribe what to do on conflict.

What's in it

Tier 1 — the things the test hit directly

lock <file> --wait[=SEC] — the CLI subprocess blocks/polls until the lock frees (or the deadline passes → still exit 2, fail-closed preserved). One blocking call; no agent-side busy-retry.
agent-sync append <file> — the missing N→1 primitive: atomic lock→append→unlock (body from --content or stdin), honoring --wait.
--json for status / locks / inbox / tasks — structured state so agents decide from data, not by parsing prose.
SKILL.md rewritten with a concrete conflict protocol and an explicit subagent-identity warning (give each parallel subagent a distinct AGENT_SYNC_ID).
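
The lock → append → unlock sequence behind `append` can be sketched as follows. This is a minimal illustration, not agent-sync's implementation: an exclusive lock file created with `os.O_EXCL` stands in for the tool's lock table, waiting is omitted, and `locked_append` and `LockBusy` are hypothetical names.

```python
import os
from pathlib import Path


class LockBusy(Exception):
    """Another writer holds the lock (hypothetical name for this sketch)."""


def locked_append(path: Path, body: str) -> None:
    """Append one record to `path` under an exclusive lock file."""
    lock_path = path.with_name(path.name + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the lock file exists, so only one caller wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockBusy(str(path)) from None  # fail closed: never write unlocked
    try:
        os.close(fd)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(body if body.endswith("\n") else body + "\n")
    finally:
        os.unlink(lock_path)  # release even if the write failed
```

Because the lock is released in `finally`, a failed write cannot leave the file locked. In this sketch a crashed process still could, which is the case the TTL and `gc` handling described in this PR address.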

Worktree DB fix (real bug) — a linked worktree's .git is a file, so each worktree got its own state.sqlite and agents couldn't see each other. repo_root now resolves a worktree to its main worktree via git rev-parse --git-common-dir, so all worktrees share one DB.

Quick wins — log "msg" positional (keeps --message); gc runs automatically on SessionStart; claim-task/claim-next --lock auto-locks a task's files.

Larger design — named/resource locks (lock --resource KEY); task dependency DAG (--depends-on, dependency-aware claim-next, --force, auto-unblock on completion); message --reply-to threading and ack.

From dogfooding the skill — agent-sync whoami (shows resolved id + source); configurable staleness via AGENT_SYNC_STALE_MINUTES / AGENT_SYNC_OFFLINE_MINUTES; SKILL.md TL;DR loop, heartbeat/liveness section, and append-vs-lock guidance.
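
A minimal sketch of how the staleness thresholds might be read from the environment. The function names `stale_after`/`offline_after` and the 15/120-minute defaults come from the commit messages in this PR; the return type, the `_minutes_from_env` helper, and the handling of invalid values are assumptions.

```python
import os
from datetime import timedelta


def _minutes_from_env(name: str, default: int) -> timedelta:
    """Read a positive whole number of minutes from the environment."""
    raw = os.environ.get(name, "")
    try:
        minutes = int(raw)
    except ValueError:
        minutes = default  # unset or non-numeric: use the default
    if minutes <= 0:
        minutes = default  # assumption: non-positive values are ignored
    return timedelta(minutes=minutes)


def stale_after() -> timedelta:
    return _minutes_from_env("AGENT_SYNC_STALE_MINUTES", 15)


def offline_after() -> timedelta:
    return _minutes_from_env("AGENT_SYNC_OFFLINE_MINUTES", 120)
```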

Infra — additive ALTER TABLE migration layer upgrades existing DBs in place with no data loss; new task_deps table; model fields + as_dict() serializers.

Verification

159 tests pass (40 new), ruff check clean, scripts/dev-smoke-test.py passes.
Live test: three real Claude Code subagents (distinct AGENT_SYNC_IDs) raced concurrently → 15 intact ledger lines via append (no torn writes), correct dependency-aware task distribution, run-tests refused while blocked then auto-unblocked, reply-threading + ack, and lock --wait fail-closed.
A second live agent, given only a goal + SKILL.md, independently chose whoami → claim-next --lock → append --wait → complete-task — confirming the docs guide agents to the new primitives.

Note: CHANGELOG.md is intentionally not edited — it is generated automatically from these Conventional Commits by the release workflow.

🤖 Generated with Claude Code

… source

Introduce an in-place ALTER TABLE migration step (`_migrate`/`_ensure_column`) so an existing database upgrades itself with no data loss and no version table, and add the `task_deps` table for task ordering.

Also add building blocks used by the new CLI surface:

- `Lock.kind`, `Message.reply_to`/`acked_at`, and `as_dict()` on the dataclasses.
- `identity_source()` so `whoami` can report how the agent id was resolved.
- `stale_after()`/`offline_after()` reading `AGENT_SYNC_STALE_MINUTES` / `AGENT_SYNC_OFFLINE_MINUTES`, defaulting to the existing 15/120 minutes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`acquire_lock_blocking` polls inside the CLI process until a busy lock frees (holder unlocks, goes stale, or the TTL expires) or a deadline passes, in which case it still raises LockConflict (exit 2) so the fail-closed contract holds. This lets an agent wait with a single blocking call instead of busy-retrying.

`acquire_lock` now also takes a `kind`, so a lock can key an arbitrary named resource (e.g. db-migrations) rather than a file path; both share the locks table and resource locks never interfere with file-edit checks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
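
The poll-until-deadline behaviour described above can be sketched with an in-memory lock table. The names `acquire_lock`, `acquire_lock_blocking`, and `LockConflict` come from the commit message; the signatures, the `Held` record, the poll interval, and the injectable `clock`/`sleep` parameters are assumptions made so the sketch is self-contained and testable. Staleness of the holder is not modelled here, only TTL expiry.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


class LockConflict(Exception):
    """Lock is held by another agent; the CLI maps this to exit code 2."""


@dataclass
class Held:
    owner: str
    expires_at: float  # clock value at which the TTL runs out


def acquire_lock(table: Dict[str, Held], key: str, owner: str, ttl: float, now: float) -> None:
    held = table.get(key)
    if held is not None and held.owner != owner and held.expires_at > now:
        raise LockConflict(f"{key} is held by {held.owner}")
    table[key] = Held(owner, now + ttl)


def acquire_lock_blocking(
    table: Dict[str, Held], key: str, owner: str, ttl: float, wait: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll: float = 0.5,
) -> None:
    deadline = clock() + wait
    while True:
        try:
            acquire_lock(table, key, owner, ttl, clock())
            return
        except LockConflict:
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # deadline passed: still a conflict, so fail closed
            sleep(min(poll, remaining))
```

The caller sees exactly one of two outcomes, the lock or `LockConflict`, so the waiting variant never weakens the non-waiting contract.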

Tasks can declare `depends_on` edges. "Blocked by a dependency" is computed from the dependency's status, so `claim-next` skips a blocked task, `claim_task` refuses it unless forced, and completing a dependency unblocks its dependents with no extra write. `dependents_unblocked_by` surfaces what just became claimable.

`lock_task_files` best-effort locks a claimed task's files (normalized to the form the PreToolUse hook checks), warning on conflicts instead of failing the claim. This closes the window between owning a task and owning its files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
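
Deriving "blocked" from dependency status, rather than storing it, can be sketched as below. The `Task` fields, the status strings, and the in-memory dict are hypothetical; only the idea of dependency-aware claiming and the name `dependents_unblocked_by` come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    id: str
    status: str = "open"  # hypothetical statuses: open, claimed, done
    depends_on: List[str] = field(default_factory=list)
    owner: Optional[str] = None


def is_blocked(tasks: Dict[str, Task], task: Task) -> bool:
    """Blocked is derived, not stored: any unfinished dependency blocks."""
    return any(tasks[dep].status != "done" for dep in task.depends_on)


def claim_next(tasks: Dict[str, Task], agent: str) -> Optional[Task]:
    """Claim the first open task whose dependencies are all done."""
    for task in tasks.values():
        if task.status == "open" and not is_blocked(tasks, task):
            task.status, task.owner = "claimed", agent
            return task
    return None


def dependents_unblocked_by(tasks: Dict[str, Task], done_id: str) -> List[str]:
    """Open tasks that depend on `done_id` and are no longer blocked."""
    return [
        t.id for t in tasks.values()
        if done_id in t.depends_on and t.status == "open" and not is_blocked(tasks, t)
    ]
```

Since nothing records "blocked" explicitly, marking a dependency done is the only write needed; its dependents become claimable the next time they are evaluated.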

`send_message` accepts a `reply_to` parent (validated to exist) to thread a reply, and `ack_message` records an `acked_at` so a sender can confirm a message was handled. This is distinct from `read_at`, which only marks it seen.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
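
A minimal sketch of threading and acknowledgement on top of SQLite. The function names and the `reply_to`, `read_at`, and `acked_at` columns come from the commit message; the rest of the schema, the signatures, and the timestamp format are assumptions.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema for illustration only.
SCHEMA = (
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY, sender TEXT, body TEXT,"
    " reply_to INTEGER, read_at TEXT, acked_at TEXT)"
)


def send_message(conn: sqlite3.Connection, sender: str, body: str,
                 reply_to: Optional[int] = None) -> int:
    """Insert a message, rejecting a reply whose parent does not exist."""
    if reply_to is not None:
        parent = conn.execute(
            "SELECT 1 FROM messages WHERE id = ?", (reply_to,)
        ).fetchone()
        if parent is None:
            raise ValueError(f"reply_to {reply_to} does not exist")
    cur = conn.execute(
        "INSERT INTO messages (sender, body, reply_to) VALUES (?, ?, ?)",
        (sender, body, reply_to),
    )
    return cur.lastrowid


def ack_message(conn: sqlite3.Connection, message_id: int) -> None:
    """Record that the message was handled; read_at is left untouched."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE messages SET acked_at = ? WHERE id = ?", (now, message_id))
```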

A linked worktree's `.git` is a file, so `repo_root` resolved each worktree to its own `.claude/coordination/state.sqlite` and agents on different worktrees could not see each other, breaking the worktree workflow the skill recommends. Resolve a worktree to its main worktree via `git rev-parse --git-common-dir` (falling back to the old behaviour when git is unavailable) so all worktrees of one repo share a single database. `AGENT_SYNC_ROOT` still overrides everything.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
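
One way such a resolution could look is sketched below. It is a guess at the shape of the helper, not the PR's code: `worktree_main_root` and `_git_common_dir` are hypothetical names, the `common_dir` parameter exists only to make the function testable without git, and the sketch assumes the standard non-bare layout in which the common directory is `<main>/.git`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def _git_common_dir(cwd: Path) -> Optional[str]:
    """Return git's common dir for `cwd`, or None if git is missing or fails."""
    try:
        proc = subprocess.run(
            ["git", "rev-parse", "--git-common-dir"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return proc.stdout.strip() or None


def worktree_main_root(
    candidate: Path,
    common_dir: Callable[[Path], Optional[str]] = _git_common_dir,
) -> Optional[Path]:
    out = common_dir(candidate)
    if out is None:
        return None  # caller falls back to the old per-directory behaviour
    path = Path(out)
    if not path.is_absolute():
        path = candidate / path  # git may print a path relative to cwd
    # Assumes the common dir is <main>/.git, so its parent is the main worktree.
    return path.resolve().parent
```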

Run `gc_agents`/`gc_locks` at the start of every session (inside the existing fail-open guard) so a crashed agent's expired locks never block the next session until their TTL, removing the need to remember a manual `gc`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
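
A fail-open guard around session-start housekeeping might look like the sketch below. `run_fail_open` and the logger name are hypothetical, and guarding each step separately (rather than wrapping both in one block, as the commit message may imply) is a choice made for this illustration.

```python
import logging
from typing import Callable, Iterable

log = logging.getLogger("agent_sync.hooks")  # hypothetical logger name


def run_fail_open(steps: Iterable[Callable[[], object]]) -> int:
    """Run best-effort housekeeping; a failing step never aborts the session.

    Returns the number of steps that completed.
    """
    completed = 0
    for step in steps:
        try:
            step()
            completed += 1
        except Exception:  # fail open: log and carry on
            log.warning("session-start step %r failed", step, exc_info=True)
    return completed
```

Fail-open suits cleanup because a broken `gc` should cost at most some stale rows, whereas lock acquisition stays fail-closed because a broken lock check could cost a corrupted file.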

Surface the new coordination primitives through the CLI:

- `append <file>`: atomic lock -> append -> unlock for files many agents write (body from --content or stdin), honoring --wait.
- `lock --wait[=SEC]` and `lock/unlock --resource KEY`.
- `--json` for status/locks/inbox/tasks (plus render helpers) so agents decide from structure, not prose.
- `whoami [--json]` to report the resolved agent id and its source.
- `claim-task/claim-next --lock`, `claim-task --force`, `create-task --depends-on`, `send --reply-to`, and a new `ack` command.
- `log` message is now positional (the `--message` flag still works).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SKILL.md: add a TL;DR loop, a concrete lock-conflict protocol (wait -> message -> other work, covering append too), a liveness/heartbeat section, the file-lock vs append distinction, subagent-identity guidance with whoami, and reply/ack usage.

README/SECURITY: document append (and that it writes the working tree), --json, whoami, named/resource locks, task dependencies, and the new tuning env vars.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: acd49cb7b5


chatgpt-codex-connector · 2026-06-16T06:31:53Z

+            if marker == ".git" and marker_path.is_file():
+                shared = _worktree_main_root(candidate)
+                if shared is not None:
+                    return shared


Keep linked-worktree paths relative to the active checkout

When this branch returns the main worktree as repo_root, every caller that is trying to address files in the current linked worktree also starts using the main checkout as its root. In a linked worktree, agent-sync append shared.txt now writes to <main>/shared.txt, and the PreToolUse hook normalizes Claude's absolute path /linked/shared.txt to that absolute string rather than shared.txt, so it will not see a lock that was taken on the relative path. This breaks the advertised shared-DB worktree flow by bypassing file locks/writing to the wrong tree; the DB location should be decoupled from file path normalization/current worktree paths.
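
The decoupling the review asks for can be sketched as two separate roots: a shared root that decides only where the database lives, and the active checkout root that lock keys and file writes are resolved against. `lock_key` and `db_path` are hypothetical names; the `.claude/coordination/state.sqlite` location comes from the commit message above.

```python
from pathlib import Path


def db_path(shared_root: Path) -> Path:
    """The database lives under the main worktree so all worktrees share it."""
    return shared_root / ".claude" / "coordination" / "state.sqlite"


def lock_key(path: Path, checkout_root: Path) -> str:
    """Normalize a path to a key relative to the active checkout.

    Raises ValueError if `path` lies outside `checkout_root`.
    """
    absolute = path if path.is_absolute() else checkout_root / path
    return absolute.relative_to(checkout_root).as_posix()
```

With this split, the absolute path `/linked/shared.txt` and the relative path `shared.txt` produce the same key when the active checkout is `/linked`, so the hook and the CLI agree. Whether the same relative key in two different worktrees should conflict is a design decision this sketch does not settle.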


denfry and others added 8 commits June 16, 2026 09:24

chatgpt-codex-connector Bot reviewed Jun 16, 2026


denfry merged commit ac15631 into main Jun 16, 2026
5 checks passed

denfry deleted the feat/coordination-primitives branch June 16, 2026 06:40
