feat: coordination primitives — blocking locks, append, JSON, task DAG, worktree DB fix #5

Merged
denfry merged 8 commits into main from feat/coordination-primitives on Jun 16, 2026

@denfry (Owner) commented Jun 16, 2026
Context

A real multi-agent Claude Code test session surfaced gaps where agent-sync forced agents to improvise instead of giving them a primitive or a documented protocol. The most acute were the two headline scenarios, "N agents → one file" and "wait for a busy lock", which had neither a primitive nor a protocol. This PR adds the missing primitives, fixes one real bug, and makes SKILL.md prescribe what to do on conflict.

What's in it

Tier 1 — the things the test hit directly

  • lock <file> --wait[=SEC] — the CLI subprocess blocks/polls until the lock frees (or the deadline passes → still exit 2, fail-closed preserved). One blocking call; no agent-side busy-retry.
  • agent-sync append <file> — the missing N→1 primitive: atomic lock→append→unlock (body from --content or stdin), honoring --wait.
  • --json for status / locks / inbox / tasks — structured state so agents decide from data, not by parsing prose.
  • SKILL.md rewritten with a concrete conflict protocol and an explicit subagent-identity warning (give each parallel subagent a distinct AGENT_SYNC_ID).
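
The lock → append → unlock sequence behind `append` can be sketched roughly as follows. This is an illustration under assumptions, not agent-sync's implementation: a sidecar lock file created with `O_EXCL` stands in for the SQLite lock table, and `append_line` with its `wait`/`poll` parameters is a hypothetical name.

```python
import os
import time
from pathlib import Path


class LockConflict(Exception):
    """Lock still held when the deadline passed (the CLI maps this to exit 2)."""


def append_line(path: Path, body: str, wait: float = 0.0, poll: float = 0.05) -> None:
    """Append one line to `path` while holding an exclusive sidecar lock."""
    lock_path = path.with_name(path.name + ".lock")  # hypothetical lock location
    deadline = time.monotonic() + wait
    while True:
        try:
            # O_EXCL makes creation fail if another writer already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockConflict(f"{path} is locked")  # fail closed
            time.sleep(poll)
    try:
        os.close(fd)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(body if body.endswith("\n") else body + "\n")
    finally:
        os.unlink(lock_path)  # always release, even if the write failed
```

With `wait=0` the call fails immediately on a busy lock; a positive `wait` turns it into one blocking call, which is the behaviour the bullets above describe for `--wait`.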

Worktree DB fix (real bug) — a linked worktree's .git is a file, so each worktree got its own state.sqlite and agents couldn't see each other. repo_root now resolves a worktree to its main worktree via git rev-parse --git-common-dir, so all worktrees share one DB.

Quick wins: log "msg" takes the message positionally (--message still works); gc runs automatically on SessionStart; claim-task/claim-next --lock auto-locks a task's files.

Larger design — named/resource locks (lock --resource KEY); task dependency DAG (--depends-on, dependency-aware claim-next, --force, auto-unblock on completion); message --reply-to threading and ack.

From dogfooding the skill: agent-sync whoami (shows resolved id + source); configurable staleness via AGENT_SYNC_STALE_MINUTES / AGENT_SYNC_OFFLINE_MINUTES; SKILL.md TL;DR loop, heartbeat/liveness section, and append-vs-lock guidance.

Infra — additive ALTER TABLE migration layer upgrades existing DBs in place with no data loss; new task_deps table; model fields + as_dict() serializers.

Verification

  • 159 tests pass (40 new), ruff check clean, scripts/dev-smoke-test.py passes.
  • Live test: three real Claude Code subagents (distinct AGENT_SYNC_IDs) raced concurrently → 15 intact ledger lines via append (no torn writes), correct dependency-aware task distribution, run-tests refused while blocked then auto-unblocked, reply-threading + ack, and lock --wait fail-closed.
  • A second live agent, given only a goal + SKILL.md, independently chose whoami → claim-next --lock → append --wait → complete-task, confirming the docs guide agents to the new primitives.

Note: CHANGELOG.md is intentionally not edited — it is generated automatically from these Conventional Commits by the release workflow.

🤖 Generated with Claude Code

denfry and others added 8 commits June 16, 2026 09:24
… source

Introduce an in-place ALTER TABLE migration step (`_migrate`/`_ensure_column`)
so an existing database upgrades itself with no data loss and no version table,
and add the `task_deps` table for task ordering.

Also add building blocks used by the new CLI surface:
- `Lock.kind`, `Message.reply_to`/`acked_at`, and `as_dict()` on the dataclasses.
- `identity_source()` so `whoami` can report how the agent id was resolved.
- `stale_after()`/`offline_after()` reading `AGENT_SYNC_STALE_MINUTES` /
  `AGENT_SYNC_OFFLINE_MINUTES`, defaulting to the existing 15/120 minutes.
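
For illustration, the two threshold readers could be written as below. The environment variable names and the 15/120 defaults are from the commit message; the helper `_minutes_from_env` and the choice to fall back to the default on an invalid or non-positive value are assumptions.

```python
import os
from datetime import timedelta


def _minutes_from_env(name: str, default: int) -> timedelta:
    """Read a positive whole number of minutes from the environment."""
    try:
        value = int(os.environ.get(name, ""))
    except ValueError:
        return timedelta(minutes=default)  # unset or not a number (assumed policy)
    return timedelta(minutes=value if value > 0 else default)


def stale_after() -> timedelta:
    return _minutes_from_env("AGENT_SYNC_STALE_MINUTES", 15)


def offline_after() -> timedelta:
    return _minutes_from_env("AGENT_SYNC_OFFLINE_MINUTES", 120)
```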

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`acquire_lock_blocking` polls inside the CLI process until a busy lock frees
(holder unlocks, goes stale, or the TTL expires) or a deadline passes, in which
case it still raises LockConflict (exit 2) so the fail-closed contract holds.
This lets an agent wait with a single blocking call instead of busy-retrying.

`acquire_lock` now also takes a `kind`, so a lock can key an arbitrary named
resource (e.g. db-migrations) rather than a file path; both share the locks
table and resource locks never interfere with file-edit checks.
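
A rough sketch of the polling loop and the `kind` key, using an in-memory dict in place of the locks table. The names `acquire_lock`, `acquire_lock_blocking`, `LockConflict`, and `kind` appear in the commit message; the signatures, the `Lock` fields, and the `poll` interval are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Dict, Tuple


class LockConflict(Exception):
    """The lock is held by another agent (the CLI maps this to exit 2)."""


@dataclass
class Lock:
    key: str
    owner: str
    expires_at: float
    kind: str = "file"


Table = Dict[Tuple[str, str], Lock]


def acquire_lock(table: Table, key: str, owner: str, ttl: float, kind: str = "file") -> Lock:
    """Take the lock unless another owner holds an unexpired one."""
    now = time.monotonic()
    held = table.get((kind, key))
    if held is not None and held.owner != owner and held.expires_at > now:
        raise LockConflict(f"{kind}:{key} is held by {held.owner}")
    table[(kind, key)] = Lock(key, owner, now + ttl, kind)
    return table[(kind, key)]


def acquire_lock_blocking(table: Table, key: str, owner: str, ttl: float,
                          wait: float, poll: float = 0.05, kind: str = "file") -> Lock:
    """Poll until the lock frees or `wait` seconds pass, then fail closed."""
    deadline = time.monotonic() + wait
    while True:
        try:
            return acquire_lock(table, key, owner, ttl, kind)
        except LockConflict:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise  # deadline passed: the conflict still propagates
            time.sleep(min(poll, remaining))
```

Keying the table on `(kind, key)` is one way to keep a resource lock named `db-migrations` from colliding with a file lock on a path of the same name.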

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tasks can declare `depends_on` edges. "Blocked by a dependency" is computed
from the dependency's status, so `claim-next` skips a blocked task, `claim_task`
refuses it unless forced, and completing a dependency unblocks its dependents
with no extra write. `dependents_unblocked_by` surfaces what just became
claimable.
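
The "computed, not stored" blocking rule can be sketched as follows. The names `claim_task`, `claim_next`, and `dependents_unblocked_by` are from the commit message; the in-memory `Task` model, the status strings, `TaskBlocked`, and `complete_task` are assumptions for the sake of a runnable example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class TaskBlocked(Exception):
    """The task has a dependency that is not done yet."""


@dataclass
class Task:
    id: str
    status: str = "open"  # assumed states: open, claimed, done
    depends_on: List[str] = field(default_factory=list)
    owner: Optional[str] = None


def is_blocked(task: Task, tasks: Dict[str, Task]) -> bool:
    # Derived from dependency status on every call; nothing is stored.
    return any(tasks[dep].status != "done" for dep in task.depends_on)


def claim_task(tasks: Dict[str, Task], task_id: str, agent: str, force: bool = False) -> Task:
    task = tasks[task_id]
    if task.status != "open":
        raise ValueError(f"{task_id} is not open")
    if is_blocked(task, tasks) and not force:
        raise TaskBlocked(f"{task_id} is blocked by a dependency")
    task.status, task.owner = "claimed", agent
    return task


def claim_next(tasks: Dict[str, Task], agent: str) -> Optional[Task]:
    for task in tasks.values():
        if task.status == "open" and not is_blocked(task, tasks):
            return claim_task(tasks, task.id, agent)
    return None  # everything left is claimed, done, or blocked


def dependents_unblocked_by(tasks: Dict[str, Task], done_id: str) -> List[str]:
    return [t.id for t in tasks.values()
            if done_id in t.depends_on and t.status == "open" and not is_blocked(t, tasks)]


def complete_task(tasks: Dict[str, Task], task_id: str) -> List[str]:
    tasks[task_id].status = "done"  # the only write; dependents unblock implicitly
    return dependents_unblocked_by(tasks, task_id)
```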

`lock_task_files` best-effort locks a claimed task's files (normalized to the
form the PreToolUse hook checks), warning on conflicts instead of failing the
claim — closing the window between owning a task and owning its files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`send_message` accepts a `reply_to` parent (validated to exist) to thread a
reply, and `ack_message` records an `acked_at` so a sender can confirm a message
was handled — distinct from `read_at`, which only marks it seen.
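
A minimal sketch of threading and acknowledgement over SQLite. `send_message`, `ack_message`, `reply_to`, `acked_at`, and `read_at` are named in the commit message; the table layout, the signatures, and the `open_db` helper are assumptions.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory DB with an assumed messages schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT,"
        " body TEXT, reply_to INTEGER, read_at TEXT, acked_at TEXT)"
    )
    return conn


def send_message(conn: sqlite3.Connection, sender: str, recipient: str, body: str,
                 reply_to: Optional[int] = None) -> int:
    if reply_to is not None:
        parent = conn.execute("SELECT 1 FROM messages WHERE id = ?", (reply_to,)).fetchone()
        if parent is None:
            raise ValueError(f"reply_to {reply_to} does not exist")
    cur = conn.execute(
        "INSERT INTO messages (sender, recipient, body, reply_to) VALUES (?, ?, ?, ?)",
        (sender, recipient, body, reply_to),
    )
    return cur.lastrowid


def ack_message(conn: sqlite3.Connection, message_id: int) -> bool:
    """Record that the message was handled; `read_at` is left untouched."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE messages SET acked_at = ? WHERE id = ? AND acked_at IS NULL",
        (now, message_id),
    )
    return cur.rowcount == 1
```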

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A linked worktree's `.git` is a file, so `repo_root` resolved each worktree to
its own `.claude/coordination/state.sqlite` and agents on different worktrees
could not see each other — breaking the worktree workflow the skill recommends.
Resolve a worktree to its main worktree via `git rev-parse --git-common-dir`
(falling back to the old behaviour when git is unavailable) so all worktrees of
one repo share a single database. `AGENT_SYNC_ROOT` still overrides everything.
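
The resolution step might be sketched like this. `git rev-parse --git-common-dir` is the command named above; the function name `worktree_main_root`, the timeout, and the rule of returning `None` when the common dir is not named `.git` are assumptions.

```python
import subprocess
from pathlib import Path
from typing import Optional


def worktree_main_root(candidate: Path) -> Optional[Path]:
    """Return the main worktree root for `candidate`, or None to fall back."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--git-common-dir"],
            cwd=candidate, capture_output=True, text=True, check=True, timeout=5,
        ).stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return None  # git missing, not a repo, or the call timed out
    if not out:
        return None
    common = Path(out)
    if not common.is_absolute():
        common = candidate / common  # git may print a path relative to cwd
    common = common.resolve()
    # For a normal repo the common dir is <main>/.git; anything else is left alone.
    return common.parent if common.name == ".git" else None
```

Returning `None` on any failure lets the caller keep its previous marker-based behaviour, which matches the fallback the commit message describes.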

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Run `gc_agents`/`gc_locks` at the start of every session (inside the existing
fail-open guard) so a crashed agent's expired locks never block the next
session until their TTL, removing the need to remember a manual `gc`.
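
A fail-open guard around the cleanup steps could look like the sketch below. `gc_agents` and `gc_locks` are the functions named above; `run_fail_open` and the idea of collecting error strings are assumptions, since the existing guard is not shown in this PR.

```python
from typing import Callable, Iterable, List


def run_fail_open(steps: Iterable[Callable[[], object]]) -> List[str]:
    """Run each cleanup step; record failures instead of aborting the session."""
    errors: List[str] = []
    for step in steps:
        try:
            step()
        except Exception as exc:  # fail open: cleanup must never block startup
            errors.append(f"{getattr(step, '__name__', 'step')}: {exc}")
    return errors
```

A session-start hook would then call something like `run_fail_open([gc_agents, gc_locks])` and log whatever comes back.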

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surface the new coordination primitives through the CLI:
- `append <file>` — atomic lock -> append -> unlock for files many agents write
  (body from --content or stdin), honoring --wait.
- `lock --wait[=SEC]` and `lock/unlock --resource KEY`.
- `--json` for status/locks/inbox/tasks (plus render helpers) so agents decide
  from structure, not prose.
- `whoami [--json]` to report the resolved agent id and its source.
- `claim-task/claim-next --lock`, `claim-task --force`, `create-task --depends-on`,
  `send --reply-to`, and a new `ack` command.
- `log` message is now positional (the `--message` flag still works).
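
To illustrate the "structure, not prose" point, a render helper for `locks` might branch on a JSON flag as below. `as_dict()` is named in the PR; the `Lock` fields, `render_locks`, and the exact output shapes are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Lock:
    path: str
    owner: str
    kind: str = "file"

    def as_dict(self) -> Dict[str, str]:
        return asdict(self)


def render_locks(locks: List[Lock], as_json: bool) -> str:
    """Return machine-readable JSON or a human-readable listing."""
    if as_json:
        return json.dumps([lock.as_dict() for lock in locks], sort_keys=True)
    lines = [f"{lock.path} held by {lock.owner} ({lock.kind})" for lock in locks]
    return "\n".join(lines) or "no locks"
```

An agent can then check `owner` in the parsed JSON rather than pattern-matching the human-readable line.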

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SKILL.md: add a TL;DR loop, a concrete lock-conflict protocol (wait -> message
-> other work, covering append too), a liveness/heartbeat section, the
file-lock vs append distinction, subagent-identity guidance with whoami, and
reply/ack usage. README/SECURITY: document append (and that it writes the
working tree), --json, whoami, named/resource locks, task dependencies, and the
new tuning env vars.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: acd49cb7b5

Comment thread src/agent_sync/paths.py
Comment on lines +86 to +89
if marker == ".git" and marker_path.is_file():
    shared = _worktree_main_root(candidate)
    if shared is not None:
        return shared

P1: Keep linked-worktree paths relative to the active checkout

When this branch returns the main worktree as repo_root, every caller that is trying to address files in the current linked worktree also starts using the main checkout as its root. In a linked worktree, agent-sync append shared.txt now writes to <main>/shared.txt, and the PreToolUse hook normalizes Claude's absolute path /linked/shared.txt to that absolute string rather than shared.txt, so it will not see a lock that was taken on the relative path. This breaks the advertised shared-DB worktree flow by bypassing file locks/writing to the wrong tree; the DB location should be decoupled from file path normalization/current worktree paths.
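
One possible reading of that suggestion, sketched under assumptions: the database location is derived from the shared (main) root, while lock keys are normalized against the checkout the agent is actually working in. `state_db_path` and `lock_key` are hypothetical names; only the `.claude/coordination/state.sqlite` location comes from the PR.

```python
from pathlib import Path
from typing import Union


def state_db_path(shared_root: Path) -> Path:
    """The single database lives under the main worktree."""
    return shared_root / ".claude" / "coordination" / "state.sqlite"


def lock_key(path: Union[str, Path], checkout_root: Path) -> str:
    """Normalize a path against the active checkout, not the DB root."""
    root = checkout_root.resolve()
    target = Path(path)
    if not target.is_absolute():
        target = root / target
    target = target.resolve()
    try:
        return target.relative_to(root).as_posix()
    except ValueError:
        return target.as_posix()  # outside the checkout: keep it absolute
```

Under this scheme a relative `shared.txt` and the absolute path to the same file inside the linked worktree normalize to the same key, so the hook and the CLI would agree on what is locked.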


@denfry denfry merged commit ac15631 into main Jun 16, 2026
5 checks passed
@denfry denfry deleted the feat/coordination-primitives branch June 16, 2026 06:40