Compact receipt-backed completions by hardbyte · Pull Request #352 · hardbyte/awa

hardbyte · 2026-06-14T05:08:29Z

Summary

This PR changes the queue-storage successful receipt completion path from one terminal row per job to compact append-only receipt completion batches. Successful short-job completions now write:

one explicit lease_claim_closures row per claim for durable close evidence
one compact receipt_completion_batches row per completion batch
append-only terminal count deltas
no done_entries rows on the successful receipt fast path

The terminal_jobs view expands compact batches through retained ready_entries, and SQL compatibility deletes write receipt_completion_tombstones instead of mutating the batch rows. Queue prune now truncates the matching receipt batch/tombstone and terminal-delta children with the ready/done children.

The PR also tightens the claim path after the first benchmark pass exposed a planner problem. The earlier head-evidence query could be planned as hashed subplans over retained receipt/ready history. This version uses short-circuited PL/pgSQL point probes for the head row, avoiding the retained-history scan while preserving the lost post-commit cursor-advance recovery path.

Validation

Local checks run:

cargo fmt --all -- --check
git diff --check
cargo check --workspace --all-targets
DATABASE_URL=postgres://postgres:test@localhost:15432/awa_test cargo test --package awa --test queue_storage_runtime_test -- --nocapture (101 passed)
DATABASE_URL=postgres://postgres:test@localhost:15432/awa_test cargo test --package awa --test migration_test -- --nocapture (43 passed)
./correctness/run-tlc.sh storage/AwaSegmentedStorage.tla
./correctness/run-tlc.sh storage/AwaSegmentedStorage.tla storage/AwaSegmentedStorageInterleavings.cfg
./correctness/run-tlc.sh storage/AwaDeadTupleContract.tla
./correctness/run-tlc.sh storage/AwaStorageLockOrder.tla
./correctness/run-tlc.sh storage/AwaStorageTransition.tla
./correctness/run-tlc.sh storage/AwaShardedPrune.tla

No shared-library files are tracked by git.

Smoke Benchmarks

Short local 5k/s offered-load smokes used the benchmarking repo against this dirty Awa checkout before commit. Postgres was the harness default postgres:18.3-alpine with the default 4 CPU container limit.

Shape	Run	Median completion	Median enqueue	Median depth	Dead tuples	WAL bytes	Claim rows/call
64 workers, 1 claimer	`custom-20260614T045536Z-40edc6`	2,558/s	5,003/s	126k	105 median / 146 peak	368 MB	~32
64 workers, 2 claimers	`custom-20260614T045924Z-1a396f`	2,694/s	5,003/s	103k	110 median / 170 peak	348 MB	many empty claim calls; productive calls ~31-32
128 workers, 1 claimer	`custom-20260614T050110Z-454f38`	5,028/s	5,003/s	24k	97 median / 137 peak	512 MB	~64

The 128-worker run is the first clean local 5k/s evidence for the compact path. It used zero done_entries; successful terminal state came through receipt_completion_batches. The named residual mechanism is permit-granularity claim amortization: with one claimer, the dispatcher only claims as many worker permits as have freed when it wakes. At 64 workers that averaged ~32 jobs per claim call; at 128 workers it averaged ~64 and reached the offered load. Extra claimers helped only modestly and introduced more empty claim calls / relation-lock samples.

I also pushed a benchmark harness descriptor fix in hardbyte/postgresql-job-queue-benchmarking (f082bfc) so future runs track receipt_completion_batches_*, receipt_completion_tombstones_*, and queue_terminal_count_deltas_* dead tuples directly.

coderabbitai · 2026-06-14T05:08:36Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d7dce70-496d-44ab-8be8-274493314cae

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

hardbyte · 2026-06-14T07:39:45Z

Code Review — Compact receipt-backed completions

Reviewed against base codex/maintenance-prune-nowait (git diff origin/codex/maintenance-prune-nowait...HEAD, 2197 insertions / 27 files). Findings below were each adversarially verified against the actual SQL/Rust before inclusion; B1 was reproduced end-to-end on a local v036 substrate.

Verdict

Needs rework — not mergeable yet. The design is structurally sound: the terminal_jobs view expansion is column-faithful, the v036/v023 migration is additive-only, and the TLA+ lock-order/prune edits hold against the SQL. But the fast path moves the durable "this attempt finished" evidence from done_entries to receipt_completion_batches, and not every reader of the old evidence was taught about the new one — which is where the blockers live. CI is also red on the only failing check ("Telemetry OTLP validation"), for the same class of gap.

Blockers

B1 — Claim idempotency battery never consults `receipt_completion_batches` → duplicate execution of a completed job

claim_ready_runtime's attempt-spent probes (awa-model/migrations/v023_install_queue_storage_substrate.sql:2089-2176 head probe, :2203-2252 and :2328-2396 both selected_with_spent CTEs) check ready_tombstones, lease_claims, lease_claim_closures, leases, done_entries, deferred_jobs, dlq_entries, ready_entries at run_lease+1 — but never receipt_completion_batches (0 occurrences in the installed function via pg_get_functiondef, vs 3 for done_entries).

The successful fast path (complete_receipt_runtime_batch_fast, awa-model/src/queue_storage.rs:5459, batch insert :5582) writes only a receipt_completion_batches row plus a terminal-count delta — no done_entries, no lease_claim_closures, and it leaves the lease_claims row in place. That lease_claims row lives in the claim ring, pruned independently of the queue ring. prune_oldest_claims (:12835) treats the claim as closed via receipt_closed_evidence_sql (:489-518), which does count the batch, so it truncates lease_claims_<slot>/lease_claim_closures_<slot>. Meanwhile the queue-prune pending guard (:12290-12320) retains the ready row while the best-effort post-commit advance_claim_cursors (:3378-3406) is lost. Net: the only spent-evidence the claim guard can see is gone; the only evidence that remains (the batch) is invisible to it.

Concrete interleaving (reproduced live): claim job 9001 (run_lease 1) → fast-complete (batch only; done_rows=0 closure_rows=0) → crash before cursor advance → claim-ring prune truncates lease_claims → queue-prune skips the ready row (cursor lag) → claim_ready_runtime re-claims job 9001 at run_lease 1 (same idempotency key = same attempt, executed twice). Control (leave lease_claims in place) returns 0 rows, isolating the cause to the missing batch probe. The base branch wrote done_entries on this path, which the head-spent probe does check — so base recovered correctly and this PR regresses it.

⚠️ The existing test test_queue_storage_receipt_claim_dedupes_when_post_commit_cursor_advance_is_lost (awa/tests/queue_storage_runtime_test.rs:5245-5294) masks this — it seeds a done_entries row a real compact completion never writes, so the done_entries probe matches and the bug is hidden. It must be reworked, not relied on.

Fix: add a receipt_completion_batches EXISTS probe to all three attempt-spent sites, mirroring the receipt-batch arm already in receipt_closed_evidence_sql (:499-518) — ideally by factoring a single shared evidence helper so the claim battery and receipt_closed_evidence_sql stay in lockstep. Alternatively, gate prune_oldest_claims so a claim partition can't be truncated until the lane's claim cursor has advanced past the completed lane. Add a regression test that fast-completes, drops only lease_claims/closures (leaving the batch + ready row), and asserts claim_ready_runtime returns no rows.

B2 — Red CI: "Telemetry OTLP validation" fails deterministically

The only failing check, caused by M1 below (test_collector_death_does_not_block_job_processing times out, panic at telemetry_test.rs:682, exit 101). Must be green to merge.

Major

M1 — Telemetry test `running` detection misses compact-batch completions (double-counts completed jobs)

awa/tests/telemetry_test.rs:262-266 (queue_state_breakdown) and :188-192 (queue_job_count) count each lease_claims row as running unless one of five NOT EXISTS arms fires (lease_claim_closures, leases, deferred_jobs, done_entries, dlq_entries). The fast path writes none of those, so completed jobs stay running forever — and the terminal arm (correctly updated to read terminal_jobs) also counts them, so each completed job is counted twice. Reproduced: Timed out waiting for queue ... to drain; live_jobs=N, breakdown=[("running", N)].

Production closure-idempotence uses receipt_closed_evidence_sql and is unaffected (no data loss) — this is a test/CI defect, but it is the red check. Fix: add a sixth NOT EXISTS arm to both running-detection blocks excluding lease_claims rows present in receipt_completion_batches, mirroring receipt_closed_evidence_sql — better, have the test reuse that helper rather than maintain a parallel exclusion list that has now drifted.

M2 — Receipt-path terminal-count deltas use constant `counter_bucket=0` while rebuild re-buckets by `job_id%256` → decrements clamped away, counter stuck high

The fast path writes its +1 delta at hardcoded counter_bucket=0 (awa-model/src/queue_storage.rs:5680), and the v036 compat-delete writes its -1 at 0 (awa-model/migrations/v036_compact_receipt_completions.sql:524). But rebuild_terminal_counters (queue_storage.rs:11737-11744) re-derives each row's bucket as mod(job_id,256), and the rollup fold (:12085) does per-bucket GREATEST(0, live + delta) with INSERT gated on delta > 0. Base used mod(job_id,256) here; the PR changed it to a constant.

Interleaving: (1) fast-complete job 1000 → +1@bucket0; (2) operator runs rebuild_terminal_counters (the documented drift-recovery op) → places job at bucket232, wipes deltas; (3) compat-delete of 1000 → -1@bucket0; (4) rollup folds bucket0: UPDATE matches no live row (it's at 232), INSERT skipped (delta<0) — the -1 is truncated away. Net: counter permanently one too high, violating the documented invariant SUM(live)+SUM(deltas) == count(*) FROM terminal_jobs (:4053-4054), in the very counter rebuild exists to make billing-grade. Fix: bucket the receipt-path deltas by job_id (mod(mod(job_id,256)+256,256)::smallint) at both sites and GROUP BY accordingly.

Minor

correctness/storage/MAPPING.md:28 (echoed at :93) — prose claims the compact success path writes a 'completed' closure row; it doesn't (closure is derived from the batch via receipt_closed_evidence_sql). Contradicts the doc's own FastComplete row at :40. The TLA+ specs are fine — only the prose. A literal 'completed' closure row is written only on the slow fallback (complete_runtime_batch_slow_in_tx, :5843-5859).
awa-model/src/queue_storage.rs:219 — public QueueCounts::terminal doc says "count(*) FROM done_entries semantics"; now spans both tables. The inline invariant at :4054 already says terminal_jobs; four sibling sites were updated, this public field doc was missed.
awa-model/migrations/v036_compact_receipt_completions.sql:507-529 — the receipt-batch delta INSERT passes a 6th arg (p_id) in USING that's never referenced (harmless but misleading). Becomes live if M2 is fixed by bucketing on job_id — coordinate the two.
CHANGELOG.md [Unreleased] — no entry. This adds migration v036, two public BusyCounts fields (queue_receipt_completion_batches, queue_receipt_completion_tombstones), two metrics, and changes the meaning of direct SQL against done_entries for completed jobs. Comparable storage changes (Failed terminal rows are prunable with no retention floor; admin retry/DLQ-move silently return empty #337, Introduce partitioned queues #348) got entries; add before merge and call out the done_entries semantic change explicitly.

What's solid (verified — no need to re-litigate)

terminal_jobs view expansion — column-by-column faithful: done_entries UNION ALL (receipt_completion_batches expanded via CROSS JOIN LATERAL UNNEST(job_ids, run_leases) WITH ORDINALITY against retained ready_entries); state/run_lease/job_id alignment correct (v023:1356-1431).
Migration discipline — v036 net-new and additive-only; v023 substrate changes additive (no drop/rename/retype/tighten).
TLA+ fidelity — AwaSegmentedStorage.FastComplete/StatefulComplete adding <<j,runLease>> to claimClosed is a faithful abstraction (the batch is durable closure evidence); AwaDeadTupleContract.CompleteReceiptsTx, the prune truncate/lock lists, DeleteCompactReceiptTerminalCompatTx, the PruneReadySegment cursor-guard change, and all AwaStorageLockOrder plan edits match the SQL.
receipt_closed_evidence_sql (:489-537) is the correct, complete evidence helper — it does consult the batch. The blockers/majors are precisely the readers that should have reused it but didn't.
Refuted, for awareness: the Python chaos test and bench/ADR-artifact docs were flagged and cleared — rescued completions route through the slow path (materialized lease → writes done_entries), so those readers are correctly unaffected. The bug surfaces specifically where the fast/receipt path is the completion route (the default for plain receipt-eligible jobs, not for deadline-rescued re-claims).

Open questions

Single source of truth for closure evidence — will you factor one shared "is this attempt's terminal evidence durable?" helper used by claim_ready_runtime's attempt-spent battery, receipt_closed_evidence_sql, and the telemetry test SQL? Three hand-maintained copies already diverged (B1, M1). Any reason the claim battery can't reuse receipt_closed_evidence_sql directly?
Prune ordering invariant — is there meant to be an invariant that claim-ring prune must not run before the queue cursor has advanced past a completed lane? It isn't enforced today; if not intended, B1's interleaving is reachable by design.
Bucketing rationale (M2) — was the constant counter_bucket=0 deliberate (collapse contention on one bucket) or a regression from base's mod(job_id,256)? If deliberate, how should rebuild_terminal_counters reconcile, given it re-buckets by job_id?
Draft scope — are the telemetry-test fix and CHANGELOG entry intentionally deferred to ready-for-review, or missed?

Compact receipt-backed completions

d435706

hardbyte force-pushed the codex/compact-receipt-completions branch from adfb235 to d435706 Compare June 14, 2026 05:17

hardbyte added 3 commits June 14, 2026 18:07

Keep aged receipt completions compact

df248b2

Reduce compact completion ledger amplification

fe8985d

Use compact receipt batches as closure evidence

bbf59f7

hardbyte added 2 commits June 14, 2026 20:01

Tighten compact receipt completion evidence

a9aa6b9

Narrow receipt rescue scan windows

1887923

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compact receipt-backed completions#352

Compact receipt-backed completions#352
hardbyte wants to merge 6 commits into
codex/maintenance-prune-nowaitfrom
codex/compact-receipt-completions

hardbyte commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

Review skipped

Uh oh!

hardbyte commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hardbyte commented Jun 14, 2026

Summary

Validation

Smoke Benchmarks

Uh oh!

coderabbitai Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

hardbyte commented Jun 14, 2026

Code Review — Compact receipt-backed completions

Verdict

Blockers

B1 — Claim idempotency battery never consults receipt_completion_batches → duplicate execution of a completed job

B2 — Red CI: "Telemetry OTLP validation" fails deterministically

Major

M1 — Telemetry test running detection misses compact-batch completions (double-counts completed jobs)

M2 — Receipt-path terminal-count deltas use constant counter_bucket=0 while rebuild re-buckets by job_id%256 → decrements clamped away, counter stuck high

Minor

What's solid (verified — no need to re-litigate)

Open questions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

B1 — Claim idempotency battery never consults `receipt_completion_batches` → duplicate execution of a completed job

M1 — Telemetry test `running` detection misses compact-batch completions (double-counts completed jobs)

M2 — Receipt-path terminal-count deltas use constant `counter_bucket=0` while rebuild re-buckets by `job_id%256` → decrements clamped away, counter stuck high