Skip to content

Compact receipt-backed completions#352

Draft
hardbyte wants to merge 6 commits into
codex/maintenance-prune-nowaitfrom
codex/compact-receipt-completions
Draft

Compact receipt-backed completions#352
hardbyte wants to merge 6 commits into
codex/maintenance-prune-nowaitfrom
codex/compact-receipt-completions

Conversation

@hardbyte

Copy link
Copy Markdown
Owner

Summary

This PR changes the queue-storage successful receipt completion path from one terminal row per job to compact append-only receipt completion batches. Successful short-job completions now write:

  • one explicit lease_claim_closures row per claim for durable close evidence
  • one compact receipt_completion_batches row per completion batch
  • append-only terminal count deltas
  • no done_entries rows on the successful receipt fast path

The terminal_jobs view expands compact batches through retained ready_entries, and SQL compatibility deletes write receipt_completion_tombstones instead of mutating the batch rows. Queue prune now truncates the matching receipt batch/tombstone and terminal-delta children with the ready/done children.

The PR also tightens the claim path after the first benchmark pass exposed a planner problem. The earlier head-evidence query could be planned as hashed subplans over retained receipt/ready history. This version uses short-circuited PL/pgSQL point probes for the head row, avoiding the retained-history scan while preserving the lost post-commit cursor-advance recovery path.

Validation

Local checks run:

  • cargo fmt --all -- --check
  • git diff --check
  • cargo check --workspace --all-targets
  • DATABASE_URL=postgres://postgres:test@localhost:15432/awa_test cargo test --package awa --test queue_storage_runtime_test -- --nocapture (101 passed)
  • DATABASE_URL=postgres://postgres:test@localhost:15432/awa_test cargo test --package awa --test migration_test -- --nocapture (43 passed)
  • ./correctness/run-tlc.sh storage/AwaSegmentedStorage.tla
  • ./correctness/run-tlc.sh storage/AwaSegmentedStorage.tla storage/AwaSegmentedStorageInterleavings.cfg
  • ./correctness/run-tlc.sh storage/AwaDeadTupleContract.tla
  • ./correctness/run-tlc.sh storage/AwaStorageLockOrder.tla
  • ./correctness/run-tlc.sh storage/AwaStorageTransition.tla
  • ./correctness/run-tlc.sh storage/AwaShardedPrune.tla

No shared-library files are tracked by git.

Smoke Benchmarks

Short local 5k/s offered-load smokes used the benchmarking repo against this dirty Awa checkout before commit. Postgres was the harness default postgres:18.3-alpine with the default 4 CPU container limit.

Shape Run Median completion Median enqueue Median depth Dead tuples WAL bytes Claim rows/call
64 workers, 1 claimer custom-20260614T045536Z-40edc6 2,558/s 5,003/s 126k 105 median / 146 peak 368 MB ~32
64 workers, 2 claimers custom-20260614T045924Z-1a396f 2,694/s 5,003/s 103k 110 median / 170 peak 348 MB many empty claim calls; productive calls ~31-32
128 workers, 1 claimer custom-20260614T050110Z-454f38 5,028/s 5,003/s 24k 97 median / 137 peak 512 MB ~64

The 128-worker run is the first clean local 5k/s evidence for the compact path. It used zero done_entries; successful terminal state came through receipt_completion_batches. The named residual mechanism is permit-granularity claim amortization: with one claimer, the dispatcher only claims as many worker permits as have freed when it wakes. At 64 workers that averaged ~32 jobs per claim call; at 128 workers it averaged ~64 and reached the offered load. Extra claimers helped only modestly and introduced more empty claim calls / relation-lock samples.

I also pushed a benchmark harness descriptor fix in hardbyte/postgresql-job-queue-benchmarking (f082bfc) so future runs track receipt_completion_batches_*, receipt_completion_tombstones_*, and queue_terminal_count_deltas_* dead tuples directly.

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d7dce70-496d-44ab-8be8-274493314cae

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hardbyte hardbyte force-pushed the codex/compact-receipt-completions branch from adfb235 to d435706 Compare June 14, 2026 05:17
@hardbyte

Copy link
Copy Markdown
Owner Author

Code Review — Compact receipt-backed completions

Reviewed against base codex/maintenance-prune-nowait (git diff origin/codex/maintenance-prune-nowait...HEAD, 2197 insertions / 27 files). Findings below were each adversarially verified against the actual SQL/Rust before inclusion; B1 was reproduced end-to-end on a local v036 substrate.

Verdict

Needs rework — not mergeable yet. The design is structurally sound: the terminal_jobs view expansion is column-faithful, the v036/v023 migration is additive-only, and the TLA+ lock-order/prune edits hold against the SQL. But the fast path moves the durable "this attempt finished" evidence from done_entries to receipt_completion_batches, and not every reader of the old evidence was taught about the new one — which is where the blockers live. CI is also red on the only failing check ("Telemetry OTLP validation"), for the same class of gap.


Blockers

B1 — Claim idempotency battery never consults receipt_completion_batches → duplicate execution of a completed job

claim_ready_runtime's attempt-spent probes (awa-model/migrations/v023_install_queue_storage_substrate.sql:2089-2176 head probe, :2203-2252 and :2328-2396 both selected_with_spent CTEs) check ready_tombstones, lease_claims, lease_claim_closures, leases, done_entries, deferred_jobs, dlq_entries, ready_entries at run_lease+1 — but never receipt_completion_batches (0 occurrences in the installed function via pg_get_functiondef, vs 3 for done_entries).

The successful fast path (complete_receipt_runtime_batch_fast, awa-model/src/queue_storage.rs:5459, batch insert :5582) writes only a receipt_completion_batches row plus a terminal-count delta — no done_entries, no lease_claim_closures, and it leaves the lease_claims row in place. That lease_claims row lives in the claim ring, pruned independently of the queue ring. prune_oldest_claims (:12835) treats the claim as closed via receipt_closed_evidence_sql (:489-518), which does count the batch, so it truncates lease_claims_<slot>/lease_claim_closures_<slot>. Meanwhile the queue-prune pending guard (:12290-12320) retains the ready row while the best-effort post-commit advance_claim_cursors (:3378-3406) is lost. Net: the only spent-evidence the claim guard can see is gone; the only evidence that remains (the batch) is invisible to it.

Concrete interleaving (reproduced live): claim job 9001 (run_lease 1) → fast-complete (batch only; done_rows=0 closure_rows=0) → crash before cursor advance → claim-ring prune truncates lease_claims → queue-prune skips the ready row (cursor lag) → claim_ready_runtime re-claims job 9001 at run_lease 1 (same idempotency key = same attempt, executed twice). Control (leave lease_claims in place) returns 0 rows, isolating the cause to the missing batch probe. The base branch wrote done_entries on this path, which the head-spent probe does check — so base recovered correctly and this PR regresses it.

⚠️ The existing test test_queue_storage_receipt_claim_dedupes_when_post_commit_cursor_advance_is_lost (awa/tests/queue_storage_runtime_test.rs:5245-5294) masks this — it seeds a done_entries row a real compact completion never writes, so the done_entries probe matches and the bug is hidden. It must be reworked, not relied on.

Fix: add a receipt_completion_batches EXISTS probe to all three attempt-spent sites, mirroring the receipt-batch arm already in receipt_closed_evidence_sql (:499-518) — ideally by factoring a single shared evidence helper so the claim battery and receipt_closed_evidence_sql stay in lockstep. Alternatively, gate prune_oldest_claims so a claim partition can't be truncated until the lane's claim cursor has advanced past the completed lane. Add a regression test that fast-completes, drops only lease_claims/closures (leaving the batch + ready row), and asserts claim_ready_runtime returns no rows.

B2 — Red CI: "Telemetry OTLP validation" fails deterministically

The only failing check, caused by M1 below (test_collector_death_does_not_block_job_processing times out, panic at telemetry_test.rs:682, exit 101). Must be green to merge.


Major

M1 — Telemetry test running detection misses compact-batch completions (double-counts completed jobs)

awa/tests/telemetry_test.rs:262-266 (queue_state_breakdown) and :188-192 (queue_job_count) count each lease_claims row as running unless one of five NOT EXISTS arms fires (lease_claim_closures, leases, deferred_jobs, done_entries, dlq_entries). The fast path writes none of those, so completed jobs stay running forever — and the terminal arm (correctly updated to read terminal_jobs) also counts them, so each completed job is counted twice. Reproduced: Timed out waiting for queue ... to drain; live_jobs=N, breakdown=[("running", N)].

Production closure-idempotence uses receipt_closed_evidence_sql and is unaffected (no data loss) — this is a test/CI defect, but it is the red check. Fix: add a sixth NOT EXISTS arm to both running-detection blocks excluding lease_claims rows present in receipt_completion_batches, mirroring receipt_closed_evidence_sql — better, have the test reuse that helper rather than maintain a parallel exclusion list that has now drifted.

M2 — Receipt-path terminal-count deltas use constant counter_bucket=0 while rebuild re-buckets by job_id%256 → decrements clamped away, counter stuck high

The fast path writes its +1 delta at hardcoded counter_bucket=0 (awa-model/src/queue_storage.rs:5680), and the v036 compat-delete writes its -1 at 0 (awa-model/migrations/v036_compact_receipt_completions.sql:524). But rebuild_terminal_counters (queue_storage.rs:11737-11744) re-derives each row's bucket as mod(job_id,256), and the rollup fold (:12085) does per-bucket GREATEST(0, live + delta) with INSERT gated on delta > 0. Base used mod(job_id,256) here; the PR changed it to a constant.

Interleaving: (1) fast-complete job 1000 → +1@bucket0; (2) operator runs rebuild_terminal_counters (the documented drift-recovery op) → places job at bucket232, wipes deltas; (3) compat-delete of 1000 → -1@bucket0; (4) rollup folds bucket0: UPDATE matches no live row (it's at 232), INSERT skipped (delta<0) — the -1 is truncated away. Net: counter permanently one too high, violating the documented invariant SUM(live)+SUM(deltas) == count(*) FROM terminal_jobs (:4053-4054), in the very counter rebuild exists to make billing-grade. Fix: bucket the receipt-path deltas by job_id (mod(mod(job_id,256)+256,256)::smallint) at both sites and GROUP BY accordingly.


Minor

  • correctness/storage/MAPPING.md:28 (echoed at :93) — prose claims the compact success path writes a 'completed' closure row; it doesn't (closure is derived from the batch via receipt_closed_evidence_sql). Contradicts the doc's own FastComplete row at :40. The TLA+ specs are fine — only the prose. A literal 'completed' closure row is written only on the slow fallback (complete_runtime_batch_slow_in_tx, :5843-5859).
  • awa-model/src/queue_storage.rs:219 — public QueueCounts::terminal doc says "count(*) FROM done_entries semantics"; now spans both tables. The inline invariant at :4054 already says terminal_jobs; four sibling sites were updated, this public field doc was missed.
  • awa-model/migrations/v036_compact_receipt_completions.sql:507-529 — the receipt-batch delta INSERT passes a 6th arg (p_id) in USING that's never referenced (harmless but misleading). Becomes live if M2 is fixed by bucketing on job_id — coordinate the two.
  • CHANGELOG.md [Unreleased] — no entry. This adds migration v036, two public BusyCounts fields (queue_receipt_completion_batches, queue_receipt_completion_tombstones), two metrics, and changes the meaning of direct SQL against done_entries for completed jobs. Comparable storage changes (Failed terminal rows are prunable with no retention floor; admin retry/DLQ-move silently return empty #337, Introduce partitioned queues #348) got entries; add before merge and call out the done_entries semantic change explicitly.

What's solid (verified — no need to re-litigate)

  • terminal_jobs view expansion — column-by-column faithful: done_entries UNION ALL (receipt_completion_batches expanded via CROSS JOIN LATERAL UNNEST(job_ids, run_leases) WITH ORDINALITY against retained ready_entries); state/run_lease/job_id alignment correct (v023:1356-1431).
  • Migration discipline — v036 net-new and additive-only; v023 substrate changes additive (no drop/rename/retype/tighten).
  • TLA+ fidelityAwaSegmentedStorage.FastComplete/StatefulComplete adding <<j,runLease>> to claimClosed is a faithful abstraction (the batch is durable closure evidence); AwaDeadTupleContract.CompleteReceiptsTx, the prune truncate/lock lists, DeleteCompactReceiptTerminalCompatTx, the PruneReadySegment cursor-guard change, and all AwaStorageLockOrder plan edits match the SQL.
  • receipt_closed_evidence_sql (:489-537) is the correct, complete evidence helper — it does consult the batch. The blockers/majors are precisely the readers that should have reused it but didn't.
  • Refuted, for awareness: the Python chaos test and bench/ADR-artifact docs were flagged and cleared — rescued completions route through the slow path (materialized lease → writes done_entries), so those readers are correctly unaffected. The bug surfaces specifically where the fast/receipt path is the completion route (the default for plain receipt-eligible jobs, not for deadline-rescued re-claims).

Open questions

  1. Single source of truth for closure evidence — will you factor one shared "is this attempt's terminal evidence durable?" helper used by claim_ready_runtime's attempt-spent battery, receipt_closed_evidence_sql, and the telemetry test SQL? Three hand-maintained copies already diverged (B1, M1). Any reason the claim battery can't reuse receipt_closed_evidence_sql directly?
  2. Prune ordering invariant — is there meant to be an invariant that claim-ring prune must not run before the queue cursor has advanced past a completed lane? It isn't enforced today; if not intended, B1's interleaving is reachable by design.
  3. Bucketing rationale (M2) — was the constant counter_bucket=0 deliberate (collapse contention on one bucket) or a regression from base's mod(job_id,256)? If deliberate, how should rebuild_terminal_counters reconcile, given it re-buckets by job_id?
  4. Draft scope — are the telemetry-test fix and CHANGELOG entry intentionally deferred to ready-for-review, or missed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant