Pr poc parallel workload antithesis #36536

Draft
def- wants to merge 37 commits into MaterializeInc:main from def-:pr-poc-parallel-workload-antithesis

Contributor

@def- def- commented May 13, 2026

Forked from #36536

mitchwagner-antithesis and others added 30 commits May 11, 2026 13:25
…older .env

mzbuild's _build_locked runs `git clean -ffdX <image_path>` before each
build, which wipes any gitignored file in the build context — including
the .env we generate. Two fixes:

1. publish:false on antithesis-config so the standard ci.test.build flow
   skips it entirely on regular nightly builds (where .env never exists).
   Only build-antithesis.sh / push-antithesis.py builds this image, and
   they write .env first.

2. Commit a placeholder .env so the file is tracked (survives git clean)
   and participates in mzbuild's fingerprint computation. build-antithesis.sh
   overwrites it with real registry refs before the build runs; the
   fingerprint then reflects the overwritten content of each build.
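
As a rough illustration of why the tracked placeholder matters for fingerprinting, here is a minimal sketch of a content-based fingerprint. It is not mzbuild's actual algorithm; the `fingerprint` function, the file names, and the `.env` contents are assumptions made up for the example.

```python
import hashlib
from typing import Dict


def fingerprint(files: Dict[str, bytes]) -> str:
    """Hash paths and contents in sorted path order (illustrative only)."""
    digest = hashlib.sha1()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


# Hypothetical build context: the placeholder .env as committed.
placeholder = {"Dockerfile": b"FROM scratch\n", ".env": b"# placeholder\n"}

# The same context after a build script overwrites .env with real refs.
overwritten = {**placeholder, ".env": b"IMAGE=registry.example/app:abc123\n"}

# Different .env content yields a different fingerprint.
changed = fingerprint(placeholder) != fingerprint(overwritten)
```

Because the hash covers file contents, overwriting `.env` before the build changes the fingerprint, which matches the behavior the commit message describes.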
Add 16 Antithesis properties for Kafka source ingestion (NONE + UPSERT
envelopes) to the scratchbook, plus the workload-side implementation of
upsert-key-reflects-latest-value.

Scratchbook additions:
  - sut-analysis Appendix A: kafka source pipeline detail
  - existing-assertions: enumerated SUT-side panic/assert sites that are
    candidates for Antithesis SDK instrumentation
  - property-catalog Category 7: 16 new Kafka/UPSERT properties
  - property-relationships clusters 7-10 plus cross-cluster connections
  - 16 per-property evidence files
  - evaluation/synthesis.md: four-lens review

Workload:
  - parallel_driver_upsert_latest_value.py: produces upserts+tombstones
    with deterministic randomness, requests a quiet period, polls
    mz_source_statistics for catchup, and asserts per-key value match
    (two always() assertions + one sometimes() liveness anchor).
  - helper_pg / helper_kafka / helper_quiet / helper_random /
    helper_source_stats / helper_upsert_source: shared utilities for
    subsequent Kafka source properties.
… catalog-recovery-consistency workload driver
…imeouts; remove dead upsert.rs (classic) antithesis asserts
…are RocksDB lock

When I added clusterd2 in 4366c9e, both clusterds inherited the
DEFAULT_MZ_VOLUMES list, which uses a single named volume scratch:/scratch.
Docker named volumes are shared across containers by name, so the two
clusterds mounted the same /scratch and contended for RocksDB locks at
/scratch/storage/upsert/<id>/<worker>/LOCK.

This wedged clusterd1: it could never open its upsert RocksDB
("Resource temporarily unavailable" on the LOCK file), entered
Stalled health with "Failed to rehydrate state", broadcast
suspend-and-restart, and looped retry-fail-suspend-restart for the
entire run. The continuous restart loop exercised the upsert
feedback-driven snapshot replay path in ways that produced visibly
wrong durable state for the source, which is exactly what the
upsert-state-rehydrates-correctly assertions caught in the
2026-05-12 05:39 UTC Antithesis report.

Fix: give each clusterd its own per-instance named volume for /scratch.
The other volumes stay shared because they don't take exclusive locks.
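
A minimal sketch of the per-instance volume idea, under stated assumptions: `DEFAULT_VOLUMES` is a stand-in (the source only says the real list contains `scratch:/scratch`), and both `volumes_for` and the `scratch-<instance>` naming scheme are hypothetical.

```python
from typing import List

# Assumed stand-in for the default list; only "scratch:/scratch" is from the source.
DEFAULT_VOLUMES = ["scratch:/scratch", "shared-data:/data"]


def volumes_for(instance: str, defaults: List[str]) -> List[str]:
    """Give the /scratch mount a per-instance volume name; keep the rest shared."""
    result = []
    for spec in defaults:
        name, _, rest = spec.partition(":")
        if rest.split(":")[0] == "/scratch":
            result.append(f"{name}-{instance}:{rest}")
        else:
            result.append(spec)
    return result


clusterd1_volumes = volumes_for("clusterd1", DEFAULT_VOLUMES)
clusterd2_volumes = volumes_for("clusterd2", DEFAULT_VOLUMES)
```

With distinct volume names, each container gets its own /scratch directory, so the two processes no longer compete for the same LOCK file.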

Also patch export-compose.py to auto-declare any service-referenced
named volume at the top level — Composition only auto-declares
DEFAULT_MZ_VOLUMES, so without this the custom names broke
`docker compose config`.
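
The auto-declaration step could look roughly like the sketch below. It is not the actual export-compose.py change: `named_volume` and `declare_named_volumes` are hypothetical helpers operating on an already-parsed compose dictionary, and the rule for telling named volumes from bind mounts is a simplification.

```python
from typing import Any, Dict, Optional


def named_volume(entry: Any) -> Optional[str]:
    """Return the named volume an entry refers to, or None for other mounts."""
    if isinstance(entry, dict):  # long syntax: {"type": ..., "source": ...}
        if entry.get("type") == "volume":
            return entry.get("source")
        return None
    source, sep, _ = str(entry).partition(":")
    if not sep:  # only a container path: anonymous volume
        return None
    if source.startswith(("/", ".", "~")):  # host path: bind mount
        return None
    return source


def declare_named_volumes(compose: Dict[str, Any]) -> Dict[str, Any]:
    """Ensure every named volume used by a service is declared at the top level."""
    volumes = compose.get("volumes") or {}
    compose["volumes"] = volumes
    for service in (compose.get("services") or {}).values():
        for entry in service.get("volumes", []):
            name = named_volume(entry)
            if name and name not in volumes:
                volumes[name] = None  # empty declaration, default driver
    return compose
```

Existing top-level declarations are left untouched, so volumes that already carry driver options keep them.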
@github-actions
Contributor

Thank you for your submission! We really appreciate it. Like many source-available projects, we require that you sign our Contributor License Agreement (CLA) before we can accept your contribution.

You can sign the CLA by posting a comment with the message below.


I have read the Contributor License Agreement (CLA) and I hereby sign the CLA.


3 out of 4 committers have signed the CLA.
✅ [DAlperin](https://github.com/DAlperin)
✅ [patrickwwbutler](https://github.com/patrickwwbutler)
✅ [def-](https://github.com/def-)
@mitchwagner-antithesis
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@def- def- force-pushed the pr-poc-parallel-workload-antithesis branch from e795981 to 9727324 May 13, 2026 09:52
4 participants