Draft
65 commits
4170e43
feat: working local antithesis build
mitchwagner-antithesis May 6, 2026
127f67e
feat: tweaks for basic_test
mitchwagner-antithesis May 6, 2026
8323b8a
feat: working instrumentation
mitchwagner-antithesis May 7, 2026
2d9ae67
test/antithesis: consolidate antithesis/ into test/antithesis/
DAlperin May 11, 2026
40100e8
test/antithesis: add antithesis-config mzbuild image (FROM scratch + …
DAlperin May 11, 2026
92569ac
test/antithesis: add copyright headers
DAlperin May 11, 2026
0a7d801
test/antithesis: rewrite export-compose.py to use mzbuild specs
DAlperin May 11, 2026
d7cc3c4
test/antithesis: strip Makefile to mzbuild-driven build (drop registr…
DAlperin May 11, 2026
106c9a9
ci: nightly antithesis builds via CI_ANTITHESIS env passthrough
DAlperin May 11, 2026
e021460
ci: lint check that test/antithesis compose YAML matches mzcompose.py
DAlperin May 11, 2026
ff5c6d7
ci: drop branches:main on build-x86_64-antithesis (validating)
DAlperin May 11, 2026
0f59e7d
test/antithesis: switch to Kafka stack + external clusterd
DAlperin May 11, 2026
359402a
ci: regenerate antithesis compose YAML before build (avoid stale fing…
DAlperin May 11, 2026
2cfa6a3
test/antithesis: parameterize compose via .env (no more baked-in fing…
DAlperin May 11, 2026
d4373eb
ci: distinct ANTITHESIS_GCP_SERVICE_ACCOUNT_JSON for Antithesis regis…
DAlperin May 11, 2026
3278bda
test/antithesis: mark antithesis-config publish:false + commit placeh…
DAlperin May 11, 2026
007c7af
test/antithesis: pass Arch enum to Repository, not string
DAlperin May 11, 2026
8e459cd
test/antithesis: kafka source property catalog + first workload property
DAlperin May 11, 2026
7033cce
src/storage: wrap kafka source + upsert panic sites with antithesis-s…
DAlperin May 11, 2026
12f2c79
test/antithesis: implement kafka-source-no-data-loss + kafka-source-n…
DAlperin May 11, 2026
fd6722e
test/antithesis: implement frontier-monotonic, tombstone-removes-key,…
DAlperin May 11, 2026
bb02873
ci: scope CI_ANTITHESIS build to materialized + antithesis-{workload,…
DAlperin May 11, 2026
0a1fa97
test/antithesis: pre-create kafka topics before CREATE SOURCE
DAlperin May 12, 2026
624149c
test/antithesis: tolerate orphan _progress collision + add upsert-v2 …
DAlperin May 12, 2026
520f908
test/antithesis: add four workload drivers + reclock SUT anchor for c…
DAlperin May 12, 2026
7c026ca
test/antithesis: persist-cas-monotonicity SUT anchor + strict-seriali…
DAlperin May 12, 2026
06d90fb
test/antithesis: catalog cluster — partial epoch-fencing SUT anchor +…
DAlperin May 12, 2026
3b9bac5
test/antithesis: drop unfireable rehydration anchor; bump pg client t…
DAlperin May 12, 2026
4366c9e
test/antithesis: add second clusterd replica to antithesis_cluster fo…
DAlperin May 12, 2026
46664f8
test/antithesis: per-clusterd scratch volume so two replicas don't sh…
DAlperin May 12, 2026
e98f3dc
test/antithesis: add workload for mysql multithreaded replication chain
patrickwwbutler May 12, 2026
8dedd7b
test/antithesis: clusterd workers=4 per replica to exercise multi-wor…
DAlperin May 12, 2026
d56e33a
test/antithesis: drop --binlog_transaction_dependency_tracking; remov…
DAlperin May 12, 2026
445f452
test/antithesis: drop --scratch-directory from clusterd so upsert Roc…
DAlperin May 12, 2026
492c30a
Not what we want, but what we deserve?
def- May 13, 2026
9727324
approach #2
def- May 13, 2026
d72fc00
try to fix logging
def- May 13, 2026
bd5fbc4
test/antithesis: helper_pg.query_retry — opt-in real_time_recency kwarg
DAlperin May 13, 2026
312537f
test/antithesis: drivers use real_time_recency for queryability gate
DAlperin May 13, 2026
d43144d
test/antithesis: parallel_driver_parallel_workload setup phase tolera…
DAlperin May 13, 2026
26a70ce
test/antithesis: _replica_non_online queries history table, not curre…
DAlperin May 13, 2026
adaed90
parallel_workload: pool-backed mode with seed-scoped names and extern…
DAlperin May 13, 2026
19db537
test/antithesis: add configurable clusterd pool for parallel-workload
DAlperin May 13, 2026
550f6f6
test/antithesis: parallel-workload driver runs on per-invocation pool…
DAlperin May 13, 2026
008830b
test/antithesis/scratchbook: per-cluster fault isolation for parallel…
DAlperin May 13, 2026
84bdebe
parallel_workload: pool mode provisions one cluster with N replicas, …
DAlperin May 13, 2026
bb766ee
test/antithesis: tolerate Antithesis fault-injection errors in parall…
DAlperin May 13, 2026
ff76a27
test/antithesis: pivot pool design to permanent pool clusters
DAlperin May 14, 2026
820f76a
test/antithesis: add upsert-ancient-key-writable cross-invocation pro…
DAlperin May 14, 2026
891668b
add assertion for gtid monotonicity violation in mysql
patrickwwbutler May 14, 2026
86d1fbb
test/antithesis: bump clusterd workers to 16 and shrink pool to 2
DAlperin May 14, 2026
d0aa7fb
test/antithesis: add MyISAM cdc table to mysql workload
DAlperin May 14, 2026
8060eb2
test/antithesis: per-service container_name + hostname + explicit bri…
DAlperin May 14, 2026
7d5aa56
test/antithesis: gate service_started depends_on on healthcheck when …
DAlperin May 14, 2026
54fdf00
test/antithesis: route every workload draw through Antithesis SDK
DAlperin May 14, 2026
63e1074
test/antithesis: swarm tombstone / drop probabilities per invocation
DAlperin May 14, 2026
6479ea8
test/antithesis: move quiet/active windows to a global fault-orchestr…
DAlperin May 14, 2026
cc65e8e
test/antithesis: bump connect/retry timeouts to span fault-orchestrat…
DAlperin May 14, 2026
bb7c5cb
test/antithesis: fault-orchestrator: bash -s -> bash -c so script act…
DAlperin May 14, 2026
81413cf
test/antithesis: lifecycle logging + per-invocation correlation IDs
DAlperin May 14, 2026
d3962c5
test/antithesis: helper_pg: retry server-side InternalError from brok…
DAlperin May 15, 2026
714e252
test/antithesis: revert clusterd workers back to 4 (bisection)
DAlperin May 15, 2026
3c50003
test/antithesis: add Postgres CDC driver + testdrive-runner singleton
DAlperin May 15, 2026
4cab4f1
test/antithesis: local-dev (non-antithesis) build/up via make build-l…
DAlperin May 15, 2026
7df5d96
test/antithesis: drivers targeting SinceViolation bug family (#11200 …
DAlperin May 15, 2026
49 changes: 49 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -263,6 +263,7 @@ ahash = { version = "0.8.12", default-features = false }
aho-corasick = "1.1.4"
allocation-counter = "0"
anyhow = "1.0.102"
antithesis_sdk = "0.2.8"
array-concat = "0.5.5"
arrayvec = "0.7.6"
arrow = { version = "57", default-features = false }
48 changes: 35 additions & 13 deletions bin/ci-builder
@@ -18,6 +18,9 @@ set -euo pipefail

NIGHTLY_RUST_DATE=2026-05-06

# Allow overriding the container runtime (e.g. MZ_DEV_CI_BUILDER_RUNTIME=podman).
DOCKER="${MZ_DEV_CI_BUILDER_RUNTIME:-docker}"

workdir=$(pwd)
cd "$(dirname "$0")/.."

@@ -128,10 +131,14 @@ gid=$(id -g)
[[ "$gid" -lt 500 ]] && gid=$uid

build() {
local cache_args=()
if [[ "$DOCKER" != "podman" ]]; then
cache_args+=(--cache-from=materialize/ci-builder:"$cache_tag")
cache_args+=(--cache-to=type=inline,mode=max)
fi
# shellcheck disable=SC2086 # intentional splitting of build args string
docker buildx build --pull \
--cache-from=materialize/ci-builder:"$cache_tag" \
--cache-to=type=inline,mode=max \
"$DOCKER" buildx build --pull \
"${cache_args[@]}" \
$docker_build_args \
--tag materialize/ci-builder:"$tag" \
--tag ghcr.io/materializeinc/materialize/ci-builder:"$tag" \
@@ -181,13 +188,13 @@ case "$cmd" in
build "$@"
;;
exists)
docker manifest inspect "$image_registry"/ci-builder:"$tag" &> /dev/null
"$DOCKER" manifest inspect "$image_registry"/ci-builder:"$tag" &> /dev/null
;;
tag)
echo "$tag"
;;
push)
docker login ghcr.io -u materialize-bot --password "$GITHUB_GHCR_TOKEN"
"$DOCKER" login ghcr.io -u materialize-bot --password "$GITHUB_GHCR_TOKEN"
build --push "$@"
;;
run)
@@ -274,6 +281,7 @@ case "$cmd" in
--env AZURE_SERVICE_ACCOUNT_PASSWORD
--env AZURE_SERVICE_ACCOUNT_TENANT
--env GCP_SERVICE_ACCOUNT_JSON
--env ANTITHESIS_GCP_SERVICE_ACCOUNT_JSON
--env GITHUB_TOKEN
--env GITHUB_GHCR_TOKEN
--env GPG_KEY
@@ -372,20 +380,26 @@ case "$cmd" in
)
fi
if [[ "$(uname -s)" = Linux ]]; then
args+=(
--user "$(id -u):$(stat -c %g /var/run/docker.sock)"
)
if [[ "${MZ_DEV_CI_BUILDER_RUNTIME:-docker}" == "podman" ]]; then
args+=(--userns=keep-id)
else
args+=(
--user "$(id -u):$(stat -c %g /var/run/docker.sock)"
)
fi

if [[ $secrets == "true" ]]; then
# Allow Docker-in-Docker by mounting the Docker socket in the
# container. Host networking allows us to see ports created by
# containers that we launch.
args+=(
--volume "/var/run/docker.sock:/var/run/docker.sock"
--network host
--env "DOCKER_TLS_VERIFY=${DOCKER_TLS_VERIFY-}"
--env "DOCKER_HOST=${DOCKER_HOST-}"
)
if [[ -S /var/run/docker.sock ]]; then
args+=(--volume "/var/run/docker.sock:/var/run/docker.sock")
fi

# Forward Docker configuration too, if available.
docker_dir=${DOCKER_CONFIG:-$HOME/.docker}
@@ -431,14 +445,22 @@ case "$cmd" in
image="$image_registry/ci-builder:$tag"
# Try downloading the image a few times in case of registry flakiness
if [[ "${CI:-}" ]]; then
if ! docker inspect "$image" > /dev/null 2>&1; then
docker pull "$image" || (sleep 3 && docker pull "$image") || (sleep 3 && docker pull "$image") || sleep 3
if ! "$DOCKER" inspect "$image" > /dev/null 2>&1; then
"$DOCKER" pull "$image" || (sleep 3 && "$DOCKER" pull "$image") || (sleep 3 && "$DOCKER" pull "$image") || sleep 3
fi
fi
docker run "${args[@]}" "$image" eatmydata "${docker_command[@]}"
if [[ "$DOCKER" == "podman" ]]; then
# --userns=keep-id already maps the host UID/GID into the
# container, so autouseradd is unnecessary. Override the
# entrypoint to skip it.
args+=(--entrypoint eatmydata)
"$DOCKER" run "${args[@]}" "$image" "${docker_command[@]}"
else
"$DOCKER" run "${args[@]}" "$image" eatmydata "${docker_command[@]}"
fi
;;
root-shell)
docker exec --interactive --tty --user 0:0 "$(<"$cid_file")" eatmydata ci/builder/root-shell.sh
"$DOCKER" exec --interactive --tty --user 0:0 "$(<"$cid_file")" eatmydata ci/builder/root-shell.sh
;;
*)
printf "unknown command %q\n" "$cmd"
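The runtime handling in this script can be summarized in a few lines. The sketch below is a Python model of the shell logic above, not code from the PR: the function names `container_runtime`, `cache_args`, and `user_args` are hypothetical, and the Docker socket group id is passed in as a parameter instead of being read with `stat`.

```python
import os


def container_runtime(env=None):
    """Return the container runtime, defaulting to docker.

    Models DOCKER="${MZ_DEV_CI_BUILDER_RUNTIME:-docker}": an unset or
    empty variable falls back to "docker".
    """
    env = os.environ if env is None else env
    return env.get("MZ_DEV_CI_BUILDER_RUNTIME") or "docker"


def cache_args(runtime, cache_tag):
    """Build-cache flags; the script omits them when the runtime is podman."""
    if runtime == "podman":
        return []
    return [
        f"--cache-from=materialize/ci-builder:{cache_tag}",
        "--cache-to=type=inline,mode=max",
    ]


def user_args(runtime, uid, docker_sock_gid):
    """User-mapping flags for the `run` command on Linux (hypothetical helper)."""
    if runtime == "podman":
        # keep-id maps the host UID/GID into the container.
        return ["--userns=keep-id"]
    return ["--user", f"{uid}:{docker_sock_gid}"]
```

Under this model, setting `MZ_DEV_CI_BUILDER_RUNTIME=podman` drops the cache flags and swaps `--user` for `--userns=keep-id`, while every other value keeps the Docker-style arguments.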
5 changes: 5 additions & 0 deletions ci/builder/Dockerfile
@@ -399,6 +399,11 @@ ENV CARGO_HOME=/cargo
RUN mkdir /cargo && chmod 777 /cargo
VOLUME /cargo

# Antithesis coverage instrumentation library (used when --antithesis is passed)
RUN curl -sSL https://antithesis.com/assets/instrumentation/libvoidstar.so \
-o /usr/lib/libvoidstar.so \
&& ldconfig

# Stage 3: Build a lightweight CI Builder image for console/playwright jobs.
FROM ubuntu:noble-20260324 AS ci-builder-console

32 changes: 32 additions & 0 deletions ci/mkpipeline.py
@@ -121,6 +121,12 @@ def main() -> int:
type=Sanitizer,
choices=Sanitizer,
)
parser.add_argument(
"--antithesis",
action="store_true",
default=ui.env_is_truthy("CI_ANTITHESIS"),
help="enable Antithesis coverage instrumentation",
)
parser.add_argument(
"--priority",
type=int,
@@ -166,6 +172,7 @@ def get_hashes(arch: Arch) -> tuple[str, bool]:
arch=arch,
coverage=args.coverage,
sanitizer=args.sanitizer,
antithesis=args.antithesis,
)
deps = repo.resolve_dependencies(image for image in repo if image.publish)
check = deps.check()
@@ -209,6 +216,7 @@ def fetch_hashes() -> None:
args.coverage,
args.sanitizer,
lto,
args.antithesis,
)
trim_ci_glue_exempt_steps(pipeline)
else:
@@ -218,9 +226,11 @@ def fetch_hashes() -> None:
args.coverage,
args.sanitizer,
lto,
args.antithesis,
)
truncate_skip_length(pipeline)
handle_sanitizer_skip(pipeline, args.sanitizer)
handle_antithesis_skip(pipeline, args.antithesis)
increase_agents_timeouts(pipeline, args.sanitizer, args.coverage)
prioritize_pipeline(pipeline, args.priority)
switch_jobs_to_aws(pipeline, args.priority)
@@ -240,6 +250,7 @@ def fetch_hashes() -> None:
args.coverage,
args.sanitizer,
lto,
args.antithesis,
)
add_nightly_deploy_dependency(pipeline, args.pipeline)
remove_dependencies_on_prs(pipeline, args.pipeline, hash_check)
@@ -328,6 +339,21 @@ def handle_sanitizer_skip(pipeline: Any, sanitizer: Sanitizer) -> None:
step["skip"] = True


def handle_antithesis_skip(pipeline: Any, antithesis: bool) -> None:
    if antithesis:
        pipeline.setdefault("env", {})["CI_ANTITHESIS"] = "1"

        for step in steps(pipeline):
            if step.get("antithesis") == "skip":
                step["skip"] = True

    else:

        for step in steps(pipeline):
            if step.get("antithesis") == "only":
                step["skip"] = True


def increase_agents_timeouts(
pipeline: Any, sanitizer: Sanitizer, coverage: bool
) -> None:
@@ -711,6 +737,7 @@ def trim_tests_pipeline(
coverage: bool,
sanitizer: Sanitizer,
lto: bool,
antithesis: bool = False,
) -> None:
"""Trim pipeline steps whose inputs have not changed in this branch.

@@ -731,6 +758,7 @@ def trim_tests_pipeline(
profile=mzbuild.Profile.RELEASE if lto else mzbuild.Profile.OPTIMIZED,
coverage=coverage,
sanitizer=sanitizer,
antithesis=antithesis,
)
deps = repo.resolve_dependencies(image for image in repo)

@@ -917,6 +945,7 @@ def add_cargo_test_dependency(
coverage: bool,
sanitizer: Sanitizer,
lto: bool,
antithesis: bool = False,
) -> None:
"""Cargo Test normally doesn't have to wait for the build to complete, but it requires a few images (ubuntu-base, postgres), which are rarely changed. So only add a dependency when those images are not on Dockerhub yet."""
if pipeline_name not in ("test", "nightly"):
@@ -933,6 +962,7 @@ def add_cargo_test_dependency(
profile=mzbuild.Profile.RELEASE if lto else mzbuild.Profile.OPTIMIZED,
coverage=coverage,
sanitizer=sanitizer,
antithesis=antithesis,
)
composition = Composition(repo, name="cargo-test")
deps = composition.dependencies
@@ -1090,6 +1120,8 @@ def remove_mz_specific_keys(pipeline: Any) -> None:
del step["coverage"]
if "sanitizer" in step:
del step["sanitizer"]
if "antithesis" in step:
del step["antithesis"]
if "ci_glue_exempt" in step:
del step["ci_glue_exempt"]
if (
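To see how the new `--antithesis` flag and `handle_antithesis_skip` interact, here is a minimal, self-contained sketch. It is an illustration under assumptions, not the PR's code: `env_is_truthy` and `iter_steps` are hypothetical stand-ins for `ui.env_is_truthy` and `steps`, whose definitions are not part of this diff, the pipeline is assumed to be a flat list of steps, and the step ids are made up.

```python
import argparse
import os


def env_is_truthy(name, environ=None):
    """Hypothetical stand-in for ui.env_is_truthy (assumed semantics)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "0") not in ("", "0", "no", "false")


def parse_args(argv, environ=None):
    parser = argparse.ArgumentParser()
    # The flag defaults to the CI_ANTITHESIS environment variable, so a
    # run started with CI_ANTITHESIS=1 needs no extra argument.
    parser.add_argument(
        "--antithesis",
        action="store_true",
        default=env_is_truthy("CI_ANTITHESIS", environ),
    )
    return parser.parse_args(argv)


def iter_steps(pipeline):
    """Hypothetical stand-in for steps(): yields top-level steps only."""
    yield from pipeline.get("steps", [])


def handle_antithesis_skip(pipeline, antithesis):
    if antithesis:
        # Propagate the mode to every job in the pipeline.
        pipeline.setdefault("env", {})["CI_ANTITHESIS"] = "1"
        for step in iter_steps(pipeline):
            if step.get("antithesis") == "skip":
                step["skip"] = True
    else:
        for step in iter_steps(pipeline):
            if step.get("antithesis") == "only":
                step["skip"] = True


def skipped_ids(pipeline):
    """Ids of all steps currently marked as skipped."""
    return [s["id"] for s in iter_steps(pipeline) if s.get("skip")]


def example_pipeline():
    # Step ids are illustrative, not taken from the real pipelines.
    return {
        "steps": [
            {"id": "regular-test", "antithesis": "skip"},
            {"id": "antithesis-only-test", "antithesis": "only"},
            {"id": "untagged"},
        ]
    }
```

Calling `handle_antithesis_skip(example_pipeline(), True)` marks only `regular-test` as skipped and sets the pipeline-level `CI_ANTITHESIS` variable; with `False`, only `antithesis-only-test` is skipped and no `env` entry is added. Untagged steps run in both modes.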
23 changes: 23 additions & 0 deletions ci/nightly/pipeline.template.yml
@@ -65,6 +65,29 @@ steps:
branches: "main"
skip: "currently broken"

- id: build-x86_64-antithesis
label: ":rust: Build x86_64 (Antithesis)"
# Regenerate the antithesis compose YAML before building so the
# `antithesis-config` image's fingerprint captures the same
# materialized fingerprint we're about to publish — otherwise
# Antithesis would try to pull a stale `materialized:mzbuild-…`
# whenever the committed YAML lagged behind source changes.
command: bin/ci-builder run stable ci/test/build-antithesis.sh
inputs:
- "*"
depends_on: []
timeout_in_minutes: 90
agents:
queue: l-builder-linux-x86_64
env:
CI_ANTITHESIS: "1"
# Antithesis-flavored images get distinct mzbuild fingerprints, so
# they coexist with regular GHCR tags. The build is x86_64-only —
# Antithesis runs amd64 sandboxes.
sanitizer: skip
coverage: skip
antithesis: skip

- id: build-rust-latest-beta
label: "Build with Latest Rust Beta"
command: bin/ci-builder run stable ci/test/rust-beta-build.sh
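The three trailing keys on the new step (`sanitizer`, `coverage`, `antithesis`) are tags read by `ci/mkpipeline.py`, and `remove_mz_specific_keys` deletes them from each step. A rough sketch of that round trip follows, with the step transcribed as a Python dict. `strip_custom_keys` is a hypothetical, simplified version of the real function, which may handle more keys than the four listed here.

```python
# Tags deleted per step in the diff's remove_mz_specific_keys hunk.
MZ_SPECIFIC_KEYS = ("coverage", "sanitizer", "antithesis", "ci_glue_exempt")


def strip_custom_keys(step):
    """Return a copy of the step without the Materialize-specific tags."""
    return {k: v for k, v in step.items() if k not in MZ_SPECIFIC_KEYS}


# Transcription of the build-x86_64-antithesis step added to the template.
step = {
    "id": "build-x86_64-antithesis",
    "label": ":rust: Build x86_64 (Antithesis)",
    "command": "bin/ci-builder run stable ci/test/build-antithesis.sh",
    "inputs": ["*"],
    "depends_on": [],
    "timeout_in_minutes": 90,
    "agents": {"queue": "l-builder-linux-x86_64"},
    "env": {"CI_ANTITHESIS": "1"},
    "sanitizer": "skip",
    "coverage": "skip",
    "antithesis": "skip",
}

cleaned = strip_custom_keys(step)
```

Going by the logic in `handle_antithesis_skip`, the `antithesis: skip` tag means this step runs in an ordinary nightly pipeline, where it sets `CI_ANTITHESIS` in its own `env`, and is skipped when the whole pipeline is generated with `--antithesis`.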