feat(correctness): fan out from one millstone process by webern · Pull Request #1724 · DataDog/saluki

webern · 2026-05-22T15:47:26Z

Summary

Reduce payload timing divergence by running a single millstone process that fans out to both baseline and comparison in correctness tests.

Before this change, the millstone container ran sh -c '... & P1=$! ... & P2=$! ...' to fork two
independent millstone processes, one per target. That is one place where drift may be introduced.

In follow-up PRs I intend to extend the solution to include:

waiting for a ready signal from both consumers before starting
respecting a wall-clock-time-based boundary pause to avoid the bucket boundary

Key changes:

bin/correctness/millstone: Config accepts targets: as a named map (baseline: /
comparison:) instead of a single target:; TargetSender fans out each generated payload to
all configured sinks; driver and corpus collapsed to a single send loop; errors include the
target name.
bin/correctness/panoramic: Docker (runner.rs) and k8s (k8s.rs) paths each spawn one
millstone container/pod per test instead of two. New shared helpers in correctness/config.rs
(resolve_group_placeholders, millstone_targets_all_sockets, millstone_first_network_port)
substitute the $GROUP placeholder per-target on the host. The resolved YAML is written under
the per-test log_dir (deliberately not mounts_dir, which is overlaid into the agent
containers).
19 test/correctness/*/millstone.yaml migrated mechanically to the targets: shape.
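
The per-target `$GROUP` substitution described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `resolve_group_placeholders_sketch`, its signature, and the assumption that `$GROUP` expands to the target's own name are all hypothetical, as are the example addresses in the usage note.

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for the placeholder-resolution helper; the real
/// signature is not shown in the PR description.
///
/// Assumption: every `$GROUP` in a target's address is replaced with that
/// target's own name (for example "baseline" or "comparison").
fn resolve_group_placeholders_sketch(
    targets: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    targets
        .iter()
        // `str::replace` substitutes every occurrence, not just the first.
        .map(|(name, addr)| (name.clone(), addr.replace("$GROUP", name)))
        .collect()
}
```

Under these assumptions, a `baseline` target configured with the made-up address `unixgram:///airlock/$GROUP/dsd.sock` would resolve to `unixgram:///airlock/baseline/dsd.sock`, so one template can serve both targets.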

Change Type

Enhancement

How did you test this PR

Locally on macOS against rebuilt correctness-tools and datadog-agent images at this commit:

Full suite (-d test/integration/cases -d test/correctness) parallel (-p 4 default): 51/52
pass. Single failure: dsd-origin-detection-matrix/unified-high-cardinality — window-edge
divergence, the residual window boundary failure class that subsequent PRs intend to target.
Full suite sequential (-p 1): 52/52 pass. This is slow! Parallelization is a good idea if we can fix it.

Compare to the pre-fanout parallel baseline of 50/52 with all four dsd-origin-detection-matrix/*
variants plus dsd-service-checks failing under value-divergence.

No leaked airlock resources after either run. I did see leaked airlock resources on an aborted run, and that is something I also intend to work on downstream.

References

Not directly, but documenting my ongoing sensitivity to local flakes:

Millstone now holds multiple sinks per process and writes the same payload bytes to all configured targets. Eliminates per-payload divergence between Agent (baseline) and Agent+ADP (comparison) by construction.

- millstone: `targets:` named map replaces single `target:`; in-process fan-out via TargetSender; fail-fast with named-target errors.
- panoramic: single millstone process per test for both Docker and k8s paths; shared helpers for $GROUP placeholder resolution and socket enumeration.
- 19 `test/correctness/*/millstone.yaml` migrated to `targets:` shape.
- 4 new millstone unit tests + 6 new panoramic unit tests.
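
To make the fan-out and fail-fast behavior concrete, here is a minimal sketch of the idea using only the standard library. It is not the TargetSender from this PR: the names `FanOutSender` and `TargetError`, and the choice of `std::io::Write` as the sink abstraction, are assumptions made for illustration.

```rust
use std::fmt;
use std::io::{self, Write};

/// Error that records which named target failed (hypothetical type).
#[derive(Debug)]
struct TargetError {
    target: String,
    cause: io::Error,
}

impl fmt::Display for TargetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "target '{}': {}", self.target, self.cause)
    }
}

impl std::error::Error for TargetError {}

/// Hypothetical fan-out sender holding one sink per named target.
struct FanOutSender<W: Write> {
    sinks: Vec<(String, W)>,
}

impl<W: Write> FanOutSender<W> {
    fn new(sinks: Vec<(String, W)>) -> Self {
        Self { sinks }
    }

    /// Writes the same payload bytes to every sink in order and stops at the
    /// first failure, returning an error that names the failing target.
    fn send(&mut self, payload: &[u8]) -> Result<(), TargetError> {
        for (name, sink) in self.sinks.iter_mut() {
            sink.write_all(payload).map_err(|cause| TargetError {
                target: name.clone(),
                cause,
            })?;
        }
        Ok(())
    }
}
```

Because a single generated payload is handed to every sink from one loop, both targets see byte-identical input. The sinks are still written sequentially, so this sketch says nothing about how closely the deliveries are aligned in time.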

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c353efc6f


pr-commenter · 2026-05-22T15:56:34Z

Binary Size Analysis (Agent Data Plane)

Baseline: f907c91 · Comparison: 5e9ddc9 · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 37.68 MiB (baseline) vs 37.68 MiB (comparison)
Size Change: +0 B (+0.00%)

✅ Binary size difference within threshold

Changes by Module

Module	File Size	Symbols
`anon.e23c78aa09c99bb915937a91f6b5f237.1.llvm.5513365544103422328`	+129 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.1.llvm.6491998991054823396`	-129 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.4.llvm.5513365544103422328`	+114 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.4.llvm.6491998991054823396`	-114 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.3.llvm.5513365544103422328`	+108 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.3.llvm.6491998991054823396`	-108 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.0.llvm.5513365544103422328`	+96 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.0.llvm.6491998991054823396`	-96 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.2.llvm.5513365544103422328`	+94 B	1
`anon.e23c78aa09c99bb915937a91f6b5f237.2.llvm.6491998991054823396`	-94 B	1

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]    +129  [NEW]     +40    anon.e23c78aa09c99bb915937a91f6b5f237.1.llvm.5513365544103422328
  [NEW]    +114  [NEW]     +25    anon.e23c78aa09c99bb915937a91f6b5f237.4.llvm.5513365544103422328
  [NEW]    +108  [NEW]     +19    anon.e23c78aa09c99bb915937a91f6b5f237.3.llvm.5513365544103422328
  [NEW]     +96  [NEW]      +7    anon.e23c78aa09c99bb915937a91f6b5f237.0.llvm.5513365544103422328
  [NEW]     +94  [NEW]      +5    anon.e23c78aa09c99bb915937a91f6b5f237.2.llvm.5513365544103422328
  [DEL]     -94  [DEL]      -5    anon.e23c78aa09c99bb915937a91f6b5f237.2.llvm.6491998991054823396
  [DEL]     -96  [DEL]      -7    anon.e23c78aa09c99bb915937a91f6b5f237.0.llvm.6491998991054823396
  [DEL]    -108  [DEL]     -19    anon.e23c78aa09c99bb915937a91f6b5f237.3.llvm.6491998991054823396
  [DEL]    -114  [DEL]     -25    anon.e23c78aa09c99bb915937a91f6b5f237.4.llvm.6491998991054823396
  [DEL]    -129  [DEL]     -40    anon.e23c78aa09c99bb915937a91f6b5f237.1.llvm.6491998991054823396
  [ = ]       0  [ = ]       0    TOTAL

pr-commenter · 2026-05-22T16:11:23Z

Regression Detector (Agent Data Plane)

Run ID: a565584f-e51b-42ea-835d-e64c78706513
Baseline: f907c912 · Comparison: 5e9ddc93 · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (35)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

experiment	goal	Δ mean %	links
otlp_ingest_metrics_5mb_memory	memory	⚪ +1.38	metrics profiles logs
otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic)	cpu	⚪ +0.96	metrics profiles logs
dsd_uds_1mb_3k_contexts_cpu (erratic)	cpu	⚪ +0.86	metrics profiles logs
otlp_ingest_traces_5mb_memory	memory	⚪ +0.29	metrics profiles logs
otlp_ingest_traces_5mb_cpu (erratic)	cpu	⚪ +0.26	metrics profiles logs
otlp_ingest_logs_5mb_memory (ignored)	memory	⚪ +0.25	metrics profiles logs
dsd_uds_512kb_3k_contexts_memory	memory	⚪ +0.25	metrics profiles logs
otlp_ingest_logs_5mb_cpu (ignored)	cpu	⚪ +0.18	metrics profiles logs
quality_gates_rss_dsd_heavy	memory	⚪ +0.13	metrics profiles logs
quality_gates_rss_dsd_medium	memory	⚪ +0.11	metrics profiles logs
dsd_uds_500mb_3k_contexts_throughput	throughput	⚪ -0.08	metrics profiles logs
dsd_uds_10mb_3k_contexts_memory	memory	⚪ +0.06	metrics profiles logs
otlp_ingest_traces_5mb_throughput	throughput	⚪ -0.04	metrics profiles logs
dsd_uds_500mb_3k_contexts_memory	memory	⚪ +0.03	metrics profiles logs
dsd_uds_100mb_3k_contexts_memory	memory	⚪ +0.03	metrics profiles logs
otlp_ingest_traces_ottl_filtering_5mb_throughput	throughput	⚪ -0.01	metrics profiles logs
otlp_ingest_traces_ottl_transform_5mb_throughput	throughput	⚪ -0.01	metrics profiles logs
otlp_ingest_logs_5mb_throughput (ignored)	throughput	⚪ -0.01	metrics profiles logs
otlp_ingest_metrics_5mb_throughput	throughput	⚪ -0.00	metrics profiles logs
dsd_uds_1mb_3k_contexts_throughput	throughput	⚪ -0.00	metrics profiles logs
dsd_uds_10mb_3k_contexts_throughput	throughput	⚪ +0.00	metrics profiles logs
dsd_uds_512kb_3k_contexts_throughput	throughput	⚪ +0.00	metrics profiles logs
quality_gates_rss_dsd_ultraheavy	memory	⚪ -0.01	metrics profiles logs
dsd_uds_100mb_3k_contexts_throughput	throughput	⚪ +0.01	metrics profiles logs
dsd_uds_1mb_3k_contexts_memory	memory	⚪ -0.02	metrics profiles logs
otlp_ingest_traces_ottl_filtering_5mb_memory	memory	⚪ -0.06	metrics profiles logs
otlp_ingest_traces_ottl_transform_5mb_memory	memory	⚪ -0.11	metrics profiles logs
quality_gates_rss_dsd_low	memory	⚪ -0.14	metrics profiles logs
quality_gates_rss_idle	memory	⚪ -0.15	metrics profiles logs
dsd_uds_500mb_3k_contexts_cpu (erratic)	cpu	⚪ -0.40	metrics profiles logs
otlp_ingest_traces_ottl_transform_5mb_cpu (erratic)	cpu	⚪ -0.61	metrics profiles logs
dsd_uds_100mb_3k_contexts_cpu (erratic)	cpu	⚪ -1.43	metrics profiles logs
dsd_uds_512kb_3k_contexts_cpu (erratic)	cpu	⚪ -2.34	metrics profiles logs
otlp_ingest_metrics_5mb_cpu (erratic)	cpu	⚪ -2.59	metrics profiles logs
dsd_uds_10mb_3k_contexts_cpu (erratic)	cpu	⚪ -4.10	metrics profiles logs

Bounds Checks: ✅ Passed (5)

experiment	check	replicates	observed	links
quality_gates_rss_dsd_heavy	memory_usage	10/10	✅ 119 MiB ≤ 140 MiB	metrics profiles logs
quality_gates_rss_dsd_low	memory_usage	10/10	✅ 40.1 MiB ≤ 50 MiB	metrics profiles logs
quality_gates_rss_dsd_medium	memory_usage	10/10	✅ 60.2 MiB ≤ 75 MiB	metrics profiles logs
quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	✅ 179 MiB ≤ 200 MiB	metrics profiles logs
quality_gates_rss_idle	memory_usage	10/10	✅ 26.8 MiB ≤ 40 MiB	metrics profiles logs

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
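
The decision rule in the explanation above can be written out as a small function. This is only a reading of that paragraph, not the Regression Detector's code: the type and parameter names are hypothetical, and `smp_is_improvement` is an assumed counterpart to the `is_regression` flag that the text mentions.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    Improvement,
    Neutral,
    Ignored,
}

#[derive(Clone, Copy)]
enum Goal {
    Cpu,
    Memory,
    Throughput,
}

/// Threshold on |delta mean %| quoted in the explanation.
const THRESHOLD_PCT: f64 = 5.0;

/// Classifies one experiment following the rule as described in the text.
fn classify(
    goal: Goal,
    delta_mean_pct: f64,
    smp_is_regression: bool,
    smp_is_improvement: bool, // assumed flag name, mirrors `is_regression`
    configured_erratic: bool,
) -> Verdict {
    // Experiments configured `erratic: true` are skipped outright.
    if configured_erratic {
        return Verdict::Ignored;
    }
    // Small changes are neutral regardless of direction.
    if delta_mean_pct.abs() <= THRESHOLD_PCT {
        return Verdict::Neutral;
    }
    // More CPU or memory is worse; less ingress throughput is worse.
    let worse = match goal {
        Goal::Cpu | Goal::Memory => delta_mean_pct > 0.0,
        Goal::Throughput => delta_mean_pct < 0.0,
    };
    if worse && smp_is_regression {
        Verdict::Regression
    } else if !worse && smp_is_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    }
}
```

For instance, the largest swing in the table, dsd_uds_10mb_3k_contexts_cpu at -4.10, stays neutral under this rule because its magnitude is below the 5.00% threshold.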

webern requested a review from a team as a code owner May 22, 2026 15:47

dd-octo-sts Bot added the area/test All things testing: unit/integration, correctness, SMP regression, etc. label May 22, 2026

chatgpt-codex-connector Bot reviewed May 22, 2026

Comment thread bin/correctness/millstone/src/corpus.rs

Comment thread bin/correctness/panoramic/src/correctness/runner.rs Outdated

fmt

1c0500d

check

18fe0ca

webern commented May 22, 2026

webern added 2 commits May 22, 2026 19:19

fixes

2572b9d

fixup

6fc61dc

webern marked this pull request as draft May 22, 2026 17:40

webern changed the title from "feat(correctness): fan out millstone to multiple sinks" to "feat(correctness): fan out from one millstone process" May 22, 2026

fix log dir path issue

5e9ddc9

webern marked this pull request as ready for review May 22, 2026 18:28

webern wants to merge 6 commits into main from matt.briggs/millstone-fan-out


1 participant
