feat(correctness): fan out from one millstone process#1724
Conversation
Millstone now holds multiple sinks per process and writes the same payload bytes to all configured targets. Eliminates per-payload divergence between Agent (baseline) and Agent+ADP (comparison) by construction. - millstone: `targets:` named map replaces single `target:`; in-process fan-out via TargetSender; fail-fast with named-target errors. - panoramic: single millstone process per test for both Docker and k8s paths; shared helpers for $GROUP placeholder resolution and socket enumeration. - 19 `test/correctness/*/millstone.yaml` migrated to `targets:` shape. - 4 new millstone unit tests + 6 new panoramic unit tests.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c353efc6f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Binary Size Analysis (Agent Data Plane)Baseline: f907c91 · Comparison: 5e9ddc9 · diff ✅ Binary size difference within thresholdChanges by Module
Detailed Symbol Changes |
Regression Detector (Agent Data Plane)Run ID: Optimization Goals: ✅ No significant changes detectedFine details of change detection per experiment (35)Experiments configured
Bounds Checks: ✅ Passed (5)
ExplanationA change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression ( |
Summary
Reduce payload timing divergence by running a single millstone process that fans out to both
baselineandcomparisonin correctness tests.Before this change, the millstone container ran
sh -c '... & P1=$! ... & P2=$! ...'to fork twoindependent millstones - one per target which is one place where drift may be introduced.
In follow up PRs I intend to extend the solution to include
Key changes:
bin/correctness/millstone:Configacceptstargets:as a named map (baseline:/comparison:) instead of a singletarget:;TargetSenderfans out each generated payload toall configured sinks; driver and corpus collapsed to a single send loop; errors include the
target name.
bin/correctness/panoramic: Docker (runner.rs) and k8s (k8s.rs) paths each spawn onemillstone container/pod per test instead of two. New shared helpers in
correctness/config.rs(
resolve_group_placeholders,millstone_targets_all_sockets,millstone_first_network_port)substitute the
$GROUPplaceholder per-target on the host. The resolved YAML is written underthe per-test
log_dir(deliberately notmounts_dir, which is overlaid into the agentcontainers).
test/correctness/*/millstone.yamlmigrated mechanically to thetargets:shape.Change Type
How did you test this PR
Locally on macOS against rebuilt
correctness-toolsanddatadog-agentimages at this commit:-d test/integration/cases -d test/correctness) parallel (-p 4default): 51/52pass. Single failure:
dsd-origin-detection-matrix/unified-high-cardinality— window-edgedivergence, the residual window boundary failure class that subsequent PRs intend to target.
-p 1): 52/52 pass. This is slow! Parallelization is a good idea if we can fix it.Compare to the pre-fanout parallel baseline of 50/52 with all four
dsd-origin-detection-matrix/*variants plus
dsd-service-checksfailing under value-divergence.No leaked airlock resources after either run. I did see leaked airlock resources on an aborted run, and that is something I also intend to work on downstream.
References
Not directly, but documenting my ongoing sensitivity to local flakes:
dsd-mapper-blocklistfails locally. #1579