[ABLD-364] Select tar compression based on developer vs. release builds.#47895
Conversation
- Apply to the existing pkg_tar rules.
- Developer builds are not compressed.
- CI and release builds use high compression (triggered by --config=release).
- Pull in enough of the omnibus logic so all pipelines still work during the migration.
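The intended selection could be sketched in Starlark roughly as follows. All names here are illustrative, not necessarily what this PR uses; it assumes --config=release sets a boolean build setting (e.g. //:release) that a config_setting can match:

```starlark
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Hypothetical: matches when --config=release has set --//:release=true.
config_setting(
    name = "release_build",
    flag_values = {"//:release": "true"},
)

pkg_tar(
    name = "agent_package",
    srcs = [":agent_files"],
    # Developer builds emit a plain tar; release builds emit a compressed one.
    extension = select({
        ":release_build": "tar.xz",
        "//conditions:default": "tar",
    }),
)
```

This is only a sketch of the select() mechanism; the actual compressor choice and level in the PR may differ.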
Files inventory check summary

File checks results against ancestor 46622092. Results for datadog-agent_7.78.0~devel.git.649.8e7f189.pipeline.103047739-1_amd64.deb: No change detected.
Static quality checks

✅ Please find below the results from static quality gates: 31 successful checks with minimal change (< 2 KiB).
On-wire sizes (compressed)
Regression Detector Results

Metrics dashboard. Baseline: 4662209

Optimization Goals: ✅ No significant changes detected
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +0.99 | [-2.09, +4.07] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.88 | [+0.82, +0.93] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.85 | [+0.78, +0.92] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.43 | [+0.19, +0.67] | 1 | Logs, bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.41 | [+0.19, +0.64] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.28 | [+0.11, +0.46] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.19 | [+0.15, +0.22] | 1 | Logs, bounds checks dashboard |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.12 | [-0.04, +0.28] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.10 | [-0.33, +0.52] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | +0.08 | [+0.03, +0.14] | 1 | Logs, bounds checks dashboard |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.02 | [-0.17, +0.21] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.02 | [-0.37, +0.40] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.10, +0.10] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.01 | [-0.10, +0.08] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.01 | [-0.19, +0.17] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.02 | [-0.08, +0.03] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.06 | [-0.21, +0.10] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.06 | [-0.56, +0.43] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.14 | [-0.25, -0.04] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.17 | [-0.24, -0.09] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.50 | [-0.65, -0.36] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.59 | [-0.72, -0.45] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.87 | [-2.43, +0.69] | 1 | Logs, bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 708 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 269.86MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 709 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.23GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 174.21MiB ≤ 175MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 489.81MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 204.84MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 353.42 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 410.32MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
alopezz left a comment:
This is mostly fine in structure (and could be harmlessly merged because it's not wired in yet). But I think we need to:
- Make sure we have a common understanding of the current state of things w.r.t. how compression is applied in different scenarios.
- Try to stick to the existing behavior we're replacing, only changing it either before or after converting it to bazel. For example, the compression level of 5 was probably chosen for a reason (probably a tradeoff between speed and size, as the resulting artifacts do get moved around and stored).
On a side note: FORCED_PACKAGE_COMPRESSION_LEVEL was introduced as a way to override the level on 32-bit dpkg, as per this comment. We can keep the existing behavior for now, but in the not-too-distant future we may be able to remove it.
```
# DEPLOY_AGENT is mostly true. It's not clear to me yet if "deploy" means to
# build the release version, or simply to include the agent binary. It is only
# set false for installer and integrations_core, so it seems the second, even if
# the word deploy generally is associated with release.
```
---
So what do you think is true? We don't just use it for manual and nightly. It is set true as a baseline default for the CI builds in tasks/libs/common/omnibus.py.
The new behavior is to use --config=release in CI (which we have to do for rust in CI) as the trigger for compression. I think this captures the intent of what we need.
A developer won't normally set --config=release, so they won't get compression, but they could if they wanted to do performance testing.
I could make it more fine grained, but I would rather use a specific word than DEPLOY_AGENT.
Actually, we have that, with FORCE_COMPRESSION_LEVEL. I'll change it so that gets used regardless of the release flag, so you can set it however you want. Would that work for you?
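For reference, the trigger described in this thread could be wired up in .bazelrc along these lines. This is an illustrative sketch, not the PR's actual config; the flag name //:release is hypothetical:

```
# .bazelrc sketch: --config=release turns on optimized builds and, through the
# flag below, package compression. An explicit override (the
# FORCE_COMPRESSION_LEVEL idea) would be a separate flag set on the command line.
build:release --compilation_mode=opt
build:release --//:release=true
```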
---
> So what do you think is true? We don't just use it for manual and nightly. It is set true as a baseline default for the CI builds in tasks/libs/common/omnibus.py.
I already gave you references to where DEPLOY_AGENT gets set to true, so I mostly already responded to that question. And I don't see anything in https://github.com/DataDog/datadog-agent/blob/main/tasks/libs/common/omnibus.py setting it as a default.
> Would that work for you?
As I already said, all I'm suggesting is that I'd rather stick to the existing behavior while migrating to Bazel, making changes either before or after. Since this is not getting wired to anything yet, I could certainly live with getting this merged despite that. But the comment here seems to be factually wrong, let alone the "It's not clear to me" (who is "me"? that's an awkward reference to have in code that is owned by a team, not individuals). That's what I meant by "have a common understanding of the current state of things".
---
I changed it to that. PTAL
---
Wrong file. It was tasks/libs/ciproviders/gitlab_api.py
I don't think the existing behavior is the right model. DEPLOY_AGENT in the environment does things we need for a release-style build (compression and maybe more), but the word "deploy" generally means the act of deploying, which is distinct from how you build it.
We are getting near the point where bazel should work without having omnibus control the job. It's time to start replacing communication through environment variables with build-time choices expressed as bazel flags. I could keep deploy_agent today and be back in a month to remove it. If developers want a release build from their desktop, they will use --config=release.
An alternative is to add a //:compression_level integer flag to define how much to compress and add that to the bazelrc file in the release section. That makes things very explicit, but adds another knob for people to tweak and then complain that the caching breaks when they try to adjust it.
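That alternative might look like the following sketch, using bazel_skylib's int_flag; the flag name and default are hypothetical:

```starlark
load("@bazel_skylib//rules:common_settings.bzl", "int_flag")

# 0 means "do not compress" for developer builds; the release config would set
# e.g. --//:compression_level=5 in .bazelrc.
int_flag(
    name = "compression_level",
    build_setting_default = 0,
)
```

As noted above, every distinct value of such a flag creates a distinct configuration, so tweaking it fragments the remote cache.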
Now... if we really wanted to get fancy we could make compression a non-flag build_setting, so that it triggers on release builds, but that is essentially what we are doing here anyway.
I got rid of most of the comment, so it's more accurate about what omnibus does. It's not that important because it's essentially temporary. In a few months all the omnibus history can go away.
Verification
Note the compression level changing.