
feat(dogstatsd): Add DogStatsD packet forwarding support#1712

Open
aqian01 wants to merge 17 commits into
mainfrom
andrewq/packet-forwarding-gap

Conversation

@aqian01
Contributor

@aqian01 aqian01 commented May 20, 2026

Summary

  • Add DogStatsD forwarding support for statsd_forward_host and statsd_forward_port.
  • Forward each decoded/framed DogStatsD message over UDP to the configured forwarding target before parsing, filtering, mapping, or aggregation.
  • Support forwarding for messages received over UDP, UDS datagram, UDS stream, and TCP while preserving normal ingestion behavior.
  • Keep forwarding best effort: setup and send failures are logged/tracked through telemetry and do not stop DogStatsD ingestion.
  • Align transient forwarding backpressure by awaiting enqueue of forwarded copies instead of dropping them when the forwarding queue is temporarily full.
  • Use a fixed forwarding queue capacity and support IPv4 and IPv6 forwarding targets without modeling the core Agent packet-buffer/flush pipeline.
  • Promote statsd_forward_host and statsd_forward_port to supported DogStatsD config keys.
  • Keep dogstatsd_packet_buffer_size and dogstatsd_packet_buffer_flush_timeout marked not applicable for ADP because ADP decodes DogStatsD inline.
  • Update DogStatsD docs/config inventory, integration coverage, and correctness coverage for forwarding behavior.
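The backpressure bullet above contrasts waiting for queue space with dropping when the queue is full. A minimal sketch of that difference follows, using a bounded `std::sync::mpsc::sync_channel` as a blocking stand-in for the async queue. The function names and the use of a thread are assumptions for illustration and are not taken from the PR.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Enqueues every message with a blocking `send`, so a full queue slows the
/// producer down instead of losing data. Returns what the consumer received.
fn forward_blocking(messages: Vec<String>, capacity: usize) -> Vec<String> {
    let (tx, rx) = sync_channel::<String>(capacity);
    // Hypothetical consumer standing in for the task that writes to the UDP socket.
    let consumer = thread::spawn(move || rx.into_iter().collect::<Vec<String>>());
    for message in messages {
        tx.send(message).expect("consumer thread exited early");
    }
    drop(tx); // Close the queue so the consumer's iterator ends.
    consumer.join().expect("consumer thread panicked")
}

/// Enqueues with `try_send` while nothing drains the queue, so anything beyond
/// `capacity` is dropped. Returns `(sent, dropped)`.
fn forward_dropping(messages: Vec<String>, capacity: usize) -> (usize, usize) {
    let (tx, _rx) = sync_channel::<String>(capacity);
    let (mut sent, mut dropped) = (0, 0);
    for message in messages {
        match tx.try_send(message) {
            Ok(()) => sent += 1,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => dropped += 1,
        }
    }
    (sent, dropped)
}
```

With a capacity of 2 and five messages, the blocking variant delivers all five in order, while the dropping variant accepts two and drops three when the consumer is stalled.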

Design Notes

This PR intentionally forwards decoded DogStatsD messages rather than preserving core Agent UDP packet-buffer grouping or UDS/TCP stream outer-frame grouping.

For example, a stream frame containing:

metric.one:1|c
metric.two:2|c

is forwarded as two UDP messages:

metric.one:1|c

and:

metric.two:2|c

This follows review feedback to avoid emulating the core Agent ingest -> packet-buffer -> worker pipeline in ADP. ADP still forwards the same DogStatsD messages, but it does not attempt byte-for-byte parity of forwarded UDP packet boundaries.
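The per-message forwarding described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `split_messages` and `forward_frame` are hypothetical names, the frame is assumed to be newline-delimited, and a blocking `std::net::UdpSocket` stands in for the async socket.

```rust
use std::io;
use std::net::UdpSocket;

/// Splits one received frame into individual DogStatsD messages.
/// Empty lines (for example, from a trailing newline) are skipped.
fn split_messages(frame: &[u8]) -> Vec<&[u8]> {
    frame
        .split(|&byte| byte == b'\n')
        .filter(|line| !line.is_empty())
        .collect()
}

/// Sends each message in `frame` as its own UDP datagram on a socket that is
/// already connected to the forwarding target. Returns the datagram count.
fn forward_frame(socket: &UdpSocket, frame: &[u8]) -> io::Result<usize> {
    let messages = split_messages(frame);
    for message in &messages {
        socket.send(message)?;
    }
    Ok(messages.len())
}
```

Under this sketch the original grouping is lost on purpose: the receiver sees one datagram per message regardless of how the sender batched them.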

Testing

  • make fmt
  • cargo test -p saluki-components sources::dogstatsd::tests --lib -- --nocapture
  • cargo test -p saluki-components sources::dogstatsd::framer::tests --lib -- --nocapture
  • cargo test -p saluki-components packet_forwarder --lib -- --nocapture
  • cargo test -p saluki-io deser::framing::tests --lib -- --nocapture
  • cargo test -p panoramic dogstatsd_forwarding --bin panoramic -- --nocapture
  • cargo check --workspace
  • cargo check --workspace --tests
  • git diff --check

Coverage

  • Unit coverage for forwarding one decoded message payload, IPv6 forwarding targets, queue backpressure, send-failure telemetry, and statsd_forward_port enable/disable behavior.
  • Integration coverage for UDP malformed payloads, UDP valid metrics, TCP stream messages, UDS datagram messages, and UDS stream messages.
  • Correctness coverage compares forwarded DogStatsD messages rather than exact UDP packet grouping, so core Agent and ADP can differ in packet boundaries while preserving forwarded message content.
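A comparison that ignores packet boundaries, as the last bullet describes, might look like the sketch below. The helper name `messages_in` is hypothetical, and sorting the messages is an added assumption (that ordering across datagrams is not part of the check); the PR's actual correctness harness may compare differently.

```rust
/// Flattens captured UDP payloads into a sorted list of individual messages,
/// so two captures compare equal when they carry the same messages even if
/// the messages were grouped into packets differently.
fn messages_in(packets: &[&str]) -> Vec<String> {
    let mut messages: Vec<String> = packets
        .iter()
        .flat_map(|packet| packet.lines())
        .filter(|line| !line.is_empty())
        .map(str::to_owned)
        .collect();
    // Assumption: order is not significant for this comparison.
    messages.sort();
    messages
}
```

A capture of one packet holding two newline-separated messages then compares equal to a capture of two single-message packets.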

@dd-octo-sts dd-octo-sts Bot added area/io General I/O and networking. area/components Sources, transforms, and destinations. source/dogstatsd DogStatsD source. area/docs Reference documentation. area/test All things testing: unit/integration, correctness, SMP regression, etc. labels May 20, 2026

Comment thread test/integration/cases/dogstatsd-forwarding/run_forwarding_test.py Fixed
@aqian01 aqian01 changed the title Add DogStatsD packet forwarding support feat(dogstatsd): Add DogStatsD packet forwarding support May 20, 2026
@pr-commenter

pr-commenter Bot commented May 20, 2026

Binary Size Analysis (Agent Data Plane)

Baseline: 8c475ed · Comparison: 8c52510 · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 37.57 MiB (baseline) vs 37.65 MiB (comparison)
Size Change: +83.20 KiB (+0.22%)

✅ Binary size difference within threshold

Changes by Module
| Module | File Size | Symbols |
|---|---|---|
| saluki_components::sources::dogstatsd | +42.23 KiB | 81 |
| tokio | +36.74 KiB | 907 |
| anyhow | -27.58 KiB | 352 |
| serde_with | -21.77 KiB | 22 |
| core | +21.72 KiB | 1448 |
| saluki_components::transforms::dogstatsd_mapper | +16.50 KiB | 6 |
| [sections] | +15.47 KiB | 8 |
| piecemeal | -15.23 KiB | 21 |
| saluki_components::common::datadog | -12.66 KiB | 69 |
| serde_core | +12.65 KiB | 92 |
| &mut serde_json | +12.08 KiB | 24 |
| tracing | +10.30 KiB | 18 |
| saluki_components::destinations::prometheus | -8.73 KiB | 6 |
| datadog_protos::trace_piecemeal_include::datadog | +8.44 KiB | 13 |
| tower | -7.51 KiB | 37 |
| serde_json | -5.90 KiB | 26 |
| hyper | +5.77 KiB | 52 |
| prost | +5.36 KiB | 64 |
| async_compression | -5.09 KiB | 20 |
| serde_path_to_error | -4.29 KiB | 22 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +26.3Ki  [NEW] +26.2Ki    saluki_components::sources::dogstatsd::drive_stream::_{{closure}}::h8f17aa07b8c33197
  [NEW] +23.0Ki  [NEW] +22.7Ki    _<saluki_components::sources::dogstatsd::_::<impl serde_core::de::Deserialize for saluki_components::sources::dogstatsd::DogStatsDConfiguration>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map::h53bcee0caa47ede6
  [NEW] +21.7Ki  [NEW] +21.5Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::hed9234a71ae20cd4
  [NEW] +19.6Ki  [NEW] +19.4Ki    _<saluki_components::sources::dogstatsd::DogStatsDConfiguration as saluki_core::components::sources::builder::SourceBuilder>::build::_{{closure}}::h7209f3c188695118
  +0.2% +16.9Ki  +0.2% +10.7Ki    [7628 Others]
  [NEW] +15.0Ki  [NEW] +14.9Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h6948f527d7ba396f
  [NEW] +14.0Ki  [NEW] +13.9Ki    h2::server::Connection<T,B>::poll_closed::h83203de9713e50cd
  [NEW] +13.9Ki  [NEW] +13.7Ki    _<tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h475ea867f63df1d4
  [NEW] +13.0Ki  [NEW] +12.9Ki    h2::server::Connection<T,B>::poll_closed::h5b7f429aa9aaf99d
  [NEW] +12.9Ki  [NEW] +12.8Ki    h2::server::Connection<T,B>::poll_closed::h2d88ac9d7f1e941c
  [NEW] +9.31Ki  [NEW] +9.22Ki    saluki_components::sources::dogstatsd::handle_frame::hb052a5b34e96aed0
  [NEW] +8.38Ki  [NEW] +8.14Ki    saluki_components::transforms::dogstatsd_mapper::_::_<impl serde_core::de::Deserialize for saluki_components::transforms::dogstatsd_mapper::MetricMappingConfig>::deserialize::h0d673f2a8506914a
  [NEW] +7.18Ki  [NEW] +7.02Ki    saluki_components::sources::dogstatsd::ConnectedPacketForwarder::connect_from_bind_addr::_{{closure}}::h8add69824fa18976
 -77.0% -8.34Ki -78.7% -8.34Ki    _<saluki_components::destinations::prometheus::PrometheusConfiguration as saluki_core::components::destinations::builder::DestinationBuilder>::build::_{{closure}}::h300aa1a82bbc3ff2
  [DEL] -9.06Ki  [DEL] -8.89Ki    _<serde_with::content::de::ContentDeserializer<E> as serde_core::de::Deserializer>::deserialize_struct::hde8238dcbadc320b
  [DEL] -9.21Ki  [DEL] -9.11Ki    saluki_components::sources::dogstatsd::handle_frame::h7b5495783f91591d
  [DEL] -11.2Ki  [DEL] -11.0Ki    _<serde_with::content::de::ContentDeserializer<E> as serde_core::de::Deserializer>::deserialize_struct::he89178a22575a0cf
  [DEL] -13.7Ki  [DEL] -13.6Ki    _<tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h0f03995fada720d8
 -96.3% -21.3Ki -97.0% -21.3Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_any::hb80a4a8889e8a2e3
  [DEL] -21.6Ki  [DEL] -21.4Ki    _<saluki_components::sources::dogstatsd::DogStatsDConfiguration as saluki_core::components::sources::builder::SourceBuilder>::build::_{{closure}}::h41bfdebefd5553b1
  [DEL] -23.4Ki  [DEL] -23.2Ki    saluki_components::sources::dogstatsd::drive_stream::_{{closure}}::h3be9f0885f90b15b
  +0.2% +83.2Ki  +0.2% +76.1Ki    TOTAL

@pr-commenter

pr-commenter Bot commented May 20, 2026

Regression Detector (Agent Data Plane)

Run ID: 38eba568-abde-447a-aa95-e4d811314b63
Baseline: 8c475ed7 · Comparison: 8c52510e · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (35)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % | links |
|---|---|---|---|
| dsd_uds_512kb_3k_contexts_cpu (erratic) | cpu | ⚪ +7.57 | metrics profiles logs |
| otlp_ingest_logs_5mb_memory (ignored) | memory | ⚪ +4.63 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_throughput | throughput | ⚪ -1.93 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_cpu (erratic) | cpu | ⚪ +1.01 | metrics profiles logs |
| otlp_ingest_traces_5mb_cpu (erratic) | cpu | ⚪ +0.92 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.78 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_memory | memory | ⚪ +0.67 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_memory | memory | ⚪ +0.53 | metrics profiles logs |
| otlp_ingest_traces_5mb_memory | memory | ⚪ +0.45 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_memory | memory | ⚪ +0.45 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_cpu (erratic) | cpu | ⚪ +0.39 | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ +0.36 | metrics profiles logs |
| quality_gates_rss_dsd_heavy | memory | ⚪ +0.36 | metrics profiles logs |
| otlp_ingest_logs_5mb_cpu (ignored) | cpu | ⚪ +0.30 | metrics profiles logs |
| otlp_ingest_traces_5mb_throughput | throughput | ⚪ -0.28 | metrics profiles logs |
| quality_gates_rss_idle | memory | ⚪ +0.20 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_memory | memory | ⚪ +0.14 | metrics profiles logs |
| quality_gates_rss_dsd_low | memory | ⚪ +0.11 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_throughput | throughput | ⚪ -0.10 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_memory | memory | ⚪ +0.07 | metrics profiles logs |
| otlp_ingest_metrics_5mb_throughput | throughput | ⚪ -0.04 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_memory | memory | ⚪ +0.03 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_throughput | throughput | ⚪ -0.01 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_throughput | throughput | ⚪ -0.01 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_throughput | throughput | ⚪ -0.01 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_throughput | throughput | ⚪ -0.00 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_throughput | throughput | ⚪ +0.01 | metrics profiles logs |
| otlp_ingest_logs_5mb_throughput (ignored) | throughput | ⚪ +0.04 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_memory | memory | ⚪ -0.09 | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory | ⚪ -0.23 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic) | cpu | ⚪ -0.80 | metrics profiles logs |
| otlp_ingest_metrics_5mb_cpu (erratic) | cpu | ⚪ -0.85 | metrics profiles logs |
| otlp_ingest_metrics_5mb_memory | memory | ⚪ -1.40 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_cpu (erratic) | cpu | ⚪ -2.70 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_cpu (erratic) | cpu | ⚪ -2.97 | metrics profiles logs |

Bounds Checks: ✅ Passed (5)
| experiment | check | replicates | observed | links |
|---|---|---|---|---|
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 ✅ | 122 MiB ≤ 140 MiB | metrics profiles logs |
| quality_gates_rss_dsd_low | memory_usage | 10/10 ✅ | 39.6 MiB ≤ 50 MiB | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 ✅ | 61.5 MiB ≤ 75 MiB | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 ✅ | 178 MiB ≤ 200 MiB | metrics profiles logs |
| quality_gates_rss_idle | memory_usage | 10/10 ✅ | 26.7 MiB ≤ 40 MiB | metrics profiles logs |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.

@aqian01 aqian01 marked this pull request as ready for review May 21, 2026 16:51
@aqian01 aqian01 requested a review from a team as a code owner May 21, 2026 16:51
@aqian01 aqian01 marked this pull request as draft May 21, 2026 16:55

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da487305bd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/saluki-components/src/sources/dogstatsd/mod.rs Outdated
@aqian01 aqian01 marked this pull request as ready for review May 21, 2026 17:31

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d41a26b2f


Comment on lines +1633 to +1636
let mut packet_forward_flush = interval(packet_forwarder.as_ref().map_or(
    DEFAULT_DOGSTATSD_PACKET_BUFFER_FLUSH_TIMEOUT,
    PacketForwarder::udp_batch_flush_timeout,
));

P1 Badge Guard zero flush timeout before creating interval

dogstatsd_packet_buffer_flush_timeout is deserialized via DurationString (which accepts 0), but this value is passed directly into tokio::time::interval; Tokio panics when the interval period is zero. With forwarding enabled and dogstatsd_packet_buffer_flush_timeout: 0, the DogStatsD handler task will panic at startup, so forwarding and ingestion on that listener can stop unexpectedly instead of handling the config safely.
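One way to address the panic described above is to normalize the configured value before it reaches `tokio::time::interval`. The sketch below is an assumption about how such a guard could look, not the fix adopted in the PR; `nonzero_period` is a hypothetical helper, and the fallback is passed in rather than hard-coded because the actual default value is not shown here.

```rust
use std::time::Duration;

/// Returns a period that is safe to pass to an interval timer.
/// A configured value of zero falls back to `default`, which the caller
/// must itself choose to be non-zero.
fn nonzero_period(configured: Duration, default: Duration) -> Duration {
    if configured.is_zero() {
        default
    } else {
        configured
    }
}
```

An alternative with different semantics would be to treat zero as "flush immediately" and skip the interval entirely; which behavior matches the core Agent is not stated in this thread.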

Member

@tobz tobz left a comment


I left some specific feedback, but overall, I want to simplify this design: just forward the individual metrics packets (don't differentiate between the outer UDS frame and inner newline-delimited frames, and remove the code that captures/holds on to outer frames in NestedFramer) and avoid all of the "packet buffer" stuff.

My reasoning here is:

  • we're still sending all of the same metrics in the end, but we skip needing to handle nested vs unnested frames
  • the packet buffer infrastructure on the Agent side is really a consequence of the ingest -> worker model, so modeling it here isn't something we really need to do
  • additionally, modeling the packet buffer infrastructure means time-based non-determinism (due to the flushing support), which is harder to test for correctness than if all we do is blast out the individual metric packets in real time

///
/// Defaults to 0.
#[serde(rename = "statsd_forward_port", default = "default_statsd_forward_port")]
statsd_forward_port: i64,
Member


We should make this u16. That's the only thing a valid port can be.
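To illustrate the point, the sketch below shows the range check that an `i64` field forces on the code and that a `u16` field would make unnecessary, since out-of-range values would then be rejected when the config is deserialized. `forward_port` is a hypothetical helper, and treating port 0 as "forwarding disabled" is an assumption based on the documented default of 0.

```rust
/// Converts a raw configured port into a typed setting.
/// `Ok(None)` means forwarding is disabled (assumed meaning of port 0),
/// `Ok(Some(port))` is a usable port, and `Err` is an out-of-range value.
fn forward_port(raw: i64) -> Result<Option<u16>, String> {
    match u16::try_from(raw) {
        Ok(0) => Ok(None),
        Ok(port) => Ok(Some(port)),
        Err(_) => Err(format!("statsd_forward_port out of range: {}", raw)),
    }
}
```

With the field declared as `u16`, only the `0` versus non-zero distinction remains to be handled at runtime.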

Comment on lines +1213 to +1219
warn!(
    port = config.statsd_forward_port,
    min = MIN_STATSD_FORWARD_PORT,
    max = MAX_STATSD_FORWARD_PORT,
    "Invalid statsd forward port. Packet forwarding disabled."
);
return None;
Member


Is this how it works in the Core Agent? If the port is invalid, it just ignores it and packet forwarding is disabled?

async fn connect(&self) {
    let host = &self.target_host;
    let port = self.target_port;
    let targets = match timeout(FORWARDER_RESOLVE_TIMEOUT, resolve_forward_targets(host, port)).await {
Member


We don't actually need to do this separately. tokio::net::UdpSocket::connect already takes the address parameter as anything that implements ToSocketAddrs, which includes (&str, u16).

Using that instead (still coupled with the timeout, though) lets us get rid of a ton of the code here.
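The same idea can be shown with the standard library's blocking socket, whose `connect` also accepts any `ToSocketAddrs`, including a `(&str, u16)` tuple. This is a sketch only: `connect_forwarder` is a hypothetical name, the timeout wrapper mentioned above is omitted, and the IPv4 wildcard bind address is an assumption that would need to change for IPv6 targets.

```rust
use std::io;
use std::net::UdpSocket;

/// Creates a UDP socket connected to `host:port`.
/// Name resolution happens inside `connect`, so no separate resolve step is needed.
fn connect_forwarder(host: &str, port: u16) -> io::Result<UdpSocket> {
    // Assumption: an IPv4 wildcard bind. An IPv6 target would need "[::]:0" instead.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.connect((host, port))?;
    Ok(socket)
}
```

After `connect`, the socket can use `send` without naming the destination on every call, which fits the one-datagram-per-message forwarding path.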

@dd-octo-sts dd-octo-sts Bot removed the area/io General I/O and networking. label May 22, 2026
for path in paths:
    try:
        os.remove(path)
    except FileNotFoundError:

Labels

area/components Sources, transforms, and destinations. area/docs Reference documentation. area/test All things testing: unit/integration, correctness, SMP regression, etc. source/dogstatsd DogStatsD source.


2 participants