Openmetric stats improvements#617
Conversation
📝 WalkthroughWalkthroughThis PR changes the IFACE_STATS helper macros to add an accumulator/context parameter and updates datapath nodes to call IFACE_STATS_VARS/INC/FLUSH with a per-instance context. It also enriches metrics: iface collection gains link-speed gauge, MAC and VRF-name labels; VLAN interfaces emit a parent label; port collection adds rx/tx error counters and emits PMD xstats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3b59ed7 to
509ffeb
Compare
| // the metric. 0 means unknown / link down. | ||
| metric_emit(&ctx, &m_speed_bps, (uint64_t)iface->speed * 1000000ULL); |
There was a problem hiding this comment.
To be checked, but I think "unknown" is UINT32_MAX.
| METRIC_GAUGE(m_rxq_size, "iface_port_rxq_size", "Number of descriptors in RX queues."); | ||
| METRIC_GAUGE(m_txq_size, "iface_port_txq_size", "Number of descriptors in TX queues."); | ||
| METRIC_COUNTER(m_rx_missed, "iface_port_rx_missed", "Number of packets dropped by HW."); | ||
| METRIC_COUNTER(m_rx_errors, "iface_port_rx_errors", "Number of RX packets with errors."); |
There was a problem hiding this comment.
Perhaps also add iface_port_rx_nobuf --> stats.rx_nombuf
| const struct iface_info_vlan *vlan = iface_info_vlan(iface); | ||
| const struct iface *parent = iface_from_id(vlan->parent_id); | ||
|
|
||
| metrics_labels_add(ctx, "parent", parent ? parent->name : "[deleted]", NULL); |
There was a problem hiding this comment.
Maybe add a vlan-specific metric for VLAN ID.
|
|
||
| RTE_INIT(metrics_process_init) { | ||
| struct timespec ts = {0}; | ||
| clock_gettime(CLOCK_REALTIME, &ts); |
509ffeb to
a8ba60d
Compare
| METRIC_COUNTER(m_rx_missed, "iface_port_rx_missed", "Number of packets dropped by HW."); | ||
| METRIC_COUNTER(m_rx_errors, "iface_port_rx_errors", "Number of RX packets with errors."); | ||
| METRIC_COUNTER(m_tx_errors, "iface_port_tx_errors", "Number of TX failures."); | ||
| METRIC_COUNTER(m_port_rx_bytes, "iface_port_rx_bytes", "Number of bytes received by HW."); |
There was a problem hiding this comment.
This is already covered by the generic software interface stats.
Add the interface id and primary MAC address as labels on the per-iface metrics. The id label allows a metric series to be correlated with the interface even when its name changes. The mac label exposes the L2 address alongside the other interface attributes already published. This keeps the OpenMetrics endpoint self-sufficient: monitoring consumers no longer need a side request to associate a series with its underlying interface id or hardware address. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_speed_bps as a gauge metric reporting the current link speed in bits per second. The value is derived from gr_iface.speed (which is stored in Megabit/sec) multiplied by 1e6. A value of 0 indicates the speed is unknown or the link is down. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
port_metrics_collect already exposes iface_port_rx_missed (imissed) and iface_port_tx_errors (oerrors). Add iface_port_rx_errors to also surface ierrors from rte_eth_stats. These hardware counters are port-only by nature: drops and bad frames happen on the NIC before any sub-interface demux (VLAN, tunnel) can attribute them, so the iface_port_ namespace is the correct home rather than a generic iface_ metric with a 0 fallback for non-ports. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
The vrf label on per-iface metrics is currently the numeric vrf_id
formatted as a string ("1", "2", ...), whereas the domain label
emitted in the non-VRF branch is already resolved to the parent
interface name. Align the two by resolving vrf_id to the VRF iface
and emitting its name.
Before:
grout_iface_up{name="te9.835",mode="VRF",vrf="1",...} 1
After:
grout_iface_up{name="te9.835",mode="VRF",vrf="main",...} 1
Now that VRFs are first-class iface objects with a stable name, the
numeric id is mostly an internal allocation artefact and the name is
what an operator or monitoring consumer expects to see.
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a vlan-specific metrics_collect callback that attaches the parent interface name as a label on every VLAN metric. This lets consumers reconstruct the iface stack from a single OpenMetrics scrape without keeping the iface_metrics_collect dispatcher aware of VLAN internals. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
The IFACE_STATS_VARS / IFACE_STATS_INC / IFACE_STATS_FLUSH macros used a single dir parameter to derive both the temporary variable names (rx_packets, rx_bytes, ...) and the struct iface_stats field names. The two roles were tied together, which prevents declaring more than one accumulator per direction in the same scope. Add a second acc parameter that names the accumulator. The local temporaries are now derived from dir##_##acc, while the struct iface_stats fields stay tied to dir. Existing call sites become IFACE_STATS_VARS(rx, self) and IFACE_STATS_VARS(tx, self), which expand to the same code as before. No behavior change. This prepares for call sites that need to tally a packet against two ifaces in one pass. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
When a VLAN-tagged packet arrives on a port, iface_input.c reassigns d->iface to the matched VLAN sub-iface before calling IFACE_STATS_INC, which means the parent port's iface_stats.rx_packets / rx_bytes stay at zero for all VLAN-tagged traffic. The TX path in iface_output.c has the symmetric behavior on packets emitted via a VLAN sub-iface. This diverges from Linux which double-counts: ip -s link on the physical netdev shows all wire traffic, while ip -s link on a VLAN sub-iface shows the per-VLAN subset. The current grout behavior results in grcli interface stats and the OpenMetrics scrape showing a port with apparently zero traffic when all of it is VLAN-tagged. Match the Linux behavior: on RX, increment the parent port's stats in addition to the destination VLAN sub-iface; on TX, increment the parent port's stats in addition to the source VLAN sub-iface. A second batched accumulator (parent) is declared next to the existing self accumulator to preserve per-iface batching across the burst. The hot path is unchanged when no VLAN demux is involved. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a single iface_port_xstat counter metric whose xstat label carries the driver-native xstat name. Emitting the full set of xstats verbatim keeps the dataplane free of differential calculations and per-PMD alias tables; consumers that want canonical leaves (broadcast, multicast, discards, ...) match by the xstat label, which on dpaa2 follows the DPNI counter names (ingress_broadcast_frames, egress_discarded_frames, ingress_nobuffer_discards, ...). Per-port cardinality is bounded by what the PMD exposes (around 60 samples per port on net_dpaa2). Each scrape rebuilds the label string in place to avoid one ctx_init per xstat. Only physical ports collect xstats; VLAN sub-ifaces, VRFs, bonds and bridges have no PMD xstats and produce no iface_port_xstat series. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
a8ba60d to
cb02028
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modules/infra/control/vlan.c`:
- Around line 232-237: vlan_metrics_collect currently only calls
metrics_labels_add("parent", ...) so the parent label is never attached to any
emitted samples; either emit a VLAN-specific metric here (call metric_emit from
vlan_metrics_collect after adding the label, e.g., emit a VLAN-only sample like
"iface_vlan_parent" or similar) or move the metrics_labels_add call into the
caller before the generic metric_emit sequence so the label is applied to the
generic iface metrics; update vlan_metrics_collect (and/or the caller that runs
generic metric_emit) to use metric_emit with the new label so the parent appears
in exported metrics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2c5332f9-192c-46fe-982a-39b55ca9bdba
📒 Files selected for processing (11)
modules/infra/api/iface.cmodules/infra/control/port.cmodules/infra/control/vlan.cmodules/infra/datapath/bond_output.cmodules/infra/datapath/iface_input.cmodules/infra/datapath/iface_output.cmodules/infra/datapath/rxtx.hmodules/infra/datapath/xconnect.cmodules/infra/datapath/xvrf.cmodules/ipip/datapath_in.cmodules/ipip/datapath_out.c
| static void vlan_metrics_collect(struct metrics_ctx *ctx, const struct iface *iface) { | ||
| const struct iface_info_vlan *vlan = iface_info_vlan(iface); | ||
| const struct iface *parent = iface_from_id(vlan->parent_id); | ||
|
|
||
| metrics_labels_add(ctx, "parent", parent ? parent->name : "[deleted]", NULL); | ||
| } |
There was a problem hiding this comment.
Parent label is currently a no-op and never appears in exported metrics.
At Line 232-237, vlan_metrics_collect() only appends "parent" to the context but does not emit a metric. In modules/infra/api/iface.c, generic iface metrics are already emitted before the type callback is invoked (Line 286-319 vs Line 321-322), so this label is never attached to any sample.
Please either emit a VLAN-specific metric here or move the type-specific label augmentation before generic metric_emit() calls.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/infra/control/vlan.c` around lines 232 - 237, vlan_metrics_collect
currently only calls metrics_labels_add("parent", ...) so the parent label is
never attached to any emitted samples; either emit a VLAN-specific metric here
(call metric_emit from vlan_metrics_collect after adding the label, e.g., emit a
VLAN-only sample like "iface_vlan_parent" or similar) or move the
metrics_labels_add call into the caller before the generic metric_emit sequence
so the label is applied to the generic iface metrics; update
vlan_metrics_collect (and/or the caller that runs generic metric_emit) to use
metric_emit with the new label so the parent appears in exported metrics.
modules/infra/datapath/rxtx.h
Datapath nodes updated to use the new IFACE_STATS API (pass explicit acc/context):
Behavior-preserving notes