SAI API Performance Monitoring #2279
Signed-off-by: JaiOCP <jai.kumar@broadcom.com>
rck-innovium left a comment:
While most of the measurements can be done at the application level, this proposal provides a way to measure metrics per object operation inside bulk APIs, which application-level performance monitoring cannot do.
As discussed, the community concluded that we should not preserve this perfmon data across warmboot (especially since we thought it does not make sense for warm upgrades/downgrades).
@JaiOCP, could you address the comments? Thanks.
Signed-off-by: JaiOCP <jai.kumar@broadcom.com>
Review comments addressed. Please take a look @j-bos @rck-innovium
Comments addressed. Please review.
Question: Per-object latency with aggregated reads across variable batch sizes

The spec describes PERFDATA as clear-on-read, with AVG_LATENCY computed across multiple API invocations between reads. In a typical route convergence scenario, orchagent may issue several bulk_create calls with varying batch sizes (e.g., 50, 3000, 200) before reading PERFDATA. The returned average latency is per-call, but each call processes a different number of objects. Without knowing the total object count across those calls, the consumer cannot derive per-object latency:

The previous revision (#2265) addressed this with
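To make the concern above concrete, here is a minimal sketch in C of a clear-on-read accumulator. Everything in it is an assumption for illustration: the `demo_` types and functions are hypothetical and are not the SAI PERFDATA layout, the `object_count` field is simply one way a total object count could be exposed, and the latency values used in the usage note are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator kept by an adapter between reads.
 * Not the SAI PERFDATA layout; names are for illustration only. */
typedef struct {
    uint64_t total_latency_ns; /* sum of per-call latencies since last read */
    uint64_t call_count;       /* API invocations since last read */
    uint64_t object_count;     /* objects processed since last read (assumed field) */
} demo_perf_acc_t;

/* What a consumer would see on a read in this sketch. */
typedef struct {
    uint64_t avg_latency_per_call_ns;
    uint64_t call_count;
    uint64_t object_count;
} demo_perf_sample_t;

/* Record one bulk call that processed `objects` objects in `latency_ns`. */
static void demo_perf_record(demo_perf_acc_t *acc, uint64_t latency_ns,
                             uint64_t objects)
{
    acc->total_latency_ns += latency_ns;
    acc->call_count += 1;
    acc->object_count += objects;
}

/* Clear-on-read: return the aggregate, then reset the accumulator. */
static demo_perf_sample_t demo_perf_read(demo_perf_acc_t *acc)
{
    demo_perf_sample_t s = {0, acc->call_count, acc->object_count};
    if (acc->call_count > 0) {
        s.avg_latency_per_call_ns = acc->total_latency_ns / acc->call_count;
    }
    *acc = (demo_perf_acc_t){0, 0, 0};
    return s;
}

/* Per-object latency is only derivable when the object count is known.
 * avg * calls reconstructs the total up to integer truncation of the average. */
static uint64_t demo_per_object_latency_ns(const demo_perf_sample_t *s)
{
    if (s->object_count == 0) {
        return 0;
    }
    return (s->avg_latency_per_call_ns * s->call_count) / s->object_count;
}
```

With three recorded calls of 50, 3000, and 200 objects taking an invented 600,000 ns, 36,000,000 ns, and 2,400,000 ns, a read in this sketch returns a per-call average of 13,000,000 ns over 3 calls. Only with the total of 3,250 objects can the consumer recover 12,000 ns per object; the same per-call average over three calls of 100 objects each would instead imply 130,000 ns per object.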
/**
 * @brief SAI Performance Monitoring API set
 */
I would assume that create_bulk route is one of the heaviest APIs to call, and I would guess that other bulk APIs may be faster. Reading the performance data should also be fast; internally it should just be reading/copying a table.
Hi Deepak, as we discussed, a computation done this way would be an incorrect implementation in the SAI adapter.
@rck-innovium @j-bos Please approve the PR.
```
/*
 * Configure CSIG Compact Tag for ABW signal processing and time interval of 256 micro seconds
```
#### 4.3.3 Perfmon Object Switch Binding
List of perfmon objects can be bound to the switch object. This binding can be done as a SET operation when perfmon object is created.
Section should be updated for latest changes -- may be good just to do an AI pass to update the .md for latest code changes and fix some typos (like the SAI_OBJECT_TYPE_PERFMO$ truncation error below).
Good idea. I was using Cursor, and it doesn't do text validation.
Fixed the section.
Signed-off-by: JaiOCP <jai.kumar@broadcom.com>
Hi, is this PR ready for merge? Thanks.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@JaiOCP - please squash your commits and force push |
Signed-off-by: JaiOCP <jai.kumar@broadcom.com>

SAI API performance monitoring

Signed-off-by: JaiOCP <jai.kumar@broadcom.com>

Fix gensairpc.pl crash on Doxygen 1.9.8+ by reusing NeedsTwoPassProcessing (opencomputeproject#2282)

Why: To fix below build error

Uncaught exception from user code:
at gensairpc.pl line 480.
main::assign_attr_types(HASH(0x55e190dc20c8), ARRAY(0x55e190d2d080)) called at gensairpc.pl line 434
main::get_definitions() called at gensairpc.pl line 156
main::assign_attr_types(HASH(0x55e190dc20c8), ARRAY(0x55e190d2d080)) called at gensairpc.pl line 434
main::get_definitions() called at gensairpc.pl line 156

How: gensairpc.pl crashed during SAI thrift build with an uncaught exception at line 480 (assign_attr_types) because its inline Doxygen layout detection was too weak - it only checked sai_8h.xml for any enumvalue presence, missing cases where the new Doxygen 1.9.8+ XML structure requires group__*.xml files to be processed for complete definitions. This caused incomplete parsing, leading to missing types and a croak in assign_attr_types when sai_attribute_value_t could not be found.

Changes:
- xmlutils.pm: Add NeedsTwoPassProcessing and export it.
- parse.pl: Remove local NeedsTwoPassProcessing; use imported version.
- gensairpc.pl: Replace inline detection with NeedsTwoPassProcessing() call, fixing the build failure and eliminating code duplication.

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

Count BFD session state changes from UP to DOWN (opencomputeproject#2268)

Signed-off-by: Chikkegowda Chikkaiah <cchikkai@cisco.com>

HW FRR switchover notification support for protection groups (opencomputeproject#2269)

Signed-off-by: Chikkegowda Chikkaiah <cchikkai@cisco.com>

Port storm control enhancemnets (opencomputeproject#2258) (opencomputeproject#2258)

Signed-off-by: rpmarvell <rperumal@marvell.com>
#2287
This PR brings in support for measuring SAI API performance. It is based on a presentation given at OCP 2023.