Perf framework: align with .NET Azure.Test.Perf#7201
Conversation
Brings the C++ perf framework (sdk/core/perf) and Storage Blob perf tests
to format / contract parity with the .NET Azure.Test.Perf reference, which
the cross-language perf-automation pipeline keys off.
Framework (sdk/core/perf)
-------------------------
* New per-op latency collector (latency_stats.{hpp,cpp}) emitting the .NET
8-percentile distribution: 50 / 75 / 90 / 99 / 99.9 / 99.99 / 99.999 /
100 with the exact `=== Latency Distribution ===` header and
`{pct,7:N3}% {ms,8:N2}ms` row format.
* New CPU + memory sampler (process_stats.{hpp,cpp}); the throughput
`Completed N operations ... (Y ops/s, Z s/op, P% CPU)` line now
includes inline CPU like .NET while preserving the existing `(... ops/s`
substring that downstream Cpp.cs regex relies on.
* New result_output.{hpp,cpp}:
- `--results-file` writes `[{ "Time": <ms>, "Size": <bytes> }, ...]`
matching .NET OperationResult schema (PascalCase, Size = -1 when test
has no SizeOptions).
- `--statistics` / `--job-statistics` wraps a `BenchmarkOutput`
envelope between `#StartJobStatistics` and `#EndJobStatistics`
with Metadata before Measurements (key order matches .NET).
- Timestamp emitted at 100-nanosecond (7-digit) resolution like
.NET DateTime.ToString("O").
* New versions.{hpp,cpp} printing a `=== Versions ===` block as the
last thing emitted by the run (matches .NET ordering).
* New options: `--status-interval`, `--results-file`, `--sync`
(all present in .NET PerfOptions).
* New non-breaking CLI aliases matching .NET names:
- `--job-statistics` (bare switch) alongside existing `--statistics <0|1>`
- `--no-cleanup` (bare switch) alongside existing `--noclean <0|1>`
* GTest coverage for latency, process_stats, circular_stream, and
result_output (9 tests, all passing).
Storage Blob perf tests (sdk/storage/azure-storage-blobs/test/perf)
-------------------------------------------------------------------
* New blob-test flags aligning the C++ UploadBlob / DownloadBlob / ListBlob
scenarios with the .NET / Go test surface:
- `--upload-method` (buffer | stream | single)
- `--download-method` (buffer | stream)
- `--block-size`, `--concurrency`, `--num-blobs`, `--page-size`
* Memory-budget guard (memory_budget.hpp) prevents OOM in buffer-mode
tests at multi-GiB payloads.
Verification
------------
Built MinSizeRel on Windows / VS 2026 with vcpkg x64-windows-static
(curl, openssl, gtest). All 9 unit tests pass. A live perf run against
the `euap` storage account using AzureCliCredential produced output
that diffs byte-clean against an equivalent .NET Azure.Storage.Blobs.Perf
run for every contract emitted by this change (latency distribution
header / rows, throughput line shape including `% CPU`,
BenchmarkOutput JSON shape and key order, timestamp precision,
Versions-block ordering, results-file schema).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The `=== Versions ===` block didn't actually mirror .NET's PerfProgram.PrintAssemblyVersions: it printed compiler / __cplusplus strings rather than runtime + Azure assembly versions, no caller ever populated the `injectedVersions` extension point, and the data the perf-automation pipeline consumes (the per-test `VCPKG_*_VERSION` lines and the throughput / latency / BenchmarkOutput contracts) is unaffected. The module produced output that no parser reads and that isn't faithful to the framework it claims parity with, so drop it. Also revert the azure-storage-blobs CHANGELOG entry from the previous commit: the perf harness is internal and isn't customer-facing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR aligns the C++ perf harness (sdk/core/perf) and Storage Blobs perf scenarios with the output/CLI contract expected by cross-language perf automation (modeled after .NET Azure.Test.Perf), including standardized job-statistics/results-file outputs plus CPU/memory and latency reporting.
Changes:
- Add per-operation latency collection with .NET-compatible percentile distribution output and optional per-op results-file JSON (
{Time, Size}). - Add always-on process CPU% and resident-memory sampling, surfacing metrics in live status lines and the final throughput line.
- Update Storage Blobs perf scenarios/options to match cross-language flags (upload/download methods, paging, chunk size, concurrency) and introduce a memory-budget guard header.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/upload_blob_test.hpp | Adds upload-method variants and forwards block-size/concurrency; adds memory-budget guard usage. |
| sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/memory_budget.hpp | New cross-platform “fail fast” memory budget guard for buffer-mode perf tests. |
| sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/list_blob_test.hpp | Adds --num-blobs aliasing and --page-size support for ListBlobs. |
| sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/download_blob_test.hpp | Adds download-method variants, forwarding of block-size/concurrency, and memory-budget guard usage. |
| sdk/storage/azure-storage-blobs/test/perf/CMakeLists.txt | Adds the new memory budget header to the perf test target inputs. |
| sdk/storage/azure-storage-blobs/perf-tests.yml | Extends perf matrix to exercise new upload/download/list flags (streaming, chunk/concurrency, paging). |
| sdk/core/perf/test/src/result_output_test.cpp | New unit tests for --results-file JSON shape and job-statistics envelope ordering. |
| sdk/core/perf/test/src/process_stats_test.cpp | New unit tests for process sampler start/stop/reset. |
| sdk/core/perf/test/src/latency_stats_test.cpp | New unit tests for percentile computation, grouping, reset, and concurrent record. |
| sdk/core/perf/test/CMakeLists.txt | Adds new perf-framework unit test sources to the test target. |
| sdk/core/perf/src/result_output.cpp | Implements results-file writer and #StartJobStatistics/#EndJobStatistics output with .NET-like timestamp formatting. |
| sdk/core/perf/src/program.cpp | Wires in latency collection, process stats sampling, status-interval, results-file emission, and latency distribution printing. |
| sdk/core/perf/src/process_stats.cpp | Implements cross-platform CPU and RSS sampling for Windows/Linux/macOS. |
| sdk/core/perf/src/options.cpp | Adds new global options (--status-interval, --results-file) and bare-switch aliases for .NET parity. |
| sdk/core/perf/src/latency_stats.cpp | Implements latency collector storage and percentile/mean summarization. |
| sdk/core/perf/src/arg_parser.cpp | Parses new global options and bare-switch aliases into GlobalTestOptions. |
| sdk/core/perf/inc/azure/perf/result_output.hpp | Declares RunSummary, OperationResult, and result/job-statistics output helpers. |
| sdk/core/perf/inc/azure/perf/process_stats.hpp | Declares the ProcessStatsSampler API and snapshot types. |
| sdk/core/perf/inc/azure/perf/options.hpp | Extends GlobalTestOptions with status-interval, results-file, and sync flag. |
| sdk/core/perf/inc/azure/perf/latency_stats.hpp | Declares the latency collector and percentile summary types. |
| sdk/core/perf/inc/azure/perf.hpp | Exposes new perf framework components via the top-level convenience header. |
| sdk/core/perf/CMakeLists.txt | Adds new perf framework headers/sources to the azure-perf library build. |
The guard threw a friendly std::runtime_error when `--size x --parallel` would exceed 80%% of system memory, instead of letting buffer-mode allocations OOM-kill the process. Assuming perf runs target hosts with enough memory for the configured size, this is dead defensive code: drop the header and the two `CheckMemoryBudget` calls in upload_blob_test.hpp / download_blob_test.hpp. Oversized buffer-mode runs now fail with std::bad_alloc / OS OOM as they would natively. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses two Copilot review comments on PR Azure#7201 that are real bugs: 1. download_blob_test.hpp: the streaming-download per-op loop declared `uint8_t buffer[1024*1024]` on the stack. On Windows the default thread stack is 1 MiB, so a single call already overflows; high `--parallel` makes it worse. Replace with a function-local `static thread_local std::vector<uint8_t>`: each worker thread allocates the 1 MiB drain buffer once on the heap and reuses it across operations. 2. process_stats.cpp: `ProcessStatsSampler::Reset()` updated members under the mutex but the sampler thread's `Run()` had already cached `previousCpuSeconds` / `previousTime` in locals, so the first sample after reset computed cpuDelta / wall against the pre-reset baseline and reported a wrong CPU%. Reset now stops and restarts the thread, which forces `Run()` to re-read the fresh baselines. Verified: 9/9 perf unit tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace em-dashes with -- in two comments - Add 'perfstress' to cspell dictionary (used in BenchmarkDotNet-compatible metric name to match .NET Azure.Test.Perf output) - Apply clang-format to perf framework files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jalauzon-msft
left a comment
There was a problem hiding this comment.
Not too familiar with C++ so I did not dig into the implementation much, but overall change looks fine. I did leave some comments as this probably contains more than we need/want but if you want to keep everything, that is also probably fine.
…/memory sampler Per @jalauzon-msft review on PR Azure#7201: - Remove --sync (parsed-and-ignored option had no behavior). PerfAutomation is updated separately to set NoSync=true for the Cpp language so it never appends --sync to test arguments. - Remove the --job-statistics bare-switch alias; keep --statistics <0|1> which is what perf-automation actually invokes. - Remove the CPU/memory sampler (ProcessStatsSampler) and the associated ' Memory(MiB)' / '% CPU' columns; perf-automation tracks the process itself, so per-run sampling in C++ added complexity without value. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PerfAutomation appends '--sync' to test runs for sync-only languages. C++ has no async variant, but the driver still passes --sync, so the perf binary must accept it. Register --sync as a bare switch that is parsed and intentionally ignored, with no corresponding Sync field on GlobalTestOptions (so it doesn't show up in the JSON options dump or anywhere else). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Running Azure SDK for C++ Storage Perf TestsPrereqsaz login
$env:AZURE_STORAGE_ACCOUNT_NAME = "<your-storage-account>" # token-credential mode
# or, for accounts that allow shared key:
# $env:STORAGE_CONNECTION_STRING = (az storage account show-connection-string -n <acct> -g <rg> --query connectionString -o tsv)Run all commands from a VS x64 Native Tools prompt (or after Build and runcd azure-sdk-for-cpp
cmake -G Ninja -B build `
-DBUILD_TESTING=ON -DBUILD_PERFORMANCE_TESTS=ON `
-DDISABLE_AZURE_CORE_OPENTELEMETRY=ON -DCMAKE_BUILD_TYPE=MinSizeRel `
-DVCPKG_TARGET_TRIPLET=x64-windows-static
cmake --build build --target azure-storage-blobs-perf
.\build\sdk\storage\azure-storage-blobs\test\perf\azure-storage-blobs-perf.exe `
UploadBlob --size 10240 --parallel 4 --warmup 3 --duration 5 `
--latency 1 --statistics 1 --token-credentialRun with no test name to list available tests ( Useful flags: |
The new --upload-method/--download-method/--block-size/--concurrency/--page-size flags are available on the binaries but should be tuned per host, not baked into the CI matrix. Keep perf-tests.yml at the pre-PR baseline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Brings the C++ perf framework (
sdk/core/perf) and the Storage Blobs perftests to format / contract parity with the .NET
Azure.Test.Perfreference,which the cross-language perf-automation pipeline keys off.
This is purely a perf-harness / engineering-system change. No public SDK
surface is touched and external customers shouldn't notice it — there is
intentionally no customer-facing CHANGELOG entry.
Framework changes (
sdk/core/perf)latency_stats.{hpp,cpp}): per-op collectoremitting the .NET 8-percentile distribution (50 / 75 / 90 / 99 / 99.9 /
99.99 / 99.999 / 100) with the exact
=== Latency Distribution ===header and
{pct,7:N3}% {ms,8:N2}msrow format.process_stats.{hpp,cpp}): the throughputCompleted N operations ... (Y ops/s, Z s/op, P% CPU)line nowincludes inline CPU like .NET, while preserving the existing
(... ops/ssubstring downstreamCpp.csregex relies on.result_output.{hpp,cpp}):--results-filewrites[{ "Time": <ms>, "Size": <bytes> }, ...]matching .NETOperationResultschema (PascalCase,Size = -1when the testhas no
SizeOptions).--statistics/--job-statisticswraps aBenchmarkOutputenvelope between
#StartJobStatisticsand#EndJobStatisticswith
MetadatabeforeMeasurements(key order matches .NET)..NET
DateTime.ToString("O").PerfOptions:--status-interval,--results-file,--sync.--job-statistics(bare switch) alongside existing--statistics <0|1>,and
--no-cleanup(bare switch) alongside existing--noclean <0|1>.Storage Blob perf tests (
sdk/storage/azure-storage-blobs/test/perf)New blob-test flags aligning the C++
UploadBlob/DownloadBlob/ListBlobscenarios with the .NET / Go test surface:--upload-method(buffer|stream|single)--download-method(buffer|stream)--block-size,--concurrency,--num-blobs,--page-sizeA memory-budget guard (
memory_budget.hpp) prevents OOM in buffer-modetests at multi-GiB payloads.
Things deliberately not included
=== Versions ===block. .NET'sPrintAssemblyVersionsprintsruntime + loaded Azure assembly versions; the natural C++ analogue
would be the per-test
VCPKG_*_VERSIONlines that the storage perftest already emits independently. A separate compiler-info module added
no value and isn't faithful to .NET, so it's omitted.
Verification
Built MinSizeRel on Windows / VS 2026 with vcpkg
x64-windows-static(curl, openssl, gtest, opentelemetry-cpp). All 9 perf unit tests pass.
A live perf run against a storage account using
AzureCliCredentialproduced output that diffs byte-clean against an equivalent
Azure.Storage.Blobs.Perf.NET run for every contract emitted by thischange: latency-distribution header / rows, throughput line shape
(including
% CPU),BenchmarkOutputJSON shape and key order,timestamp precision, and results-file schema.