
Perf framework: align with .NET Azure.Test.Perf #7201

Merged
Jinming-Hu merged 8 commits into Azure:main from Jinming-Hu:jinmhu/perf-framework-dotnet-parity on Jun 30, 2026

@Jinming-Hu

Member

Summary

Brings the C++ perf framework (sdk/core/perf) and the Storage Blobs perf
tests to format / contract parity with the .NET Azure.Test.Perf reference,
which the cross-language perf-automation pipeline keys off.

This is purely a perf-harness / engineering-system change. No public SDK
surface is touched and external customers shouldn't notice it — there is
intentionally no customer-facing CHANGELOG entry.

Framework changes (sdk/core/perf)

  • Latency distribution (latency_stats.{hpp,cpp}): per-op collector
    emitting the .NET 8-percentile distribution (50 / 75 / 90 / 99 / 99.9 /
    99.99 / 99.999 / 100) with the exact === Latency Distribution ===
    header and {pct,7:N3}% {ms,8:N2}ms row format.
  • CPU + memory sampler (process_stats.{hpp,cpp}): the throughput
    Completed N operations ... (Y ops/s, Z s/op, P% CPU) line now
    includes inline CPU like .NET, while preserving the existing
    (... ops/s substring that the downstream Cpp.cs regex relies on.
  • Result output (result_output.{hpp,cpp}):
    • --results-file writes
      [{ "Time": <ms>, "Size": <bytes> }, ...] matching .NET
      OperationResult schema (PascalCase, Size = -1 when the test
      has no SizeOptions).
    • --statistics / --job-statistics wraps a BenchmarkOutput
      envelope between #StartJobStatistics and #EndJobStatistics
      with Metadata before Measurements (key order matches .NET).
    • Timestamps emitted at 100-nanosecond (7-digit) resolution like
      .NET DateTime.ToString("O").
  • New options matching .NET PerfOptions:
    --status-interval, --results-file, --sync.
  • Non-breaking CLI aliases matching .NET names:
    --job-statistics (bare switch) alongside existing --statistics <0|1>,
    and --no-cleanup (bare switch) alongside existing --noclean <0|1>.
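
As an illustration of the latency-distribution contract in the list above, here is a minimal sketch of how the eight rows could be produced. It is not the code from latency_stats.cpp: the function names are hypothetical, nearest-rank selection is an assumption (the PR does not state the interpolation rule), the three-space gap between columns follows the row format quoted in the commit message, and printf-style formatting does not reproduce the group separators that .NET's N2 would add for values of 1,000 ms or more.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: nearest-rank percentile over an ascending-sorted sample.
// The rule the real collector uses is an assumption here.
double PercentileMs(const std::vector<double>& sortedMs, double pct)
{
  assert(!sortedMs.empty() && pct > 0.0 && pct <= 100.0);
  const double rank = std::ceil(pct / 100.0 * static_cast<double>(sortedMs.size()));
  const std::size_t index = static_cast<std::size_t>(std::max(rank, 1.0)) - 1;
  return sortedMs[std::min(index, sortedMs.size() - 1)];
}

// Hypothetical formatter: header line followed by one row per percentile.
std::string FormatLatencyDistribution(std::vector<double> latenciesMs)
{
  static const double percentiles[] = {50, 75, 90, 99, 99.9, 99.99, 99.999, 100};
  std::sort(latenciesMs.begin(), latenciesMs.end());

  std::string out = "=== Latency Distribution ===\n";
  for (double pct : percentiles)
  {
    char row[64];
    // Approximates {pct,7:N3}%   {ms,8:N2}ms: right-aligned widths 7 and 8.
    std::snprintf(
        row, sizeof(row), "%7.3f%%   %8.2fms\n", pct, PercentileMs(latenciesMs, pct));
    out += row;
  }
  return out;
}
```

Taking the sample by value lets the formatter sort its own copy, so a caller that is still recording latencies keeps its original ordering.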

Storage Blob perf tests (sdk/storage/azure-storage-blobs/test/perf)

New blob-test flags aligning the C++ UploadBlob / DownloadBlob /
ListBlob scenarios with the .NET / Go test surface:

  • --upload-method (buffer | stream | single)
  • --download-method (buffer | stream)
  • --block-size, --concurrency, --num-blobs, --page-size

A memory-budget guard (memory_budget.hpp) prevents OOM in buffer-mode
tests at multi-GiB payloads.
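
A rough sketch of what such a guard might look like follows. The name CheckMemoryBudget and the 80% threshold come from a later commit message in this PR; the signature is hypothetical, and total physical memory is passed in as a parameter so the sketch avoids platform-specific queries that the real header would need.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical signature; the real memory_budget.hpp API is not shown in this PR.
// Throws if parallel buffers of sizeBytes each would exceed 80% of physical memory.
void CheckMemoryBudget(
    std::uint64_t sizeBytes,
    std::uint64_t parallel,
    std::uint64_t totalPhysicalBytes)
{
  assert(totalPhysicalBytes > 0);
  const std::uint64_t budget = totalPhysicalBytes / 10 * 8;

  // Overflow-safe equivalent of: sizeBytes * parallel > budget.
  if (parallel != 0 && sizeBytes > budget / parallel)
  {
    throw std::runtime_error(
        "Buffer-mode test needs " + std::to_string(parallel) + " x "
        + std::to_string(sizeBytes) + " bytes, which exceeds the budget of "
        + std::to_string(budget) + " bytes");
  }
}
```

Comparing against budget / parallel rather than multiplying avoids 64-bit overflow when both --size and --parallel are large.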

Things deliberately not included

  • No === Versions === block. .NET's PrintAssemblyVersions prints
    runtime + loaded Azure assembly versions; the natural C++ analogue
    would be the per-test VCPKG_*_VERSION lines that the storage perf
    test already emits independently. A separate compiler-info module added
    no value and isn't faithful to .NET, so it's omitted.
  • No CHANGELOG entry — perf harness is internal.

Verification

Built MinSizeRel on Windows / VS 2026 with vcpkg x64-windows-static
(curl, openssl, gtest, opentelemetry-cpp). All 9 perf unit tests pass.
A live perf run against a storage account using AzureCliCredential
produced output that diffs byte-clean against an equivalent
Azure.Storage.Blobs.Perf .NET run for every contract emitted by this
change: latency-distribution header / rows, throughput line shape
(including % CPU), BenchmarkOutput JSON shape and key order,
timestamp precision, and results-file schema.

Jinming-Hu and others added 2 commits June 27, 2026 23:21
Brings the C++ perf framework (sdk/core/perf) and Storage Blob perf tests
to format / contract parity with the .NET Azure.Test.Perf reference, which
the cross-language perf-automation pipeline keys off.

Framework (sdk/core/perf)
-------------------------
* New per-op latency collector (latency_stats.{hpp,cpp}) emitting the .NET
  8-percentile distribution: 50 / 75 / 90 / 99 / 99.9 / 99.99 / 99.999 /
  100 with the exact `=== Latency Distribution ===` header and
  `{pct,7:N3}%   {ms,8:N2}ms` row format.
* New CPU + memory sampler (process_stats.{hpp,cpp}); the throughput
  `Completed N operations ... (Y ops/s, Z s/op, P% CPU)` line now
  includes inline CPU like .NET while preserving the existing `(... ops/s`
  substring that downstream Cpp.cs regex relies on.
* New result_output.{hpp,cpp}:
   - `--results-file` writes `[{ "Time": <ms>, "Size": <bytes> }, ...]`
     matching .NET OperationResult schema (PascalCase, Size = -1 when test
     has no SizeOptions).
   - `--statistics` / `--job-statistics` wraps a `BenchmarkOutput`
     envelope between `#StartJobStatistics` and `#EndJobStatistics`
     with Metadata before Measurements (key order matches .NET).
   - Timestamp emitted at 100-nanosecond (7-digit) resolution like
     .NET DateTime.ToString("O").
* New versions.{hpp,cpp} printing a `=== Versions ===` block as the
  last thing emitted by the run (matches .NET ordering).
* New options: `--status-interval`, `--results-file`, `--sync`
  (all present in .NET PerfOptions).
* New non-breaking CLI aliases matching .NET names:
   - `--job-statistics` (bare switch) alongside existing `--statistics <0|1>`
   - `--no-cleanup` (bare switch) alongside existing `--noclean <0|1>`
* GTest coverage for latency, process_stats, circular_stream, and
  result_output (9 tests, all passing).
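
To make the 7-digit timestamp item above concrete, here is a hedged sketch of formatting a UTC instant the way .NET's round-trip "O" specifier does for UTC values (yyyy-MM-ddTHH:mm:ss.fffffffZ). The function name and the choice of 100-nanosecond ticks since the Unix epoch as input are assumptions for illustration, not the interface in result_output.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: formats a UTC instant, given as 100 ns ticks since the
// Unix epoch, with exactly seven fractional-second digits.
std::string FormatRoundTripUtc(std::int64_t ticksSinceUnixEpoch)
{
  assert(ticksSinceUnixEpoch >= 0);
  const std::int64_t ticksPerSecond = 10000000;
  const std::time_t seconds = static_cast<std::time_t>(ticksSinceUnixEpoch / ticksPerSecond);
  const long long fraction = ticksSinceUnixEpoch % ticksPerSecond;

  // std::gmtime uses shared static storage, so copy the result immediately;
  // a multi-threaded caller would want gmtime_r / gmtime_s instead.
  const std::tm utc = *std::gmtime(&seconds);

  char buffer[64];
  std::snprintf(
      buffer,
      sizeof(buffer),
      "%04d-%02d-%02dT%02d:%02d:%02d.%07lldZ",
      utc.tm_year + 1900,
      utc.tm_mon + 1,
      utc.tm_mday,
      utc.tm_hour,
      utc.tm_min,
      utc.tm_sec,
      fraction);
  return buffer;
}
```

A caller holding a std::chrono::system_clock::time_point could obtain the tick count with a duration_cast to a duration whose period is std::ratio<1, 10000000>.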

Storage Blob perf tests (sdk/storage/azure-storage-blobs/test/perf)
-------------------------------------------------------------------
* New blob-test flags aligning the C++ UploadBlob / DownloadBlob / ListBlob
  scenarios with the .NET / Go test surface:
   - `--upload-method` (buffer | stream | single)
   - `--download-method` (buffer | stream)
   - `--block-size`, `--concurrency`, `--num-blobs`, `--page-size`
* Memory-budget guard (memory_budget.hpp) prevents OOM in buffer-mode
  tests at multi-GiB payloads.

Verification
------------
Built MinSizeRel on Windows / VS 2026 with vcpkg x64-windows-static
(curl, openssl, gtest). All 9 unit tests pass. A live perf run against
the `euap` storage account using AzureCliCredential produced output
that diffs byte-clean against an equivalent .NET Azure.Storage.Blobs.Perf
run for every contract emitted by this change (latency distribution
header / rows, throughput line shape including `% CPU`,
BenchmarkOutput JSON shape and key order, timestamp precision,
Versions-block ordering, results-file schema).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The `=== Versions ===` block didn't actually mirror .NET's
PerfProgram.PrintAssemblyVersions: it printed compiler / __cplusplus
strings rather than runtime + Azure assembly versions, no caller ever
populated the `injectedVersions` extension point, and the data the
perf-automation pipeline consumes (the per-test `VCPKG_*_VERSION`
lines and the throughput / latency / BenchmarkOutput contracts) is
unaffected. The module produced output that no parser reads and that
isn't faithful to the framework it claims parity with, so drop it.

Also revert the azure-storage-blobs CHANGELOG entry from the previous
commit: the perf harness is internal and isn't customer-facing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 28, 2026 03:41
@github-actions github-actions Bot added Azure.Core Storage Storage Service (Queues, Blobs, Files) labels Jun 28, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR aligns the C++ perf harness (sdk/core/perf) and Storage Blobs perf scenarios with the output/CLI contract expected by cross-language perf automation (modeled after .NET Azure.Test.Perf), including standardized job-statistics/results-file outputs plus CPU/memory and latency reporting.

Changes:

  • Add per-operation latency collection with .NET-compatible percentile distribution output and optional per-op results-file JSON ({Time, Size}).
  • Add always-on process CPU% and resident-memory sampling, surfacing metrics in live status lines and the final throughput line.
  • Update Storage Blobs perf scenarios/options to match cross-language flags (upload/download methods, paging, chunk size, concurrency) and introduce a memory-budget guard header.
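
The per-op results-file shape mentioned in the first bullet can be sketched as below. The struct and function names are hypothetical (the real OperationResult is declared in result_output.hpp, whose fields are not shown here), and the numeric formatting uses the default stream precision, which is an assumption rather than the framework's actual choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the framework's per-operation record.
struct OperationResultSketch
{
  double TimeMs; // latency of one operation, in milliseconds
  std::int64_t SizeBytes; // payload size, or -1 when the test has no size option
};

// Serializes results as [{ "Time": <ms>, "Size": <bytes> }, ...] with PascalCase keys.
std::string ToResultsJson(const std::vector<OperationResultSketch>& results)
{
  std::ostringstream json;
  json << "[";
  for (std::size_t i = 0; i < results.size(); ++i)
  {
    assert(results[i].SizeBytes >= -1);
    if (i != 0)
    {
      json << ", ";
    }
    // Default stream formatting keeps about six significant digits for Time.
    json << "{ \"Time\": " << results[i].TimeMs << ", \"Size\": " << results[i].SizeBytes
         << " }";
  }
  json << "]";
  return json.str();
}
```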

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.

File | Description
sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/upload_blob_test.hpp | Adds upload-method variants and forwards block-size/concurrency; adds memory-budget guard usage.
sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/memory_budget.hpp | New cross-platform “fail fast” memory budget guard for buffer-mode perf tests.
sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/list_blob_test.hpp | Adds --num-blobs aliasing and --page-size support for ListBlobs.
sdk/storage/azure-storage-blobs/test/perf/inc/azure/storage/blobs/test/download_blob_test.hpp | Adds download-method variants, forwarding of block-size/concurrency, and memory-budget guard usage.
sdk/storage/azure-storage-blobs/test/perf/CMakeLists.txt | Adds the new memory budget header to the perf test target inputs.
sdk/storage/azure-storage-blobs/perf-tests.yml | Extends perf matrix to exercise new upload/download/list flags (streaming, chunk/concurrency, paging).
sdk/core/perf/test/src/result_output_test.cpp | New unit tests for --results-file JSON shape and job-statistics envelope ordering.
sdk/core/perf/test/src/process_stats_test.cpp | New unit tests for process sampler start/stop/reset.
sdk/core/perf/test/src/latency_stats_test.cpp | New unit tests for percentile computation, grouping, reset, and concurrent record.
sdk/core/perf/test/CMakeLists.txt | Adds new perf-framework unit test sources to the test target.
sdk/core/perf/src/result_output.cpp | Implements results-file writer and #StartJobStatistics/#EndJobStatistics output with .NET-like timestamp formatting.
sdk/core/perf/src/program.cpp | Wires in latency collection, process stats sampling, status-interval, results-file emission, and latency distribution printing.
sdk/core/perf/src/process_stats.cpp | Implements cross-platform CPU and RSS sampling for Windows/Linux/macOS.
sdk/core/perf/src/options.cpp | Adds new global options (--status-interval, --results-file) and bare-switch aliases for .NET parity.
sdk/core/perf/src/latency_stats.cpp | Implements latency collector storage and percentile/mean summarization.
sdk/core/perf/src/arg_parser.cpp | Parses new global options and bare-switch aliases into GlobalTestOptions.
sdk/core/perf/inc/azure/perf/result_output.hpp | Declares RunSummary, OperationResult, and result/job-statistics output helpers.
sdk/core/perf/inc/azure/perf/process_stats.hpp | Declares the ProcessStatsSampler API and snapshot types.
sdk/core/perf/inc/azure/perf/options.hpp | Extends GlobalTestOptions with status-interval, results-file, and sync flag.
sdk/core/perf/inc/azure/perf/latency_stats.hpp | Declares the latency collector and percentile summary types.
sdk/core/perf/inc/azure/perf.hpp | Exposes new perf framework components via the top-level convenience header.
sdk/core/perf/CMakeLists.txt | Adds new perf framework headers/sources to the azure-perf library build.

Comment thread sdk/core/perf/src/program.cpp Outdated
Jinming-Hu and others added 3 commits June 28, 2026 13:55
The guard threw a friendly std::runtime_error when `--size x --parallel`
would exceed 80% of system memory, instead of letting buffer-mode
allocations OOM-kill the process. Assuming perf runs target hosts with
enough memory for the configured size, this is dead defensive code: drop
the header and the two `CheckMemoryBudget` calls in
upload_blob_test.hpp / download_blob_test.hpp. Oversized buffer-mode
runs now fail with std::bad_alloc / OS OOM as they would natively.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses two Copilot review comments on PR Azure#7201 that are real bugs:

1. download_blob_test.hpp: the streaming-download per-op loop declared
   `uint8_t buffer[1024*1024]` on the stack. On Windows the default
   thread stack is 1 MiB, so a single call already overflows; high
   `--parallel` makes it worse. Replace with a function-local
   `static thread_local std::vector<uint8_t>`: each worker thread
   allocates the 1 MiB drain buffer once on the heap and reuses it
   across operations.

2. process_stats.cpp: `ProcessStatsSampler::Reset()` updated members
   under the mutex but the sampler thread's `Run()` had already cached
   `previousCpuSeconds` / `previousTime` in locals, so the first
   sample after reset computed cpuDelta / wall against the pre-reset
   baseline and reported a wrong CPU%. Reset now stops and restarts the
   thread, which forces `Run()` to re-read the fresh baselines.
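
The first fix can be illustrated with a small sketch. This is not the code from download_blob_test.hpp: the ReadFn callback is a hypothetical stand-in for reading from the download body stream, and DrainStream is an invented name. The point it shows is the `static thread_local std::vector<uint8_t>` pattern, which moves the 1 MiB drain buffer off the stack and allocates it once per worker thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical reader: fills up to `capacity` bytes and returns the number of
// bytes written, or 0 at end of stream.
using ReadFn = std::function<std::size_t(std::uint8_t*, std::size_t)>;

// Reads the stream to the end, discarding the data, and returns the byte count.
std::size_t DrainStream(const ReadFn& read)
{
  assert(static_cast<bool>(read));

  // Heap-backed and per-thread: constructed on a thread's first call, then
  // reused by every later operation on that thread. A plain
  // `uint8_t buffer[1024 * 1024]` here would consume 1 MiB of stack per call.
  static thread_local std::vector<std::uint8_t> buffer(1024 * 1024);

  std::size_t total = 0;
  for (std::size_t n = read(buffer.data(), buffer.size()); n != 0;
       n = read(buffer.data(), buffer.size()))
  {
    total += n;
  }
  return total;
}
```

Because the buffer is thread_local, concurrent workers never share it, so no locking is needed around the read loop.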

Verified: 9/9 perf unit tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace em-dashes with -- in two comments
- Add 'perfstress' to cspell dictionary (used in BenchmarkDotNet-compatible
  metric name to match .NET Azure.Test.Perf output)
- Apply clang-format to perf framework files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@jalauzon-msft jalauzon-msft left a comment

Member

Not too familiar with C++ so I did not dig into the implementation much, but overall change looks fine. I did leave some comments as this probably contains more than we need/want but if you want to keep everything, that is also probably fine.

Comment thread sdk/core/perf/inc/azure/perf/options.hpp Outdated
Comment thread sdk/core/perf/inc/azure/perf/process_stats.hpp Outdated
Comment thread sdk/core/perf/src/options.cpp Outdated
Jinming-Hu and others added 2 commits June 30, 2026 21:37
…/memory sampler

Per @jalauzon-msft review on PR Azure#7201:
- Remove --sync (parsed-and-ignored option had no behavior). PerfAutomation
  is updated separately to set NoSync=true for the Cpp language so it never
  appends --sync to test arguments.
- Remove the --job-statistics bare-switch alias; keep --statistics <0|1>
  which is what perf-automation actually invokes.
- Remove the CPU/memory sampler (ProcessStatsSampler) and the associated
  ' Memory(MiB)' / '% CPU' columns; perf-automation tracks the process
  itself, so per-run sampling in C++ added complexity without value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PerfAutomation appends '--sync' to test runs for sync-only languages. C++ has
no async variant, but the driver still passes --sync, so the perf binary must
accept it. Register --sync as a bare switch that is parsed and intentionally
ignored, with no corresponding Sync field on GlobalTestOptions (so it doesn't
show up in the JSON options dump or anywhere else).
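
The "parsed and intentionally ignored" behavior described above might look roughly like the following sketch. It is not the code in arg_parser.cpp: ParsedOptionsSketch and ParseArgs are hypothetical names, and only --parallel is modeled alongside --sync to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options struct. Note there is deliberately no Sync member, so
// the switch cannot leak into any dump of the parsed options.
struct ParsedOptionsSketch
{
  int Parallel = 1;
};

ParsedOptionsSketch ParseArgs(const std::vector<std::string>& args)
{
  ParsedOptionsSketch options;
  for (std::size_t i = 0; i < args.size(); ++i)
  {
    if (args[i] == "--sync")
    {
      continue; // Bare switch: accepted so the driver's argument list parses, then dropped.
    }
    if (args[i] == "--parallel" && i + 1 < args.size())
    {
      options.Parallel = std::stoi(args[++i]);
      assert(options.Parallel > 0);
      continue;
    }
    throw std::invalid_argument("unknown or incomplete option: " + args[i]);
  }
  return options;
}
```

Registering the switch explicitly, rather than ignoring all unknown arguments, keeps typos in other flags from passing silently.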

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Jinming-Hu

Member Author

Running Azure SDK for C++ Storage Perf Tests

Prereqs

az login
$env:AZURE_STORAGE_ACCOUNT_NAME = "<your-storage-account>"   # token-credential mode
# or, for accounts that allow shared key:
# $env:STORAGE_CONNECTION_STRING = (az storage account show-connection-string -n <acct> -g <rg> --query connectionString -o tsv)

Run all commands from a VS x64 Native Tools prompt (or after vcvars64.bat).
You don't need to install vcpkg — CMake auto-fetches a pinned copy on first
configure (set AZURE_SDK_DISABLE_AUTO_VCPKG=1 to opt out).

Build and run

cd azure-sdk-for-cpp
cmake -G Ninja -B build `
  -DBUILD_TESTING=ON -DBUILD_PERFORMANCE_TESTS=ON `
  -DDISABLE_AZURE_CORE_OPENTELEMETRY=ON -DCMAKE_BUILD_TYPE=MinSizeRel `
  -DVCPKG_TARGET_TRIPLET=x64-windows-static
cmake --build build --target azure-storage-blobs-perf

.\build\sdk\storage\azure-storage-blobs\test\perf\azure-storage-blobs-perf.exe `
    UploadBlob --size 10240 --parallel 4 --warmup 3 --duration 5 `
    --latency 1 --statistics 1 --token-credential

Run with no test name to list available tests (UploadBlob, DownloadBlob, ListBlob, ...).

Useful flags: --size, --parallel, --warmup, --duration, --latency 1, --statistics 1, --results-file out.json, --token-credential, --upload-method buffer|stream|single, --download-method buffer|stream, --block-size, --concurrency, --num-blobs, --page-size.

The new --upload-method/--download-method/--block-size/--concurrency/--page-size
flags are available on the binaries but should be tuned per host, not baked
into the CI matrix. Keep perf-tests.yml at the pre-PR baseline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Jinming-Hu Jinming-Hu merged commit b92f4d9 into Azure:main Jun 30, 2026
85 checks passed
@Jinming-Hu Jinming-Hu deleted the jinmhu/perf-framework-dotnet-parity branch June 30, 2026 23:28

Labels

Azure.Core Storage Storage Service (Queues, Blobs, Files)


4 participants