Add Umbraco load-testing pipeline (Locust + ALT) #6

Open
andr317c wants to merge 163 commits into main from add/seed-package-usage

Conversation

@andr317c
Contributor

Summary

  • Ephemeral Terraform infrastructure that provisions one App Service + SQL DB per test case across the (Umbraco version × tier × scenario) matrix.
  • Locust workload with inventory-driven tasks, replacing the previous JMeter test plan.
  • Two shipped scenarios: Default (vanilla) and DeliveryApi (headless mode, with code overlay).
  • Long-lived history storage in Azure Blob (NDJSON per run); per-run pipeline artifacts on the build.
  • Local analysis scripts: show-trends.ps1, compare-runs.ps1, check-regression.ps1 (pipeline gate, permissive by default).
  • Pipeline split into six stages: validateTestCases → ensureHistoryInfra → provision → loadTest → regression → cleanup.

Test plan

  • PR-validation pipeline passes (terraform fmt + validate, PSScriptAnalyzer, py_compile).
  • Smoke run (loadProfile=smoke, runStarter=true) completes end-to-end through all six stages.
  • Tier SKUs in loadtests/tiers.json confirmed (currently placeholders pending input).

andr317c and others added 30 commits January 23, 2026 08:22
Pipeline gains a skipLoadTests parameter so we can validate
provisioning + deploy + seed without burning ALT time. When set,
runLoadTests is skipped and a verifyDeployments smoke job hits each
App Service homepage instead, exiting non-zero on any non-200.

Other hardening: pin terraformVersion to 1.13.3, broaden
deleteResourceGroup condition to also fire on Canceled/Skipped (so
mid-run cancellation doesn't leak the ephemeral RG), bump the manual
keep/delete window from 1h to 2h, and gitignore .terraform/ + .idea/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testSummary's "Results available in: Pipeline artifacts" line was
misleading when skipLoadTests=true (no artifacts produced). Gate
the wording on the parameter at compile time. Also adds the missing
engineInstances row to the README parameters table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… lint

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'each' and 'if' directives are not allowed inside script: | string blocks.
Use $env vars and runtime foreach/if instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under Workload Identity Federation, the AzureCLI task gets idToken instead of a
client secret. Plumb both through terraform; install script picks WIF if the
oidc token is present, falls back to client-secret otherwise.
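
The credential selection described above can be sketched as follows. This is a minimal illustration, not the install script itself: the environment variable names `ARM_OIDC_TOKEN` and `ARM_CLIENT_SECRET` are assumptions borrowed from the azurerm provider's conventions, and `pick_auth_mode` is a hypothetical helper.

```python
def pick_auth_mode(env):
    """Prefer Workload Identity Federation when an OIDC token is present.

    `env` is any mapping of environment variables (for example os.environ).
    The variable names are assumptions, not taken from the actual script.
    """
    if env.get("ARM_OIDC_TOKEN"):
        return "workload-identity-federation"
    if env.get("ARM_CLIENT_SECRET"):
        return "client-secret"
    raise RuntimeError("no credential found: expected an OIDC token or a client secret")
```

Checking the token first means a service connection that is migrated to WIF switches modes without any parameter change, while older connections keep working through the fallback.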

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testSummary now gates on apply succeeding specifically, so a failed apply
doesn't print a misleading "summary" with no real data. Adding apply to the
cleanup chain's dependsOn ensures RG cleanup still triggers on apply-failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move SP credentials from the local-exec environment block to the parent task's
env (where the install script inherits them). With sensitive vars in the
environment block, terraform suppresses local-exec output - which hides the
seeder polling status. They're still masked in pipeline logs via issecret=true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Login: loadtest@example.invalid / LoadTest123!

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After moving auth to ARM_* env-var inheritance, the client_id/secret/oidc_token/
tenant_id terraform variables are no longer read. Cleaning up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… errors

`dotnet add package --version "17.*" --prerelease` wasn't resolving
17.0.0-beta.1 reliably; the build silently proceeded without the package.
Use explicit prerelease floating syntax `17.*-*` and check $LASTEXITCODE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump this when a new prerelease/stable ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the ALT testId scenario-scoped (one test per scenario, with
every version/tier as a run inside) so the portal's "Compare runs"
view can overlay them natively. History storage path mirrors the
same axes (scenario/major/version/tier/date_build) for prefix-listed
sweeps. Run name trimmed to fit ALT's 50-char cap.
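
A rough sketch of how the storage prefix and the trimmed run name could be derived. The helper names, the `_` joiner between date and build, and the run-name layout are assumptions for illustration; only the axis order and the 50-character cap come from the commit message.

```python
def history_prefix(scenario, version, tier, date, build_id):
    """Build a blob prefix ordered scenario/major/version/tier/date_build."""
    major = version.split(".", 1)[0]  # "17.0.0-beta.1" -> "17"
    return f"{scenario}/{major}/{version}/{tier}/{date}_{build_id}"


def run_name(version, tier, build_id, limit=50):
    """Compose a run name and trim it to the service's length cap."""
    return f"{version}-{tier}-{build_id}"[:limit]
```

Because the scenario is the leading path segment, listing blobs by prefix returns every version and tier for one scenario, which mirrors the scenario-scoped test grouping.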
Cloud runs all plans on one P1V3 ASP and differentiates via per-site
quotas inside a shared pool. We can't replicate the per-site quotas on
dedicated plans, but we can match the SKU and the SQL eDTU split
(S1/S2/S3 = 20/50/100 eDTU). Also sidesteps the Pv4 worker-pool stamp
error on RG re-creation.
CpuPercentage/MemoryPercentage live on the App Service Plan
(Microsoft.Web/serverfarms), not the site, and SQL DTU/CPU/IO
metrics scope to the database, not the server. Surface the plan
and database resource IDs as terraform outputs and split the
appComponents block into Site / Plan / Database with the right
namespaces. Drops two now-dead AzDO variables (sqlServerName,
sqlServerResourceId).

Also: rename "ALT" -> "Azure Load Testing" / "load test"
throughout, and rename the Hydrate step to "Read terraform
outputs" for clarity.
Replace the homepage-only smoke locustfile with a workload that fetches
the seeder's /umbraco/api/seederstatus/inventory at test start, buckets
the seeded URLs by content type, and spreads requests across sections,
categories, pages, details, and media. Detail pages are weighted highest
since they're the deepest read path - that's where SQL pressure surfaces.
Falls back to homepage-only if the inventory endpoint is unreachable.

Switch to FastHttpUser for higher requests/sec per engine, store shared
state on the locust environment (canonical pattern), use logging instead
of print, and pre-bucket URLs so tasks don't linear-scan on every call.
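
The pre-bucketing idea can be sketched without Locust at all. The inventory shape (`url` and `type` keys), the sample entries, and the weights below are all hypothetical; the real seeder payload and task weights may differ.

```python
import random
from collections import defaultdict

# Hypothetical inventory entries; the real endpoint's schema is not shown here.
SAMPLE_INVENTORY = [
    {"url": "/news", "type": "section"},
    {"url": "/news/tech", "type": "category"},
    {"url": "/news/tech/page-1", "type": "page"},
    {"url": "/news/tech/page-1/detail-1", "type": "detail"},
    {"url": "/news/tech/page-1/detail-2", "type": "detail"},
    {"url": "/media/1/hero.jpg", "type": "media"},
]

# Illustrative weights only: detail pages are picked most often.
BUCKET_WEIGHTS = {"section": 1, "category": 2, "page": 3, "detail": 5, "media": 1}


def bucket_urls(inventory):
    """Group URLs by content type once, so tasks never scan the full list."""
    buckets = defaultdict(list)
    for entry in inventory:
        buckets[entry["type"]].append(entry["url"])
    return dict(buckets)


def pick_url(buckets, rng):
    """Pick a bucket by weight, then a URL inside it; homepage if no inventory."""
    names = [name for name, urls in buckets.items() if urls]
    if not names:
        return "/"  # fallback when the inventory endpoint was unreachable
    weights = [BUCKET_WEIGHTS.get(name, 1) for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return rng.choice(buckets[chosen])
```

In a locustfile the bucket dictionary would typically be built once at test start and shared, so each task only pays for two constant-time random picks.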

Also: rename the prepare step to "Validate + resolve scenario overrides"
and add justifying comments to the publish steps' continueOnError flags.
Picks up the URL rebuild fix + inventory enhancements (mediaTypes /
includeMemberPassword / cultures / paginated /inventory/urls).
One reference missed during the rename pass.
- New @task(3) submit_contact_form posts JSON to the seeder's anonymous
  contact-form endpoint. Exercises the SQL write path (each submission
  becomes an Umbraco content node) which has very different perf
  characteristics from reads, especially on lower SQL tiers.
- Added a docstring caveat about the 1-3s wait_time being aggressive
  (~30 req/min per VU vs real human pacing of 5-30s) so readers don't
  misinterpret VU counts as concurrent humans.

Skipped (explicit non-goals for now):
- catch_response content validation: FastHttpUser already marks 4xx/5xx
  as failures; 200-with-error-template is too rare to justify the noise
  in every task.
- Member auth flow: needs anti-forgery token handling + cookie state +
  per-VU on_start. Worth adding when we specifically want to measure
  authenticated browsing perf, but not blocking a first real run.
- Helpers extraction: there's only one locustfile and _hit() already
  covers the obvious DRY win. Premature for a single-file workload.
The original weight 3 produced ~1.4 form submissions/sec at 100 VUs,
roughly 2.8% write share - too light to differentiate Starter S1 vs
Pro S3 on write characteristics. Bumping to 8 lands at ~3.7 RPS / ~7%
write share, which is within the 5-15% realistic CMS production range
and enough to actually exercise SQL Log IO on the lower tiers.
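
The arithmetic behind these figures can be checked with a short sketch. It assumes that request share is proportional to task weight and that total throughput stays constant when the weight changes; the combined read weight is inferred from the quoted 2.8% share rather than read from the locustfile.

```python
def write_share(write_weight, read_weight_total):
    """Fraction of requests that are writes, given proportional task weights."""
    return write_weight / (write_weight + read_weight_total)


def implied_read_weight(write_weight, observed_share):
    """Back out the combined read weight from an observed write share."""
    return write_weight * (1 - observed_share) / observed_share


# Figures quoted in the commit message: weight 3 gave ~1.4 RPS at ~2.8% share.
read_total = implied_read_weight(3, 0.028)  # combined weight of all read tasks
total_rps = 1.4 / 0.028                     # overall throughput at 100 VUs
share_at_8 = write_share(8, read_total)     # write share after the bump
rps_at_8 = total_rps * share_at_8           # write RPS after the bump
```

Under those assumptions the bump lands at roughly a 7% write share and about 3.6 write requests per second, which agrees with the quoted numbers to within rounding.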
andr317c and others added 30 commits May 18, 2026 09:49
Adds the stage that runs ensure-monitoring-infra.ps1 + deploy-workbook.ps1 and propagates DCE/DCR/Stream as cross-stage outputs. Template re-declares the 3 LA params; pipeline passes them at every template call site. Without this the Workbook gets no data and the YAML fails to compile (undeclared parameter refs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…own + glossary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add LoadTestSeries_CL table for per-minute resource pressure, workbook Top
issues panel + Stability/Bottleneck columns, history-RG build cache,
deterministic locust PRNG seed, seeder-status non-JSON guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KQL parse error on Top issues (top→sort|take), n<5 stability floor,
case/whitespace-insensitive regression joins, Trends sampler filter,
chart legend separator (·→| for chart series), Compare delta Note column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 warmup now primes every URL in every bucket instead of one
per type — the dominant run-to-run noise source on p95/p99 was
first-touch latency on URLs the load test reached via random.choice()
but warmup never visited. Cost: ~30-60s extra provisioning. Also
treats seeder duration_seconds <= 0 as null so a misreported
ElapsedMs can't drag the dashboard median to 0.
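
A minimal sketch of the null-handling rule, assuming durations arrive as plain numbers or `None`; the function names are hypothetical and the real normalization may live in the pipeline scripts or the workbook query instead.

```python
from statistics import median


def clean_duration(value):
    """Treat missing, zero, or negative durations as unknown."""
    if value is None or value <= 0:
        return None
    return value


def median_duration(values):
    """Median over valid readings only; None when nothing valid remains."""
    valid = [v for v in map(clean_duration, values) if v is not None]
    return median(valid) if valid else None
```

Dropping the bad readings before aggregating matters because a median over `[0, 120, 180]` is 120, while the median of the two trustworthy readings is 150.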

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Get-MetricSummary now emits a logissue warning on query failure in
addition to Write-Warning, so partial-metrics gaps (e.g. plan_*
missing while sql_*/app_* populate) surface in the AzDO summary
panel instead of blending into the step log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trends Sampler filter switches from single-select to multi-select
with an All sentinel. KQL filter uses scenario_name in ({Sampler}).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Get-MetricSummary's silent-success branch (API responded OK but
returned empty timeseries — e.g. VM hadn't emitted yet, wrong
window) now logs a logissue warning naming the metric and
resource. Closes the diagnostic gap where plan_* columns came
back null with no indication why.
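
The empty-timeseries check can be sketched like this. The response shape (`value` → `timeseries` → `data` → `average`) follows the Azure Monitor metrics REST payload; the function name and message wording are assumptions, and the real Get-MetricSummary is PowerShell rather than Python.

```python
def summarize_metric(metric_name, resource_id, response):
    """Return (average, warning_line); warning_line is None when data exists."""
    values = [
        point["average"]
        for metric in response.get("value", [])
        for series in metric.get("timeseries", [])
        for point in series.get("data", [])
        if point.get("average") is not None
    ]
    if values:
        return sum(values) / len(values), None
    # Azure DevOps logging command: surfaces as a warning in the run summary.
    warning = (
        "##vso[task.logissue type=warning]"
        f"No datapoints for metric '{metric_name}' on '{resource_id}' "
        "(API call succeeded but the timeseries was empty)"
    )
    return None, warning
```

Printing the returned warning line to stdout is what makes the agent pick it up, so the caller stays in control of when and whether it is emitted.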

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
provision and loadTest run on different agents, so the
.seeder-results/<testCaseId>.json files written by
install-umbraco-cms-on-appservice.ps1's local-exec couldn't be
read by the loadTest stage — the dashboard never showed real
seeder durations as a result. The provision.apply.outputVars step
now aggregates all per-case JSONs into a single seederResults
output variable; load-test-job reads it via $env:SEEDER_RESULTS
instead of touching the (absent) filesystem.
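
The hand-off can be sketched as two small helpers: one that folds the per-case files into a single JSON string on the provisioning agent, and one that parses it back on the load-test agent. The helper names and the keyed-by-file-stem layout are assumptions; the `##vso[task.setvariable ...]` line is the standard Azure DevOps logging command for output variables.

```python
import json
from pathlib import Path


def aggregate_seeder_results(results_dir):
    """Merge every <testCaseId>.json in a directory into one compact JSON string."""
    merged = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        merged[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return json.dumps(merged, separators=(",", ":"))


def set_output_variable_command(name, value):
    """Format the logging command that publishes a cross-stage output variable."""
    return f"##vso[task.setvariable variable={name};isOutput=true]{value}"


def read_seeder_results(env):
    """Parse the aggregated results from an environment mapping; empty if unset."""
    return json.loads(env.get("SEEDER_RESULTS") or "{}")
```

Keeping the value as single-line compact JSON matters because the logging command is line-oriented, so a pretty-printed payload would be cut off at the first newline.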

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two queue-UI changes that move in opposite directions:

* Add seederPresetOverride (Auto/Small/Medium/Large/Massive). Auto
  keeps the existing preset-coupled-to-profile default; explicit
  values let off-diagonal cells run (e.g. Massive content + smoke
  load) and unblock the otherwise-unreachable Massive preset.

* Remove skipLoadTests. Infra-only smoke runs aren't a workflow we
  keep using, so the per-stage condition checks and the verifyDeployments
  job go away with it, and the README no longer references the workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three override pills now follow the same '(Auto = match X)'
pattern: 'match tier' for the SKU/DTU pair (tier-coupled) and
'match load profile' for the seeder preset (profile-coupled).
Drops 'use the tier's default' jargon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Azure Load Testing rejects run descriptions over 100 chars. Worst-
case combo (long prerelease + Enterprise + DeliveryApi + Massive +
stress numbers) was hitting ~110 chars; compact separator-driven
template now lands around 70.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the tolower/trim normalization ahead of arg_max so the
summarize groups by the normalized keys directly, and adds run_id
to the normalization set. Sidesteps KQL arg_max column-naming
quirks that prevented the regression row from joining to the load-
test row even when the underlying values matched.

Applied at all three sites (Top issues, Latest-runs card, Runs
tab). The drill-down panel (which doesn't use a join) was already
showing the correct verdict; this should now make the Runs table
column agree with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…harts

* Glossary moved to its own tab so the wall-of-vocabulary doesn't
  greet every viewer. Top banner now points to it.
* Filter-scope note under the global pills makes the Trends/Compare/
  Runs vs Tiers/Versions split visible without reading the README.
* Tier rows sort by capacity rank (Starter → Enterprise) instead of
  alphabetical — upgrade story reads top-to-bottom.
* Seeder duration median drops 0/negative readings so a bogus
  ElapsedMs from a single run can't drag the column to 0.
* Compare baseline/candidate dropdowns hide failed runs (no_metrics /
  no_results_dir) so picking a known-broken option isn't easy.
* Trends latency chart and resource-pressure chart now render
  side-by-side at 50% width each, sharing the same run-indexed x-axis.
  Direct visual correlation of code-bound vs infra-bound symptoms.
* Runs drill-into-run is a dropdown instead of a typo-prone text
  field, sourced from the same time/filter scope as the Runs table.
* Per-minute resource chart split in two so HTTP error counts get
  their own auto-scaled Y-axis instead of being crushed against the
  0-100 percentage floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Parameter table + paragraph for seederPresetOverride
* Dashboard description names six tabs (Glossary added), reflects
  side-by-side Trends charts, multi-select sampler picker, drill
  dropdown, split per-minute charts on Runs, tier-rank ordering
* loadtest.workbook.json one-liner mentions all six tabs
* .gitignore picks up __pycache__ folders that Locust runs leave behind

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the user deselects all sampler options, the multi-select picker
substituted empty into 'scenario_name in ({Sampler})', producing 'in ()'
which is a KQL syntax error. The Trends line chart and matrix both
failed to render in that state. isRequired=true keeps at least the
'All' sentinel selected, so the in-clause is always well-formed.
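
To illustrate the failure mode, here is a sketch of a guard applied at query-construction time. This is a different technique from the workbook's isRequired=true fix and is shown only to make the empty `in ()` problem concrete; the helper is hypothetical and assumes scenario names contain no quote characters.

```python
ALL_SENTINEL = "All"


def sampler_clause(selected):
    """Return a KQL filter line, or an empty string when no filter should apply."""
    names = [s for s in selected if s != ALL_SENTINEL]
    if not names or ALL_SENTINEL in selected:
        return ""  # never emit "in ()", which KQL rejects as a syntax error
    quoted = ", ".join(f"'{name}'" for name in names)
    return f"| where scenario_name in ({quoted})"
```

Either approach keeps the clause well-formed; the workbook setting simply moves the guarantee into the picker so the query text does not need a conditional.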

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The in-app filter-scope note above the tabs lists Top issues; the
Glossary's matching line was missing it. One-line consistency fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
It always rendered '✓ Clear' in practice — the regression-table join
that drove it has been silently broken, and even when working the
panel duplicated information the Runs tab + per-tab verdict columns
already surface. Cleared the markdown banner above the tabs that
referenced it, and dropped Top issues from the Glossary + global-
filter scope note + README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The leftouter join on the Runs table column kept missing the
regression_status even when the drill-down panel (no join) found
the row. Replacing arg_max(TimeGenerated, regression_status) with
take_any + max(TimeGenerated) sidesteps arg_max's column-binding
quirks that may have been the cause. take_any is safe here because
check-regression.ps1 writes exactly one regression_check row per
(run × scenario × version × tier).

If this still doesn't fix the join, the diagnostic KQL in the
earlier conversation pinpoints which column actually mismatches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Umbraco load-testing pipeline + Azure Workbook dashboard
Added JMeter load tests for v13 and v17
Three coupled fixes landing together because they touch the same
post-build flow:

* Build cache (both local .build-cache/ and shared blob mirror)
  removed — observed time savings didn't materialise in practice.
  Each run now does a fresh dotnet publish. Build-dir cleanup
  happens in the finally{} block as before.

* Deploy step moved INSIDE the try{} block (was after the finally
  cleanup). The previous structure relied on the cache zip living
  outside the build dir so it survived cleanup; without a cache,
  the publish.zip sits inside the build dir and must be deployed
  before that tree is removed.

* az webapp stop wrapped in Stop-AppServiceBestEffort: 3 retries
  with 5/10/20s backoff, then warn-and-continue if all fail.
  Transient 503s from Azure's management API right after a deploy
  were failing provisioning for what's functionally a polish step
  (the load-test stage's az webapp start is idempotent). Applies
  to both call sites — failure-cleanup path and normal-cleanup.
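
The best-effort stop can be sketched as a retry loop with an injected sleep. The real Stop-AppServiceBestEffort is PowerShell; this Python version is only an illustration, and it assumes "3 retries" means one initial attempt followed by up to three retries with 5, 10, and 20 second waits.

```python
import time


def stop_best_effort(stop_fn, delays=(5, 10, 20), sleep=time.sleep, warn=print):
    """Call stop_fn, retrying with backoff; warn and continue if every try fails.

    Returns True on success and False when all attempts failed.
    """
    last_error = None
    for attempt in range(len(delays) + 1):
        try:
            stop_fn()
            return True
        except Exception as exc:  # best-effort: any failure is retried
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    warn(f"stop failed after {len(delays)} retries, continuing: {last_error}")
    return False
```

Returning a boolean instead of raising keeps the caller's control flow linear, which suits a step whose failure should never fail provisioning.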

Pipeline side: FetchBuildCacheKey task and BUILD_CACHE_* env vars
removed from azure-pipeline.yml's provision stage. README's build-
cache and storage-cost sections removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants