spec-sheet: add new envd-object-scalability and cluster-object-limits scenarios#36540
Draft
aljoscha wants to merge 7 commits into
Conversation
The existing scenarios scale cluster size or envd CPU cores -- nothing
measures how adapter/envd latency moves as the catalog itself grows. Add
two scenarios under a new `envd_scalability` group that fix the
measurement cluster and vary the number of catalog objects.
`envd_scalability_tables` puts N empty tables in the catalog -- pure
catalog/adapter pressure, no controller load. `envd_scalability_mvs`
does N materialized views over a single 1-row base table -- same
catalog footprint, plus controller load proportional to N. The MV
scenario shards across single-replica pad clusters at 10000 MVs per
cluster (so 100k MVs spans 10 clusters), since one cluster can't
reasonably host that many dataflows.
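The sharding arithmetic described above is simple index math. A minimal sketch, with the shard size from the description and a hypothetical cluster-naming scheme (the real scenario code may name its pad clusters differently):

```python
MVS_PER_CLUSTER = 10_000  # shard size from the scenario description


def pad_cluster_for_mv(mv_index: int) -> str:
    """Map an MV index to its single-replica pad cluster (hypothetical naming)."""
    shard = mv_index // MVS_PER_CLUSTER
    return f"pad_cluster_{shard}"


# 100k MVs span 10 clusters: indices 0..9999 land on pad_cluster_0,
# indices 90000..99999 on pad_cluster_9.
```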
For each N in {1, 10, 100, 1k, 3k, 5k, 10k, 20k, 30k, 50k, 100k} we run
10 reps each of `CREATE TABLE` (DDL through the coordinator) and
`SELECT * FROM <1-row table>` (a simple peek on a fixed 100cc cluster).
The catalog is built incrementally across size points, so going from
N=k to the next size point only adds (next - k) objects -- otherwise
we'd pay an O(sizes * N) build cost. The size list is overridable via
`--envd-scalability-sizes` for scaffolding runs.
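The incremental build amounts to creating only the delta between consecutive size points. A sketch of that bookkeeping (function name is illustrative, not the actual driver API):

```python
SIZES = [1, 10, 100, 1_000, 3_000, 5_000, 10_000, 20_000, 30_000, 50_000, 100_000]


def build_steps(sizes):
    """Yield (target_n, objects_to_add) pairs so the catalog grows incrementally.

    Creating only the delta at each step means total object creations equal the
    final size, instead of the O(sizes * N) cost of rebuilding from scratch.
    """
    current = 0
    for n in sizes:
        yield n, n - current
        current = n
```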
Results land in a third CSV (`*.envd_scalability.csv`) reusing the
cluster CSV schema; `mode='envd_scalability'` distinguishes the rows.
Test analytics rides on the existing `cluster_spec_sheet_result` table
-- no schema change needed. The analyzer plots `time_ms` vs N per
(scenario, category, test_name).
This is going to be long-running, especially the MV scenario where each
create exercises the controller -- expect hours for the full size
range.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add two new scenarios -- cluster_object_limits_indexes and cluster_object_limits_mvs -- that find, per cluster size, the maximum number of idle materializations one cluster can keep fresh. The materializations are derived from a one-row, never-updated base table, so the only work the cluster has to do is keep advancing each materialization's write_frontier in step with the upstream table. Once the cluster can't keep up, freshness collapses; the driver records the largest N at which `max(local_lag) < 2s` still held, and also logs the first unhealthy data point so the cliff is visible. Staging-only (rejects --target=cloud-production), to avoid burning production resources on long object-limit searches.
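The limit search above can be sketched as a simple sweep that remembers the last healthy N and the first unhealthy point. This is a minimal model, assuming a hypothetical `measure_lag` probe; the real driver's candidate schedule and measurement plumbing differ:

```python
HEALTHY_LAG_S = 2.0  # freshness threshold from the scenario description


def find_object_limit(measure_lag, candidates):
    """Return (largest healthy N, first unhealthy data point or None).

    measure_lag(n) -> max local lag in seconds with n idle materializations
    (hypothetical probe standing in for the real measurement).
    """
    best = None
    for n in candidates:
        lag = measure_lag(n)
        if lag < HEALTHY_LAG_S:
            best = n
        else:
            # Record the cliff: the first N where freshness collapsed.
            return best, (n, lag)
    return best, None
```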
…lability default at 50k When a materialization stalls completely (write_frontier never advances past the minimum timestamp), `mz_internal.mz_materialization_lag` reports `now() - 0` = current unix time in ms (~1.78e12). Recorded as-is, this crushes every healthy data point to ~0 on the plot. Cap the recorded value at 10x the healthy threshold (= 20 s), preserve the underlying truth via the `healthy` column, and label the plot to make the cap and healthy threshold explicit. Also drop 100_000 from the envd_scalability default size list: 50_000 is a more sensible default ceiling for staging. The full size list is still overridable via --envd-scalability-sizes for ad-hoc runs.
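The capping rule reduces to a couple of lines. A sketch under the thresholds stated above (function and column names are illustrative):

```python
HEALTHY_THRESHOLD_MS = 2_000          # 2 s freshness threshold
CAP_MS = 10 * HEALTHY_THRESHOLD_MS    # recorded values capped at 20 s


def record_lag(raw_lag_ms: float) -> tuple[float, bool]:
    """Return (capped lag for plotting, healthy flag preserving the truth).

    A fully stalled materialization reports ~now() in ms (~1.78e12); capping
    keeps it on-scale while the flag records that it was unhealthy.
    """
    healthy = raw_lag_ms < HEALTHY_THRESHOLD_MS
    return min(raw_lag_ms, CAP_MS), healthy
```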
…tion The release-qualification pipeline already runs three cluster-spec-sheet groups (cluster_compute on production, source_ingestion on production, environmentd on staging). Add two more groups -- envd_scalability and cluster_object_limits -- both running against staging, since both push the catalog / cluster to limits we don't want to exercise on production.
The three "envd / cluster" groups in the cluster-spec-sheet were named inconsistently. Settle on the three concept names the cluster-spec-sheet effort uses verbally:
- environmentd -> envd_qps_scalability (QPS vs envd CPU)
- envd_scalability -> envd_objects_scalability (latency vs catalog N)
- cluster_object_limits -> cluster_object_limits (unchanged)
Renames apply to: scenario constants, scenario-name string values, group keys in SCENARIO_GROUPS, class names, the run/analyze function names, the --envd-scalability-sizes CLI flag, the result CSV suffix, and the `mode` field written into CSV rows. The pre-existing QPS scenarios keep their individual `*_envd_strong_scaling` names since only the group is renamed. Also updates the release-qualification pipeline step ids/args and the README to match.
…w start When debugging cluster-spec-sheet runs on staging it's hard to tell which environment we're actually talking to and whether the system parameter defaults we expect (lifted via LaunchDarkly or similar) are actually applied. Add a one-shot diagnostic right after target.initialize() that prints mz_environment_id() and SHOWs the limits the test depends on (max_tables, max_materialized_views, max_objects_per_schema, max_clusters, max_credit_consumption_rate, memory_limiter_interval). Best-effort: any probe error is logged and swallowed so a transient failure does not abort the workflow.
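The best-effort probe pattern described above can be sketched as follows, assuming a DB-API-style cursor; the function name and logging hook are hypothetical, and the queried settings are the ones listed in the commit message:

```python
LIMIT_SETTINGS = (
    "max_tables",
    "max_materialized_views",
    "max_objects_per_schema",
    "max_clusters",
    "max_credit_consumption_rate",
    "memory_limiter_interval",
)


def print_env_diagnostics(cur, log=print):
    """One-shot diagnostic: log the environment id and the limits the test depends on.

    Best-effort: every probe error is logged and swallowed so a transient
    failure cannot abort the workflow.
    """
    probes = ["SELECT mz_environment_id()"] + [f"SHOW {s}" for s in LIMIT_SETTINGS]
    for query in probes:
        try:
            cur.execute(query)
            log(f"{query}: {cur.fetchone()}")
        except Exception as e:
            log(f"{query} failed (ignored): {e}")
```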
Fixes SQL-222