Add CSv3 cluster operations doc (add-node, resize, cordon, drain)#17

Draft
lvangool wants to merge 1 commit into main from feature/sup-1009-csv3-k3s-cluster-operations-resize-drain-add-node-lack

@lvangool
Member

Summary

Adds build-and-config/3/cluster-operations.mdx documenting the four CSv3 (K3s) cluster operations:

  • Adding a node: provisioning + K3s join flow, with the exact timeline error strings (Cloud 66 cannot connect to at least one of your stack servers..., Cannot fetch agent_join_token..., etc.) and the "failed scale-ups don't auto-retry" caveat.
  • Resizing: scale up/down, the database-replication scale-down guard, and an explicit note that in-place vertical resize is not supported (with the horizontal add-then-drain workaround).
  • Cordoning: kubectl cordon via the dashboard, 5-min timeout, healthy-control-plane precondition.
  • Draining: kubectl drain via the dashboard, 30-min timeout, PDB/DaemonSet semantics, and timeout troubleshooting.

A prominent Callout notes all four are dashboard-only (no cx command, no public API) and that customers can drop to kubectl with a downloaded kubeconfig.
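
For reviewers, here is a minimal sketch of what the kubectl fallback mentioned in the Callout might look like. It only builds the command lines and prints them; nothing is executed. The node name, the kubeconfig path, and the choice of flags are assumptions for illustration and are not taken from the new page or from what the dashboard runs internally. The 30m default simply mirrors the dashboard drain timeout described above.

```python
import shlex
from typing import List


def kubectl_cordon(node: str, kubeconfig: str) -> List[str]:
    """Build the argv for marking a node unschedulable."""
    return ["kubectl", "--kubeconfig", kubeconfig, "cordon", node]


def kubectl_drain(node: str, kubeconfig: str, timeout: str = "30m") -> List[str]:
    """Build the argv for evicting pods from a node.

    --ignore-daemonsets is assumed here because drain otherwise refuses to
    proceed on nodes that run DaemonSet-managed pods. Whether the dashboard
    passes the same flags is not stated in this PR.
    """
    return [
        "kubectl", "--kubeconfig", kubeconfig, "drain", node,
        "--ignore-daemonsets", f"--timeout={timeout}",
    ]


if __name__ == "__main__":
    # Hypothetical node name and kubeconfig path, for illustration only.
    for argv in (
        kubectl_cordon("worker-1", "./kubeconfig.yaml"),
        kubectl_drain("worker-1", "./kubeconfig.yaml"),
    ):
        print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps quoting out of the picture; a caller that wants to run them could pass each list to subprocess.run.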

Why

CSv3 had zero customer-facing docs for these operations — only CSv2/Maestro equivalents. A trial customer on HS #33547 was blocked: "unable to figure out how to resize the cluster, couldn't determine the correct procedure to drain... adding to the cluster repeatedly errored without clear guidance."

Verification

Every behaviour, precondition, and error string was verified against central source code (server_pool.rb, csv3_k3s_clusters/cluster_action_utils.rb, domain/services/csv3/server.rb, operations/services/csv3/server/{drain,cordon}.rb, clusters_controller.rb, clusters/pools_controller.rb) rather than written from assumption.

Linear

  • SUP-1009
  • Originating ticket: HS #33547

Test plan

  • yarn validate:mdx passes
  • New page renders under the CSv3 (/3/) docs; tables and Callouts display
  • Related-links resolve (HA doc, database-replication, upstream K8s drain doc)
  • Re-confirm error strings against current central main before merge (code can drift)
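
The last item could be partly automated. The following is a rough sketch, assuming the documented error strings appear verbatim in Ruby files under a local checkout of central. The function name, the checkout path, and the list of strings are all hypothetical; strings that are interpolated or split across lines in the source would be reported as missing and still need a manual look.

```python
from pathlib import Path
from typing import Iterable, List


def missing_strings(expected: Iterable[str], source_root: Path, suffix: str = ".rb") -> List[str]:
    """Return the expected strings that do not appear verbatim in any
    file under source_root whose name ends with the given suffix."""
    corpus = "\n".join(
        path.read_text(encoding="utf-8", errors="ignore")
        for path in sorted(source_root.rglob(f"*{suffix}"))
    )
    return [text for text in expected if text not in corpus]
```

A reviewer would call it with the strings copied from the MDX page and the path of their own checkout, then treat any returned entry as a prompt to re-read the relevant source file.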

🤖 Generated with Claude Code

CSv3 (K3s) had no customer-facing docs for day-to-day cluster operations;
only CSv2/Maestro equivalents existed. A trial customer on HS #33547 was
blocked trying to resize, drain, and add nodes. Adds
build-and-config/3/cluster-operations.mdx documenting all four operations,
including the dashboard-only nature (no cx/API), async timeline behaviour,
preconditions (healthy control-plane for cordon/drain; replication guards
for scale-down), the lack of in-place vertical resize, and the exact
error strings the platform surfaces. All behaviours verified against
central source.

Linear: SUP-1009

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear

linear Bot commented May 20, 2026

SUP-1009

@lvangool lvangool marked this pull request as draft May 20, 2026 09:11