docs: add concise ACP HCP etcd design (no ManagedEtcd CRD) by jiazhiguang · Pull Request #1 · alauda/etcd-operator

jiazhiguang · 2026-06-05T13:11:02Z

Add docs/design/hcp-etcd-design.md, a trimmed, human-readable design for managing etcd under ACP HCP. It carries over the goals of the original acp-hcp-managed-etcd-crd-design.md but corrects them against the current codebase and drops the high-level ManagedEtcd CRD.

Key decisions:

No new high-level CRD. A codebase audit showed that the low-level EtcdCluster controller already drives etcd membership directly (MemberList/Add/Remove/PromoteLearner via internal/etcdutils), so the capabilities the original design assigned to ManagedEtcd (status aggregation, scheduling protection, single-member recovery) are generic etcd concerns. They belong on EtcdCluster, where they are potentially upstreamable, not in a separate CRD.
Two tracks instead of a wrapper CRD:
- Track 1 (generic, upstreamable): enrich EtcdCluster. Populate the currently-empty EtcdClusterStatus (members/leader/health/conditions) from the data already gathered in-process; extend podTemplate with nodeSelector/tolerations/affinity/topologySpreadConstraints; add a PDB and a -client Service; add HyperShift-style single-member recovery (Job-based, gated on quorum + gracePeriod).
- Track 2 (ACP-specific, optional): publish the Kamaji DataStore. Since track 1 makes the client endpoint and -client-tls secret stable and predictable, "who creates the DataStore" is a replaceable integration point with three options: (A) manual/declarative as a zero-code fallback, (B) a watcher in the Kamaji control-plane provider (preferred — keeps the Kamaji dependency out of etcd-operator), (C) an opt-in reconciler inside the operator. None of them add Kamaji fields to EtcdCluster.spec.

coderabbitai · 2026-06-05T13:11:10Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)

main
master
^\d.x$

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9cb7d629-03bb-4caf-a577-fe98e5d074bd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Add docs/design/hcp-etcd-design.md, a concise design for highly-available managed etcd under ACP HCP.

Goal: keep the hosted control plane's etcd available while management nodes are upgraded/drained.

Structure: benchmark against OpenShift HCP (HyperShift), inventory our gap, then close it.

- OCP/HyperShift baseline: hosted etcd runs as a 3-member StatefulSet in the management cluster and stays quorum-safe across rolling node upgrades via a PodDisruptionBudget (one member evicted at a time), pod anti-affinity + topology spread, ordered StatefulSet rollout with readiness gating, member-level self-healing, and observable status.
- Gap vs OCP: the low-level EtcdCluster controller already drives etcd membership directly (MemberList/Add/Remove/PromoteLearner via internal/etcdutils), but the HA layers are mostly missing: EtcdClusterStatus is an empty struct, podTemplate only carries metadata labels/annotations (no affinity/topology), there is no PDB, no recovery workflow, and only a headless Service (no client Service).
- Close the gap in two tracks, no new high-level CRD:
  - Track 1 (generic HA, upstreamable): enrich EtcdCluster. Populate status from the data already gathered in-process; extend podTemplate with scheduling fields; add a PDB and a <name>-client Service; add HyperShift-style single-member recovery (Job-based, gated on quorum + gracePeriod).
  - Track 2 (ACP-specific, optional): publish the Kamaji DataStore. Track 1 makes the client endpoint and <name>-client-tls secret stable, so "who creates the DataStore" is a replaceable integration point: (A) manual as a zero-code fallback, (B) a watcher in the Kamaji control-plane provider (preferred, since it keeps the Kamaji dependency out of etcd-operator), (C) an opt-in reconciler inside the operator. None add Kamaji fields to EtcdCluster.spec.

No high-level CRD is introduced: the missing pieces are generic etcd HA features that belong on EtcdCluster itself; only DataStore publishing is ACP-specific and does not warrant a separate CRD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
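
As a rough illustration of "populate status from the data already gathered in-process", the sketch below derives a status summary from a member list. All type and field names (`MemberInfo`, `ClusterStatus`, `QuorumAvailable` as a boolean field) are hypothetical and do not reflect the actual EtcdClusterStatus schema.

```go
package main

import "fmt"

// MemberInfo is a hypothetical stand-in for what the controller already
// learns from the etcd member list and per-member health checks.
type MemberInfo struct {
	ID        uint64
	Name      string
	IsLearner bool
	Healthy   bool
}

// MemberStatus and ClusterStatus are illustrative status shapes only.
type MemberStatus struct {
	Name      string
	Healthy   bool
	IsLearner bool
	IsLeader  bool
}

type ClusterStatus struct {
	Members         []MemberStatus
	Leader          string
	QuorumAvailable bool
}

// buildStatus aggregates per-member data into a cluster-level status.
// Learners do not vote, so they are excluded from the quorum count.
func buildStatus(members []MemberInfo, leaderID uint64) ClusterStatus {
	st := ClusterStatus{}
	voters, healthyVoters := 0, 0
	for _, m := range members {
		isLeader := leaderID != 0 && m.ID == leaderID
		if isLeader {
			st.Leader = m.Name
		}
		if !m.IsLearner {
			voters++
			if m.Healthy {
				healthyVoters++
			}
		}
		st.Members = append(st.Members, MemberStatus{
			Name: m.Name, Healthy: m.Healthy, IsLearner: m.IsLearner, IsLeader: isLeader,
		})
	}
	st.QuorumAvailable = voters > 0 && healthyVoters >= voters/2+1
	return st
}

func main() {
	st := buildStatus([]MemberInfo{
		{ID: 1, Name: "etcd-0", Healthy: true},
		{ID: 2, Name: "etcd-1", Healthy: true},
		{ID: 3, Name: "etcd-2", Healthy: false},
	}, 1)
	fmt.Printf("leader=%s quorumAvailable=%v members=%d\n", st.Leader, st.QuorumAvailable, len(st.Members))
}
```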

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Design for running etcd as the datastore of ACP HCP hosted control planes, kept highly available across management-node rolling upgrades and supporting etcd version upgrades. Benchmarked against OpenShift HCP (HyperShift).

Contents:

- Benchmark vs OCP and gap analysis; EtcdCluster CRD fields and the StorageClass contract (per-member RWO local PV).
- Deploy & upgrade overview: deploy flow (management nodes + TopoLVM → operator → EtcdCluster → optional Kamaji DataStore); two upgrade paths, node rolling (serialized by PDB) and etcd version (serialized by the readyz probe), with two guarantees: the PDB keeps the other members available before a member is disrupted, and readyz only reports a started, voting member (leader/follower) as Ready.
- Management-plane changes: dedicated CAPI MachineDeployment (label cpaas.io/hcp-management-node), MachineConfigPool planning of IP/hostname/disk, TopoLVM local storage (TopolvmCluster CR + sc-topolvm-vdc), Baremetal Provider reusing the old node's IP/hostname/disk on roll, and the nodeDrainTimeout=0 invariant (never drain while a PDB is unsatisfied).
- etcd-operator changes: status, :9980 readyz/healthz probe (serializable health plus an added not-learner check), scheduling fields, PDB (maxUnavailable:1 + AlwaysAllow), version upgrade (downgrade rejected; minor-by-minor enforced by the upgrade flow), conditional reset-member initContainer, client Service, single-member recovery.
- DataStore publishing (optional); create & recovery workflows; observability & operations (where to read live status, stuck-not-crashed failure behavior, when manual intervention is needed, troubleshooting playbook); future work.

Single-member recovery only handles genuine data loss/corruption; on ACP the PV/disk is reused on node roll, so members rejoin with their data automatically without recovery.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
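
The readyz rule described in this commit (only a started, voting member is Ready) could look roughly like the following Go sketch. The `MemberState` type and the `readyzHandler` wiring are hypothetical; the actual probe on :9980 may gather its inputs differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// MemberState is a hypothetical snapshot of the local etcd member.
type MemberState struct {
	Started            bool // the etcd process is up and serving
	SerializableReadOK bool // a serializable (local) health read succeeded
	IsLearner          bool // learners do not vote and must stay NotReady
}

// isReady encodes the rule: started, healthy, and not a learner.
func isReady(s MemberState) bool {
	return s.Started && s.SerializableReadOK && !s.IsLearner
}

// readyzHandler answers 200 for a ready voting member and 503 otherwise.
func readyzHandler(probe func() MemberState) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if isReady(probe()) {
			w.WriteHeader(http.StatusOK)
			fmt.Fprintln(w, "ok")
			return
		}
		http.Error(w, "not ready", http.StatusServiceUnavailable)
	}
}

// statusFor runs the handler in-process and returns the HTTP status code.
func statusFor(s MemberState) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/readyz", nil)
	readyzHandler(func() MemberState { return s })(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("voting member:", statusFor(MemberState{Started: true, SerializableReadOK: true}))
	fmt.Println("learner:      ", statusFor(MemberState{Started: true, SerializableReadOK: true, IsLearner: true}))
}
```

Because a learner stays NotReady under this rule, a rolling update does not move on to the next pod until the previous member has been promoted to a voter.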

Reorganize the doc into Background → Goal/Non-Goal → Core Summary → OCP Upgrade Benchmark → detailed chapters, and add §13 Backup and Restore (manual etcd snapshot + Velero backup of the hosted control-plane namespace, restore flow), benchmarked against OCP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e file Revert hcp-etcd-design.md to the original (a6c3402) and add the restructured + backup/restore version as hcp-etcd-design-v2.md, so both revisions coexist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Background ends after the gap list with a one-line wrap-up
- Remove the "no new high-level CRD" item from Goal/Non-Goal
- Rename Core Summary → Summary
- Reword the quorum trade-off sentence in plain language

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Condense explanatory wording across every section; no design content, tables, diagrams, code blocks, or cross-references removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Keep the design discussion (what to back up with which tool, key trade-offs, restore order); move the concrete runbook out as a separate deliverable. Retitle to "Design" and update cross-references accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… handling PDB-constrained drain is not unique to etcd (any PDB-backed service constrains drain); reframe from "difference" to "same mechanism, the burden is on etcd's own PDB + probes". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reframe §4 as four parallel dimensions (deploy / node upgrade / etcd version upgrade / etcd HA), add the HCP node-topology isolation tiers (Shared Everything / Shared Nothing / Dedicated Request Serving) table. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a four-dimension summary table (deploy topology / node upgrade / etcd version upgrade / etcd HA), keep the detailed HA-mechanism table. Correct §4 point 1: shared mgmt node pool across hosted clusters is Shared Everything (first tier supported), not Shared Nothing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… etcd-operator

- Add a divider after §5: chapters before it are background/benchmarking, chapters after it are the formal design.
- Move EtcdCluster CRD into §8.1 (first under etcd-operator), fold the full example there as a collapsed <details>.
- Rename "High-Availability Changes" → "Production-Readiness Changes" (§7/§8/§9).
- Renumber all sections and cross-references accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- §7.1: list-format key points + zone-label (topology.kubernetes.io/zone) requirement
- §3 point 3: add spec.deletion.nodeDrainTimeout=0/unset prerequisite
- §12: trim trade-offs to a single restore-order line
- §13 + §2 Non-Goal: drop permanent machine replacement / local PV migration
- §4: remove the inaccurate "isolation strength increases" claim
- §5: clarify "hard downgrade validation" → "etcd version-skew validation (blocks downgrades, restricts cross-minor jumps)"

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
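
A version-skew check of the kind named in the §5 item (block downgrades, restrict cross-minor jumps) might be sketched as below. The function names and the exact policy (same major version, at most one minor step up) are assumptions for illustration, not the operator's confirmed rules.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "3.5.21" or "v3.5.21" into major, minor, patch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// validateUpgrade rejects downgrades and upgrades that skip a minor version.
// Assumed policy: the major version must not change.
func validateUpgrade(current, target string) error {
	c, err := parseVersion(current)
	if err != nil {
		return err
	}
	t, err := parseVersion(target)
	if err != nil {
		return err
	}
	if t[0] != c[0] {
		return fmt.Errorf("major version change %s -> %s is not supported", current, target)
	}
	if t[1] < c[1] || (t[1] == c[1] && t[2] < c[2]) {
		return fmt.Errorf("downgrade %s -> %s is rejected", current, target)
	}
	if t[1] > c[1]+1 {
		return fmt.Errorf("upgrade %s -> %s skips a minor version", current, target)
	}
	return nil
}

func main() {
	for _, target := range []string{"3.5.21", "3.6.0", "3.4.30", "3.7.0"} {
		fmt.Printf("3.5.9 -> %s: %v\n", target, validateUpgrade("3.5.9", target))
	}
}
```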

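Production-readiness design for etcd in ACP HCP hosted control planes built on Kamaji + Cluster API, benchmarked against OCP HCP (HyperShift).

- Analysis: background, Goal/Non-Goal, summary, OCP benchmark (deployment topology / node upgrade / version upgrade / etcd HA), gap inventory.
- Design: overall deployment and upgrade flow; production-readiness changes (management-plane node pool + TopoLVM disk reuse + PDB; etcd-operator including CRD, status, readyz probe, scheduling fields, PDB, version upgrade, reset-member, client Service, single-member self-healing; DataStore publishing); workflows; observability and operations; backup and restore design (etcd snapshot + Velero).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>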

Summarize how Hosted Control Planes (HyperShift) handle DR: manual runbook vs scheduled backup, etcd snapshot and OADP/Velero mechanics, upgrade-time backup behavior, backup storage location, fleet-scale backup, and whether OADP can back up and restore the etcd PV. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tidy §8/§10

§2 Goal: keep etcd available during HCP managed-node upgrades / support etcd version upgrades / automatically recover from single-member etcd failures / define the disaster-recovery design (§12). Drops the generic "scaling → high availability" and Kamaji/DataStore bullets and the "this iteration" qualifiers on the Goal/Non-Goal headers.

§2 Non-Goal: backup/restore automation; the concrete runbook for backup-restore and quorum-loss recovery (manual, documented separately); TLS cert rotation; auto defrag. Quorum-loss recovery is a manual runbook, not deferred automation.

Management-cluster placement and StatefulSet rendering (§3, §7.1, §8):

- HCP control plane (incl. etcd) must run on management-cluster WORKER nodes, never master/control-plane. On a master the OVN (ovn-kubernetes) pods can never be evicted, so node drain never completes and blocks machine replacement / upgrades.
- etcd pods get a high-priority PriorityClass (e.g. system-cluster-critical) so they aren't preempted / node-pressure evicted; the operator sets a default, overridable via podTemplate.
- One etcd per HCP cluster (one EtcdCluster). The etcd StatefulSet uses podManagementPolicy: Parallel. Since learners are intentionally NotReady (§8.3), the default OrderedReady would stall the StatefulSet on a NotReady learner. Membership is serialized by the operator (one learner at a time) and readyz, not by pod ordering.

§12 is now tiered disaster recovery rather than just etcd backup:

- Single-member failure (quorum intact) self-heals via §10.2, with no snapshot and no downtime. Quorum loss / control-plane resource loss restores from backup; the hosted apiserver is unavailable meanwhile.
- Two tools: Velero backs up the control-plane namespace resources (EtcdCluster CR, TLS Secret, DataStore) AND PV volumes (incl. etcd's PVC/PV), batching many hosted clusters via includedNamespaces (mirrors OADP); etcd snapshot gives a single-cluster consistent point-in-time image.
- Restore order is "resources first, then data"; backups land in S3-compatible object storage.

§8.2 status conditions: SingleMemberRecoveryActive is the live "recovery in progress" signal; status.recovery keeps only history (lastResult, lastRecoveredMember), dropping the duplicate active field. SingleMemberDegraded is dropped as derivable from members[].healthy + QuorumAvailable. §10.2/§11.3 reference SingleMemberRecoveryActive / recovery.lastResult accordingly.

§8.1 example / §8.4: drop the hostname topologySpread that duplicated the hostname podAntiAffinity; anti-affinity alone enforces one member per node. Zone spread (ScheduleAnyway) is the opt-in for failure-domain distribution.

§10.2: NOSPACE is no longer a single-member rebuild trigger: it is a cluster-wide quota/fragmentation alarm whose fix is compact + defrag + disarm (needs the §13 auto-defrag, not in scope this iteration), so it only alarms. Auto-rebuild handles CORRUPT / member-missing / db-load-failure only.

Cross-references (§1, part intro, §11.3) say "disaster recovery" instead of "backup and restore". §14 adds the OCP HCP DR research doc plus the manual snapshot and OADP runbooks.

§10.2 health-check Job ETCD_POD_SELECTOR uses app=<name>, matching operator pod labels (utils.go:163) and the §8.1 example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
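
The §10.2 trigger split described above (auto-rebuild for CORRUPT / member-missing / db-load-failure, alarm only for NOSPACE) can be pictured as a small classification function. The fault strings and the `Action` type below are illustrative assumptions, not identifiers from the operator.

```go
package main

import "fmt"

// Action is a hypothetical outcome of evaluating a detected member fault.
type Action string

const (
	ActionNone          Action = "none"
	ActionAlarmOnly     Action = "alarm-only"
	ActionRebuildMember Action = "rebuild-member"
)

// actionFor maps a detected fault to the response described in §10.2.
// NOSPACE is cluster-wide (quota/fragmentation), so rebuilding one member
// would not fix it; it only raises an alarm.
func actionFor(fault string) Action {
	switch fault {
	case "CORRUPT", "member-missing", "db-load-failure":
		return ActionRebuildMember
	case "NOSPACE":
		return ActionAlarmOnly
	default:
		return ActionNone
	}
}

func main() {
	for _, f := range []string{"CORRUPT", "member-missing", "db-load-failure", "NOSPACE", "healthy"} {
		fmt.Printf("%-16s -> %s\n", f, actionFor(f))
	}
}
```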

jiazhiguang force-pushed the dev/release-0.2-alauda branch 11 times, most recently from 541b58e to ec4b51d on June 5, 2026 14:59

jiazhiguang force-pushed the dev/release-0.2-alauda branch 2 times, most recently from 0672077 to 473f603 on June 5, 2026 16:26

docs: refine HCP etcd HA status and PDB design

30b8c3d

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jiazhiguang force-pushed the dev/release-0.2-alauda branch 5 times, most recently from d165ffa to 9338aa5 on June 9, 2026 10:42

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 9338aa5 to a6c3402 on June 9, 2026 10:49

zgjia and others added 4 commits June 10, 2026 07:06

docs(v2): tighten prose throughout, keep all sections/tables/code

e84e2f3

Condense explanatory wording across every section; no design content, tables, diagrams, code blocks, or cross-references removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 02b2168 to e84e2f3 on June 10, 2026 07:58

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 6c9e2ab to 9439c3a on June 10, 2026 08:20

zgjia and others added 5 commits June 10, 2026 08:24

docs(v2): trim §4 — end at "drain PDB constraint not unique to etcd"

cdf71de

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 45f5795 to 2251f69 on June 10, 2026 09:05

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 24c8616 to a08c88e on June 10, 2026 09:26

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 5a68ef1 to 3cd2875 on June 10, 2026 09:32

jiazhiguang force-pushed the dev/release-0.2-alauda branch 7 times, most recently from 8bdb04a to 20f9c4c on June 11, 2026 07:54

jiazhiguang force-pushed the dev/release-0.2-alauda branch from 20f9c4c to f4e3bef on June 11, 2026 10:00

jiazhiguang wants to merge 17 commits into release-0.2-alauda from dev/release-0.2-alauda
