docs: How to deploy a replicated ClickHouse cluster with embedded ClickHouse Keeper #790
zlcnju wants to merge 1 commit into
…kHouse Keeper
Covers coordination topology choice (external ZooKeeper / standalone Keeper / embedded Keeper), a full ClickHouseInstallation manifest with init-container-generated raft configuration, quorum and replication verification, and recommended settings for log storage workloads. Notes that the platform ClickHouse Operator (0.20.x) does not include the upstream ClickHouseKeeperInstallation controller, so embedded or standalone Keeper is required.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Walkthrough
This pull request adds a comprehensive how-to guide for deploying a replicated ClickHouse cluster using embedded ClickHouse Keeper on Kubernetes. The document covers architecture decisions, prerequisites, manifest examples with init-container configuration logic, step-by-step verification procedures, and operational constraints.
Changes
Embedded ClickHouse Keeper Deployment Guide
Estimated code review effort: 2 (Simple) | ~12 minutes
Pre-merge checks: 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md`:
- Around line 286-287: Update the paragraph to explicitly state that scaling
shards requires updating both SHARDS_COUNT and REPLICAS_COUNT because the Raft
config generation loop uses SHARDS_COUNT as well as REPLICAS_COUNT; mention the
server_id formula `SHARD * REPLICAS_COUNT + REPLICA + 1`, the init container
that writes into the in-memory emptyDir and merges via `include_from`, and that
static fragments under `layout.shards[].files` remain unchanged—so when adding
shards you must increase SHARDS_COUNT (not just REPLICAS_COUNT) to avoid
truncating the generated member list.
- Around line 212-241: The init script uses DOMAIN=$(hostname -d) which returns
only the DNS search suffix, breaking the regex that expects the full pod FQDN;
change DOMAIN assignment to use the full hostname (DOMAIN=$(hostname -f) or
`hostname --fqdn`) so the existing regex that extracts DOMAIN_NAME and
DOMAIN_SUFFIX from the full FQDN works correctly; keep the current regex and
downstream variables (DOMAIN_NAME, DOMAIN_SUFFIX, MY_ID, KEEPER_ID) unchanged so
the generated Raft peer hostnames are correct.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6cef2b9e-dc4b-42d4-bc22-4ceea6233807
📒 Files selected for processing (1)
docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
HOST=$(hostname -s)
DOMAIN=$(hostname -d)
# StatefulSet Pod hostname: <chi>-<cluster>-<shard>-<replica>-<ordinal>
if [[ $HOST =~ (.*)-([0-9]+)-([0-9]+)-([0-9]+)$ ]]; then
  SHARD=${BASH_REMATCH[2]}
  REPLICA=${BASH_REMATCH[3]}
else
  echo "Failed to parse shard/replica from hostname $HOST"; exit 1
fi
# Pod FQDN domain: <chi>-<cluster>-<shard>-<replica>.<namespace>.svc.<zone>
if [[ $DOMAIN =~ ^(.*)-([0-9]+)-([0-9]+)\.(.*)$ ]]; then
  DOMAIN_NAME=${BASH_REMATCH[1]}
  DOMAIN_SUFFIX=.${BASH_REMATCH[4]}
else
  echo "Failed to parse domain $DOMAIN"; exit 1
fi

MY_ID=$((SHARD * REPLICAS_COUNT + REPLICA + 1))
KEEPER_ID=1
{
  echo "<clickhouse>"
  echo "  <keeper_server>"
  echo "    <server_id>${MY_ID}</server_id>"
  echo "    <raft_configuration>"
  for (( i=0; i<SHARDS_COUNT; i++ )); do
    for (( j=0; j<REPLICAS_COUNT; j++ )); do
      echo "      <server>"
      echo "        <id>${KEEPER_ID}</id>"
      echo "        <hostname>${DOMAIN_NAME}-${i}-${j}${DOMAIN_SUFFIX}</hostname>"
      echo "        <port>${RAFT_PORT}</port>"
Use the full FQDN here, not `hostname -d`.
The regex below expects the full pod FQDN, but `hostname -d` only returns the DNS suffix. As written, the init container will fail to derive the correct `DOMAIN_NAME` / `DOMAIN_SUFFIX`, and the generated Raft peer list will be wrong.
Suggested fix
- DOMAIN=$(hostname -d)
+ DOMAIN=$(hostname -f)
Committable suggestion
HOST=$(hostname -s)
DOMAIN=$(hostname -f)
# StatefulSet Pod hostname: <chi>-<cluster>-<shard>-<replica>-<ordinal>
if [[ $HOST =~ (.*)-([0-9]+)-([0-9]+)-([0-9]+)$ ]]; then
  SHARD=${BASH_REMATCH[2]}
  REPLICA=${BASH_REMATCH[3]}
else
  echo "Failed to parse shard/replica from hostname $HOST"; exit 1
fi
# Pod FQDN domain: <chi>-<cluster>-<shard>-<replica>.<namespace>.svc.<zone>
if [[ $DOMAIN =~ ^(.*)-([0-9]+)-([0-9]+)\.(.*)$ ]]; then
  DOMAIN_NAME=${BASH_REMATCH[1]}
  DOMAIN_SUFFIX=.${BASH_REMATCH[4]}
else
  echo "Failed to parse domain $DOMAIN"; exit 1
fi
MY_ID=$((SHARD * REPLICAS_COUNT + REPLICA + 1))
KEEPER_ID=1
{
  echo "<clickhouse>"
  echo "  <keeper_server>"
  echo "    <server_id>${MY_ID}</server_id>"
  echo "    <raft_configuration>"
  for (( i=0; i<SHARDS_COUNT; i++ )); do
    for (( j=0; j<REPLICAS_COUNT; j++ )); do
      echo "      <server>"
      echo "        <id>${KEEPER_ID}</id>"
      echo "        <hostname>${DOMAIN_NAME}-${i}-${j}${DOMAIN_SUFFIX}</hostname>"
      echo "        <port>${RAFT_PORT}</port>"
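What the domain regex extracts can be checked outside a Pod by feeding it sample strings. The sketch below is only an illustration: `parse_domain` is a hypothetical helper that wraps the script's regex unchanged, and both sample names (CHI `demo`, cluster `main`, namespace `logging`, zone `cluster.local`) are made up. The first sample has the shape given in the script comment; the second assumes the common StatefulSet Pod FQDN shape `<pod>.<service>.<namespace>.svc.<zone>`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: applies the excerpt's domain regex to its argument and
# prints "DOMAIN_NAME DOMAIN_SUFFIX", or returns 1 if the regex does not match.
parse_domain() {
  local domain=$1
  if [[ $domain =~ ^(.*)-([0-9]+)-([0-9]+)\.(.*)$ ]]; then
    echo "${BASH_REMATCH[1]} .${BASH_REMATCH[4]}"
  else
    return 1
  fi
}

# Made-up sample with the shape from the script comment:
# <chi>-<cluster>-<shard>-<replica>.<namespace>.svc.<zone>
parse_domain "chi-demo-main-0-1.logging.svc.cluster.local"

# Made-up sample assuming a full Pod FQDN:
# <pod>.<service>.<namespace>.svc.<zone>
parse_domain "chi-demo-main-0-1-0.chi-demo-main-0-1.logging.svc.cluster.local"
```

With these inputs the first call yields `chi-demo-main` as `DOMAIN_NAME`, while in the second the greedy first group keeps the Pod label inside `DOMAIN_NAME`. Printing `hostname -d` and `hostname -f` in a running Pod would show which shape the script actually receives in a given cluster.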
- **Quorum and layout.** `shardsCount: 1` and `replicasCount: 3` produce 3 Pods, each a Keeper Raft member. Keep an odd member count; 3 members tolerate one failure. If you add shards, every replica of every shard joins the quorum, and the `server_id` formula `SHARD * REPLICAS_COUNT + REPLICA + 1` stays unique as long as `REPLICAS_COUNT` in the init container matches the real layout.
- **Static vs dynamic Keeper config.** The static part (`tcp_port`, data `path`, coordination settings) is injected per shard through `layout.shards[].files`. The identity-dependent part (`server_id`, `raft_configuration`) is generated by the init container into an in-memory `emptyDir` and merged via `include_from`. Replica changes therefore require only updating `REPLICAS_COUNT`, not editing the static fragment.
Clarify that shard scaling also requires `SHARDS_COUNT`.
The Raft config loop uses both `SHARDS_COUNT` and `REPLICAS_COUNT`. Calling out only `REPLICAS_COUNT` makes the “add shards” guidance incomplete and can leave the generated member list truncated.
Suggested wording change
- - **Quorum and layout.** `shardsCount: 1` and `replicasCount: 3` produce 3 Pods, each a Keeper Raft member. Keep an odd member count; 3 members tolerate one failure. If you add shards, every replica of every shard joins the quorum, and the `server_id` formula `SHARD * REPLICAS_COUNT + REPLICA + 1` stays unique as long as `REPLICAS_COUNT` in the init container matches the real layout.
+ - **Quorum and layout.** `shardsCount: 1` and `replicasCount: 3` produce 3 Pods, each a Keeper Raft member. Keep an odd member count; 3 members tolerate one failure. If you add shards, update both `SHARDS_COUNT` and `REPLICAS_COUNT` in the init container; the `server_id` formula `SHARD * REPLICAS_COUNT + REPLICA + 1` stays unique as long as both values match the real layout.
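The interaction between the two counters can be checked without a cluster. The following sketch mirrors the nested loop from the reviewed excerpt. It assumes `KEEPER_ID` grows by one per generated `<server>` entry (the excerpt is cut off before that point), and `server_id` and `list_members` are hypothetical helper names, not part of the PR.

```bash
#!/usr/bin/env bash
# server_id for one Pod, using the formula from the init script.
server_id() {
  local shard=$1 replica=$2 replicas_count=$3
  echo $(( shard * replicas_count + replica + 1 ))
}

# Print "<id> <shard>-<replica>" for every member the nested loop would emit.
# Assumption: the id counter grows by one per entry, in loop order.
list_members() {
  local shards_count=$1 replicas_count=$2 keeper_id=1 i j
  for (( i=0; i<shards_count; i++ )); do
    for (( j=0; j<replicas_count; j++ )); do
      echo "${keeper_id} ${i}-${j}"
      keeper_id=$(( keeper_id + 1 ))
    done
  done
}

# Layout of 3 shards x 3 replicas with both counters updated: 9 members.
list_members 3 3

# Same layout, but SHARDS_COUNT left at 1: only shard 0 is listed,
# while a Pod in shard 2 still computes an id outside the listed range.
list_members 1 3
server_id 2 1 3
```

In the second case the Pod for shard 2, replica 1 computes `server_id` 8, but the generated list stops at id 3, which is the truncation the review comment describes.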
Summary
Adds a How-To article: How to Deploy a Replicated ClickHouse Cluster with Embedded ClickHouse Keeper (`docs/en/solutions/`).
- Notes that the platform ClickHouse Operator does not include the `ClickHouseKeeperInstallation` (CHK) controller (the CHK CRD is not installed), so Keeper must run embedded in the ClickHouse Pods or as a standalone StatefulSet.
- Provides a full `ClickHouseInstallation` manifest: 1 shard × 3 replicas forming a 3-member Keeper Raft quorum, static `keeper_config.xml` injected per shard, dynamic `server_id` / `raft_configuration` generated by an init container via `include_from`, anti-affinity, raft-port readiness probe, per-replica Service exposing 9181/9444.
- Covers quorum and replication verification: `mntr` four-letter command, `system.zookeeper`, ReplicatedMergeTree smoke test across replicas, `system.replicas` health.
- Covers recommended settings for log storage workloads (… `max_execution_time`, retention TTLs) and the trade-offs/constraints of the embedded topology.
Validation
Validated on an ACP 4.2 environment (ClickHouse Operator v4.2.3 / chop 0.20.0, ClickHouse Server 25.x):
🤖 Generated with Claude Code