From 5e6a8941ae4022c73e93b9bd80a81054274fd573 Mon Sep 17 00:00:00 2001
From: Wei Cao <cyg.cao@gmail.com>
Date: Wed, 6 May 2026 19:48:41 +0800
Subject: [PATCH] docs: add fake server protocol validation guide

---
 docs/SKILL-INDEX.md                           |   2 +
 ...n-fake-server-protocol-validation-guide.md | 282 ++++++++++++++++++
 skills/README.md                              |   1 +
 .../fake-server-protocol-validation/SKILL.md  |  94 ++++++
 4 files changed, 379 insertions(+)
 create mode 100644 docs/addon-fake-server-protocol-validation-guide.md
 create mode 100644 skills/fake-server-protocol-validation/SKILL.md
diff --git a/docs/SKILL-INDEX.md b/docs/SKILL-INDEX.md
index 295c8db..9080c3e 100644
--- a/docs/SKILL-INDEX.md
+++ b/docs/SKILL-INDEX.md
@@ -27,12 +27,14 @@
 - [`addon-clusterdef-topology-componentdef-regex-guide.md`](addon-clusterdef-topology-componentdef-regex-guide.md) — when adding a new `serviceVersion` / `ComponentDefinition` family, update the `ClusterDefinition.spec.topologies[].components[].compDef` regex in the same change; this avoids the misdiagnosis where the Cluster hits `PreCheckFailed` and no pods are created while the `ClusterDefinition` itself still shows Available
 - [`addon-cmpd-image-override-jsonpath-guide.md`](addon-cmpd-image-override-jsonpath-guide.md) — the two-layer image resolution rules of CMPD and ComponentVersion, the correct `kubectl -o jsonpath` expression for each container slot, the Oracle 12c/19c/23ai image location matrix, how to write T01 sentinel assertions, and the `spec.releases` vs `spec.versions` pitfall
 - [`addon-pvc-rebind-via-workload-intent-guide.md`](addon-pvc-rebind-via-workload-intent-guide.md) — when an OpsRequest needs to rebind a same-named PVC from one PV to another (rebuild / restore-into-place / PV migration), pass the intent to the Workload controller (the sole writer) through a Workload CR annotation, so that the OpsRequest controller, the Workload controller, and the dynamic provisioner do not race for ownership of the same-named PVC and cause `PersistentVolume "" not found` or a bind to the wrong PV
+- [`addon-fake-server-protocol-validation-guide.md`](addon-fake-server-protocol-validation-guide.md) — acceptance method for fake protocol servers such as FakeSentinel / FakeMySQL / fake mongos, based on golden comparison against a real server: raw protocol requests, a protocol-aware reader, command matrix classification, known-delta records, and proof of coverage for client-consumed fields; includes a Redis Sentinel RESP case
 
 ### 2. Writing new smoke / chaos tests
 
 Design the helper / runner / acceptance criteria so that responsibility for the first bug hit lands on the right layer:
 
 - [`addon-test-acceptance-and-first-blocker-guide.md`](addon-test-acceptance-and-first-blocker-guide.md) — layering of success semantics, layering of first blockers, fixing the identity of validation-only gates, freezing the failure scene
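+- [`addon-fake-server-protocol-validation-guide.md`](addon-fake-server-protocol-validation-guide.md) — *(also relevant in: designing / developing a new addon)* fake-server tests must not stop at a successful SDK call; they must run a protocol-frame-level golden comparison against the real service and report full match / known delta / blocker separately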
 - [`addon-test-probe-classification-guide.md`](addon-test-probe-classification-guide.md) — classify probe failures into the correct layer, such as `route_api` / `<client>_<channel>` / `empty_output` / `parse_empty` / `runtime_mismatch` / `real_*_mismatch`
 - [`addon-test-dg-helper-completeness-guide.md`](addon-test-dg-helper-completeness-guide.md) — test helpers for multi-step asynchronous operations must use multi-gate checks and unfakeable observable invariants, so that a single status string cannot report success early in an intermediate state
 - [`addon-bounded-eventual-convergence-guide.md`](addon-bounded-eventual-convergence-guide.md) — state checks on asynchronously converging systems must use bounded retry; a single snapshot must never be treated as the conclusion *(also relevant in: designing / developing a new addon — the checks made after addon startup / rejoin / reconfigure)*
diff --git a/docs/addon-fake-server-protocol-validation-guide.md b/docs/addon-fake-server-protocol-validation-guide.md
new file mode 100644
index 0000000..6f4868d
--- /dev/null
+++ b/docs/addon-fake-server-protocol-validation-guide.md
@@ -0,0 +1,282 @@
+# Addon Fake Server Protocol Validation Guide
+
+> **Audience**: addon dev / test / TL
+> **Status**: stable
+> **Applies to**: any KubeBlocks addon that implements a fake / stub database protocol service
+> **Applies to KB version**: any
+> **Affected by version skew**: not affected by KubeBlocks version; protocol compatibility depends on the database client and protocol version being emulated
+
+This guide explains how to validate a fake database protocol service, such as a FakeSentinel, FakeMySQL proxy, fake mongos, or fake cluster endpoint. The main rule is simple: compare protocol frames against a real server, not only client SDK behavior. Engine-specific commands and observed Redis Sentinel results are kept in the appendix.
+
+## Plain-Language Summary
+
+### What Problem This Solves
+
+Fake services are useful in tests and control-plane simulations, but they fail silently when their wire protocol is slightly wrong. A client library may tolerate, retry, or hide the mismatch. That makes a high-level SDK test pass while a real workload later fails.
+
+Examples of protocol-level mistakes:
+
+- returning a simple string where the real server returns an array
+- returning the right field names but a different array shape
+- reading responses by timeout instead of by complete protocol frame
+- treating a known delta as acceptable without checking whether the client reads that field
+
+### What You Can Decide After Reading
+
+- whether a fake protocol service is compatible enough for the clients it supports
+- which commands must be byte/frame matched against a real server
+- which differences can be documented as known deltas
+- when a fake server PR should be blocked before merge
+
+## 1. Core Rule: Compare Protocol Frames, Not SDK Results
+
+Do not validate fake-server compatibility only through a high-level SDK call. SDKs often add fallback logic, default values, retries, and parsing tolerance. Those behaviors can hide protocol drift.
+
+The validation target is the wire protocol:
+
+1. Start a real server and the fake server.
+2. Send the same raw protocol request to both.
+3. Read exactly one complete protocol response from each side.
+4. Compare the parsed frame shape and the normalized payload.
+5. Classify every delta with an explicit reason.
+
+For RESP, this means comparing frame markers such as `+`, `-`, `:`, `$`, and `*`, not just checking that a Go / Java / Python client call returns without error.
+
+## 2. Use a Protocol-Aware Reader
+
+The reader must stop after one complete response. Do not read a fixed number of lines. Do not read until timeout.
+
+Timeout-based reads create two problems:
+
+- they make tests slow, because every command waits for its deadline
+- they hide framing bugs, because a partial response and a complete response can look similar in logs
+
+A protocol-aware reader follows the protocol grammar:
+
+- simple string / error / integer: read one line
+- bulk string: read the length; `-1` is a null bulk string with no payload, otherwise read exactly that many bytes plus CRLF
+- array: read the array length; `-1` is a null array with no elements, otherwise recursively read that many elements
+
+For non-RESP protocols, use the same principle: read one full message according to that protocol's packet/header rules.
+
+## 3. Build a Command Matrix
+
+Every fake-server PR should carry a command matrix. Each row defines the expected comparison rule for one command.
+
+| Category | Validation Rule | Merge Meaning |
+|---|---|---|
+| Full match | Normalize volatile lengths / IDs, then compare frame type and payload | Required for commands the client uses directly |
+| Known content delta | Frame type matches; documented payload fields differ | Acceptable only if client-consumed fields are present |
+| Known type delta | Frame type differs and `allowTypeMismatch` is explicitly set with a reason | Rare; requires architecture reason and owner sign-off |
+| Unknown delta | Any unclassified difference | Block merge |
+
+At minimum, each row should record:
+
+- command name and raw request
+- real response frame type
+- fake response frame type
+- comparison category
+- client libraries / paths that consume the response
+- required fields for those clients
+- reason for any delta
+
+## 4. Normalize Only Volatile Values
+
+Normalization keeps the comparison stable without weakening the protocol contract.
+
+Allowed normalization:
+
+- bulk-string length markers when payload text is compared separately
+- timestamps, run IDs, election IDs, generated names
+- endpoint addresses that differ between real and fake test fixtures
+
+Not allowed:
+
+- changing frame type (`+` vs `*`, `$` vs `-`)
+- dropping array elements that a supported client reads
+- ignoring error vs success semantics
+- hiding incomplete responses caused by a reader bug
+
+If a value is normalized, the test should say what was normalized and why.
+
+## 5. Acceptance Criteria
+
+### Must Match
+
+- Protocol frame type for every command used by a supported client path.
+- Array length for fixed-shape responses.
+- Error vs success response type.
+- All fields read by supported clients.
+- Negative paths for wrong names, missing targets, unknown commands, and auth errors when relevant.
+
+### Acceptable Known Deltas
+
+Known deltas can be accepted only when all of the following are true:
+
+- the delta is listed in the command matrix
+- the reason is documented
+- the client-consumed fields are present
+- a real-server comparison proves the remaining shape is compatible
+
+Examples:
+
+- real server returns many metadata fields, fake returns the subset consumed by supported clients
+- same frame type but different generated value
+- environment-specific auth behavior, with a separate production-mode test covering the real auth path
+
+### Not Acceptable
+
+- untagged protocol type mismatch
+- missing field consumed by a supported client
+- tests that pass only because the reader waits for timeout
+- SDK-only verification with no raw protocol comparison
+- broad "client works" claims without the command matrix
+
+## 6. Review Checklist
+
+Before approving a fake-server protocol implementation:
+
+- [ ] Real server and fake server are both exercised by the same test suite.
+- [ ] Each command sends the same raw request to both sides.
+- [ ] The test reads one complete protocol response, not "N lines" or "until timeout".
+- [ ] The command matrix covers happy paths and negative paths.
+- [ ] All client-consumed fields are listed and verified.
+- [ ] Every known delta has a reason and a compatibility statement.
+- [ ] Unknown type mismatches block merge.
+- [ ] The final report separates full matches, known deltas, and blockers.
+
+## Appendix A: Redis Sentinel Golden RESP Case
+
+This appendix records one concrete application of the method: FakeSentinel compatibility validation against a real Redis Sentinel.
+
+### Test Location
+
+Reference implementation for the Redis engine:
+
+```text
+engines/redis/golden_resp_test.go
+```
+
+Feature branch at the time of writing:
+
+```text
+feat/fake-sentinel
+```
+
+Run command:
+
+```bash
+go test ./engines/redis/... -run TestGoldenRESP -v
+```
+
+### RESP Reader Shape
+
+The key test helper is a RESP-aware single-response reader:
+
+```go
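+func readOneRESP(br *bufio.Reader) ([]string, error) {
+    line, err := br.ReadString('\n') // every RESP element starts with one CRLF-terminated header line
+    if err != nil {
+        return nil, err
+    }
+    line = strings.TrimRight(line, "\r\n")
+    if line == "" {
+        return []string{line}, nil
+    }
+
+    switch line[0] {
+    case '+', '-', ':': // simple string, error, integer: the header line is the whole response
+        return []string{line}, nil
+    case '$': // bulk string: the header carries the payload length
+        n, err := strconv.Atoi(line[1:])
+        if err != nil || n < 0 { // malformed length, or null bulk string ($-1) with no payload
+            return []string{line}, err
+        }
+        buf := make([]byte, n+2) // payload plus trailing CRLF
+        if _, err := io.ReadFull(br, buf); err != nil {
+            return nil, err
+        }
+        return []string{line, string(buf[:n])}, nil
+    case '*': // array: the header carries the element count
+        count, err := strconv.Atoi(line[1:])
+        if err != nil || count < 0 { // malformed count, or null array (*-1) with no elements
+            return []string{line}, err
+        }
+        lines := []string{line}
+        for i := 0; i < count; i++ {
+            sub, err := readOneRESP(br) // each element is itself one complete RESP value
+            if err != nil {
+                return nil, err
+            }
+            lines = append(lines, sub...)
+        }
+        return lines, nil
+    default: // unknown marker: return the raw line unchanged
+        return []string{line}, nil
+    }
+}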
+```
+
+### Normalization Example
+
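+Bulk-string lengths are normalized when content is compared separately. Null bulk strings (`$-1`) are left untouched, because null vs non-null is a shape difference, not a volatile length:
+
+```go
+func normalizeRESP(s string) string {
+    lines := strings.Split(s, "\n")
+    for i, line := range lines {
+        if strings.HasPrefix(line, "$") && !strings.HasPrefix(line, "$-1") { // keep null bulk strings visible
+            lines[i] = "$N"
+        }
+    }
+    return strings.Join(lines, "\n")
+}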
+```
+
+### Observed Result Summary
+
+Reference environment: Redis 7.0.15 real Sentinel vs FakeSentinel, tested on 2026-05-06.
+
+| Command | Result | Notes |
+|---|---|---|
+| `PING` | full match | simple string response |
+| `CLIENT SETNAME` | full match | simple string response |
+| `INFO` | full match | bulk-string response type matches |
+| `SENTINEL GET-MASTER-ADDR-BY-NAME` | full match | `*2` structure matches |
+| `SENTINEL GET-MASTER-ADDR-BY-NAME` wrong name | full match | null array behavior matches |
+| `SENTINEL MASTER` wrong name | full match | error response matches |
+| `SENTINEL SLAVES` / `REPLICAS` | full match | empty array matches |
+| `SENTINEL SENTINELS` | full match | empty array matches |
+| `SENTINEL CKQUORUM` | full match | success type matches |
+| `SENTINEL RESET` | full match | integer response matches |
+| `SUBSCRIBE +switch-master` | full match | subscription confirmation shape matches |
+| `SENTINEL MASTER` correct name | known content delta | real has more fields; fake includes fields required by go-redis |
+| `SENTINEL MASTERS` | known content delta | same field coverage rule as `MASTER` |
+| `SENTINEL IS-MASTER-DOWN-BY-ADDR` | known content delta | vote ID differs; same bulk-string type; go-redis does not use the field |
+| `AUTH` | known type delta | no-password test fixture differs from production password path |
+| `SENTINEL FAILOVER` | known type delta | real Sentinel with no replicas returns a `NOGOODSLAVE` error; fake returns OK for the tested control path |
+
+### Required go-redis Field Coverage
+
+For `SENTINEL MASTER` and `SENTINEL MASTERS`, the fake response must include every field read by go-redis:
+
+```text
+name
+ip
+port
+flags
+num-slaves
+quorum
+```
+
+If any of these fields is missing, the known content delta is no longer acceptable.
+
+## Appendix B: Applying The Same Pattern Elsewhere
+
+The same method applies to other fake protocol services:
+
+- **MySQL**: compare handshake, auth switch, OK packet, ERR packet, and capability flags against a real server.
+- **MongoDB**: compare hello / isMaster, topology fields, wire version, and mongos routing responses.
+- **Redis Cluster**: compare `CLUSTER INFO`, `CLUSTER SLOTS`, MOVED / ASK error frames, and empty-cluster cases.
+
+The protocol changes; the acceptance rule does not. Read one complete real response, read one complete fake response, compare typed frames, then classify each delta.
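The command matrix and acceptance rules in the guide above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the code in `engines/redis/golden_resp_test.go`: the `Rule` type, its fields, and `Classify` are hypothetical names, and frames are assumed to be the flattened `[]string` shape that `readOneRESP` returns. It does not check client-consumed fields, which a known content delta still requires.

```go
package main

import "fmt"

// Category mirrors the command-matrix categories used in the guide.
type Category string

const (
	FullMatch         Category = "full match"
	KnownContentDelta Category = "known content delta"
	KnownTypeDelta    Category = "known type delta"
	Blocker           Category = "blocker"
)

// Rule is a hypothetical command-matrix row; the field names are illustrative.
type Rule struct {
	Command           string
	AllowContentDelta bool
	AllowTypeMismatch bool
	Reason            string // must be non-empty before any delta is accepted
}

// typeTag returns the RESP type marker of the frame's first line. Null bulk
// strings and null arrays keep their full header, so null vs non-null counts
// as a type difference.
func typeTag(frame []string) string {
	if len(frame) == 0 || frame[0] == "" {
		return ""
	}
	if frame[0] == "$-1" || frame[0] == "*-1" {
		return frame[0]
	}
	return frame[0][:1]
}

// maskBulkLengths replaces non-null bulk-string headers with "$N" and copies
// the payload element that follows each header unchanged.
func maskBulkLengths(frame []string) []string {
	out := make([]string, 0, len(frame))
	for i := 0; i < len(frame); i++ {
		line := frame[i]
		if len(line) > 1 && line[0] == '$' && line != "$-1" {
			out = append(out, "$N")
			if i+1 < len(frame) {
				i++
				out = append(out, frame[i])
			}
			continue
		}
		out = append(out, line)
	}
	return out
}

func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// Classify compares one real frame with one fake frame under a matrix rule.
func Classify(rule Rule, realFrame, fakeFrame []string) Category {
	rt, ft := typeTag(realFrame), typeTag(fakeFrame)
	if rt == "" || ft == "" {
		return Blocker // an empty frame means no usable response was read
	}
	if rt != ft {
		if rule.AllowTypeMismatch && rule.Reason != "" {
			return KnownTypeDelta
		}
		return Blocker
	}
	if equal(maskBulkLengths(realFrame), maskBulkLengths(fakeFrame)) {
		return FullMatch
	}
	if rule.AllowContentDelta && rule.Reason != "" {
		return KnownContentDelta
	}
	return Blocker
}

func main() {
	realFrame := []string{"*2", "$9", "127.0.0.1", "$4", "6379"}
	fakeFrame := []string{"*2", "$9", "127.0.0.1", "$4", "6379"}
	rule := Rule{Command: "SENTINEL GET-MASTER-ADDR-BY-NAME"}
	fmt.Println(Classify(rule, realFrame, fakeFrame))
}
```

Requiring a non-empty `Reason` before any delta is accepted keeps an unexplained `AllowTypeMismatch` from passing silently, in line with the rule that unclassified differences block merge.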
diff --git a/skills/README.md b/skills/README.md
index 2204ccc..b14a70a 100644
--- a/skills/README.md
+++ b/skills/README.md
@@ -62,6 +62,7 @@ Some skills are distilled directly from the methodology guides in `docs/`. These
 
 - `bounded-eventual-convergence` — companion to `docs/addon-bounded-eventual-convergence-guide.md`
 - `evidence-discipline` — companion to `docs/addon-evidence-discipline-guide.md`
+- `fake-server-protocol-validation` — companion to `docs/addon-fake-server-protocol-validation-guide.md`
 - `first-blocker-classification` — companion to `docs/addon-test-acceptance-and-first-blocker-guide.md`
 - `github-submission-discipline` — companion to `docs/addon-github-submission-discipline-guide.md`
 - `paramdef-range-validation` — companion to `docs/addon-paramdef-cue-range-validation-guide.md`
diff --git a/skills/fake-server-protocol-validation/SKILL.md b/skills/fake-server-protocol-validation/SKILL.md
new file mode 100644
index 0000000..439f56a
--- /dev/null
+++ b/skills/fake-server-protocol-validation/SKILL.md
@@ -0,0 +1,94 @@
+---
+name: fake-server-protocol-validation
+description: Use when implementing, reviewing, or testing a fake/stub database protocol server such as FakeSentinel, FakeMySQL, fake mongos, or Redis Cluster emulation. Forces golden real-server protocol comparison, protocol-aware response reads, command matrix classification, and explicit known-delta signoff instead of trusting high-level SDK behavior.
+allowed-tools: Bash(go test *) Bash(rg *) Read
+---
+
+# Fake Server Protocol Validation
+
+## Hard Rules
+
+1. **Compare against a real server.** A fake server is not compatible until the same raw request has been sent to both real and fake implementations.
+2. **Validate protocol frames, not only SDK results.** Client SDKs can hide fallback, retries, and tolerant parsing.
+3. **Use a protocol-aware reader.** Read exactly one complete response according to the protocol grammar; never read "N lines" or "until timeout".
+4. **Classify every delta.** Each command is `full match`, `known content delta`, `known type delta`, or `blocker`.
+5. **Client-consumed fields are mandatory.** A fake may omit unused metadata only after the supported client read set is listed and verified.
+6. **Unknown type mismatch blocks merge.** Do not accept `+` vs `*`, `$` vs `-`, OK vs ERR, or equivalent protocol type drift without explicit architecture signoff.
+7. **Final report must separate matches, known deltas, and blockers.**
+
+## When To Invoke
+
+Use this skill when:
+
+- implementing a fake / stub database protocol service
+- reviewing a fake server PR
+- adding golden protocol tests for FakeSentinel, FakeMySQL, fake mongos, Redis Cluster emulation, or similar services
+- debugging a client that silently fails or behaves differently against the fake service
+- someone says "the SDK test passed" but no raw protocol comparison exists
+
+Common trigger phrases: `fake server protocol compat check`, `fake server RESP validation`, `golden protocol comparison`, `FakeSentinel compatibility`.
+
+## Workflow
+
+1. **Start both services**
+   - Real server: the closest supported upstream version.
+   - Fake server: the implementation under review.
+   - Keep fixtures minimal and deterministic.
+
+2. **Send identical raw requests**
+   - Use the same command bytes for real and fake.
+   - Include happy paths and negative paths.
+
+3. **Read one complete response**
+   - RESP: simple string / error / integer = one line; bulk string = length, then that many bytes plus CRLF; array = recursive elements.
+   - Other protocols: use packet/header length rules.
+   - A timeout is a test failure, not a successful read strategy.
+
+4. **Normalize only volatile values**
+   - Allowed: generated IDs, timestamps, endpoint addresses, bulk length markers when payload is compared separately.
+   - Not allowed: protocol type, array shape, error-vs-success semantics, required client fields.
+
+5. **Classify each command**
+
+| Category | Meaning | Review Action |
+|---|---|---|
+| `full match` | Frame type and normalized payload match | Accept |
+| `known content delta` | Same frame type; documented content difference | Accept only if client-read fields are covered |
+| `known type delta` | Frame type differs with explicit architecture reason | Rare; needs owner signoff |
+| `blocker` | Unknown delta, missing required field, or wrong type | Fix before merge |
+
+6. **Verify client field coverage**
+   - List the fields read by each supported client/library.
+   - Prove those fields exist in the fake response.
+   - Add a subtest for field coverage if the response intentionally omits real-server metadata.
+
+## Review Checklist
+
+Before approving:
+
+- [ ] Real server and fake server are both exercised.
+- [ ] The test sends the same raw request to both.
+- [ ] The reader is protocol-aware and does not wait for timeout as normal control flow.
+- [ ] Every supported command appears in a command matrix.
+- [ ] Negative/error paths are included.
+- [ ] All client-consumed fields are listed and verified.
+- [ ] Every accepted delta has a reason.
+- [ ] The final report clearly distinguishes matches, known deltas, and blockers.
+
+## Closeout Format
+
+Use a specific closeout:
+
+```text
+Protocol validation passed: real and fake services compared with raw requests; reader is protocol-aware; command matrix has N full matches and M documented known deltas; supported client fields are covered; no unclassified type mismatch remains.
+```
+
+Avoid vague closeouts:
+
+```text
+Fake server works with the client.
+```
+
+## Related Docs
+
+- `docs/addon-fake-server-protocol-validation-guide.md`
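Workflow step 6 above asks for proof that client-read fields exist in the fake response. A minimal sketch of such a check follows, assuming the flattened `[]string` frame shape produced by a reader like `readOneRESP` in the guide and a reply made of alternating key/value bulk strings, as in `SENTINEL MASTER`. The names `bulkPayloads` and `MissingFields` are hypothetical, and the sample frame is invented for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// bulkPayloads extracts bulk-string payloads from a flattened RESP frame.
// Each non-null "$<len>" header is assumed to be followed by one payload element.
func bulkPayloads(frame []string) []string {
	var out []string
	for i := 0; i < len(frame); i++ {
		line := frame[i]
		if len(line) < 2 || line[0] != '$' {
			continue // array headers, simple strings, errors, integers
		}
		n, err := strconv.Atoi(line[1:])
		if err != nil || n < 0 || i+1 >= len(frame) {
			continue // null bulk string or malformed header: no payload follows
		}
		i++ // consume the payload so it is never mistaken for a header
		out = append(out, frame[i])
	}
	return out
}

// MissingFields treats the payloads as alternating key/value pairs and
// returns the required keys that are absent from the frame.
func MissingFields(frame []string, required []string) []string {
	payloads := bulkPayloads(frame)
	present := make(map[string]bool)
	for i := 0; i+1 < len(payloads); i += 2 {
		present[payloads[i]] = true
	}
	var missing []string
	for _, key := range required {
		if !present[key] {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Invented fake response that omits most metadata fields.
	fake := []string{"*4", "$4", "name", "$8", "mymaster", "$2", "ip", "$9", "127.0.0.1"}
	// Field list taken from the guide's go-redis coverage section.
	required := []string{"name", "ip", "port", "flags", "num-slaves", "quorum"}
	fmt.Println(MissingFields(fake, required))
}
```

Only keys at even positions count as present, so a value that happens to equal a field name does not satisfy the check. A non-empty result would move the command from `known content delta` to `blocker`.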