From 5e6a8941ae4022c73e93b9bd80a81054274fd573 Mon Sep 17 00:00:00 2001
From: Wei Cao
Date: Wed, 6 May 2026 19:48:41 +0800
Subject: [PATCH] docs: add fake server protocol validation guide

---
 docs/SKILL-INDEX.md                                |   2 +
 ...n-fake-server-protocol-validation-guide.md      | 282 ++++++++++++++++++
 skills/README.md                                   |   1 +
 .../fake-server-protocol-validation/SKILL.md       |  94 ++++++
 4 files changed, 379 insertions(+)
 create mode 100644 docs/addon-fake-server-protocol-validation-guide.md
 create mode 100644 skills/fake-server-protocol-validation/SKILL.md

diff --git a/docs/SKILL-INDEX.md b/docs/SKILL-INDEX.md
index 295c8db..9080c3e 100644
--- a/docs/SKILL-INDEX.md
+++ b/docs/SKILL-INDEX.md
@@ -27,12 +27,14 @@
 - [`addon-clusterdef-topology-componentdef-regex-guide.md`](addon-clusterdef-topology-componentdef-regex-guide.md) — when adding a new `serviceVersion` / `ComponentDefinition` family, update the `ClusterDefinition.spec.topologies[].components[].compDef` regex in the same change; this avoids a Cluster stuck in `PreCheckFailed` with no pods created while the `ClusterDefinition` itself still shows Available, which is easy to misread
 - [`addon-cmpd-image-override-jsonpath-guide.md`](addon-cmpd-image-override-jsonpath-guide.md) — the two-layer image resolution rules across CMPD and ComponentVersion, the correct `kubectl -o jsonpath` expression for each container slot, the Oracle 12c/19c/23ai image location matrix, how to write the T01 sentinel assertion, and the `spec.releases` vs `spec.versions` pitfall
 - [`addon-pvc-rebind-via-workload-intent-guide.md`](addon-pvc-rebind-via-workload-intent-guide.md) — when an OpsRequest needs to rebind a same-named PVC from one PV to another (rebuild / restore-into-place / PV migration), hand the intent to the Workload controller (the single writer) through a Workload CR annotation; this prevents the OpsRequest controller, the Workload controller, and the dynamic provisioner from fighting over ownership of the same-named PVC, which causes `PersistentVolume "" not found` or binding to the wrong PV
+- [`addon-fake-server-protocol-validation-guide.md`](addon-fake-server-protocol-validation-guide.md) — golden real-server comparison method for accepting fake protocol servers such as FakeSentinel / FakeMySQL / fake mongos: raw protocol requests, a protocol-aware reader, command matrix classification, known-delta records, and proof of client-consumed field coverage; includes a Redis Sentinel RESP case study
 
 ### 2. Writing new smoke / chaos tests
 
 Design helpers / runners / acceptance criteria so that the first bug hit is attributed to the correct layer:
 
 - [`addon-test-acceptance-and-first-blocker-guide.md`](addon-test-acceptance-and-first-blocker-guide.md) — layered success semantics, layered first-blocker attribution, pinning the identity of validation-only gates, and freezing the failure scene
+- [`addon-fake-server-protocol-validation-guide.md`](addon-fake-server-protocol-validation-guide.md) — *(also relevant in: Designing / developing a new addon)* a fake server test cannot rely on SDK calls succeeding; it must do a protocol-frame-level golden comparison against the real service and report full match / known delta / blocker separately
 - [`addon-test-probe-classification-guide.md`](addon-test-probe-classification-guide.md) — classify probe failures into the correct layer, such as `route_api` / `_` / `empty_output` / `parse_empty` / `runtime_mismatch` / `real_*_mismatch`
 - [`addon-test-dg-helper-completeness-guide.md`](addon-test-dg-helper-completeness-guide.md) — test helpers for multi-step asynchronous operations must use multiple gates and an unfakeable observable invariant, so a single status string cannot report success early from an intermediate state
 - [`addon-bounded-eventual-convergence-guide.md`](addon-bounded-eventual-convergence-guide.md) — state checks on asynchronously converging systems must use bounded retry; never treat a single snapshot as the verdict *(also relevant in: Designing / developing a new addon, for checks made after addon startup / rejoin / reconfigure)*
diff --git a/docs/addon-fake-server-protocol-validation-guide.md b/docs/addon-fake-server-protocol-validation-guide.md
new file mode 100644
index 0000000..6f4868d
--- /dev/null
+++ b/docs/addon-fake-server-protocol-validation-guide.md
@@ -0,0 +1,282 @@
+# Addon Fake Server Protocol Validation Guide
+
+> **Audience**: addon dev / test / TL
+> **Status**: stable
+> **Applies to**: any KubeBlocks addon that implements a fake / stub database protocol service
+> **Applies to KB version**: any
+> **Affected by version skew**: not affected by KubeBlocks version; protocol compatibility depends on the database client and protocol version being emulated
+
+This guide explains how to validate a fake database protocol service, such as a FakeSentinel, FakeMySQL proxy, fake mongos, or fake cluster endpoint. The main rule is simple: compare protocol frames against a real server, not only client SDK behavior. Engine-specific commands and observed Redis Sentinel results are kept in the appendix.
+
+## Plain-Language Summary
+
+### What Problem This Solves
+
+Fake services are useful in tests and control-plane simulations, but they fail silently when their wire protocol is slightly wrong. A client library may tolerate, retry, or hide the mismatch. As a result, a high-level SDK test can pass while a real workload fails later.
+
+Examples of protocol-level mistakes:
+
+- returning a simple string where the real server returns an array
+- returning the right field names but a different array shape
+- reading responses by timeout instead of by complete protocol frame
+- treating a known delta as acceptable without checking whether the client reads that field
+
+### What You Can Decide After Reading
+
+- whether a fake protocol service is compatible enough for the clients it supports
+- which commands must be byte- or frame-matched against a real server
+- which differences can be documented as known deltas
+- when a fake server PR should be blocked before merge
+
+## 1. Core Rule: Compare Protocol Frames, Not SDK Results
+
+Do not validate fake-server compatibility only through a high-level SDK call. SDKs often add fallback logic, default values, retries, and parsing tolerance. Those behaviors can hide protocol drift.
+
+The validation target is the wire protocol:
+
+1. Start a real server and the fake server.
+2. Send the same raw protocol request to both.
+3. Read exactly one complete protocol response from each side.
+4. Compare the parsed frame shape and the normalized payload.
+5. Classify every delta with an explicit reason.
+
+For RESP, this means comparing frame markers such as `+`, `-`, `:`, `$`, and `*`, not just checking that a Go / Java / Python client call returns without error.
+
+## 2. Use a Protocol-Aware Reader
+
+The reader must stop after one complete response. Do not read a fixed number of lines. Do not read until timeout.
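A small sketch of why a fixed line count fails, assuming two replies arrive back to back on one connection. `readFixedLines`, `pipelinedReplies`, and `demoDesync` are hypothetical names for this illustration, not part of the test suite: a reader that assumes one line per reply takes only the `*2` header of an address array, and its next read returns that array's `$9` length line instead of `+PONG`.

```go
package main

import (
	"bufio"
	"strings"
)

// readFixedLines is the anti-pattern (hypothetical helper): it assumes every
// reply occupies exactly n CRLF-terminated lines, whatever its RESP type.
func readFixedLines(br *bufio.Reader, n int) ([]string, error) {
	lines := make([]string, 0, n)
	for i := 0; i < n; i++ {
		line, err := br.ReadString('\n')
		if err != nil {
			return lines, err
		}
		lines = append(lines, strings.TrimRight(line, "\r\n"))
	}
	return lines, nil
}

// pipelinedReplies is an illustrative byte stream: a 2-element array of bulk
// strings (an address reply shape) followed by the +PONG reply to a PING.
const pipelinedReplies = "*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6379\r\n+PONG\r\n"

// demoDesync reads each reply as if it were one line long and returns what
// the reader believes the two replies were.
func demoDesync() (first, second []string) {
	br := bufio.NewReader(strings.NewReader(pipelinedReplies))
	first, _ = readFixedLines(br, 1)  // only the "*2" array header
	second, _ = readFixedLines(br, 1) // "$9" from the same array, not "+PONG"
	return first, second
}
```

A reader that follows the RESP grammar consumes all five lines of the array (the header, then two length/payload pairs) before it reads `+PONG`, so every later command stays aligned with its own reply.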
+
+Timeout-based reads create two problems:
+
+- they make tests slow, because every command waits for its deadline
+- they hide framing bugs, because a partial response and a complete response can look similar in logs
+
+A protocol-aware reader follows the protocol grammar:
+
+- simple string / error / integer: read one line
+- bulk string: read the length, then read exactly that many bytes plus CRLF
+- array: read the array length, then recursively read that many elements
+- null bulk string / null array (`$-1` / `*-1`): the header line is the whole response
+
+For non-RESP protocols, use the same principle: read one full message according to that protocol's packet/header rules.
+
+## 3. Build a Command Matrix
+
+Every fake-server PR should carry a command matrix. Each row defines the expected comparison rule for one command.
+
+| Category | Validation Rule | Merge Meaning |
+|---|---|---|
+| Full match | Normalize volatile lengths / IDs, then compare frame type and payload | Required for commands the client uses directly |
+| Known content delta | Frame type matches; documented payload fields differ | Acceptable only if client-consumed fields are present |
+| Known type delta | Frame type differs and `allowTypeMismatch` is explicitly set with a reason | Rare; requires an architecture reason and owner sign-off |
+| Unknown delta | Any unclassified difference | Block merge |
+
+At minimum, each row should record:
+
+- command name and raw request
+- real response frame type
+- fake response frame type
+- comparison category
+- client libraries / paths that consume the response
+- required fields for those clients
+- reason for any delta
+
+## 4. Normalize Only Volatile Values
+
+Normalization keeps the comparison stable without weakening the protocol contract.
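A minimal sketch of that contract, assuming a hypothetical typed `Frame` value instead of the flat `[]string` that the appendix reader returns. `Frame`, `normalize`, `sameFrame`, and the 40-hex-character run-ID pattern are illustrative assumptions: normalization rewrites only payload values that look volatile, while the comparison still checks every type marker and every array length.

```go
package main

import "regexp"

// Frame is a hypothetical typed view of one RESP reply.
type Frame struct {
	Type  byte    // RESP type marker: '+', '-', ':', '$', or '*'
	Value string  // payload of a scalar frame
	Elems []Frame // children of an array frame
}

// runIDPattern matches a 40-character lowercase hex run ID, one example of a
// volatile value that legitimately differs between real and fake servers.
var runIDPattern = regexp.MustCompile(`^[0-9a-f]{40}$`)

// normalize replaces volatile payload values with a placeholder. It copies
// Type unchanged and keeps exactly as many array elements as the input.
func normalize(f Frame) Frame {
	out := Frame{Type: f.Type, Value: f.Value}
	if runIDPattern.MatchString(f.Value) {
		out.Value = "<run-id>"
	}
	for _, e := range f.Elems {
		out.Elems = append(out.Elems, normalize(e))
	}
	return out
}

// sameFrame reports whether two frames match in type marker, payload, and
// array shape, recursively.
func sameFrame(a, b Frame) bool {
	if a.Type != b.Type || a.Value != b.Value || len(a.Elems) != len(b.Elems) {
		return false
	}
	for i := range a.Elems {
		if !sameFrame(a.Elems[i], b.Elems[i]) {
			return false
		}
	}
	return true
}
```

With this split, two `runid` values that differ only in the generated ID compare equal, while `+OK` against an array, or a reply with a dropped element, still fails the comparison.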
+
+Allowed normalization:
+
+- bulk-string length markers when payload text is compared separately
+- timestamps, run IDs, election IDs, generated names
+- endpoint addresses that differ between real and fake test fixtures
+
+Not allowed:
+
+- changing frame type (`+` vs `*`, `$` vs `-`)
+- dropping array elements that a supported client reads
+- ignoring error vs success semantics
+- hiding incomplete responses caused by a reader bug
+
+If a value is normalized, the test should say what was normalized and why.
+
+## 5. Acceptance Criteria
+
+### Must Match
+
+- Protocol frame type for every command used by a supported client path.
+- Array length for fixed-shape responses.
+- Error vs success response type.
+- All fields read by supported clients.
+- Negative paths for wrong names, missing targets, unknown commands, and auth errors when relevant.
+
+### Acceptable Known Deltas
+
+Known deltas can be accepted only when all of the following are true:
+
+- the delta is listed in the command matrix
+- the reason is documented
+- the client-consumed fields are present
+- a real-server comparison proves the remaining shape is compatible
+
+Examples:
+
+- the real server returns many metadata fields; the fake returns only the subset consumed by supported clients
+- same frame type but a different generated value
+- environment-specific auth behavior, with a separate production-mode test covering the real auth path
+
+### Not Acceptable
+
+- untagged protocol type mismatch
+- missing field consumed by a supported client
+- tests that pass only because the reader waits for timeout
+- SDK-only verification with no raw protocol comparison
+- broad "client works" claims without the command matrix
+
+## 6. Review Checklist
+
+Before approving a fake-server protocol implementation:
+
+- [ ] Real server and fake server are both exercised by the same test suite.
+- [ ] Each command sends the same raw request to both sides.
+- [ ] The test reads one complete protocol response, not "N lines" or "until timeout".
+- [ ] The command matrix covers happy paths and negative paths.
+- [ ] All client-consumed fields are listed and verified.
+- [ ] Every known delta has a reason and a compatibility statement.
+- [ ] Unknown type mismatches block merge.
+- [ ] The final report separates full matches, known deltas, and blockers.
+
+## Appendix A: Redis Sentinel Golden RESP Case
+
+This appendix records one grounded application of the method: FakeSentinel compatibility validation against a real Redis Sentinel.
+
+### Test Location
+
+Reference implementation (Redis engine):
+
+```text
+engines/redis/golden_resp_test.go
+```
+
+Feature branch at the time of writing:
+
+```text
+feat/fake-sentinel
+```
+
+Run command:
+
+```bash
+go test ./engines/redis/... -run TestGoldenRESP -v
+```
+
+### RESP Reader Shape
+
+The key test helper is a RESP-aware single-response reader:
+
+```go
+import (
+	"bufio"
+	"io"
+	"strconv"
+	"strings"
+)
+
+// readOneRESP reads exactly one complete RESP reply from br and returns its
+// header and payload lines in order, with the trailing CRLF removed.
+func readOneRESP(br *bufio.Reader) ([]string, error) {
+	line, err := br.ReadString('\n')
+	if err != nil {
+		return nil, err
+	}
+	line = strings.TrimRight(line, "\r\n")
+	if line == "" {
+		return []string{line}, nil
+	}
+
+	switch line[0] {
+	case '+', '-', ':':
+		// Simple string, error, or integer: the header line is the whole reply.
+		return []string{line}, nil
+	case '$':
+		// Bulk string: read exactly n payload bytes plus CRLF.
+		// "$-1" is a null bulk string with no payload.
+		n, err := strconv.Atoi(line[1:])
+		if err != nil || n < 0 {
+			return []string{line}, err
+		}
+		buf := make([]byte, n+2)
+		if _, err := io.ReadFull(br, buf); err != nil {
+			return nil, err
+		}
+		return []string{line, string(buf[:n])}, nil
+	case '*':
+		// Array: read count nested replies recursively. "*-1" is a null array.
+		count, err := strconv.Atoi(line[1:])
+		if err != nil || count < 0 {
+			return []string{line}, err
+		}
+		lines := []string{line}
+		for i := 0; i < count; i++ {
+			sub, err := readOneRESP(br)
+			if err != nil {
+				return nil, err
+			}
+			lines = append(lines, sub...)
+		}
+		return lines, nil
+	default:
+		// Types outside this switch (for example RESP3 types) are returned as one line.
+		return []string{line}, nil
+	}
+}
+```
+
+### Normalization Example
+
+Bulk-string lengths are normalized when content is compared separately:
+
+```go
+func normalizeRESP(s string) string {
+	lines := strings.Split(s, "\n")
+	for i, line := range lines {
+		if strings.HasPrefix(line, "$") {
+			lines[i] = "$N"
+		}
+	}
+	return strings.Join(lines, "\n")
+}
+```
+
+### Observed Result Summary
+
+Reference environment: Redis 7.0.15 real Sentinel vs FakeSentinel, tested on 2026-05-06.
+
+| Command | Result | Notes |
+|---|---|---|
+| `PING` | full match | simple string response |
+| `CLIENT SETNAME` | full match | simple string response |
+| `INFO` | full match | bulk-string response type matches |
+| `SENTINEL GET-MASTER-ADDR-BY-NAME` | full match | `*2` structure matches |
+| `SENTINEL GET-MASTER-ADDR-BY-NAME` wrong name | full match | null array behavior matches |
+| `SENTINEL MASTER` wrong name | full match | error response matches |
+| `SENTINEL SLAVES` / `REPLICAS` | full match | empty array matches |
+| `SENTINEL SENTINELS` | full match | empty array matches |
+| `SENTINEL CKQUORUM` | full match | success type matches |
+| `SENTINEL RESET` | full match | integer response matches |
+| `SUBSCRIBE +switch-master` | full match | subscription confirmation shape matches |
+| `SENTINEL MASTER` correct name | known content delta | real Sentinel returns more fields; fake includes every field go-redis reads |
+| `SENTINEL MASTERS` | known content delta | same field coverage rule as `MASTER` |
+| `SENTINEL IS-MASTER-DOWN-BY-ADDR` | known content delta | vote ID differs; same bulk-string type; go-redis does not use the field |
+| `AUTH` | known type delta | the no-password test fixture differs from the production password path |
+| `SENTINEL FAILOVER` | known type delta | a real Sentinel with no replicas returns a no-good-slave error; the fake returns OK for the tested control path |
+
+### Required go-redis Field Coverage
+
+For `SENTINEL MASTER` and `SENTINEL MASTERS`, the fake response must include every field read by go-redis:
+
+```text
+name
+ip
+port
+flags
+num-slaves
+quorum
+```
+
+If any of these fields is missing, the known content delta is no longer acceptable.
+
+## Appendix B: Applying The Same Pattern Elsewhere
+
+The same method applies to other fake protocol services:
+
+- **MySQL**: compare handshake, auth switch, OK packet, ERR packet, and capability flags against a real server.
+- **MongoDB**: compare hello / isMaster, topology fields, wire version, and mongos routing responses.
+- **Redis Cluster**: compare `CLUSTER INFO`, `CLUSTER SLOTS`, MOVED / ASK error frames, and empty-cluster cases.
+
+The protocol changes; the acceptance rule does not. Read one complete real response, read one complete fake response, compare typed frames, then classify each delta.
diff --git a/skills/README.md b/skills/README.md
index 2204ccc..b14a70a 100644
--- a/skills/README.md
+++ b/skills/README.md
@@ -62,6 +62,7 @@ Some skills are distilled directly from the methodology guides in `docs/`. These
 
 - `bounded-eventual-convergence` — companion to `docs/addon-bounded-eventual-convergence-guide.md`
 - `evidence-discipline` — companion to `docs/addon-evidence-discipline-guide.md`
+- `fake-server-protocol-validation` — companion to `docs/addon-fake-server-protocol-validation-guide.md`
 - `first-blocker-classification` — companion to `docs/addon-test-acceptance-and-first-blocker-guide.md`
 - `github-submission-discipline` — companion to `docs/addon-github-submission-discipline-guide.md`
 - `paramdef-range-validation` — companion to `docs/addon-paramdef-cue-range-validation-guide.md`
diff --git a/skills/fake-server-protocol-validation/SKILL.md b/skills/fake-server-protocol-validation/SKILL.md
new file mode 100644
index 0000000..439f56a
--- /dev/null
+++ b/skills/fake-server-protocol-validation/SKILL.md
@@ -0,0 +1,94 @@
+---
+name: fake-server-protocol-validation
+description: Use when implementing, reviewing, or testing a fake/stub database protocol server such as FakeSentinel, FakeMySQL, fake mongos, or Redis Cluster emulation. Forces golden real-server protocol comparison, protocol-aware response reads, command matrix classification, and explicit known-delta sign-off instead of trusting high-level SDK behavior.
+allowed-tools: Bash(go test *) Bash(rg *) Read
+---
+
+# Fake Server Protocol Validation
+
+## Hard Rules
+
+1. **Compare against a real server.** A fake server is not compatible until the same raw request has been sent to both real and fake implementations.
+2. **Validate protocol frames, not only SDK results.** Client SDKs can hide protocol drift behind fallback logic, retries, and tolerant parsing.
+3. **Use a protocol-aware reader.** Read exactly one complete response according to the protocol grammar; never read "N lines" or "until timeout".
+4. **Classify every delta.** Each command is `full match`, `known content delta`, `known type delta`, or `blocker`.
+5. **Client-consumed fields are mandatory.** A fake may omit unused metadata only after the supported client read set is listed and verified.
+6. **Unknown type mismatch blocks merge.** Do not accept `+` vs `*`, `$` vs `-`, OK vs ERR, or equivalent protocol type drift without explicit architecture sign-off.
+7. **Final report must separate matches, known deltas, and blockers.**
+
+## When To Invoke
+
+Use this skill when:
+
+- implementing a fake / stub database protocol service
+- reviewing a fake server PR
+- adding golden protocol tests for FakeSentinel, FakeMySQL, fake mongos, Redis Cluster emulation, or similar services
+- debugging a client that silently fails or behaves differently against the fake service
+- someone says "the SDK test passed" but no raw protocol comparison exists
+
+Common trigger phrases: `fake server protocol compat check`, `fake server RESP validation`, `golden protocol comparison`, `FakeSentinel compatibility`.
+
+## Workflow
+
+1. **Start both services**
+   - Real server: the closest supported upstream version.
+   - Fake server: the implementation under review.
+   - Keep fixtures minimal and deterministic.
+
+2. **Send identical raw requests**
+   - Use the same command bytes for real and fake.
+   - Include happy paths and negative paths.
+
+3. **Read one complete response**
+   - RESP: simple string / error / integer = one line; bulk string = length + bytes; array = recursive elements.
+   - Other protocols: use packet/header length rules.
+   - A timeout is a test failure, not a successful read strategy.
+
+4. **Normalize only volatile values**
+   - Allowed: generated IDs, timestamps, endpoint addresses, bulk length markers when payload is compared separately.
+   - Not allowed: protocol type, array shape, error-vs-success semantics, required client fields.
+
+5. **Classify each command**
+
+   | Category | Meaning | Review Action |
+   |---|---|---|
+   | `full match` | Frame type and normalized payload match | Accept |
+   | `known content delta` | Same frame type; documented content difference | Accept only if client-read fields are covered |
+   | `known type delta` | Frame type differs with explicit architecture reason | Rare; needs owner sign-off |
+   | `blocker` | Unknown delta, missing required field, or wrong type | Fix before merge |
+
+6. **Verify client field coverage**
+   - List the fields read by each supported client/library.
+   - Prove those fields exist in the fake response.
+   - Add a subtest for field coverage if the response intentionally omits real-server metadata.
+
+## Review Checklist
+
+Before approving:
+
+- [ ] Real server and fake server are both exercised.
+- [ ] The test sends the same raw request to both.
+- [ ] The reader is protocol-aware and does not wait for timeout as normal control flow.
+- [ ] Every supported command appears in a command matrix.
+- [ ] Negative/error paths are included.
+- [ ] All client-consumed fields are listed and verified.
+- [ ] Every accepted delta has a reason.
+- [ ] The final report clearly distinguishes matches, known deltas, and blockers.
+
+## Closeout Format
+
+Use a specific closeout:
+
+```text
+Protocol validation passed: real and fake services compared with raw requests; reader is protocol-aware; command matrix has N full matches and M documented known deltas; supported client fields are covered; no unclassified type mismatch remains.
+```
+
+Avoid vague closeouts:
+
+```text
+Fake server works with the client.
+```
+
+## Related Docs
+
+- `docs/addon-fake-server-protocol-validation-guide.md`