Merged
2 changes: 2 additions & 0 deletions docs/SKILL-INDEX.md
@@ -27,12 +27,14 @@
- [`addon-clusterdef-topology-componentdef-regex-guide.md`](addon-clusterdef-topology-componentdef-regex-guide.md) — When adding a new `serviceVersion` / `ComponentDefinition` family, update the `ClusterDefinition.spec.topologies[].components[].compDef` regex in the same change; this avoids the misleading case where the Cluster reports `PreCheckFailed` and no pod is created while the `ClusterDefinition` itself still shows Available
- [`addon-cmpd-image-override-jsonpath-guide.md`](addon-cmpd-image-override-jsonpath-guide.md) — The two-layer image resolution rules across CMPD and ComponentVersion, the correct `kubectl -o jsonpath` expression for each container slot, the image location matrix for Oracle 12c/19c/23ai, how to write the T01 sentinel assertion, and the `spec.releases` vs `spec.versions` pitfall
- [`addon-pvc-rebind-via-workload-intent-guide.md`](addon-pvc-rebind-via-workload-intent-guide.md) — When one OpsRequest needs to rebind a same-named PVC from one PV to another (rebuild / restore-into-place / PV migration), hand the intent to the Workload controller (the sole writer) through a Workload CR annotation; this keeps the OpsRequest controller, the Workload controller, and the dynamic provisioner from fighting over ownership of the same-named PVC, which otherwise causes `PersistentVolume "" not found` or a bind to the wrong PV
- [`addon-fake-server-protocol-validation-guide.md`](addon-fake-server-protocol-validation-guide.md) — Golden real-server comparison acceptance method for fake protocol servers such as FakeSentinel / FakeMySQL / fake mongos: raw protocol requests, a protocol-aware reader, command matrix classification, known delta records, and proof that client-consumed fields are covered; includes a Redis Sentinel RESP case

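### 2. Writing New smoke / chaos Tests

Design the helper / runner / acceptance criteria so that the first bug you hit is attributed to the correct layer:

- [`addon-test-acceptance-and-first-blocker-guide.md`](addon-test-acceptance-and-first-blocker-guide.md) — Layered success semantics, layered first-blocker classification, a fixed identity for validation-only gates, and freezing the failure scene
- [`addon-fake-server-protocol-validation-guide.md`](addon-fake-server-protocol-validation-guide.md) — *(also relevant in: designing / developing a new addon)* Fake-server tests cannot stop at a successful SDK call; they must run a protocol-frame-level golden comparison against the real service and report full match / known delta / blocker separately
- [`addon-test-probe-classification-guide.md`](addon-test-probe-classification-guide.md) — Assign probe failures to the correct layer, such as `route_api` / `<client>_<channel>` / `empty_output` / `parse_empty` / `runtime_mismatch` / `real_*_mismatch`
- [`addon-test-dg-helper-completeness-guide.md`](addon-test-dg-helper-completeness-guide.md) — Test helpers for multi-step asynchronous operations must use multi-gate checks and unfakeable observable invariants, so that a single status string cannot return success early during an intermediate state
- [`addon-bounded-eventual-convergence-guide.md`](addon-bounded-eventual-convergence-guide.md) — State checks for asynchronously converging systems must use bounded retry; a single snapshot must never be treated as the conclusion *(also relevant in: designing / developing a new addon — the decision surface after addon startup / rejoin / reconfigure)*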
282 changes: 282 additions & 0 deletions docs/addon-fake-server-protocol-validation-guide.md
@@ -0,0 +1,282 @@
# Addon Fake Server Protocol Validation Guide

> **Audience**: addon dev / test / TL
> **Status**: stable
> **Applies to**: any KubeBlocks addon that implements a fake / stub database protocol service
> **Applies to KB version**: any
> **Affected by version skew**: not affected by KubeBlocks version; protocol compatibility depends on the database client and protocol version being emulated

This guide explains how to validate a fake database protocol service, such as a FakeSentinel, FakeMySQL proxy, fake mongos, or fake cluster endpoint. The main rule is simple: compare protocol frames against a real server, not only client SDK behavior. Engine-specific commands and observed Redis Sentinel results are kept in the appendix.

## Plain-Language Summary

### What Problem This Solves

Fake services are useful in tests and control-plane simulations, but they fail silently when their wire protocol is slightly wrong. A client library may tolerate, retry, or hide the mismatch. That makes a high-level SDK test pass while a real workload later fails.

Examples of protocol-level mistakes:

- returning a simple string where the real server returns an array
- returning the right field names but a different array shape
- reading responses by timeout instead of by complete protocol frame
- treating a known delta as acceptable without checking whether the client reads that field

### What You Can Decide After Reading

- whether a fake protocol service is compatible enough for the clients it supports
- which commands must be byte/frame matched against a real server
- which differences can be documented as known deltas
- when a fake server PR should be blocked before merge

## 1. Core Rule: Compare Protocol Frames, Not SDK Results

Do not validate fake-server compatibility only through a high-level SDK call. SDKs often add fallback logic, default values, retries, and parsing tolerance. Those behaviors can hide protocol drift.

The validation target is the wire protocol:

1. Start a real server and the fake server.
2. Send the same raw protocol request to both.
3. Read exactly one complete protocol response from each side.
4. Compare the parsed frame shape and the normalized payload.
5. Classify every delta with an explicit reason.

For RESP, this means comparing frame markers such as `+`, `-`, `:`, `$`, and `*`, not just checking that a Go / Java / Python client call returns without error.
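
Step 2 only works if both servers receive byte-identical requests. The sketch below shows one way to build a raw RESP request and to extract the frame marker from a response's first line. It assumes requests use the standard RESP encoding (an array of bulk strings); `encodeRESPCommand` and `frameType` are hypothetical helper names and are not taken from the reference test.

```go
package main

import (
	"strconv"
	"strings"
)

// encodeRESPCommand builds the raw request bytes for one command as a RESP
// array of bulk strings. Hypothetical helper, for illustration only.
func encodeRESPCommand(args ...string) string {
	var b strings.Builder
	b.WriteString("*" + strconv.Itoa(len(args)) + "\r\n")
	for _, a := range args {
		// Each argument is "$<byte length>\r\n<payload>\r\n".
		b.WriteString("$" + strconv.Itoa(len(a)) + "\r\n" + a + "\r\n")
	}
	return b.String()
}

// frameType returns the RESP type marker ('+', '-', ':', '$', '*') found at
// the start of a response's first line, or 0 when the line is empty.
func frameType(firstLine string) byte {
	if firstLine == "" {
		return 0
	}
	return firstLine[0]
}
```

Writing the output of `encodeRESPCommand("PING")` to both connections and comparing `frameType` of each first response line gives the coarsest check in the comparison; payload comparison follows once the markers agree.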

## 2. Use a Protocol-Aware Reader

The reader must stop after one complete response. Do not read a fixed number of lines. Do not read until timeout.

Timeout-based reads create two problems:

- they make tests slow, because every command waits for its deadline
- they hide framing bugs, because a partial response and a complete response can look similar in logs

A protocol-aware reader follows the protocol grammar:

- simple string / error / integer: read one line
- bulk string: read the length, then read exactly that many bytes plus CRLF
- array: read the array length, then recursively read that many elements

For non-RESP protocols, use the same principle: read one full message according to that protocol's packet/header rules.

## 3. Build a Command Matrix

Every fake-server PR should carry a command matrix. Each row defines the expected comparison rule for one command.

| Category | Validation Rule | Merge Meaning |
|---|---|---|
| Full match | Normalize volatile lengths / IDs, then compare frame type and payload | Required for commands the client uses directly |
| Known content delta | Frame type matches; documented payload fields differ | Acceptable only if client-consumed fields are present |
| Known type delta | Frame type differs and `allowTypeMismatch` is explicitly set with a reason | Rare; requires architecture reason and owner sign-off |
| Unknown delta | Any unclassified difference | Block merge |

At minimum, each row should record:

- command name and raw request
- real response frame type
- fake response frame type
- comparison category
- client libraries / paths that consume the response
- required fields for those clients
- reason for any delta
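
One possible way to encode a matrix row and the category rules from the table is sketched below. The struct shape and the `classify` function are assumptions made for illustration; only the `allowTypeMismatch` flag name comes from this guide. The caller is assumed to have already compared frame types, compared normalized payloads, and checked the client-consumed fields.

```go
package main

type category string

const (
	fullMatch         category = "full match"
	knownContentDelta category = "known content delta"
	knownTypeDelta    category = "known type delta"
	unknownDelta      category = "unknown delta"
)

// matrixRow is a hypothetical shape for one command-matrix entry.
type matrixRow struct {
	Command           string
	RawRequest        string
	Consumers         []string // client libraries / paths that read the response
	RequiredFields    []string // fields those clients read
	AllowContentDelta bool
	AllowTypeMismatch bool
	Reason            string // must be non-empty whenever a delta is allowed
}

// classify maps one observed comparison onto the matrix categories.
// Anything that is not explicitly allowed with a reason is an unknown delta.
func classify(row matrixRow, typeMatches, payloadMatches, fieldsPresent bool) category {
	switch {
	case typeMatches && payloadMatches:
		return fullMatch
	case typeMatches && row.AllowContentDelta && row.Reason != "" && fieldsPresent:
		return knownContentDelta
	case !typeMatches && row.AllowTypeMismatch && row.Reason != "":
		return knownTypeDelta
	default:
		return unknownDelta
	}
}
```

Making `unknownDelta` the default branch means a row with a missing reason or a forgotten flag blocks the merge instead of passing silently.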

## 4. Normalize Only Volatile Values

Normalization keeps the comparison stable without weakening the protocol contract.

Allowed normalization:

- bulk-string length markers when payload text is compared separately
- timestamps, run IDs, election IDs, generated names
- endpoint addresses that differ between real and fake test fixtures

Not allowed:

- changing frame type (`+` vs `*`, `$` vs `-`)
- dropping array elements that a supported client reads
- ignoring error vs success semantics
- hiding incomplete responses caused by a reader bug

If a value is normalized, the test should say what was normalized and why.

## 5. Acceptance Criteria

### Must Match

- Protocol frame type for every command used by a supported client path.
- Array length for fixed-shape responses.
- Error vs success response type.
- All fields read by supported clients.
- Negative paths for wrong names, missing targets, unknown commands, and auth errors when relevant.

### Acceptable Known Deltas

Known deltas can be accepted only when all of the following are true:

- the delta is listed in the command matrix
- the reason is documented
- the client-consumed fields are present
- a real-server comparison proves the remaining shape is compatible

Examples:

- real server returns many metadata fields, fake returns the subset consumed by supported clients
- same frame type but different generated value
- environment-specific auth behavior, with a separate production-mode test covering the real auth path

### Not Acceptable

- untagged protocol type mismatch
- missing field consumed by a supported client
- tests that pass only because the reader waits for timeout
- SDK-only verification with no raw protocol comparison
- broad "client works" claims without the command matrix

## 6. Review Checklist

Before approving a fake-server protocol implementation:

- [ ] Real server and fake server are both exercised by the same test suite.
- [ ] Each command sends the same raw request to both sides.
- [ ] The test reads one complete protocol response, not "N lines" or "until timeout".
- [ ] The command matrix covers happy paths and negative paths.
- [ ] All client-consumed fields are listed and verified.
- [ ] Every known delta has a reason and a compatibility statement.
- [ ] Unknown type mismatches block merge.
- [ ] The final report separates full matches, known deltas, and blockers.
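
The last checklist item can be sketched as a small report builder. The types and function names below are hypothetical; the category strings are assumed to match the command matrix in section 3, and any category the builder does not recognize is treated as a blocker.

```go
package main

import "sort"

// result is the outcome of comparing one command (hypothetical type).
type result struct {
	Command  string
	Category string
}

// report separates full matches, known deltas, and blockers.
type report struct {
	FullMatches []string
	KnownDeltas []string
	Blockers    []string
}

// buildReport groups results by category. Unknown deltas and unrecognized
// category strings both land in Blockers.
func buildReport(results []result) report {
	var rep report
	for _, r := range results {
		switch r.Category {
		case "full match":
			rep.FullMatches = append(rep.FullMatches, r.Command)
		case "known content delta", "known type delta":
			rep.KnownDeltas = append(rep.KnownDeltas, r.Command)
		default:
			rep.Blockers = append(rep.Blockers, r.Command)
		}
	}
	sort.Strings(rep.FullMatches)
	sort.Strings(rep.KnownDeltas)
	sort.Strings(rep.Blockers)
	return rep
}

// mergeAllowed reports whether the report contains no blockers.
func (r report) mergeAllowed() bool {
	return len(r.Blockers) == 0
}
```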

## Appendix A: Redis Sentinel Golden RESP Case

This appendix records one grounded application of the method: FakeSentinel compatibility validation against a real Redis Sentinel.

### Test Location

Reference implementation (Redis engine):

```text
engines/redis/golden_resp_test.go
```

Feature branch at the time of writing:

```text
feat/fake-sentinel
```

Run command:

```bash
go test ./engines/redis/... -run TestGoldenRESP -v
```

### RESP Reader Shape

The key test helper is a RESP-aware single-response reader:

```go
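import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// readOneRESP reads exactly one complete RESP response from br and returns
// it as a flat list of lines: every type/length line, each followed by its
// bulk-string payload where one exists.
func readOneRESP(br *bufio.Reader) ([]string, error) {
	line, err := br.ReadString('\n')
	if err != nil {
		return nil, err
	}
	line = strings.TrimRight(line, "\r\n")
	if line == "" {
		return []string{line}, nil
	}

	switch line[0] {
	case '+', '-', ':':
		// Simple string, error, integer: the response is a single line.
		return []string{line}, nil
	case '$':
		// Bulk string: "$<n>", then n payload bytes and a trailing CRLF.
		n, err := strconv.Atoi(line[1:])
		if err != nil || n < 0 {
			// n < 0 is a null bulk string ("$-1"); no payload follows.
			return []string{line}, err
		}
		buf := make([]byte, n+2)
		if _, err := io.ReadFull(br, buf); err != nil {
			return nil, err
		}
		return []string{line, string(buf[:n])}, nil
	case '*':
		// Array: "*<count>", then count nested responses read recursively.
		count, err := strconv.Atoi(line[1:])
		if err != nil || count < 0 {
			// count < 0 is a null array ("*-1"); no elements follow.
			return []string{line}, err
		}
		lines := []string{line}
		for i := 0; i < count; i++ {
			sub, err := readOneRESP(br)
			if err != nil {
				return nil, err
			}
			lines = append(lines, sub...)
		}
		return lines, nil
	default:
		// Unknown marker: return the line as-is so the comparison can flag it.
		return []string{line}, nil
	}
}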
```

### Normalization Example

Bulk-string lengths are normalized when content is compared separately:

```go
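import "strings"

// normalizeRESP replaces every bulk-string length line ("$<n>") with "$N" so
// that payloads of different lengths can still be compared line by line.
// Note: a payload line that itself starts with "$" is rewritten as well.
func normalizeRESP(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, "$") {
			lines[i] = "$N"
		}
	}
	return strings.Join(lines, "\n")
}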
```

### Observed Result Summary

Reference environment: Redis 7.0.15 real Sentinel vs FakeSentinel, tested on 2026-05-06.

| Command | Result | Notes |
|---|---|---|
| `PING` | full match | simple string response |
| `CLIENT SETNAME` | full match | simple string response |
| `INFO` | full match | bulk-string response type matches |
| `SENTINEL GET-MASTER-ADDR-BY-NAME` | full match | `*2` structure matches |
| `SENTINEL GET-MASTER-ADDR-BY-NAME` wrong name | full match | null array behavior matches |
| `SENTINEL MASTER` wrong name | full match | error response matches |
| `SENTINEL SLAVES` / `REPLICAS` | full match | empty array matches |
| `SENTINEL SENTINELS` | full match | empty array matches |
| `SENTINEL CKQUORUM` | full match | success type matches |
| `SENTINEL RESET` | full match | integer response matches |
| `SUBSCRIBE +switch-master` | full match | subscription confirmation shape matches |
| `SENTINEL MASTER` correct name | known content delta | real has more fields; fake includes fields required by go-redis |
| `SENTINEL MASTERS` | known content delta | same field coverage rule as `MASTER` |
| `SENTINEL IS-MASTER-DOWN-BY-ADDR` | known content delta | vote ID differs; same bulk-string type; go-redis does not use the field |
| `AUTH` | known type delta | no-password test fixture differs from production password path |
| `SENTINEL FAILOVER` | known type delta | no-replica real Sentinel returns no-good-slave; fake returns OK for the tested control path |

### Required go-redis Field Coverage

For `SENTINEL MASTER` and `SENTINEL MASTERS`, the fake response must include every field read by go-redis:

```text
name
ip
port
flags
num-slaves
quorum
```

If any of these fields is missing, the known content delta is no longer acceptable.
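
A field coverage check along these lines could look like the sketch below. It assumes the reply has already been reduced to a flat list of alternating field names and values (the bulk-string payloads only); `missingFields` is a hypothetical helper, not part of the reference test.

```go
package main

// missingFields returns the required field names that do not appear as a
// field name in a flat reply of the form name1, value1, name2, value2, ...
func missingFields(pairs []string, required []string) []string {
	seen := make(map[string]bool, len(pairs)/2)
	// Only even positions hold field names; odd positions hold values.
	for i := 0; i+1 < len(pairs); i += 2 {
		seen[pairs[i]] = true
	}
	var missing []string
	for _, f := range required {
		if !seen[f] {
			missing = append(missing, f)
		}
	}
	return missing
}
```

A test would call this with the fake server's reply and the six field names listed above, and fail the known content delta if the returned slice is non-empty.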

## Appendix B: Applying The Same Pattern Elsewhere

The same method applies to other fake protocol services:

- **MySQL**: compare handshake, auth switch, OK packet, ERR packet, and capability flags against a real server.
- **MongoDB**: compare hello / isMaster, topology fields, wire version, and mongos routing responses.
- **Redis Cluster**: compare `CLUSTER INFO`, `CLUSTER SLOTS`, MOVED / ASK error frames, and empty-cluster cases.

The protocol changes; the acceptance rule does not. Read one complete real response, read one complete fake response, compare typed frames, then classify each delta.
1 change: 1 addition & 0 deletions skills/README.md
@@ -62,6 +62,7 @@ Some skills are distilled directly from the methodology guides in `docs/`. These

- `bounded-eventual-convergence` — companion to `docs/addon-bounded-eventual-convergence-guide.md`
- `evidence-discipline` — companion to `docs/addon-evidence-discipline-guide.md`
- `fake-server-protocol-validation` — companion to `docs/addon-fake-server-protocol-validation-guide.md`
- `first-blocker-classification` — companion to `docs/addon-test-acceptance-and-first-blocker-guide.md`
- `github-submission-discipline` — companion to `docs/addon-github-submission-discipline-guide.md`
- `paramdef-range-validation` — companion to `docs/addon-paramdef-cue-range-validation-guide.md`