docs(k8s-proxy): developer + LLM workflow playbook + trim to verified form #871
Sibling to the existing k8s-proxy-developer-workflow page. Documents
an autonomous Keploy workflow driven from an MCP-aware editor (Claude
Code, Cursor, Windsurf, Claude Desktop, VS Code Copilot, Trae). The
developer types one of two prompts; the agent does everything else.
The two prompts:
1. "my keploy cloud replay is failing, please analyse and fix it."
(or "the keploy cloud replay pipeline is failing..." for CI)
2. "Add new keploy tests for my changes."
The page ships a single pasteable playbook that installs as a Claude
Code skill or any other editor's rules / memory file. Inside the
playbook the agent:
- Resolves app_id from `basename $(pwd)` + listApps.
- Resolves branch_id from `git rev-parse --abbrev-ref HEAD` +
create_branch (find-or-create, idempotent, sticky for the session).
- Diagnoses failing runs via two cases: Case 1 (app regression, agent
fixes handler code and announces file:line before applying);
Case 2 (test data stale, with sub-actions 2a noise / 2a response
edit / 2b mock edit / 2b delete_recording + re-record).
- For new tests: git diff to find changed handlers, pre-flight the
dev's local run command, then `keploy record -c "<cmd>" --sync` +
`keploy upload test-set` to land the bundle on the branch.
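As a rough illustration of the two resolution steps in the list above, the sketch below matches the working directory's basename against a list of apps and does a find-or-create on the branch name. The record shape (`id` and `name` keys) and the `create` callback are assumptions made for the example; they stand in for the `listApps` and `create_branch` results, whose real schema is not shown in this PR.

```python
from pathlib import Path


def resolve_app_id(cwd, apps):
    """Pick the app whose name equals the basename of the working directory.

    `apps` is a hypothetical list of {"id": ..., "name": ...} records.
    """
    name = Path(cwd).name
    matches = [app["id"] for app in apps if app["name"] == name]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one app named {name!r}, found {len(matches)}")
    return matches[0]


def find_or_create_branch(branches, git_branch, create):
    """Return the id of the branch named `git_branch`, creating it only if absent.

    `create` is a hypothetical callback that returns the new branch record.
    """
    for branch in branches:
        if branch["name"] == git_branch:
            return branch["id"]
    new_branch = create(git_branch)
    branches.append(new_branch)  # remembered, so a second call is a pure lookup
    return new_branch["id"]
```

Calling `find_or_create_branch` twice with the same name invokes `create` only once, which is the idempotent behaviour the commit message describes.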
Sidebar updated to surface the page under K8s Proxy.
Signed-off-by: Charan Kamarapu <kamarapucharan@gmail.com>
Replace the long-form playbook with the trimmed, validated form (11,305 → 7,939 tok + 2 anti-patterns ≈ 8,095 tok in source). Same load-bearing rules preserved verbatim:
- Step 0 ALLOWLIST + uncommitted-edit revert mandate
- listTestReports EXACTLY ONCE per session
- getApp memoize (≤1 call/session)
- fields=[...] on getTestReportFull + getApp
- drop listMocks default; targeted getMock instead
- record → upload → delete order for 2b-recapture
- sql_ast_hash CLI mandate (use `keploy mock patch`, not MCP update_mock)
- --disableReportUpload=false and --cluster mandatory
- pipe all keploy/docker output through tail/grep
- two new anti-patterns: ban keploy --help dump, ban Read of keploy/ local cache files
Verified against S1 scenario at 632k total tokens, 13/13 effective asserts.
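The two call-count rules (`listTestReports` exactly once, `getApp` at most once per session) amount to session-level memoization. A minimal sketch, assuming the two MCP calls are available as plain Python callables; the `SessionCache` class and its method names are hypothetical and not part of the playbook:

```python
_UNSET = object()  # sentinel so an empty result still counts as "already fetched"


class SessionCache:
    """Remembers the first result of each call for the rest of the session."""

    def __init__(self, list_test_reports, get_app):
        self._list_test_reports = list_test_reports
        self._get_app = get_app
        self._reports = _UNSET
        self._apps = {}

    def reports(self):
        # First use performs the call; later uses return the stored value.
        if self._reports is _UNSET:
            self._reports = self._list_test_reports()
        return self._reports

    def app(self, app_id):
        # One underlying call per app id, however often it is asked for.
        if app_id not in self._apps:
            self._apps[app_id] = self._get_app(app_id)
        return self._apps[app_id]
```

The sentinel matters because an empty report list is a legitimate answer and should not trigger a second call.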
Pull request overview
Adds a new Quickstart doc page that provides a copy/paste “Developer + LLM Workflow with Keploy Proxy” playbook, and wires it into the v4.0.0 sidebar, with accompanying Vale vocabulary updates to keep docs linting clean.
Changes:
- Added a new Quickstart page: Developer + LLM Workflow with Keploy Proxy (autonomous Keploy MCP playbook + routines).
- Registered the new page in the K8s Proxy section of the v4.0.0 versioned sidebar.
- Expanded Vale’s accepted vocabulary to reduce false-positive spelling errors for newly introduced technical terms.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| versioned_sidebars/version-4.0.0-sidebars.json | Adds the new Quickstart doc ID to the K8s Proxy sidebar group. |
| versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md | Introduces the new LLM workflow playbook doc content. |
| vale_styles/config/vocabularies/Base/accept.txt | Updates Vale accept-list to accommodate new/technical terminology used in docs. |
| The developer will only ever say one of two things to you: | ||
| **Prompt A:** "my keploy cloud replay is failing, please analyse and fix it." OR "the keploy cloud replay pipeline is failing, please analyse and fix it."—both forms route to the same routine; the first means the dev's last local replay run failed (find the latest test_run on the branch via api-server), the second means a CI pipeline run failed (the dev should paste the CI log or dashboard URL; extract `test_run_id` from it). |
Done in commit 9646579 — both Prompt A spellings switched to "analyze". Same change applied to the Routine A heading.
| --- | ||
| ## Routine A—failing cloud replay (local or CI), analyse and fix |
Done in commit 9646579 — Routine A heading now uses "analyze".
| Run all three **every time**, even when the tree looks clean. The empty result IS the evidence required to advance to Step 1. Skipping = silent misclassification when the assumption is wrong. | ||
| **ALLOWLIST of MCP calls permitted before Step 0** (Phase A1 discovery only): `listApps`, `getApp`, `create_branch`, `list_branches`, `listTestReports`, `getTestReport`, `tools/list`. EVERY other call — `getTestReportFull`, `getTestCase`, `getMock`, `listMocks`, `getRecording`, `listRecordings`, `updateTestCase`, `update_mock`, `delete_recording` — is classifier/write and MUST come AFTER Step 0. Reading `getTestCase` first biases toward Case 2 framing. |
Done in commit 9646579 — lowercased to "Allowlist" and added [Aa]llowlist to the Base vocabulary so future occurrences pass Vale without needing a re-edit.
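One way to picture the allowlist rule from the quoted line is a gate that lets only the seven discovery calls through until Step 0 has run. The tool names are copied from the playbook text; the function name and the boolean flag are hypothetical.

```python
# Calls the playbook permits before Step 0 (Phase A1 discovery only).
PRE_STEP0_ALLOWLIST = frozenset({
    "listApps",
    "getApp",
    "create_branch",
    "list_branches",
    "listTestReports",
    "getTestReport",
    "tools/list",
})


def may_call(tool_name, step0_done):
    """Return True if `tool_name` is permitted at this point in the routine."""
    # After Step 0 every call is allowed; before it, only discovery calls are.
    return step0_done or tool_name in PRE_STEP0_ALLOWLIST
```

With this gate, `getTestCase` is refused until Step 0 completes, which is the ordering the playbook asks for to avoid biasing toward Case 2.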
| [Aa]ir-?gap(?:ped|ping)? | ||
| [Aa]uditable | ||
| [Cc]group[s]? |
Done in commit 9646579 — added [Aa]llowlist to vale_styles/config/vocabularies/Base/accept.txt (placed after [Aa]uditable to keep alphabetical order).
| | grep -E "Total test|Failed Testcases|test passed|test failed|FAIL|ERROR|debug bundle|View test report" | ||
| ``` | ||
| The full replay log contains per-mock-match traces, per-testcase debug lines, and a final summary block. Your decisions only need the final summary + any FAIL/ERROR lines + the `View test report at:` URL. Piping at the command level keeps the slice that re-bills on every subsequent step to ~2k tokens instead of the full ~40k — over a retry loop that compounds enormously. Apply the same pipe pattern to every other long-running Bash command: `keploy record` output, `docker build`, `keploy upload test-set`. Read the cached log file directly only when the grep slice doesn't show what you need. |
Done in commit 9646579 — rephrased as "gets re-added to context" across all three sites (lines 113, 259, 380). Clearer to readers and dodges Vale spelling.
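The same slice can be taken in-process. This sketch reuses the alternation from the `grep -E` filter quoted above and keeps only the matching lines; the function name is made up for the example, and nothing here depends on the actual layout of the replay log beyond those substrings.

```python
import re

# Same alternation as the grep -E filter in the playbook.
SUMMARY_PATTERN = re.compile(
    r"Total test|Failed Testcases|test passed|test failed|FAIL|ERROR"
    r"|debug bundle|View test report"
)


def summary_slice(log_text):
    """Keep only the lines a decision needs: summary, failures, report URL."""
    return [line for line in log_text.splitlines() if SUMMARY_PATTERN.search(line)]
```

Filtering before the text reaches the agent's context is the point: only the returned lines are carried into later steps.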
| - **Uploading fixtures from another branch onto the current branch.** Fixtures are branch-scoped — they encode app-state assumptions of where they were captured. Re-record against THIS branch instead. | ||
| - **Uploading fresh recordings without checking existing branch coverage first.** `listRecordings({app_id, branch_id})` + targeted `getMock` first; reuse if covered. | ||
| - **Inventing a PAT, branch name, or secret value.** | ||
| - **Running `keploy --help`, `keploy <cmd> --help`, or any `--version` info dump.** This skill names every command + flag you need (`keploy cloud replay`, `keploy mock patch`, `keploy record`, `keploy upload test-set`). The CLI's help text is ~14k tokens and re-bills on every subsequent turn — pure waste. |
Done in commit 9646579 — same rephrase as the other "re-bills" comments ("gets re-added to context"). All three sites updated in one pass.
…fy --cluster error
Routine B used to skip Discovery step 3 (getApp) because B1 starts at 'git diff', then hit Phase B4 needing --cluster and dropped the flag, causing `no active clusters found`. Two fixes:
1. Discovery step 3 (`getApp` for cluster/ns/deployment) is now MANDATORY before any `keploy cloud replay` invocation (both Phase A4 and B4).
2. Phase B4 explicitly tells the agent: if you skipped Discovery step 3 because Routine B starts at git diff, go back and call getApp NOW.
Plus inline the error-message ambiguity: `no active clusters found` actually means "you forgot --cluster", not "no cluster is running".
Source of truth: matches the trimmed verified-working SKILL.md (`.claude/skills/keploy/SKILL.md`) byte-for-byte.
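The error-message ambiguity can be expressed as a small check: when the replay fails with `no active clusters found` and the argument list has no `--cluster`, the likelier cause is the missing flag. This is only a sketch; the function name is hypothetical, and it inspects an argument list the caller supplies rather than building any Keploy command itself.

```python
def explain_replay_error(argv, stderr):
    """Return a hint when the cluster error is really a missing --cluster flag."""
    has_cluster_flag = any(
        arg == "--cluster" or arg.startswith("--cluster=") for arg in argv
    )
    if "no active clusters found" in stderr and not has_cluster_flag:
        return "--cluster was not passed; call getApp for the cluster and retry"
    return None  # either a different error, or the flag was already present
```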
…app-id)
The CLI registers --app, not --app-id (OSS root pre-registers --app-id as a deprecated uint64 flag). The prior template told agents to use --app-id, which the CLI rejects with exit 1. Real-world impact: in the S4 validation run the agent constructed the documented --app-id command, got rejected, and confabulated success.
…Reports one-shot stricter
Two cost-discipline fixes from validation evidence:
1. Phase A2: replaced the narrow recommended projection ([failed_steps[].diff, mock_mismatches, status, ci_metadata]) with one that covers per-case identity + per-case oss_report.req / .result / .mock_mismatches / .noise, i.e. everything Phase A3 actually reads. The old projection was too narrow, so agents fell back to include_oss_report=true (NO fields=) to fetch the full 34k blob that re-bills every subsequent turn.
2. Phase A1: added "do NOT re-call listTestReports after your own `keploy cloud replay` finishes; the replay stdout already prints the new test_run_id in `View test report at: .../tr/<id>`, parse that line instead of re-querying."
Also added an explicit "ADD fields, never drop" rule under "use fields aggressively": agents were retrying without fields= to "get everything", which is the exact failure mode the projection was meant to prevent.
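Parsing the report line instead of re-querying could look like the sketch below. It assumes only what the commit message states, namely that stdout contains `View test report at:` followed by a URL with a `/tr/<id>` segment; the function name and the handling of trailing path or query parts are assumptions.

```python
import re

_REPORT_LINE = re.compile(r"View test report at:\s*(\S+)")


def extract_test_run_id(stdout):
    """Return the id after '/tr/' in the report URL, or None if absent."""
    for line in stdout.splitlines():
        match = _REPORT_LINE.search(line)
        if not match:
            continue
        _, sep, tail = match.group(1).rpartition("/tr/")
        if sep:
            # Drop anything after the id (further path, query string, fragment).
            run_id = re.split(r"[/?#]", tail, maxsplit=1)[0]
            return run_id or None
    return None
```

Returning `None` rather than guessing keeps the caller honest: if the line is missing, the id is unknown.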
…nly call
Three skill corrections discovered via S7 deep-dive on the actual
getTestReportFull response schema:
1. Field-name corrections: the canonical fields= projection used wrong
keys that returned null on every call.
test_sets[].name → test_sets[].test_set_name
test_sets[].id → test_sets[].test_set_id
test_sets[].test_cases[].name → test_sets[].test_cases[].test_case_name
test_sets[].test_cases[].id → test_sets[].test_cases[].test_case_id
Plus dropped refs that don't exist anywhere in the response:
failed_steps[].diff (not in response)
top-level mock_mismatches (not in response)
oss_report.failure_info.mock_mismatch (failure_info has no such subkey)
2. mock_mismatches_only=true second call: per-case mock_mismatches data
is NOT included by default in getTestReportFull. Added explicit
instruction that when Phase A3 routes to Case 2b, make a SECOND
projected call with mock_mismatches_only=true to discover mock IDs
from oss_report.mock_mismatches.actual_mocks[].name. This avoids
listMocks (~28k token inventory) for the common Case 2b path.
3. listMocks ban softened: now allowed as fallback when the
mock_mismatches_only call returns empty for the failing test set
(e.g., body-only drift with no consumed mocks).
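The renames and dropped keys from item 1 can be captured as data, which makes it easy to rewrite an old projection. The path strings are copied from the commit message; whether `fields=` accepts exactly this path syntax is an assumption, and `correct_projection` is a hypothetical helper.

```python
# Old key -> key that actually exists in the getTestReportFull response.
FIELD_RENAMES = {
    "test_sets[].name": "test_sets[].test_set_name",
    "test_sets[].id": "test_sets[].test_set_id",
    "test_sets[].test_cases[].name": "test_sets[].test_cases[].test_case_name",
    "test_sets[].test_cases[].id": "test_sets[].test_cases[].test_case_id",
}

# Keys the commit message says do not exist anywhere in the response.
NONEXISTENT_FIELDS = frozenset({
    "failed_steps[].diff",
    "mock_mismatches",  # the top-level one
    "oss_report.failure_info.mock_mismatch",
})


def correct_projection(fields):
    """Rename stale keys, drop nonexistent ones, keep order, remove duplicates."""
    corrected = []
    for field in fields:
        if field in NONEXISTENT_FIELDS:
            continue
        field = FIELD_RENAMES.get(field, field)
        if field not in corrected:
            corrected.append(field)
    return corrected
```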
Verified live: S7 with the corrected skill + the projection bug fixes
(see api-server PR for those) passes 13/16 strict asserts (was 11/16);
A-CR1 fields= now passes 2/2, and the response payload drops from
22k → 572 bytes on the projected call.
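For the `mock_mismatches_only=true` call described in this commit, collecting mock names from the response might look like the following. The nesting (`test_sets[]`, then `test_cases[]`, then `oss_report.mock_mismatches.actual_mocks[].name`) is pieced together from the paths mentioned in these commit messages and should be read as an assumption about the schema; the function name is hypothetical.

```python
def mismatched_mock_names(report):
    """Collect unique mock names from per-case mock_mismatches, in first-seen order."""
    names = []
    for test_set in report.get("test_sets") or []:
        for case in test_set.get("test_cases") or []:
            mismatches = (case.get("oss_report") or {}).get("mock_mismatches") or {}
            for mock in mismatches.get("actual_mocks") or []:
                name = mock.get("name")
                if name and name not in names:
                    names.append(name)
    return names
```

An empty result corresponds to the fallback case in item 3, where `listMocks` is allowed because the projected call found no consumed mocks.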
…oy record
After investigating S6 (Routine B) end-to-end, found that `keploy record --sync` alone produces no `mappings.yaml`. The recorder inherits keploy.yml's `disableMapping`, and the auto-orchestrator-forwarded flag doesn't propagate without an explicit host-side override. Without mappings.yaml, the upload pipeline persists no `mapping_audits` doc in mongo, and `getMockMapping` returns empty `mocks: []` for every test case, forcing the replay matcher onto fragile timestamp windows. Two skill updates:
1. Phase B2 step 1: `keploy record -c "<cmd>" --sync --disable-mapping=false` is the canonical incantation, with explicit rationale for why --disable-mapping=false is mandatory.
2. Case 2b-recapture: same flag pair documented on the record step of the (record → upload → delete) order.
The --disable-mapping flag was added to `keploy record` upstream (keploy/keploy PR #4250).
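Since the failure described here is silent, a pre-upload check for `mappings.yaml` is one way to catch it early. The sketch assumes one directory per test set under the local recording directory and searches each recursively, because the exact location of `mappings.yaml` inside a test set is not stated in this PR; the function name is hypothetical.

```python
from pathlib import Path


def sets_without_mappings(recording_dir):
    """List test-set directories that contain no mappings.yaml anywhere inside."""
    root = Path(recording_dir)
    missing = []
    for test_set in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob is lazy; next() stops at the first hit or yields None.
        if next(test_set.rglob("mappings.yaml"), None) is None:
            missing.append(test_set.name)
    return missing
```

A non-empty result before upload would suggest the recording ran without `--disable-mapping=false`.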
…rkflow
# Conflicts:
# vale_styles/config/vocabularies/Base/accept.txt
# versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
…g fixes
Address user feedback + Copilot Vale-spelling comments on PR #871.
User feedback (Cursor user): the doc lacked a setup section, so they went off the older `.cursorrules` instructions in agent-test-generation.md, which is now deprecated. Verified against cursor-agent's built-in `migrate-to-skills` skill: `.cursor/skills/<name>/SKILL.md` IS the modern Cursor format; `.cursorrules` and `.cursor/rules/*.mdc` are being migrated FROM. Added an Installation section at the top of the page covering the modern Skills mechanism for Cursor / Claude Code / other agents, with an explicit "do not use .cursorrules" note (the playbook is ~8k tokens; pinning it as always-on context would bill on every editor turn).
Vale spelling fixes (Copilot comments r3343-r3369):
- "analyse" → "analyze" (en_US): Prompt A wording + Routine A heading
- "ALLOWLIST" → "Allowlist" (security term, lowercased to match Vale) + added `[Aa]llowlist` to the Base vocabulary so future occurrences pass lint
- "re-bills" → "gets re-added to context" (3 sites): clearer to readers and dodges Vale's spelling check
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…score_
CI's prettier check (creyD/prettier_action@v4.6 with prettier 3.8.3) fails the PR because three emphasis spans in the file use `*…*` syntax. Prettier 3.x normalizes em-emphasis to `_…_`. Auto-fixed via `prettier --write`. No prose changes; only the markup style for the three italic spans (`*values*`, `*shape*`, `*value*`) changed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| Save this playbook as an **agent skill**, not a static rules file. Cursor's modern Skills mechanism (and Claude Code's identical `SKILL.md` convention) loads the file on demand when the user issues one of the two prompts below, instead of injecting it as always-on context. That keeps every other unrelated agent task out of this playbook's token cost. | ||
| Use the copy button on the block below and paste it into the file at the path you picked above. | ||
| - **Cursor:** create `.cursor/skills/keploy/SKILL.md` (or your project's preferred Skills path) and paste the rest of this page into it. Do **not** put this content in `.cursorrules` — `.cursorrules` files are always-on and would bill the full ~8k-token playbook on every editor interaction. |
Self-review: token count is inaccurate. Just measured the whole file with tiktoken cl100k_base — 9,310 tokens, not "~8k". The Installation block I added rounds the wrong direction; bumping it to "~9k-token playbook" so the warning about .cursorrules always-on placement is grounded in the real number.
Fixed in commit 8685ef2 — bumped to "~9k-token" matching the measured 9,310-token count.
| **Capture:** | ||
| 1. Run `keploy record -c "<dev run command>" --sync` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one. Cloud association happens in Phase B3's upload step, not here—`keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`. | ||
| 1. Run `keploy record -c "<dev run command>" --sync --disable-mapping=false` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one; **`--disable-mapping=false` is MANDATORY** — without it, the host inherits `keploy.yml`'s `disableMapping: true` (the auto-generated default), the agent silently skips writing `mappings.yaml`, and the uploaded bundle lands in mongo with no `mapping_audits` doc → `getMockMapping` returns empty `mocks: []` for every test case → replay matcher falls back to fragile timestamp-windows. Cloud association happens in Phase B3's upload step, not here — `keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`. |
Self-review: the Phase B2 -c instruction is silently broken when the pre-flight command is detached. The text says:
"Discover the dev's run command... Start the app with that command, curl... then stop it cleanly."
"Run `keploy record -c "<dev run command>"`... The `-c` value is the exact command from your pre-flight"
If the pre-flight uses docker compose up -d (detached — common in repos with no foreground equivalent declared), the same command passed to keploy record -c "docker compose up -d" spawns docker which exits immediately on detach, and keploy thinks the app already terminated. Recording captures nothing.
Fix: clarify that the -c value must be the FOREGROUND form of the run command, even if pre-flight used -d. Example: pre-flight docker compose up -d, then keploy record -c "docker compose up" (no -d).
Fixed in commit 8685ef2 — added an explicit foreground note with the docker compose up (no -d) example so the failure mode is preventable on read.
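The foreground rule can be applied mechanically for the Docker Compose case named in the review. A minimal sketch, assuming the run command is a plain `docker compose ... up` (or `docker-compose ... up`) invocation; it removes only the standalone `-d` and `--detach` flags and does not try to handle combined short flags or other launchers. The function name is hypothetical.

```python
import shlex


def foreground_form(command):
    """Return `command` with compose's detach flag removed, else unchanged."""
    tokens = shlex.split(command)
    is_compose = tokens[:2] == ["docker", "compose"] or tokens[:1] == ["docker-compose"]
    if not (is_compose and "up" in tokens):
        return command
    up_index = tokens.index("up")
    # Only flags after `up` belong to the up subcommand.
    kept = tokens[: up_index + 1] + [
        tok for tok in tokens[up_index + 1:] if tok not in ("-d", "--detach")
    ]
    return shlex.join(kept)
```

For the example in the review, `docker compose up -d` becomes `docker compose up`, which is the string that would then go into `keploy record -c`.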
| # Keploy MCP playbook—autonomous developer workflow | ||
| ## Entry points | ||
| The developer will only ever say one of two things to you: |
Self-review: minor wording inconsistency. The page description (line 5: "exactly two developer prompts") and this section header ("Entry points") + the page-level claim that the developer "will only ever say one of two things to you" don't fully match the actual Prompt A spec, which lists TWO distinct phrasings ("my keploy cloud replay is failing…" OR "the keploy cloud replay pipeline is failing…"). So three distinct phrases match the entry points, not two.
Not a behavioral bug — the routing maps both A-phrases to Routine A — but readers comparing the description to the routing copy will notice the off-by-one.
Proposed fix: change "two developer prompts" to "two routines" / "two main routines" so the count refers to the routing endpoints rather than the surface phrases.
Fixed in commit 8685ef2 — reworded page description to "two routine prompts (failing-replay analyze-and-fix; add-tests-for-my-changes)" so the count refers to routines, not surface phrases.
…ound -c, two-routine wording
Three self-review nits caught on a deep re-read:
1. Installation: "~8k-token playbook" was off. Measured the actual file with tiktoken cl100k_base and got 9,310 tokens. Bumped the warning to "~9k-token" so the cost rationale is grounded in the real number.
2. Phase B2 capture: clarified that the -c value must be the FOREGROUND form of the run command. If pre-flight uses `docker compose up -d` (detached, common in repos without a foreground equivalent declared), passing the same string to `keploy record -c` makes docker exit immediately on detach and keploy thinks the app already terminated, capturing nothing. Example: pre-flight `docker compose up -d`, record `docker compose up` (no -d).
3. Page description: "exactly two developer prompts" was inaccurate. Prompt A has two phrasings, so the agent listens for three distinct surface phrases. Reworded to "two routine prompts (failing-replay analyze-and-fix; add-tests-for-my-changes)" so the count refers to the two routines rather than the surface phrases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI's Vale doc linter (errata-ai/vale-action@v2.1.1 with vale 3.0.3 and
the project's existing Google + Vale base styles) flagged 89 errors on
the k8s-proxy-llm-workflow page after my Installation section landed.
Categorized:
58× Google.EmDash — "Don't put a space before or after a dash". The
doc uses the spaced em-dash form ` — ` for prose readability;
many other docs in the repo do the same (see hits in
generate-api-tests-using-ai.md, etc.). Disabling the rule
repo-wide is consistent with the seven other Google.* overrides
already in `.vale.ini` and matches the docs' established style.
8× Google.Quotes — "Commas and periods go inside quotation marks".
The docs use period-OUTSIDE-quote when the quoted token is a
literal the reader is supposed to paste verbatim (e.g.
`the exact value "FAILED".`); putting the period inside would
change the visible token. Disabling for consistency with the
other Google.* overrides.
23× Vale.Spelling — tech terms not yet in the Base vocabulary.
Added: branch_id, camelCase, CLI[s]?, cwd, hardcoded,
JSONPath[s]?, matcher, misclassification, mutex, OAuth,
readback, README, snake_case, stdout, test_run, unprojected.
1× Vale.Spelling on "whatever's" — possessive on the indefinite
pronoun that Vale's en_US dictionary doesn't recognize.
Reworded the sentence in-place rather than vocab-ing it; the
possessive form is genuinely unusual and a rewrite is cleaner
than whitelisting it.
Local `vale --config=.vale.ini versioned_docs/.../k8s-proxy-llm-workflow.md`
now reports 0 errors. Prettier still clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md.
- SKILL.md / Cursor .cursor/rules/keploy.mdc), so devs can paste it once and run the whole Keploy diagnose-and-fix loop with two fixed prompts.
- listTestReports ONCE; getApp memoize; fields=[...] projection; drop listMocks default; record → upload → delete for 2b-recapture; --disableReportUpload=false + --cluster mandatory; pipe all long-running output through tail/grep.
- keploy --help dump (~14k token waste); ban Read of keploy/cloud-debug.log / local keploy/ cache files.
- Verified end-to-end against the orderflow test scenario S1: 632k total tokens, 13/13 effective asserts.
Test plan
- yarn start) and verify formatting/anchors
- keploy-mcp.json snippet copy-pastes cleanly into Claude Code + Cursor
🤖 Generated with Claude Code