docs(k8s-proxy): developer + LLM workflow playbook + trim to verified form by charankamarapu · Pull Request #871 · keploy/docs

charankamarapu · 2026-06-07T08:11:56Z

Summary

Adds the Developer + LLM Workflow with Keploy Proxy page under versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md.
The page mirrors the verified-working Keploy MCP skill playbook (Claude Code SKILL.md / Cursor .cursor/rules/keploy.mdc), so devs can paste it once and run the whole Keploy diagnose-and-fix loop with two fixed prompts.
Includes the trimmed, validated form (8,095 tokens) with the load-bearing rules: Step 0 ALLOWLIST + uncommitted-edit revert; listTestReports ONCE; getApp memoize; fields=[...] projection; drop listMocks default; record → upload → delete for 2b-recapture; --disableReportUpload=false + --cluster mandatory; pipe all long-running output through tail/grep.
Two new anti-patterns added: ban keploy --help dump (~14k token waste); ban Read of keploy/cloud-debug.log / local keploy/ cache files.

Verified end-to-end against the orderflow test scenario S1: 632k total tokens, 13/13 effective asserts.

Test plan

Render the page locally (yarn start) and verify formatting/anchors
Verify the keploy-mcp.json snippet copy-pastes cleanly into Claude Code + Cursor
Spot-check the rules table renders correctly in the docs site theme

🤖 Generated with Claude Code

Sibling to the existing k8s-proxy-developer-workflow page. Documents an autonomous Keploy workflow driven from an MCP-aware editor (Claude Code, Cursor, Windsurf, Claude Desktop, VS Code Copilot, Trae). The developer types one of two prompts; the agent does everything else.

The two prompts:

1. "my keploy cloud replay is failing, please analyse and fix it." (or "the keploy cloud replay pipeline is failing..." for CI)
2. "Add new keploy tests for my changes."

The page ships a single pasteable playbook that installs as a Claude Code skill or any other editor's rules / memory file. Inside the playbook the agent:

- Resolves app_id from `basename $(pwd)` + listApps.
- Resolves branch_id from `git rev-parse --abbrev-ref HEAD` + create_branch (find-or-create, idempotent, sticky for the session).
- Diagnoses failing runs via two cases: Case 1 (app regression, agent fixes handler code and announces file:line before applying); Case 2 (test data stale, with sub-actions 2a noise / 2a response edit / 2b mock edit / 2b delete_recording + re-record).
- For new tests: git diff to find changed handlers, pre-flight the dev's local run command, then `keploy record -c "<cmd>" --sync` + `keploy upload test-set` to land the bundle on the branch.

Sidebar updated to surface the page under K8s Proxy.

Signed-off-by: Charan Kamarapu <kamarapucharan@gmail.com>

Replace the long-form playbook with the trimmed, validated form (11,305 → 7,939 tok + 2 anti-patterns ≈ 8,095 tok in source). Same load-bearing rules preserved verbatim:

- Step 0 ALLOWLIST + uncommitted-edit revert mandate
- listTestReports EXACTLY ONCE per session
- getApp memoize (≤1 call/session)
- fields=[...] on getTestReportFull + getApp
- drop listMocks default; targeted getMock instead
- record → upload → delete order for 2b-recapture
- sql_ast_hash CLI mandate (use `keploy mock patch`, not MCP update_mock)
- --disableReportUpload=false and --cluster mandatory
- pipe all keploy/docker output through tail/grep
- two new anti-patterns: ban keploy --help dump, ban Read of keploy/ local cache files

Verified against S1 scenario at 632k total tokens, 13/13 effective asserts.

Copilot

Pull request overview

Adds a new Quickstart doc page that provides a copy/paste “Developer + LLM Workflow with Keploy Proxy” playbook, and wires it into the v4.0.0 sidebar, with accompanying Vale vocabulary updates to keep docs linting clean.

Changes:

Added a new Quickstart page: Developer + LLM Workflow with Keploy Proxy (autonomous Keploy MCP playbook + routines).
Registered the new page in the K8s Proxy section of the v4.0.0 versioned sidebar.
Expanded Vale’s accepted vocabulary to reduce false-positive spelling errors for newly introduced technical terms.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| versioned_sidebars/version-4.0.0-sidebars.json | Adds the new Quickstart doc ID to the K8s Proxy sidebar group. |
| versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md | Introduces the new LLM workflow playbook doc content. |
| vale_styles/config/vocabularies/Base/accept.txt | Updates Vale accept-list to accommodate new/technical terminology used in docs. |

charankamarapu · 2026-06-07T14:53:41Z

+
+The developer will only ever say one of two things to you:
+
+**Prompt A:** "my keploy cloud replay is failing, please analyse and fix it." OR "the keploy cloud replay pipeline is failing, please analyse and fix it."—both forms route to the same routine; the first means the dev's last local replay run failed (find the latest test_run on the branch via api-server), the second means a CI pipeline run failed (the dev should paste the CI log or dashboard URL; extract `test_run_id` from it).


Done in commit 9646579 — both Prompt A spellings switched to "analyze". Same change applied to the Routine A heading.

charankamarapu · 2026-06-07T14:53:42Z

+
+---
+
+## Routine A—failing cloud replay (local or CI), analyse and fix


Done in commit 9646579 — Routine A heading now uses "analyze".

charankamarapu · 2026-06-07T14:53:43Z

+
+Run all three **every time**, even when the tree looks clean. The empty result IS the evidence required to advance to Step 1. Skipping = silent misclassification when the assumption is wrong.
+
+**ALLOWLIST of MCP calls permitted before Step 0** (Phase A1 discovery only): `listApps`, `getApp`, `create_branch`, `list_branches`, `listTestReports`, `getTestReport`, `tools/list`. EVERY other call — `getTestReportFull`, `getTestCase`, `getMock`, `listMocks`, `getRecording`, `listRecordings`, `updateTestCase`, `update_mock`, `delete_recording` — is classifier/write and MUST come AFTER Step 0. Reading `getTestCase` first biases toward Case 2 framing.


Done in commit 9646579 — lowercased to "Allowlist" and added [Aa]llowlist to the Base vocabulary so future occurrences pass Vale without needing a re-edit.
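
The allowlist rule in the hunk above amounts to a simple gate on call names. A minimal Python sketch, assuming a hypothetical `is_permitted` helper and a `step0_done` flag that the playbook itself does not define; the call names are copied from the hunk:

```python
# Calls the playbook permits before Step 0 (Phase A1 discovery only).
PRE_STEP0_ALLOWLIST = frozenset({
    "listApps", "getApp", "create_branch", "list_branches",
    "listTestReports", "getTestReport", "tools/list",
})


def is_permitted(call_name: str, step0_done: bool) -> bool:
    """Hypothetical gate: before Step 0, only allowlisted calls pass.

    Once Step 0 has run, classifier/write calls such as getTestCase
    or update_mock are no longer blocked by this check.
    """
    return step0_done or call_name in PRE_STEP0_ALLOWLIST
```

Under this reading, `is_permitted("getTestCase", step0_done=False)` is rejected, which matches the hunk's point that reading `getTestCase` first biases the classification.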

charankamarapu · 2026-06-07T14:53:45Z

+[Aa]ir-?gap(?:ped|ping)?
+[Aa]uditable
+[Cc]group[s]?


Done in commit 9646579 — added [Aa]llowlist to vale_styles/config/vocabularies/Base/accept.txt (placed after [Aa]uditable to keep alphabetical order).
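
To see which spellings the accept-list entries cover, the patterns can be tried out in isolation. This is a rough sketch that uses Python's `re` as a stand-in for Vale's own matcher; it assumes whole-word, case-sensitive matching, which may differ from how Vale applies vocabulary entries. The `accepted` helper is hypothetical.

```python
import re

# Entries from the accept.txt hunk, plus the [Aa]llowlist entry added in 9646579.
PATTERNS = [
    r"[Aa]ir-?gap(?:ped|ping)?",
    r"[Aa]uditable",
    r"[Aa]llowlist",
    r"[Cc]group[s]?",
]


def accepted(word: str) -> bool:
    """Return True if any pattern matches the whole word (assumed semantics)."""
    return any(re.fullmatch(pattern, word) for pattern in PATTERNS)
```

With these assumptions, "Allowlist" and "allowlist" both match, while the all-caps "ALLOWLIST" does not, which is consistent with the heading being lowercased in the same commit.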

charankamarapu · 2026-06-07T14:53:46Z

+  | grep -E "Total test|Failed Testcases|test passed|test failed|FAIL|ERROR|debug bundle|View test report"
+```
+
+The full replay log contains per-mock-match traces, per-testcase debug lines, and a final summary block. Your decisions only need the final summary + any FAIL/ERROR lines + the `View test report at:` URL. Piping at the command level keeps the slice that re-bills on every subsequent step to ~2k tokens instead of the full ~40k — over a retry loop that compounds enormously. Apply the same pipe pattern to every other long-running Bash command: `keploy record` output, `docker build`, `keploy upload test-set`. Read the cached log file directly only when the grep slice doesn't show what you need.


Done in commit 9646579 — rephrased as "gets re-added to context" across all three sites (lines 113, 259, 380). Clearer to readers and dodges Vale spelling.
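
The same filter can be expressed outside the shell, for example when the replay output has already been captured as a string. A minimal Python sketch: the alternation is copied from the `grep -E` pattern in the hunk, and `summary_slice` is a hypothetical helper name, not part of Keploy.

```python
import re

# Same alternation as the grep -E pattern in the playbook hunk.
KEEP = re.compile(
    r"Total test|Failed Testcases|test passed|test failed"
    r"|FAIL|ERROR|debug bundle|View test report"
)


def summary_slice(log_text: str) -> list[str]:
    """Keep only the lines the playbook says decisions depend on."""
    return [line for line in log_text.splitlines() if KEEP.search(line)]
```

Like the shell pipe, this drops per-mock-match traces and per-testcase debug lines and keeps the summary, FAIL/ERROR lines, and the report URL line.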

charankamarapu · 2026-06-07T14:53:47Z

+- **Uploading fixtures from another branch onto the current branch.** Fixtures are branch-scoped — they encode app-state assumptions of where they were captured. Re-record against THIS branch instead.
+- **Uploading fresh recordings without checking existing branch coverage first.** `listRecordings({app_id, branch_id})` + targeted `getMock` first; reuse if covered.
+- **Inventing a PAT, branch name, or secret value.**
+- **Running `keploy --help`, `keploy <cmd> --help`, or any `--version` info dump.** This skill names every command + flag you need (`keploy cloud replay`, `keploy mock patch`, `keploy record`, `keploy upload test-set`). The CLI's help text is ~14k tokens and re-bills on every subsequent turn — pure waste.


Done in commit 9646579 — same rephrase as the other "re-bills" comments ("gets re-added to context"). All three sites updated in one pass.

…fy --cluster error

Routine B used to skip Discovery step 3 (getApp) because B1 starts at 'git diff', then hit Phase B4 needing --cluster and dropped the flag, causing `no active clusters found`. Two fixes:

1. Discovery step 3 (`getApp` for cluster/ns/deployment) is now MANDATORY before any `keploy cloud replay` invocation (both Phase A4 and B4).
2. Phase B4 explicitly tells the agent: if you skipped Discovery step 3 because Routine B starts at git diff, go back and call getApp NOW.

Plus inline the error-message ambiguity: `no active clusters found` actually means "you forgot --cluster", not "no cluster is running".

Source of truth: matches the trimmed verified-working SKILL.md (`.claude/skills/keploy/SKILL.md`) byte-for-byte.

…app-id)

The CLI registers --app, not --app-id (OSS root pre-registers --app-id as a deprecated uint64 flag). The prior template told agents to use --app-id, which the CLI rejects with exit 1. Real-world impact: in the S4 validation run the agent constructed the documented --app-id command, was rejected, and then confabulated success.

…Reports one-shot stricter

Two cost-discipline fixes from validation evidence:

1. Phase A2: replaced the narrow recommended projection ([failed_steps[].diff, mock_mismatches, status, ci_metadata]) with one that covers per-case identity + per-case oss_report.req / .result / .mock_mismatches / .noise, which is everything Phase A3 actually reads. The old projection was too narrow, so agents fell back to include_oss_report=true (NO fields=) to fetch the full 34k blob that re-bills every subsequent turn.
2. Phase A1: added "do NOT re-call listTestReports after your own `keploy cloud replay` finishes — the replay stdout already prints the new test_run_id in `View test report at: .../tr/<id>`, parse that line instead of re-querying."

Also added an explicit "ADD fields, never drop" rule under "use fields aggressively": agents were retrying without fields= to "get everything", which is the exact failure mode the projection was meant to prevent.
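
Parsing the report line instead of re-querying could look like the sketch below. It assumes the run id is the path segment immediately after `/tr/`, which the commit message implies but does not spell out; the rest of the URL is elided in the source, so the pattern treats it as opaque. `extract_test_run_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: the id is the segment right after "/tr/" on the report line.
REPORT_LINE = re.compile(r"View test report at:\s*\S*/tr/([^/\s?#]+)")


def extract_test_run_id(stdout: str) -> Optional[str]:
    """Return the test_run_id printed by the replay, or None if absent."""
    match = REPORT_LINE.search(stdout)
    return match.group(1) if match else None
```

Returning `None` when the line is missing keeps the caller honest: it can decide whether one extra `listTestReports` call is justified rather than guessing an id.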

…nly call

Two skill corrections discovered via S7 deep-dive on the actual getTestReportFull response schema:

1. Field-name corrections: the canonical fields= projection used wrong keys that returned null on every call.

   - test_sets[].name → test_sets[].test_set_name
   - test_sets[].id → test_sets[].test_set_id
   - test_sets[].test_cases[].name → test_sets[].test_cases[].test_case_name
   - test_sets[].test_cases[].id → test_sets[].test_cases[].test_case_id

   Plus dropped refs that don't exist anywhere in the response:

   - failed_steps[].diff (not in response)
   - top-level mock_mismatches (not in response)
   - oss_report.failure_info.mock_mismatch (failure_info has no such subkey)

2. mock_mismatches_only=true second call: per-case mock_mismatches data is NOT included by default in getTestReportFull. Added explicit instruction that when Phase A3 routes to Case 2b, make a SECOND projected call with mock_mismatches_only=true to discover mock IDs from oss_report.mock_mismatches.actual_mocks[].name. This avoids listMocks (~28k token inventory) for the common Case 2b path.
3. listMocks ban softened: now allowed as fallback when the mock_mismatches_only call returns empty for the failing test set (e.g., body-only drift with no consumed mocks).

Verified live: S7 with the corrected skill + the projection bug fixes (see api-server PR for those): 13/16 strict assert pass (was 11/16), A-CR1 fields= now passing 2/2, response payload 22k → 572 bytes on the projected call.
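
The field-name corrections can be summarized as a rename table plus a drop list. The Python sketch below is illustrative only: the path notation is copied from the commit message, and whether the real `fields=` parameter accepts exactly these strings is an assumption. `correct_projection` is a hypothetical helper.

```python
# Old key -> corrected key, as listed in the commit message.
FIELD_RENAMES = {
    "test_sets[].name": "test_sets[].test_set_name",
    "test_sets[].id": "test_sets[].test_set_id",
    "test_sets[].test_cases[].name": "test_sets[].test_cases[].test_case_name",
    "test_sets[].test_cases[].id": "test_sets[].test_cases[].test_case_id",
}

# Refs the commit message says do not exist in the response.
NONEXISTENT = {
    "failed_steps[].diff",
    "mock_mismatches",  # the top-level key
    "oss_report.failure_info.mock_mismatch",
}


def correct_projection(fields: list[str]) -> list[str]:
    """Rename known-wrong keys and drop refs that are absent from the schema."""
    return [FIELD_RENAMES.get(f, f) for f in fields if f not in NONEXISTENT]
```

Fields that are neither renamed nor dropped pass through unchanged, so a projection that was already correct is left alone.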

…oy record

After investigating S6 (Routine B) end-to-end, found that `keploy record --sync` alone produces no `mappings.yaml`. The recorder inherits keploy.yml's `disableMapping` and the auto-orchestrator-forwarded flag doesn't propagate without an explicit host-side override. Without mappings.yaml, the upload pipeline persists no `mapping_audits` doc in mongo, and `getMockMapping` returns empty `mocks: []` for every test case, forcing the replay matcher onto fragile timestamp windows.

Two skill updates:

1. Phase B2 step 1: `keploy record -c "<cmd>" --sync --disable-mapping=false` is the canonical incantation, with explicit rationale for why --disable-mapping=false is mandatory.
2. Case 2b-recapture: same flag pair documented on the record step of the (record → upload → delete) order.

The --disable-mapping flag was added to `keploy record` upstream (keploy/keploy PR #4250).

…rkflow

# Conflicts:
#	vale_styles/config/vocabularies/Base/accept.txt
#	versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md

…g fixes

Address user feedback + Copilot Vale-spelling comments on PR #871:

User feedback (Cursor user): the doc lacked a setup section, so they went off the older `.cursorrules` instructions in agent-test-generation.md which is now deprecated. Verified against cursor-agent's built-in `migrate-to-skills` skill: `.cursor/skills/<name>/SKILL.md` IS the modern Cursor format, `.cursorrules` and `.cursor/rules/*.mdc` are being migrated FROM. Added an Installation section at the top of the page covering the modern Skills mechanism for Cursor / Claude Code / other agents, with an explicit "do not use .cursorrules" note (the playbook is ~8k tokens; pinning it as always-on context would bill on every editor turn).

Vale spelling fixes (Copilot comments r3343-r3369):

- "analyse" → "analyze" (en_US): Prompt A wording + Routine A heading
- "ALLOWLIST" → "Allowlist" (security term, lowercased to match Vale) + added `[Aa]llowlist` to the Base vocabulary so future occurrences pass lint
- "re-bills" → "gets re-added to context" (3 sites): clearer to readers and dodges Vale's spelling check

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…score_ CI's prettier check (creyD/prettier_action@v4.6 with prettier 3.8.3) fails the PR because three emphasis spans in the file use `*…*` syntax. Prettier 3.x normalizes em-emphasis to `_…_`. Auto-fixed via `prettier --write`. No prose changes — only the markup style for the three italic spans (`*values*`, `*shape*`, `*value*`). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

charankamarapu · 2026-06-07T15:16:58Z

+Save this playbook as an **agent skill**, not a static rules file. Cursor's modern Skills mechanism (and Claude Code's identical `SKILL.md` convention) loads the file on demand when the user issues one of the two prompts below, instead of injecting it as always-on context. That keeps every other unrelated agent task out of this playbook's token cost.

-Use the copy button on the block below and paste it into the file at the path you picked above.
+- **Cursor:** create `.cursor/skills/keploy/SKILL.md` (or your project's preferred Skills path) and paste the rest of this page into it. Do **not** put this content in `.cursorrules` — `.cursorrules` files are always-on and would bill the full ~8k-token playbook on every editor interaction.


Self-review: token count is inaccurate. Just measured the whole file with tiktoken cl100k_base — 9,310 tokens, not "~8k". The Installation block I added rounds the wrong direction; bumping it to "~9k-token playbook" so the warning about .cursorrules always-on placement is grounded in the real number.

Fixed in commit 8685ef2 — bumped to "~9k-token" matching the measured 9,310-token count.

charankamarapu · 2026-06-07T15:16:59Z

 **Capture:**

-1. Run `keploy record -c "<dev run command>" --sync` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one. Cloud association happens in Phase B3's upload step, not here—`keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`.
+1. Run `keploy record -c "<dev run command>" --sync --disable-mapping=false` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one; **`--disable-mapping=false` is MANDATORY** — without it, the host inherits `keploy.yml`'s `disableMapping: true` (the auto-generated default), the agent silently skips writing `mappings.yaml`, and the uploaded bundle lands in mongo with no `mapping_audits` doc → `getMockMapping` returns empty `mocks: []` for every test case → replay matcher falls back to fragile timestamp-windows. Cloud association happens in Phase B3's upload step, not here — `keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`.


Self-review: the Phase B2 -c instruction is silently broken when the pre-flight command is detached. The text says:

"Discover the dev's run command... Start the app with that command, curl... then stop it cleanly."
"Run keploy record -c \"<dev run command>\" ... The -c value is the exact command from your pre-flight"

If the pre-flight uses docker compose up -d (detached — common in repos with no foreground equivalent declared), the same command passed to keploy record -c "docker compose up -d" spawns docker which exits immediately on detach, and keploy thinks the app already terminated. Recording captures nothing.

Fix: clarify that the -c value must be the FOREGROUND form of the run command, even if pre-flight used -d. Example: pre-flight docker compose up -d, then keploy record -c "docker compose up" (no -d).

Fixed in commit 8685ef2 — added an explicit foreground note with the docker compose up (no -d) example so the failure mode is preventable on read.
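
Deriving the foreground form from a detached pre-flight command can be sketched as follows. This is a hypothetical Python helper, not part of Keploy: it only handles the `docker compose up` case from the example and leaves every other command untouched, since `-d` means different things to different tools.

```python
import shlex

# Flags that detach `docker compose up` from the terminal.
DETACH_FLAGS = {"-d", "--detach"}


def foreground_form(command: str) -> str:
    """Strip the detach flag from a `docker compose up` command.

    Any other command is returned as-is, because removing `-d`
    blindly could change its meaning.
    """
    tokens = shlex.split(command)
    if tokens[:3] != ["docker", "compose", "up"]:
        return command
    return shlex.join(t for t in tokens if t not in DETACH_FLAGS)
```

The result is what would go into the `-c` value, so the recorded process stays in the foreground for the lifetime of the capture.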

charankamarapu · 2026-06-07T15:17:00Z

-# Keploy MCP playbook—autonomous developer workflow
+## Entry points

 The developer will only ever say one of two things to you:


Self-review: minor wording inconsistency. The page description (line 5: "exactly two developer prompts") and this section header ("Entry points") + the page-level claim that the developer "will only ever say one of two things to you" don't fully match the actual Prompt A spec, which lists TWO distinct phrasings ("my keploy cloud replay is failing…" OR "the keploy cloud replay pipeline is failing…"). So three distinct phrases match the entry points, not two.

Not a behavioral bug — the routing maps both A-phrases to Routine A — but readers comparing the description to the routing copy will notice the off-by-one.

Proposed fix: change "two developer prompts" to "two routines" / "two main routines" so the count refers to the routing endpoints rather than the surface phrases.

Fixed in commit 8685ef2 — reworded page description to "two routine prompts (failing-replay analyze-and-fix; add-tests-for-my-changes)" so the count refers to routines, not surface phrases.

…ound -c, two-routine wording

Three self-review nits caught on a deep re-read:

1. Installation: "~8k-token playbook" was off. Measured the actual file with tiktoken cl100k_base and got 9,310 tokens. Bumped the warning to "~9k-token" so the cost rationale is grounded in the real number.
2. Phase B2 capture: clarified that the -c value must be the FOREGROUND form of the run command. If pre-flight uses `docker compose up -d` (detached, common in repos without a foreground equivalent declared), passing the same string to `keploy record -c` makes docker exit immediately on detach and keploy thinks the app already terminated, capturing nothing. Example: pre-flight `docker compose up -d`, record `docker compose up` (no -d).
3. Page description: "exactly two developer prompts" was inaccurate. Prompt A has two phrasings, so the agent listens for three distinct surface phrases. Reworded to "two routine prompts (failing-replay analyze-and-fix; add-tests-for-my-changes)" so the count refers to the two routines rather than the surface phrases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CI's Vale doc linter (errata-ai/vale-action@v2.1.1 with vale 3.0.3 and the project's existing Google + Vale base styles) flagged 89 errors on the k8s-proxy-llm-workflow page after my Installation section landed. Categorized:

- 58× Google.EmDash: "Don't put a space before or after a dash". The doc uses the spaced em-dash form ` — ` for prose readability; many other docs in the repo do the same (see hits in generate-api-tests-using-ai.md, etc.). Disabling the rule repo-wide is consistent with the seven other Google.* overrides already in `.vale.ini` and matches the docs' established style.
- 8× Google.Quotes: "Commas and periods go inside quotation marks". The docs use period-OUTSIDE-quote when the quoted token is a literal the reader is supposed to paste verbatim (e.g. `the exact value "FAILED".`); putting the period inside would change the visible token. Disabling for consistency with the other Google.* overrides.
- 23× Vale.Spelling: tech terms not yet in the Base vocabulary. Added: branch_id, camelCase, CLI[s]?, cwd, hardcoded, JSONPath[s]?, matcher, misclassification, mutex, OAuth, readback, README, snake_case, stdout, test_run, unprojected.
- 1× Vale.Spelling on "whatever's": possessive on the indefinite pronoun that Vale's en_US dictionary doesn't recognize. Reworded the sentence in-place rather than vocab-ing it; the possessive form is genuinely unusual and a rewrite is cleaner than whitelisting it.

Local `vale --config=.vale.ini versioned_docs/.../k8s-proxy-llm-workflow.md` now reports 0 errors. Prettier still clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Aditya-eddy

lgtm

charankamarapu added 2 commits May 20, 2026 16:39

Copilot AI review requested due to automatic review settings June 7, 2026 08:11

Copilot started reviewing on behalf of charankamarapu June 7, 2026 08:12

Copilot AI reviewed Jun 7, 2026

charankamarapu and others added 8 commits June 7, 2026 15:48

Merge remote-tracking branch 'origin/main' into docs/k8s-proxy-llm-wo…

6d2f4be

charankamarapu commented Jun 7, 2026

charankamarapu and others added 2 commits June 7, 2026 20:47

Aditya-eddy approved these changes Jun 7, 2026

charankamarapu merged commit 038c943 into main Jun 7, 2026
6 of 7 checks passed

charankamarapu deleted the docs/k8s-proxy-llm-workflow branch June 7, 2026 15:47

charankamarapu mentioned this pull request Jun 7, 2026

docs(k8s-proxy-llm-workflow): restore install section — Before you start / Step 1 / Step 2 + copy-block #872

Merged

4 tasks

