docs(k8s-proxy): developer + LLM workflow playbook + trim to verified form #871

Merged
charankamarapu merged 12 commits into main from docs/k8s-proxy-llm-workflow
Jun 7, 2026

Conversation

@charankamarapu
Contributor

Summary

  • Adds the Developer + LLM Workflow with Keploy Proxy page under versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md.
  • The page mirrors the verified-working Keploy MCP skill playbook (Claude Code SKILL.md / Cursor .cursor/rules/keploy.mdc), so devs can paste it once and run the whole Keploy diagnose-and-fix loop with two fixed prompts.
  • Includes the trimmed, validated form (8,095 tokens) with the load-bearing rules: Step 0 ALLOWLIST + uncommitted-edit revert; listTestReports ONCE; getApp memoize; fields=[...] projection; drop listMocks default; record → upload → delete for 2b-recapture; --disableReportUpload=false + --cluster mandatory; pipe all long-running output through tail/grep.
  • Two new anti-patterns added: ban keploy --help dump (~14k token waste); ban Read of keploy/cloud-debug.log / local keploy/ cache files.

Verified end-to-end against the orderflow test scenario S1: 632k total tokens, 13/13 effective asserts.

Test plan

  • Render the page locally (yarn start) and verify formatting/anchors
  • Verify the keploy-mcp.json snippet copy-pastes cleanly into Claude Code + Cursor
  • Spot-check the rules table renders correctly in the docs site theme

🤖 Generated with Claude Code

Sibling to the existing k8s-proxy-developer-workflow page. Documents
an autonomous Keploy workflow driven from an MCP-aware editor (Claude
Code, Cursor, Windsurf, Claude Desktop, VS Code Copilot, Trae). The
developer types one of two prompts; the agent does everything else.

The two prompts:
  1. "my keploy cloud replay is failing, please analyse and fix it."
     (or "the keploy cloud replay pipeline is failing..." for CI)
  2. "Add new keploy tests for my changes."

The page ships a single pasteable playbook that installs as a Claude
Code skill or any other editor's rules / memory file. Inside the
playbook the agent:

  - Resolves app_id from `basename $(pwd)` + listApps.
  - Resolves branch_id from `git rev-parse --abbrev-ref HEAD` +
    create_branch (find-or-create, idempotent, sticky for the session).
  - Diagnoses failing runs via two cases: Case 1 (app regression, agent
    fixes handler code and announces file:line before applying);
    Case 2 (test data stale, with sub-actions 2a noise / 2a response
    edit / 2b mock edit / 2b delete_recording + re-record).
  - For new tests: git diff to find changed handlers, pre-flight the
    dev's local run command, then `keploy record -c "<cmd>" --sync` +
    `keploy upload test-set` to land the bundle on the branch.
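
The two discovery inputs named above (`basename $(pwd)` and `git rev-parse --abbrev-ref HEAD`) can be sketched as follows. This is a minimal illustration, not part of the playbook: the function names are hypothetical, and matching the directory name against `listApps` or calling `create_branch` is left to the agent.

```python
import os
import subprocess


def app_name_from_cwd(cwd: str) -> str:
    """Mirror `basename $(pwd)`: the last component of the working directory."""
    return os.path.basename(os.path.normpath(cwd))


def current_branch(cwd: str) -> str:
    """Mirror `git rev-parse --abbrev-ref HEAD` for the repository at cwd."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise if cwd is not inside a git repository
    )
    return result.stdout.strip()
```

Both values are only lookup keys; the resulting `app_id` and `branch_id` still come from the MCP calls described in the list.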

Sidebar updated to surface the page under K8s Proxy.

Signed-off-by: Charan Kamarapu <kamarapucharan@gmail.com>
Replace the long-form playbook with the trimmed, validated form
(11,305 → 7,939 tok + 2 anti-patterns ≈ 8,095 tok in source). Same
load-bearing rules preserved verbatim:
- Step 0 ALLOWLIST + uncommitted-edit revert mandate
- listTestReports EXACTLY ONCE per session
- getApp memoize (≤1 call/session)
- fields=[...] on getTestReportFull + getApp
- drop listMocks default; targeted getMock instead
- record → upload → delete order for 2b-recapture
- sql_ast_hash CLI mandate (use `keploy mock patch`, not MCP update_mock)
- --disableReportUpload=false and --cluster mandatory
- pipe all keploy/docker output through tail/grep
- two new anti-patterns: ban keploy --help dump, ban Read of
  keploy/ local cache files

Verified against S1 scenario at 632k total tokens, 13/13 effective
asserts.
Copilot AI review requested due to automatic review settings June 7, 2026 08:11
Contributor

Copilot AI left a comment

Pull request overview

Adds a new Quickstart doc page that provides a copy/paste “Developer + LLM Workflow with Keploy Proxy” playbook, and wires it into the v4.0.0 sidebar, with accompanying Vale vocabulary updates to keep docs linting clean.

Changes:

  • Added a new Quickstart page: Developer + LLM Workflow with Keploy Proxy (autonomous Keploy MCP playbook + routines).
  • Registered the new page in the K8s Proxy section of the v4.0.0 versioned sidebar.
  • Expanded Vale’s accepted vocabulary to reduce false-positive spelling errors for newly introduced technical terms.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| versioned_sidebars/version-4.0.0-sidebars.json | Adds the new Quickstart doc ID to the K8s Proxy sidebar group. |
| versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md | Introduces the new LLM workflow playbook doc content. |
| vale_styles/config/vocabularies/Base/accept.txt | Updates Vale accept-list to accommodate new/technical terminology used in docs. |


The developer will only ever say one of two things to you:

**Prompt A:** "my keploy cloud replay is failing, please analyse and fix it." OR "the keploy cloud replay pipeline is failing, please analyse and fix it."—both forms route to the same routine; the first means the dev's last local replay run failed (find the latest test_run on the branch via api-server), the second means a CI pipeline run failed (the dev should paste the CI log or dashboard URL; extract `test_run_id` from it).
Contributor Author

Done in commit 9646579 — both Prompt A spellings switched to "analyze". Same change applied to the Routine A heading.


---

## Routine A—failing cloud replay (local or CI), analyse and fix
Contributor Author

Done in commit 9646579 — Routine A heading now uses "analyze".


Run all three **every time**, even when the tree looks clean. The empty result IS the evidence required to advance to Step 1. Skipping = silent misclassification when the assumption is wrong.

**ALLOWLIST of MCP calls permitted before Step 0** (Phase A1 discovery only): `listApps`, `getApp`, `create_branch`, `list_branches`, `listTestReports`, `getTestReport`, `tools/list`. EVERY other call — `getTestReportFull`, `getTestCase`, `getMock`, `listMocks`, `getRecording`, `listRecordings`, `updateTestCase`, `update_mock`, `delete_recording` — is classifier/write and MUST come AFTER Step 0. Reading `getTestCase` first biases toward Case 2 framing.
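
The allowlist rule above amounts to a simple gate on tool names. A minimal sketch, assuming the agent tracks a boolean for whether Step 0 has completed (the function and variable names here are hypothetical; the tool names are the ones listed in the rule):

```python
# MCP calls the rule permits before Step 0 (Phase A1 discovery only).
ALLOWED_BEFORE_STEP_0 = frozenset({
    "listApps",
    "getApp",
    "create_branch",
    "list_branches",
    "listTestReports",
    "getTestReport",
    "tools/list",
})


def call_permitted(tool_name: str, step0_done: bool) -> bool:
    """Return True if tool_name may be called at this point in the routine."""
    # After Step 0 every call is allowed; before it, only discovery calls are.
    return step0_done or tool_name in ALLOWED_BEFORE_STEP_0
```

With this shape, `getTestCase` is rejected until Step 0 has run, which is the ordering the rule is trying to enforce.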
Contributor Author

Done in commit 9646579 — lowercased to "Allowlist" and added [Aa]llowlist to the Base vocabulary so future occurrences pass Vale without needing a re-edit.

Comment on lines +1 to +3
[Aa]ir-?gap(?:ped|ping)?
[Aa]uditable
[Cc]group[s]?
Contributor Author

Done in commit 9646579 — added [Aa]llowlist to vale_styles/config/vocabularies/Base/accept.txt (placed after [Aa]uditable to keep alphabetical order).

| grep -E "Total test|Failed Testcases|test passed|test failed|FAIL|ERROR|debug bundle|View test report"
```

The full replay log contains per-mock-match traces, per-testcase debug lines, and a final summary block. Your decisions only need the final summary + any FAIL/ERROR lines + the `View test report at:` URL. Piping at the command level keeps the slice that re-bills on every subsequent step to ~2k tokens instead of the full ~40k — over a retry loop that compounds enormously. Apply the same pipe pattern to every other long-running Bash command: `keploy record` output, `docker build`, `keploy upload test-set`. Read the cached log file directly only when the grep slice doesn't show what you need.
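
For readers who want to see what the filter keeps, here is the same pattern applied in Python. It is only an illustration of the `grep -E` slice; the sample log lines in the usage note are made up and do not reproduce real Keploy output.

```python
import re

# Same alternation as the `grep -E` pattern in the playbook.
SUMMARY_PATTERN = re.compile(
    r"Total test|Failed Testcases|test passed|test failed|FAIL|ERROR"
    r"|debug bundle|View test report"
)


def summary_slice(log_text: str) -> list:
    """Keep only the lines the pattern matches, in their original order."""
    return [line for line in log_text.splitlines() if SUMMARY_PATTERN.search(line)]
```

Given a log containing a per-mock trace line, `Total tests: 3`, and an `ERROR` line, only the last two survive, which is the reduction the paragraph describes.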
Contributor Author

Done in commit 9646579 — rephrased as "gets re-added to context" across all three sites (lines 113, 259, 380). Clearer to readers and dodges Vale spelling.

- **Uploading fixtures from another branch onto the current branch.** Fixtures are branch-scoped — they encode app-state assumptions of where they were captured. Re-record against THIS branch instead.
- **Uploading fresh recordings without checking existing branch coverage first.** `listRecordings({app_id, branch_id})` + targeted `getMock` first; reuse if covered.
- **Inventing a PAT, branch name, or secret value.**
- **Running `keploy --help`, `keploy <cmd> --help`, or any `--version` info dump.** This skill names every command + flag you need (`keploy cloud replay`, `keploy mock patch`, `keploy record`, `keploy upload test-set`). The CLI's help text is ~14k tokens and re-bills on every subsequent turn — pure waste.
Contributor Author

Done in commit 9646579 — same rephrase as the other "re-bills" comments ("gets re-added to context"). All three sites updated in one pass.

charankamarapu and others added 8 commits June 7, 2026 15:48
…fy --cluster error

Routine B used to skip Discovery step 3 (getApp) because B1 starts at
'git diff' — then hit Phase B4 needing --cluster and dropped the flag,
causing `no active clusters found`. Two fixes:

1. Discovery step 3 (`getApp` for cluster/ns/deployment) is now MANDATORY
   before any `keploy cloud replay` invocation (both Phase A4 and B4).
2. Phase B4 explicitly tells the agent: if you skipped Discovery
   step 3 because Routine B starts at git diff, go back and call getApp
   NOW. Plus inline the error-message ambiguity: `no active clusters
   found` actually means "you forgot --cluster", not "no cluster is
   running".

Source of truth: matches the trimmed verified-working SKILL.md
(`.claude/skills/keploy/SKILL.md`) byte-for-byte.
…app-id)

The CLI registers --app, not --app-id (OSS root pre-registers --app-id
as a deprecated uint64 flag). The prior template told agents to use
--app-id which the CLI rejects with exit 1.

Real-world impact: in the S4 validation run the agent constructed the
documented --app-id command, was rejected, and then confabulated success.
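
The two flag mistakes described in these commits (dropping `--cluster`, using `--app-id` instead of `--app`) could be caught before the command runs. A minimal sketch, assuming the command is available as an argument list; the function name is hypothetical and the check knows nothing about the CLI beyond the two flag names mentioned above:

```python
def lint_replay_args(argv: list) -> list:
    """Return a list of problems found in a `keploy cloud replay` argument list."""
    problems = []

    def has_flag(name: str) -> bool:
        # Accept both `--flag value` and `--flag=value` spellings.
        return any(arg == name or arg.startswith(name + "=") for arg in argv)

    if not has_flag("--cluster"):
        problems.append("missing --cluster")
    if has_flag("--app-id"):
        problems.append("use --app instead of --app-id")
    return problems
```

An empty result means neither known mistake is present; it does not validate the rest of the command.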
…Reports one-shot stricter

Two cost-discipline fixes from validation evidence:

1. Phase A2: replaced the narrow recommended projection
   ([failed_steps[].diff, mock_mismatches, status, ci_metadata])
   with one that covers per-case identity + per-case oss_report.req /
   .result / .mock_mismatches / .noise — everything Phase A3 actually
   reads. The old projection was too narrow, agents fell back to
   include_oss_report=true (NO fields=) to fetch the full 34k blob
   that re-bills every subsequent turn.

2. Phase A1: added "do NOT re-call listTestReports after your own
   `keploy cloud replay` finishes — the replay stdout already prints
   the new test_run_id in `View test report at: .../tr/<id>`, parse
   that line instead of re-querying."

Also added explicit "ADD fields, never drop" rule under "use fields
aggressively" — agents were retrying without fields= to "get everything"
which is the exact failure mode the projection was meant to prevent.
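
Parsing the `View test report at: .../tr/<id>` line, as item 2 asks, might look like the sketch below. It assumes the id is the path segment immediately after `/tr/`; the URL in the usage note is a placeholder, since the real prefix is elided in the commit message.

```python
import re
from typing import Optional

# Capture the path segment that follows "/tr/" on the report line.
REPORT_LINE = re.compile(r"View test report at:\s*\S*/tr/([^/\s?#]+)")


def extract_test_run_id(stdout: str) -> Optional[str]:
    """Return the test_run_id from replay stdout, or None if no report line exists."""
    matches = REPORT_LINE.findall(stdout)
    # If several report lines appear, the last one belongs to the latest run.
    return matches[-1] if matches else None
```

For example, stdout ending in `View test report at: https://example.invalid/app/tr/abc123` yields `abc123` without another `listTestReports` call.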
…nly call

Three skill corrections discovered via S7 deep-dive on the actual
getTestReportFull response schema:

1. Field-name corrections: the canonical fields= projection used wrong
   keys that returned null on every call.
     test_sets[].name              → test_sets[].test_set_name
     test_sets[].id                → test_sets[].test_set_id
     test_sets[].test_cases[].name → test_sets[].test_cases[].test_case_name
     test_sets[].test_cases[].id   → test_sets[].test_cases[].test_case_id
   Plus dropped refs that don't exist anywhere in the response:
     failed_steps[].diff (not in response)
     top-level mock_mismatches (not in response)
     oss_report.failure_info.mock_mismatch (failure_info has no such subkey)

2. mock_mismatches_only=true second call: per-case mock_mismatches data
   is NOT included by default in getTestReportFull. Added explicit
   instruction that when Phase A3 routes to Case 2b, make a SECOND
   projected call with mock_mismatches_only=true to discover mock IDs
   from oss_report.mock_mismatches.actual_mocks[].name. This avoids
   listMocks (~28k token inventory) for the common Case 2b path.

3. listMocks ban softened: now allowed as fallback when the
   mock_mismatches_only call returns empty for the failing test set
   (e.g., body-only drift with no consumed mocks).

Verified live: S7 with the corrected skill + the projection bug fixes
(see api-server PR for those), giving 13/16 strict asserts passing (was 11/16),
A-CR1 fields= now passing 2/2, response payload 22k → 572 bytes on the
projected call.
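
The field-name corrections in item 1 can be expressed as a small rewrite over a `fields=` list. This is a sketch for illustration only: the mapping and the dropped keys are copied from the commit message, while the function name and the list-of-strings representation are assumptions.

```python
# Old projection key -> key that actually exists in the response.
RENAMED = {
    "test_sets[].name": "test_sets[].test_set_name",
    "test_sets[].id": "test_sets[].test_set_id",
    "test_sets[].test_cases[].name": "test_sets[].test_cases[].test_case_name",
    "test_sets[].test_cases[].id": "test_sets[].test_cases[].test_case_id",
}

# Keys the commit message says do not exist anywhere in the response.
NONEXISTENT = {
    "failed_steps[].diff",
    "mock_mismatches",  # the top-level key only
    "oss_report.failure_info.mock_mismatch",
}


def correct_projection(fields: list) -> list:
    """Rename stale keys and drop keys that never resolve, preserving order."""
    return [RENAMED.get(f, f) for f in fields if f not in NONEXISTENT]
```

Applied to `["test_sets[].name", "failed_steps[].diff", "status"]`, the result is `["test_sets[].test_set_name", "status"]`.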
…oy record

After investigating S6 (Routine B) end-to-end, found that
`keploy record --sync` alone produces no `mappings.yaml`. The recorder
inherits keploy.yml's `disableMapping` and the auto-orchestrator-forwarded
flag doesn't propagate without an explicit host-side override. Without
mappings.yaml, the upload pipeline persists no `mapping_audits` doc in
mongo, and `getMockMapping` returns empty `mocks: []` for every test
case — forcing the replay matcher onto fragile timestamp windows.

Two skill updates:
1. Phase B2 step 1: `keploy record -c "<cmd>" --sync --disable-mapping=false`
   is the canonical incantation, with explicit rationale for why
   --disable-mapping=false is mandatory.
2. Case 2b-recapture: same flag pair documented on the record step of
   the (record → upload → delete) order.

The --disable-mapping flag was added to `keploy record` upstream
(keploy/keploy PR #4250).
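
Because the failure described here is silent, a pre-upload check for `mappings.yaml` is one way to surface it early. The sketch below assumes the caller already knows which directory should contain the file; the commit message does not state where the recorder writes it, so that location is left as a parameter rather than guessed.

```python
from pathlib import Path


def has_mappings(recording_dir: str) -> bool:
    """Return True if recording_dir contains a mappings.yaml file."""
    # Only presence is checked; the file's contents are not inspected.
    return (Path(recording_dir) / "mappings.yaml").is_file()
```

A `False` result before `keploy upload test-set` suggests the record step ran without `--disable-mapping=false`.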
…rkflow

# Conflicts:
#	vale_styles/config/vocabularies/Base/accept.txt
#	versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
…g fixes

Address user feedback + Copilot Vale-spelling comments on PR #871:

User feedback (Cursor user): the doc lacked a setup section, so they
went off the older `.cursorrules` instructions in agent-test-generation.md
which is now deprecated. Verified against cursor-agent's built-in
`migrate-to-skills` skill: `.cursor/skills/<name>/SKILL.md` IS the
modern Cursor format, `.cursorrules` and `.cursor/rules/*.mdc` are
being migrated FROM. Added an Installation section at the top of the
page covering the modern Skills mechanism for Cursor / Claude Code /
other agents, with an explicit "do not use .cursorrules" note (the
playbook is ~8k tokens; pinning it as always-on context would bill on
every editor turn).

Vale spelling fixes (Copilot comments r3343-r3369):
- "analyse" → "analyze" (en_US): Prompt A wording + Routine A heading
- "ALLOWLIST" → "Allowlist" (security term, lowercased to match Vale)
  + added `[Aa]llowlist` to the Base vocabulary so future occurrences
  pass lint
- "re-bills" → "gets re-added to context" (3 sites) — clearer to
  readers and dodges Vale's spelling check

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…score_

CI's prettier check (creyD/prettier_action@v4.6 with prettier 3.8.3)
fails the PR because three emphasis spans in the file use `*…*`
syntax. Prettier 3.x normalizes em-emphasis to `_…_`. Auto-fixed via
`prettier --write`. No prose changes — only the markup style for
the three italic spans (`*values*`, `*shape*`, `*value*`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Save this playbook as an **agent skill**, not a static rules file. Cursor's modern Skills mechanism (and Claude Code's identical `SKILL.md` convention) loads the file on demand when the user issues one of the two prompts below, instead of injecting it as always-on context. That keeps every other unrelated agent task out of this playbook's token cost.

Use the copy button on the block below and paste it into the file at the path you picked above.
- **Cursor:** create `.cursor/skills/keploy/SKILL.md` (or your project's preferred Skills path) and paste the rest of this page into it. Do **not** put this content in `.cursorrules` — `.cursorrules` files are always-on and would bill the full ~8k-token playbook on every editor interaction.
Contributor Author

Self-review: token count is inaccurate. Just measured the whole file with tiktoken cl100k_base — 9,310 tokens, not "~8k". The Installation block I added rounds the wrong direction; bumping it to "~9k-token playbook" so the warning about .cursorrules always-on placement is grounded in the real number.

Contributor Author

Fixed in commit 8685ef2 — bumped to "~9k-token" matching the measured 9,310-token count.

**Capture:**

1. Run `keploy record -c "<dev run command>" --sync` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one. Cloud association happens in Phase B3's upload step, not here; `keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`.
1. Run `keploy record -c "<dev run command>" --sync --disable-mapping=false` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one; **`--disable-mapping=false` is MANDATORY**: without it, the host inherits `keploy.yml`'s `disableMapping: true` (the auto-generated default), the agent silently skips writing `mappings.yaml`, and the uploaded bundle lands in mongo with no `mapping_audits` doc → `getMockMapping` returns empty `mocks: []` for every test case → replay matcher falls back to fragile timestamp-windows. Cloud association happens in Phase B3's upload step, not here; `keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`.
Contributor Author

Self-review: the Phase B2 -c instruction is silently broken when the pre-flight command is detached. The text says:

"Discover the dev's run command... Start the app with that command, curl... then stop it cleanly."
"Run keploy record -c \"<dev run command>\" ... The -c value is the exact command from your pre-flight"

If the pre-flight uses docker compose up -d (detached — common in repos with no foreground equivalent declared), the same command passed to keploy record -c "docker compose up -d" spawns docker which exits immediately on detach, and keploy thinks the app already terminated. Recording captures nothing.

Fix: clarify that the -c value must be the FOREGROUND form of the run command, even if pre-flight used -d. Example: pre-flight docker compose up -d, then keploy record -c "docker compose up" (no -d).

Contributor Author

Fixed in commit 8685ef2 — added an explicit foreground note with the docker compose up (no -d) example so the failure mode is preventable on read.
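
The foreground rule from this thread can be sketched as a small command rewrite. This is an illustration under stated assumptions, not the playbook's wording: it only handles Docker Compose commands, only the standalone `-d` and `--detach` spellings, and the function name is hypothetical.

```python
import shlex

DETACH_FLAGS = {"-d", "--detach"}


def foreground_form(run_command: str) -> str:
    """Strip the detach flag from a Docker Compose command; leave others untouched."""
    tokens = shlex.split(run_command)
    is_compose = tokens[:2] == ["docker", "compose"] or tokens[:1] == ["docker-compose"]
    if not is_compose:
        # `-d` can mean something else for other tools, so do not rewrite them.
        return run_command
    return shlex.join([t for t in tokens if t not in DETACH_FLAGS])
```

So a pre-flight of `docker compose up -d` becomes `docker compose up` for the `-c` value, while a command such as `go run .` passes through unchanged.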

# Keploy MCP playbook—autonomous developer workflow
## Entry points

The developer will only ever say one of two things to you:
Contributor Author

Self-review: minor wording inconsistency. The page description (line 5: "exactly two developer prompts") and this section header ("Entry points") + the page-level claim that the developer "will only ever say one of two things to you" don't fully match the actual Prompt A spec, which lists TWO distinct phrasings ("my keploy cloud replay is failing…" OR "the keploy cloud replay pipeline is failing…"). So three distinct phrases match the entry points, not two.

Not a behavioral bug — the routing maps both A-phrases to Routine A — but readers comparing the description to the routing copy will notice the off-by-one.

Proposed fix: change "two developer prompts" to "two routines" / "two main routines" so the count refers to the routing endpoints rather than the surface phrases.

Contributor Author

Fixed in commit 8685ef2 — reworded page description to "two routine prompts (failing-replay analyze-and-fix; add-tests-for-my-changes)" so the count refers to routines, not surface phrases.

charankamarapu and others added 2 commits June 7, 2026 20:47
…ound -c, two-routine wording

Three self-review nits caught on a deep re-read:

1. Installation: "~8k-token playbook" was off — measured the actual file
   with tiktoken cl100k_base and got 9,310 tokens. Bumped the warning
   to "~9k-token" so the cost rationale is grounded in the real number.

2. Phase B2 capture: clarified that the -c value must be the FOREGROUND
   form of the run command. If pre-flight uses `docker compose up -d`
   (detached, common in repos without a foreground equivalent declared),
   passing the same string to `keploy record -c` makes docker exit
   immediately on detach and keploy thinks the app already terminated,
   capturing nothing. Example: pre-flight `docker compose up -d`,
   record `docker compose up` (no -d).

3. Page description: "exactly two developer prompts" was inaccurate —
   Prompt A has two phrasings, so the agent listens for three distinct
   surface phrases. Reworded to "two routine prompts (failing-replay
   analyze-and-fix; add-tests-for-my-changes)" so the count refers to
   the two routines rather than the surface phrases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI's Vale doc linter (errata-ai/vale-action@v2.1.1 with vale 3.0.3 and
the project's existing Google + Vale base styles) flagged 89 errors on
the k8s-proxy-llm-workflow page after my Installation section landed.
Categorized:

  58× Google.EmDash — "Don't put a space before or after a dash". The
        doc uses the spaced em-dash form ` — ` for prose readability;
        many other docs in the repo do the same (see hits in
        generate-api-tests-using-ai.md, etc.). Disabling the rule
        repo-wide is consistent with the seven other Google.* overrides
        already in `.vale.ini` and matches the docs' established style.

   8× Google.Quotes — "Commas and periods go inside quotation marks".
        The docs use period-OUTSIDE-quote when the quoted token is a
        literal the reader is supposed to paste verbatim (e.g.
        `the exact value "FAILED".`); putting the period inside would
        change the visible token. Disabling for consistency with the
        other Google.* overrides.

  23× Vale.Spelling — tech terms not yet in the Base vocabulary.
        Added: branch_id, camelCase, CLI[s]?, cwd, hardcoded,
        JSONPath[s]?, matcher, misclassification, mutex, OAuth,
        readback, README, snake_case, stdout, test_run, unprojected.

   1× Vale.Spelling on "whatever's" — possessive on the indefinite
        pronoun that Vale's en_US dictionary doesn't recognize.
        Reworded the sentence in-place rather than vocab-ing it; the
        possessive form is genuinely unusual and a rewrite is cleaner
        than whitelisting it.

Local `vale --config=.vale.ini versioned_docs/.../k8s-proxy-llm-workflow.md`
now reports 0 errors. Prettier still clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Member

@Aditya-eddy Aditya-eddy left a comment

lgtm

@charankamarapu charankamarapu merged commit 038c943 into main Jun 7, 2026
6 of 7 checks passed
@charankamarapu charankamarapu deleted the docs/k8s-proxy-llm-workflow branch June 7, 2026 15:47
3 participants