`) OR `delete_recording` + re-capture via `keploy record` + `keploy upload test-set`.
- `keploy cloud replay` re-run: `/` tests passed.
### Next step for you
@@ -306,7 +318,7 @@ Run report: https://app.keploy.io/tr/?appId=
**Capture:**
-1. Run `keploy record -c "" --sync` via Bash. The `-c` value is the exact command from your pre-flight; `--sync` records test cases synchronously so each curl is captured in order with no race against the next one. Cloud association happens in Phase B3's upload step, not here—`keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`.
+1. Run `keploy record -c "" --sync --disable-mapping=false` via Bash. The `-c` value must be the **foreground** form of the run command — if your pre-flight used the detached/background form (e.g. `docker compose up -d`), pass the foreground variant here (`docker compose up`, no `-d`). Detached commands return immediately on launch and keploy treats the early exit as "app stopped", capturing nothing. `--sync` records test cases synchronously so each curl is captured in order with no race against the next one; **`--disable-mapping=false` is MANDATORY** — without it, the host inherits `keploy.yml`'s `disableMapping: true` (the auto-generated default), the agent silently skips writing `mappings.yaml`, and the uploaded bundle lands in mongo with no `mapping_audits` doc → `getMockMapping` returns empty `mocks: []` for every test case → replay matcher falls back to fragile timestamp-windows. Cloud association happens in Phase B3's upload step, not here — `keploy record` itself is the local OSS command and doesn't take `--cloud-app-id`.
2. For each new/changed endpoint, drive ONE realistic curl. Infer body shape from the OpenAPI spec if there is one, otherwise from the handler signature itself.
3. Stop `keploy record` (kill the PID you captured at step 1, or send the equivalent of Ctrl-C).
4. The recording lands at `keploy/test-set-N/` on disk.
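The foreground requirement in step 1 can be sketched as a small shell helper. This is only an illustration: `to_foreground` is a hypothetical name, not a keploy or Docker command, and it assumes `-d` / `--detach` are the only detach flags in play (true for `docker compose up`, not necessarily for other launchers).

```bash
# Hypothetical helper (not part of keploy): drop detach flags so the run
# command stays in the foreground for the lifetime of the recording.
# Assumption: -d and --detach are the only flags that background the app.
to_foreground() {
  out=""
  for word in "$@"; do
    case "$word" in
      -d|--detach) ;;                      # skip: would make the command exit early
      *) out="${out:+$out }$word" ;;       # keep every other word, space-joined
    esac
  done
  printf '%s\n' "$out"
}

to_foreground docker compose up -d
```

The printed string is what you would pass as the `-c` value. For launchers where `-d` means something else, rewrite the command by hand rather than relying on a filter like this.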
@@ -325,14 +337,13 @@ keploy upload test-set \
### Phase B4—Validate
-For **local** validation (dev's laptop) — pass `--cluster` (from Discovery), and start the app yourself via `-c` + `--container-name`:
-
```bash
-keploy cloud replay --app --cluster "" --branch-name \
- -c "" --container-name --disableReportUpload=false
+keploy cloud replay --app --branch-name --cluster --disableReportUpload=false 2>&1 \
+ | tail -n 60 \
+ | grep -E "Total test|Failed Testcases|test passed|test failed|FAIL|ERROR|debug bundle|View test report"
```
-For **CI / active-cluster** runs, omit `-c`/`--container-name`/`--disableReportUpload` and let the in-cluster agent run the deployment.
+`--cluster` is mandatory: resolve it from the `getApp` call you made in Discovery (you cached `origin.clusterName`; do NOT re-call `getApp`). **If you skipped Discovery step 3 because Routine B "starts at git diff", go back and call `getApp` NOW, before replay. Without `--cluster`, the CLI dies with `no active clusters found`, which sounds like "no cluster is running" but actually means "you forgot the flag".** `--disableReportUpload=false` is mandatory too: OAuth CLIs default it to `true`, which silently skips the `/tr` report upload. Pipe through `tail`/`grep` for the same context-cost reason as Phase A4.
If anything failed, enter Routine A from Phase A2—the diagnosis routine handles it.
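The misleading error described above can be turned into an explicit hint. The sketch below assumes nothing about the CLI beyond the one error string quoted in the text; `replay_hint` and both hint messages are hypothetical, written for illustration.

```bash
# Hypothetical helper: map one line of replay output to a next action.
# Only the "no active clusters found" string comes from the text above;
# the hint wording is illustrative.
replay_hint() {
  case "$1" in
    *"no active clusters found"*)
      echo "pass --cluster (use the cached origin.clusterName from getApp)" ;;
    *)
      echo "no known hint; enter Routine A from Phase A2" ;;
  esac
}

replay_hint "Error: no active clusters found"
```

Matching on the literal message keeps the check cheap, but it will need updating if the CLI ever changes its wording.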
@@ -367,112 +378,16 @@ Everything else—what failed and why, which mock to update, what test-set name
## Anti-patterns (refuse these)
-- Editing handler code on a Case-2-shaped failure (contract changed intentionally). The test data is what's stale—update it on the branch instead.
-- Writing to `main` (any tool that omits `branch_id`). Always branch-first.
-- Re-recording to absorb a failure without first reading the diff and deciding the route. Re-record only when Route C applies.
-- Inventing a PAT, branch name, or secret value.
-````
-
-Save the file and fully restart your editor so the skill / rules / memory entry is available in your next session.
-
----
-
-## Step 3—Use the two prompts
-
-That's it. From now on, you only ever type one of:
-
-> **"my keploy cloud replay is failing, please analyse and fix it."**
-
-_or, when the failure was in CI:_
-
-> **"the keploy cloud replay pipeline is failing, please analyse and fix it."**
-
-or
-
-> **"Add new keploy tests for my changes."**
-
-What happens behind the scenes for each:
-
-### Prompt A—analyse and fix a failing replay (local or CI)
-
-| Phase | What the agent does |
-| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| A0 | Resolve `app_id` from `basename $(pwd)` + `listApps`. Resolve `branch_id` from `git rev-parse --abbrev-ref HEAD` + `create_branch`. |
-| A1 | Get a `test_run_id` to fetch the report against. Local form → `listTestReports({appId: app_id, branch_id, status: "FAILED", limit: 5})` and take the most recent run's id. CI form → extract `test_run_id` from the CI log or dashboard URL the dev pasted (falls back to the local lookup with `source: "ci"` if nothing was pasted). |
-| A2 | Fetch the full report (`getTestReportFull({appId: app_id, reportId: test_run_id})`). Returns roll-up + every test set + per-test-case `oss_report.req`/`resp`/`result`/`mock_mismatches`/`failure_info`/`noise` in one round-trip. Use `mock_mismatches_only=true` to scope to mock-driven failures on large runs. |
-| A3 | Per failing test case, decide Case 1 (bug in the app—recent commit broke it, test is still correct) or Case 2 (app behavior drifted intentionally—test data is stale, with sub-actions 2a noise / 2a response edit / 2b mock edit / 2b delete + re-record). Decision is from `git log` / `git diff` plus the report's `oss_report.result` diff and `oss_report.mock_mismatches`, never from a dev question. |
-| A4 | For Case 1: announce the file:line and a one-line description, then edit the handler code so the dev can stop the agent if they object. For Case 2a: `updateTestCase` to add noise on a non-deterministic field, or to update the recorded `response` body. For Case 2b: `update_mock` on the affected mock, or—if the baseline is too far gone—`delete_recording` and re-record via Routine B's flow. Either way, re-run `keploy cloud replay --branch-name` to verify. |
-| A5 | Report: diagnosis table (case per test case) + fixes applied + next-step-for-you + branch-diff URL + run-report URL. |
-
-### Prompt B—author new keploy tests
-
-| Phase | What the agent does |
-| ----- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| B0 | Discovery (same as A0). |
-| B1 | `git diff origin/main...HEAD` to find handler files that changed; extract added/modified endpoints. |
-| B2 | Pre-flight: discover the dev's run command from the repo (Makefile → docker-compose.yml → Procfile → package.json → README), start the app, curl any 200-returning endpoint to confirm it's serving traffic, stop it. Then run `keploy record -c "" --sync`, drive a realistic curl per new endpoint, stop the recorder. Recording lands at `keploy/test-set-N/`. |
-| B3 | `keploy upload test-set --app --branch --test-set keploy/test-set-N --name ` to land the bundle on the Keploy branch. |
-| B4 | `keploy cloud replay --app --cluster "" --branch-name -c "" --container-name --disableReportUpload=false` to validate locally (drop the local flags in CI / active-cluster). On failure, drop into Routine A. |
-| B5 | Report: captured endpoints table + replay result + next-step (open PR) + branch-diff URL + run-report URL. |
-
-For everything not covered by these two prompts—manually inspecting test data, editing one mock, listing recordings—use the manual flow on the [Developer Workflow](/docs/quickstart/k8s-proxy-developer-workflow) page directly. The two-prompt workflow handles the 90% case; the manual flow is the escape hatch.
-
----
-
-## Putting it together
-
-Here are the typical scenarios the agent handles—one per case it decides between. Every one starts with the same two-prompt UX and ends with the dev pushing once CI catches up. The variable bit is what the agent does in the middle.
-
-### Scenario 1—App regression (Case 1)
-
-You merged a refactor that accidentally broke the price calculation on `/orders/{id}`. The test still expects the right total.
-
-> _"my keploy cloud replay is failing, please analyse and fix it."_
-
-A0 → A1 (latest failed run) → A2 (report shows `total_amount: 0` vs expected `99.99`). A3 sees your recent commit on the price-calc helper and the test's authored response is still correct → **Case 1**. A4 announces the edit at `pkg/order/calc.go:42`—restoring the line-item subtotal branch—then applies the fix and re-runs replay (green). A5 reports the edit + URLs.
-
-### Scenario 2—Test data drift on the response (Case 2a, response edit)
-
-You renamed a response field from `username` to `display_name` on `/users/{id}` on purpose. CI replay now fails because the recorded response still says `username`.
-
-> _"the keploy cloud replay pipeline is failing, please analyse and fix it."_
-
-A3 sees the rename commit and the recorded `oss_report.result.body_result[].expected` still pinned to `username` → **Case 2a**. A4 calls `updateTestCase` to swap the field name on the recorded response, re-runs replay (green). A5 reports the test edit + URLs.
-
-### Scenario 3—Test data drift, non-deterministic field (Case 2a, noise)
-
-The replay started failing on `$.created_at`—a timestamp that differs each run. No code changes near it.
-
-> _"my keploy cloud replay is failing, please analyse and fix it."_
-
-A3 sees the diverging field is genuinely time-varying with no related commit → **Case 2a (noise)**. A4 calls `updateTestCase` to add `$.created_at` to that test case's noise map; replay re-runs green.
-
-### Scenario 4—Mock drift from a DB query change (Case 2b, mock edit)
-
-You added a `discount_percent` column to the orders table and updated the `SELECT` to return it. The handler emits the new field, the test expects it, but the recorded mock for the DB call still has the old shape.
-
-> _"my keploy cloud replay is failing, please analyse and fix it."_
-
-A3 sees the schema-change commit and `mock_mismatches` on the SELECT row → **Case 2b**. A4 calls `update_mock` to add `discount_percent` to the mock spec; replay re-runs green. A5 reports the mock edit + URLs.
-
-### Scenario 5—Mock too far gone, full re-record (Case 2b, fallback)
-
-A downstream gRPC client was swapped for HTTP; the recorded mocks are protobuf bytes that no longer apply.
-
-> _"my keploy cloud replay is failing, please analyse and fix it."_
-
-A3 → **Case 2b**. A4 tries one `update_mock` edit—it doesn't pass. The agent falls back: `delete_recording` on the affected test set, then re-records via Routine B's flow (pre-flight → `keploy record -c "" --sync` → curl → `keploy upload test-set --branch `). Replay re-runs green.
-
-### Scenario 6—Adding tests for a new endpoint (Routine B)
-
-You added `POST /coupons/redeem`.
-
-> _"Add new keploy tests for my changes."_
-
-B0 → B1 (`git diff origin/main...HEAD` surfaces the new route). B2 pre-flight: agent finds `make run` in the Makefile, brings the app up, `curl /health` returns 200, stops it. Then `keploy record -c "make run" --sync`, curls `POST /coupons/redeem` with a realistic body, stops the recorder. B3 uploads via `keploy upload test-set --app --branch --name coupons-redeem`. B4 replay returns 1/1 passed. B5 reports the captured endpoint + URLs.
-
----
-
-Across every scenario, you only ever spoke one of two sentences. You push your code change (and, for Case 1, the agent's app-side edit). CI replays the branch on the PR; merge runs `keploy cloud branch-merge` and the test data lands on main.
-
-For the same flow done manually (CLI / dashboard, no agent), see [Developer Workflow with Keploy Proxy](/docs/quickstart/k8s-proxy-developer-workflow).
+- **Editing handler code on a Case-2 failure.** Contract changed intentionally → fix test data on the branch, not source.
+- **Rewriting deliberate code into non-IO equivalents to satisfy stale mocks** (mutex for `SELECT … FOR UPDATE`, local cache for Redis, hardcoded value for HTTP call). Mutates prod behaviour to pass tests; often regresses the safety property the commit added (process-local mutex doesn't survive replicas).
+- **`delete_recording` as the first action of 2b.** Order is record → upload → delete. Delete-first empties the branch; next replay "passes" trivially (zero tests = zero failures).
+- **Hand-editing local `keploy//` files.** That dir is re-downloaded each replay; edits are overwritten. Use CLI / MCP write paths.
+- **`keploy upload test-set` to re-publish edited mocks.** Upload is for landing fresh recordings only — it creates a duplicate test set, not a replacement. If `keploy mock patch` + `getMock` confirm the write but replay still fails, that's a matcher defect to report.
+- **Editing anything outside the application source tree.** No `Dockerfile*` / `docker-compose*` / `keploy.yml` / `.env*` / k8s manifests / CI workflows; no env-var-driven runtime bypass branches. Real code fix or test-data fix — nothing in between.
+- **Flipping CLI flags to make a failure go away** (`--freezeTime=false`, `--envs FOO=bar`, `--mocking=false`, `--ignoreOrdering=true`). Always a test-data problem instead.
+- **Writing to `main`** (any tool that omits `branch_id`).
+- **Uploading fixtures from another branch onto the current branch.** Fixtures are branch-scoped — they encode app-state assumptions of where they were captured. Re-record against THIS branch instead.
+- **Uploading fresh recordings without checking existing branch coverage first.** `listRecordings({app_id, branch_id})` + targeted `getMock` first; reuse if covered.
+- **Inventing a PAT, branch name, or secret value.**
+- **Running `keploy --help`, `keploy --help`, or any `--version` info dump.** This skill names every command + flag you need (`keploy cloud replay`, `keploy mock patch`, `keploy record`, `keploy upload test-set`). The CLI's help text is ~14k tokens and gets re-added to context on every subsequent turn — pure waste.
+- **Reading `keploy/cloud-debug.log`, `keploy-logs.txt`, or any file under the local `keploy/` cache directory.** That dir is throwaway state wiped on every replay; the cloud-debug.log alone is ~25k tokens. Use `getTestReportFull` for structured failure data — never inspect the raw debug log.
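One of the anti-patterns above, flipping CLI flags to hide a failure, lends itself to a mechanical pre-run check. The following is a minimal sketch, not something keploy ships: `refuse_flag_flips` is a hypothetical name, and the flag list is copied from the bullet above rather than from the CLI's own documentation.

```bash
# Hypothetical guard: return non-zero if any argument is one of the flag
# flips the anti-pattern list refuses. Flag names are taken from that list.
refuse_flag_flips() {
  for arg in "$@"; do
    case "$arg" in
      --freezeTime=false|--mocking=false|--ignoreOrdering=true|--envs|--envs=*)
        echo "refused: $arg" >&2           # report the offending flag on stderr
        return 1 ;;
    esac
  done
  return 0
}

# Example: an allowed invocation passes the guard silently.
refuse_flag_flips keploy cloud replay --disableReportUpload=false && echo "ok"
```

A guard like this only catches the literal spellings listed; it does not replace the judgment call that the failure is a test-data problem.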