[Test] Web acceptance stability#4506
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR refactors Playwright acceptance tests to reduce network-response dependencies and adds comprehensive tests for the "Use API" documentation drawer. Changes include member-invite modal helpers, prompt-registry UI navigation, new Use API snippet tests for variants and deployments, and a slug validation assertion in evaluator creation. ChangesUse API Snippet Tests
Test Refactoring to UI-Based Waiting
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
web/oss/tests/playwright/acceptance/use-api/index.ts (1)
111-143: 💤 Low valueConsolidate the two drawer-open helpers.
openVariantUseApiDrawerandopenDeploymentUseApiDrawerare identical except for the button locator. Consider parameterizing to remove the duplicated drawer-resolution block.♻️ Proposed consolidation
-const openVariantUseApiDrawer = async (page: any) => { - await page.waitForLoadState("networkidle") - const useApiButton = page.locator('[data-tour="api-code-button"]') - await expect(useApiButton).toBeVisible({timeout: 15000}) - await expect(useApiButton).toBeEnabled({timeout: 5000}) - await useApiButton.click() - - const drawer = page.locator(".ant-drawer-content-wrapper").filter({ - hasText: "How to use API", - }) - await expect(drawer).toBeVisible({timeout: 20000}) - return drawer -} - -const openDeploymentUseApiDrawer = async (page: any) => { - await page.waitForLoadState("networkidle") - const useApiButton = page.getByRole("button", {name: "Use API"}).first() - await expect(useApiButton).toBeVisible({timeout: 15000}) - await expect(useApiButton).toBeEnabled({timeout: 5000}) - await useApiButton.click() - - const drawer = page.locator(".ant-drawer-content-wrapper").filter({ - hasText: "How to use API", - }) - await expect(drawer).toBeVisible({timeout: 20000}) - return drawer -} +const openUseApiDrawer = async (page: any, useApiButton: any) => { + await expect(useApiButton).toBeVisible({timeout: 15000}) + await expect(useApiButton).toBeEnabled({timeout: 5000}) + await useApiButton.click() + + const drawer = page.locator(".ant-drawer-content-wrapper").filter({ + hasText: "How to use API", + }) + await expect(drawer).toBeVisible({timeout: 20000}) + return drawer +} + +const openVariantUseApiDrawer = (page: any) => + openUseApiDrawer(page, page.locator('[data-tour="api-code-button"]')) + +const openDeploymentUseApiDrawer = (page: any) => + openUseApiDrawer(page, page.getByRole("button", {name: "Use API"}).first())web/oss/tests/playwright/acceptance/evaluators/tests.ts (1)
354-355: ⚡ Quick winUpdate slug assertion to account for real slug transformation (UI uses
slugify)The slug field in
CreateEvaluatoris set from the (debounced) evaluator name via aslugifyhelper (toLowerCase()andreplace(/[^a-z0-9_\-]+/g, "-"), plus trimming/collapsing dashes). So the assertion should not assume a no-op transformation in general.That said, the current tests pass
evaluatorNamevalues likee2e-human-eval-${Date.now()}(already lowercase and hyphen-safe), so it matches the slug output today. For robustness, assert againstslugify(evaluatorName)instead ofevaluatorName.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 685200ae-6c11-485f-8a7f-e0a67aa8fa47
📒 Files selected for processing (8)
web/ee/tests/playwright/acceptance/members/index.tsweb/ee/tests/playwright/acceptance/use-api/use-api.spec.tsweb/oss/tests/playwright/10-use-api.tsweb/oss/tests/playwright/acceptance/evaluators/tests.tsweb/oss/tests/playwright/acceptance/features/use-api.featureweb/oss/tests/playwright/acceptance/prompt-registry/index.tsweb/oss/tests/playwright/acceptance/use-api/index.tsweb/oss/tests/playwright/acceptance/use-api/use-api.spec.ts
Railway Preview Environment
Updated at 2026-06-20T23:03:22.241Z |
…iness Without --max-time/--connect-timeout, curl could hang indefinitely when a Railway preview accepted the TCP connection but stalled before sending an HTTP response. This caused the wait-for-readiness job to block for hours instead of cycling through its 30-attempt loop. - Add --max-time 10 --connect-timeout 5 to each curl attempt so the loop is always bounded (~10 min max across both URL checks). - Add timeout-minutes: 25 to the job as a defence-in-depth backstop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n reinstall playwright install --with-deps chromium was downloading 170MB and running apt-get for hundreds of X11/font packages on every run, taking ~17 minutes. This left almost no time for the auth bootstrap within the job timeout. Cache ~/.cache/ms-playwright keyed on the tests package.json hash so the browser binary is restored on cache hits (subsequent runs). playwright install still runs after the cache restore — it detects the binary is present and skips the download, but still verifies/installs any missing system deps via apt which is fast when packages are already cached by the runner. Also bumps the job timeout from 25 to 30 minutes to give the first (cold) run enough headroom. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--with-deps runs apt-get for ~200 X11/font/GTK packages on top of the browser download, adding 25+ minutes on cold runs and causing the job to time out before the auth bootstrap could run. The ubuntu-latest runner already has Chromium's core runtime libraries; the auth bootstrap (login form + save cookies) doesn't need the full X11/font stack. Removing --with-deps cuts the install from ~29 min to ~1-2 min (binary download only, skipped entirely on cache hits). Timeout reduced to 15 min to match the realistic job budget: - URL health check: ≤5 min - checkout + node + pnpm: ~2 min - playwright install (binary only): ~1-2 min - auth bootstrap: ~3 min Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 222c839e25
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…laky checkbox failures
EntityTable renders <TableLoadingState> (no thead/checkboxes) while fetching
after a testset is selected. The old getByText("Germany") check could match
stale DOM before the new testset's rows arrived, causing the subsequent
.check() on .ant-table-thead to time out. Scoping to .ant-table-row is the
authoritative signal that real rows — and therefore the full table DOM —
are present. Also corrects the comment: the search is client-side filtering,
not a server-side query change.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d41267af27
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| }) | ||
|
|
||
| await page.goto(`${getProjectScopedBasePath(page)}/testsets`, {waitUntil: "domcontentloaded"}) | ||
| await page.goto("/testsets", {waitUntil: "domcontentloaded"}) |
There was a problem hiding this comment.
Use the project-scoped testsets route
When a test calls apiHelpers.getTestsets(), this now navigates to the unscoped /testsets path, but the app only defines the testsets page under /w/[workspace_id]/p/[project_id]/testsets and the existing feature notes that direct /testsets returns 404. In that scenario the waitForApiResponse listener above never sees /api/testsets/query, so the helper times out instead of returning testsets; keep using getProjectScopedBasePath(page) here.
Useful? React with 👍 / 👎.
AntD 6 renders checkbox inputs with opacity:0, which Chrome's
accessibility tree excludes — causing getByRole("checkbox") to return
0 elements and time out. Switch to .ant-checkbox-input (CSS class) to
bypass the ARIA tree; Playwright 1.60 considers opacity:0 elements
visible for CSS locators so check() works without force.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ead of using AntD checkboxes
…ed count in dialog
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5933b1cfd1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| console.log( | ||
| "[global-setup] Ephemeral project disabled (AGENTA_TEST_EPHEMERAL_PROJECT=false)", | ||
| ) | ||
| writeProjectMetadata(projectMetadataPath, defaultProject, page, null) |
There was a problem hiding this comment.
Avoid cleanup metadata for reused projects
When AGENTA_TEST_EPHEMERAL_PROJECT=false, this writes the existing default project into the same metadata file that global-teardown.ts treats as an ephemeral project and unconditionally deletes (deleteEphemeralProject reads project_id and issues DELETE /projects/{id}). That makes disabled-isolation runs, and the similar create-failure fallback below, attempt to delete the real/default project after the suite; skip writing cleanup metadata for reused projects or mark it so teardown does not delete it.
Useful? React with 👍 / 👎.
| }) | ||
|
|
||
| await page.goto(`${getProjectScopedBasePath(page)}/evaluations`, { | ||
| await page.goto("/evaluations", { |
There was a problem hiding this comment.
Keep evaluation runs navigation project-scoped
When a test calls apiHelpers.getEvaluationRuns(), this now opens the unscoped /evaluations path, but the app only defines the evaluations page under the project route (web/oss/src/pages/w/[workspace_id]/p/[project_id]/evaluations/index.tsx:3). In that context the page 404s instead of firing /api/evaluations/runs/query, so the response waiter never resolves; keep this navigation under getProjectScopedBasePath(page).
Useful? React with 👍 / 👎.
Summary
Testing
Verified locally
Added or updated tests
QA follow-up
Demo
Checklist
Contributor Resources