[Test] Web acceptance stability by bekossy · Pull Request #4506 · Agenta-AI/agenta

bekossy · 2026-05-31T20:04:32Z

Summary

Testing

Verified locally

Added or updated tests

QA follow-up

Demo

Checklist

I have included a video or screen recording for UI changes, or marked Demo as N/A
Relevant tests pass locally
Relevant linting and formatting pass locally
I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

vercel · 2026-05-31T20:04:37Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 20, 2026 11:04pm

coderabbitai · 2026-05-31T20:04:39Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c0d7a899-4c71-48fa-bb99-285d80066974

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This PR refactors Playwright acceptance tests to reduce network-response dependencies and adds two acceptance tests for the "Use API" documentation drawer. Changes include member-invite modal helpers, prompt-registry UI navigation, new Use API snippet tests for variants and deployments, and a slug validation assertion in evaluator creation.

Changes

Use API Snippet Tests

Layer / File(s)	Summary
Acceptance test helpers and variants scenario `web/oss/tests/playwright/acceptance/use-api/index.ts` (lines 1–223)	Imports, acceptance tags, and helper functions (`deployFirstVariantToDevelopment`, `openVariantUseApiDrawer`, `openDeploymentUseApiDrawer`, `switchToTypescriptTab`) for managing drawers and variant deployments. First test validates TypeScript snippet content on the Variants registry page, asserting the snippet includes `application_variant_ref` and `axios.post`.
Deployments scenario and export `web/oss/tests/playwright/acceptance/use-api/index.ts` (lines 224–297)	Second test creates a completion app, deploys its first variant to Development, then validates the Use API drawer on the Deployments page includes `environment_ref` and `axios.post`. Exports `useApiTests` suite.
Feature file and spec registration `web/oss/tests/playwright/acceptance/features/use-api.feature`, `web/ee/tests/playwright/acceptance/use-api/use-api.spec.ts`, `web/oss/tests/playwright/acceptance/use-api/use-api.spec.ts`, `web/oss/tests/playwright/10-use-api.ts`	Cucumber feature file defines two TypeScript scenarios. EE and OSS specs register the `useApiTests` suite via `test.describe`. OSS test index exports the suite for test discovery.

Test Refactoring to UI-Based Waiting

Layer / File(s)	Summary
Members invite modal helpers `web/ee/tests/playwright/acceptance/members/index.ts`	Removes `waitForInviteResponse`. Adds `openInviteMembersModal` (retry loop + email input visibility) and `submitInviteMembersModal` (form submission + modal dismissal). Updates `invitePendingMember` and test to use modal-based flow instead of network response waiting.
Prompt registry UI navigation `web/oss/tests/playwright/acceptance/prompt-registry/index.ts`	Narrows `PromptRegistryApiHelpers` to remove `waitForApiResponse`. Refactors `openWorkflowRevisionsPage` to navigate UI only. Rewrites `openFirstPublishedWorkflowRevision` to click version labels and extract `revisionId` from URL. Removes API response state from test scenario.
Evaluators slug assertion `web/oss/tests/playwright/acceptance/evaluators/tests.ts`	Adds assertion in `createHumanEvaluatorFromDrawer` that the "unique slug" input matches the provided `evaluatorName`, with a 5-second timeout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Agenta-AI/agenta#4308: Both PRs modify the Members invite Playwright flow in the EE tests, refactoring how invite submission is synchronized; this PR removes network-wait helpers in favor of modal-based ones.
Agenta-AI/agenta#4458: Overlaps with the EE members invite refactoring by replacing network/response waits with UI readiness checks in the same file.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is a template with empty sections and placeholder text; it does not provide substantive information about what was changed or why.	Fill in the template sections with actual details: describe the refactored invite flow and new use-api tests, explain why the changes improve stability, and document what was tested.
Title check	❓ Inconclusive	The title '[Test] Web acceptance stability' is vague and generic, using non-descriptive terms that don't convey specific information about the actual changes made.	Consider a more specific title like '[Test] Refactor member invite flow and add use-api snippets tests' that reflects the actual changes.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

web/oss/tests/playwright/acceptance/use-api/index.ts (1)
111-143: 💤 Low value

Consolidate the two drawer-open helpers.

openVariantUseApiDrawer and openDeploymentUseApiDrawer are identical except for the button locator. Consider parameterizing to remove the duplicated drawer-resolution block.
♻️ Proposed consolidation
-const openVariantUseApiDrawer = async (page: any) => {
-    await page.waitForLoadState("networkidle")
-    const useApiButton = page.locator('[data-tour="api-code-button"]')
-    await expect(useApiButton).toBeVisible({timeout: 15000})
-    await expect(useApiButton).toBeEnabled({timeout: 5000})
-    await useApiButton.click()
-
-    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
-        hasText: "How to use API",
-    })
-    await expect(drawer).toBeVisible({timeout: 20000})
-    return drawer
-}
-
-const openDeploymentUseApiDrawer = async (page: any) => {
-    await page.waitForLoadState("networkidle")
-    const useApiButton = page.getByRole("button", {name: "Use API"}).first()
-    await expect(useApiButton).toBeVisible({timeout: 15000})
-    await expect(useApiButton).toBeEnabled({timeout: 5000})
-    await useApiButton.click()
-
-    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
-        hasText: "How to use API",
-    })
-    await expect(drawer).toBeVisible({timeout: 20000})
-    return drawer
-}
+const openUseApiDrawer = async (page: any, useApiButton: any) => {
+    await expect(useApiButton).toBeVisible({timeout: 15000})
+    await expect(useApiButton).toBeEnabled({timeout: 5000})
+    await useApiButton.click()
+
+    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
+        hasText: "How to use API",
+    })
+    await expect(drawer).toBeVisible({timeout: 20000})
+    return drawer
+}
+
+const openVariantUseApiDrawer = (page: any) =>
+    openUseApiDrawer(page, page.locator('[data-tour="api-code-button"]'))
+
+const openDeploymentUseApiDrawer = (page: any) =>
+    openUseApiDrawer(page, page.getByRole("button", {name: "Use API"}).first())

web/oss/tests/playwright/acceptance/evaluators/tests.ts (1)

354-355: ⚡ Quick win

Update slug assertion to account for real slug transformation (UI uses slugify)

The slug field in CreateEvaluator is set from the (debounced) evaluator name via a slugify helper (toLowerCase() and replace(/[^a-z0-9_\-]+/g, "-"), plus trimming/collapsing dashes). So the assertion should not assume a no-op transformation in general.

That said, the current tests pass evaluatorName values like e2e-human-eval-${Date.now()} (already lowercase and hyphen-safe), so it matches the slug output today. For robustness, assert against slugify(evaluatorName) instead of evaluatorName.
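A minimal sketch of the suggested assertion target, assuming the UI helper behaves exactly as described above: lowercase the name, replace each run of characters outside `[a-z0-9_-]` with `-`, collapse repeated dashes, and trim leading/trailing dashes. The `slugify` below is a hypothetical re-implementation for the test, not the app's actual helper; importing the real helper, if it is exported, would avoid the two drifting apart.

```typescript
// Hypothetical re-implementation of the slug rules described in the review.
// The real helper lives in the CreateEvaluator code and may differ in details.
function slugify(name: string): string {
    return (
        name
            .toLowerCase()
            // Replace every run of characters outside [a-z0-9_-] with one dash.
            .replace(/[^a-z0-9_\-]+/g, "-")
            // Collapse consecutive dashes (e.g. from "a -- b").
            .replace(/-+/g, "-")
            // Trim leading and trailing dashes.
            .replace(/^-+|-+$/g, "")
    )
}

// In the Playwright test the assertion would compare against the transformed
// value instead of the raw name, for example:
//   await expect(slugInput).toHaveValue(slugify(evaluatorName), {timeout: 5000})
```

For today's `e2e-human-eval-${Date.now()}` names this is a no-op, so the assertion keeps passing; it only changes behavior once a name contains uppercase letters, spaces, or punctuation.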

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 685200ae-6c11-485f-8a7f-e0a67aa8fa47

📥 Commits

Reviewing files that changed from the base of the PR and between a269527 and 58e6bcb.

📒 Files selected for processing (8)

web/ee/tests/playwright/acceptance/members/index.ts
web/ee/tests/playwright/acceptance/use-api/use-api.spec.ts
web/oss/tests/playwright/10-use-api.ts
web/oss/tests/playwright/acceptance/evaluators/tests.ts
web/oss/tests/playwright/acceptance/features/use-api.feature
web/oss/tests/playwright/acceptance/prompt-registry/index.ts
web/oss/tests/playwright/acceptance/use-api/index.ts
web/oss/tests/playwright/acceptance/use-api/use-api.spec.ts

github-actions · 2026-05-31T20:26:55Z

Railway Preview Environment


Status	Destroyed (PR converted to draft)

Updated at 2026-06-20T23:03:22.241Z

…iness

Without --max-time/--connect-timeout, curl could hang indefinitely when a Railway preview accepted the TCP connection but stalled before sending an HTTP response. This caused the wait-for-readiness job to block for hours instead of cycling through its 30-attempt loop.

- Add --max-time 10 --connect-timeout 5 to each curl attempt so the loop is always bounded (~10 min max across both URL checks).
- Add timeout-minutes: 25 to the job as a defence-in-depth backstop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
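Capping each attempt is what bounds the whole loop: with 30 attempts of at most 10 s each (curl's --max-time limits the entire transfer, so it subsumes the 5 s connect timeout), one URL check spends at most 30 × 10 s = 300 s = 5 min in requests, or about 10 min across the two URLs, not counting any sleep between attempts. The sketch below shows the same pattern in TypeScript. It is a hypothetical analogue of the curl loop, not the workflow's actual script; `probe`, `attempts`, `perAttemptMs`, and `delayMs` are illustrative names.

```typescript
// Hypothetical TypeScript analogue of a bounded readiness loop. Each attempt is
// capped (like curl --max-time), so a server that accepts the connection but
// never answers can no longer stall the loop forever.
type Probe = () => Promise<boolean>

async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "timeout"> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<"timeout">((resolve) => {
        timer = setTimeout(() => resolve("timeout"), ms)
    })
    try {
        // Whichever settles first wins.
        return await Promise.race([work, timeout])
    } finally {
        // Always clear the timer so it does not keep the process alive.
        clearTimeout(timer)
    }
}

async function waitForReadiness(
    probe: Probe,
    attempts: number,
    perAttemptMs: number,
    delayMs: number,
): Promise<number> {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        // A rejected probe counts as "not ready yet" rather than aborting the loop.
        const result = await withTimeout(probe().catch(() => false), perAttemptMs)
        if (result === true) return attempt // ready: report which attempt succeeded
        if (attempt < attempts) {
            // Timed out or not ready: pause before the next bounded attempt.
            await new Promise((resolve) => setTimeout(resolve, delayMs))
        }
    }
    throw new Error(`not ready after ${attempts} attempts`)
}
```

The worst-case wall time is roughly `attempts × (perAttemptMs + delayMs)`, which is the property the curl flags restore; the job-level timeout-minutes remains a separate backstop in case the loop itself misbehaves.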

…n reinstall

playwright install --with-deps chromium was downloading 170MB and running apt-get for hundreds of X11/font packages on every run, taking ~17 minutes. This left almost no time for the auth bootstrap within the job timeout.

Cache ~/.cache/ms-playwright keyed on the tests package.json hash so the browser binary is restored on cache hits (subsequent runs). playwright install still runs after the cache restore: it detects that the binary is present and skips the download, but still verifies/installs any missing system deps via apt, which is fast when the packages are already cached by the runner.

Also bumps the job timeout from 25 to 30 minutes to give the first (cold) run enough headroom.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

--with-deps runs apt-get for ~200 X11/font/GTK packages on top of the browser download, adding 25+ minutes on cold runs and causing the job to time out before the auth bootstrap could run. The ubuntu-latest runner already has Chromium's core runtime libraries; the auth bootstrap (login form + save cookies) doesn't need the full X11/font stack.

Removing --with-deps cuts the install from ~29 min to ~1-2 min (binary download only, skipped entirely on cache hits).

Timeout reduced to 15 min to match the realistic job budget:

- URL health check: ≤5 min
- checkout + node + pnpm: ~2 min
- playwright install (binary only): ~1-2 min
- auth bootstrap: ~3 min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
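Summing the upper end of each phase estimate from the commit message shows the headroom the 15-minute timeout leaves (the per-phase figures are the commit's own estimates, not measurements):

```latex
\underbrace{5}_{\text{URL health check}}
+ \underbrace{2}_{\text{checkout + node + pnpm}}
+ \underbrace{2}_{\text{playwright install}}
+ \underbrace{3}_{\text{auth bootstrap}}
= 12\ \text{min} \le 15\ \text{min}
```

That leaves roughly 3 minutes of slack for variance in the health check or a cold cache.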

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 222c839e25

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…laky checkbox failures

EntityTable renders <TableLoadingState> (no thead/checkboxes) while fetching after a testset is selected. The old getByText("Germany") check could match stale DOM before the new testset's rows arrived, causing the subsequent .check() on .ant-table-thead to time out. Scoping to .ant-table-row is the authoritative signal that real rows, and therefore the full table DOM, are present.

Also corrects the comment: the search is client-side filtering, not a server-side query change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d41267af27

chatgpt-codex-connector · 2026-06-19T08:35:52Z

    })

-    await page.goto(`${getProjectScopedBasePath(page)}/testsets`, {waitUntil: "domcontentloaded"})
+    await page.goto("/testsets", {waitUntil: "domcontentloaded"})


Use the project-scoped testsets route

When a test calls apiHelpers.getTestsets(), this now navigates to the unscoped /testsets path, but the app only defines the testsets page under /w/[workspace_id]/p/[project_id]/testsets and the existing feature notes that direct /testsets returns 404. In that scenario the waitForApiResponse listener above never sees /api/testsets/query, so the helper times out instead of returning testsets; keep using getProjectScopedBasePath(page) here.
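The route shape the comment relies on is /w/[workspace_id]/p/[project_id]/testsets. Below is a minimal sketch of deriving that prefix from the current page URL, in the spirit of getProjectScopedBasePath(page). The real helper's implementation is not shown in this PR, so `projectScopedBasePath` and `projectScopedUrl` are hypothetical and take a plain URL string rather than a Playwright `page`.

```typescript
// Hypothetical helper: extract the "/w/<workspace_id>/p/<project_id>" prefix
// from an absolute URL. Returns null when the URL is not inside a project route.
function projectScopedBasePath(url: string): string | null {
    const {pathname} = new URL(url)
    const match = pathname.match(/^\/w\/([^/]+)\/p\/([^/]+)/)
    return match ? `/w/${match[1]}/p/${match[2]}` : null
}

// Build a project-scoped route such as ".../testsets" or ".../evaluations".
function projectScopedUrl(currentUrl: string, subpath: string): string {
    const base = projectScopedBasePath(currentUrl)
    // The bare subpath would 404, so fail loudly instead of navigating there.
    if (!base) throw new Error(`not on a project-scoped page: ${currentUrl}`)
    return `${base}/${subpath.replace(/^\/+/, "")}`
}
```

In a Playwright helper this would be used as `await page.goto(projectScopedUrl(page.url(), "testsets"), {waitUntil: "domcontentloaded"})`, so a missing project context surfaces as an explicit error rather than a response waiter that never resolves.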

AntD 6 renders checkbox inputs with opacity:0, which Chrome's accessibility tree excludes, causing getByRole("checkbox") to return 0 elements and time out. Switch to .ant-checkbox-input (CSS class) to bypass the ARIA tree; Playwright 1.60 considers opacity:0 elements visible for CSS locators so check() works without force.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5933b1cfd1

chatgpt-codex-connector · 2026-06-20T23:08:04Z

+            console.log(
+                "[global-setup] Ephemeral project disabled (AGENTA_TEST_EPHEMERAL_PROJECT=false)",
+            )
+            writeProjectMetadata(projectMetadataPath, defaultProject, page, null)


Avoid cleanup metadata for reused projects

When AGENTA_TEST_EPHEMERAL_PROJECT=false, this writes the existing default project into the same metadata file that global-teardown.ts treats as describing an ephemeral project and unconditionally deletes (deleteEphemeralProject reads project_id and issues DELETE /projects/{id}). As a result, runs with isolation disabled, as well as the similar create-failure fallback below, will try to delete the real/default project after the suite. Either skip writing cleanup metadata for reused projects, or mark the metadata so that teardown does not delete the project.
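One way to apply the second option is to record whether the project was created by this run and have teardown delete only in that case. The metadata shape (`project_id`, `ephemeral`) and the function names below are hypothetical; the actual writeProjectMetadata / deleteEphemeralProject signatures are not shown beyond the excerpt above.

```typescript
// Hypothetical metadata shape written by global setup.
interface ProjectMetadata {
    project_id: string
    // true only when global setup created the project for this run
    ephemeral: boolean
}

function buildProjectMetadata(projectId: string, createdByRun: boolean): ProjectMetadata {
    return {project_id: projectId, ephemeral: createdByRun}
}

// Teardown guard: never delete a reused (default) project, and treat missing
// or malformed metadata as "do not delete".
function shouldDeleteProject(metadata: Partial<ProjectMetadata> | null): boolean {
    return (
        metadata !== null &&
        metadata.ephemeral === true &&
        typeof metadata.project_id === "string" &&
        metadata.project_id.length > 0
    )
}
```

With this guard, both the AGENTA_TEST_EPHEMERAL_PROJECT=false path and the create-failure fallback would write `ephemeral: false`, so teardown skips the DELETE call; defaulting to "do not delete" also protects against metadata files written before the flag existed.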

chatgpt-codex-connector · 2026-06-20T23:08:04Z

    })

-    await page.goto(`${getProjectScopedBasePath(page)}/evaluations`, {
+    await page.goto("/evaluations", {


Keep evaluation runs navigation project-scoped

When a test calls apiHelpers.getEvaluationRuns(), this now opens the unscoped /evaluations path, but the app only defines the evaluations page under the project route (web/oss/src/pages/w/[workspace_id]/p/[project_id]/evaluations/index.tsx:3). In that context the page 404s instead of firing /api/evaluations/runs/query, so the response waiter never resolves; keep this navigation under getProjectScopedBasePath(page).

bekossy added 2 commits May 31, 2026 20:27

refactor(tests): streamline invite member flow and remove unused code

85d05b6

feat(tests): add use API snippets tests for TypeScript integration

58e6bcb

bekossy marked this pull request as ready for review May 31, 2026 20:04

dosubot Bot added size:XS (This PR changes 0-9 lines, ignoring generated files.) Frontend tests labels May 31, 2026

vercel Bot deployed to Preview May 31, 2026 20:05 View deployment

coderabbitai Bot reviewed May 31, 2026

Comment thread web/ee/tests/playwright/acceptance/members/index.ts

Comment thread web/oss/tests/playwright/acceptance/prompt-registry/index.ts

Comment thread web/oss/tests/playwright/acceptance/use-api/index.ts

bekossy changed the base branch from main to release/v0.100.9 May 31, 2026 20:14

Merge branch 'release/v0.100.9' into test-/-web-acceptance-stability

efe400d

vercel Bot deployed to Preview May 31, 2026 20:15 View deployment

vercel Bot deployed to Preview June 1, 2026 08:53 View deployment

vercel Bot deployed to Preview June 1, 2026 09:21 View deployment

Merge branch 'release/v0.100.9' into test-/-web-acceptance-stability

9caa9c0

vercel Bot deployed to Preview June 1, 2026 09:27 View deployment

vercel Bot deployed to Preview June 1, 2026 10:16 View deployment

Merge branch 'release/v0.100.9' into test-/-web-acceptance-stability

bfb21bf

vercel Bot deployed to Preview June 2, 2026 09:59 View deployment

bekossy changed the base branch from release/v0.100.9 to release/v0.101.0 June 3, 2026 07:47

Merge branch 'release/v0.101.0' into test-/-web-acceptance-stability

8bb2e27

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel Bot deployed to Preview June 3, 2026 10:13 View deployment

bekossy changed the base branch from release/v0.101.0 to release/v0.101.1 June 4, 2026 08:14

Merge branch 'release/v0.101.1' into test-/-web-acceptance-stability

c576690

vercel Bot deployed to Preview June 4, 2026 08:18 View deployment

vercel Bot deployed to Preview June 11, 2026 08:27 View deployment

bekossy changed the base branch from release/v0.103.2 to release/v0.103.5 June 14, 2026 10:38

Merge branch 'release/v0.103.5' into test-/-web-acceptance-stability

1b80d65

vercel Bot deployed to Preview June 14, 2026 10:39 View deployment

[fix] update button text and improve locator for Commit button in tests

222c839

vercel Bot deployed to Preview June 14, 2026 15:48 View deployment

bekossy marked this pull request as draft June 15, 2026 07:37

bekossy marked this pull request as ready for review June 15, 2026 07:37

chatgpt-codex-connector Bot reviewed Jun 15, 2026

Comment thread ...ts/EvalRunDetails/components/views/SingleScenarioViewerPOC/ScenarioAnnotationPanel/index.tsx

bekossy changed the base branch from release/v0.103.5 to release/v0.104.0 June 18, 2026 07:42

Merge branch 'release/v0.104.0' into test-/-web-acceptance-stability

730d8fa

vercel Bot deployed to Preview June 18, 2026 07:43 View deployment

Merge branch 'release/v0.104.0' into test-/-web-acceptance-stability

a9c2305

vercel Bot deployed to Preview June 18, 2026 11:17 View deployment

refactor(api): simplify navigation to apps and testsets, avoiding project base path

5a038d3

vercel Bot deployed to Preview June 18, 2026 15:15 View deployment

bekossy added 2 commits June 19, 2026 08:27

fix(tests): ensure fresh server-side query for testsets in load dialog

d694a3f

refactor(api): update app navigation to use project scoped base path

18bd007

vercel Bot deployed to Preview June 19, 2026 06:29 View deployment

vercel Bot deployed to Preview June 19, 2026 07:53 View deployment

bekossy changed the base branch from release/v0.104.0 to release/v0.104.1 June 19, 2026 07:54

Merge branch 'release/v0.104.1' into test-/-web-acceptance-stability

d41267a

vercel Bot deployed to Preview June 19, 2026 07:56 View deployment

bekossy marked this pull request as draft June 19, 2026 08:29

chatgpt-codex-connector Bot reviewed Jun 19, 2026

bekossy and others added 3 commits June 19, 2026 12:13

fix(tests): update selection method for test cases to click rows instead of using AntD checkboxes

e37db75

fix(tests): improve row selection logic and add assertions for selected count in dialog

5933b1c

chatgpt-codex-connector Bot reviewed Jun 20, 2026

3 participants
