playwright-e2e-tests
Write and maintain Playwright end-to-end tests for the Onyx application. Use when creating new E2E tests, debugging test failures, adding test coverage, or when the user mentions Playwright, E2E tests, or browser testing.
Source: onyx-dot-app/onyx
Install: `npx skill4agent add onyx-dot-app/onyx playwright-e2e-tests`
Playwright E2E Tests
Project Layout
- Tests: `web/tests/e2e/`, organized by feature (`auth/`, `admin/`, `chat/`, `assistants/`, `connectors/`, `mcp/`)
- Config: `web/playwright.config.ts`
- Utilities: `web/tests/e2e/utils/`
- Constants: `web/tests/e2e/constants.ts`
- Global setup: `web/tests/e2e/global-setup.ts`
- Output: `web/output/playwright/`
Imports
Always use absolute imports with the `@tests/e2e/` prefix, never relative paths (`../`, `../../`). The alias is defined in `web/tsconfig.json` and resolves to `web/tests/`.

```typescript
import { loginAs } from "@tests/e2e/utils/auth";
import { OnyxApiClient } from "@tests/e2e/utils/onyxApiClient";
import { TEST_ADMIN_CREDENTIALS } from "@tests/e2e/constants";
```

All new files should be `.ts`, not `.js`.
Running Tests
```bash
# Run a specific test file
npx playwright test web/tests/e2e/chat/default_assistant.spec.ts

# Run a specific project
npx playwright test --project admin
npx playwright test --project exclusive
```

Test Projects
| Project | Description | Parallelism |
|---|---|---|
| `admin` | Standard tests (excludes `@exclusive`) | Parallel |
| `exclusive` | Serial, slower tests (tagged `@exclusive`) | 1 worker |
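To illustrate how an `@exclusive` tag in a test title could separate the two projects, here is a minimal sketch. The regular expression and the `projectForTitle` helper are hypothetical and are not taken from the Onyx repository; the actual split is defined in `web/playwright.config.ts`.

```typescript
// Hypothetical helper: mirrors the idea of routing tests by an "@exclusive"
// tag in the title. The real routing lives in web/playwright.config.ts.
type ProjectName = "admin" | "exclusive";

const EXCLUSIVE_TAG = /@exclusive\b/;

function projectForTitle(title: string): ProjectName {
  // Tagged tests go to the serial project; everything else runs in parallel.
  return EXCLUSIVE_TAG.test(title) ? "exclusive" : "admin";
}

const exampleTitles = [
  "should display greeting message when opening new chat",
  "should reindex all connectors @exclusive",
];

for (const title of exampleTitles) {
  console.log(`${projectForTitle(title)}: ${title}`);
}
```

The second title is an invented example; only the `@exclusive` suffix matters for the routing.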
All tests use `admin_auth.json` storage state by default (pre-authenticated admin session).
Authentication
Global setup (`global-setup.ts`) runs automatically before all tests and handles:
- Server readiness check (polls health endpoint, 60s timeout)
- Provisioning test users: admin, admin2, and a pool of worker users (`worker0@example.com` through `worker7@example.com`) (idempotent)
- API login + saving storage states: `admin_auth.json`, `admin2_auth.json`, and `worker{N}_auth.json` for each worker user
- Setting display name to `"worker"` for each worker user
- Promoting admin2 to admin role
- Ensuring a public LLM provider exists
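The worker pool can be pictured as a small mapping from Playwright's worker index to an email address and a storage-state file. The sketch below is an illustration under assumptions: `WORKER_POOL_SIZE`, `poolSlot`, `workerEmail`, and `workerStorageState` are hypothetical names, and the pool size of 8 is inferred from the `worker0` through `worker7` range. Wrapping the index with a modulo keeps indices beyond the pool size inside the pool.

```typescript
// Hypothetical names; the pool size is inferred from worker0..worker7.
const WORKER_POOL_SIZE = 8;

// Map any worker index onto a slot in the fixed pool.
function poolSlot(workerIndex: number): number {
  return workerIndex % WORKER_POOL_SIZE;
}

function workerEmail(workerIndex: number): string {
  return `worker${poolSlot(workerIndex)}@example.com`;
}

function workerStorageState(workerIndex: number): string {
  return `worker${poolSlot(workerIndex)}_auth.json`;
}

console.log(workerEmail(3)); // worker3@example.com
console.log(workerEmail(9)); // worker1@example.com (9 % 8 === 1)
console.log(workerStorageState(9)); // worker1_auth.json
```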
Both test projects set `storageState: "admin_auth.json"`, so every test starts pre-authenticated as admin with no login code needed.
When a test needs a different user, use API-based login; never drive the login UI:
```typescript
import { loginAs, loginAsWorkerUser } from "@tests/e2e/utils/auth";

// Log in as a different pre-provisioned user:
await page.context().clearCookies();
await loginAs(page, "admin2");

// Log in as the worker-specific user (preferred for test isolation):
await page.context().clearCookies();
await loginAsWorkerUser(page, testInfo.workerIndex);
```

Test Structure
Tests start pre-authenticated as admin — navigate and test directly:
```typescript
import { test, expect } from "@playwright/test";

test.describe("Feature Name", () => {
  test("should describe expected behavior clearly", async ({ page }) => {
    await page.goto("/app");
    await page.waitForLoadState("networkidle");
    // Already authenticated as admin: go straight to testing
  });
});
```

User isolation: tests that modify visible app state (creating assistants, sending chat messages, pinning items) should run as a worker-specific user and clean up resources in `afterAll`. Global setup provisions a pool of worker users (`worker0@example.com` through `worker7@example.com`). `loginAsWorkerUser` maps `testInfo.workerIndex` to a pool slot via modulo, so retry workers (which get incrementing indices beyond the pool size) safely reuse existing users. This ensures parallel workers never share user state, keeps usernames deterministic for screenshots, and avoids cross-contamination:

```typescript
import { test } from "@playwright/test";
import { loginAsWorkerUser } from "@tests/e2e/utils/auth";

test.beforeEach(async ({ page }, testInfo) => {
  await page.context().clearCookies();
  await loginAsWorkerUser(page, testInfo.workerIndex);
});
```

If the test requires admin privileges and modifies visible state, use `"admin2"` instead: it is a pre-provisioned admin account that keeps the primary `"admin"` clean for other parallel tests. Switch to `"admin"` only for privileged setup (creating providers, configuring tools), then back to the worker user for the actual test. See `chat/default_assistant.spec.ts` for a full example.
Reserve `loginAsRandomUser` for flows that require a brand-new user (e.g. onboarding).
API resource setup: needed only when tests create backend resources (image gen configs, web search providers, MCP servers). Use `beforeAll`/`afterAll` with `OnyxApiClient` to create and clean up. See `chat/default_assistant.spec.ts` or `mcp/mcp_oauth_flow.spec.ts` for examples. This is uncommon (~4 of 37 test files).
Key Utilities
`OnyxApiClient` (`@tests/e2e/utils/onyxApiClient`)
Backend API client for test setup/teardown. Key methods:
- Connectors: `createFileConnector()`, `deleteCCPair()`, `pauseConnector()`
- LLM Providers: `ensurePublicProvider()`, `createRestrictedProvider()`, `setProviderAsDefault()`
- Assistants: `createAssistant()`, `deleteAssistant()`, `findAssistantByName()`
- User Groups: `createUserGroup()`, `deleteUserGroup()`, `setUserRole()`
- Tools: `createWebSearchProvider()`, `createImageGenerationConfig()`
- Chat: `createChatSession()`, `deleteChatSession()`
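One way to keep API-created resources tidy is to record every ID at creation time and delete them all in teardown. The sketch below shows that pattern with a hypothetical `ResourceTracker` class and a stand-in `AssistantApi` interface. The real `OnyxApiClient` signatures are not shown in this document, so the method shapes here are assumptions, and `createFakeApi` exists only so the sketch runs without a backend.

```typescript
// Stand-in for the subset of a backend client used here. The real
// OnyxApiClient signatures may differ; this interface is an assumption.
interface AssistantApi {
  createAssistant(name: string): Promise<number>;
  deleteAssistant(id: number): Promise<void>;
}

// Hypothetical helper: remembers created IDs so teardown can delete by ID.
class ResourceTracker {
  private readonly api: AssistantApi;
  private readonly ids: number[] = [];

  constructor(api: AssistantApi) {
    this.api = api;
  }

  async create(name: string): Promise<number> {
    const id = await this.api.createAssistant(name);
    this.ids.push(id);
    return id;
  }

  // Delete in reverse creation order; keep going if one delete fails.
  // Returns the IDs that could not be deleted.
  async cleanup(): Promise<number[]> {
    const failed: number[] = [];
    while (this.ids.length > 0) {
      const id = this.ids.pop() as number;
      try {
        await this.api.deleteAssistant(id);
      } catch {
        failed.push(id);
      }
    }
    return failed;
  }
}

// In-memory fake so the sketch runs without a backend.
function createFakeApi(): AssistantApi & { live: Set<number> } {
  let nextId = 1;
  const live = new Set<number>();
  return {
    live,
    async createAssistant(_name: string): Promise<number> {
      const id = nextId++;
      live.add(id);
      return id;
    },
    async deleteAssistant(id: number): Promise<void> {
      if (!live.delete(id)) {
        throw new Error(`assistant ${id} does not exist`);
      }
    },
  };
}
```

In a spec file, `create` would typically be called from `test.beforeAll` and `cleanup` from `test.afterAll`, so resources are removed by ID even when a test fails partway through.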
`chatActions` (`@tests/e2e/utils/chatActions`)
- `sendMessage(page, message)`: sends a message and waits for AI response
- `startNewChat(page)`: clicks new-chat button and waits for intro
- `verifyDefaultAssistantIsChosen(page)`: checks Onyx logo is visible
- `verifyAssistantIsChosen(page, name)`: checks assistant name display
- `switchModel(page, modelName)`: switches LLM model via popover
`visualRegression` (`@tests/e2e/utils/visualRegression`)
- `expectScreenshot(page, { name, mask?, hide?, fullPage? })`
- `expectElementScreenshot(locator, { name, mask?, hide? })`
- Controlled by `VISUAL_REGRESSION=true` env var
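The environment-variable gate could work roughly as follows. This is a sketch under assumptions: `isVisualRegressionEnabled` and `screenshotFileName` are hypothetical names, the `.png` suffix is assumed, and the real `expectScreenshot` may read the variable or build names differently.

```typescript
// Hypothetical helpers; the real visualRegression module may differ.
type Env = Record<string, string | undefined>;

// Only the exact string "true" enables screenshot comparison.
function isVisualRegressionEnabled(env: Env): boolean {
  return env["VISUAL_REGRESSION"] === "true";
}

// Build a deterministic file name, optionally suffixed with the theme.
function screenshotFileName(name: string, theme?: "light" | "dark"): string {
  return theme ? `${name}-${theme}.png` : `${name}.png`;
}

console.log(isVisualRegressionEnabled({ VISUAL_REGRESSION: "true" })); // true
console.log(isVisualRegressionEnabled({})); // false
console.log(screenshotFileName("feature", "dark")); // feature-dark.png
```

Passing the environment in as an argument, rather than reading a global, keeps the gate easy to unit test.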
`theme` (`@tests/e2e/utils/theme`)
- `THEMES`: `["light", "dark"] as const` array for iterating over both themes
- `setThemeBeforeNavigation(page, theme)`: sets `next-themes` theme via `localStorage` before navigation

When tests need light/dark screenshots, loop over `THEMES` at the `test.describe` level and call `setThemeBeforeNavigation` in `beforeEach` before any `page.goto()`. Include the theme in screenshot names. See `admin/admin_pages.spec.ts` or `chat/chat_message_rendering.spec.ts` for examples:

```typescript
import { test } from "@playwright/test";
import { THEMES, setThemeBeforeNavigation } from "@tests/e2e/utils/theme";
import { expectScreenshot } from "@tests/e2e/utils/visualRegression";

for (const theme of THEMES) {
  test.describe(`Feature (${theme} mode)`, () => {
    test.beforeEach(async ({ page }) => {
      await setThemeBeforeNavigation(page, theme);
    });

    test("renders correctly", async ({ page }) => {
      await page.goto("/app");
      await expectScreenshot(page, { name: `feature-${theme}` });
    });
  });
}
```

`tools` (`@tests/e2e/utils/tools`)
- `TOOL_IDS`: centralized `data-testid` selectors for tool options
- `openActionManagement(page)`: opens the tool management popover
Locator Strategy
Use locators in this priority order:
- `data-testid` / `aria-label`: preferred for Onyx components
  ```typescript
  page.getByTestId("AppSidebar/new-session")
  page.getByLabel("admin-page-title")
  ```
- Role-based: for standard HTML elements
  ```typescript
  page.getByRole("button", { name: "Create" })
  page.getByRole("dialog")
  ```
- Text/Label: for visible text content
  ```typescript
  page.getByText("Custom Assistant")
  page.getByLabel("Email")
  ```
- CSS selectors: last resort, only when the options above won't work
  ```typescript
  page.locator('input[name="name"]')
  page.locator("#onyx-chat-input-textarea")
  ```

Never use `page.locator` with complex CSS/XPath when a built-in locator works.
Assertions
Use web-first assertions — they auto-retry until the condition is met:
```typescript
// Visibility
await expect(page.getByTestId("onyx-logo")).toBeVisible({ timeout: 5000 });

// Text content
await expect(page.getByTestId("assistant-name-display")).toHaveText("My Assistant");

// Count
await expect(page.locator('[data-testid="onyx-ai-message"]')).toHaveCount(2, { timeout: 30000 });

// URL
await expect(page).toHaveURL(/chatId=/);

// Element state
await expect(toggle).toBeChecked();
await expect(button).toBeEnabled();
```

Never use `assert` statements or hardcoded `page.waitForTimeout()`.
Waiting Strategy
```typescript
// Wait for load state after navigation
await page.goto("/app");
await page.waitForLoadState("networkidle");

// Wait for specific element
await page.getByTestId("chat-intro").waitFor({ state: "visible", timeout: 10000 });

// Wait for URL change
await page.waitForFunction(() => window.location.href.includes("chatId="), null, { timeout: 10000 });

// Wait for network response
await page.waitForResponse(resp => resp.url().includes("/api/chat") && resp.status() === 200);
```

Best Practices
- Descriptive test names: clearly state expected behavior, e.g. `"should display greeting message when opening new chat"`
- API-first setup: use `OnyxApiClient` for backend state; reserve UI interactions for the behavior under test
- User isolation: tests that modify visible app state (sidebar, chat history) should run as the worker-specific user via `loginAsWorkerUser(page, testInfo.workerIndex)` (not admin) and clean up resources in `afterAll`. Each parallel worker gets its own user, preventing cross-contamination. Reserve `loginAsRandomUser` for flows that require a brand-new user (e.g. onboarding)
- DRY helpers: extract reusable logic into `utils/` with JSDoc comments
- No hardcoded waits: use `waitFor`, `waitForLoadState`, or web-first assertions
- Parallel-safe: no shared mutable state between tests. Prefer static, human-readable names (e.g. `"E2E-CMD Chat 1"`) and clean up resources by ID in `afterAll`. This keeps screenshots deterministic and avoids needing to mask/hide dynamic text. Only fall back to timestamps (`` `test-${Date.now()}` ``) when resources cannot be reliably cleaned up or when name collisions across parallel workers would cause functional failures
- Error context: catch and re-throw with useful debug info (page text, URL, etc.)
- Tag slow tests: mark serial/slow tests with `@exclusive` in the test title
- Visual regression: use `expectScreenshot()` for UI consistency checks
- Minimal comments: only comment to clarify non-obvious intent; never restate what the next line of code does
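
The error-context practice can be sketched as a small wrapper that catches a failure, attaches the current URL and a snippet of page text, and re-throws. `PageLike` and `withDebugContext` are hypothetical names introduced for illustration. `PageLike` declares only `url()` and `innerText(selector)`, both of which Playwright's `Page` also provides, so a real `page` could be passed in.

```typescript
// Minimal structural type so the sketch runs without Playwright installed.
interface PageLike {
  url(): string;
  innerText(selector: string): Promise<string>;
}

// Hypothetical helper: run an action and enrich any failure with page context.
async function withDebugContext<T>(
  page: PageLike,
  step: string,
  action: () => Promise<T>,
): Promise<T> {
  try {
    return await action();
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    // Reading page text can itself fail (e.g. the page closed), so guard it.
    let bodyText = "<unavailable>";
    try {
      bodyText = (await page.innerText("body")).slice(0, 200);
    } catch {
      // Keep the placeholder text.
    }
    throw new Error(
      `${step} failed: ${reason}\nURL: ${page.url()}\nPage text: ${bodyText}`,
    );
  }
}
```

A test step would then be wrapped as `await withDebugContext(page, "send message", () => sendMessage(page, "hello"))`, so a failure report names the step, the URL, and what the page was showing.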