# Create Local YAML Tests (`create_e2e_tests`)
A spec-driven workflow that front-loads testing expertise through structured planning before any tests are written. Tests run with `npx shiplight test --headed` — no cloud infrastructure required.

## When to use
Use `/create_e2e_tests` when the user wants to:

- Create a new local test project from scratch
- Add YAML tests for a web application
- Set up authentication for a test project
- Plan what to test before writing tests
## Principles

- Always produce artifacts. Every phase writes a markdown file. Artifacts clarify your own thinking, give the user something to review, and guide later phases. When the user provides detailed requirements, use them as source material — skip questions already answered, but still produce the artifact.
- Confirm before implementing. Present the spec (Phase 2 checkpoint) for user confirmation before spending time on browser-walking and test writing. Echo back your understanding as structured scenarios to catch mismatches early.
- Each phase reads the previous phase's artifact. Discover feeds Specify, Specify feeds Plan, Plan feeds Implement, Implement feeds Verify. If an artifact exists from a prior run, offer to reuse it.
- Escalate, don't loop. When something fails or is ambiguous, report it and ask the user rather than retrying silently.

## Phase Overview
- Phase 1: Discover → `test-strategy.md` (understand the app & user goals)
- Phase 2: Specify → `test-spec.md` (define what to test in Given/When/Then)
- Phase 3: Plan → `test-plan.md` (prioritize, structure, per-test guidance)
- Phase 4: Implement → `*.test.yaml` files (set up the project, write tests, run them)
- Phase 5: Verify → updated spec files (coverage check, reconcile spec ↔ tests)

## Fast-Track
Check for existing artifacts before starting. The only way to skip artifact generation is if the user explicitly says so.

| Situation | Behavior |
|---|---|
| User explicitly says "skip to implement" or "just write the tests" | Phase 4 only |
| Existing `test-strategy.md` | Offer to reuse, skip Phase 1 |
| Existing `test-spec.md` | Offer to reuse, skip Phases 1-2 |
| Existing `test-plan.md` | Offer to reuse, skip to Phase 4 |
## Phase 1: Discover

Goal: Understand the application, the user's role, and what matters most to test.

Output: `<project>/test-specs/test-strategy.md`

### Steps
- Get project path — ask where to create the test project (e.g., `./my-tests`). All artifacts and tests will live here. Create the `test-specs/` directory.

  If cloud MCP tools are available (`SHIPLIGHT_API_TOKEN` is set), use the `/cloud` skill to fetch environments and test accounts — this can pre-fill the target URL and credentials.

- Silent scan — before asking questions, gather context from what's available:
  - Codebase: routes, components, `package.json`, framework
  - Git branch diff (what changed recently)
  - Existing tests (what's already covered)
  - PRDs, docs, README files
  - Cloud environments (if cloud MCP tools available)
- Understand what to test — ask the user what they'd like to test, then ask targeted follow-up questions (one at a time, with recommendations based on your scan) to fill gaps: risk areas, user roles, authentication, data strategy, critical journeys. Skip questions the user has already answered.
- Write `test-strategy.md` containing:
  - App profile: name, URL, framework, key pages/features
  - Risk profile: what matters most, what's fragile
  - Testing scope: what's in/out, user roles to cover
  - Data strategy: how test data will be created and cleaned up
  - Environment: target URL, auth method, any special setup
## Phase 2: Specify

Goal: Define concrete test scenarios in structured Given/When/Then format, prioritized by risk. Surface ambiguities that would cause flaky or incomplete tests.

Input: reads `test-specs/test-strategy.md`

Output: `<project>/test-specs/test-spec.md`

### Steps
- Read `test-strategy.md` to understand scope and priorities.
- Generate user journey specs — for each critical journey, write:
  - Title: descriptive name (e.g., "New user signup with email verification")
  - Priority: P0 (must-have), P1 (should-have), P2 (nice-to-have)
  - Preconditions: what must be true before the test starts (Given)
  - Happy path: step-by-step actions and expected outcomes (When/Then)
  - Edge cases: at least 2 per journey (e.g., invalid input, timeout, empty state)
  - Data requirements: what test data is needed
- Review for testing risks — scan each journey for issues that would cause flaky or incomplete tests: data dependencies, timing/async behavior, dynamic content, auth boundaries, third-party services, state isolation, environment differences. Add a Testing Notes section to each journey with identified risks and mitigations. If anything is ambiguous, ask the user (one question at a time, with a recommended answer and impact statement).
- Write `test-spec.md` with all journey specs.
- Checkpoint — present a summary table for user review:

  | # | Journey | Priority | Steps | Edge Cases | Risks |
  |---|---|---|---|---|---|
  | 1 | User signup | P0 | 5 | 3 | Timing |
  | 2 | ... | ... | ... | ... | ... |

  Ask: "Does this look right? Any journeys to add, remove, or reprioritize?" Wait for user confirmation before proceeding.
## Phase 3: Plan

Goal: Create an actionable implementation plan with per-test guidance.

Input: reads `test-specs/test-spec.md`

Output: `<project>/test-specs/test-plan.md`

### Steps
- Read `test-spec.md`.
- Define test file structure — map journeys to test files:

  ```
  tests/
  ├── auth.setup.ts       (if auth needed)
  ├── signup.test.yaml    (Journey 1)
  ├── checkout.test.yaml  (Journey 2)
  └── ...
  ```

- Set implementation order — ordered by:
  - Dependencies first (auth setup before authenticated tests)
  - Then by priority (P0 before P1)
  - Then by risk (highest risk first)
- Per-test guidance — for each test file, specify:
  - Data strategy: what data to create/use, cleanup approach
  - Wait strategy: where to use WAIT_UNTIL vs WAIT, expected loading points
  - Flakiness risks: specific things to watch for in this test
- Write `test-plan.md`.
- Checkpoint — present summary: "Ready to implement N test files. Shall I proceed?"
## Phase 4: Implement

Goal: Set up the project and write all YAML tests guided by the plan.

Input: reads `test-specs/test-plan.md`

### Setup
Skip any steps already done (project exists, deps installed, auth configured).

- Configure AI provider — check if the test project already has a `.env` with an AI API key. If not, ask the user to choose a provider:

  > To run YAML tests, I need an AI provider for resolving test steps. Which provider would you like to use?
  >
  > A) Google AI — `GOOGLE_API_KEY` (Get key) — default model: `gemini-3.1-flash-lite-preview`
  > B) Anthropic — `ANTHROPIC_API_KEY` (Get key) — default model: `claude-haiku-4-5`
  > C) OpenAI — `OPENAI_API_KEY` (Get key) — default model: `gpt-5.4-mini`
  > D) Azure OpenAI — requires `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` — set `WEB_AGENT_MODEL=azure:<deployment>`
  > E) AWS Bedrock — uses AWS credential chain — set `WEB_AGENT_MODEL=bedrock:<model_id>`
  > F) Google Vertex AI — uses GCP Application Default Credentials — set `WEB_AGENT_MODEL=vertex:<model>`
  > G) I already have it configured

  After the user chooses, ask for their API key and save it to the test project's `.env` file. For A/B/C, the model is auto-detected from the key. For D/E/F, also save `WEB_AGENT_MODEL` with the appropriate `provider:model` prefix. Optionally, the user can set `WEB_AGENT_MODEL` to override the default model (e.g., `WEB_AGENT_MODEL=claude-sonnet-4-6`).

- Scaffold the project — call `scaffold_project` with the absolute project path. This creates `package.json`, `playwright.config.ts`, `.env.example`, `tests/`, and `.gitignore`. Save the API key to `.env`.

- Install dependencies:

  ```bash
  npm install
  npx playwright install chromium
  ```

- Set up authentication (if needed) — follow the standard Playwright authentication pattern. Add credentials as variables in `playwright.config.ts`:

  ```ts
  {
    name: 'my-app',
    testDir: './tests/my-app',
    dependencies: ['my-app-setup'],
    use: {
      baseURL: 'https://app.example.com',
      storageState: 'tests/my-app/.auth/storage-state.json',
      variables: {
        username: process.env.MY_APP_EMAIL,
        password: { value: process.env.MY_APP_PASSWORD, sensitive: true },
        // otp_secret_key: { value: process.env.MY_APP_TOTP_SECRET, sensitive: true },
      },
    },
  },
  ```

  Standard variable names: `username`, `password`, `otp_secret_key`. Use `{ value, sensitive: true }` for secrets. Add values to `.env`.

  Write `auth.setup.ts` with standard Playwright login code. For TOTP, implement RFC 6238 using `node:crypto` (HMAC-SHA1 + base32 decode) — no third-party dependency needed.

  Verify auth before proceeding. Run `npx shiplight test --headed` to execute the auth setup and confirm it saves `storage-state.json`. If it fails, escalate to the user — auth is a prerequisite for everything else.

  If the test plan involves special auth requirements (e.g., one account per test, multiple roles), confirm the auth strategy with the user before proceeding.
### Write tests

For each test in the plan (or each test the user wants):

- Open a browser session — call `new_session` with the app's `starting_url`.
- Walk through the flow — use `inspect_page` to see the page, then `act` to perform each action. This captures locators from the response.
- Capture locators — use `get_locators` for additional element info when needed.
- Build the YAML — construct the `.test.yaml` content following the best practices below.
- Save and validate — write the `.test.yaml` file, then call `validate_yaml_test` with the file path to check locator coverage (minimum 50% required).
- Close the session — call `close_session` when done.

Important: Do NOT write YAML tests from imagination. Always walk through the app in a browser session first to capture real locators. Tests without locators are rejected by `validate_yaml_test`.

When guided by `test-plan.md`:

- Apply the specified wait strategy at loading points
- Cover the edge cases and assertions defined in the spec
### Run tests

After writing all tests, run them:

```bash
npx shiplight test --headed
```

When a test fails:

- Report — tell the user which test failed and why (one sentence).
- Classify the failure:
  - Implementation fix (wrong locator, missing wait, timing) → fix and retry.
  - Spec mismatch (app behavior differs from spec) → ask the user whether to update the spec or skip the scenario.
- Escalate if a fix doesn't work — don't keep retrying the same approach.
## Phase 5: Verify

Goal: Validate test coverage against the spec and reconcile any drift.

Input: reads `test-specs/test-spec.md`, `test-specs/test-plan.md`, and all `.test.yaml` files

This phase only runs when spec artifacts exist.
### Coverage check

For each spec journey, confirm the test covers the happy path and all listed edge cases.

Present a coverage summary:

| Spec Journey | Priority | Scenarios Specified | Tests Written | Coverage |
|---|---|---|---|---|
| User signup | P0 | 4 | 4 | ✓ |
| Checkout | P0 | 3 | 2 | ✗ — edge case "empty cart" not covered |

Flag gaps and extras (test steps not in the spec).
### Reconcile

Update spec artifacts to match what was actually implemented:

- Update `test-spec.md` — mark skipped scenarios with reason, add scenarios that emerged during implementation, update edge cases to reflect what was tested
- Update `test-plan.md` — correct file structure, note deviations from the original plan
- Show diff summary — tell the user what changed and why

This keeps artifacts accurate for future test maintenance and expansion.
## YAML Format Reference

Read the MCP resource `shiplight://yaml-test-spec-v1.3.0` for the full language spec (statement types, templates, variables, suites, hooks, parameterized tests).

Read the MCP resource `shiplight://schemas/action-entity` for the full list of available actions and their parameters.

## YAML Authoring Best Practices

These best practices bridge the YAML language spec and the action catalog to help you write fast, reliable tests.
### Statement type selection

- ACTION is the default. Capture locators via MCP tools (`act`, `get_locators`) during browser sessions, then write ACTION statements. ACTIONs replay deterministically (~1s).
- DRAFT is a last resort. Only use DRAFT when the locator is genuinely unknowable at authoring time. DRAFTs are slow (~5-10s each, AI resolution at runtime). Tests with too many DRAFTs are rejected by `validate_yaml_test`.
- VERIFY for assertions. Use `VERIFY:` for all assertions. Do not write assertion DRAFTs like "Check that the button is visible".
- URL for navigation. Use `URL: /path` for navigation instead of `action: go_to_url`.
- CODE for scripting. Use `CODE:` for network mocking, localStorage manipulation, page-level scripting. Not for clicks, assertions, or navigation.
### The `intent` field

Because `intent` drives self-healing, it must be specific enough for an agent to act on without any other context. Describe the user goal, not the DOM element — avoid element indices, CSS selectors, or positional references that break when the UI changes:

```yaml
# BAD: vague, agent can't re-derive the action
- intent: Click button

# BAD: tied to DOM structure that can change
- intent: Click the 3rd button in the form
- intent: Click element at index 42

# GOOD: describes the user goal, stable across UI changes
- intent: Click the Submit button to save the new project
  action: click
  locator: "getByRole('button', { name: 'Submit' })"
```
undefinedACTION: structured format vs js:
shorthand
js:ACTION:结构化格式 vs js:
简写
js:Use structured format by default for all supported actions. Read the MCP resource for the full list of available actions and their parameters.
shiplight://schemas/action-entityUse only when the action doesn't map to a supported action — e.g., complex multi-step interactions, custom Playwright API calls, or chained operations:
js:yaml
- intent: Drag slider to 50% position
js: "await page.getByRole('slider').first().fill('50')"
- intent: Wait for network idle after form submit
js: "await page.waitForLoadState('networkidle')"默认使用结构化格式处理所有支持的操作。阅读MCP资源获取可用操作及其参数的完整列表。
shiplight://schemas/action-entity仅当操作无法映射到支持的类型时使用——例如复杂的多步骤交互、自定义Playwright API调用或链式操作:
js:yaml
- intent: Drag slider to 50% position
js: "await page.getByRole('slider').first().fill('50')"
- intent: Wait for network idle after form submit
js: "await page.waitForLoadState('networkidle')"js:
### `js:` coding rules

- Always resolve locators to a single element (e.g., `.first()`, `.nth(1)`) to avoid Playwright strict-mode errors.
- Always include `{ timeout: 5000 }` on actions for predictable timing
- The `intent` is critical — it's the input for self-healing when `js` fails
- `page`, `agent`, and `expect` are available in scope
### VERIFY best practices

- Always set a short timeout (e.g., `{ timeout: 2000 }`) on `js:` assertions that have an AI fallback, so stale locators fall back to AI quickly instead of waiting the default 5s
- Always use the `VERIFY:` shorthand — do not use `action: verify` directly
- Be aware of false negatives with `js:` assertions. The AI fallback only triggers when `js` throws (element not found, timeout). If `js` passes against the wrong element (stale selector matching a different element), the assertion silently succeeds — no fallback occurs. Keep `js:` assertions simple and specific to minimize this risk.
### IF/WHILE `js:` condition best practices

- Use natural language (AI) conditions for DOM-based checks (element visible, text present, page state). AI conditions self-heal against DOM changes; `js:` conditions are brittle and cannot auto-heal.
- Use `js:` conditions only for counter/state logic — e.g., `js: counter++ < 10`, `js: retryCount < 3`. Never use `js:` for DOM inspection like `js: document.querySelector('.modal') !== null`.
- If you need a JavaScript-based DOM check, use `CODE:` to evaluate it and store the result, or use `VERIFY:` with `js:` (which at least has AI fallback on failure).
### Waiting syntax

- `WAIT_UNTIL:` — AI checks the condition repeatedly until met or timeout. Default timeout is 60 seconds. Each AI check takes 5–10s, so set `timeout_seconds` to at least 15.
- `WAIT:` — fixed-duration pause. Use `seconds:` to set duration.

See Smart waiting in E2E Test Design for when to use each.
### General conventions

- Put `intent` first in ACTION statements for readability
- `xpath` is only needed when an ACTION has neither `locator` nor `js`.
- Single-test vs Suite vs Parameters:
  - Single-test file — one isolated test, no shared state
  - Suite — tests that have sequential dependencies (e.g., test A creates a file, test B consumes it). Each test in a suite still covers one journey — the suite just guarantees execution order and shares browser state. Do NOT use suites to bundle unrelated tests.
  - Parameters — same test structure, different data inputs
## E2E Test Design Best Practices

These principles govern what to test and how to structure tests — independent of the YAML format. Apply them during Phase 2 (Specify) and Phase 4 (Implement).
### Test isolation

Each test must run independently — never depend on another test's side effects, execution order, or leftover state. If a test needs data, it creates that data itself.

```yaml
# BAD: depends on a previous test having created "My Project"
test: Delete a project
steps:
  - URL: /projects
  - intent: Click on "My Project"
    action: click
    locator: "getByText('My Project')"
  - intent: Click the Delete button
    action: click
    locator: "getByRole('button', { name: 'Delete' })"
```

```yaml
# GOOD: creates its own data, then tests the behavior
test: Delete a project
steps:
  - CODE:
      js: |
        const res = await page.request.post('/api/projects', {
          data: { name: 'Delete-Test-' + Date.now() }
        });
        const project = await res.json();
        save_variable('projectName', project.name);
  - URL: /projects
  - WAIT_UNTIL: The project list has loaded
  - intent: Click on the project we just created
    action: click
    js: "await page.getByText('{{projectName}}').click()"
  - intent: Click the Delete button
    action: click
    locator: "getByRole('button', { name: 'Delete' })"
  - VERIFY: The project is no longer visible in the list
```
### One journey per test

Each test should verify one logical user journey. If step 3 of 8 fails, steps 4-8 give you zero information. Split long flows into focused tests.

Exception: Suites allow sequential dependencies between tests (e.g., test A uploads a file, test B downloads it). Each test in a suite still covers one journey — the suite just guarantees order and shares browser state.

```yaml
# BAD: tests login, settings change, AND deletion in one test
test: Full user lifecycle
steps:
  - intent: Log in
  - intent: Navigate to settings
  - intent: Change display name
  - VERIFY: Name updated
  - intent: Navigate to account
  - intent: Delete account
  - VERIFY: Account deleted
```

```yaml
# GOOD: separate tests, each verifiable in isolation

# File: update-display-name.test.yaml
test: Update display name from settings
steps:
  - URL: /settings
  - intent: Clear the display name field and type "New Name"
    action: fill
    locator: "getByLabel('Display name')"
    value: "New Name"
  - intent: Click Save
    action: click
    locator: "getByRole('button', { name: 'Save' })"
  - VERIFY: Success message "Settings saved" is visible

# File: delete-account.test.yaml (separate test)
test: Delete account from account page
steps:
  - URL: /account
  # ... focused on deletion only
```
undefinedAssert what users see, not implementation details
断言用户可见内容,而非实现细节
Test visible outcomes — text, navigation, enabled/disabled states. Never assert CSS classes, data attributes, internal state, or DOM structure.
yaml
undefined测试可见结果——文本、导航、启用/禁用状态。切勿断言CSS类、数据属性、内部状态或DOM结构。
yaml
undefinedBAD: asserts implementation details
错误:断言实现细节
- VERIFY: js: | const el = await page.locator('.btn-primary'); await expect(el).toHaveClass(/disabled/); await expect(el).toHaveAttribute('data-state', 'submitted');
- VERIFY: js: | const el = await page.locator('.btn-primary'); await expect(el).toHaveClass(/disabled/); await expect(el).toHaveAttribute('data-state', 'submitted');
GOOD: asserts what a user would observe
正确:断言用户可观察到的内容
- VERIFY: The Submit button is disabled js: | await expect(page.getByRole('button', { name: 'Submit' })) .toBeDisabled({ timeout: 2000 });
undefined- VERIFY: The Submit button is disabled js: | await expect(page.getByRole('button', { name: 'Submit' })) .toBeDisabled({ timeout: 2000 });
undefinedFocused assertions
聚焦断言
Verify the one thing that proves the feature works. Over-asserting makes tests brittle — they break on cosmetic changes unrelated to the behavior under test.
yaml
undefined验证证明功能正常工作的核心点。过度断言会导致测试脆弱——无关行为的 cosmetic变更会导致测试失败。
yaml
undefinedBAD: asserts every field on the page — breaks when any label changes
错误:断言页面上的每个字段——任何标签变更都会导致测试失败
- VERIFY: Page title is "Dashboard"
- VERIFY: Welcome message shows username
- VERIFY: Sidebar has 5 menu items
- VERIFY: Footer shows current year
- VERIFY: Avatar image is loaded
- VERIFY: Notification bell is visible
- VERIFY: Page title is "Dashboard"
- VERIFY: Welcome message shows username
- VERIFY: Sidebar has 5 menu items
- VERIFY: Footer shows current year
- VERIFY: Avatar image is loaded
- VERIFY: Notification bell is visible
GOOD: asserts the one thing that proves the user landed on the dashboard
正确:断言证明用户已进入仪表盘的核心点
- VERIFY: Dashboard page shows the welcome message with the user's name
undefined- VERIFY: Dashboard page shows the welcome message with the user's name
undefinedNever test third-party services
绝不测试第三方服务
Don't assert that Stripe's checkout, Google OAuth's consent screen, or Twilio's SMS delivery works. Mock external services at the network boundary. Test your integration, not their UI.
yaml
undefined不要断言Stripe结账、Google OAuth授权界面或Twilio短信送达是否正常工作。在网络边界模拟外部服务。测试你的集成,而非他们的UI。
yaml
undefinedBAD: tests Stripe's UI (will break when Stripe updates their page)
错误:测试Stripe的UI(Stripe更新页面时会失败)
- intent: Enter card number in Stripe iframe
- intent: Click Stripe's pay button
- VERIFY: Stripe shows success checkmark
- intent: Enter card number in Stripe iframe
- intent: Click Stripe's pay button
- VERIFY: Stripe shows success checkmark
GOOD: mock the payment API, test your success handling
正确:模拟支付API,测试你的成功处理逻辑
- CODE: js: | await page.route('**/api/payments', route => route.fulfill({ status: 200, json: { status: 'succeeded', id: 'pi_mock' } }) );
- intent: Click the Pay button action: click locator: "getByRole('button', { name: 'Pay' })"
- VERIFY: Order confirmation page shows "Payment successful"
undefined- CODE: js: | await page.route('**/api/payments', route => route.fulfill({ status: 200, json: { status: 'succeeded', id: 'pi_mock' } }) );
- intent: Click the Pay button action: click locator: "getByRole('button', { name: 'Pay' })"
- VERIFY: Order confirmation page shows "Payment successful"
undefinedDeterministic test data
确定性测试数据
Use unique identifiers per test run to avoid collisions. Never rely on hardcoded data that other tests or users might modify.
每次测试运行使用唯一标识符,避免冲突。绝不依赖其他测试或用户可能修改的硬编码数据。
BAD: hardcoded name — collides if tests run in parallel or data persists
错误:硬编码名称——并行测试或数据持久化时会冲突

```yaml
- intent: Type "Test User" into the name field
  action: fill
  locator: "getByLabel('Name')"
  value: "Test User"
```

GOOD: unique per run — no collisions
正确:每次运行唯一——无冲突

```yaml
- CODE:
    js: "save_variable('testName', 'Test-User-' + Date.now());"
- intent: Type the generated name into the name field
  action: fill
  locator: "getByLabel('Name')"
  text: "{{testName}}"
```
Prefer API seeding over UI setup
优先使用API初始化数据,而非UI设置
When a test needs preconditions (a user exists, a project is created), set them up via API calls — not by clicking through the UI. UI setup is slow, flaky, and not what you're testing.
当测试需要前置条件(用户已存在、项目已创建)时,通过API调用设置——而非点击UI。UI设置速度慢、易失效,且不是测试的核心内容。
BAD: 10 UI steps just to set up data before the real test
错误:10步UI操作只是为了在真正测试前初始化数据

```yaml
- URL: /projects/new
- intent: Type project name
- intent: Select team
- intent: Click Create
- WAIT_UNTIL: Project page loads
# ... now the actual test starts
# ... 真正的测试现在才开始
```

GOOD: API seed in one step, then test the real behavior
正确:一步API初始化,然后测试核心行为

```yaml
- CODE:
    js: |
      const res = await page.request.post('/api/projects', {
        data: { name: 'Seed-' + Date.now(), team: 'engineering' }
      });
      const { slug } = await res.json();
      save_variable('projectSlug', slug);
- URL: /projects/{{projectSlug}}/settings
- WAIT_UNTIL: Settings page has loaded
# ... test starts immediately at the point that matters
# ... 测试直接从关键节点开始
```
Smart waiting
智能等待
Use the right wait for the situation. `WAIT_UNTIL:` costs 5-10s per check (AI resolution), so it's overkill for short, predictable delays. `WAIT:` is fine when the delay is short and known. The anti-pattern is using `WAIT:` as a substitute for condition-based waiting when the delay is unpredictable.

针对不同场景使用合适的等待方式。`WAIT_UNTIL:`每次检查需5-10秒(AI解析),因此对于短且可预测的延迟来说过于冗余。`WAIT:`适用于短且已知的延迟。反模式是当延迟不可预测时,使用`WAIT:`替代基于条件的等待。
BAD: guessing how long a data fetch takes — too short in CI, too long locally
错误:猜测数据加载时间——CI环境中太短,本地环境太长

```yaml
- WAIT: Wait for data to load
  seconds: 5
- VERIFY: The table shows results
```

GOOD: condition-based wait for unpredictable operations
正确:针对不可预测操作使用基于条件的等待

```yaml
- WAIT_UNTIL: The data table has at least one row visible
  timeout_seconds: 30
```

ALSO GOOD: short WAIT for known, fast delays (animations, transitions, debounce)
同样正确:针对已知的快速延迟(动画、过渡、防抖)使用短WAIT

```yaml
- intent: Type search query
  action: fill
  locator: "getByRole('searchbox')"
  value: "test"
- WAIT: Wait for debounce to fire
  seconds: 1
- VERIFY: Search suggestions are visible
```

Rule of thumb: if the delay is **predictable and under 5s** (animation, debounce, transition), use `WAIT:`. If the delay is **unpredictable** (API call, data loading, file processing), use `WAIT_UNTIL:`.

经验法则:如果延迟**可预测且小于5秒**(动画、防抖、过渡),使用`WAIT:`。如果延迟**不可预测**(API调用、数据加载、文件处理),使用`WAIT_UNTIL:`。

Test error states, not just happy paths
测试错误状态,而非仅正常流程
Real users hit errors. A test suite that only covers happy paths gives false confidence. For every critical journey, include at least one error/edge case test.
真实用户会遇到错误。仅覆盖正常流程的测试套件会给出虚假的信心。对于每个关键流程,至少包含一个错误/边缘场景测试。
Covers: empty state, invalid input, network failure
覆盖:空状态、无效输入、网络故障

```yaml
test: Search handles no results gracefully
steps:
  - URL: /search
  - intent: Type a query that returns no results
    action: fill
    locator: "getByRole('searchbox')"
    value: "zzz_no_match_zzz"
  - intent: Submit the search
    action: click
    locator: "getByRole('button', { name: 'Search' })"
  - VERIFY: Empty state message "No results found" is displayed
  - VERIFY: The search box still contains the query (user can refine)
```
Design for parallel execution
为并行执行设计测试
Tests that modify shared global state (e.g., site-wide settings, the only admin account) can't safely run in parallel. Design around this:
- Use unique, per-test data instead of shared fixtures
- Avoid tests that change global configuration
- If a test must modify shared state, document it and mark it for serial execution
修改共享全局状态(例如:站点范围设置、唯一管理员账号)的测试无法安全并行运行。需针对性设计:
- 使用唯一的单测试数据,而非共享固定数据
- 避免修改全局配置的测试
- 如果测试必须修改共享状态,需记录并标记为串行执行
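As a concrete illustration, here is a minimal sketch of a parallel-safe test following these rules. The `/api/teams` endpoint, the `slug` response field, and the UI labels are hypothetical, for illustration only; only the step keywords shown elsewhere in this guide are assumed.
以下是遵循上述规则的并行安全测试的最小示意(其中`/api/teams`接口、`slug`字段和UI文案均为假设):

```yaml
# Parallel-safe sketch: each run creates and edits its own team,
# so no two runs ever touch the same record or any global setting.
test: Rename a team
steps:
  - CODE:
      js: |
        const res = await page.request.post('/api/teams', {
          data: { name: 'Team-' + Date.now() }
        });
        const { slug } = await res.json();
        save_variable('teamSlug', slug);
  - URL: /teams/{{teamSlug}}/settings
  - intent: Type the new team name
    action: fill
    locator: "getByLabel('Team name')"
    value: "Renamed-{{teamSlug}}"
  - intent: Save the change
    action: click
    locator: "getByRole('button', { name: 'Save' })"
  - VERIFY: The settings page shows the renamed team
```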
Flaky test policy
不稳定测试处理策略
A test that passes on retry is still broken. Never add retries to mask flakiness — find and fix the root cause:
- Timing flake? → Add a proper `WAIT_UNTIL:` for the right condition
- Data flake? → Use unique test data, add proper cleanup
- Order flake? → The test has a hidden dependency on another test — make it self-contained
- Environment flake? → Mock the unstable external service
重试后通过的测试仍然是有问题的。切勿添加重试来掩盖不稳定性——找到并修复根本原因:
- 时序不稳定? → 为正确的条件添加合适的`WAIT_UNTIL:`
- 数据不稳定? → 使用唯一测试数据,添加适当的清理逻辑
- 顺序不稳定? → 测试存在对其他测试的隐藏依赖——使其独立
- 环境不稳定? → 模拟不稳定的外部服务
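For the timing case, the fix usually looks like the sketch below: replace the guessed sleep with a condition and an explicit ceiling. The export/download scenario is illustrative, not from a real suite.
对于时序不稳定的情况,修复方式通常如下示意(导出/下载场景仅为举例):

```yaml
# Before (flaky): a guessed sleep that passes locally and fails in CI
- WAIT: Wait for the export to finish
  seconds: 3
- VERIFY: The download link is visible

# After (fixed): wait on the actual condition, with an explicit ceiling
- WAIT_UNTIL: The download link is visible
  timeout_seconds: 30
```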
Project Structure
项目结构
```
my-tests/
├── test-specs/              # Spec artifacts (version-controlled)
│   ├── test-strategy.md     # Phase 1: app & risk profile
│   ├── test-spec.md         # Phase 2: Given/When/Then scenarios
│   └── test-plan.md         # Phase 3: implementation plan
│
├── playwright.config.ts
├── package.json
├── .env                     # API keys + credentials (gitignored)
├── .gitignore
│
├── tests/
│   ├── public-app/          # No login needed
│   │   ├── search.test.yaml
│   │   └── filter.test.yaml
│   │
│   └── my-saas-app/         # Requires login
│       ├── auth.setup.ts    # Playwright login setup — you write this
│       ├── dashboard.test.yaml
│       └── settings.test.yaml
```

The `test-specs/` directory contains human-readable markdown artifacts that are version-controllable. Do NOT add `test-specs/` to `.gitignore`.

```
my-tests/
├── test-specs/              # 规范产物(需版本控制)
│   ├── test-strategy.md     # 阶段1:应用与风险概况
│   ├── test-spec.md         # 阶段2:Given/When/Then场景
│   └── test-plan.md         # 阶段3:实施计划
│
├── playwright.config.ts
├── package.json
├── .env                     # API密钥 + 凭证(已加入git忽略)
├── .gitignore
│
├── tests/
│   ├── public-app/          # 无需登录
│   │   ├── search.test.yaml
│   │   └── filter.test.yaml
│   │
│   └── my-saas-app/         # 需要登录
│       ├── auth.setup.ts    # Playwright登录配置——由你编写
│       ├── dashboard.test.yaml
│       └── settings.test.yaml
```

`test-specs/`目录包含人类可读的Markdown产物,可纳入版本控制。切勿将`test-specs/`加入`.gitignore`。

Tips
小贴士
- ACTION statements with locators replay ~10x faster than DRAFTs. Always prefer ACTIONs.
- Use `inspect_page` to understand page state. Always read the DOM file first — it provides element indices needed for `act` and consumes far fewer tokens. Only view the screenshot when you specifically need visual information (layout, colors, images), as screenshots consume significantly more tokens than DOM.
- Run a specific project's tests with: `npx shiplight test --headed my-saas-app/`
- The `.env` file is auto-discovered by `shiplightConfig()` — no manual dotenv setup needed.
- 带有定位器的ACTION语句重放速度比DRAFT快约10倍。始终优先使用ACTION。
- 使用`inspect_page`了解页面状态。始终先读取DOM文件——它提供`act`所需的元素索引,且消耗的token远少于截图。仅当特别需要视觉信息(布局、颜色、图片)时才查看截图,因为截图消耗的token远多于DOM。
- 运行特定项目的测试:`npx shiplight test --headed my-saas-app/`
- `.env`文件会被`shiplightConfig()`自动识别——无需手动配置dotenv。