playwright-testing

Frontend Testing

Build reliable confidence quickly: choose the right test layer, make the app observable, and eliminate nondeterminism so that refactors are safe and failures are actionable.

Philosophy: Confidence Per Minute

Frontend tests fail for two reasons: the product is broken, or the test is lying. Your job is to maximize signal and minimize “test is lying”.
Before writing a test, ask:
  • What user risk am I covering (money, progression, auth, data loss, “can’t start” crashes)?
  • What’s the narrowest layer that catches this bug class (pure logic vs UI vs full browser)?
  • What nondeterminism exists (time, RNG, async loading, network, animations, fonts, GPU)?
  • What “ready” signal can I wait on besides setTimeout?
  • What should a failure print/screenshot so it’s diagnosable in CI?
Core principles:
  1. Test the contract, not the implementation: assert stable user-meaningful outcomes and public seams.
  2. Prefer determinism over retries: make time/RNG/network controllable; remove flake at the source.
  3. Observe like a debugger: console errors, network failures, screenshots, and state dumps on failure.
  4. One critical flow first: a reliable smoke test beats 50 flaky tests.

Workflow Decision Tree

Pick the test type by the cheapest layer that provides the needed confidence:
  • Unit tests (fastest): pure functions, reducers, validators, math, pathfinding, deterministic simulation steps.
  • Component/integration tests (medium): UI behavior with mocked IO (React Testing Library / Vue Testing Library / Testing Library DOM).
  • E2E tests (slowest, highest confidence): critical user flows across routing, storage, real bundling/runtime.
  • Visual regression (specialized): layout/pixel regressions; for canvas/WebGL, only after locking determinism.
  • A11y checks: great for DOM UIs; limited value for pure canvas unless you expose accessible DOM overlays.
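As a minimal sketch of the cheapest layer, assume a hypothetical pure reducer named applyDamage (illustrative only, not taken from any real project). Logic like this needs no browser and can be checked with plain assertions:

```typescript
// Hypothetical pure reducer: no DOM, no timers, no network.
interface PlayerState {
  readonly hp: number;
  readonly alive: boolean;
}

function applyDamage(state: PlayerState, amount: number): PlayerState {
  // Clamp so HP never drops below zero; negative damage is ignored.
  const hp = Math.max(0, state.hp - Math.max(0, amount));
  return { hp, alive: hp > 0 };
}

// Tiny assertion helper so the sketch runs without a test runner.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

const fullHealth: PlayerState = { hp: 10, alive: true };
expectEqual(applyDamage(fullHealth, 3).hp, 7, "normal hit");
expectEqual(applyDamage(fullHealth, 25).hp, 0, "overkill clamps to zero");
expectEqual(applyDamage(fullHealth, 25).alive, false, "zero HP means dead");
expectEqual(applyDamage(fullHealth, -5).hp, 10, "negative damage is ignored");
```

If a bug class can be caught here, an E2E test for it mostly adds runtime and flake surface.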

Quick Start (Any Project)

  1. Define 1 smoke flow: “page loads → user can start → one key action works”.
  2. Choose runner:
    • Prefer Playwright for browser E2E + screenshots.
    • Prefer Testing Library for DOM component behavior.
    • Prefer unit tests for logic you can run without a browser.
  3. Add a “ready” signal in the app (DOM marker, window flag, or game event) and wait on that.
  4. Fail loudly: treat console errors and failed requests as test failures.
  5. Stabilize: seed RNG, freeze time, fix viewport/DPR, disable animations, and remove network variability.
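Step 4 ("fail loudly") could be sketched as below. The helper names collectFailures and assertNoFailures are hypothetical, and PageLike is an assumed structural subset of Playwright's Page event API (the console, pageerror, and requestfailed events), declared locally so the sketch also runs against a fake page:

```typescript
// Assumed structural subset of Playwright's Page event API.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
interface RequestLike {
  url(): string;
  failure(): { errorText: string } | null;
}
interface PageLike {
  on(event: "console", listener: (msg: ConsoleMessageLike) => void): unknown;
  on(event: "pageerror", listener: (error: Error) => void): unknown;
  on(event: "requestfailed", listener: (request: RequestLike) => void): unknown;
}

// Hypothetical helper: records everything that should fail the test.
function collectFailures(page: PageLike): string[] {
  const failures: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") failures.push(`console.error: ${msg.text()}`);
  });
  page.on("pageerror", (error) => {
    failures.push(`uncaught exception: ${error.message}`);
  });
  page.on("requestfailed", (request) => {
    const reason = request.failure()?.errorText ?? "unknown";
    failures.push(`request failed: ${request.url()} (${reason})`);
  });
  return failures;
}

// Call at the end of a test (or in an afterEach hook).
function assertNoFailures(failures: readonly string[]): void {
  if (failures.length > 0) {
    throw new Error(`Page reported ${failures.length} problem(s):\n${failures.join("\n")}`);
  }
}
```

In a real Playwright test, the listeners would be attached before navigation so early errors are not missed. Playwright's requestfailed event covers network-level failures only; HTTP 4xx/5xx responses complete normally and would need a separate check on the response event.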

Playwright Patterns (Especially Useful For Games)

Use Playwright when you need “real browser” confidence:
  • Drive input via mouse/keyboard/touch; treat the canvas like the user does.
  • Add a test seam: expose a small, stable test API on window (read-only state + a few commands).
  • Prefer waitForFunction-style readiness over sleep; gate on “scene ready” / “assets loaded” / “first frame rendered”.
  • For screenshots: lock viewport, device scale factor, fonts, and animation timing.
  • For 9-slice / canvas UI regressions: add a dedicated UI harness scene/page and assert via targeted screenshots (see references/phaser-canvas-testing.md).
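The test seam and readiness gate described in the list above might look like the following sketch. The GameState and TestSeam shapes, the setSeed command, and installTestSeam are all hypothetical names chosen for illustration. A plain object stands in for window so the sketch runs outside a browser:

```typescript
// Hypothetical game state and seam shape; names are illustrative only.
interface GameState {
  scene: string;
  score: number;
}

interface TestSeam {
  ready: boolean;                  // flipped once the scene is usable
  getState(): Readonly<GameState>; // read-only snapshot for assertions
  setSeed(seed: number): void;     // one of the "few commands"
}

// `target` is `window` in the browser; any object works for a quick check.
function installTestSeam(
  target: { __TEST__?: TestSeam },
  getState: () => GameState,
  setSeed: (seed: number) => void,
): TestSeam {
  const seam: TestSeam = {
    ready: false,
    // Return a frozen copy so tests cannot mutate live game state.
    getState: () => Object.freeze({ ...getState() }),
    setSeed,
  };
  target.__TEST__ = seam;
  return seam;
}

// The predicate a test polls instead of sleeping. With Playwright, the same
// check could be passed to page.waitForFunction, for example:
//   await page.waitForFunction(() => (window as any).__TEST__?.ready === true);
function isReady(target: { __TEST__?: TestSeam }): boolean {
  return target.__TEST__?.ready === true;
}

// App side (sketch): install early, mark ready when the first frame is rendered.
const fakeWindow: { __TEST__?: TestSeam } = {};
const live: GameState = { scene: "boot", score: 0 };
const seam = installTestSeam(fakeWindow, () => live, () => {});
live.scene = "menu";
seam.ready = true;
```

Keeping the seam this small is deliberate: the fewer fields it exposes, the less likely it is to break when the engine internals change.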
If using the Playwright MCP tools (browser automation inside Codex), follow the same mindset:
  • Use browser_console_messages and browser_network_requests to catch silent failures.
  • Use browser_evaluate to assert window.__TEST__ state and to set up deterministic mode.
  • Use browser_take_screenshot for visual assertions after determinism is enforced.

Reconnaissance-Then-Action (Borrowed From Real Debugging)

When a UI is dynamic, don’t guess selectors. Recon first, then act.
Quick decision guide:
Task → Is it static HTML (no JS runtime needed)?
  ├─ Yes → read the HTML to find stable selectors/content, then automate
  └─ No  → treat as dynamic: run the app, wait for readiness, then inspect rendered state
  1. Navigate and wait for readiness:
    • For many webapps: wait for a meaningful “loaded” element (preferred).
    • networkidle can help for SPAs, but avoid it if the app uses websockets/polling.
  2. Capture evidence (what the user actually sees):
    • screenshot (full page for DOM; targeted for canvas)
    • console errors + failed requests
  3. Discover selectors from the rendered state:
    • prefer role/text/label selectors over brittle CSS
  4. Execute actions using discovered selectors and re-check state.
Common pitfall: ❌ Inspect/interact before the app is ready. ✅ Wait on an explicit ready signal (DOM marker or window.__TEST__.ready), not a sleep.
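Step 3's selector preference can be sketched as a small ranking function. The ElementFacts shape and preferredLocator are hypothetical, and the exact order among role, label, and text is an assumption of this sketch. The output is a string naming a Playwright locator call (getByRole, getByLabel, getByText, or locator):

```typescript
// What recon might record about one rendered element (hypothetical shape).
interface ElementFacts {
  role?: string;  // ARIA role, e.g. "button"
  name?: string;  // accessible name
  label?: string; // associated label text
  text?: string;  // visible text
  css: string;    // fallback CSS path
}

// Returns a Playwright locator expression as a string, most stable first.
function preferredLocator(el: ElementFacts): string {
  const q = (s: string): string => JSON.stringify(s);
  if (el.role && el.name) return `page.getByRole(${q(el.role)}, { name: ${q(el.name)} })`;
  if (el.label) return `page.getByLabel(${q(el.label)})`;
  if (el.text) return `page.getByText(${q(el.text)})`;
  return `page.locator(${q(el.css)})`; // last resort: brittle CSS
}

const startButton: ElementFacts = {
  role: "button",
  name: "Start",
  css: "div.menu > button:nth-child(2)",
};
console.log(preferredLocator(startButton));
```

The CSS path is only used when recon found nothing user-facing to anchor on, which is itself a hint that the element may need an accessible name.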

Server Lifecycle Helper (Playwright E2E)

When the dev server isn’t already running, use the bundled helper as a black box:
  • Run python scripts/with_server.py --help first.
  • Start one (or multiple) servers, wait for their ports, then run your test command.
Example:
```bash
python scripts/with_server.py --server "npm run dev" --port 5173 -- npm test
```

Flake Reduction Checklist

  • Replace sleeps with explicit readiness conditions.
  • Control time (Date.now, timers), RNG, and animation loops.
  • Make network deterministic (mock, record/replay, or run against a seeded local backend).
  • Eliminate “first-run” differences (asset caches, fonts) or warm them explicitly.
  • Lock environment: viewport, DPR, locale/timezone, and rendering settings.
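Controlling time and RNG usually means passing them in rather than reading globals. The sketch below uses mulberry32, a small well-known seeded PRNG (not cryptographically secure), together with a hand-rolled manual clock. The names createManualClock and spawnEnemy are hypothetical stand-ins for app logic:

```typescript
// mulberry32: deterministic PRNG returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface Clock {
  now(): number;
}

// A clock that only moves when the test says so.
function createManualClock(startMs: number = 0) {
  let current = startMs;
  return {
    now: (): number => current,
    advance(ms: number): void {
      current += ms;
    },
  };
}

// Hypothetical game logic that takes randomness and time as inputs
// instead of calling Math.random() and Date.now() directly.
function spawnEnemy(rng: () => number, clock: Clock): { x: number; spawnedAt: number } {
  return { x: Math.floor(rng() * 800), spawnedAt: clock.now() };
}

// Same seed and same clock give the same result on every run.
const runA = spawnEnemy(mulberry32(42), createManualClock(1000));
const runB = spawnEnemy(mulberry32(42), createManualClock(1000));
console.log(runA.x === runB.x && runA.spawnedAt === runB.spawnedAt);
```

In a browser test, one option is to install the seed before any app code runs, for example through Playwright's page.addInitScript or a command on the test seam.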

Anti-Patterns to Avoid

Testing the wrong layer: E2E tests for pure logic. Better: unit tests for logic; reserve E2E for integration contracts.
Testing implementation details: asserting DOM structure/classnames or internal engine objects. Better: assert user-meaningful outputs (text, navigation, score/HP changes) or a small stable test seam.
Sleep-driven tests: wait 2s then click. Better: wait on explicit readiness (DOM marker, event, window flag).
Uncontrolled randomness: RNG/time-based behaviors in assertions. Better: seed RNG, freeze time, and assert stable invariants.
Pixel snapshots without determinism (especially canvas/WebGL). Better: add deterministic mode first; then screenshot selectively.
Snapshot explosion: hundreds of snapshots that no one can interpret. Better: keep snapshots targeted (critical screens); prefer specific assertions for behavior.
Retries as a strategy: “just bump retries in CI”. Better: fix readiness and determinism; use retries only as temporary guardrails.

Variation Guidance (Prevent One-Size-Fits-All)

Vary the approach based on:
  • UI type: DOM app vs canvas/WebGL game vs hybrid.
  • Risk: core revenue/progression flows get E2E first; edge UI polish gets component tests.
  • CI constraints: headless-only, limited GPU, slow CPUs, no audio devices.
  • Test seam availability: if you can add a stable window.__TEST__ API, assert state; if not, stick to black-box input/output.

Remember

You can make almost any frontend (including canvas/WebGL games) testable by adding a tiny, stable seam for readiness + state. This skill is meant to empower creative, high-signal testing rather than cargo-cult checklists. Aim for tests that are boring to maintain: deterministic, explicit about readiness, and rich in failure evidence. One reliable smoke test is the foundation; everything else compounds from there.

Bundled Resources

Read these only when needed:
  • references/playwright-mcp-cheatsheet.md: patterns for using Playwright MCP tools for assertions, waiting, and diagnostics.
  • references/phaser-canvas-testing.md: deterministic mode + hooks for Phaser/canvas/WebGL games.
  • references/flake-reduction.md: deeper flake triage and stabilization tactics.
Use these scripts as black boxes (run --help first; don’t read source unless you must):
  • scripts/with_server.py: start/wait/stop one or more dev servers around a test command.
  • scripts/imgdiff.py: lightweight screenshot diff helper (requires pip install pillow).