browser-qa

Browser QA

You're about to drive a real browser to verify a feature. This skill exists because there are three browser stacks available (Playwright MCP, Claude-in-Chrome MCP, computer-use MCP), they have very different failure modes, and picking the wrong one for the wrong stage of work burns wall clock for no reason.

First-time setup — DO THIS BEFORE ANYTHING ELSE

You will be blocked mid-task if you wait until you need a permission to ask for it. Ask for everything up front, in parallel, in the very first message of the QA phase:
  1. Request computer-use access for Chrome by calling `mcp__computer-use__request_access` with `apps: ["Google Chrome"]` and a one-sentence reason. Chrome is a tier-"read" app: you'll be able to take screenshots through the OS compositor but not click. That's exactly what you want: real-Chrome screenshots without having to fight focus.
  2. Verify the Claude-in-Chrome extension is connected by calling `mcp__claude-in-chrome__tabs_context_mcp` with `createIfEmpty: true`. If this returns an error, the extension isn't installed or enabled. Stop and ask the user to install it rather than falling back to a worse stack.
  3. Verify Playwright MCP works by calling `mcp__playwright__browser_close` once. It is a no-op if no tab is open, and it surfaces "browser already in use" lock errors early so you can run `pkill -f "mcp-chrome-"` before they bite you mid-task.
Do all three in parallel in one message. If anything is missing, surface it immediately; don't start QA half-blind.
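
The three checks above can be written down as data so that none gets skipped. This is only a sketch: `callTool` is a hypothetical function standing in for however your harness invokes an MCP tool, and the `reason` key is an assumed parameter name. The tool names and the `apps` / `createIfEmpty` arguments are the ones listed above.

```javascript
// Preflight checks from the setup list, expressed as data.
const PREFLIGHT = [
  {
    tool: 'mcp__computer-use__request_access',
    // `reason` is an assumed key name for the one-sentence reason.
    args: { apps: ['Google Chrome'], reason: 'Screenshot real Chrome for QA artefacts' },
  },
  { tool: 'mcp__claude-in-chrome__tabs_context_mcp', args: { createIfEmpty: true } },
  { tool: 'mcp__playwright__browser_close', args: {} },
];

// Fire all checks in parallel and report which ones failed.
// `callTool(name, args)` is hypothetical and injected by the caller.
async function runPreflight(callTool) {
  const settled = await Promise.allSettled(
    PREFLIGHT.map(check => callTool(check.tool, check.args))
  );
  return PREFLIGHT.map((check, i) => ({
    tool: check.tool,
    ok: settled[i].status === 'fulfilled',
    error: settled[i].status === 'rejected' ? String(settled[i].reason) : null,
  }));
}
```

Any entry with `ok: false` is something to raise with the user before QA starts.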

When to use which stack

Use Playwright MCP for iterative dev-loop QA: every "did this fix work?" check while you're still building. Reasons: snapshot-aware `wait_for`, fast happy paths (~10 s for a navigate + click + screenshot), and failures that sit at the prompt layer (wrong URL, short wait threshold) and are recoverable by retrying smarter. Headless Chromium screenshots are dense, clinical, and ideal for "is the UI correct?" reviews.
Use Claude-in-Chrome + computer-use for end-of-task artefacts — PR screenshots, demo GIFs, customer bug repros. Reasons: the captured surface is the real Chrome the user sees (tab strip, profile avatar, extensions, DEV stripe, the "Claude is active in this tab group" pill). That matches reviewer expectations. Don't use it for the inner dev loop — it's slower and has runtime-layer failures you can't fix from a prompt.
Use computer-use directly only for native desktop apps — anything that isn't a web app. For browsers, prefer the dedicated MCPs above; computer-use's role in browser QA is just the screenshot grab on top of CiC.
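
The three rules above reduce to a small decision function. The sketch below is a restatement, not an API: `chooseStack`, its `target` / `stage` parameters, and the string labels are all hypothetical names chosen for illustration.

```javascript
// Map the kind of work to the stack recommended above.
// target: 'web' | 'native'; stage: 'dev-loop' | 'artefact' (hypothetical labels).
function chooseStack({ target, stage }) {
  if (target === 'native') return 'computer-use';
  if (stage === 'dev-loop') return 'playwright';
  if (stage === 'artefact') return 'claude-in-chrome + computer-use';
  throw new Error(`unknown stage: ${stage}`);
}
```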

Failure modes by stack (so you don't relearn them)

Playwright MCP failures (all fixable from your side):
  • Stale browser lock from a previous session: run `pkill -f "mcp-chrome-"`, then retry.
  • 404s from guessing URL paths: read the project's routing config (Next.js `pages/` / `app/`, React Router, file-based routers, etc.) before navigating. Don't assume the URL pattern from the feature name.
  • `browser_take_screenshot` rejects subdir paths and `/tmp/` ("outside allowed roots"): use a flat filename in cwd, then `mv` it after.
  • `wait_for` text undershoots portal-based components (Radix, Headless UI, Chakra v3, MUI Modal). Wait for the content of the portal (e.g. a field label, a heading inside the dialog), not the trigger that opened it.
  • Clicking visible text via `evaluate` often doesn't trigger the right handler. Many component libraries (Chakra, Radix, shadcn/ui, MUI) attach click handlers to inner elements like a `<button>` or an icon trigger, not to the table row or list item the user visually sees. Walk up/down the DOM to find the actual interactive element, usually a `<button>` or `[role="button"]`, and click that.
  • A cookie set in `evaluate` may be lost across navigations. If your app redirects unauthenticated requests, navigate first to a non-redirecting endpoint (a static asset, a JSON API route, anything that returns 200 without bouncing), set the cookie there, then navigate to the target.
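
For the "walk up/down the DOM" advice, a minimal sketch follows. It relies only on the standard `Element.closest` and `Element.querySelector` methods. The helper name `findClickable` and the choice to search ancestors before descendants are assumptions made for illustration.

```javascript
// Selector for elements that usually own the click handler.
const INTERACTIVE = 'button, [role="button"]';

// Given the element the user visually sees (a row, a list item, a label),
// return the nearest interactive element: the element itself or an ancestor
// first, then the first matching descendant, else null.
function findClickable(el) {
  return el.closest(INTERACTIVE) || el.querySelector(INTERACTIVE) || null;
}
```

Inside a page context this could be used as `findClickable(rowElement)?.click()`, where `rowElement` is whichever element carries the visible text.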
Claude-in-Chrome failures (runtime-layer, harder to recover):
  • Backgrounded-tab throttling. Chrome aggressively pauses background tabs. If your CiC tab isn't the foreground tab in the user's Chrome window, React/Next will not hydrate, `document.querySelectorAll('p').length` will sit at 0, and your polling JS will return `{ timeout: true }` indefinitely. Once stuck, the next `javascript_tool` call typically hits a 45 s CDP `Runtime.evaluate` timeout. Mitigation: take a screenshot via `mcp__claude-in-chrome__computer` with action `screenshot` before you start polling; that brings the tab to the front. Or open the LangWatch tab in its own Chrome window that the user has visible.
  • Content filter redacts page text. `document.body.innerText` and `outerHTML` come back as `[BLOCKED: Cookie/query string data]` when the page contains anything that looks like session state. Use targeted `querySelectorAll` checks instead (`Array.from(document.querySelectorAll('p')).some(p => p.innerText === 'X')`).
  • Cookie reads blocked, cookie writes work. Don't try to read `document.cookie` to verify auth. Instead call `fetch('/api/auth/session').then(r => r.json())` and check `user.email`.
  • Query-string URLs sometimes fail to render. Direct navigation to `?drawer.open=...` URLs occasionally drops the query string. Click through the UI flow instead.
  • Distorted/blank screenshots. When the viewport hasn't been laid out (because the tab was throttled), `computer-use__screenshot` returns a blank gradient. If that happens, the tab is frozen: re-navigate, don't retry the screenshot.
  • First screenshot of a fresh CiC session shows loading spinners. The polling loop fires before tRPC queries resolve. Wait for actual content (e.g. a row label), not just `document.title`.
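
The polling and targeted-check advice above can be sketched as two small helpers. The names `hasParagraph` and `pollFor` and the default timings are hypothetical. The `{ timeout: true }` return shape mirrors the one mentioned above, and the document object is passed in so the check stays explicit about what it reads.

```javascript
// Targeted check that never reads innerText/outerHTML of the whole page.
function hasParagraph(doc, text) {
  return Array.from(doc.querySelectorAll('p')).some(p => p.innerText === text);
}

// Poll a predicate until it is truthy or the time budget runs out.
// Defaults are assumptions; the budget is kept well below the 45 s
// evaluate timeout described above.
async function pollFor(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (check()) return { timeout: false };
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  // One last look in case the content appeared during the final sleep.
  return check() ? { timeout: false } : { timeout: true };
}
```

In a page context the two combine as `pollFor(() => hasParagraph(document, 'X'))`, which waits for real content rather than for `document.title`.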

Performance expectations

From a 3-runs-each benchmark on a real "settings → open drawer → screenshot" QA flow:
| Stack | Happy path | Worst observed | Failure modes |
|---|---|---|---|
| Playwright | ~12 s | ~60 s (debug) | Prompt-layer, retryable |
| CiC + CU | ~9 s warm | ~280 s frozen | Runtime-layer, sometimes terminal |
Tool-call counts are identical on the happy path (8–9 calls). The difference is what each call returns and how often the runtime hangs. CiC's warm happy path is faster than PW's, but its tail is much worse — one CDP timeout costs you 45 s.
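
The trade-off can be made concrete with a little arithmetic. The sketch below uses a deliberately simple model that is an assumption, not part of the benchmark: each CiC run hits at most one 45 s CDP timeout, with probability `p`, and Playwright costs just its ~12 s happy path.

```javascript
// Expected CiC time under the simple model: warm happy path plus p * timeout.
function expectedCicSeconds(p, happy = 9, timeoutCost = 45) {
  return happy + p * timeoutCost;
}

// Hang probability above which CiC's expected time exceeds Playwright's
// happy path under the same model.
function breakEvenHangRate(pwHappy = 12, cicHappy = 9, timeoutCost = 45) {
  return (pwHappy - cicHappy) / timeoutCost;
}
```

With the figures from the table, the break-even rate is 3 / 45, i.e. about one run in fifteen. Under this model, CiC's warm-path advantage disappears once timeouts are more frequent than that.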

The QA flow itself

  1. Pick a dev server port that isn't fighting other agents or hard-coded auth callbacks. If the project uses an external auth provider (Auth0, Clerk, NextAuth with OAuth, Supabase, Cognito, etc.), pick a port that's already in the callback allowlist — making up an arbitrary port will silently fail the redirect. Otherwise, pick something out of the way (e.g. high four-digit) so you don't collide with whatever else the user has running.
  2. Seed test data via a script, not the UI. Write a small setup script that creates whatever rows you need (user, session token, sample records) directly via the project's DB client, ORM, or seed mechanism. Faster than clicking through onboarding, repeatable across runs, and survives session expiry mid-task.
  3. For Playwright runs: navigate → `wait_for` actual content → click → `wait_for` next content → screenshot. Always wait for something the user would see, not just `document.readyState`.
  4. For CiC runs: `tabs_context_mcp` → `tabs_create_mcp` → `navigate` → take a screenshot immediately to defeat background-tab throttling → poll for content → click → poll for next content → screenshot.
  5. Take screenshots at the moments a reviewer would care about — initial state, mid-flow, final state. Three is usually enough; ten is noise.
  6. Verify the feature, then verify the unhappy paths. "Happy path works" and "the obvious error case shows a clear message" and "form validation rejects bad input". The bug is almost always in the path you didn't QA.
  7. Don't claim the feature works until you saw it work in a browser screenshot. Tests passing is necessary, not sufficient.
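
The port rule in step 1 can be sketched as a pure function. Everything here is hypothetical: the name `choosePort`, its parameters, and the 8700–8799 fallback range are illustrative choices. How you learn which ports are busy or allowlisted depends on your project.

```javascript
// Pick a dev-server port. If the auth provider has a callback allowlist,
// only a free port from that list will work; otherwise take the first free
// port in an out-of-the-way range.
function choosePort({
  callbackAllowlist = [],
  busyPorts = [],
  fallbackStart = 8700, // assumed "high four-digit" range
  fallbackEnd = 8799,
} = {}) {
  const busy = new Set(busyPorts);
  if (callbackAllowlist.length > 0) {
    const free = callbackAllowlist.find(port => !busy.has(port));
    if (free === undefined) {
      throw new Error('every allowlisted callback port is busy');
    }
    return free;
  }
  for (let port = fallbackStart; port <= fallbackEnd; port++) {
    if (!busy.has(port)) return port;
  }
  throw new Error('no free port in fallback range');
}
```

For example, `choosePort({ callbackAllowlist: [3000, 3001], busyPorts: [3000] })` picks 3001 rather than inventing a port the auth provider would reject.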

Screenshot handling

  • Never commit screenshots to the repo unless they're explicitly user-facing docs assets. Put them in `.claude/`, `bench/`, or another gitignored location.
  • Upload to https://img402.dev/ for PR comments and bug reports:
    ```bash
    curl -F image=@screenshot.png https://img402.dev/api/free
    ```
    Returns a URL you can drop into a PR body / Slack message.
  • Embed the URLs in the PR description, not as committed files.

Ending the QA phase

You are done with browser QA when you can answer all of these "yes":
  • I navigated through the feature like a user would, not just to the screen that proves my code path runs.
  • I tried the unhappy paths (missing config, bad input, network failure simulation if relevant).
  • I have screenshots of the happy path and the most important edge case.
  • The screenshots are uploaded and linked from the PR.
  • I noticed at least one rough UX edge during QA and either fixed it or filed it.
If you can't say yes to all of those, you haven't QA'd yet — you've smoke-tested. Go back and use the feature.