qa

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

QA

QA

Drive a website with a real browser, judge how well it does the thing the user asked about, and return a score from 1 (broken) to 5 (excellent) with evidence. The deliverable is a verdict, not a screenshot dump.
使用真实浏览器驱动网站,判断其在用户要求的功能上表现如何,并返回1(失效)至5(优秀)的评分及相关证据。交付成果为评估结论,而非截图堆砌。

Inputs

输入信息

From the user's invocation (the text after
/qa
, or their message):
  • Target — a URL (
    https://…
    ) or a local dev server (
    localhost:5173
    ,
    :3000
    , "the app on 5173"). Required — if absent, ask for it before doing anything else.
  • What to test (optional) — a flow or focus ("the signup", "search + filters"). If omitted, test the most obvious happy path and say so in the report.
来自用户的调用指令(
/qa
后的文本,或用户的消息):
  • 测试目标 — 一个URL(
    https://…
    )或本地开发服务器(
    localhost:5173
    :3000
    、“5173端口上的应用”)。必填项 — 若缺失,在执行任何操作前先向用户索要。
  • 测试内容(可选) — 某个流程或测试重点(“注册流程”、“搜索+筛选功能”)。若未指定,则测试最常见的正常流程,并在报告中说明。

Can you see images? (decide this first)

能否查看图片?(需先判断)

This skill's verdict is visual — you judge the app by looking at screenshots. So before anything else, check whether you — the agent running this skill — can actually see images:
  • You have vision (multimodal / image input) → you can judge screenshots yourself. Continue to "Single flow vs. fan-out" below and choose by scale.
  • You have no vision (text-only model, no image support) → you cannot judge screenshots, and neither can same-model subagents you'd spawn. You must hand the visual judgment to Browser Use v2 cloud agents, whose own LLM looks at the page server-side and returns a text verdict (
    judge
    pass/fail + a 1–5
    structuredOutput
    ). Use v2 for every flow — even a single one — per
    references/browser-use-v2.md
    . Do not drive
    browser-harness
    yourself to read screenshots, and do not fan out to your own (equally blind) subagents. The single-flow-vs-fan-out choice below does not apply to you — it's v2 either way.
If you're unsure whether you can see images, assume you can't and use v2.
本技能的评估结论是基于视觉判断的 — 你需要通过截图来评判应用。因此在开始任何操作前,请先确认运行本技能的Agent是否能够查看图片:
  • 具备视觉能力(多模态/支持图像输入) → 你可以自行判断截图内容。继续查看下方的“单一流程 vs 并行测试”部分,根据测试规模选择合适方式。
  • 无视觉能力(纯文本模型,不支持图像) → 你无法判断截图,且你生成的同模型子Agent也无法判断。你必须将视觉判断任务交给Browser Use v2云Agent,其内置LLM会在服务器端查看页面并返回文本形式的评估结论(
    judge
    结果:通过/失败 + 1–5分的
    structuredOutput
    )。无论测试规模如何,所有流程都必须使用v2版本,遵循
    references/browser-use-v2.md
    的说明。请勿自行使用
    browser-harness
    读取截图,也请勿生成同样无视觉能力的子Agent。下方的“单一流程 vs 并行测试”选择不适用于此情况 — 无论何种规模都使用v2版本。
若不确定是否具备视觉能力,请默认按无视觉能力处理,使用v2版本。

Single flow vs. fan-out (only if you can see images)

单一流程 vs 并行测试(仅适用于具备视觉能力的情况)

Scale the approach to the ask:
  • Testing one flow / one thing? Don't bother with subagents — drive
    browser-harness
    directly
    yourself, following
    references/methodology.md
    . That's the right, lowest-overhead tool for a single test, and it's how the rest of this skill works.
  • Testing many flows / a lot at once? Fan out to subagents — one per flow — so they run in parallel. Here the user has a choice of subagent type (ask if unclear; recommend v2):
    • Browser Use v2 cloud agents — recommended. Each flow becomes an autonomous v2 task with
      judge
      (pass/fail) +
      structuredOutput
      (1–5 score), running server-side and in parallel, returning step-by-step screenshot evidence. Spends Browser Use credits (~$0.01/task + ~$0.006/step + $0.02/hr browser). Per-task flow + how to fan out:
      references/browser-use-v2.md
      .
    • Your harness's built-in subagents — spawn Claude Code subagents (the Agent tool), each driving
      browser-harness
      through
      references/methodology.md
      . No Browser Use task credits; uses your agent's own usage.
Rule of thumb (vision agents only — text-only agents use v2 for everything, see above): one flow → browser-harness directly; many flows → subagents (v2 recommended). Either way
browser-harness
is required — as the direct driver, the subagent driver, the v2 key store, and the localhost tunnel.
根据需求调整测试方式:
  • 仅测试一个流程/一项功能? 无需使用子Agent — 直接使用
    browser-harness
    驱动测试
    ,遵循
    references/methodology.md
    的说明。这是针对单一测试的最优低开销工具,也是本技能的常规运行方式。
  • 需测试多个流程/大量内容? 生成子Agent并行测试 — 每个流程对应一个子Agent。在此情况下,用户可选择子Agent类型(若不明确可询问用户;推荐使用v2版本):
    • Browser Use v2云Agent — 推荐。每个流程会成为一个独立的v2任务,包含**
      judge
      (通过/失败) +
      structuredOutput
      (1–5分评分),在服务器端
      并行运行**,并返回分步截图证据。会消耗Browser Use积分(约$0.01/任务 + ~$0.006/步骤 + $0.02/小时浏览器时长)。单任务流程及并行测试方式参考:
      references/browser-use-v2.md
    • 内置子Agent — 生成Claude Code子Agent(通过Agent工具),每个子Agent通过
      references/methodology.md
      的说明使用
      browser-harness
      驱动测试。无需消耗Browser Use任务积分;仅使用当前Agent自身的资源。
经验法则(仅适用于具备视觉能力的Agent — 纯文本Agent所有情况都使用v2版本,见上文):单一流程 → 直接使用browser-harness;多流程 → 使用子Agent(推荐v2)。无论哪种方式,
browser-harness
都是必需的 — 它可作为直接驱动工具、子Agent驱动工具、v2密钥存储工具,以及localhost隧道工具。

Dependency: browser-harness (required — install it yourself)

依赖工具:browser-harness(必需 — 自行安装)

This skill runs the test through browser-harness — a separate tool you install once. It is not optional; QA must run on a real Browser Use cloud browser, never the user's local Chrome.
Before anything else, verify it's available:
bash
command -v browser-harness && browser-harness <<'PY'
print("browser-harness OK")
PY
If
browser-harness
is not on
PATH
, install it yourself — don't make the user do it. QA runs on a cloud browser, so the CLI is all you need: none of browser-harness's local-browser setup (
chrome://inspect
,
--remote-debugging-port
, the "Allow remote debugging" popup) applies here — skip all of it. The install is one-time (~30s), no clone:
bash
command -v uv || curl -LsSf https://astral.sh/uv/install.sh | sh   # the uv installer, only if missing
uv tool install "git+https://github.com/browser-use/browser-harness"
command -v browser-harness                                         # verify it's on PATH now
(No
uv
and can't
curl | sh
? Install uv per https://docs.astral.sh/uv/getting-started/installation/ then re-run the
uv tool install
line — or
pipx install "git+https://github.com/browser-use/browser-harness"
.)
The only other thing a run needs is a
BROWSER_USE_API_KEY
, and that also auto-resolves with no human — see
references/methodology.md
step 0 (it can self-sign-up for a free key). So a fresh machine goes from "just installed
qa
" to a working test without the user doing any setup.
Last resort only — if you genuinely can't install it (no network, no Python): stop and have the user open https://www.browser-harness.com/ and paste its "prompt for LLMs" into an agent:
Set up https://github.com/browser-use/browser-harness for me. Read
install.md
first to install and connect this repo to my real browser. Then read
SKILL.md
for normal usage. Always read
helpers.py
because that is where the functions are. When you open a setup or verification tab, activate it so I can see the active browser tab. After it is installed, open this repository in my browser and, if I am logged in to GitHub, ask me whether you should star it for me as a quick demo that the interaction works — only click the star if I say yes. If I am not logged in, just go to browser-use.com.
Re-run
command -v browser-harness
and don't continue until it succeeds. Never fall back to the user's local Chrome.
Do not attempt to QA with anything other than browser-harness + a cloud browser.
本技能通过browser-harness运行测试 — 这是一个需要安装一次的独立工具。它是必备项;QA测试必须在真实的Browser Use云浏览器上运行,绝不能使用用户本地的Chrome浏览器。
在开始任何操作前,先验证工具是否可用:
bash
command -v browser-harness && browser-harness <<'PY'
print("browser-harness OK")
PY
browser-harness
不在
PATH
中,自行安装 — 不要让用户操作。QA测试在浏览器上运行,因此只需安装CLI即可:无需进行browser-harness的本地浏览器设置(
chrome://inspect
--remote-debugging-port
、“允许远程调试”弹窗) — 跳过所有这些步骤。安装为一次性操作(约30秒),无需克隆仓库:
bash
command -v uv || curl -LsSf https://astral.sh/uv/install.sh | sh   # 仅在缺失时安装uv
uv tool install "git+https://github.com/browser-use/browser-harness"
command -v browser-harness                                         # 验证是否已添加至PATH
(若无法安装uv且无法执行
curl | sh
?请按照https://docs.astral.sh/uv/getting-started/installation/的说明安装uv,然后重新运行`uv tool install
命令 — 或使用
pipx install "git+https://github.com/browser-use/browser-harness"`。)
运行测试所需的另一项是
BROWSER_USE_API_KEY
,它也会自动获取,无需人工干预 — 参见
references/methodology.md
的步骤0(可自动注册获取免费密钥)。因此,全新机器从“刚安装
qa
”到可运行测试,无需用户进行任何设置。
仅作为最后手段 — 若确实无法安装该工具(无网络、无Python环境):停止操作,让用户打开**https://www.browser-harness.com/**,并将其**“LLMs提示词”**粘贴至Agent中:
重新运行
command -v browser-harness
,确认成功后再继续。绝不要退而求其次使用用户本地的Chrome浏览器。
请勿尝试使用browser-harness + 云浏览器以外的任何方式进行QA测试。

Procedure

测试流程

  1. Confirm the target is reachable (
    curl -s -o /dev/null -w "%{http_code}" <url>
    ), and identify what the app is (title, README) so you can frame a sensible test task.
  2. Run it — first apply the vision gate from "Can you see images?" above:
    • No vision (text-only)? → use v2 cloud agents for every flow (one v2 task per flow, each with
      judge
      + a 1–5
      structuredOutput
      ), per
      references/browser-use-v2.md
      . Skip the single-flow/subagent split below — it's v2 regardless of scale. Tunnel a
      localhost
      target and pass the public
      startUrl
      .
    • Vision, one flow → drive browser-harness directly per
      references/methodology.md
      : resolve the key, tunnel localhost, and run the test loop with the field-tested gotchas (host-header rewrite, proxy-off, per-tab interstitial header, CORS-pinned APIs).
    • Vision, many flows → fan out, one subagent per flow:
      • v2 agents (recommended) → per
        references/browser-use-v2.md
        , create one task per flow (each with
        judge
        + a 1–5
        structuredOutput
        schema), poll them all, and collect the verdicts. A
        localhost
        target still needs a tunnel (the cloud agent can't reach localhost) — tunnel it and pass the public
        startUrl
        .
      • Claude subagents → spawn one Agent per flow, each following
        references/methodology.md
        on browser-harness.
    Whenever you use the v2 backend (either the no-vision case or the recommended fan-out): as each task is created, open its dashboard thread
    https://cloud.browser-use.com/thread/{sessionId}
    (the session id from the create response — not the task id) in the user's local Chrome so they can watch the agent run — the reference's
    open_local(...)
    helper does this. Always print the URL too.
  3. Tear down what you started — stop everything you opened, on every path:
    • Tunnel — if you tunneled a
      localhost
      target, kill the tunnel process. This applies to every path that tunnels, v2 included (a v2 run against localhost still starts a tunnel) — don't leave it orphaned.
    • Cloud browser — the one-flow / Claude paths drive a browser-harness cloud browser; stop it.
    • A v2 task against a public URL has nothing to tear down (its one-off session auto-closes); a v2 task against localhost still leaves the tunnel, so close that.
  4. Return the verdict: lead with
    Score: N/5
    , then task, result, what worked, issues (tagged), edge cases, and evidence — per the rubric and output format in
    references/methodology.md
    . Fanning out? Give a per-flow
    Score: N/5
    line and an overall score that reflects the weakest critical path (don't average a broken flow up because others passed).
Scale effort to the ask: a quick "does X work?" is a few interactions and one score; "thoroughly QA this" warrants more flows and edge cases. Keep the verdict honest, specific, and reproducible.
  1. 确认目标可访问(执行
    curl -s -o /dev/null -w "%{http_code}" <url>
    ),并识别应用信息(标题、README),以便制定合理的测试任务。
  2. 执行测试 — 首先根据上文“能否查看图片?”的判断选择对应方式:
    • 无视觉能力(纯文本)? → 对所有流程使用v2云Agent(每个流程对应一个v2任务,包含
      judge
      + 1–5分的
      structuredOutput
      ),遵循
      references/browser-use-v2.md
      的说明。跳过下方的单一流程/子Agent选择 — 无论规模如何都使用v2版本。为
      localhost
      目标建立隧道,并传递公开的
      startUrl
    • 具备视觉能力,单一流程 → 直接使用browser-harness驱动测试,遵循
      references/methodology.md
      的说明:解析密钥、建立localhost隧道、运行测试循环并处理经过验证的常见问题(主机头重写、关闭代理、每页插页式头部、CORS固定API)。
    • 具备视觉能力,多流程 → 生成子Agent并行测试,每个流程对应一个子Agent:
      • v2 Agent(推荐) → 遵循
        references/browser-use-v2.md
        ,为每个流程创建一个任务(每个任务包含
        judge
        + 1–5分的
        structuredOutput
        schema),轮询所有任务并收集评估结论。若目标为
        localhost
        仍需建立隧道(云Agent无法访问localhost) — 建立隧道并传递公开的
        startUrl
      • Claude子Agent → 为每个流程生成一个Agent,每个Agent遵循
        references/methodology.md
        的说明使用browser-harness。
    无论何时使用v2后端(无视觉能力情况或推荐的并行测试情况):创建每个任务时,在用户本地Chrome中打开其仪表板线程
    https://cloud.browser-use.com/thread/{sessionId}
    (创建响应中的session id — 并非任务id),以便用户可以查看Agent的运行过程 — 参考文档中的
    open_local(...)
    辅助函数可实现此操作。同时务必打印该URL。
  3. 清理资源 — 关闭所有你启动的进程,无论采用哪种测试路径:
    • 隧道 — 若为
      localhost
      目标建立了隧道,终止隧道进程。此规则适用于所有建立隧道的测试路径,包括v2版本(针对localhost的v2测试仍会启动隧道) — 不要让隧道进程成为孤儿进程。
    • 云浏览器 — 单一流程/Claude子Agent路径会驱动browser-harness云浏览器;停止该浏览器。
    • 针对公开URL的v2测试无需清理资源(其一次性会话会自动关闭);针对localhost的v2测试仍会留下隧道,因此需关闭该隧道。
  4. 返回评估结论:开头标注
    Score: N/5
    ,然后依次说明测试任务、结果、正常工作的部分、问题(标记出来)、边缘情况及证据 — 遵循
    references/methodology.md
    中的评分标准和输出格式。并行测试? 为每个流程标注
    Score: N/5
    ,并给出反映最薄弱关键路径的整体评分(不要因为其他流程通过就拉高失效流程的评分)。
根据需求调整测试力度:快速的“X功能是否可用?”只需几次交互和一个评分;“全面QA测试此应用”则需要测试更多流程和边缘情况。评估结论需真实、具体且可复现。