qa
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseQA
QA
Drive a website with a real browser, judge how well it does the thing the user asked about, and return a score from 1 (broken) to 5 (excellent) with evidence. The deliverable is a verdict, not a screenshot dump.
使用真实浏览器驱动网站,判断其在用户要求的功能上表现如何,并返回1(失效)至5(优秀)的评分及相关证据。交付成果为评估结论,而非截图堆砌。
Inputs
输入信息
From the user's invocation (the text after , or their message):
/qa- Target — a URL () or a local dev server (
https://…,localhost:5173, "the app on 5173"). Required — if absent, ask for it before doing anything else.:3000 - What to test (optional) — a flow or focus ("the signup", "search + filters"). If omitted, test the most obvious happy path and say so in the report.
来自用户的调用指令(后的文本,或用户的消息):
/qa- 测试目标 — 一个URL()或本地开发服务器(
https://…、localhost:5173、“5173端口上的应用”)。必填项 — 若缺失,在执行任何操作前先向用户索要。:3000 - 测试内容(可选) — 某个流程或测试重点(“注册流程”、“搜索+筛选功能”)。若未指定,则测试最常见的正常流程,并在报告中说明。
Can you see images? (decide this first)
能否查看图片?(需先判断)
This skill's verdict is visual — you judge the app by looking at screenshots. So before anything else, check whether you — the agent running this skill — can actually see images:
- You have vision (multimodal / image input) → you can judge screenshots yourself. Continue to "Single flow vs. fan-out" below and choose by scale.
- You have no vision (text-only model, no image support) → you cannot judge screenshots, and neither can same-model subagents you'd spawn. You must hand the visual judgment to Browser Use v2 cloud agents, whose own LLM looks at the page server-side and returns a text verdict (pass/fail + a 1–5
judge). Use v2 for every flow — even a single one — perstructuredOutput. Do not drivereferences/browser-use-v2.mdyourself to read screenshots, and do not fan out to your own (equally blind) subagents. The single-flow-vs-fan-out choice below does not apply to you — it's v2 either way.browser-harness
If you're unsure whether you can see images, assume you can't and use v2.
本技能的评估结论是基于视觉判断的 — 你需要通过截图来评判应用。因此在开始任何操作前,请先确认运行本技能的Agent是否能够查看图片:
- 具备视觉能力(多模态/支持图像输入) → 你可以自行判断截图内容。继续查看下方的“单一流程 vs 并行测试”部分,根据测试规模选择合适方式。
- 无视觉能力(纯文本模型,不支持图像) → 你无法判断截图,且你生成的同模型子Agent也无法判断。你必须将视觉判断任务交给Browser Use v2云Agent,其内置LLM会在服务器端查看页面并返回文本形式的评估结论(结果:通过/失败 + 1–5分的
judge)。无论测试规模如何,所有流程都必须使用v2版本,遵循structuredOutput的说明。请勿自行使用references/browser-use-v2.md读取截图,也请勿生成同样无视觉能力的子Agent。下方的“单一流程 vs 并行测试”选择不适用于此情况 — 无论何种规模都使用v2版本。browser-harness
若不确定是否具备视觉能力,请默认按无视觉能力处理,使用v2版本。
Single flow vs. fan-out (only if you can see images)
单一流程 vs 并行测试(仅适用于具备视觉能力的情况)
Scale the approach to the ask:
- Testing one flow / one thing? Don't bother with subagents — drive directly yourself, following
browser-harness. That's the right, lowest-overhead tool for a single test, and it's how the rest of this skill works.references/methodology.md - Testing many flows / a lot at once? Fan out to subagents — one per flow — so they run in parallel. Here the user has a choice of subagent type (ask if unclear; recommend v2):
- Browser Use v2 cloud agents — recommended. Each flow becomes an autonomous v2 task with (pass/fail) +
judge(1–5 score), running server-side and in parallel, returning step-by-step screenshot evidence. Spends Browser Use credits (~$0.01/task + ~$0.006/step + $0.02/hr browser). Per-task flow + how to fan out:structuredOutput.references/browser-use-v2.md - Your harness's built-in subagents — spawn Claude Code subagents (the Agent tool), each driving through
browser-harness. No Browser Use task credits; uses your agent's own usage.references/methodology.md
- Browser Use v2 cloud agents — recommended. Each flow becomes an autonomous v2 task with
Rule of thumb (vision agents only — text-only agents use v2 for everything, see above): one flow → browser-harness directly; many flows → subagents (v2 recommended). Either way is required — as the direct driver, the subagent driver, the v2 key store, and the localhost tunnel.
browser-harness根据需求调整测试方式:
- 仅测试一个流程/一项功能? 无需使用子Agent — 直接使用驱动测试,遵循
browser-harness的说明。这是针对单一测试的最优低开销工具,也是本技能的常规运行方式。references/methodology.md - 需测试多个流程/大量内容? 生成子Agent并行测试 — 每个流程对应一个子Agent。在此情况下,用户可选择子Agent类型(若不明确可询问用户;推荐使用v2版本):
- Browser Use v2云Agent — 推荐。每个流程会成为一个独立的v2任务,包含**(通过/失败) +
judge(1–5分评分),在服务器端并行运行**,并返回分步截图证据。会消耗Browser Use积分(约$0.01/任务 + ~$0.006/步骤 + $0.02/小时浏览器时长)。单任务流程及并行测试方式参考:structuredOutput。references/browser-use-v2.md - 内置子Agent — 生成Claude Code子Agent(通过Agent工具),每个子Agent通过的说明使用
references/methodology.md驱动测试。无需消耗Browser Use任务积分;仅使用当前Agent自身的资源。browser-harness
- Browser Use v2云Agent — 推荐。每个流程会成为一个独立的v2任务,包含**
经验法则(仅适用于具备视觉能力的Agent — 纯文本Agent所有情况都使用v2版本,见上文):单一流程 → 直接使用browser-harness;多流程 → 使用子Agent(推荐v2)。无论哪种方式,都是必需的 — 它可作为直接驱动工具、子Agent驱动工具、v2密钥存储工具,以及localhost隧道工具。
browser-harnessDependency: browser-harness (required — install it yourself)
依赖工具:browser-harness(必需 — 自行安装)
This skill runs the test through browser-harness — a separate tool you install once. It is not optional; QA must run on a real Browser Use cloud browser, never the user's local Chrome.
Before anything else, verify it's available:
bash
command -v browser-harness && browser-harness <<'PY'
print("browser-harness OK")
PYIf is not on , install it yourself — don't make the user do it. QA runs on a cloud browser, so the CLI is all you need: none of browser-harness's local-browser setup (, , the "Allow remote debugging" popup) applies here — skip all of it. The install is one-time (~30s), no clone:
browser-harnessPATHchrome://inspect--remote-debugging-portbash
command -v uv || curl -LsSf https://astral.sh/uv/install.sh | sh # the uv installer, only if missing
uv tool install "git+https://github.com/browser-use/browser-harness"
command -v browser-harness # verify it's on PATH now(No and can't ? Install uv per https://docs.astral.sh/uv/getting-started/installation/ then re-run the line — or .)
uvcurl | shuv tool installpipx install "git+https://github.com/browser-use/browser-harness"The only other thing a run needs is a , and that also auto-resolves with no human — see step 0 (it can self-sign-up for a free key). So a fresh machine goes from "just installed " to a working test without the user doing any setup.
BROWSER_USE_API_KEYreferences/methodology.mdqaLast resort only — if you genuinely can't install it (no network, no Python): stop and have the user open https://www.browser-harness.com/ and paste its "prompt for LLMs" into an agent:
Set up https://github.com/browser-use/browser-harness for me. Readfirst to install and connect this repo to my real browser. Then readinstall.mdfor normal usage. Always readSKILL.mdbecause that is where the functions are. When you open a setup or verification tab, activate it so I can see the active browser tab. After it is installed, open this repository in my browser and, if I am logged in to GitHub, ask me whether you should star it for me as a quick demo that the interaction works — only click the star if I say yes. If I am not logged in, just go to browser-use.com.helpers.py
Re-run and don't continue until it succeeds. Never fall back to the user's local Chrome.
command -v browser-harnessDo not attempt to QA with anything other than browser-harness + a cloud browser.
本技能通过browser-harness运行测试 — 这是一个需要安装一次的独立工具。它是必备项;QA测试必须在真实的Browser Use云浏览器上运行,绝不能使用用户本地的Chrome浏览器。
在开始任何操作前,先验证工具是否可用:
bash
command -v browser-harness && browser-harness <<'PY'
print("browser-harness OK")
PY若不在中,自行安装 — 不要让用户操作。QA测试在云浏览器上运行,因此只需安装CLI即可:无需进行browser-harness的本地浏览器设置(、、“允许远程调试”弹窗) — 跳过所有这些步骤。安装为一次性操作(约30秒),无需克隆仓库:
browser-harnessPATHchrome://inspect--remote-debugging-portbash
command -v uv || curl -LsSf https://astral.sh/uv/install.sh | sh # 仅在缺失时安装uv
uv tool install "git+https://github.com/browser-use/browser-harness"
command -v browser-harness # 验证是否已添加至PATH(若无法安装uv且无法执行?请按照https://docs.astral.sh/uv/getting-started/installation/的说明安装uv,然后重新运行`uv tool installpipx install "git+https://github.com/browser-use/browser-harness"`。)
curl | sh命令 — 或使用运行测试所需的另一项是,它也会自动获取,无需人工干预 — 参见的步骤0(可自动注册获取免费密钥)。因此,全新机器从“刚安装”到可运行测试,无需用户进行任何设置。
BROWSER_USE_API_KEYreferences/methodology.mdqa仅作为最后手段 — 若确实无法安装该工具(无网络、无Python环境):停止操作,让用户打开**https://www.browser-harness.com/**,并将其**“LLMs提示词”**粘贴至Agent中:
重新运行,确认成功后再继续。绝不要退而求其次使用用户本地的Chrome浏览器。
command -v browser-harness请勿尝试使用browser-harness + 云浏览器以外的任何方式进行QA测试。
Procedure
测试流程
-
Confirm the target is reachable (), and identify what the app is (title, README) so you can frame a sensible test task.
curl -s -o /dev/null -w "%{http_code}" <url> -
Run it — first apply the vision gate from "Can you see images?" above:
- No vision (text-only)? → use v2 cloud agents for every flow (one v2 task per flow, each with + a 1–5
judge), perstructuredOutput. Skip the single-flow/subagent split below — it's v2 regardless of scale. Tunnel areferences/browser-use-v2.mdtarget and pass the publiclocalhost.startUrl - Vision, one flow → drive browser-harness directly per : resolve the key, tunnel localhost, and run the test loop with the field-tested gotchas (host-header rewrite, proxy-off, per-tab interstitial header, CORS-pinned APIs).
references/methodology.md - Vision, many flows → fan out, one subagent per flow:
- v2 agents (recommended) → per , create one task per flow (each with
references/browser-use-v2.md+ a 1–5judgeschema), poll them all, and collect the verdicts. AstructuredOutputtarget still needs a tunnel (the cloud agent can't reach localhost) — tunnel it and pass the publiclocalhost.startUrl - Claude subagents → spawn one Agent per flow, each following on browser-harness.
references/methodology.md
- v2 agents (recommended) → per
Whenever you use the v2 backend (either the no-vision case or the recommended fan-out): as each task is created, open its dashboard thread(the session id from the create response — not the task id) in the user's local Chrome so they can watch the agent run — the reference'shttps://cloud.browser-use.com/thread/{sessionId}helper does this. Always print the URL too.open_local(...) - No vision (text-only)? → use v2 cloud agents for every flow (one v2 task per flow, each with
-
Tear down what you started — stop everything you opened, on every path:
- Tunnel — if you tunneled a target, kill the tunnel process. This applies to every path that tunnels, v2 included (a v2 run against localhost still starts a tunnel) — don't leave it orphaned.
localhost - Cloud browser — the one-flow / Claude paths drive a browser-harness cloud browser; stop it.
- A v2 task against a public URL has nothing to tear down (its one-off session auto-closes); a v2 task against localhost still leaves the tunnel, so close that.
- Tunnel — if you tunneled a
-
Return the verdict: lead with, then task, result, what worked, issues (tagged), edge cases, and evidence — per the rubric and output format in
Score: N/5. Fanning out? Give a per-flowreferences/methodology.mdline and an overall score that reflects the weakest critical path (don't average a broken flow up because others passed).Score: N/5
Scale effort to the ask: a quick "does X work?" is a few interactions and one score; "thoroughly QA this" warrants more flows and edge cases. Keep the verdict honest, specific, and reproducible.
-
确认目标可访问(执行),并识别应用信息(标题、README),以便制定合理的测试任务。
curl -s -o /dev/null -w "%{http_code}" <url> -
执行测试 — 首先根据上文“能否查看图片?”的判断选择对应方式:
- 无视觉能力(纯文本)? → 对所有流程使用v2云Agent(每个流程对应一个v2任务,包含+ 1–5分的
judge),遵循structuredOutput的说明。跳过下方的单一流程/子Agent选择 — 无论规模如何都使用v2版本。为references/browser-use-v2.md目标建立隧道,并传递公开的localhost。startUrl - 具备视觉能力,单一流程 → 直接使用browser-harness驱动测试,遵循的说明:解析密钥、建立localhost隧道、运行测试循环并处理经过验证的常见问题(主机头重写、关闭代理、每页插页式头部、CORS固定API)。
references/methodology.md - 具备视觉能力,多流程 → 生成子Agent并行测试,每个流程对应一个子Agent:
- v2 Agent(推荐) → 遵循,为每个流程创建一个任务(每个任务包含
references/browser-use-v2.md+ 1–5分的judgeschema),轮询所有任务并收集评估结论。若目标为structuredOutput仍需建立隧道(云Agent无法访问localhost) — 建立隧道并传递公开的localhost。startUrl - Claude子Agent → 为每个流程生成一个Agent,每个Agent遵循的说明使用browser-harness。
references/methodology.md
- v2 Agent(推荐) → 遵循
无论何时使用v2后端(无视觉能力情况或推荐的并行测试情况):创建每个任务时,在用户本地Chrome中打开其仪表板线程(创建响应中的session id — 并非任务id),以便用户可以查看Agent的运行过程 — 参考文档中的https://cloud.browser-use.com/thread/{sessionId}辅助函数可实现此操作。同时务必打印该URL。open_local(...) - 无视觉能力(纯文本)? → 对所有流程使用v2云Agent(每个流程对应一个v2任务,包含
-
清理资源 — 关闭所有你启动的进程,无论采用哪种测试路径:
- 隧道 — 若为目标建立了隧道,终止隧道进程。此规则适用于所有建立隧道的测试路径,包括v2版本(针对localhost的v2测试仍会启动隧道) — 不要让隧道进程成为孤儿进程。
localhost - 云浏览器 — 单一流程/Claude子Agent路径会驱动browser-harness云浏览器;停止该浏览器。
- 针对公开URL的v2测试无需清理资源(其一次性会话会自动关闭);针对localhost的v2测试仍会留下隧道,因此需关闭该隧道。
- 隧道 — 若为
-
返回评估结论:开头标注,然后依次说明测试任务、结果、正常工作的部分、问题(标记出来)、边缘情况及证据 — 遵循
Score: N/5中的评分标准和输出格式。并行测试? 为每个流程标注references/methodology.md,并给出反映最薄弱关键路径的整体评分(不要因为其他流程通过就拉高失效流程的评分)。Score: N/5
根据需求调整测试力度:快速的“X功能是否可用?”只需几次交互和一个评分;“全面QA测试此应用”则需要测试更多流程和边缘情况。评估结论需真实、具体且可复现。