QA

Drive a website with a real browser, judge how well it does the thing the user asked about, and return a score from 1 (broken) to 5 (excellent) with evidence. The deliverable is a verdict, not a screenshot dump.

Inputs

From the user's invocation (the text after

/qa

, or their message):

Target — a URL (
```
https://…
```
) or a local dev server (
```
localhost:5173
```
,
```
:3000
```
, "the app on 5173"). Required — if absent, ask for it before doing anything else.
What to test (optional) — a flow or focus ("the signup", "search + filters"). If omitted, test the most obvious happy path and say so in the report.

Can you see images? (decide this first)

This skill's verdict is visual — you judge the app by looking at screenshots. So before anything else, check whether you — the agent running this skill — can actually see images:

You have vision (multimodal / image input) → you can judge screenshots yourself. Continue to "Single flow vs. fan-out" below and choose by scale.
You have no vision (text-only model, no image support) → you cannot judge screenshots, and neither can same-model subagents you'd spawn. You must hand the visual judgment to Browser Use v2 cloud agents, whose own LLM looks at the page server-side and returns a text verdict (
```
judge
```
pass/fail + a 1–5
```
structuredOutput
```
). Use v2 for every flow — even a single one — per
```
references/browser-use-v2.md
```
. Do not drive
```
browser-harness
```
yourself to read screenshots, and do not fan out to your own (equally blind) subagents. The single-flow-vs-fan-out choice below does not apply to you — it's v2 either way.

If you're unsure whether you can see images, assume you can't and use v2.

Single flow vs. fan-out (only if you can see images)

Scale the approach to the ask:

Testing one flow / one thing? Don't bother with subagents — drive
browser-harness
directly yourself, following
```
references/methodology.md
```
. That's the right, lowest-overhead tool for a single test, and it's how the rest of this skill works.
Testing many flows / a lot at once? Fan out to subagents — one per flow — so they run in parallel. Here the user has a choice of subagent type (ask if unclear; recommend v2):
- Browser Use v2 cloud agents — recommended. Each flow becomes an autonomous v2 task with judge
  (pass/fail) + structuredOutput
  (1–5 score), running server-side and in parallel, returning step-by-step screenshot evidence. Spends Browser Use credits (~$0.01/task + ~$0.006/step + $0.02/hr browser). Per-task flow + how to fan out:
```
references/browser-use-v2.md
```
  .
- Your harness's built-in subagents — spawn Claude Code subagents (the Agent tool), each driving
```
browser-harness
```
  through
```
references/methodology.md
```
  . No Browser Use task credits; uses your agent's own usage.

Rule of thumb (vision agents only — text-only agents use v2 for everything, see above): one flow → browser-harness directly; many flows → subagents (v2 recommended). Either way

browser-harness

is required — as the direct driver, the subagent driver, the v2 key store, and the localhost tunnel.

Dependency: browser-harness (required — install it yourself)

This skill runs the test through browser-harness — a separate tool you install once. It is not optional; QA must run on a real Browser Use cloud browser, never the user's local Chrome.

Before anything else, verify it's available:

bash

command -v browser-harness && browser-harness <<'PY'
print("browser-harness OK")
PY

browser-harness

is not on

PATH

, install it yourself — don't make the user do it. QA runs on a cloud browser, so the CLI is all you need: none of browser-harness's local-browser setup (

chrome://inspect

--remote-debugging-port

, the "Allow remote debugging" popup) applies here — skip all of it. The install is one-time (~30s), no clone:

bash

command -v uv || curl -LsSf https://astral.sh/uv/install.sh | sh   # the uv installer, only if missing
uv tool install "git+https://github.com/browser-use/browser-harness"
command -v browser-harness                                         # verify it's on PATH now

(No

uv

and can't

curl | sh

? Install uv per https://docs.astral.sh/uv/getting-started/installation/ then re-run the

uv tool install

line — or

pipx install "git+https://github.com/browser-use/browser-harness"

The only other thing a run needs is a

BROWSER_USE_API_KEY

, and that also auto-resolves with no human — see

references/methodology.md

step 0 (it can self-sign-up for a free key). So a fresh machine goes from "just installed

qa

" to a working test without the user doing any setup.

Last resort only — if you genuinely can't install it (no network, no Python): stop and have the user open https://www.browser-harness.com/ and paste its "prompt for LLMs" into an agent:

Set up https://github.com/browser-use/browser-harness for me. Read
install.md
first to install and connect this repo to my real browser. Then read
SKILL.md
for normal usage. Always read
helpers.py
because that is where the functions are. When you open a setup or verification tab, activate it so I can see the active browser tab. After it is installed, open this repository in my browser and, if I am logged in to GitHub, ask me whether you should star it for me as a quick demo that the interaction works — only click the star if I say yes. If I am not logged in, just go to browser-use.com.

Re-run

command -v browser-harness

and don't continue until it succeeds. Never fall back to the user's local Chrome.

Do not attempt to QA with anything other than browser-harness + a cloud browser.

Procedure

Confirm the target is reachable (
```
curl -s -o /dev/null -w "%{http_code}" <url>
```
), and identify what the app is (title, README) so you can frame a sensible test task.
Run it — first apply the vision gate from "Can you see images?" above:
- No vision (text-only)? → use v2 cloud agents for every flow (one v2 task per flow, each with
```
judge
```
  + a 1–5
```
structuredOutput
```
  ), per
```
references/browser-use-v2.md
```
  . Skip the single-flow/subagent split below — it's v2 regardless of scale. Tunnel a
```
localhost
```
  target and pass the public
```
startUrl
```
  .
- Vision, one flow → drive browser-harness directly per
```
references/methodology.md
```
  : resolve the key, tunnel localhost, and run the test loop with the field-tested gotchas (host-header rewrite, proxy-off, per-tab interstitial header, CORS-pinned APIs).
- Vision, many flows → fan out, one subagent per flow:
  - v2 agents (recommended) → per
    references/browser-use-v2.md
    , create one task per flow (each with
    judge
    + a 1–5
    structuredOutput
    schema), poll them all, and collect the verdicts. A
    localhost
    target still needs a tunnel (the cloud agent can't reach localhost) — tunnel it and pass the public
    startUrl
    .
  - Claude subagents → spawn one Agent per flow, each following
    references/methodology.md
    on browser-harness.
Whenever you use the v2 backend (either the no-vision case or the recommended fan-out): as each task is created, open its dashboard thread
https://cloud.browser-use.com/thread/{sessionId}
(the session id from the create response — not the task id) in the user's local Chrome so they can watch the agent run — the reference's
open_local(...)
helper does this. Always print the URL too.
Tear down what you started — stop everything you opened, on every path:
- Tunnel — if you tunneled a
```
localhost
```
  target, kill the tunnel process. This applies to every path that tunnels, v2 included (a v2 run against localhost still starts a tunnel) — don't leave it orphaned.
- Cloud browser — the one-flow / Claude paths drive a browser-harness cloud browser; stop it.
- A v2 task against a public URL has nothing to tear down (its one-off session auto-closes); a v2 task against localhost still leaves the tunnel, so close that.
Return the verdict: lead with
```
Score: N/5
```
, then task, result, what worked, issues (tagged), edge cases, and evidence — per the rubric and output format in
```
references/methodology.md
```
. Fanning out? Give a per-flow
```
Score: N/5
```
line and an overall score that reflects the weakest critical path (don't average a broken flow up because others passed).

Scale effort to the ask: a quick "does X work?" is a few interactions and one score; "thoroughly QA this" warrants more flows and edge cases. Keep the verdict honest, specific, and reproducible.

qa

NPX Install

Tags

SKILL.md Content

QA

Inputs

Can you see images? (decide this first)

Single flow vs. fan-out (only if you can see images)

Dependency: browser-harness (required — install it yourself)

Procedure