Browser QA
You're about to drive a real browser to verify a feature. This skill exists because there are three browser stacks available (Playwright MCP, Claude-in-Chrome MCP, computer-use MCP), they have very different failure modes, and picking the wrong one for the wrong stage of work burns wall clock for no reason.
First-time setup — DO THIS BEFORE ANYTHING ELSE
You will be blocked mid-task if you wait until you need a permission to ask for it. Ask for everything up front, in parallel, in the very first message of the QA phase:
- Request computer-use access for Chrome by calling `mcp__computer-use__request_access` with a one-sentence reason. Chrome is a tier-"read" app — you'll be able to take screenshots through the OS compositor but not click. That's exactly what you want: real-Chrome screenshots without having to fight focus.
- Verify the Claude-in-Chrome extension is connected by calling `mcp__claude-in-chrome__tabs_context_mcp` once. If this returns an error, the extension isn't installed/enabled — stop and ask the user to install it rather than falling back to a worse stack.
- Verify Playwright MCP works by calling `mcp__playwright__browser_close` once (a no-op if no tab is open; it surfaces "browser already in use" lock errors early so you can clear them before they bite you mid-task).
Do all three in parallel in one message. If anything is missing, surface it immediately — don't start QA half-blind.
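The batching above can be sketched as one parallel preflight. This is a hedged illustration, not a real MCP client: `callTool` is a hypothetical stand-in for however your session invokes MCP tools, and the stub return value is an assumption.

```javascript
// Hypothetical stand-in for an MCP tool invocation; in a real session these
// are tool calls issued by the agent, not local functions.
async function callTool(name, args = {}) {
  return { tool: name, ok: true }; // stub: pretend every probe succeeds
}

// Fire all three preflight probes in parallel and report what's missing,
// instead of discovering a dead stack halfway through QA.
async function preflightChecks() {
  const probes = {
    computerUse: callTool('mcp__computer-use__request_access'),
    claudeInChrome: callTool('mcp__claude-in-chrome__tabs_context_mcp'),
    playwright: callTool('mcp__playwright__browser_close'),
  };
  const entries = await Promise.all(
    Object.entries(probes).map(([stack, probe]) =>
      probe.then(() => [stack, 'ok'], err => [stack, `missing: ${err.message}`])
    )
  );
  return Object.fromEntries(entries);
}
```

If any entry is not "ok", surface that to the user before starting QA rather than discovering it mid-flow.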
When to use which stack
Use Playwright MCP for iterative dev-loop QA — every "did this fix work?" check while you're still building. Reasons: snapshot-aware interaction, fast happy paths (~10 s for a navigate + click + screenshot), and failures at the prompt layer (wrong URL, short wait threshold) that are recoverable by retrying smarter. Headless Chromium screenshots are dense, clinical, and ideal for "is the UI correct?" reviews.
Use Claude-in-Chrome + computer-use for end-of-task artefacts — PR screenshots, demo GIFs, customer bug repros. Reasons: the captured surface is the real Chrome the user sees (tab strip, profile avatar, extensions, DEV stripe, the "Claude is active in this tab group" pill). That matches reviewer expectations. Don't use it for the inner dev loop — it's slower and has runtime-layer failures you can't fix from a prompt.
Use computer-use directly only for native desktop apps — anything that isn't a web app. For browsers, prefer the dedicated MCPs above; computer-use's role in browser QA is just the screenshot grab on top of CiC.
Failure modes by stack (so you don't relearn them)
Playwright MCP failures — all fixable from your side:
- Stale browser lock from a previous session — close the browser with `mcp__playwright__browser_close`, then retry.
- 404s from guessing URL paths — read the project's routing config (Next.js, React Router, file-based routers, etc.) before navigating. Don't assume the URL pattern from the feature name.
- The screenshot filename parameter rejects subdirectory paths ("outside allowed roots") — use a flat filename in cwd, then move the file afterwards.
- Text waits undershoot portal-based components (Radix, Headless UI, Chakra v3, MUI Modal). Wait for content inside the portal (e.g. a field label, a heading inside the dialog), not the trigger that opened it.
- Clicking visible text often doesn't trigger the right handler. Many component libraries (Chakra, Radix, shadcn/ui, MUI) attach click handlers to inner elements like a nested link or an icon trigger, not to the table row or list item the user visually sees. Walk up/down the DOM to find the actual interactive element — usually a `button` or an `a` — and click that.
- Cookies set via script may be lost across navigations. If your app redirects unauthenticated requests, navigate first to a non-redirecting endpoint (a static asset, a JSON API route, anything that returns 200 without bouncing), set the cookie there, then navigate to the target.
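The "walk to the real interactive element" step can be factored into a tiny helper evaluated in the page context. A sketch under assumptions: the selector list is illustrative rather than exhaustive, and `findClickTarget` is a name invented here.

```javascript
// Given the element that matched the visible text, return the element that
// actually owns the click handler: nearest interactive ancestor first, then
// an interactive descendant, then the element itself as a last resort.
function findClickTarget(el) {
  const INTERACTIVE = 'button, a, [role="button"], [role="menuitem"]';
  return el.closest(INTERACTIVE) || el.querySelector(INTERACTIVE) || el;
}
```

Evaluate it in the page against the node your text search found, then click the returned node instead of the text itself.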
Claude-in-Chrome failures — runtime-layer, harder to recover:
- Backgrounded-tab throttling. Chrome aggressively pauses background tabs. If your CiC tab isn't the foreground tab in the user's Chrome window, React/Next will not hydrate, `document.querySelectorAll('p').length` will sit at 0, and your polling JS will return empty results indefinitely. Once stuck, the next call typically hits a 45 s CDP timeout. Mitigation: take a screenshot via the `mcp__claude-in-chrome__computer` screenshot action before you start polling — that brings the tab to the front. Or open the LangWatch tab in its own Chrome window the user has visible.
- Content filter redacts page text. Bulk page-text reads come back as `[BLOCKED: Cookie/query string data]` when the page contains anything that looks like session state. Use targeted checks instead (`Array.from(document.querySelectorAll('p')).some(p => p.innerText === 'X')`).
- Cookie reads blocked, cookie writes work. Don't try to read cookies to verify auth. Instead `fetch('/api/auth/session').then(r => r.json())` and check that a session comes back.
- Query-string URLs sometimes fail to render. Direct navigation to a URL with a query string occasionally drops it. Click through the UI flow instead.
- Distorted/blank screenshots. When the viewport hasn't been laid out (because the tab was throttled), the screenshot comes back as a blank gradient. If that happens, the tab is frozen — re-navigate, don't retry the screenshot.
- First screenshot of a fresh CiC session shows loading spinners. The polling loop fires before tRPC queries resolve. Wait for actual content (e.g. a row label), not just page load.
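Most of the mitigations above reduce to "poll for real content with a deadline instead of trusting page-load events". A minimal poll helper, assuming you can evaluate JS in the tab; the name `pollFor` and its defaults are invented here:

```javascript
// Repeatedly evaluate `check` until it returns truthy or the deadline passes.
// Returns true on success, false on timeout — it never throws, so a frozen
// tab degrades into a clean "false" you can react to (re-navigate, etc.).
async function pollFor(check, { timeoutMs = 15000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Example use: `pollFor(() => document.querySelectorAll('p').length > 0, { timeoutMs: 10000 })` — and if it resolves false, treat the tab as throttled and re-navigate rather than retrying the same check forever.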
Performance expectations
From a 3-runs-each benchmark on a real "settings → open drawer → screenshot" QA flow:
| Stack | Happy path | Worst observed | Failure modes |
|---|---|---|---|
| Playwright | ~12 s | ~60 s (debug) | Prompt-layer, retryable |
| CiC + CU | ~9 s warm | ~280 s frozen | Runtime-layer, sometimes terminal |
Tool-call counts are identical on the happy path (8–9 calls). The difference is what each call returns and how often the runtime hangs. CiC's warm happy path is faster than PW's, but its tail is much worse — one CDP timeout costs you 45 s.
The QA flow itself
- Pick a dev server port that isn't fighting other agents or hard-coded auth callbacks. If the project uses an external auth provider (Auth0, Clerk, NextAuth with OAuth, Supabase, Cognito, etc.), pick a port that's already in the callback allowlist — making up an arbitrary port will silently fail the redirect. Otherwise, pick something out of the way (e.g. high four-digit) so you don't collide with whatever else the user has running.
- Seed test data via a script, not the UI. Write a small setup script that creates whatever rows you need (user, session token, sample records) directly via the project's DB client, ORM, or seed mechanism. Faster than clicking through onboarding, repeatable across runs, and survives session expiry mid-task.
- For Playwright runs: navigate → wait for actual content → click → wait for next content → screenshot. Always wait for something the user would see, not just page load.
- For CiC runs: get the tab context → navigate → take a screenshot immediately to defeat background-tab throttling → poll for content → click → poll for next content → screenshot.
- Take screenshots at the moments a reviewer would care about — initial state, mid-flow, final state. Three is usually enough; ten is noise.
- Verify the feature, then verify the unhappy paths. Aim for "happy path works", "the obvious error case shows a clear message", and "form validation rejects bad input". The bug is almost always in the path you didn't QA.
- Don't claim the feature works until you saw it work in a browser screenshot. Tests passing is necessary, not sufficient.
Screenshot handling
- Never commit screenshots to the repo unless they're explicitly user-facing docs assets. Put them in a temp directory, a gitignored screenshots folder, or another location outside version control.
- Upload to img402.dev for PR comments and bug reports:

```bash
curl -F image=@screenshot.png https://img402.dev/api/free
```

Returns a URL you can drop into a PR body / Slack message.
- Embed the URLs in the PR description, not as committed files.
Ending the QA phase
You are done with browser QA when you can answer all of these "yes":
- I navigated through the feature like a user would, not just to the screen that proves my code path runs.
- I tried the unhappy paths (missing config, bad input, network failure simulation if relevant).
- I have screenshots of the happy path and the most important edge case.
- The screenshots are uploaded and linked from the PR.
- I noticed at least one rough UX edge during QA and either fixed it or filed it.
If you can't say yes to all of those, you haven't QA'd yet — you've smoke-tested. Go back and use the feature.