browser-harness

browser-harness - self-healing LLM browser automation

Keyword: browser-harness · self-healing browser · llm browser automation · cdp agent
Direct WebSocket connection between an LLM agent and Chrome via Chrome DevTools Protocol. The agent can inspect the page, write helper code, reuse domain skills, and verify the task without an extra browser abstraction layer.
Browser Harness is the canonical replacement for the removed `agent-browser` skill in this catalog. Use it for clean browser verification, autonomous browser tasks, and platform-portable CDP control across Claude Code, Codex, Antigravity, Gemini CLI, and OpenCode.

When to use this skill

  • The user needs an LLM agent to complete a multi-step browser workflow: login, navigation, form fill, data extraction, download, or verification.
  • The workflow needs a clean browser profile or repeatable CDP verification instead of the user's already-open browser state.
  • The user is running from Codex CLI or Antigravity and needs a local browser harness the agent can operate with shell/Python commands.
  • Claude reports image/screenshot/tool errors when browser screenshots are written, resized, or re-opened.
  • The target DOM changes and the agent should add or repair helpers in `agent-workspace/agent_helpers.py`.
  • The task benefits from site-specific domain skills in `agent-workspace/domain-skills/`.
  • Browser Use Cloud is justified for concurrent browsers, proxies, or captcha solving on allowed targets.

Do not use this skill when

  • The task is simple HTML extraction without browser state or JS interaction -> route to `scrapling`.
  • The task is exact human UI annotation or pointing at a rendered issue -> route to `agentation`.
  • The task must reuse the user's already-open authenticated Chrome profile -> route to `playwriter`.
  • The task is React component source capture -> route to `react-grab`.
  • The task is ordinary Playwright/Puppeteer script authoring without agent autonomy -> use that stack directly.
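
The routing rules above can be read as a lookup table. The sketch below is only an illustration: the trait names and the `route_task` function are hypothetical labels, not identifiers from browser-harness or the other skills.

```python
# Hypothetical routing table mirroring the bullets above. The trait names are
# illustrative labels, not identifiers defined by any of these skills.
ROUTES = {
    "static_html_extraction": "scrapling",
    "human_ui_annotation": "agentation",
    "reuse_open_authenticated_profile": "playwriter",
    "react_component_source": "react-grab",
    "plain_script_authoring": "playwright-or-puppeteer",
}


def route_task(trait: str) -> str:
    """Return the skill for a task trait, defaulting to browser-harness."""
    return ROUTES.get(trait, "browser-harness")


if __name__ == "__main__":
    print(route_task("static_html_extraction"))
    print(route_task("multi_step_login_workflow"))
```

Anything that matches none of the listed exclusions falls through to browser-harness.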

Instructions

Step 1: Choose the execution packet

Pick one primary packet before writing commands:
  • local-cdp: local Chrome/Chromium with `--remote-debugging-port=9222`.
  • codex-cdp: Codex CLI controls the same local checkout and CDP endpoint.
  • antigravity-cdp: Antigravity (`agy`) uses the same workspace and Chrome debugging endpoint.
  • claude-vision-safe: screenshot capture must use the safe image pipeline below.
  • domain-skill: add or repair a site-specific helper in `agent-workspace/domain-skills/`.
  • cloud-browser: Browser Use Cloud is needed and allowed.

Step 2: Install browser-harness

Browser Harness can be set up by an agent from any platform that can run shell commands:
```bash
git clone https://github.com/browser-use/browser-harness.git
cd browser-harness
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```
Claude Code can also use the project-native setup prompt:
```text
Set up https://github.com/browser-use/browser-harness for me
```
Requirements:
  • Python 3.10+
  • Chrome or Chromium
  • `http://localhost:9222/json` reachable from the agent runtime
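
The first two requirements can be checked before installing. The following is a minimal preflight sketch using only the standard library. The executable names in `CHROME_CANDIDATES` are assumptions and will not cover every install location (for example, the macOS app bundle is usually not on `PATH`).

```python
import shutil
import sys

# Candidate executable names are assumptions; adjust them for your system.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def python_ok(version=sys.version_info) -> bool:
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version[:2]) >= (3, 10)


def find_chrome(candidates=CHROME_CANDIDATES):
    """Return the first Chrome/Chromium executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Chrome on PATH:", find_chrome() or "not found (may be installed elsewhere)")
```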

Step 3: Enable Chrome remote debugging

Use a separate profile so the harness can safely create clean sessions:
```bash
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug

# Linux
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug

# Windows PowerShell
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
  --remote-debugging-port=9222 --user-data-dir="$env:TEMP\chrome-debug"
```

Verify:

```bash
curl -s http://localhost:9222/json
```
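
The same check can be done from Python when `curl` is unavailable in the agent runtime. This is a sketch using only the standard library; `fetch_targets` and `page_targets` are hypothetical helper names, not part of browser-harness. It relies on the `/json` endpoint returning a JSON list of targets with `type`, `title`, and `webSocketDebuggerUrl` fields.

```python
import json
import urllib.error
import urllib.request


def fetch_targets(base_url="http://localhost:9222", timeout=2.0):
    """Return the parsed target list from /json, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/json", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def page_targets(targets):
    """Keep only 'page' targets that expose a WebSocket debugger URL."""
    return [
        t for t in targets
        if t.get("type") == "page" and t.get("webSocketDebuggerUrl")
    ]


if __name__ == "__main__":
    targets = fetch_targets()
    if targets is None:
        print("CDP endpoint not reachable on port 9222")
    else:
        for t in page_targets(targets):
            print(t.get("title"), "->", t["webSocketDebuggerUrl"])
```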

Step 4: Platform-specific notes

| Platform | Use browser-harness when | Setup note |
| --- | --- | --- |
| Claude Code | You need autonomous browser work or Claude-safe screenshots | Apply the screenshot patch before image-heavy work |
| Codex CLI | You need local CDP automation from a repo task | Keep `.venv` inside the checkout and run commands from that shell |
| Antigravity (`agy`) | You need the same browser harness from Antigravity workflows | Ensure `agy` can see the checkout and `localhost:9222` |
| Gemini CLI / OpenCode | You need portable browser automation without platform-specific MCP wiring | Use the same local CDP and Python workspace |

For Codex and Antigravity, do not assume Claude Code plugin commands exist. Prefer explicit local commands:
```bash
cd ~/browser-harness
source .venv/bin/activate
python -c "import browser_harness; print('browser-harness OK')"
curl -s http://localhost:9222/json
```

Step 5: Apply the Claude-safe screenshot patch

If Claude throws image recognition, image upload, PNG read, or tool errors around screenshots, patch `src/browser_harness/helpers.py` so screenshots are decoded and resized in memory, and PIL file handles are closed before saving overlays.
Required changes:
```diff
diff --git a/src/browser_harness/helpers.py b/src/browser_harness/helpers.py
--- a/src/browser_harness/helpers.py
+++ b/src/browser_harness/helpers.py
@@
-import base64, importlib.util, json, math, os, sys, time, urllib.request
+import base64, importlib.util, io, json, math, os, sys, time, urllib.request
@@
-            img = Image.open(path)
+            with Image.open(path) as src:
+                img = src.copy()
@@
-    open(path, "wb").write(base64.b64decode(r["data"]))
+    data = base64.b64decode(r["data"])
     if max_dim:
         from PIL import Image
-        img = Image.open(path)
+        img = Image.open(io.BytesIO(data))
         if max(img.size) > max_dim:
             img.thumbnail((max_dim, max_dim))
-            img.save(path)
+            buf = io.BytesIO()
+            img.save(buf, format="PNG")
+            data = buf.getvalue()
+    with open(path, "wb") as f:
+        f.write(data)
```
Why this matters:
  • `Image.open(path)` keeps a lazy file handle unless copied or closed.
  • Claude image/tool pipelines are more likely to fail when a PNG is opened, rewritten, then reopened by the agent in quick succession.
  • In-memory resize via `io.BytesIO` avoids the write-read-write cycle.
  • Writing once with `with open(path, "wb")` produces a stable file for Claude vision upload.
Recommended screenshot call for Claude:
```python
path = capture_screenshot(max_dim=1800)
```
Use `max_dim=1800` on high-DPI displays to stay under common 2000px-per-side image limits.
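
To confirm that a saved screenshot actually fits the limit before handing it to Claude, the PNG header can be inspected without Pillow. The sketch below reads the width and height from the IHDR chunk. `png_size` and `within_limit` are hypothetical helper names, and the 2000-pixel default simply mirrors the figure quoted above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def within_limit(data: bytes, limit: int = 2000) -> bool:
    """True when both sides are at or below the per-side pixel limit."""
    return max(png_size(data)) <= limit


if __name__ == "__main__":
    # Header-only bytes standing in for a hypothetical 1800x1125 screenshot.
    header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 1800, 1125)
    print(png_size(header), within_limit(header))
```

In practice the bytes would come from reading the screenshot file in binary mode; only the first 24 bytes are needed.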

Step 6: Run browser tasks

Give the agent a natural-language task:
```text
Open the local app, complete the signup form, and verify that the dashboard appears.
Navigate to GitHub, open the first open issue, and summarize the acceptance criteria.
Fill in the contact form at example.com and confirm the success message.
```
The agent should:
  1. Connect to Chrome via CDP.
  2. Inspect tabs and page state.
  3. Reuse existing helpers in `agent-workspace/agent_helpers.py`.
  4. Add missing helpers in `agent-workspace/agent_helpers.py` or `agent-workspace/domain-skills/`.
  5. Verify completion with text, URL, DOM state, screenshot, or downloaded artifact evidence.
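
At the protocol level, steps 1, 2, and 5 come down to JSON commands sent over the WebSocket. The sketch below only builds those payloads; it does not send them, since a WebSocket client is not part of the standard library. `cdp_message` is a hypothetical helper, not a browser-harness function, while `Page.navigate` and `Runtime.evaluate` are standard CDP methods.

```python
import itertools
import json

_ids = itertools.count(1)


def cdp_message(method: str, **params) -> str:
    """Serialize one CDP command; each command carries a unique integer id."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


if __name__ == "__main__":
    # Navigate, then read back the URL and title as verification evidence.
    print(cdp_message("Page.navigate", url="https://example.com"))
    print(cdp_message(
        "Runtime.evaluate",
        expression="({url: location.href, title: document.title})",
        returnByValue=True,
    ))
```

Chrome replies with a message carrying the same `id`, which is how responses are matched to commands.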

Step 7: Extend with domain skills

Domain skills are site-specific playbooks. Keep them small and reusable:
```text
agent-workspace/domain-skills/
├── github.py
├── linkedin.py
└── your-site.py
```
Example:
```python
def login(page, username: str, password: str):
    """Log into mysite.com."""
    page.goto("https://mysite.com/login")
    page.fill("#username", username)
    page.fill("#password", password)
    page.click("button[type=submit]")
    page.wait_for_url("**/dashboard")
```
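
A filename such as `your-site.py` contains a hyphen, so it cannot be imported with a plain `import` statement. One way to load such files is by path with `importlib.util`. The sketch below is an assumption about how a loader could work, not a description of how browser-harness discovers skills; `load_skill` is a hypothetical name.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_skill(path: Path):
    """Import a skill file by path; works even if the filename has hyphens."""
    name = "domain_skill_" + path.stem.replace("-", "_")
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    # Demo with a throwaway skill file instead of a real site.
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "your-site.py"
        skill.write_text("def login_url():\n    return 'https://mysite.com/login'\n")
        print(load_skill(skill).login_url())
```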

Step 8: Browser Use Cloud escalation

Use Browser Use Cloud only when local Chrome is insufficient and the target permits automation:
```python
from browser_harness import BrowserUseCloud

client = BrowserUseCloud(api_key="YOUR_API_KEY")
result = client.run("Extract the dashboard data and return a CSV summary")
print(result)
```

Best practices

  1. Start with `local-cdp`; escalate only when the local CDP endpoint cannot satisfy the job.
  2. Keep core package edits minimal. Put ordinary workflow logic in `agent_helpers.py` or domain skills.
  3. Apply the Claude-safe screenshot patch before image-heavy Claude Code runs.
  4. For Codex and Antigravity, prefer explicit shell/Python commands over Claude-only plugin instructions.
  5. Treat every browser task as incomplete until the agent records final evidence.
  6. Use `scrapling` for stateless scraping and `playwriter` for already-open authenticated browser reuse.
  7. Do not bypass site terms, robots, rate limits, or authorization boundaries.
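
Practice 5 can be made concrete with a small evidence record that the agent fills in before reporting success. This is a hypothetical sketch: `TaskEvidence` and its fields are illustrative and do not come from browser-harness.

```python
from dataclasses import dataclass, field


@dataclass
class TaskEvidence:
    """Hypothetical record of what the agent observed at the end of a task."""
    final_url: str = ""
    matched_text: str = ""
    screenshot_path: str = ""
    artifacts: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A task counts as verified once at least one kind of evidence exists."""
        return bool(
            self.final_url or self.matched_text
            or self.screenshot_path or self.artifacts
        )


if __name__ == "__main__":
    print(TaskEvidence().is_complete())
    print(TaskEvidence(final_url="https://example.com/dashboard").is_complete())
```

A stricter variant could require specific evidence kinds per task type, for example a downloaded artifact for export tasks.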

Quick verification

```bash
cd ~/browser-harness
source .venv/bin/activate
python -c "import browser_harness; print('browser-harness OK')"
curl -s http://localhost:9222/json
```

References

  • browser-use/browser-harness GitHub
  • scrapling — stateless HTML/JS scraping without agent-owned browser state
  • playwriter — running-browser reuse when existing login/session state matters
  • agentation — rendered-UI feedback and human annotation packets