ego-browser


ego-browser gives AI agents a CLI-accessible Node.js runtime, with built-in helpers — snapshotText, click, js, cdp, and more — that agents call directly inside JS scripts to observe pages, interact with UI, evaluate browser-side JavaScript, and drive a real browser for any web automation task.

For setup, install, or connection problems, read `references/install.md`.

Use the `Bash` tool to run all browser operations via an `ego-browser nodejs <<'EOF' ... EOF` heredoc. Do not write code to a `.js` file first.


Quick start

```bash
ego-browser nodejs <<'EOF'
// Name the task space for the whole user task, then reuse that space across heredoc rounds.
const task = await useOrCreateTaskSpace('inspect example page')
cliLog('task space id: ' + task.id)

await openOrReuseTab('https://example.com', { wait: true, timeout: 20 })

cliLog(await snapshotText())
EOF
```

The heredoc body runs as a Node.js script that controls the selected ego-browser task space. All ego-browser helpers are preloaded into that script.


Common helpers

- Task spaces: `listTaskSpaces`, `useOrCreateTaskSpace`, `claimTaskSpace`, `handOffTaskSpace`, `takeOverTaskSpace`, `waitForAgentControl`, `completeTaskSpace`
- Navigation / state: `listTabs`, `openOrReuseTab`, `closeTab`, `gotoAndWait`, `currentTab`, `switchTab`, `gotoUrl`, `pageInfo`, `ensureRealTab`
- Observation: `snapshotText`, `captureScreenshot`, `drainEvents`
- Scroll / mouse: `scrollBy`, `scrollToBottomUntil`, `scroll`, `click`, `doubleClick`, `hover`, `dragMouse`
- Keyboard & input: `typeText`, `fillInput`, `pressKey`, `dispatchKey`
- File: `uploadFile`
- Wait: `wait`, `waitForLoad`, `waitForElement`, `waitForNetworkIdle`
- Fetch: `serverFetch`, `browserFetch`
- CDP / evaluate: `js`, `cdp`
- Output: `cliLog`, `help`

Notes:

- `cliLog(value)`: prints to the terminal; it is the only output mechanism inside a heredoc, and all final results must go through it.
- `await pageInfo()`: normally resolves to `{ url, title, w, h, sx, sy, pw, ph }`; if a native browser dialog is open, it resolves to `{ dialog: ... }` instead because page JavaScript is blocked.
- If `await pageInfo()` resolves to `{ dialog: ... }`, handle the dialog with `await cdp('Page.handleJavaScriptDialog', { accept: true })` or `accept: false` before running page JavaScript.
- `await ensureRealTab()`: switches to an existing non-internal page tab if needed and resolves to it; resolves to `null` when none exists. It does not create a tab; use `await openOrReuseTab(...)` for that.
- `await closeTab(target?)`: closes the given target id / tab object, or the current tab when omitted.
- `await drainEvents()`: consumes and returns the async event queue produced by the page (navigation events, network events, etc.).
- `await serverFetch(url, options)`: issues a request from Node and returns the response body.
- `await browserFetch(url, options)`: issues a request from the current browser page context and returns the response body.
- `help(name)`: prints usage for a given helper, e.g. `cliLog(help('click'))`.
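The dialog note above can be sketched as a small decision helper. This is a minimal sketch, not part of ego-browser: `planDialogStep` is a hypothetical name, and it relies only on the two `pageInfo()` result shapes described in the notes. The fields inside `dialog` are an assumption.

```javascript
// Hypothetical helper (not an ego-browser API): decide whether a native dialog
// must be handled before any page JavaScript runs.
// `info` is assumed to be the value resolved by `await pageInfo()`.
function planDialogStep(info, accept = true) {
  // Per the notes, an open native dialog makes pageInfo() resolve to { dialog: ... }.
  if (info && typeof info === 'object' && 'dialog' in info) {
    return { method: 'Page.handleJavaScriptDialog', params: { accept } };
  }
  return null; // no dialog reported: page JavaScript can run
}

// Plain objects standing in for the two documented shapes.
const normalInfo = { url: 'https://example.com', title: 'Example', w: 1280, h: 720, sx: 0, sy: 0, pw: 1280, ph: 2000 };
const blockedInfo = { dialog: { type: 'confirm' } }; // inner fields are assumed

console.log(planDialogStep(normalInfo));         // null
console.log(planDialogStep(blockedInfo, false)); // a Page.handleJavaScriptDialog step with accept: false
```

Inside a heredoc, this could be used as `const step = planDialogStep(await pageInfo()); if (step) await cdp(step.method, step.params)`.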


Task spaces

A task space is an isolated browsing context that ego-browser provides for AI Agents. Each task space has its own set of tabs but inherits the current user's login state by default, so Agents can operate on authenticated sites without competing with or disturbing the user's normal browser windows.

Closing all tabs in a task space is equivalent to closing that task space.

A task often takes multiple heredoc rounds to complete. Because the Node.js runtime exits after each heredoc and retains no state, normal working heredocs should start with an explicit call to `useOrCreateTaskSpace(nameOrId)` to reuse the same space; this lets you operate continuously and reuse tabs across rounds. The exception is resuming after a handoff: once the user confirms "continue" (through an Ask or in chat), start the next heredoc with `takeOverTaskSpace(nameOrId)` instead.

`nameOrId` can be a task space name, a numeric id, or a digit-only numeric id string. String values match `name` / `taskId` first; digit-only strings then fall back to the numeric id. Number values match existing numeric ids only; if no matching id exists, `useOrCreateTaskSpace` fails instead of creating a new space.
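The matching order above can be modelled with a small function. This is an illustrative sketch only, not the real implementation: `matchTaskSpace` and the record shape `{ id, name, taskId }` are assumptions, as is returning `null` to mean "the caller would create a new space".

```javascript
// Illustrative model of the documented matching order; NOT ego-browser's actual code.
// Assumed record shape: { id: number, name: string, taskId: string }.
function matchTaskSpace(spaces, nameOrId) {
  if (typeof nameOrId === 'number') {
    // Numbers match existing numeric ids only; a miss is an error, never a create.
    const hit = spaces.find((s) => s.id === nameOrId);
    if (!hit) throw new Error('no task space with id ' + nameOrId);
    return hit;
  }
  // Strings match name / taskId first...
  const byName = spaces.find((s) => s.name === nameOrId || s.taskId === nameOrId);
  if (byName) return byName;
  // ...then digit-only strings fall back to the numeric id.
  if (/^\d+$/.test(nameOrId)) {
    const byId = spaces.find((s) => s.id === Number(nameOrId));
    if (byId) return byId;
  }
  return null; // assumption: an unmatched string would lead to a new space
}

const spaces = [
  { id: 7, name: 'inspect example page', taskId: 't-7' },
  { id: 12, name: '7', taskId: 't-12' }, // a space that happens to be named "7"
];

console.log(matchTaskSpace(spaces, '7').id);     // 12 (name match wins over numeric id)
console.log(matchTaskSpace(spaces, '12').id);    // 12 (digit-only string falls back to id)
console.log(matchTaskSpace(spaces, 7).id);       // 7
console.log(matchTaskSpace(spaces, 'new goal')); // null
```

The first call shows why a number is the least ambiguous way to resume a known space: the string `'7'` resolves by name before it is ever tried as an id.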

Use a short name for the active user goal when creating a new task space. Keep reusing that task space for follow-up questions, corrections, refinements, re-checks, and result validation, even if you previously thought the task was complete. Choose a new task space only when the user clearly starts a separate, unrelated goal. Prefer using the numeric `id` returned by `useOrCreateTaskSpace` (for example, `task.id`) to resume a known task in later rounds and avoid name collisions.

For any follow-up on the same user goal (including continue, corrections, retries, validation, user-reported problems, or work after `completeTaskSpace(..., { keep: true })`), resume the original task space first if it still exists. Do not create a new task space for the same goal unless the user asks for a fresh space, starts an unrelated goal, or the original space is unavailable after checking. If a new space is necessary, state why.

After explicit user confirmation, to continue work from an existing user-owned, inactive, or unassigned task space, use `await listTaskSpaces()` to find the space, call `await claimTaskSpace(id)` to take ownership and select it, then use `await listTabs()` and `await switchTab(targetId)` to select the exact tab before acting.

Ownership policy: every task space has `ownership: 'agent' | 'agentDelegatedToUser' | 'user'`; the helpers treat user-owned spaces differently:

| Helper | When the target space is user-owned |
| --- | --- |
| `switchTaskSpace` | throws; agent-owned spaces only |
| `claimTaskSpace` | claims it (ownership transfers to the agent), then selects it |
| `handOffTaskSpace` | skipped; resolves `{ done: false, skipped: 'user-owned' }` |
| `completeTaskSpace(…, { keep: true })` | skipped; resolves `{ done: false, skipped: 'user-owned' }` |
| `completeTaskSpace(…, { keep: false })` | claims it, then closes it |
| `takeOverTaskSpace` / `waitForAgentControl` | no ownership check |

`handOffTaskSpace` and `completeTaskSpace` resolve `{ done: true }` when the operation actually happened. Check `done` before telling the user the handoff/cleanup is finished; a `skipped` result usually means you targeted a space that was never yours.
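Checking `done` before reporting can be sketched as follows. `describeOutcome` is a hypothetical helper, not an ego-browser API; it uses only the `{ done, skipped }` result shape shown in the table above.

```javascript
// Hypothetical reporting helper: build the user-facing message from the
// result of handOffTaskSpace / completeTaskSpace, trusting only `done`.
function describeOutcome(action, result) {
  if (result && result.done === true) return action + ' finished';
  const reason = result && result.skipped ? result.skipped : 'unknown reason';
  return action + ' did NOT happen (' + reason + ')';
}

console.log(describeOutcome('handoff', { done: true }));
// handoff finished
console.log(describeOutcome('cleanup', { done: false, skipped: 'user-owned' }));
// cleanup did NOT happen (user-owned)
```

In a heredoc, the result would come from a call such as `await completeTaskSpace(task.id, { keep: false })`, and the message would go out through `cliLog(...)`.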

`completeTaskSpace(nameOrId, { keep })` must occupy its own dedicated final heredoc, and run only after a prior heredoc's output has confirmed the task is genuinely done. `keep` is required and defaults by policy to `false`: close the task space after completion unless there is a concrete reason to leave the live page visible.

Use `{ keep: true }` only when the user explicitly asks to keep the page open, the task needs manual user action in that exact page, or the result cannot be delivered well as a URL, file, artifact, or summary. Do not keep a task space open merely because a page was visited, a document was created, or a screenshot was used for verification.

When passing a string that may create a new task space, the string should reflect the task's intent (e.g. `'search github issues'`); don't use literal placeholders.

If the task space needs to be preserved after the task ends, keep only the tabs that need to be shown to the user. Keep loose awareness of how many tabs are open; a quick `(await listTabs()).length` is enough, and there's no need to spend a dedicated round just to check. When scratch tabs (search-result pages, cross-check pages, and other one-off pages) pile up, close them as you go rather than letting them all accumulate for the end. When finishing with `{ keep: true }` to leave pages for the user, clear out the remaining scratch tabs so only the pages worth showing stay open. Close a single tab with `await closeTab(targetId)` (`targetId` comes from `listTabs()` or an `openOrReuseTab` return value).
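The scratch-tab cleanup described above can be sketched as a pure selection step. The tab shape `{ targetId, url }` and the name `scratchTabIds` are assumptions for illustration; the actual fields returned by `listTabs()` may differ.

```javascript
// Hypothetical tab shape: { targetId: string, url: string }.
// Returns the ids of every tab that is not in the keep list.
function scratchTabIds(tabs, keepTargetIds) {
  const keep = new Set(keepTargetIds);
  return tabs.filter((t) => !keep.has(t.targetId)).map((t) => t.targetId);
}

// Sample data standing in for a listTabs() result.
const openTabs = [
  { targetId: 'A1', url: 'https://example.com/result' },
  { targetId: 'B2', url: 'https://example.com/search?q=one' },
  { targetId: 'C3', url: 'https://example.com/search?q=two' },
];

console.log(scratchTabIds(openTabs, ['A1'])); // [ 'B2', 'C3' ]
```

In a heredoc, the returned ids could be closed one by one with `await closeTab(id)` before finishing with `{ keep: true }`.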


Control handoff

Only one side — agent or user — holds control of a task space at any time. While the user holds control, any browser operation by the agent fails with a "user is controlling" message — do not retry it; follow the steps below to resume.

A "user is controlling" error is a hard stop on the whole task — not an obstacle to route around. It means the user has deliberately taken the browser back, often because your current approach is going wrong. Honoring it is the correct outcome here; pushing the goal forward anyway is the failure. The only thing you may do is ask the user and wait.

An "inactive", "not assigned to an agent", or similar task-space error is also a hard stop with the same confirmation requirement. Resume only after explicit user confirmation, then start with `await claimTaskSpace(id)`.

Handing off: When the task requires user intervention (e.g. login, captcha, manual confirmation), call `await handOffTaskSpace([nameOrId])` to give control to the user, and tell them exactly what to do. Omitting `nameOrId` uses the currently selected task space; pass `task.id` across heredoc rounds to avoid ambiguity.

Regaining control: Take control back only after the user explicitly confirms, either through an Ask (your harness's button/option prompt, e.g. "Continue" vs "Finish task") or a "continue" message in chat. Then start a new heredoc with `await takeOverTaskSpace([nameOrId])` and resume; if the user chooses to finish, close out with `await completeTaskSpace(nameOrId, { keep })`. Never call `takeOverTaskSpace` on your own to grab control back: it has no ownership check and will seize the browser away from the user.

Unexpected takeover: The user can take over at any time via the browser GUI, which has the same effect as the agent calling `handOffTaskSpace`. Do not retry the failed operation and do not auto-takeover; surface the Ask above (Continue / Finish) and resume only when the user picks Continue.

`await waitForAgentControl(nameOrId)` is a read-only blocking poll (it never takes control); use it only to wait inside the current heredoc for a handoff you initiated.
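The hard-stop rules in this section can be summarised as a small classifier. This is a sketch under stated assumptions: `classifyTaskSpaceError` is a hypothetical name, and the matched phrases are the ones quoted in this section, so real error messages may be worded differently.

```javascript
// Hypothetical classifier for the hard-stop errors described above.
// In both hard-stop cases the agent must ask the user and wait;
// `resumeWith` names the helper to start with AFTER explicit user confirmation.
function classifyTaskSpaceError(message) {
  const text = String(message).toLowerCase();
  if (text.includes('user is controlling')) {
    return { hardStop: true, resumeWith: 'takeOverTaskSpace' };
  }
  if (text.includes('inactive') || text.includes('not assigned to an agent')) {
    return { hardStop: true, resumeWith: 'claimTaskSpace' };
  }
  return { hardStop: false, resumeWith: null }; // some other failure
}

console.log(classifyTaskSpaceError('User is controlling this task space').resumeWith);
// takeOverTaskSpace
console.log(classifyTaskSpaceError('task space is inactive').resumeWith);
// claimTaskSpace
console.log(classifyTaskSpaceError('selector not found').hardStop);
// false
```

A hard stop never triggers a retry or an automatic takeover; the classifier only records which helper the next heredoc would start with once the user has confirmed.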


Scroll / mouse

```js
// DOM scroll
await scrollBy(900)
await scrollToBottomUntil(
  async () => await js(String.raw`document.querySelectorAll('article').length`) >= 20,
  { step: 900, wait: 1, maxSteps: 20 }
)

// Real wheel event
await scroll({ dy: 900 })
```

Element-target helpers such as `click`, `doubleClick`, `hover`, `dragMouse`, `fillInput`, `uploadFile`, and `waitForElement` accept the same selector/ref surface: raw CSS, `xpath=...`, `@N` / `ref=N`, and `loc=...` values from `snapshotText()` (`loc=css:...`, `loc=role:...`, `loc=href:...`). `@N` refs are for ego-browser helpers only; they are not valid selectors inside `document.querySelector(...)`.

`click`, `doubleClick`, `hover`, and `dragMouse` share these target formats. Coordinates are in CSS pixels:

- `string`: CSS selector, `xpath=...`, `@N` / `ref=N`, or `loc=...`; clicks the element's center.
- `[x, y]` or `{x, y}`: viewport coordinates.
- `{selector}`: CSS selector, `xpath=...`, `@N` / `ref=N`, or `loc=...`; clicks the element's center.
- `{selector, x, y}`: offset from the element's top-left corner by `x` / `y`.
- `options.label` (optional): a 3-6 word action description; triggers a visual highlight animation.

```js
await click('@21', { label: 'check login status' })
await click('button.primary', { label: 'click submit button' })
await click([420, 260])
await click({ x: 420, y: 260 })
await click({ selector: 'canvas#stage', x: 12, y: 8 })
await hover('@5', { label: 'hover to reveal menu' })
await dragMouse([from, to], { label: 'drag card' })
```
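To make the target formats concrete, the sketch below classifies a target the way the list above describes it. It is an illustration, not ego-browser's own parsing: `classifyTarget`, `classifySelector`, and the `kind` labels are hypothetical names.

```javascript
// Illustrative classifier for the documented target formats (not an ego-browser API).
function classifySelector(s) {
  if (s.startsWith('xpath=')) return 'xpath';
  if (/^(@|ref=)\d+$/.test(s)) return 'ref'; // @N or ref=N from snapshotText()
  if (s.startsWith('loc=')) return 'loc';
  return 'css';
}

function classifyTarget(target) {
  if (typeof target === 'string') {
    return { kind: 'element-center', selector: classifySelector(target) };
  }
  if (Array.isArray(target)) {
    return { kind: 'viewport-point', x: target[0], y: target[1] };
  }
  if (target && typeof target === 'object') {
    const hasPoint = typeof target.x === 'number' && typeof target.y === 'number';
    if (typeof target.selector === 'string') {
      // {selector, x, y} is an offset from the element's top-left corner.
      return hasPoint
        ? { kind: 'element-offset', selector: classifySelector(target.selector), x: target.x, y: target.y }
        : { kind: 'element-center', selector: classifySelector(target.selector) };
    }
    if (hasPoint) return { kind: 'viewport-point', x: target.x, y: target.y };
  }
  throw new TypeError('unsupported target');
}

console.log(classifyTarget('@21').selector);                                // ref
console.log(classifyTarget([420, 260]).kind);                               // viewport-point
console.log(classifyTarget({ selector: 'canvas#stage', x: 12, y: 8 }).kind); // element-offset
```

The same `{x, y}` pair therefore means viewport coordinates on its own and an element-relative offset when a `selector` is present.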


uploadFile

```js
await uploadFile('input[type="file"]', "/absolute/path/to/file.pdf")
```

js

`js()` is essentially `Runtime.evaluate` and takes a string. You can pass a function, but doing so triggers a one-time warning and wraps it via `.toString()`; closures are not captured and there is no argument channel. Do not use `js()` the way you would Puppeteer / Playwright's `page.evaluate(fn, ...args)`.

When you need to run multi-step logic inside the browser, wrap it in a single self-invoking closure and return once; don't split it across multiple `await js()` calls:

```js
const data = await js(String.raw`(() => {
  const items = [...document.querySelectorAll('article')]
  return items.map(el => ({
    text: el.innerText,
    links: [...el.querySelectorAll('a')].map(a => a.href),
  }))
})()`)
```
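Because `js()` has no argument channel, a Node-side value has to be embedded in the source string itself. One common way to do that, shown here as a sketch rather than an ego-browser convention, is `JSON.stringify`, which produces a valid JavaScript literal for strings, numbers, arrays, and plain objects. `buildCountSource` is a hypothetical helper name.

```javascript
// Build browser-side source with a Node-side value embedded as a literal.
function buildCountSource(selector) {
  return '(() => document.querySelectorAll(' + JSON.stringify(selector) + ').length)()';
}

const selector = 'a[href^="https://"]';
const source = buildCountSource(selector);
console.log(source);

// Stand-in for the browser: evaluate the source against a fake `document`
// to check that the quoting survived and the expression is well-formed.
const fakeDocument = {
  querySelectorAll: (sel) => (sel === selector ? [1, 2, 3] : []),
};
const count = new Function('document', 'return ' + source)(fakeDocument);
console.log(count); // 3
```

In a heredoc, the generated string would be passed straight through, for example `const n = await js(buildCountSource('article'))`.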


Recommended workflow

Caveats

- `wait(...)` and `timeout` values are in seconds; only parameters whose names end in `Ms` are milliseconds.
- `snapshotText()` defaults to `scope: 'full_page'`, covering the whole page. Use the default in almost every case; only pass `scope: 'only_within_viewport'` when the task needs only visible content.
- `@N` refs are only valid for the most recent `snapshotText` call, because every call rebuilds the refMap. Ref numbers come from the CDP `backendNodeId`, so the same element keeps the same number across calls; but to use `@N`, N must appear in the latest `snapshotText` output. An element scrolled out of the viewport, a DOM re-render, or a previous call with `scope:'only_within_viewport'` that didn't cover the element will all cause `Unknown ref`. For elements you need to reference long-term, use the `loc=...` value from `snapshotText` output as a stable selector, or write a CSS selector directly.
- `js()` returns the evaluated result, not a JSON string; don't wrap it with `JSON.parse(...)`.
- Inside a `js(...)` template string, regex backslashes must be doubled (e.g. `\\d`, `\\s`), or use `String.raw`.
- If the source passed to `js()` contains a top-level `return`, it will be auto-wrapped in an IIFE; `return` inside nested callbacks can also trigger this accidentally. For complex expressions, prefer the explicit `(() => { ... })()` form.
- If `await pageInfo()` reports `w: 0` or `h: 0`, do not continue coordinate actions or screenshots until the viewport is fixed. Try switching to the real tab, reloading, or using CDP viewport metrics, then verify with `await pageInfo()` and `await captureScreenshot()`.
- Code in the heredoc body runs in Node.js; code inside `js(...)` runs in the browser page. Navigation, waits, and `cliLog(...)` belong in the heredoc body; `document`, `window`, and page selectors belong inside `js(...)`.
- Always call `completeTaskSpace(name, { keep })` when the task is done; do not leave the space hanging. Default to `{ keep: false }`; use `{ keep: true }` only for the concrete live-page cases described in Task spaces.
- When the user explicitly asks to use ego-browser, assume both `ego-browser` and the repo runtime are ready. Do not pre-check `which ego-browser`, `node -v`, package metadata, or help output. Only investigate environment issues if the first run produces an error.
- If the first run reports `command not found` / a missing environment (most likely ego lite isn't installed yet), or the user explicitly asks to install ego lite, first read `references/install.md` and follow its flow to complete the install, then return to the original task. Do not give up, and do not keep retrying the same heredoc.
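The backslash caveat above is plain JavaScript behaviour and can be checked without a browser. In the sketch below, `new Function` stands in for the browser-side evaluation that `js()` performs; the variable names are illustrative.

```javascript
// In an ordinary template literal, \d is treated as an escape and the backslash is dropped.
const plain = `/^\d+$/.test("42")`;
// Doubling the backslash, or using String.raw, keeps it in the source text.
const doubled = `/^\\d+$/.test("42")`;
const raw = String.raw`/^\d+$/.test("42")`;

console.log(plain);           // /^d+$/.test("42")
console.log(doubled === raw); // true

// Evaluate each source string the way a page would receive it.
const run = (src) => new Function('return ' + src)();
console.log(run(plain)); // false (the regex now matches a run of the letter "d")
console.log(run(raw));   // true
```

The failure is silent: the broken source still parses and runs, it just matches the wrong thing, which is why `String.raw` is the safer default for anything containing a regex.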

