ego-browser
ego-browser gives AI agents a CLI-accessible Node.js runtime with built-in helpers (`snapshotText`, `click`, `js`, `cdp`, and more) that agents call directly inside JS scripts to observe pages, interact with the UI, evaluate browser-side JavaScript, and drive a real browser for any web automation task.

For setup, install, or connection problems, read `references/install.md`.

Use the `Bash` tool to run all browser operations via an `ego-browser nodejs <<'EOF' ... EOF` heredoc. Do not write the code to a `.js` file first.
Quick start
```bash
ego-browser nodejs <<'EOF'
// Name the task space for the whole user task, then reuse that space across heredoc rounds.
const task = await useOrCreateTaskSpace('inspect example page')
cliLog('task space id: ' + task.id)
await openOrReuseTab('https://example.com', { wait: true, timeout: 20 })
cliLog(await snapshotText())
EOF
```

The heredoc body runs as a Node.js script that controls the selected ego-browser task space. All ego-browser helpers are preloaded into that script.
Common helpers
- Task spaces: `listTaskSpaces`, `useOrCreateTaskSpace`, `claimTaskSpace`, `handOffTaskSpace`, `takeOverTaskSpace`, `waitForAgentControl`, `completeTaskSpace`
- Navigation / state: `listTabs`, `openOrReuseTab`, `closeTab`, `gotoAndWait`, `currentTab`, `switchTab`, `gotoUrl`, `pageInfo`, `ensureRealTab`
- Observation: `snapshotText`, `captureScreenshot`, `drainEvents`
- Scroll / mouse: `scrollBy`, `scrollToBottomUntil`, `scroll`, `click`, `doubleClick`, `hover`, `dragMouse`
- Keyboard & input: `typeText`, `fillInput`, `pressKey`, `dispatchKey`
- File: `uploadFile`
- Wait: `wait`, `waitForLoad`, `waitForElement`, `waitForNetworkIdle`
- Fetch: `serverFetch`, `browserFetch`
- CDP / evaluate: `js`, `cdp`
- Output: `cliLog`, `help`
Notes:
- `cliLog(value)`: prints to the terminal. It is the only output mechanism inside a heredoc, and all final results must go through it.
- `await pageInfo()`: normally resolves to `{ url, title, w, h, sx, sy, pw, ph }`; if a native browser dialog is open, it resolves to `{ dialog: ... }` instead, because page JavaScript is blocked.
- If `await pageInfo()` resolves to `{ dialog: ... }`, handle the dialog with `await cdp('Page.handleJavaScriptDialog', { accept: true })` or `accept: false` before running page JavaScript.
- `await ensureRealTab()`: switches to an existing non-internal page tab if needed and resolves to it; resolves to `null` when none exists. It does not create a tab; use `await openOrReuseTab(...)` for that.
- `await closeTab(target?)`: closes the given target id / tab object, or the current tab when omitted.
- `await drainEvents()`: consumes and returns the async event queue produced by the page (navigation events, network events, etc.).
- `await serverFetch(url, options)`: issues a request from Node and returns the response body.
- `await browserFetch(url, options)`: issues a request from the current browser page context and returns the response body.
- `help(name)`: describes usage for a given helper; print it with `cliLog(help('click'))`.
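The two dialog notes above can be folded into a small guard to run before page JavaScript. This is a sketch rather than a built-in helper: `dismissDialogIfOpen` is a hypothetical name, and the helpers are passed in as parameters only so the function can be exercised outside ego-browser. Inside a heredoc you would call it as `await dismissDialogIfOpen({ pageInfo, cdp })`, since both helpers are already preloaded there.

```js
// Hypothetical guard (not an ego-browser helper): if a native dialog is blocking
// page JavaScript, close it via CDP and return the refreshed page info.
async function dismissDialogIfOpen({ pageInfo, cdp }, { accept = false } = {}) {
  const info = await pageInfo()
  // No dialog: pageInfo() already has the normal { url, title, w, h, ... } shape.
  if (!info || !info.dialog) return info
  // accept: true confirms the dialog; accept: false dismisses it.
  await cdp('Page.handleJavaScriptDialog', { accept })
  return await pageInfo()
}
```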
Task spaces
A task space is an isolated browsing context that ego-browser provides for AI agents. Each task space has its own set of tabs but inherits the current user's login state by default, so agents can operate on authenticated sites without competing with or disturbing the user's normal browser windows.

Closing all tabs in a task space is equivalent to closing that task space.

A task often takes multiple heredoc rounds to complete. Because the Node.js runtime exits after each heredoc and retains no state, normal working heredocs should start with an explicit `useOrCreateTaskSpace(nameOrId)` call to reuse the same space; this lets you operate continuously and reuse tabs across rounds. The exception is resuming after a handoff: once the user confirms "continue" (through an Ask or in chat), start the next heredoc with `takeOverTaskSpace(nameOrId)` instead. In both helpers, `nameOrId` is either the space's `name` or its `taskId`; `useOrCreateTaskSpace` creates the space when no existing one matches.

Use a short name for the active user goal when creating a new task space. Keep reusing that task space for follow-up questions, corrections, refinements, re-checks, and result validation, even if you previously thought the task was complete. Choose a new task space only when the user clearly starts a separate, unrelated goal. To resume a known task in later rounds and avoid name collisions, prefer the numeric `id` returned by `useOrCreateTaskSpace` (for example, `task.id`).

For any follow-up on the same user goal (including continuing, corrections, retries, validation, user-reported problems, or work after `completeTaskSpace(..., { keep: true })`), first resume the original task space if it still exists. Do not create a new task space for the same goal unless the user asks for a fresh space, starts an unrelated goal, or the original space turns out to be unavailable. If a new space is necessary, state why.

After explicit user confirmation, to continue work from an existing user-owned, inactive, or unassigned task space: use `await listTaskSpaces()` to find the space, call `await claimTaskSpace(id)` to take ownership and select it, then use `await listTabs()` and `await switchTab(targetId)` to select the exact tab before acting.

Ownership policy: every task space has `ownership: 'agent' | 'agentDelegatedToUser' | 'user'`, and the helpers treat user-owned spaces differently:

| Helper | When the target space is user-owned |
|---|---|
| | throws; agent-owned spaces only |
| | claims it (ownership transfers to the agent), then selects it |
| | skipped; resolves |
| | skipped; resolves |
| | claims it, then closes it |
| | no ownership check |
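After the user has explicitly confirmed, the resume sequence described in this section (find the space, claim it, then select the exact tab) could be wrapped as below. It is a sketch under stated assumptions: `resumeUserSpace` and `matchTab` are hypothetical, entries from `listTaskSpaces()` are assumed to expose `id` and `name`, and tabs from `listTabs()` are assumed to expose `targetId` and `url`. The helpers are injected so the logic can be tested outside a heredoc.

```js
// Hypothetical wrapper for resuming a user-owned / inactive space AFTER explicit
// user confirmation. Field names (id, name, targetId, url) are assumptions.
async function resumeUserSpace(
  { listTaskSpaces, claimTaskSpace, listTabs, switchTab },
  { nameOrId, matchTab }
) {
  const spaces = await listTaskSpaces()
  const space = spaces.find((s) => s.name === nameOrId || s.id === nameOrId)
  if (!space) throw new Error(`no task space matching ${nameOrId}`)

  // Takes ownership and selects the space.
  await claimTaskSpace(space.id)

  const tabs = await listTabs()
  const tab = tabs.find(matchTab)
  if (!tab) throw new Error('no tab matched; inspect the listTabs() output first')
  // Select the exact tab before acting.
  await switchTab(tab.targetId)
  return { spaceId: space.id, targetId: tab.targetId }
}
```

In a heredoc this would be called with the preloaded helpers, e.g. `await resumeUserSpace({ listTaskSpaces, claimTaskSpace, listTabs, switchTab }, { nameOrId: task.id, matchTab: (t) => t.url.includes('example.com') })`.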
`handOffTaskSpace` and `completeTaskSpace` resolve `{ done: true }` when the action actually runs. Check `done` before telling the user that a handoff or cleanup is complete; a `skipped` result usually means the space you operated on was never yours.

Finish with `completeTaskSpace(nameOrId, { keep })`, passing `keep: false` by default. Use `{ keep: true }` only when the user explicitly asks to keep the page open, the task needs manual user action in that exact page, or the result cannot be delivered well as a URL, file, artifact, or summary. Do not keep a task space open merely because a page was visited, a document was created, or a screenshot was used for verification.

When passing a string that may create a new task space, make it reflect the task's intent (e.g. `'search github issues'`); don't use literal placeholders.

If the task space needs to be preserved after the task ends, keep only the tabs that need to be shown to the user. Keep loose track of how many tabs are open; a quick `(await listTabs()).length` is enough, with no need to spend a dedicated round just to check. When scratch tabs (search-result pages, cross-check pages, and other one-off pages) pile up, close them as you go rather than letting them accumulate until the end. When finishing with `{ keep: true }` to leave pages for the user, clear out the remaining scratch tabs so only the pages worth showing stay open. Close a single tab with `await closeTab(targetId)` (`targetId` comes from `listTabs()` or an `openOrReuseTab` return value).
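The cleanup and completion rules can be combined into one closing step. In this sketch, `finishTask` is a hypothetical name, tab objects from `listTabs()` are assumed to carry a `targetId`, and the only part of the completion result it relies on is that `completeTaskSpace` resolves `{ done: true }` when it actually runs. If no tab needs to stay open it completes with `keep: false`; otherwise it closes every scratch tab first and completes with `keep: true`.

```js
// Hypothetical closing step. keepTargetIds = tabs worth showing the user;
// every other tab is treated as scratch. The `targetId` field is an assumption.
async function finishTask(
  { listTabs, closeTab, completeTaskSpace },
  { spaceId, keepTargetIds = [] }
) {
  const keep = keepTargetIds.length > 0
  if (keep) {
    // Leave only the pages worth showing before handing the space back.
    for (const tab of await listTabs()) {
      if (!keepTargetIds.includes(tab.targetId)) await closeTab(tab.targetId)
    }
  }
  const result = await completeTaskSpace(spaceId, { keep })
  // Report success only when the helper actually ran.
  if (result && result.done === true) {
    return `task space ${spaceId} completed (keep: ${keep})`
  }
  return `task space ${spaceId} was NOT completed; check ownership before reporting`
}
```

The returned string is meant to be passed to `cliLog(...)` as the round's final report.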
Control handoff
Only one side, agent or user, holds control of a task space at any time. While the user holds control, any browser operation by the agent fails with a "user is controlling" message. Do not retry it; follow the steps below to resume.

A "user is controlling" error is a hard stop on the whole task, not an obstacle to route around. It means the user has deliberately taken the browser back, often because your current approach is going wrong. Honoring it is the correct outcome; pushing the goal forward anyway is the failure. The only thing you may do is ask the user and wait.

An "inactive", "not assigned to an agent", or similar task-space error is also a hard stop with the same confirmation requirement. Resume only after explicit user confirmation, then start with `await claimTaskSpace(id)`.

Handing off: when the task requires user intervention (e.g. login, captcha, manual confirmation), call `await handOffTaskSpace([nameOrId])` to give control to the user, and tell them exactly what to do. Omitting `nameOrId` uses the currently selected task space; pass `task.id` across heredoc rounds to avoid ambiguity.

Regaining control: take control back only after the user explicitly confirms, either through an Ask (your harness's button/option prompt, e.g. "Continue" vs "Finish task") or a "continue" message in chat. Then start a new heredoc with `await takeOverTaskSpace([nameOrId])` and resume; if the user chooses to finish, close out with `await completeTaskSpace(nameOrId, { keep })`. Never call `takeOverTaskSpace` on your own to grab control back: it has no ownership check and will seize the browser away from the user.

Unexpected takeover: the user can take over at any time via the browser GUI, with the same effect as the agent calling `handOffTaskSpace`. Do not retry the failed operation and do not take control back automatically; surface the Ask above (Continue / Finish) and resume only when the user picks Continue. The helper list also includes `await waitForAgentControl(nameOrId)`; check `cliLog(help('waitForAgentControl'))` for its exact behavior.
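Because these errors are hard stops, a round should catch them, report, and end instead of retrying or taking control back. The sketch below assumes the error messages contain the phrases quoted in this section ("user is controlling", "inactive", "not assigned to an agent"); the exact wording is not specified, so treat the patterns as assumptions. `runStep` and `HARD_STOPS` are hypothetical names.

```js
// Hypothetical wrapper: run one browser step; on a hard-stop error, return a
// status for the agent to surface as an Ask instead of retrying or taking over.
// The message patterns are assumptions based on the phrases quoted above.
const HARD_STOPS = [
  { pattern: /user is controlling/i, kind: 'user-controlling' },
  { pattern: /inactive|not assigned to an agent/i, kind: 'needs-confirmation' },
]

async function runStep(step) {
  try {
    return { ok: true, value: await step() }
  } catch (err) {
    const message = String((err && err.message) || err)
    const hit = HARD_STOPS.find((h) => h.pattern.test(message))
    // Hard stop: report it once and do NOT retry.
    if (hit) return { ok: false, hardStop: hit.kind, message }
    // Ordinary failures propagate as usual.
    throw err
  }
}
```

For example, `const r = await runStep(() => click('@12'))`; if `r.hardStop` is set, report it with `cliLog` and end the round, then wait for the user's explicit confirmation before resuming.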
Scroll / mouse
```js
// DOM scroll
await scrollBy(900)
await scrollToBottomUntil(
  async () => await js(String.raw`document.querySelectorAll('article').length`) >= 20,
  { step: 900, wait: 1, maxSteps: 20 }
)
// Real wheel event
await scroll({ dy: 900 })
```

Element-target helpers such as `click`, `doubleClick`, `hover`, `dragMouse`, `fillInput`, `uploadFile`, and `waitForElement` accept the same selector/ref surface: raw CSS, `xpath=...`, `@N` / `ref=N`, and `loc=...` values from `snapshotText()` (`loc=css:...`, `loc=role:...`, `loc=href:...`). `@N` refs are for ego-browser helpers only; they are not valid selectors inside `document.querySelector(...)`.

Targets for `click` / `doubleClick` / `hover` / `dragMouse`:

- `string`: CSS selector, `xpath=...`, `@N` / `ref=N`, or `loc=...`; clicks the element's center.
- `[x, y]` or `{x, y}`: viewport coordinates.
- `{selector}`: CSS selector, `xpath=...`, `@N` / `ref=N`, or `loc=...`; clicks the element's center.
- `{selector, x, y}`: offset from the element's top-left corner by `x` / `y` pixels.
- `options.label` (optional): a 3-6 word action description; triggers a visual highlight animation.
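```js
await click('@21', { label: 'check login status' })
await click('button.primary', { label: 'click submit button' })
await click([420, 260])
await click({ x: 420, y: 260 })
await click({ selector: 'canvas#stage', x: 12, y: 8 })
await hover('@5', { label: 'hover to reveal menu' })
// from / to are placeholders: use any target form from the list above
const from = '@7' // e.g. a ref from the latest snapshotText()
const to = [640, 320] // e.g. a viewport coordinate
await dragMouse([from, to], { label: 'drag card' })
```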
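To make the accepted target forms concrete, here is a hypothetical classifier that sorts a value into the categories listed above. It illustrates the documented surface and is not ego-browser's own parser; the category names are invented for the example. One practical use is catching an `@N` ref before it ends up inside `document.querySelector(...)`.

```js
// Hypothetical classifier for the documented target forms (illustration only).
function classifyTarget(target) {
  if (Array.isArray(target)) {
    if (target.length === 2 && target.every(Number.isFinite)) return 'viewport-point'
    throw new TypeError('array targets must be [x, y]')
  }
  if (target && typeof target === 'object') {
    if (typeof target.selector === 'string') {
      // {selector, x, y} offsets from the element's top-left corner;
      // {selector} alone targets the element's center.
      const hasOffset = Number.isFinite(target.x) && Number.isFinite(target.y)
      return hasOffset ? 'element-offset' : 'element-center'
    }
    if (Number.isFinite(target.x) && Number.isFinite(target.y)) return 'viewport-point'
    throw new TypeError('object targets need {x, y} or {selector}')
  }
  if (typeof target !== 'string' || target.trim() === '') {
    throw new TypeError('unsupported target')
  }
  if (/^@\d+$/.test(target) || /^ref=\d+$/.test(target)) return 'snapshot-ref'
  if (target.startsWith('xpath=')) return 'xpath'
  if (target.startsWith('loc=')) return 'locator'
  // Anything else is treated as a raw CSS selector.
  return 'css'
}
```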
uploadFile
```js
await uploadFile('input[type="file"]', "/absolute/path/to/file.pdf")
```

js
`js()` takes browser-side JavaScript as source text and evaluates it through CDP `Runtime.evaluate`. Functions are not serialized with `.toString()`, so `js()` does not accept `page.evaluate(fn, ...args)`-style arguments.

When you need to run multi-step logic inside the browser, wrap it in a single self-invoking closure and return once; don't split it across multiple `await js()` calls:

```js
const data = await js(String.raw`(() => {
  const items = [...document.querySelectorAll('article')]
  return items.map(el => ({
    text: el.innerText,
    links: [...el.querySelectorAll('a')].map(a => a.href),
  }))
})()`)
```
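The `js()` examples pass browser-side code as a source string, so Node-side values have to be embedded into that string. A safe way to do this, sketched with a hypothetical `buildExtractSource` helper, is to serialize each value with `JSON.stringify`, which produces a valid JavaScript string literal even when the value contains quotes or backslashes. The generated source keeps the single explicit IIFE shape recommended above.

```js
// Hypothetical helper: build a single-IIFE js() source that extracts article data
// for a caller-supplied selector. JSON.stringify turns the selector into a valid
// JS string literal, so quotes or backslashes in it cannot break the source.
function buildExtractSource(selector, { limit = 50 } = {}) {
  if (!Number.isInteger(limit) || limit < 0) {
    throw new RangeError('limit must be a non-negative integer')
  }
  return `(() => {
    const items = [...document.querySelectorAll(${JSON.stringify(selector)})]
    return items.slice(0, ${limit}).map(el => ({
      text: el.innerText,
      links: [...el.querySelectorAll('a')].map(a => a.href),
    }))
  })()`
}
```

Inside a heredoc: `const data = await js(buildExtractSource('article[data-kind="post"]', { limit: 20 }))`.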
Recommended workflow
ego-browser has three main workflows. Pick the one that fits the page and task before acting.

Use the semantic workflow first for ordinary websites with real DOM controls. For canvas-like productivity apps and rich editors (including Google Docs, Google Sheets, Lark/Feishu Docs, Notion, Figma, whiteboards, maps, and other virtualized editors), use the visual workflow first for the main editing surface. These apps often expose toolbars, title inputs, hidden textareas, offscreen iframes, or canvas layers in the DOM that do not represent the actual user-editable document or grid. Do not rely on `await fillInput(...)`, DOM selectors, or `snapshotText()` refs for the main editing surface unless a small write probe proves the text lands in the intended place.

Before writing substantial content into a rich editor, perform a tiny write probe, then verify it with `await captureScreenshot()`, an export/readback path, or another reliable visual/state check. If the probe appears in the title bar, toolbar search, hidden input, or any other wrong field, stop using DOM/input helpers for that surface and switch to screenshot-guided mouse actions plus real keyboard operations.
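A write probe can be scripted as one short round. In this sketch `probeEditor` is hypothetical; the click point is assumed to come from a screenshot you have already inspected, `typeText` is assumed to take the text as its first argument, and `readBack` stands in for whatever reliable readback the app offers (an export, or a `js(...)` read of a known element).

```js
// Hypothetical probe: type a unique token at a screenshot-derived point, then check
// that it shows up where intended. readBack() is caller-supplied and must return
// the text of the intended editing surface (e.g. via an export or js(...)).
async function probeEditor({ click, typeText }, { point, readBack }) {
  const token = `probe-${Date.now().toString(36)}`
  await click(point, { label: 'focus editor body' })
  await typeText(token)
  const text = await readBack()
  return { token, landed: typeof text === 'string' && text.includes(token) }
}
```

If `landed` is false, treat the surface as visual-only: switch to screenshot-guided clicks and real keyboard input, and remove the probe token wherever it did land.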
- Semantic workflow: `snapshotText()` + refs / locators. The default for most pages with normal text, links, buttons, forms, tables, and lists.
  - Reuse or create a task space: `const task = await useOrCreateTaskSpace(name)`.
  - Open or switch pages with `await openOrReuseTab(url, { wait: true })`; use `await gotoAndWait(url, { timeout, settle })` only when navigating inside the current tab.
  - Observe with `await snapshotText()` to get a full-page semantic tree annotated with `[ref=N, loc=..., url=...]`.
  - Act with `await click('@N')`, `await fillInput('@N', ...)`, or stable `loc=...` values. Use direct DOM logic only when it is simpler than helper calls.
  - After meaningful clicks, input, or navigation, observe again with `await snapshotText()`, `await pageInfo()`, or `await captureScreenshot()` before assuming success.
- Visual workflow: `await captureScreenshot()` + coordinate/keyboard actions. Use it when the page is primarily visual, canvas-like, heavily virtualized, or when the accessibility / semantic structure is incomplete.
  - Inspect the screenshot, act with viewport coordinates and keys such as `await click([x, y])`, `await doubleClick([x, y])`, `await typeText(...)`, and `await pressKey(...)`, then verify with another screenshot or a reliable export/readback path.
  - Prefer this path for rich editors, spreadsheets, visual menus, map/canvas UIs, drag interactions, and targets that are obvious visually but poorly represented in the DOM/AX tree.
- Direct DOM / CDP workflow: `await js(...)` / `await cdp(...)`. Use it when you need browser state, compact data extraction, custom DOM traversal, or raw browser capabilities.
  - Keep browser-side logic in one explicit IIFE and return once.
  - Use `await cdp(...)` for browser protocol operations that helpers do not cover.
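In the semantic workflow, the step between observing and acting is picking a ref out of the `snapshotText()` output. The parser below is a sketch that assumes annotated lines carry a `[ref=N, ...]` tag as described above; the exact layout of real snapshot lines may differ, and `findRef` is a hypothetical name. It returns an `@N` string that element-target helpers such as `click` accept.

```js
// Hypothetical parser: return '@N' for the first snapshot line that contains
// `text` (case-insensitive) and carries a [ref=N ...] annotation, else null.
function findRef(snapshot, text) {
  const needle = text.toLowerCase()
  for (const line of snapshot.split('\n')) {
    if (!line.toLowerCase().includes(needle)) continue
    const m = line.match(/\[ref=(\d+)\b/)
    if (m) return '@' + m[1]
  }
  return null
}
```

Refs are only valid for the latest snapshot, so parse and act in the same round, e.g. `const ref = findRef(await snapshotText(), 'Sign in')` followed by `if (ref) await click(ref, { label: 'open sign in' })`.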
These workflows can be combined. A task may take multiple heredoc rounds when the next step depends on fresh page state or a user handoff. In each round, write a coherent script that advances the task: observe, act or extract, verify, and report with `cliLog(...)`. Avoid tiny probe scripts, but don't force the whole task into one oversized script.
Caveats
- `wait(...)` and `timeout` values are in seconds; only parameters whose names end in `Ms` are milliseconds.
- `snapshotText()` defaults to `scope: 'full_page'`, covering the whole page. Use the default in almost every case; only pass `scope: 'only_within_viewport'` when the task needs only visible content.
- `@N` refs are only valid for the most recent `snapshotText` call; every call rebuilds the refMap. Ref numbers come from the CDP `backendNodeId`, so the same element keeps the same number across calls, but to use `@N`, N must appear in the latest snapshotText output. An element scrolled out of the viewport, a DOM re-render, or a previous call with `scope:'only_within_viewport'` that didn't cover the element will all cause an `Unknown ref` error. For elements you need to reference long-term, use the `loc=...` value from snapshotText output as a stable selector, or write a CSS selector directly.
- `js()` returns the evaluated result, not a JSON string; don't wrap it with `JSON.parse(...)`.
- Inside a `js(...)` template string, regex backslashes must be doubled (e.g. `\\d`, `\\s`), or use `String.raw`.
- If the source passed to `js()` contains a top-level `return`, it will be auto-wrapped in an IIFE; `return` inside nested callbacks can also trigger this accidentally. For complex expressions, prefer the explicit `(() => { ... })()` form.
- If `await pageInfo()` reports `w: 0` or `h: 0`, do not continue coordinate actions or screenshots until the viewport is fixed. Try switching to the real tab, reloading, or using CDP viewport metrics, then verify with `await pageInfo()` and `await captureScreenshot()`.
- Code in the heredoc body runs in Node.js; code inside `js(...)` runs in the browser page. Navigation, waits, and `cliLog(...)` belong in the heredoc body; `document`, `window`, and page selectors belong inside `js(...)`.
- Always call `completeTaskSpace(name, { keep })` when the task is done; do not leave the space hanging. Default to `{ keep: false }`; use `{ keep: true }` only for the concrete live-page cases described in Task spaces.
- When the user explicitly asks to use ego-browser, assume both `ego-browser` and the repo runtime are ready. Do not pre-check `which ego-browser`, `node -v`, package metadata, or help output. Only investigate environment issues if the first run produces an error.
- If the first run reports `command not found` / a missing environment (most likely ego lite isn't installed yet), or the user explicitly asks to install ego lite, first read `references/install.md` and follow its flow to complete the install, then return to the original task. Do not give up, and do not keep retrying the same heredoc.
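The backslash caveat is ordinary JavaScript template-literal behavior, so it can be checked in plain Node.js. In the sketch below, `doubled` and `raw` produce the same source string, while `lost` shows what happens when a single backslash is used without `String.raw`; the variable names are illustrative only.

```js
// In an ordinary template literal, `\d` is not a recognized escape, so the
// backslash is dropped and only "d" remains; write `\\d` to keep it.
const doubled = `document.body.innerText.match(/\\d+/g)`
// String.raw keeps backslashes exactly as written, so single backslashes work.
const raw = String.raw`document.body.innerText.match(/\d+/g)`
// Single backslash without String.raw: this is the string "/d+/".
const lost = `/\d+/`

console.log(doubled === raw) // true
console.log(lost) // /d+/
// Inside a heredoc either source could be passed on: await js(raw)
```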