actionbook-scraper
Actionbook Scraper Skill
⚠️ CRITICAL: Two-Part Verification
Every generated script MUST pass BOTH checks:
| Check | What to Verify | Failure Example |
|---|---|---|
| Part 1: Script Runs | No errors, no timeouts | |
| Part 2: Data Correct | Content matches expected | Extracted "Click to expand" instead of name |
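As an illustration, the Part 2 data check can be expressed as a small predicate. This is a sketch only — the placeholder list, field names, and count threshold are assumptions, not the skill's actual implementation:

```javascript
// Illustrative Part 2 check: reject empty values, placeholder text, and UI text.
// PLACEHOLDER_TEXT and the function signature are assumptions for this sketch.
const PLACEHOLDER_TEXT = ['Loading...', 'Click to expand'];

function dataLooksCorrect(items, expectedFields, expectedMinCount) {
  if (!Array.isArray(items) || items.length < expectedMinCount) {
    return false; // too few items usually means missing scroll/pagination handling
  }
  return items.every(item =>
    expectedFields.every(field => {
      const value = String(item[field] ?? '').trim();
      return value !== '' && !PLACEHOLDER_TEXT.includes(value);
    })
  );
}
```

With this sketch, extracting `"Click to expand"` as a name fails the check, mirroring the failure example in the table.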
┌─────────────────────────────────────────────────────┐
│ 1. Generate Script │
│ ↓ │
│ 2. Execute Script │
│ ↓ │
│ 3. Check Part 1: Script runs without errors? │
│ ↓ │
│ 4. Check Part 2: Data content is correct? │
│ - Not empty │
│ - Not placeholder text ("Loading...") │
│ - Not UI text ("Click to expand") │
│ - Fields mapped correctly │
│ ↓ │
│ ┌───┴───┐ │
│ BOTH Pass Either Fails │
│ │ │ │
│ │ ↓ │
│ │ Is it Actionbook data issue? │
│ │ │ │
│ │ ┌───┴───┐ │
│ │ Yes No │
│ │ │ │ │
│ │ ↓ ↓ │
│ │ Log to Fix script │
│ │ .actionbook-issues.log │
│ │ │ │ │
│ │ └───┬───┘ │
│ │ ↓ │
│ │ Retry (max 3x) │
│ ↓ │
│ Output Script │
└─────────────────────────────────────────────────────┘

Default Output Format
/actionbook-scraper:generate <url>

DEFAULT = agent-browser script (bash commands):

```bash
agent-browser open "https://example.com"
agent-browser scroll down 2000
agent-browser get text ".selector"
agent-browser close
```

With --standalone Flag
/actionbook-scraper:generate <url> --standalone

Output = Playwright JavaScript code
Verification Requirements
Two-Part Verification
Every generated script must pass BOTH checks:
| Check | What to Verify | Failure Action |
|---|---|---|
| 1. Script Runs | No errors, no timeouts | Fix syntax/selector errors |
| 2. Data Correct | Content matches expected fields | Fix extraction logic |
Part 1: Script Execution Check
- No runtime errors
- No timeout errors
- Browser closes properly
Part 2: Data Content Check (CRITICAL)
Verify extracted data matches the expected structure:

Expected: Company name, description, website, year founded
Actual:   "Click to expand", "Loading...", empty strings
→ FAIL: Data content incorrect, need to fix extraction logic

Data validation rules:
| Rule | Example Failure | Fix |
|---|---|---|
| Fields not empty | | Check selector targets correct element |
| No placeholder text | | Add wait for dynamic content |
| No UI text | | Extract after expanding, not button text |
| Correct data type | | Wrong selector, fix field mapping |
| Reasonable count | Expected ~100, got 3 | Add scroll/pagination handling |
For agent-browser Scripts
- Execute the generated commands
- Check script runs without errors
- Check data content is correct:
- Fields match expected structure
- Values are actual data, not UI text
- Count is reasonable
- If failed:
- Analyze what's wrong (script error vs data error)
- Fix selector, wait logic, or extraction
- Re-execute
- If success:
- Output the verified script
- Show data preview with field validation
For Playwright Scripts (--standalone)
- Write script to temp file
- Run with `node script.js`
- Check script runs without errors
- Check output data is correct:
- JSON structure matches expected fields
- Values contain actual data
- Count matches expected range
- If failed:
- Analyze error type
- Fix script
- Re-run
- If success:
- Output the verified script
Architecture Overview
/generate <url>              → OUTPUT: agent-browser bash commands
/generate <url> --standalone → OUTPUT: Playwright .js file

┌─────────────────────────────────────────────────────────────┐
│ /generate <url> │
│ │
│ 1. Search Actionbook → get selectors │
│ 2. Generate OUTPUT: │
│ │
│ WITHOUT --standalone │ WITH --standalone │
│ ───────────────────── │ ────────────────── │
│ agent-browser commands │ Playwright .js code │
│ │ │
│ ```bash │ ```javascript │
│ agent-browser open ... │ const { chromium } = ... │
│ agent-browser get ... │ await page.goto(...) │
│ agent-browser close │ ``` │
│ ``` │ │
└─────────────────────────────────────────────────────────────┘

Tool Priority
| Operation | Primary Tool | Fallback | Notes |
|---|---|---|---|
| Find selectors for URL | | None | Search by domain/keywords |
| Get full selector details | | None | Use action_id from search |
| List available sources | | | Browse all indexed sites |
| Generate agent-browser script | Agent (sonnet) | - | Default mode for /generate |
| Generate Playwright script | Agent (sonnet) | - | Use --standalone flag |
| Structure analysis | Agent (haiku) | - | Parse Actionbook response |
| Request new website | | Manual | Submit to actionbook.dev (ONLY command that executes agent-browser) |
Workflow Rules
CRITICAL: Generate → Verify → Fix
Every generated script MUST be verified by executing it.
| Step | Action |
|---|---|
| 1 | Generate script with Actionbook selectors |
| 2 | Execute script to verify it works |
| 3 | If failed: analyze error, fix script, go to step 2 |
| 4 | If success: output verified script + data preview |
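The four steps above amount to a small retry loop. A minimal sketch, where `steps.generate`, `steps.execute`, and `steps.fix` are hypothetical stand-ins for the real agent actions:

```javascript
// Sketch of Generate → Verify → Fix. The `steps` callbacks are
// hypothetical stand-ins for the real agent actions.
async function generateVerifiedScript(url, steps, maxRetries = 3) {
  let script = await steps.generate(url);                 // step 1
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = await steps.execute(script);           // step 2
    if (result.ok && result.items.length > 0) {
      // step 4: verified script plus a short data preview
      return { script, preview: result.items.slice(0, 3) };
    }
    script = await steps.fix(script, result.error);       // step 3, then retry
  }
  throw new Error(`Script still failing after ${maxRetries} attempts`);
}
```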
Verification Process
For agent-browser scripts:

```bash
# Execute each command
agent-browser open "https://example.com"
agent-browser wait --load networkidle
agent-browser get text ".selector"

# Check if data is returned
# If error → fix and retry

agent-browser close
```

**For Playwright scripts (--standalone):**

```bash
# Write to temp file and execute
node /tmp/scraper.js

# Check if output file has data
# If error → fix and retry
```

Critical Rules
- ALWAYS verify generated scripts - Execute and check BOTH parts
- Part 1: Script must run - No errors, no timeouts
- Part 2: Data must be correct - Not empty, not UI text, fields mapped correctly
- Fix errors automatically - Don't output broken scripts or wrong data
- Use Actionbook MCP tools first - Never guess selectors
- Include scroll handling for lazy-loaded pages
- Include expand/collapse logic for card-based layouts
- Always close browser - Include `agent-browser close`
- Retry up to 3 times - If still failing, report the specific issue
Common Data Errors to Catch
| Error | Example | Fix |
|---|---|---|
| Extracted button text | "Click to expand" | Extract content after expanding |
| Extracted placeholder | "Loading..." | Add wait for dynamic content |
| Empty fields | "" | Fix selector |
| Wrong field mapping | | Fix selector for each field |
| Too few items | Expected 100, got 3 | Add scroll/pagination |
Record Actionbook Data Issues
If Actionbook selectors are wrong or outdated, record to local file: `.actionbook-issues.log`

When to record:
- Selector doesn't exist on page
- Selector returns wrong element
- Page structure has changed
- Missing selectors for key elements
Log format:
[YYYY-MM-DD HH:MM] URL: {url}
Action ID: {action_id}
Issue Type: {selector_error | outdated | missing}
Details: {description}
Selector: {selector}
Expected: {what it should select}
Actual: {what it actually selects or error}
---
Selector Priority
When Actionbook provides multiple selectors, prefer in this order:

1. `data-testid` - Most stable, designed for automation
2. `aria-label` - Accessibility-based, semantic
3. `css` - Class-based selectors
4. `xpath` - Last resort, most fragile
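That preference order can be sketched as a simple lookup; the shape of the `selectors` object is an assumption:

```javascript
// Sketch: choose the best available selector per the priority above.
const SELECTOR_PRIORITY = ['data-testid', 'aria-label', 'css', 'xpath'];

function pickSelector(selectors) {
  // selectors: e.g. { css: '.card__title', xpath: '//div[2]/h3' }
  for (const type of SELECTOR_PRIORITY) {
    if (selectors[type]) return { type, selector: selectors[type] };
  }
  return null; // nothing usable
}
```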
Commands
| Command | Description | Agent |
|---|---|---|
| `/actionbook-scraper:analyze <url>` | Analyze page structure and show available selectors | structure-analyzer |
| `/actionbook-scraper:generate <url>` | Generate agent-browser scraper script | code-generator |
| `/actionbook-scraper:generate <url> --standalone` | Generate Playwright/Puppeteer script | code-generator |
| `/actionbook-scraper:list-sources` | List websites with Actionbook data | - |
| `/actionbook-scraper:request-website <url>` | Request new website to be indexed (uses agent-browser) | website-requester |
Data Flow
Analyze Command
1. User: /actionbook-scraper:analyze https://example.com/page
2. Extract domain from URL → "example.com"
3. search_actions("example page") → [action_ids]
4. For best match: get_action_by_id(action_id) → full selector data
5. Structure-analyzer agent formats and presents findings

Generate Command (Default: agent-browser script)
User: /actionbook-scraper:generate https://example.com/page
Step 1: Search Actionbook
search_actions("example.com page") → action_ids
Step 2: Get selectors
get_action_by_id(best_match) → selectors
Step 3: Generate agent-browser script
```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```

Step 4: VERIFY script (REQUIRED)
Execute the commands and check if data is extracted
If failed → analyze error → fix script → retry (max 3x)
Step 5: Return verified script + data preview

**Example Output:**
````markdown
Verified Scraper (agent-browser)
Status: ✅ Verified (extracted 50 items)
Run these commands to scrape:
```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```

Data Preview
```json
[
  {"name": "Item 1", "description": "..."},
  {"name": "Item 2", "description": "..."},
  // ... showing first 3 items
]
```
````

Generate Command (--standalone: Playwright script)
User: /actionbook-scraper:generate https://example.com/page --standalone
Step 1: Search Actionbook for selectors
Step 2: Get full selector data
Step 3: Generate Playwright/Puppeteer script
Step 4: VERIFY script (REQUIRED)
Write to temp file → node /tmp/scraper.js → check output
If failed → analyze error → fix script → retry (max 3x)
Step 5: Return verified script + data preview

**Example Output:**

Verified Scraper (Playwright)
Status: ✅ Verified (extracted 50 items)
```javascript
const { chromium } = require('playwright');
// ... generated code with Actionbook selectors
```

Usage:

```bash
npm install playwright
node scraper.js
```

Data Preview
```json
[
  {"name": "Item 1", "description": "..."},
  // ... first 3 items
]
```

Request Website Command
1. User: /actionbook-scraper:request-website https://newsite.com/page
2. Launch website-requester agent (uses agent-browser)
3. Agent workflow:
a. agent-browser open "https://actionbook.dev/request-website"
b. agent-browser snapshot -i (discover form selectors)
c. agent-browser type <url-field> "https://newsite.com/page"
d. agent-browser type <email-field> (optional)
e. agent-browser type <usecase-field> (optional)
f. agent-browser click <submit-button>
g. agent-browser snapshot -i (verify submission)
h. agent-browser close
4. Output: Confirmation of submission

Selector Data Structure
Actionbook returns selector data in this format:
```json
{
  "url": "https://example.com/page",
  "title": "Page Title",
  "content": "## Selector Reference\n\n| Element | CSS | XPath | Type |\n..."
}
```
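The markdown table embedded in `content` can be parsed back into rows. A sketch assuming the Element | CSS | XPath | Type column order shown above:

```javascript
// Sketch: parse the "## Selector Reference" markdown table out of the
// `content` field. Assumes Element | CSS | XPath | Type column order.
function parseSelectorTable(content) {
  return content
    .split('\n')
    // keep table rows, drop the |---|---| separator row
    .filter(line => line.trim().startsWith('|') && !/^\|[\s|:-]+\|$/.test(line.trim()))
    .slice(1) // drop the header row
    .map(line => {
      const cells = line.split('|').slice(1, -1).map(c => c.trim());
      const [element, css, xpath, type] = cells;
      return { element, css, xpath, type };
    });
}
```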
Common Selector Patterns
Card-based layouts:

Container: .card-list, .grid-container
Card item: .card, .list-item
Card name: .card__title, .card-name
Card description: .card__description
Expand button: .card__expand, button.expand

Detail extraction (dt/dd pattern):
```javascript
// Common pattern for key-value pairs
const items = container.querySelectorAll('.info-item');
items.forEach(item => {
  const label = item.querySelector('dt').textContent;
  const value = item.querySelector('dd').textContent;
});
```

Table layouts:
Table: table, .data-table
Header: thead th, .table-header
Row: tbody tr, .table-row
Cell: td, .table-cell
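Extraction against the table-layout selectors above can be sketched like this; `root` is any DOM-like object exposing `querySelectorAll` (a browser `document`, jsdom, etc.):

```javascript
// Sketch: turn a table layout into objects keyed by header text.
function extractTableRows(root) {
  const headers = [...root.querySelectorAll('thead th')]
    .map(th => th.textContent.trim());
  return [...root.querySelectorAll('tbody tr')].map(tr => {
    const cells = [...tr.querySelectorAll('td')].map(td => td.textContent.trim());
    // Pair each header with its cell value
    return Object.fromEntries(headers.map((h, i) => [h, cells[i]]));
  });
}
```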
Page Type Detection
| Indicator | Page Type | Template |
|---|---|---|
| Scroll to load more | Dynamic/Infinite | playwright-js (with scroll) |
| Click to expand | Card-based | playwright-js (with click) |
| Pagination links | Paginated | playwright-js (with pagination) |
| Static content | Static | puppeteer or playwright |
| SPA framework detected | SPA | playwright-js (network idle) |
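As a rough illustration, the indicator-to-template mapping above could be coded as follows; the boolean flags are assumptions, since real detection would inspect the live page:

```javascript
// Illustrative mapping of page-type indicators to a template choice.
// The flag names are assumptions for this sketch.
function pickTemplate(indicators) {
  if (indicators.infiniteScroll) return 'playwright-js (with scroll)';
  if (indicators.clickToExpand) return 'playwright-js (with click)';
  if (indicators.pagination) return 'playwright-js (with pagination)';
  if (indicators.spaFramework) return 'playwright-js (network idle)';
  return 'puppeteer or playwright'; // static content
}
```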
Output Formats
Analysis Output
```markdown
Page Analysis: {url}

Matched Action

- Action ID: {action_id}
- Confidence: HIGH | MEDIUM | LOW

Available Selectors

| Element | Selector | Type | Methods |
|---|---|---|---|
| {name} | {selector} | {type} | {methods} |

Page Structure

- Type: {static|dynamic|spa}
- Data Pattern: {cards|table|list}
- Lazy Loading: {yes|no}
- Expand/Collapse: {yes|no}

Recommendations

- Suggested template: {template}
- Special handling needed: {notes}
```

Generated Code Output
````markdown
Generated Scraper

Target URL: {url}
Template: {template}
Expected Output: {description}

Dependencies

```bash
npm install playwright
```

Code

```javascript
{generated_code}
```

Usage

```bash
node scraper.js
```

Output

Results saved to {output_file}
````

Templates Reference
| Template | Flag | Output | Run With |
|---|---|---|---|
| agent-browser | (default) | CLI commands | |
| playwright-js | --standalone | .js file | |
| playwright-python | --standalone --template playwright-python | .py file | |
| puppeteer | --standalone --template puppeteer | .js file | |
Error Handling
| Error | Cause | Solution |
|---|---|---|
| No actions found | URL not indexed | Use `/actionbook-scraper:request-website` |
| Selectors not working | Page updated | Report to Actionbook, try alternative selectors |
| Timeout | Slow page load | Increase timeout, add retry logic |
| Empty data | Dynamic content | Add scroll/wait handling |
| Form submission failed | Network/page issue | Retry or submit manually at actionbook.dev |
agent-browser Usage
For the `request-website` command, the plugin uses the agent-browser CLI to automate form submission.

agent-browser Commands
```bash
# Open a URL
agent-browser open "https://actionbook.dev/request-website"

# Get page snapshot (discover selectors)
agent-browser snapshot -i

# Type into form field
agent-browser type "input[name='url']" "https://example.com"

# Click button
agent-browser click "button[type='submit']"

# Close browser (ALWAYS do this)
agent-browser close
```

Selector Discovery
If form selectors are unknown, use snapshot to discover them:

```bash
agent-browser open "https://actionbook.dev/request-website"
agent-browser snapshot -i  # Returns page structure with selectors
```

Always Close Browser
Critical: Always run `agent-browser close` at the end of any agent-browser session, even if errors occur.

Rate Limiting
- Actionbook MCP: No rate limit for local usage
- Target websites: Respect robots.txt and add delays between requests
- Recommended: 1-2 second delay between page requests
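The recommended 1–2 second spacing can be sketched as a sequential fetch helper; `fetchPage` is a hypothetical callback, and the delay bounds are parameters that match the guideline by default:

```javascript
// Sketch: fetch pages sequentially with a randomized 1–2 s pause
// (by default) between requests, per the rate-limiting guideline.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function politeFetchAll(urls, fetchPage, minDelayMs = 1000, maxDelayMs = 2000) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchPage(url));
    // Random pause in [minDelayMs, maxDelayMs] before the next request
    await sleep(minDelayMs + Math.random() * (maxDelayMs - minDelayMs));
  }
  return results;
}
```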
Examples
Example 1: Generate agent-browser Script (Default)
/actionbook-scraper:generate https://firstround.com/companies
Output: agent-browser commands
```bash
agent-browser open "https://firstround.com/companies"
agent-browser scroll down 2000
agent-browser get text ".company-list-card-small"
agent-browser close
```

User runs these commands to scrape.

Example 2: Generate Playwright Script
/actionbook-scraper:generate https://firstround.com/companies --standalone
Output: Playwright JavaScript code
```javascript
const { chromium } = require('playwright');
// ... full script
```

User runs: `node scraper.js`

Example 3: Analyze Page Structure
/actionbook-scraper:analyze https://example.com/products
Output: Analysis showing:
- Available selectors
- Page structure
- Recommended approach

Example 4: Request New Website
/actionbook-scraper:request-website https://newsite.com/data
Action: Submits form to actionbook.dev (this command DOES execute agent-browser)

Best Practices
- Always analyze before generating - Understand the page structure first
- Check list-sources - Verify the site is indexed before attempting
- Review generated code - Verify selectors match expected elements
- Add appropriate delays - Be respectful to target servers
- Handle edge cases - Empty states, loading states, errors
- Test incrementally - Run on small subset before full scrape