actionbook-scraper


Actionbook Scraper Skill


⚠️ CRITICAL: Two-Part Verification


Every generated script MUST pass BOTH checks:

| Check | What to Verify | Failure Example |
|---|---|---|
| Part 1: Script Runs | No errors, no timeouts | Selector not found |
| Part 2: Data Correct | Content matches expected | Extracted "Click to expand" instead of name |
┌─────────────────────────────────────────────────────┐
│   1. Generate Script                                │
│          ↓                                          │
│   2. Execute Script                                 │
│          ↓                                          │
│   3. Check Part 1: Script runs without errors?      │
│          ↓                                          │
│   4. Check Part 2: Data content is correct?         │
│      - Not empty                                    │
│      - Not placeholder text ("Loading...")          │
│      - Not UI text ("Click to expand")              │
│      - Fields mapped correctly                      │
│          ↓                                          │
│      ┌───┴───┐                                      │
│   BOTH Pass  Either Fails                           │
│      │           │                                  │
│      │           ↓                                  │
│      │       Is it Actionbook data issue?           │
│      │           │                                  │
│      │       ┌───┴───┐                              │
│      │      Yes      No                             │
│      │       │       │                              │
│      │       ↓       ↓                              │
│      │    Log to   Fix script                       │
│      │    .actionbook-issues.log                    │
│      │       │       │                              │
│      │       └───┬───┘                              │
│      │           ↓                                  │
│      │       Retry (max 3x)                         │
│      ↓                                              │
│   Output Script                                     │
└─────────────────────────────────────────────────────┘

Default Output Format


/actionbook-scraper:generate <url>

DEFAULT = agent-browser script (bash commands):

```bash
agent-browser open "https://example.com"
agent-browser scroll down 2000
agent-browser get text ".selector"
agent-browser close
```

With --standalone Flag


/actionbook-scraper:generate <url> --standalone
Output = Playwright JavaScript code


Verification Requirements


Two-Part Verification


Every generated script must pass BOTH checks:

| Check | What to Verify | Failure Action |
|---|---|---|
| 1. Script Runs | No errors, no timeouts | Fix syntax/selector errors |
| 2. Data Correct | Content matches expected fields | Fix extraction logic |

Part 1: Script Execution Check


  • No runtime errors
  • No timeout errors
  • Browser closes properly

Part 2: Data Content Check (CRITICAL)


Verify extracted data matches the expected structure:

```
Expected: Company name, description, website, year founded
Actual:   "Click to expand", "Loading...", empty strings

→ FAIL: Data content incorrect, need to fix extraction logic
```

Data validation rules:

| Rule | Example Failure | Fix |
|---|---|---|
| Fields not empty | `name: ""` | Check selector targets correct element |
| No placeholder text | `name: "Loading..."` | Add wait for dynamic content |
| No UI text | `name: "Click to expand"` | Extract after expanding, not button text |
| Correct data type | `year: "View Details"` | Wrong selector, fix field mapping |
| Reasonable count | Expected ~100, got 3 | Add scroll/pagination handling |
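The rules above can be sketched as a small validation helper. This is an illustrative sketch only: the field names, placeholder strings, and function signature are assumptions, not part of the plugin's API.

```javascript
// Hypothetical sketch of the Part 2 data checks. The banned strings and
// default required field are illustrative assumptions, not plugin API.
const PLACEHOLDERS = ["Loading...", "Click to expand"];

function validateRecords(records, { requiredFields = ["name"], minCount = 1 } = {}) {
  const problems = [];
  if (records.length < minCount) {
    problems.push(`too few items: expected >= ${minCount}, got ${records.length}`);
  }
  for (const [i, rec] of records.entries()) {
    for (const field of requiredFields) {
      const value = rec[field];
      if (value === undefined || value === "") {
        problems.push(`record ${i}: field "${field}" is empty`);
      } else if (PLACEHOLDERS.includes(value)) {
        problems.push(`record ${i}: field "${field}" is UI/placeholder text ("${value}")`);
      }
    }
  }
  return { ok: problems.length === 0, problems };
}
```

A failing check would feed the fix-and-retry loop rather than being output to the user.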

For agent-browser Scripts


  1. Execute the generated commands
  2. Check script runs without errors
  3. Check data content is correct:
    • Fields match expected structure
    • Values are actual data, not UI text
    • Count is reasonable
  4. If failed:
    • Analyze what's wrong (script error vs data error)
    • Fix selector, wait logic, or extraction
    • Re-execute
  5. If success:
    • Output the verified script
    • Show data preview with field validation

For Playwright Scripts (--standalone)


  1. Write script to temp file
  2. Run with `node script.js`
  3. Check script runs without errors
  4. Check output data is correct:
    • JSON structure matches expected fields
    • Values contain actual data
    • Count matches expected range
  5. If failed:
    • Analyze error type
    • Fix script
    • Re-run
  6. If success:
    • Output the verified script

Architecture Overview


/generate <url>              → OUTPUT: agent-browser bash commands
/generate <url> --standalone → OUTPUT: Playwright .js file
┌─────────────────────────────────────────────────────────────┐
│                   /generate <url>                           │
│                                                             │
│   1. Search Actionbook → get selectors                      │
│   2. Generate OUTPUT:                                       │
│                                                             │
│      WITHOUT --standalone    │    WITH --standalone         │
│      ─────────────────────   │    ──────────────────        │
│      agent-browser commands  │    Playwright .js code       │
│                              │                              │
│      ```bash                 │    ```javascript             │
│      agent-browser open ...  │    const { chromium } = ...  │
│      agent-browser get ...   │    await page.goto(...)      │
│      agent-browser close     │    ```                       │
│      ```                     │                              │
└─────────────────────────────────────────────────────────────┘

Tool Priority


| Operation | Primary Tool | Fallback | Notes |
|---|---|---|---|
| Find selectors for URL | search_actions | None | Search by domain/keywords |
| Get full selector details | get_action_by_id | None | Use action_id from search |
| List available sources | list_sources | search_sources | Browse all indexed sites |
| Generate agent-browser script | Agent (sonnet) | - | Default mode for /generate |
| Generate Playwright script | Agent (sonnet) | - | Use --standalone flag |
| Structure analysis | Agent (haiku) | - | Parse Actionbook response |
| Request new website | agent-browser | Manual | Submit to actionbook.dev (ONLY command that executes agent-browser) |

Workflow Rules


CRITICAL: Generate → Verify → Fix


Every generated script MUST be verified by executing it.

| Step | Action |
|---|---|
| 1 | Generate script with Actionbook selectors |
| 2 | Execute script to verify it works |
| 3 | If failed: analyze error, fix script, go to step 2 |
| 4 | If success: output verified script + data preview |
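The generate → execute → verify → fix loop, with the 3-attempt cap this skill uses, might be wired up like this. All four callbacks (`generate`, `execute`, `verify`, `fix`) are hypothetical stand-ins supplied by the caller, not plugin APIs, and real implementations would be asynchronous.

```javascript
// Sketch of the generate → execute → verify → fix loop with a retry cap.
// The callbacks are caller-supplied stubs; nothing here is plugin API.
function generateVerified({ generate, execute, verify, fix }, maxRetries = 3) {
  let script = generate();
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = execute(script);        // run the script
    const check = verify(result);          // Part 1 + Part 2 checks
    if (check.ok) return { script, result, attempts: attempt };
    script = fix(script, check);           // repair, then retry
  }
  throw new Error(`script still failing after ${maxRetries} attempts`);
}
```

Only a script that has passed `verify` is ever returned to the user.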

Verification Process


For agent-browser scripts:

```bash
# Execute each command
agent-browser open "https://example.com"
agent-browser wait --load networkidle
agent-browser get text ".selector"

# Check if data is returned
# If error → fix and retry

agent-browser close
```

For Playwright scripts (--standalone):

```bash
# Write to temp file and execute
node /tmp/scraper.js

# Check if output file has data
# If error → fix and retry
```

Critical Rules


  1. ALWAYS verify generated scripts - Execute and check BOTH parts
  2. Part 1: Script must run - No errors, no timeouts
  3. Part 2: Data must be correct - Not empty, not UI text, fields mapped correctly
  4. Fix errors automatically - Don't output broken scripts or wrong data
  5. Use Actionbook MCP tools first - Never guess selectors
  6. Include scroll handling for lazy-loaded pages
  7. Include expand/collapse logic for card-based layouts
  8. Always close browser - Include `agent-browser close`
  9. Retry up to 3 times - If still failing, report the specific issue

Common Data Errors to Catch


| Error | Example | Fix |
|---|---|---|
| Extracted button text | `name: "Click to expand"` | Extract content after expanding |
| Extracted placeholder | `desc: "Loading..."` | Add wait for dynamic content |
| Empty fields | `name: ""` | Fix selector |
| Wrong field mapping | `year: "San Francisco"` | Fix selector for each field |
| Too few items | Expected 100, got 3 | Add scroll/pagination |

Record Actionbook Data Issues


If Actionbook selectors are wrong or outdated, record to the local file `.actionbook-issues.log`.

When to record:
  • Selector doesn't exist on page
  • Selector returns wrong element
  • Page structure has changed
  • Missing selectors for key elements

Log format:

```
[YYYY-MM-DD HH:MM] URL: {url}
Action ID: {action_id}
Issue Type: {selector_error | outdated | missing}
Details: {description}
Selector: {selector}
Expected: {what it should select}
Actual: {what it actually selects or error}
---
```
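One way to build an entry in that format, sketched as a small helper. The shape of the `issue` object is an illustrative assumption; only the output layout follows the template above.

```javascript
// Build one .actionbook-issues.log entry matching the log format above.
// The `issue` object shape is an assumption made for this illustration.
function formatIssueEntry(issue, now = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())} ` +
    `${pad(now.getHours())}:${pad(now.getMinutes())}`;
  return [
    `[${stamp}] URL: ${issue.url}`,
    `Action ID: ${issue.actionId}`,
    `Issue Type: ${issue.type}`, // selector_error | outdated | missing
    `Details: ${issue.details}`,
    `Selector: ${issue.selector}`,
    `Expected: ${issue.expected}`,
    `Actual: ${issue.actual}`,
    "---",
  ].join("\n");
}
```

An entry would then be appended with something like `fs.appendFileSync(".actionbook-issues.log", entry + "\n")`.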

Selector Priority


When Actionbook provides multiple selectors, prefer in this order:
  1. data-testid - Most stable, designed for automation
  2. aria-label - Accessibility-based, semantic
  3. css - Class-based selectors
  4. xpath - Last resort, most fragile

Commands


| Command | Description | Agent |
|---|---|---|
| `/actionbook-scraper:analyze <url>` | Analyze page structure and show available selectors | structure-analyzer |
| `/actionbook-scraper:generate <url>` | Generate agent-browser scraper script | code-generator |
| `/actionbook-scraper:generate <url> --standalone` | Generate Playwright/Puppeteer script | code-generator |
| `/actionbook-scraper:list-sources` | List websites with Actionbook data | - |
| `/actionbook-scraper:request-website <url>` | Request new website to be indexed (uses agent-browser) | website-requester |

Data Flow


Analyze Command


1. User: /actionbook-scraper:analyze https://example.com/page
2. Extract domain from URL → "example.com"
3. search_actions("example page") → [action_ids]
4. For best match: get_action_by_id(action_id) → full selector data
5. Structure-analyzer agent formats and presents findings

Generate Command (Default: agent-browser script)


User: /actionbook-scraper:generate https://example.com/page

Step 1: Search Actionbook
  search_actions("example.com page") → action_ids

Step 2: Get selectors
  get_action_by_id(best_match) → selectors

Step 3: Generate agent-browser script

```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```

Step 4: VERIFY script (REQUIRED)
  Execute the commands and check if data is extracted
  If failed → analyze error → fix script → retry (max 3x)

Step 5: Return verified script + data preview

Example Output:

Verified Scraper (agent-browser)


Status: ✅ Verified (extracted 50 items)

Run these commands to scrape:

```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```

Data Preview


```json
[
  {"name": "Item 1", "description": "..."},
  {"name": "Item 2", "description": "..."},
  // ... showing first 3 items
]
```

Generate Command (--standalone: Playwright script)


User: /actionbook-scraper:generate https://example.com/page --standalone

Step 1: Search Actionbook for selectors
Step 2: Get full selector data
Step 3: Generate Playwright/Puppeteer script
Step 4: VERIFY script (REQUIRED)
  Write to temp file → node /tmp/scraper.js → check output
  If failed → analyze error → fix script → retry (max 3x)
Step 5: Return verified script + data preview

Example Output:

Verified Scraper (Playwright)


Status: ✅ Verified (extracted 50 items)

```javascript
const { chromium } = require('playwright');
// ... generated code with Actionbook selectors
```

Usage:

```bash
npm install playwright
node scraper.js
```

Data Preview


```json
[
  {"name": "Item 1", "description": "..."},
  // ... first 3 items
]
```

Request Website Command


1. User: /actionbook-scraper:request-website https://newsite.com/page
2. Launch website-requester agent (uses agent-browser)
3. Agent workflow:
   a. agent-browser open "https://actionbook.dev/request-website"
   b. agent-browser snapshot -i (discover form selectors)
   c. agent-browser type <url-field> "https://newsite.com/page"
   d. agent-browser type <email-field> (optional)
   e. agent-browser type <usecase-field> (optional)
   f. agent-browser click <submit-button>
   g. agent-browser snapshot -i (verify submission)
   h. agent-browser close
4. Output: Confirmation of submission

Selector Data Structure


Actionbook returns selector data in this format:

```json
{
  "url": "https://example.com/page",
  "title": "Page Title",
  "content": "## Selector Reference\n\n| Element | CSS | XPath | Type |\n..."
}
```
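Since `content` embeds a markdown table, a consumer might extract rows like this. This sketch assumes the `| Element | CSS | XPath | Type |` column layout shown in the sample; real responses may differ.

```javascript
// Parse the markdown selector table embedded in the `content` field.
// Assumes the "| Element | CSS | XPath | Type |" layout from the sample.
function parseSelectorTable(content) {
  const rows = [];
  for (const line of content.split("\n")) {
    if (!line.trim().startsWith("|")) continue;
    const cells = line.split("|").map((c) => c.trim()).filter(Boolean);
    if (cells.length < 4) continue;
    if (cells[0] === "Element" || /^-+$/.test(cells[0])) continue; // skip header/divider
    const [element, css, xpath, type] = cells;
    rows.push({ element, css, xpath, type });
  }
  return rows;
}
```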

Common Selector Patterns


Card-based layouts:

```
Container: .card-list, .grid-container
Card item: .card, .list-item
Card name: .card__title, .card-name
Card description: .card__description
Expand button: .card__expand, button.expand
```

Detail extraction (dt/dd pattern):

```javascript
// Common pattern for key-value pairs (runs in page context)
const data = {};
document.querySelectorAll('.info-item').forEach(item => {
  const label = item.querySelector('dt')?.textContent.trim();
  const value = item.querySelector('dd')?.textContent.trim();
  if (label) data[label] = value;
});
```

Table layouts:

```
Table: table, .data-table
Header: thead th, .table-header
Row: tbody tr, .table-row
Cell: td, .table-cell
```

Page Type Detection


| Indicator | Page Type | Template |
|---|---|---|
| Scroll to load more | Dynamic/Infinite | playwright-js (with scroll) |
| Click to expand | Card-based | playwright-js (with click) |
| Pagination links | Paginated | playwright-js (with pagination) |
| Static content | Static | puppeteer or playwright |
| SPA framework detected | SPA | playwright-js (network idle) |
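The mapping above is a straightforward decision ladder. The boolean flag names below are assumptions about what page analysis might report, used only to illustrate the precedence (infinite scroll beats SPA detection, static is the fallback):

```javascript
// Map observed page indicators to a template, following the table above.
// The input flag names are illustrative assumptions about analysis output.
function detectPageType({ infiniteScroll, expandableCards, paginationLinks, spa } = {}) {
  if (infiniteScroll) return { type: "Dynamic/Infinite", template: "playwright-js (with scroll)" };
  if (expandableCards) return { type: "Card-based", template: "playwright-js (with click)" };
  if (paginationLinks) return { type: "Paginated", template: "playwright-js (with pagination)" };
  if (spa) return { type: "SPA", template: "playwright-js (network idle)" };
  return { type: "Static", template: "puppeteer or playwright" };
}
```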

Output Formats


Analysis Output



Page Analysis: {url}


Matched Action


  • Action ID: {action_id}
  • Confidence: HIGH | MEDIUM | LOW

Available Selectors


| Element | Selector | Type | Methods |
|---|---|---|---|
| {name} | {selector} | {type} | {methods} |

Page Structure


  • Type: {static|dynamic|spa}
  • Data Pattern: {cards|table|list}
  • Lazy Loading: {yes|no}
  • Expand/Collapse: {yes|no}

Recommendations


  • Suggested template: {template}
  • Special handling needed: {notes}

Generated Code Output



Generated Scraper


Target URL: {url}
Template: {template}
Expected Output: {description}

Dependencies


```bash
npm install playwright
```

Code


```javascript
{generated_code}
```

Usage


```bash
node scraper.js
```

Output


Results saved to `{output_file}`

Templates Reference


| Template | Flag | Output | Run With |
|---|---|---|---|
| agent-browser | (default) | CLI commands | agent-browser CLI |
| playwright-js | --standalone | .js file | node scraper.js |
| playwright-python | --standalone --template playwright-python | .py file | python scraper.py |
| puppeteer | --standalone --template puppeteer | .js file | node scraper.js |

Error Handling


| Error | Cause | Solution |
|---|---|---|
| No actions found | URL not indexed | Use `/actionbook-scraper:request-website` to request indexing |
| Selectors not working | Page updated | Report to Actionbook, try alternative selectors |
| Timeout | Slow page load | Increase timeout, add retry logic |
| Empty data | Dynamic content | Add scroll/wait handling |
| Form submission failed | Network/page issue | Retry or submit manually at actionbook.dev |

agent-browser Usage


For the `request-website` command, the plugin uses the agent-browser CLI to automate form submission.

agent-browser Commands


```bash
# Open a URL
agent-browser open "https://actionbook.dev/request-website"

# Get page snapshot (discover selectors)
agent-browser snapshot -i

# Type into form field
agent-browser type "input[name='url']" "https://example.com"

# Click button
agent-browser click "button[type='submit']"

# Close browser (ALWAYS do this)
agent-browser close
```

Selector Discovery


If form selectors are unknown, use snapshot to discover them:

```bash
agent-browser open "https://actionbook.dev/request-website"
agent-browser snapshot -i  # Returns page structure with selectors
```

Always Close Browser


Critical: Always run `agent-browser close` at the end of any agent-browser session, even if errors occur.

Rate Limiting


  • Actionbook MCP: No rate limit for local usage
  • Target websites: Respect robots.txt and add delays between requests
  • Recommended: 1-2 second delay between page requests
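For standalone scripts, the recommended delay can be enforced with a small helper. This is a sketch: `fetchPage` is a hypothetical caller-supplied callback, and the 1500 ms default is just the midpoint of the 1-2 s recommendation above.

```javascript
// Politeness delay between page requests (1-2 s recommended above).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `fetchPage` is a hypothetical callback that scrapes one URL.
async function politeFetchAll(urls, fetchPage, delayMs = 1500) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchPage(url));
    await sleep(delayMs); // pause before the next request
  }
  return results;
}
```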

Examples


Example 1: Generate agent-browser Script (Default)


/actionbook-scraper:generate https://firstround.com/companies

Output: agent-browser commands

```bash
agent-browser open "https://firstround.com/companies"
agent-browser scroll down 2000
agent-browser get text ".company-list-card-small"
agent-browser close
```

User runs these commands to scrape.

Example 2: Generate Playwright Script


/actionbook-scraper:generate https://firstround.com/companies --standalone

Output: Playwright JavaScript code

```javascript
const { chromium } = require('playwright');
// ... full script
```

User runs: `node scraper.js`

Example 3: Analyze Page Structure


/actionbook-scraper:analyze https://example.com/products

Output: Analysis showing:
- Available selectors
- Page structure
- Recommended approach

Example 4: Request New Website


/actionbook-scraper:request-website https://newsite.com/data

Action: Submits form to actionbook.dev (this command DOES execute agent-browser)

Best Practices


  1. Always analyze before generating - Understand the page structure first
  2. Check list-sources - Verify the site is indexed before attempting
  3. Review generated code - Verify selectors match expected elements
  4. Add appropriate delays - Be respectful to target servers
  5. Handle edge cases - Empty states, loading states, errors
  6. Test incrementally - Run on small subset before full scrape