XCrawl
Overview
This skill is the default XCrawl entry point when the user asks for XCrawl directly without naming a specific API or sub-skill.
It currently targets single-page extraction through XCrawl Scrape APIs.
Default behavior is raw passthrough: return upstream API response bodies as-is.
Routing Guidance
- If the user wants to extract one or more specific URLs, use this skill and default to XCrawl Scrape.
- If the user wants site URL discovery, prefer XCrawl Map APIs.
- If the user wants multi-page or site-wide crawling, prefer XCrawl Crawl APIs.
- If the user wants keyword-based discovery, prefer XCrawl Search APIs.
Required Local Config
Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

```json
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
```

Read the API key from the local config file only. Do not require global environment variables.
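The config setup above can be done in one shot; a minimal sketch, with the key value left as a placeholder:

```bash
# Create the local config directory and write the API key file.
mkdir -p "$HOME/.xcrawl"
cat > "$HOME/.xcrawl/config.json" <<'EOF'
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
EOF
# Restrict permissions, since the file holds a credential.
chmod 600 "$HOME/.xcrawl/config.json"
```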
Credits and Account Setup
Using XCrawl APIs consumes credits.
If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/.
After registration, they can activate the free 1000-credit plan before running requests.
Tool Permission Policy
Request runtime permissions for curl and node only.
Do not request Python, shell helper scripts, or other runtime permissions.
API Surface
- Start scrape: POST /v1/scrape
- Read async result: GET /v1/scrape/{scrape_id}
- Base URL: https://run.xcrawl.com
- Required header: Authorization: Bearer <XCRAWL_API_KEY>
Usage Examples
cURL (sync)
```bash
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"
curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'
```
cURL (async create + result)
```bash
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"
CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"
echo "$CREATE_RESP"
SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"
curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"
```
Node
```bash
node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'
```
Request Parameters
Request endpoint and headers
- Endpoint: POST https://run.xcrawl.com/v1/scrape
- Headers: Content-Type: application/json, Authorization: Bearer <api_key>
Request body: top-level fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL |
| mode | string | No | | sync or async |
| proxy | object | No | - | Proxy config |
| request | object | No | - | Request config |
| js_render | object | No | - | JS rendering config |
| output | object | No | - | Output config |
| webhook | object | No | - | Async webhook config (mode=async only) |
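Putting the table together, a minimal request body might look like the following sketch (values are illustrative; proxy/request sub-fields are omitted here):

```json
{
  "url": "https://example.com",
  "mode": "sync",
  "output": { "formats": ["markdown"] }
}
```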
proxy
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | string | No | | ISO-3166-1 alpha-2 country code |
|  | string | No | Auto-generated | Sticky session ID; the same ID attempts to reuse the same exit node |
request
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | string | No | | Affects |
|  | string | No | | |
|  | object map | No | - | Cookie key/value pairs |
|  | object map | No | - | Header key/value pairs |
|  | boolean | No | | Return main content only |
|  | boolean | No | | Attempt to block ad resources |
|  | boolean | No | | Skip TLS verification |
js_render
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | boolean | No | | Enable browser rendering |
|  | string | No | | |
|  | integer | No | - | Viewport width (desktop default 1920, mobile default 402) |
|  | integer | No | - | Viewport height (desktop default 1080, mobile default 874) |
output
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| formats | string[] | No | | Output formats |
|  | string | No | | |
|  | string | No | - | Extraction prompt |
|  | object | No | - | JSON Schema |

output.formats accepts: html, raw_html, markdown, links, summary, screenshot, json.
webhook
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
|  | string | No | - | Callback URL |
|  | object map | No | - | Custom callback headers |
|  | string[] | No | | Events that trigger the callback |
Response Parameters
Sync create response (mode=sync)
| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
|  | string | Always |
|  | string | Version |
|  | string | |
| url | string | Target URL |
| data | object | Result data |
|  | string | Start time (ISO 8601) |
|  | string | End time (ISO 8601) |
| credits_used | integer | Total credits used |

data contains, according to output.formats:
- html, raw_html, markdown, links, summary, screenshot, json
- metadata (page metadata)
- traffic_bytes (traffic in bytes)
- credits_used (credits consumed)
- credits_detail (credit cost breakdown)

credits_detail
| Field | Type | Description |
|---|---|---|
|  | integer | Base scrape cost |
|  | integer | Traffic cost |
|  | integer | JSON extraction cost |
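When the user asks about credit consumption, the top-level credits_used and credits_detail fields can be read out of a response without altering the raw-passthrough contract. A sketch that reads the response body on stdin, assuming the field names in the tables above:

```bash
# summarize_credits: read a sync scrape response on stdin and print its
# credits_used value plus the credits_detail breakdown, if present.
summarize_credits() {
  node -e '
let s="";
process.stdin.on("data",d=>s+=d).on("end",()=>{
  const j=JSON.parse(s);
  console.log("credits_used:", j.credits_used);
  if (j.credits_detail) console.log("credits_detail:", JSON.stringify(j.credits_detail));
});'
}
```

Usage: pipe a saved or live response into it, e.g. `curl ... | summarize_credits`.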
Async create response (mode=async)
| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
|  | string | Always |
|  | string | Version |
|  | string | Always |
Async result response (GET /v1/scrape/{scrape_id})
| Field | Type | Description |
|---|---|---|
| scrape_id | string | Task ID |
|  | string | Always |
|  | string | Version |
|  | string | |
| url | string | Target URL |
| data | object | Same shape as the sync response data |
|  | string | Start time (ISO 8601) |
|  | string | End time (ISO 8601) |
Workflow
- Classify the request through the default XCrawl entry behavior.
  - If the user provides specific URLs for extraction, default to XCrawl Scrape.
  - If the user clearly asks for map, crawl, or search behavior, route to the dedicated XCrawl API instead of pretending this endpoint covers it.
- Restate the user goal as an extraction contract.
  - URL scope, required fields, accepted nulls, and precision expectations.
- Build the scrape request body.
  - Keep only the necessary options.
  - Prefer explicit output.formats.
- Execute the scrape and capture task metadata.
  - Track scrape_id, status, and timestamps.
  - If async, poll until status is completed or failed.
- Return raw API responses directly.
  - Do not synthesize or compress fields by default.
Output Contract
Return:
- Endpoint(s) used and mode (sync or async)
- The request_payload used for the request
- The raw response body from each API call
- Error details when a request fails
Do not generate summaries unless the user explicitly requests a summary.
Guardrails
- Do not present XCrawl Scrape as if it also covers map, crawl, or search semantics.
- Default to scrape only when user intent is URL extraction.
- Do not invent unsupported output fields.
- Do not hardcode provider-specific tool schemas in core logic.
- Call out uncertainty when page structure is unstable.