xcrawl-map

XCrawl Map

Overview

This skill uses the XCrawl Map API to discover URLs for a site. The default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file containing `XCRAWL_API_KEY`.

Path: `~/.xcrawl/config.json`

```json
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
```

Read the API key from the local config file only. Do not require global environment variables.
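One way to bootstrap the file from a POSIX shell (a sketch; replace the placeholder key before running real requests):

```shell
# Create the config directory and write the key file.
mkdir -p "$HOME/.xcrawl"
cat > "$HOME/.xcrawl/config.json" <<'EOF'
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
EOF
# Restrict permissions, since the file holds a secret.
chmod 600 "$HOME/.xcrawl/config.json"
```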

Credits and Account Setup

Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/. After registration, they can activate the free 1000-credit plan before running requests.

Tool Permission Policy

Request runtime permissions for `curl` and `node` only. Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start map task: `POST /v1/map`
  • Base URL: `https://run.xcrawl.com`
  • Required header: `Authorization: Bearer <XCRAWL_API_KEY>`

Usage Examples

cURL

```bash
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

curl -sS -X POST "https://run.xcrawl.com/v1/map" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","filter":"/docs/.*","limit":2000,"include_subdomains":true,"ignore_query_parameters":false}'
```

Node

```bash
node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",filter:"/docs/.*",limit:3000,include_subdomains:true,ignore_query_parameters:false};
fetch("https://run.xcrawl.com/v1/map",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'
```

Request Parameters

Request endpoint and headers

  • Endpoint: `POST https://run.xcrawl.com/v1/map`
  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <api_key>

Request body: top-level fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Site entry URL |
| filter | string | No | - | Regex filter for URLs |
| limit | integer | No | 5000 | Max URLs (up to 100000) |
| include_subdomains | boolean | No | true | Include subdomains |
| ignore_query_parameters | boolean | No | true | Ignore URLs with query parameters |
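Since only `url` is required, a minimal request body relies on the defaults above (`limit` 5000, `include_subdomains` true, `ignore_query_parameters` true):

```json
{
  "url": "https://example.com"
}
```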

Response Parameters

| Field | Type | Description |
| --- | --- | --- |
| map_id | string | Task ID |
| endpoint | string | Always `map` |
| version | string | Version |
| status | string | `completed` when the task has finished |
| url | string | Entry URL |
| data | object | URL list data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |
| total_credits_used | integer | Total credits used |

`data` fields:
  • links: URL list
  • total_links: URL count
  • credits_used: credits used
  • credits_detail: credit breakdown
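A sketch of consuming these fields with the permitted `node` runtime. The response object here is a fabricated sample shaped like the table above, not real API output:

```shell
# Write a sample response shaped like the documented schema (illustration only).
cat > /tmp/xcrawl_map_sample.json <<'EOF'
{
  "map_id": "sample",
  "endpoint": "map",
  "version": "v1",
  "status": "completed",
  "url": "https://example.com",
  "data": {
    "links": ["https://example.com/docs/a", "https://example.com/docs/b"],
    "total_links": 2,
    "credits_used": 1,
    "credits_detail": {}
  },
  "started_at": "2024-01-01T00:00:00Z",
  "ended_at": "2024-01-01T00:00:05Z",
  "total_credits_used": 1
}
EOF

# Pull out the URL list and counts using only the documented fields.
node -e '
const res=JSON.parse(require("fs").readFileSync("/tmp/xcrawl_map_sample.json","utf8"));
console.log("status:", res.status);
console.log("total_links:", res.data.total_links);
res.data.links.forEach(l=>console.log(l));
console.log("credits:", res.total_credits_used);
'
```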

Workflow

  1. Restate the mapping objective: discovery only, selective crawl planning, or structure analysis.
  2. Build and execute `POST /v1/map`; keep filters explicit and reproducible.
  3. Return the raw API response directly; do not synthesize URL-family summaries unless requested.

Output Contract

Return:
  • Endpoint used (`POST /v1/map`)
  • The `request_payload` used for the request
  • Raw response body from the map call
  • Error details when the request fails
Do not generate summaries unless the user explicitly requests one.

Guardrails

  • Do not claim full site coverage if `limit` is reached.
  • Do not mix inferred URLs with returned URLs.
  • Do not hardcode provider-specific tool schemas in core logic.
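The first guardrail can be made mechanical. A sketch with hard-coded illustrative values (in practice `TOTAL` would be parsed from `data.total_links` and `LIMIT` taken from the request payload):

```shell
# Hypothetical truncation check: if the returned count reaches the
# requested limit, the map may be incomplete.
LIMIT=2000
TOTAL=2000  # would come from data.total_links in a real response
if [ "$TOTAL" -ge "$LIMIT" ]; then
  echo "limit reached: results may be truncated; do not claim full coverage"
fi
```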