link-scraper

Link Scraper

Fetch and extract content from URLs with automatic summarization. This skill enables the agent to gather information from the web by scraping web pages, extracting main content, and providing concise summaries.

When to Use

User shares a URL and asks "what's this about?"
Researching a topic that requires reading online articles
Extracting documentation or technical content from websites
Getting summaries of blog posts, news articles, or papers
Extracting code snippets or examples from web sources
Fetching content that the user wants analyzed or discussed

Setup

No additional installation is required. The scraper uses built-in Node.js modules and, when it is available, cheerio for HTML parsing.

If cheerio is not available, it falls back to basic regex-based extraction.
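
The fallback described above could look roughly like the following. This is a minimal sketch, not the actual code in scrape.js: `loadCheerio` and `extractTitle` are hypothetical names, and the regex is only meant to illustrate the idea of a parser-free fallback.

```javascript
// Sketch: prefer cheerio when it is installed, otherwise use a regex.
// loadCheerio and extractTitle are hypothetical names, not taken from scrape.js.
function loadCheerio() {
  try {
    return require("cheerio");
  } catch (err) {
    return null; // cheerio could not be loaded; the caller falls back to regex
  }
}

function extractTitle(html) {
  const cheerio = loadCheerio();
  if (cheerio) {
    return cheerio.load(html)("title").first().text().trim();
  }
  // Regex fallback: adequate for a plain <title> tag, not a full HTML parser.
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}
```

Both branches return the same string for well-formed pages, so callers do not need to know which path was taken.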

Usage

Extract a single URL

```bash
node /job/.pi/skills/link-scraper/scrape.js "https://example.com/article"
```

Extract multiple URLs

```bash
node /job/.pi/skills/link-scraper/scrape.js "https://example.com/page1" "https://example.com/page2"
```

Get just the title

```bash
node /job/.pi/skills/link-scraper/scrape.js --title "https://example.com"
```

Get full content (no summary)

```bash
node /job/.pi/skills/link-scraper/scrape.js --full "https://example.com"
```

Extract specific elements (CSS selector)

```bash
node /job/.pi/skills/link-scraper/scrape.js --selector "article" "https://example.com/blog"
```
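
When the scraper is called from another Node.js script rather than a shell, the commands above can be assembled programmatically. The sketch below uses only the flags and script path shown in this section; `buildArgs` and `scrape` are hypothetical helper names, and it assumes the script prints a single JSON document to stdout, which this document does not confirm for the multi-URL case.

```javascript
const { execFileSync } = require("node:child_process");

// Script path taken from the usage examples in this section.
const SCRAPER = "/job/.pi/skills/link-scraper/scrape.js";

// Hypothetical helper: turn an options object into the CLI arguments shown above.
function buildArgs(urls, options = {}) {
  const args = [SCRAPER];
  if (options.title) args.push("--title");
  if (options.full) args.push("--full");
  if (options.selector) args.push("--selector", options.selector);
  return args.concat(urls);
}

// Runs the scraper and parses its output.
// Assumption: stdout contains exactly one JSON document.
function scrape(urls, options = {}) {
  const stdout = execFileSync("node", buildArgs(urls, options), { encoding: "utf8" });
  return JSON.parse(stdout);
}
```

Passing arguments as an array to `execFileSync` avoids shell quoting problems with URLs that contain `&` or `?`.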

Output Format

The scraper returns JSON with the following structure:

```json
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "description": "Brief description of the page...",
  "content": "Main content extracted from the page...",
  "wordCount": 500,
  "links": ["https://example.com/related1", "https://example.com/related2"],
  "images": ["https://example.com/image1.jpg"],
  "siteName": "Example Site"
}
```

When summarized:

```json
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "summary": "A concise 2-3 sentence summary of the article...",
  "keyPoints": [
    "First key point from the article",
    "Second key point",
    "Third key point"
  ],
  "wordCount": 500,
  "readTime": "2 min"
}
```
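
A consumer of this output has to handle both shapes. One way to do that, sketched here with a hypothetical `formatResult` helper, is to branch on whether the `summary` field is present. The field names come from the two examples above; everything else is an assumption for illustration.

```javascript
// Hypothetical helper: render either output shape as plain text.
function formatResult(result) {
  const lines = [`${result.title} (${result.url})`];
  if (typeof result.summary === "string") {
    // Summarized shape: summary, keyPoints, readTime.
    lines.push(result.summary);
    for (const point of result.keyPoints ?? []) lines.push(`- ${point}`);
    if (result.readTime) lines.push(`Read time: ${result.readTime}`);
  } else {
    // Full shape: description, content, links, images, siteName.
    if (result.description) lines.push(result.description);
    lines.push(`${result.wordCount} words, ${(result.links ?? []).length} links`);
  }
  return lines.join("\n");
}
```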

Common Workflows

Quick URL Summary

User: Check out https://github.com/openclaw/openclaw for me
Agent: [Uses link-scraper to fetch and summarize]

Research Task

User: Find information about AI agents
Agent: [Uses link-scraper to fetch relevant articles, documentation, etc.]

Code Example Extraction

User: How do I use the GitHub API? https://docs.github.com/en/rest
Agent: [Uses link-scraper with --selector to extract code examples]
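
In each of these workflows the agent first has to find the URL inside the user's message. The sketch below shows one possible way to do that; `extractUrls` is a hypothetical helper, not part of the skill, and the punctuation-trimming rule is a heuristic.

```javascript
// Hypothetical helper: find http(s) URLs in a chat message.
function extractUrls(message) {
  const candidates = message.match(/https?:\/\/[^\s<>"']+/g) ?? [];
  const urls = [];
  for (const candidate of candidates) {
    // Drop trailing punctuation that usually belongs to the sentence, not the URL.
    const trimmed = candidate.replace(/[.,;:!?)\]]+$/, "");
    try {
      // new URL() validates the candidate; .href is its normalized form.
      urls.push(new URL(trimmed).href);
    } catch {
      // Not a parseable URL; skip it.
    }
  }
  return [...new Set(urls)]; // de-duplicate, preserving first-seen order
}
```

The resulting list can be passed directly to the scraper as its URL arguments.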

Integration with Other Skills

With memory-agent: Store researched information for future reference
With browser-tools: Use for JavaScript-rendered pages that need a browser
With voice-output: Announce summaries aloud

Limitations

Cannot fetch password-protected pages
Some sites block scrapers (may need browser-tools as fallback)
Large pages may be truncated to fit token limits
JavaScript-rendered content may not be available (use browser-tools)
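
To make the truncation limitation concrete, here is a minimal sketch of word-based truncation. The actual limit and method used by scrape.js are not stated in this document; `truncateWords` and its `maxWords` parameter are hypothetical, and word count is only a rough proxy for tokens.

```javascript
// Hypothetical helper: cap content at maxWords and report whether it was cut.
function truncateWords(content, maxWords) {
  const words = content.trim().split(/\s+/).filter(Boolean);
  if (words.length <= maxWords) {
    return { text: content, truncated: false, wordCount: words.length };
  }
  return {
    text: words.slice(0, maxWords).join(" ") + " [truncated]",
    truncated: true,
    wordCount: words.length, // word count of the original, untruncated content
  };
}
```

Returning a `truncated` flag lets the agent tell the user that it only saw part of the page.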

Tips

For articles: The scraper automatically extracts main article content
For documentation: Use --selector "pre code" to get code blocks
For lists: Use --selector "ul li" to extract list items
For speed: Add --no-summary to skip summarization and get only the title and description
