link-scraper
Link Scraper
Fetch and extract content from URLs with automatic summarization. This skill enables the agent to gather information from the web by scraping web pages, extracting main content, and providing concise summaries.
When to Use
- User shares a URL and asks "what's this about?"
- Researching a topic that requires reading online articles
- Extracting documentation or technical content from websites
- Getting summaries of blog posts, news articles, or papers
- Extracting code snippets or examples from web sources
- Fetching content that the user wants analyzed or discussed
Setup
No additional installation required. Uses built-in Node.js modules and cheerio for HTML parsing.
If cheerio is not available, falls back to basic regex-based extraction.
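The fallback described above could look roughly like the following. This is a minimal sketch, not the skill's actual code: the `extractTitle` helper is hypothetical, and the regex path only handles a simple `<title>` element.

```javascript
// Hypothetical sketch: prefer cheerio when it is installed, otherwise use a regex.
let cheerio = null;
try {
  cheerio = require("cheerio"); // optional dependency
} catch (err) {
  cheerio = null; // not available: the regex path below is used instead
}

// extractTitle is an illustrative helper, not part of scrape.js.
function extractTitle(html) {
  if (cheerio) {
    // cheerio.load parses the HTML and returns a jQuery-like selector function.
    const $ = cheerio.load(html);
    return $("title").first().text().trim();
  }
  // Basic regex fallback: grabs the text between <title> and </title>.
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}

console.log(extractTitle("<html><head><title> Example Page </title></head></html>"));
```

Both paths return the same trimmed title for simple pages; the regex path will miss anything that needs a real HTML parser, such as malformed or nested markup.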
Usage
Extract a single URL
bash
node /job/.pi/skills/link-scraper/scrape.js "https://example.com/article"
Extract multiple URLs
bash
node /job/.pi/skills/link-scraper/scrape.js "https://example.com/page1" "https://example.com/page2"
Get just the title
bash
node /job/.pi/skills/link-scraper/scrape.js --title "https://example.com"
Get full content (no summary)
bash
node /job/.pi/skills/link-scraper/scrape.js --full "https://example.com"
Extract specific elements (CSS selector)
bash
node /job/.pi/skills/link-scraper/scrape.js --selector "article" "https://example.com/blog"
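For callers that want to drive the scraper from another Node.js script, the commands above can be wrapped as follows. This is a minimal sketch under stated assumptions: that `scrape.js` prints its JSON result to stdout, and that the helper names `parseScraperOutput` and `scrape` are hypothetical. The output shape for multiple URLs is not documented here, so the sketch passes a single URL.

```javascript
const { execFileSync } = require("child_process");

// Path taken from the usage examples above.
const SCRAPER = "/job/.pi/skills/link-scraper/scrape.js";

// Hypothetical helper: turn the scraper's stdout into an object,
// with a clearer error when the output is not valid JSON.
function parseScraperOutput(stdout) {
  try {
    return JSON.parse(stdout);
  } catch (err) {
    throw new Error(`Scraper did not return valid JSON: ${err.message}`);
  }
}

// Hypothetical helper: run the scraper for one URL with optional flags,
// e.g. scrape("https://example.com", ["--full"]).
function scrape(url, flags = []) {
  // execFileSync does not spawn a shell, so the URL needs no quoting or escaping.
  const stdout = execFileSync("node", [SCRAPER, ...flags, url], {
    encoding: "utf8",
  });
  return parseScraperOutput(stdout);
}

// Example (requires the skill to be installed at SCRAPER):
// const page = scrape("https://example.com/blog", ["--selector", "article"]);
// console.log(page.title);
```

Passing arguments as an array rather than a single command string keeps URLs containing `&` or `?` from being interpreted by a shell.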
Output Format
The scraper returns JSON with the following structure:
json
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "description": "Brief description of the page...",
  "content": "Main content extracted from the page...",
  "wordCount": 500,
  "links": ["https://example.com/related1", "https://example.com/related2"],
  "images": ["https://example.com/image1.jpg"],
  "siteName": "Example Site"
}
When summarized:
json
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "summary": "A concise 2-3 sentence summary of the article...",
  "keyPoints": [
    "First key point from the article",
    "Second key point",
    "Third key point"
  ],
  "wordCount": 500,
  "readTime": "2 min"
}
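Because the two shapes differ mainly in whether `summary` or `content` is present, a consumer can branch on that field. The following is a minimal sketch; `describeResult` is a hypothetical helper, and it assumes only the field names shown in the examples above.

```javascript
// Hypothetical helper: build a short text digest from a scraper result.
function describeResult(result) {
  const lines = [`${result.title} (${result.url})`];
  if (typeof result.summary === "string") {
    // Summarized shape: summary, keyPoints, readTime.
    lines.push(result.summary);
    for (const point of result.keyPoints || []) {
      lines.push(`- ${point}`);
    }
    if (result.readTime) lines.push(`Read time: ${result.readTime}`);
  } else {
    // Full shape: description, content, links, images, siteName.
    if (result.description) lines.push(result.description);
    lines.push(`${result.wordCount} words, ${(result.links || []).length} links`);
  }
  return lines.join("\n");
}

const summarized = {
  url: "https://example.com/article",
  title: "Article Title",
  summary: "A concise summary.",
  keyPoints: ["First key point", "Second key point"],
  wordCount: 500,
  readTime: "2 min",
};
console.log(describeResult(summarized));
```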
Common Workflows
Quick URL Summary
User: Check out https://github.com/openclaw/openclaw for me
Agent: [Uses link-scraper to fetch and summarize]
Research Task
User: Find information about AI agents
Agent: [Uses link-scraper to fetch relevant articles, documentation, etc.]
Code Example Extraction
User: How do I use the GitHub API? https://docs.github.com/en/rest
Agent: [Uses link-scraper with --selector to extract code examples]
Integration with Other Skills
- With memory-agent: Store researched information for future reference
- With browser-tools: Use for JavaScript-rendered pages that need a browser
- With voice-output: Announce summaries aloud
Limitations
- Cannot fetch password-protected pages
- Some sites block scrapers (may need browser-tools as fallback)
- Large pages may be truncated for token limits
- JavaScript-rendered content may not be available (use browser-tools)
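The truncation mentioned above can be approximated by capping the extracted text at a word budget. This is only a sketch: `truncateWords` and the 2000-word default are assumptions for illustration, and the scraper's real limit and method are not documented here.

```javascript
// Hypothetical helper: keep at most maxWords whitespace-separated words.
// Runs of whitespace are collapsed to single spaces in the returned text.
function truncateWords(text, maxWords = 2000) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  if (words.length <= maxWords) {
    return { text: words.join(" "), truncated: false };
  }
  // Mark the cut so downstream consumers know content is missing.
  return { text: words.slice(0, maxWords).join(" ") + " ...", truncated: true };
}

console.log(truncateWords("one two three four five", 3));
```

Returning a `truncated` flag alongside the text lets a caller decide whether to retry with `--selector` to target a smaller part of the page.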
Tips
- For articles: The scraper automatically extracts main article content
- For documentation: Use --selector "pre code" to get code blocks
- For lists: Use --selector "ul li" to extract list items
- For speed: Add --no-summary for quick title/description only
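The tips above map naturally onto a small lookup from content type to command-line flags. This is a sketch only: `argsFor` and the content-type labels are hypothetical, while the flags and selectors are the ones listed above.

```javascript
// Hypothetical lookup from a content-type hint to scraper flags.
const TIP_FLAGS = {
  article: [], // main article content is extracted automatically
  documentation: ["--selector", "pre code"], // code blocks
  list: ["--selector", "ul li"], // list items
  quick: ["--no-summary"], // title/description only
};

// Build the argument list that would follow "node scrape.js".
function argsFor(kind, url) {
  if (!Object.prototype.hasOwnProperty.call(TIP_FLAGS, kind)) {
    throw new Error(`Unknown content type: ${kind}`);
  }
  return [...TIP_FLAGS[kind], url];
}

console.log(argsFor("documentation", "https://docs.github.com/en/rest"));
```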