web-fetch

Web Content Fetching

Fetch web content using `curl | html2markdown` with CSS selectors for clean, complete markdown output.

Quick Usage (Known Sites)

Use site-specific selectors for best results:

```bash
# Anthropic docs
curl -s "<url>" | html2markdown --include-selector "#content-container"

# MDN Web Docs
curl -s "<url>" | html2markdown --include-selector "article"

# GitHub docs
curl -s "<url>" | html2markdown --include-selector "article" --exclude-selector "nav,.sidebar"

# Generic article pages
curl -s "<url>" | html2markdown --include-selector "article,main,[role=main]" --exclude-selector "nav,header,footer"
```

Site Patterns

| Site | Include Selector | Exclude Selector |
|------|------------------|------------------|
| platform.claude.com | `#content-container` | - |
| docs.anthropic.com | `#content-container` | - |
| developer.mozilla.org | `article` | - |
| github.com (docs) | `article` | `nav,.sidebar` |
| Generic | `article,main` | `nav,header,footer,script,style` |
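The table can also be encoded as a small lookup, so a wrapper picks the selectors from the URL. This is a minimal bash sketch, not part of the skill. The names `site_selectors` and `fetch_md` are hypothetical. Matching the "github.com (docs)" row on the `docs.github.com` host is an assumption about what that row covers.

```bash
#!/usr/bin/env bash
# Sketch only: encodes the Site Patterns table above.
# site_selectors and fetch_md are hypothetical helper names.

# Set INCLUDE and EXCLUDE for a URL based on its host name.
site_selectors() {
  local host=${1#*://}   # drop the scheme, if any
  host=${host%%/*}       # drop the path
  host=${host%%:*}       # drop a port number
  case "$host" in
    platform.claude.com|docs.anthropic.com)
      INCLUDE='#content-container'; EXCLUDE='' ;;
    developer.mozilla.org)
      INCLUDE='article'; EXCLUDE='' ;;
    docs.github.com)     # assumption: the "github.com (docs)" row
      INCLUDE='article'; EXCLUDE='nav,.sidebar' ;;
    *)                   # the "Generic" row
      INCLUDE='article,main'; EXCLUDE='nav,header,footer,script,style' ;;
  esac
}

# Fetch a URL and convert it using the selectors chosen for its host.
fetch_md() {
  site_selectors "$1"
  local args=(--include-selector "$INCLUDE")
  if [ -n "$EXCLUDE" ]; then
    args+=(--exclude-selector "$EXCLUDE")
  fi
  curl -s "$1" | html2markdown "${args[@]}"
}
```

For example, `fetch_md "https://developer.mozilla.org/en-US/docs/Web/HTML"` runs the MDN row's command. Adding a site only requires adding another `case` branch.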

Universal Fallback (Unknown Sites)

For sites without a known pattern, use the Bun script, which auto-detects the content:

```bash
bun ~/.claude/skills/web-fetch/fetch.ts "<url>"
```
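The two approaches can be chained: try a selector first, and run the Bun script only when the selector produces no text. This is a minimal bash sketch, not part of the skill. `fetch_with_fallback` is a hypothetical name, and the default selector borrows the Generic row of the Site Patterns table. The script is assumed to live at the path used above, overridable through a hypothetical `FETCH_TS` variable.

```bash
#!/usr/bin/env bash
# Sketch only: fetch_with_fallback and FETCH_TS are hypothetical names.
# Assumes fetch.ts is installed at the path used in this document.
FETCH_TS="${FETCH_TS:-$HOME/.claude/skills/web-fetch/fetch.ts}"

fetch_with_fallback() {
  local url=$1 selector=${2:-article,main} md
  md=$(curl -s "$url" | html2markdown --include-selector "$selector")
  # Whitespace-only output means the selector matched nothing useful
  if [ -n "${md//[[:space:]]/}" ]; then
    printf '%s\n' "$md"
  else
    bun "$FETCH_TS" "$url"
  fi
}
```

For example, `fetch_with_fallback "<url>" "#content-container"`. On fallback the Bun script downloads the page a second time. That keeps the sketch simple at the cost of one extra request.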

Setup (one-time)

```bash
cd ~/.claude/skills/web-fetch && bun install
```
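Setup assumes `bun` is installed, and the fetch commands also need `curl` and `html2markdown` on `PATH`. A small bash sketch for confirming this up front; `check_deps` is a hypothetical helper, not part of the skill.

```bash
#!/usr/bin/env bash
# Sketch only: check_deps is a hypothetical helper name.
# Reports each missing command on stderr; returns non-zero if any are missing.
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_deps curl html2markdown bun || echo "install the missing tools first"`.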

Finding the Right Selector

When a site isn't in the patterns list:

```bash
# Check which content containers exist
curl -s "<url>" | grep -oE '<article[^>]*>|<main[^>]*>|id="[^"]*content[^"]*"' | head -10

# Test a selector
curl -s "<url>" | html2markdown --include-selector "<selector>" | head -30

# Check line count
curl -s "<url>" | html2markdown --include-selector "<selector>" | wc -l
```
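The test and line-count steps can be folded into one loop. The loop reads a page saved once and ranks candidate selectors by how many non-empty markdown lines each produces. This is a bash sketch, and `compare_selectors` is a hypothetical name. A high count is only a rough signal, because an overly broad selector such as `body` also scores high by keeping navigation.

```bash
#!/usr/bin/env bash
# Sketch only: compare_selectors is a hypothetical helper name.
# Usage: compare_selectors <html-file> <selector>...
compare_selectors() {
  local html=$1 sel lines
  shift
  for sel in "$@"; do
    # Count the non-empty markdown lines this selector produces
    lines=$(html2markdown --include-selector "$sel" < "$html" | grep -c .)
    printf '%6d  %s\n' "$lines" "$sel"
  done | sort -rn   # largest output first
}
```

For example, save the page with `curl -s "<url>" > page.html`, then run `compare_selectors page.html article main "[role=main]" "#content-container"`. Reading from a saved file avoids one download per candidate.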

Options Reference

```bash
--include-selector "CSS"  # Only include matching elements
--exclude-selector "CSS"  # Remove matching elements
--domain "https://..."    # Convert relative links to absolute
```
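One way to supply `--domain` is to derive the scheme and host from the page URL itself. This is a bash sketch that assumes the page's origin is the right base for its relative links; a page that sets a different `<base href>` would need another value. `origin_of` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: origin_of is a hypothetical helper name.
# Prints "scheme://host[:port]" for a URL, for use with html2markdown --domain.
origin_of() {
  local scheme=${1%%://*}   # text before "://"
  local rest=${1#*://}      # text after "://"
  # Cut at the first "/", "?" or "#" to keep only host[:port]
  printf '%s://%s\n' "$scheme" "${rest%%[/?#]*}"
}

# Example (not run here):
#   url="https://developer.mozilla.org/en-US/docs/Web/HTML"
#   curl -s "$url" | html2markdown --include-selector "article" --domain "$(origin_of "$url")"
```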

Comparison

| Method | Anthropic docs page | Code blocks | Output |
|--------|---------------------|-------------|--------|
| Full page | 602 lines | Yes | Noisy |
| `--include-selector "#content-container"` | 385 lines | Yes | Clean |
| Bun script (universal) | 383 lines | Yes | Clean |
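A comparison like this can be repeated for another page by counting the lines each method produces. This is a bash sketch. `compare_methods` and `count_lines` are hypothetical names, "Full page" is assumed to mean `html2markdown` with no selector, and the Bun script is assumed to be at the path used above. Counts will differ from the table as pages change.

```bash
#!/usr/bin/env bash
# Sketch only: compare_methods and count_lines are hypothetical helper names.

# Read stdin and print its line count without the padding some wc versions add.
count_lines() {
  local n
  n=$(wc -l)
  echo $((n))
}

# Usage: compare_methods <url> <selector>
compare_methods() {
  local url=$1 selector=$2 html
  html=$(curl -s "$url")   # download once, reuse for both html2markdown runs
  echo "Full page:  $(printf '%s\n' "$html" | html2markdown | count_lines) lines"
  echo "Selector:   $(printf '%s\n' "$html" | html2markdown --include-selector "$selector" | count_lines) lines"
  echo "Bun script: $(bun "$HOME/.claude/skills/web-fetch/fetch.ts" "$url" | count_lines) lines"
}
```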

Troubleshooting

Wrong content selected: The site may have multiple `<article>` elements. Inspect the HTML:

```bash
curl -s "<url>" | grep -o '<article[^>]*>'
```

Empty output: The selector doesn't match anything. Try broader selectors such as `main` or `body`.

Missing code blocks: Check whether the site uses non-standard markup for code (for example, something other than `<pre>`/`<code>` elements).

Client-rendered content: If the HTML contains only "Loading..." placeholders, the content is rendered by JavaScript. Neither curl nor the Bun script can extract it; use a browser-based tool instead.
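The empty-output and client-rendered checks can be scripted together. The helper tries progressively broader selectors; if none yields text, it looks for a loading placeholder in the raw HTML. This is a bash sketch, and `fetch_broadening` is a hypothetical name. The selector order (`article`, then `main`, then `body`) is an assumption. Matching the literal text `Loading...` is only a heuristic and will miss sites that use other placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: fetch_broadening is a hypothetical helper name.
# Prints the first non-empty conversion, trying narrow to broad selectors.
fetch_broadening() {
  local url=$1 html md sel
  html=$(curl -s "$url")
  for sel in article main body; do
    md=$(printf '%s\n' "$html" | html2markdown --include-selector "$sel")
    if [ -n "${md//[[:space:]]/}" ]; then
      echo "using selector: $sel" >&2
      printf '%s\n' "$md"
      return 0
    fi
  done
  # Nothing matched: a "Loading..." placeholder hints at JS-rendered content
  if printf '%s\n' "$html" | grep -qF 'Loading...'; then
    echo "page looks client-rendered; use a browser-based tool" >&2
  fi
  return 1
}
```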