Web Fetch
All web content retrieval uses `curl` (Bash) or the built-in `WebFetch` tool. No MCP server is needed: Claude Code's native tools cover every Fetch MCP operation with more control.

Quick Reference
| Fetch MCP Tool | Replacement | When to Use |
|---|---|---|
| `fetch_html` | `curl` | Raw HTML needed for parsing |
| `fetch_json` | `curl` + `jq` | API responses, structured data |
| `fetch_markdown` | `WebFetch` | Readable page content (default output is markdown) |
| `fetch_txt` | `WebFetch` or `curl` | Plain text extraction |

Default choice: Use `WebFetch` for general page content. Use `curl` when you need headers, authentication, POST bodies, or raw format control.
WebFetch (Built-in Tool)
The `WebFetch` tool fetches a URL and returns clean markdown content. It handles JavaScript-rendered pages, strips navigation and boilerplate, and returns readable text.

Best for: documentation pages, articles, blog posts, README files: any content where you want readable text rather than raw HTML.

Limitations: no custom headers, no POST bodies, no cookie management. Use `curl` for those.
curl Patterns
Fetch HTML
```bash
curl -sL "https://example.com/page"
```

| Flag | Purpose |
|---|---|
| `-s` | Silent mode; suppress the progress meter |
| `-L` | Follow redirects (3xx) |
| `-o` | Save to file instead of stdout |
| `-I` | Headers only (HEAD request) |
| `-i` | Include response headers in output |
Fetch and extract specific elements with `xmllint` or `python3`:

```bash
curl -sL "https://example.com" | python3 -c "
from html.parser import HTMLParser
import sys

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        self.in_title = tag == 'title'
    def handle_data(self, data):
        if self.in_title:
            self.title += data
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

p = TitleParser()
p.feed(sys.stdin.read())
print(p.title)
"
```
curl -sL "https://example.com/page"| 参数 | 用途 |
|---|---|
| 静默模式 —— 隐藏进度条 |
| 跟随重定向(3xx状态码) |
| 将结果保存到文件而非输出到控制台 |
| 仅获取响应头(HEAD请求) |
| 在输出中包含响应头 |
结合或获取并提取特定元素:
xmllintpython3bash
curl -sL "https://example.com" | python3 -c "
from html.parser import HTMLParser
import sys
class TitleParser(HTMLParser):
def __init__(self):
super().__init__()
self.in_title = False
self.title = ''
def handle_starttag(self, tag, attrs):
self.in_title = tag == 'title'
def handle_data(self, data):
if self.in_title:
self.title += data
def handle_endtag(self, tag):
if tag == 'title':
self.in_title = False
p = TitleParser()
p.feed(sys.stdin.read())
print(p.title)
"Fetch JSON
```bash
curl -s "https://api.example.com/v1/data" \
  -H "Accept: application/json" | jq '.'
```

Filter and reshape JSON responses:

```bash
# Extract specific fields
curl -s "https://api.example.com/users" | jq '.[] | {name, email}'

# Filter by condition
curl -s "https://api.example.com/items" | jq '[.[] | select(.status == "active")]'

# Count results
curl -s "https://api.example.com/items" | jq 'length'

# Get nested value
curl -s "https://api.example.com/config" | jq '.database.host'
```
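These filters can be rehearsed offline against a small inline sample before pointing them at a real API. The data below is made up for illustration:

```bash
# Hypothetical API response used as a stand-in
json='[{"name":"ada","email":"ada@example.com","status":"active"},
       {"name":"bob","email":"bob@example.com","status":"inactive"}]'

# Keep active items only, then project just name and email
echo "$json" | jq -c '[.[] | select(.status == "active") | {name, email}]'
# → [{"name":"ada","email":"ada@example.com"}]

# Count what survived the filter
echo "$json" | jq '[.[] | select(.status == "active")] | length'
# → 1
```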
Fetch Plain Text

```bash
# Strip HTML tags for plain text
curl -sL "https://example.com/page" | python3 -c "
import html.parser, sys

class Stripper(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, d):
        self.text.append(d)
    def get_text(self):
        return ''.join(self.text)

s = Stripper()
s.feed(sys.stdin.read())
print(s.get_text())
"
```
Or use `WebFetch`, which returns clean markdown: close enough to plain text for most purposes.

---
Authenticated Requests
Bearer Token
```bash
curl -s "https://api.example.com/data" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json"
```

API Key in Header
```bash
curl -s "https://api.example.com/data" \
  -H "X-API-Key: $API_KEY"
```

API Key in Query Parameter
```bash
curl -s "https://api.example.com/data?api_key=$API_KEY"
```

Basic Auth
```bash
curl -s -u "username:$PASSWORD" "https://api.example.com/data"
```

Store credentials in environment variables. Never hardcode tokens or passwords in commands.
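One way to enforce that rule in scripts is a small guard that fails before any request is sent. `require_env` is a hypothetical helper, not part of curl:

```bash
# Abort early if a required credential variable is empty or unset
require_env() {
  local name="$1"
  if [ -z "${!name:-}" ]; then   # ${!name} is bash indirect expansion
    echo "error: $name is not set" >&2
    return 1
  fi
}

# Usage: guard first, then spend the network call
# require_env API_TOKEN && curl -s -H "Authorization: Bearer $API_TOKEN" ...
```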
POST, PUT, PATCH, DELETE
POST with JSON Body
```bash
curl -s -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "name": "item-name",
    "value": 42
  }' | jq '.'
```
POST with Form Data
```bash
curl -s -X POST "https://api.example.com/upload" \
  -F "file=@./document.pdf" \
  -F "description=Uploaded via curl"
```

PUT (Full Update)
```bash
curl -s -X PUT "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"name": "updated-name", "value": 99}' | jq '.'
```

PATCH (Partial Update)
```bash
curl -s -X PATCH "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"value": 100}' | jq '.'
```

DELETE
```bash
curl -s -X DELETE "https://api.example.com/items/123" \
  -H "Authorization: Bearer $API_TOKEN"
```

Advanced Patterns
Pagination
```bash
PAGE=1
while true; do
  RESPONSE=$(curl -s "https://api.example.com/items?page=$PAGE&per_page=50" \
    -H "Authorization: Bearer $API_TOKEN")
  COUNT=$(echo "$RESPONSE" | jq 'length')
  echo "$RESPONSE" | jq '.[]'
  [ "$COUNT" -lt 50 ] && break
  PAGE=$((PAGE + 1))
done
```
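Page numbers are one convention; many APIs instead return an opaque cursor. A sketch of that variant, with `items` and `next_cursor` as hypothetical field names:

```bash
# Follow an opaque cursor until the API stops returning one
paginate_cursor() {
  local base_url="$1" cursor="" response
  while :; do
    response=$(curl -s "${base_url}?cursor=${cursor}")
    echo "$response" | jq -c '.items[]'
    cursor=$(echo "$response" | jq -r '.next_cursor // empty')
    [ -z "$cursor" ] && break
  done
}
```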
Timeout and Retry
```bash
curl -s --connect-timeout 10 --max-time 30 \
  --retry 3 --retry-delay 2 \
  "https://api.example.com/data"
```
Response Headers Inspection
```bash
curl -sI "https://example.com" | grep -i "content-type"
```

Save Response with Status Code
```bash
HTTP_CODE=$(curl -s -o /tmp/response.json -w "%{http_code}" "https://api.example.com/data")
echo "Status: $HTTP_CODE"
cat /tmp/response.json | jq '.'
```
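The captured code can then drive control flow. A sketch of a dispatcher whose messages are illustrative:

```bash
# Branch on the status code captured via -w "%{http_code}"
handle_status() {
  case "$1" in
    2??)      echo "ok" ;;
    301|302)  echo "redirect: retry with -L" ;;
    401|403)  echo "auth problem: check credentials" ;;
    404)      echo "not found: verify the URL" ;;
    429)      echo "rate limited: back off" ;;
    5??)      echo "server error: retry once" ;;
    *)        echo "unexpected status: $1" ;;
  esac
}
```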
Cookie Handling
```bash
# Save cookies
curl -s -c /tmp/cookies.txt "https://example.com/login" \
  -d "user=admin&pass=$PASSWORD"

# Reuse cookies
curl -s -b /tmp/cookies.txt "https://example.com/dashboard"
```

---
Error Handling
| HTTP Status | Meaning | Resolution |
|---|---|---|
| 301/302 | Redirect | Add `-L` |
| 401 | Unauthorized | Check token/credentials; verify env var is set |
| 403 | Forbidden | Insufficient permissions or IP restriction |
| 404 | Not Found | Verify URL path; resource may be deleted |
| 429 | Rate Limited | Respect `Retry-After`; back off before retrying |
| 500 | Server Error | Retry once; if persistent, report upstream |
| SSL error | Certificate issue | Do not use `-k`; fix or report the certificate |
| Timeout | Network/server slow | Increase `--max-time`; check connectivity |
Verify a URL is reachable before complex operations:

```bash
curl -sI -o /dev/null -w "%{http_code}" "https://example.com"
```

Limitations
- WebFetch does not support custom headers, POST bodies, or cookies. Use `curl` for authenticated or stateful requests.
- curl does not render JavaScript. For JS-heavy SPAs, prefer `WebFetch`, which handles rendered content.
- Large responses may exceed context limits. Pipe through `jq`, `head`, or `python3` to extract only needed data before loading into context.
- Binary content (images, PDFs, archives) should be saved to disk with `-o`, not piped to stdout.
Calibration Rules
- Default to WebFetch for reading web pages. It returns clean markdown, handles JS rendering, and requires no flags. Switch to curl only when you need headers, auth, POST, or raw format control.
- Always pipe JSON through jq. Raw JSON in context wastes tokens. Filter to only the fields needed.
- Never hardcode credentials. Use `$ENV_VAR` references. If the variable is not set, surface the error immediately.
- Follow redirects by default. Always use `-L` with curl unless you specifically need to inspect the redirect chain.
- Prefer `-s` (silent) on every curl call. Progress meters add noise to output.