web-fetch

Web Fetch

All web content retrieval uses `curl` (Bash) or the built-in `WebFetch` tool. No MCP server is needed: Claude Code's native tools cover every Fetch MCP operation with more control.

Quick Reference

Fetch MCP Tool    Replacement               When to Use
fetch_html        curl -s URL               Raw HTML needed for parsing
fetch_json        curl -s URL | jq '.'      API responses, structured data
fetch_markdown    WebFetch                  Readable page content (default output is markdown)
fetch_txt         curl -s URL or WebFetch   Plain text extraction

Default choice: Use `WebFetch` for general page content. Use `curl` when you need headers, authentication, POST bodies, or raw format control.

WebFetch (Built-in Tool)

The `WebFetch` tool fetches a URL and returns clean markdown content. It handles JavaScript-rendered pages, strips navigation and boilerplate, and returns readable text.
Best for: documentation pages, articles, blog posts, README files, and any other content where you want readable text rather than raw HTML.
Limitations: no custom headers, no POST bodies, no cookie management. Use `curl` for those.

curl Patterns

Fetch HTML

bash
curl -sL "https://example.com/page"

Flag            Purpose
-s              Silent mode: suppress progress meter
-L              Follow redirects (3xx)
-o file.html    Save to file instead of stdout
-I              Headers only (HEAD request)
-i              Include response headers in output

Fetch and extract specific elements with `xmllint` or `python3`:
bash
curl -sL "https://example.com" | python3 -c "
from html.parser import HTMLParser
import sys

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        self.in_title = tag == 'title'
    def handle_data(self, data):
        if self.in_title:
            self.title += data
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

p = TitleParser()
p.feed(sys.stdin.read())
print(p.title)
"

Fetch JSON

bash
curl -s "https://api.example.com/v1/data" \
  -H "Accept: application/json" | jq '.'
Filter and reshape JSON responses:
bash
# Extract specific fields
curl -s "https://api.example.com/users" | jq '.[] | {name, email}'

# Filter by condition
curl -s "https://api.example.com/items" | jq '[.[] | select(.status == "active")]'

# Count results
curl -s "https://api.example.com/items" | jq 'length'

# Get nested value
curl -s "https://api.example.com/config" | jq '.database.host'

Fetch Plain Text

bash
# Strip HTML tags for plain text
curl -sL "https://example.com/page" | python3 -c "
import html.parser, sys

class Stripper(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, d):
        self.text.append(d)
    def get_text(self):
        return ''.join(self.text)

s = Stripper()
s.feed(sys.stdin.read())
print(s.get_text())
"
Or use `WebFetch`, which returns clean markdown that is close enough to plain text for most purposes.

---

Authenticated Requests

Bearer Token

bash
curl -s "https://api.example.com/data" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json"

API Key in Header

bash
curl -s "https://api.example.com/data" \
  -H "X-API-Key: $API_KEY"

API Key in Query Parameter

bash
curl -s "https://api.example.com/data?api_key=$API_KEY"

Basic Auth

bash
curl -s -u "username:$PASSWORD" "https://api.example.com/data"
Store credentials in environment variables. Never hardcode tokens or passwords in commands.

POST, PUT, PATCH, DELETE

POST with JSON Body

bash
curl -s -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "name": "item-name",
    "value": 42
  }' | jq '.'

POST with Form Data

bash
curl -s -X POST "https://api.example.com/upload" \
  -F "file=@./document.pdf" \
  -F "description=Uploaded via curl"

PUT (Full Update)

bash
curl -s -X PUT "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"name": "updated-name", "value": 99}' | jq '.'

PATCH (Partial Update)

bash
curl -s -X PATCH "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"value": 100}' | jq '.'

DELETE

bash
curl -s -X DELETE "https://api.example.com/items/123" \
  -H "Authorization: Bearer $API_TOKEN"

Advanced Patterns

Pagination

bash
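PAGE=1
while true; do
  # Fetch one page; -f makes curl exit non-zero on HTTP errors so the loop stops
  RESPONSE=$(curl -sf "https://api.example.com/items?page=$PAGE&per_page=50" \
    -H "Authorization: Bearer $API_TOKEN") || break
  COUNT=$(echo "$RESPONSE" | jq 'length')
  echo "$RESPONSE" | jq '.[]'
  # A page shorter than per_page means this was the last one
  [ "$COUNT" -lt 50 ] && break
  PAGE=$((PAGE + 1))
done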

Timeout and Retry

bash
curl -s --connect-timeout 10 --max-time 30 \
  --retry 3 --retry-delay 2 \
  "https://api.example.com/data"

Response Headers Inspection

bash
curl -sI "https://example.com" | grep -i "content-type"

Save Response with Status Code

bash
HTTP_CODE=$(curl -s -o /tmp/response.json -w "%{http_code}" "https://api.example.com/data")
echo "Status: $HTTP_CODE"
jq '.' /tmp/response.json
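
One way to act on the captured status code is to map it to a next step, following the Error Handling table in this document. The sketch below is illustrative only: `describe_status` is a hypothetical helper name, and the hint strings are assumptions rather than anything curl prints.

```shell
# describe_status CODE: print a short next-step hint for an HTTP status code.
# Hypothetical helper; the hint text is illustrative, not curl output.
describe_status() {
  case "$1" in
    2??)     echo "ok" ;;
    301|302) echo "redirect: add -L" ;;
    401)     echo "unauthorized: check token and env vars" ;;
    403)     echo "forbidden: check permissions" ;;
    404)     echo "not found: verify URL path" ;;
    429)     echo "rate limited: honor Retry-After" ;;
    500)     echo "server error: retry once" ;;
    *)       echo "unhandled status: $1" ;;
  esac
}
```

With the snippet above, it could be called as `describe_status "$HTTP_CODE"` before deciding whether to read /tmp/response.json at all.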

Cookie Handling

bash
# Save cookies
curl -s -c /tmp/cookies.txt "https://example.com/login" \
  -d "user=admin&pass=$PASSWORD"

# Reuse cookies
curl -s -b /tmp/cookies.txt "https://example.com/dashboard"

---

Error Handling

HTTP Status   Meaning               Resolution
301/302       Redirect              Add -L flag to follow
401           Unauthorized          Check token/credentials; verify env var is set
403           Forbidden             Insufficient permissions or IP restriction
404           Not Found             Verify URL path; resource may be deleted
429           Rate Limited          Respect Retry-After header; add delay between requests
500           Server Error          Retry once; if persistent, report upstream
SSL error     Certificate issue     Do not use -k (insecure); fix the root cause
Timeout       Network/server slow   Increase --max-time; check connectivity

Verify a URL is reachable before complex operations:
bash
curl -sI -o /dev/null -w "%{http_code}" "https://example.com"
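
The 429 guidance (respect `Retry-After`, add a delay between requests) can be sketched as a small header parser. This is a minimal sketch under stated assumptions: `retry_after_seconds` is a hypothetical helper name, the 5-second fallback is arbitrary, and the HTTP-date form of the header is not parsed but falls back to the default.

```shell
# retry_after_seconds [DEFAULT]: read raw response headers on stdin and print
# how many seconds to wait. Hypothetical helper; DEFAULT (5 if omitted) is an
# assumed fallback used when the header is absent or is an HTTP date.
retry_after_seconds() (
  default="${1:-5}"
  # Strip carriage returns, then take the first Retry-After value (case-insensitive).
  value=$(tr -d '\r' | awk -F': *' '
    !seen && tolower($1) == "retry-after" { sub(/[ \t]+$/, "", $2); print $2; seen = 1 }
  ')
  case "$value" in
    ''|*[!0-9]*) echo "$default" ;;  # missing or non-numeric: use the fallback
    *)           echo "$value" ;;    # plain number of seconds
  esac
)
```

A possible use is to save headers with curl's `-D /tmp/headers.txt` option and then run `sleep "$(retry_after_seconds < /tmp/headers.txt)"` before the next request; the file path here is only an example.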

Limitations

  • WebFetch does not support custom headers, POST bodies, or cookies. Use `curl` for authenticated or stateful requests.
  • curl does not render JavaScript. For JS-heavy SPAs, prefer `WebFetch`, which handles rendered content.
  • Large responses may exceed context limits. Pipe through `jq`, `head`, or `python3` to extract only needed data before loading into context.
  • Binary content (images, PDFs, archives) should be saved to disk with `-o`, not piped to stdout.
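
The point about large responses can be sketched as a small filter that keeps the first lines of a response and reports how much was dropped. `cap_lines` is a hypothetical helper name, and the default of 40 lines is an arbitrary assumption rather than a known context limit.

```shell
# cap_lines [LIMIT]: pass through at most LIMIT lines from stdin (default 40)
# and append a note saying how many lines were omitted. Hypothetical helper.
cap_lines() (
  limit="${1:-40}"
  awk -v limit="$limit" '
    NR <= limit { print }
    END { if (NR > limit) printf "... (%d more lines omitted)\n", NR - limit }
  '
)
```

For example, `curl -sL "https://example.com/page" | cap_lines 100` would keep the first 100 lines; unlike plain `head`, the trailing note makes it visible that the output was truncated.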

Calibration Rules

  1. Default to WebFetch for reading web pages. It returns clean markdown, handles JS rendering, and requires no flags. Switch to curl only when you need headers, auth, POST, or raw format control.
  2. Always pipe JSON through jq. Raw JSON in context wastes tokens. Filter to only the fields needed.
  3. Never hardcode credentials. Use `$ENV_VAR` references. If the variable is not set, surface the error immediately.
  4. Follow redirects by default. Always use `-L` with curl unless you specifically need to inspect the redirect chain.
  5. Prefer `-s` (silent) on every curl call. Progress meters add noise to output.
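
Rule 3 asks for an immediate error when a credential variable is missing. One possible way to do that is a guard that runs before any curl call. This is a sketch, not a prescribed mechanism: `require_env` is a hypothetical helper name, and the error wording is an assumption.

```shell
# require_env NAME: succeed only if the named variable is set and non-empty.
# Hypothetical helper; prints an error to stderr and returns non-zero otherwise.
require_env() {
  # Accept only plain variable names so the eval below is safe.
  case "$1" in
    ''|[0-9]*|*[!A-Za-z0-9_]*)
      echo "error: invalid variable name: $1" >&2
      return 2
      ;;
  esac
  eval "_require_env_value=\${$1:-}"
  if [ -z "$_require_env_value" ]; then
    echo "error: environment variable $1 is not set" >&2
    return 1
  fi
}
```

A call such as `require_env API_TOKEN && curl -sL "https://api.example.com/data" -H "Authorization: Bearer $API_TOKEN"` then stops before sending a request with an empty token. For a one-off script, the shell's built-in `${API_TOKEN:?API_TOKEN is not set}` expansion gives a similar effect.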