web-fetch

Web Fetch

All web content retrieval uses `curl` (Bash) or the built-in `WebFetch` tool. No MCP server is needed: Claude Code's native tools cover every Fetch MCP operation with more control.

Quick Reference

| Fetch MCP Tool | Replacement | When to Use |
|---|---|---|
| `fetch_html` | `curl -s URL` | Raw HTML needed for parsing |
| `fetch_json` | `curl -s URL \| jq '.'` | API responses, structured data |
| `fetch_markdown` | `WebFetch` | Readable page content (default output is markdown) |
| `fetch_txt` | `curl -s URL` or `WebFetch` | Plain text extraction |

Default choice: Use `WebFetch` for general page content. Use `curl` when you need headers, authentication, POST bodies, or raw format control.

WebFetch (Built-in Tool)

The `WebFetch` tool fetches a URL and returns clean markdown content. It handles JavaScript-rendered pages, strips navigation and boilerplate, and returns readable text.

Best for: documentation pages, articles, blog posts, README files, and any other content where you want readable text rather than raw HTML.

Limitations: no custom headers, no POST bodies, no cookie management. Use `curl` for those.

curl Patterns

Fetch HTML

```bash
curl -sL "https://example.com/page"
```

| Flag | Purpose |
|---|---|
| `-s` | Silent mode: suppress progress meter |
| `-L` | Follow redirects (3xx) |
| `-o file.html` | Save to file instead of stdout |
| `-I` | Headers only (HEAD request) |
| `-i` | Include response headers in output |

Fetch and extract specific elements with `xmllint` or `python3`:

```bash
curl -sL "https://example.com" | python3 -c "
from html.parser import HTMLParser
import sys

# Collect the text found inside the page's title element
class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        # True only while the most recent start tag is the title tag
        self.in_title = tag == 'title'
    def handle_data(self, data):
        if self.in_title:
            self.title += data
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

p = TitleParser()
p.feed(sys.stdin.read())
print(p.title)
"
```

Fetch JSON

```bash
curl -s "https://api.example.com/v1/data" \
  -H "Accept: application/json" | jq '.'
```

Filter and reshape JSON responses:

```bash
# Extract specific fields
curl -s "https://api.example.com/users" | jq '.[] | {name, email}'

# Filter by condition
curl -s "https://api.example.com/items" | jq '[.[] | select(.status == "active")]'

# Count results
curl -s "https://api.example.com/items" | jq 'length'

# Get nested value
curl -s "https://api.example.com/config" | jq '.database.host'
```

Fetch Plain Text

```bash
# Strip HTML tags for plain text
curl -sL "https://example.com/page" | python3 -c "
import html.parser, sys

class Stripper(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, d):
        self.text.append(d)
    def get_text(self):
        return ''.join(self.text)

s = Stripper()
s.feed(sys.stdin.read())
print(s.get_text())
"
```

Or use `WebFetch`, which returns clean markdown that is close enough to plain text for most purposes.

---

Authenticated Requests

Bearer Token

```bash
curl -s "https://api.example.com/data" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json"
```

API Key in Header

```bash
curl -s "https://api.example.com/data" \
  -H "X-API-Key: $API_KEY"
```

API Key in Query Parameter

```bash
curl -s "https://api.example.com/data?api_key=$API_KEY"
```

Basic Auth

```bash
curl -s -u "username:$PASSWORD" "https://api.example.com/data"
```

Store credentials in environment variables. Never hardcode tokens or passwords in commands.
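One way to make a missing credential fail loudly is the shell's `${VAR:?message}` expansion, which stops a non-interactive script with an error when the variable is unset or empty. The sketch below is illustrative only: `require_token` and `fetch_authed` are hypothetical helper names, not part of the original skill, and the URL is whatever endpoint you pass in.

```bash
# Hypothetical helper: stop with an error if API_TOKEN is unset or empty.
require_token() {
  : "${API_TOKEN:?API_TOKEN is not set}"
}

# Hypothetical wrapper around the Bearer-token pattern shown earlier.
# The check runs first, so no request is sent without a token.
fetch_authed() {
  require_token
  curl -s "$1" -H "Authorization: Bearer $API_TOKEN"
}
```

Calling `fetch_authed "https://api.example.com/data"` without `API_TOKEN` exported stops the script before any request leaves the machine.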

POST, PUT, PATCH, DELETE

POST with JSON Body

```bash
curl -s -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "name": "item-name",
    "value": 42
  }' | jq '.'
```

POST with Form Data

```bash
curl -s -X POST "https://api.example.com/upload" \
  -F "file=@./document.pdf" \
  -F "description=Uploaded via curl"
```

PUT (Full Update)

```bash
curl -s -X PUT "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"name": "updated-name", "value": 99}' | jq '.'
```

PATCH (Partial Update)

```bash
curl -s -X PATCH "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"value": 100}' | jq '.'
```

DELETE

```bash
curl -s -X DELETE "https://api.example.com/items/123" \
  -H "Authorization: Bearer $API_TOKEN"
```

Advanced Patterns

Pagination

```bash
PAGE=1
while true; do
  # -f makes curl exit non-zero on HTTP errors, so a failed page ends the loop
  # instead of leaving COUNT empty and looping forever
  RESPONSE=$(curl -sf "https://api.example.com/items?page=$PAGE&per_page=50" \
    -H "Authorization: Bearer $API_TOKEN") || break
  COUNT=$(echo "$RESPONSE" | jq 'length')
  echo "$RESPONSE" | jq '.[]'
  # A page with fewer than per_page items is the last one
  [ "$COUNT" -lt 50 ] && break
  PAGE=$((PAGE + 1))
done
```

Timeout and Retry

```bash
curl -s --connect-timeout 10 --max-time 30 \
  --retry 3 --retry-delay 2 \
  "https://api.example.com/data"
```

Response Headers Inspection

```bash
curl -sI "https://example.com" | grep -i "content-type"
```
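The same header-only request can feed a small parser, for example to honor the `Retry-After` guidance listed under Error Handling. This is a minimal sketch, not part of the original skill: `retry_after_seconds` is a hypothetical helper, it only understands the integer-seconds form of the header (not the HTTP-date form), and the 1-second fallback is an arbitrary assumption.

```bash
# Hypothetical helper: read raw response headers on stdin and print the
# Retry-After delay in seconds, or 1 if the header is absent or not a number.
retry_after_seconds() {
  # Strip carriage returns from CRLF line endings, then match the header
  # name case-insensitively.
  tr -d '\r' | awk -F': *' '
    tolower($1) == "retry-after" && $2 ~ /^[0-9]+$/ { print $2; found = 1; exit }
    END { if (!found) print 1 }
  '
}
```

A possible use is `sleep "$(curl -sI "https://api.example.com/data" | retry_after_seconds)"` before repeating a request that was rate limited.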

Save Response with Status Code

```bash
# Write the body to a file and capture only the status code on stdout
HTTP_CODE=$(curl -s -o /tmp/response.json -w "%{http_code}" "https://api.example.com/data")
echo "Status: $HTTP_CODE"
jq '.' /tmp/response.json
```

Cookie Handling

```bash
# Save cookies
curl -s -c /tmp/cookies.txt "https://example.com/login" \
  -d "user=admin&pass=$PASSWORD"

# Reuse cookies
curl -s -b /tmp/cookies.txt "https://example.com/dashboard"
```

---

Error Handling

| HTTP Status | Meaning | Resolution |
|---|---|---|
| 301/302 | Redirect | Add `-L` flag to follow |
| 401 | Unauthorized | Check token/credentials; verify env var is set |
| 403 | Forbidden | Insufficient permissions or IP restriction |
| 404 | Not Found | Verify URL path; resource may be deleted |
| 429 | Rate Limited | Respect `Retry-After` header; add delay between requests |
| 500 | Server Error | Retry once; if persistent, report upstream |
| SSL error | Certificate issue | Do not use `-k` (insecure); fix the root cause |
| Timeout | Network/server slow | Increase `--max-time`; check connectivity |

Verify a URL is reachable before complex operations:

```bash
curl -sI -o /dev/null -w "%{http_code}" "https://example.com"
```
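The status table above can be folded into a small lookup so a script can react to the code captured with `-w "%{http_code}"`. The following is a sketch under stated assumptions: `status_action` and its action labels are hypothetical names, every 5xx code is treated like 500, and `000` is the value curl's `%{http_code}` reports when no HTTP response was received.

```bash
# Hypothetical helper: map an HTTP status code to a suggested next step.
status_action() {
  case "$1" in
    2??)     echo "ok" ;;
    301|302) echo "follow-redirect" ;;    # rerun with -L
    401)     echo "check-credentials" ;;
    403)     echo "check-permissions" ;;
    404)     echo "check-url" ;;
    429)     echo "wait-and-retry" ;;     # respect Retry-After
    5??)     echo "retry-once" ;;         # assumption: all 5xx handled like 500
    000)     echo "check-connectivity" ;; # curl received no HTTP response
    *)       echo "unknown" ;;
  esac
}
```

For example, `status_action "$(curl -s -o /dev/null -w "%{http_code}" "https://example.com")"` prints one of the labels above.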

Limitations

- `WebFetch` does not support custom headers, POST bodies, or cookies. Use `curl` for authenticated or stateful requests.
- `curl` does not render JavaScript. For JS-heavy SPAs, prefer `WebFetch`, which handles rendered content.
- Large responses may exceed context limits. Pipe through `jq`, `head`, or `python3` to extract only the needed data before loading it into context.
- Binary content (images, PDFs, archives) should be saved to disk with `-o`, not piped to stdout.
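A simple way to apply the size guidance above is to cap piped output at a fixed byte budget. The helper below is a hypothetical sketch: the name `cap_output` and the 4000-byte default are assumptions chosen for illustration, and it relies on a `head` that supports `-c` (the GNU and BSD versions do).

```bash
# Hypothetical helper: pass through at most N bytes of stdin (default 4000).
cap_output() {
  head -c "${1:-4000}"
}
```

For instance, `curl -sL "https://example.com/page" | cap_output 2000` keeps only the first 2000 bytes of the page. Filtering with `jq` first is still preferable for JSON, since a byte cut can land in the middle of a value.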

Calibration Rules

- Default to `WebFetch` for reading web pages. It returns clean markdown, handles JS rendering, and requires no flags. Switch to `curl` only when you need headers, auth, POST, or raw format control.
- Always pipe JSON through `jq`. Raw JSON in context wastes tokens. Filter to only the fields needed.
- Never hardcode credentials. Use `$ENV_VAR` references. If the variable is not set, surface the error immediately.
- Follow redirects by default. Always use `-L` with `curl` unless you specifically need to inspect the redirect chain.
- Prefer `-s` (silent) on every `curl` call. Progress meters add noise to output.
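The flag defaults in these rules can be bundled into a thin wrapper so they are not retyped on every call. This is only a sketch: `fetch` is a hypothetical function name, and the timeout values are borrowed from the Timeout and Retry example rather than prescribed by the rules.

```bash
# Hypothetical wrapper: silent mode, follow redirects, bounded timeouts.
# Any extra arguments (URL, -H headers, -d bodies) are passed through to curl.
fetch() {
  curl -sL --connect-timeout 10 --max-time 30 "$@"
}
```

A call such as `fetch "https://api.example.com/data" -H "Accept: application/json" | jq '.'` then picks up the defaults automatically. When the redirect chain itself needs inspecting, call `curl` directly without `-L`.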