web-fetcher


Extract web page content as clean text/markdown from a given URL using a fallback chain of free services.


```bash
python3 <skill-path>/scripts/fetch.py <url>
```

Save to file:

```bash
python3 <skill-path>/scripts/fetch.py <url> -o output.md
```


The script tries these sources in order, falling back on failure:

1. Jina Reader (`r.jina.ai/{url}`): best markdown quality, supports JS-rendered pages
2. defuddle.md (`defuddle.md/{url}`): by Obsidian creator @kepano
3. markdown.new (`markdown.new/{url}`): 3-layer strategy with browser rendering fallback
4. OpenCLI: platform-specific commands with browser login state (zhihu, reddit, twitter, weibo)
5. Raw HTML: direct fetch as last resort
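
The fallback order above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `fetch.py`: the function names (`http_get`, `candidate_urls`, `fetch_with_fallback`), the `https://` scheme, and the "non-empty body counts as success" rule are all assumptions, and the OpenCLI step is left out because its commands are not documented here.

```python
from urllib.request import Request, urlopen

# Reader-style services from the list above, each called as https://<prefix>/<url>.
# The https scheme is an assumption.
READER_PREFIXES = ["r.jina.ai", "defuddle.md", "markdown.new"]


def http_get(url, timeout=20.0):
    """Fetch a URL and return the body as text (raises on HTTP or network errors)."""
    req = Request(url, headers={"User-Agent": "web-fetcher-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def candidate_urls(url):
    """Build the request URLs in fallback order, ending with the raw page."""
    return [f"https://{prefix}/{url}" for prefix in READER_PREFIXES] + [url]


def fetch_with_fallback(url, get=http_get):
    """Return (source_url, text) from the first candidate that yields non-empty text."""
    errors = []
    for candidate in candidate_urls(url):
        try:
            text = get(candidate)
        except Exception as exc:  # any failure means "try the next source"
            errors.append(f"{candidate}: {exc}")
            continue
        if text.strip():
            return candidate, text
        errors.append(f"{candidate}: empty response")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

Passing `get` as a parameter keeps the chain testable without network access; a caller would normally just use `fetch_with_fallback("https://example.com")`.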


When to use:

- JS-rendered pages that WebFetch can't handle (Twitter/X, SPAs)
- Login-required pages on supported platforms (zhihu, reddit, twitter, weibo, xiaohongshu)
- Bulk content extraction
- When you need clean markdown instead of summarized content

When the free services fail, OpenCLI auto-detects the platform from the URL and routes to the right command.

Requires `npm i -g @jackwener/opencli` plus the Browser Bridge extension in Chrome/Arc.
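
To illustrate what detecting the platform from a URL might look like, here is a small sketch. It is not OpenCLI's actual logic: the `PLATFORM_HOSTS` table, the domain names in it, and the `detect_platform` function are hypothetical, and the sketch stops at naming the platform rather than guessing at OpenCLI command syntax.

```python
from urllib.parse import urlparse

# Hypothetical host-to-platform table; the domains are assumptions for illustration.
PLATFORM_HOSTS = {
    "zhihu.com": "zhihu",
    "reddit.com": "reddit",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "weibo.com": "weibo",
    "xiaohongshu.com": "xiaohongshu",
}


def detect_platform(url):
    """Return the platform name for a URL, or None if the host is not recognized."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORM_HOSTS.items():
        # Match the bare domain or any subdomain (www.reddit.com, m.weibo.com, ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return None
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids false positives such as a `reddit.com` link embedded in another site's query string.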


| Service | Limit |
| --- | --- |
| Jina Reader | 20 req/min (free), 10M token key available at jina.ai/reader |
| markdown.new | 500 req/day/IP |
| defuddle.md | Not documented |
| OpenCLI | No documented limits (uses browser session) |
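
For bulk extraction, the limits in the table suggest throttling requests on the client side. The sketch below is one way to do that, assuming a sliding-window limiter; the `SlidingWindowLimiter` class and its parameters are illustrative names and are not part of `fetch.py`.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call expires, then discard it.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


# 20 requests per minute, matching the free Jina Reader limit in the table.
jina_limiter = SlidingWindowLimiter(max_calls=20, period=60.0)
```

Calling `jina_limiter.wait()` before each Jina Reader request keeps a batch job under the free tier. The same class with `max_calls=500, period=86400.0` would approximate the markdown.new daily limit for a single long-running process.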
