web-to-markdown


4 scripts

Convert a web URL into cleaned Markdown with deterministic routing. Use when Codex needs to read article-like content from links and should apply source-aware fetch strategies: default to r.jina.ai for general pages (including X/Twitter), use defuddle.md for YouTube links, and use browser-impersonated extraction for WeChat/Zhihu/Feishu pages with Mozilla Readability cleanup.

7 installs

Source: rookie-ricardo/erduo-skills

Added on 2026-04-08

NPX Install

npx skill4agent add rookie-ricardo/erduo-skills web-to-markdown

SKILL.md Content


Web To Markdown

Convert URLs into usable Markdown by applying domain-aware fetching routes, then return the cleaned content directly.

Quick Workflow

Normalize and validate the input URL.
Select route:

`r.jina.ai`: general web + X/Twitter.
`defuddle.md`: YouTube transcript/content extraction.
`special-browser-fetch`: WeChat/Zhihu/Feishu.

Return markdown text (or JSON metadata if needed).

For generic URLs (non-YouTube, non-WeChat/Zhihu/Feishu), use this fallback chain:

Try `r.jina.ai` first.
If it fails, fall back to direct HTTP fetch + Readability.
If direct fetch still fails or returns shell-like content, fall back to browser extraction.
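
The chain above can be sketched as a small runner that tries each step in order. This is a minimal sketch, not the skill's actual implementation: `fetchWithFallback`, the step objects, and the `isUsable` length threshold are all hypothetical, and for simplicity the usability check is applied to every step rather than only to the direct fetch.

```javascript
// Hypothetical runner for the generic-URL fallback chain.
// Each step is { name, run }, where run(url) resolves to markdown text.
async function fetchWithFallback(url, steps, isUsable) {
  const errors = [];
  for (const { name, run } of steps) {
    try {
      const markdown = await run(url);
      // Empty or shell-like output counts as a failure, so move on.
      if (isUsable(markdown)) {
        return { strategy: name, markdown };
      }
      errors.push(`${name}: unusable content`);
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All routes failed for ${url}: ${errors.join("; ")}`);
}

// Assumed heuristic: very short output is probably a login wall or shell page.
const isUsable = (markdown) =>
  typeof markdown === "string" && markdown.trim().length >= 200;
```

A caller would pass three steps in order (the `r.jina.ai` fetch, the direct fetch with Readability, and browser extraction) and report the collected errors if all of them fail.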

Commands

Run from this skill directory (`skills/web-to-markdown`):

```bash
npm install
node scripts/url_to_markdown.mjs <url>
```

Return metadata with markdown:

```bash
node scripts/url_to_markdown.mjs <url> --json
```

Force special-site browser extraction:

```bash
node scripts/fetch_special_sites.mjs <url> --json
```

Routing Policy

Default route: `https://r.jina.ai/<url>`.
YouTube (`youtube.com`, `youtu.be`): `https://defuddle.md/<url>`.
X/Twitter (`x.com`, `twitter.com`): `https://r.jina.ai/<url>`.
WeChat/Zhihu/Feishu: run `scripts/fetch_special_sites.mjs`.
If input is already proxy-formatted (`https://defuddle.md/https://...` or `https://r.jina.ai/https://...`), normalize back to the original URL and re-apply routing.
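
The routing rules above could be expressed roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the special-site host list (`mp.weixin.qq.com`, `zhihu.com`, `feishu.cn`) is a guess at what the skill matches, since the source names only the sites, not their domains. The strategy labels are taken from the Output Contract section.

```javascript
const PROXY_PREFIXES = ["https://r.jina.ai/", "https://defuddle.md/"];
const YOUTUBE_HOSTS = ["youtube.com", "youtu.be"];
// Assumed host list for WeChat/Zhihu/Feishu; the skill's real list may differ.
const SPECIAL_HOSTS = ["mp.weixin.qq.com", "zhihu.com", "feishu.cn"];

// True when hostname equals a listed domain or is a subdomain of one.
function matchesHost(hostname, domains) {
  return domains.some((d) => hostname === d || hostname.endsWith(`.${d}`));
}

// Unwrap proxy-formatted input, then validate with the URL constructor.
function normalizeUrl(input) {
  let url = input.trim();
  for (const prefix of PROXY_PREFIXES) {
    if (url.startsWith(`${prefix}http://`) || url.startsWith(`${prefix}https://`)) {
      url = url.slice(prefix.length);
      break;
    }
  }
  return new URL(url).toString(); // throws TypeError on invalid input
}

function selectRoute(input) {
  const resolvedUrl = normalizeUrl(input);
  const { hostname } = new URL(resolvedUrl);
  if (matchesHost(hostname, YOUTUBE_HOSTS)) {
    return { strategy: "defuddle", target: `https://defuddle.md/${resolvedUrl}` };
  }
  if (matchesHost(hostname, SPECIAL_HOSTS)) {
    // Special sites are fetched directly, starting with the HTTP stage.
    return { strategy: "special-http-fetch", target: resolvedUrl };
  }
  // Everything else, including x.com and twitter.com, uses the default route.
  return { strategy: "r-jina", target: `https://r.jina.ai/${resolvedUrl}` };
}
```

Because X/Twitter shares the default route, it needs no branch of its own here.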

Special-Site Extraction Behavior

Use a two-stage strategy for WeChat/Zhihu/Feishu:

Try `cuimp` HTTP/TLS impersonation first, then clean HTML with Mozilla Readability.
If stage 1 fails or returns blocked/shell content, fall back to `puppeteer-extra` browser impersonation.

HTTP stage impersonates a modern Chrome TLS/HTTP profile via `cuimp`.
Browser stage impersonates a modern Chrome user agent and standard `sec-ch-ua` headers.
Remove known login modals and backdrop overlays (best effort).
Scroll the page to trigger lazy-loaded article blocks.
Parse cleaned document with Mozilla Readability.
Convert extracted HTML body to Markdown via Turndown.
Resolve browser executable from `CHROME_PATH` first, then system Chrome/Chromium/Edge paths.
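
The browser executable lookup described above might look like the sketch below. The function name and the candidate paths are hypothetical, and falling through to system paths when `CHROME_PATH` points at a missing file is an assumption; the source only states the lookup order.

```javascript
// Hypothetical candidate paths; adjust for the target machine.
const DEFAULT_CANDIDATES = [
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  "/usr/bin/google-chrome",
  "/usr/bin/chromium",
  "/usr/bin/microsoft-edge",
];

// env is an environment map, exists is a predicate such as fs.existsSync.
function resolveBrowserExecutable(env, exists, candidates = DEFAULT_CANDIDATES) {
  // CHROME_PATH wins when it is set and points at an existing file.
  if (env.CHROME_PATH && exists(env.CHROME_PATH)) {
    return env.CHROME_PATH;
  }
  // Otherwise return the first installed system browser, or null if none.
  return candidates.find((path) => exists(path)) ?? null;
}
```

In Node this could be called as `resolveBrowserExecutable(process.env, fs.existsSync)`. Passing the environment and the existence check as arguments keeps the lookup easy to test without touching the real filesystem.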

If special-site extraction fails due to anti-bot checks, account-only pages, or network limits, report failure clearly and ask for fallback input (for example raw page text).

Output Contract

For normal usage, output markdown only.

When `--json` is used, return:

`source`: backend source (`r.jina.ai`, `defuddle`, `cuimp`, `browser-readability`).
`strategy`: selected route (`r-jina`, `defuddle`, `special-http-fetch`, `special-browser-fetch-fallback`).
`requestedUrl`: original input.
`resolvedUrl`: normalized/final URL.
`markdown`: extracted markdown body.
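
For callers that consume the `--json` output, a quick shape check can catch a truncated or failed run early. The sketch below assumes every field is a non-empty string, which the source does not state explicitly; `missingFields` and the `example` payload are illustrative, not real output from the script.

```javascript
const REQUIRED_FIELDS = ["source", "strategy", "requestedUrl", "resolvedUrl", "markdown"];

// Returns the names of required fields that are absent or not non-empty strings.
function missingFields(payload) {
  return REQUIRED_FIELDS.filter(
    (key) => typeof payload?.[key] !== "string" || payload[key] === ""
  );
}

// Illustrative payload for a proxy-formatted input routed through r.jina.ai.
const example = {
  source: "r.jina.ai",
  strategy: "r-jina",
  requestedUrl: "https://r.jina.ai/https://example.com/post",
  resolvedUrl: "https://example.com/post",
  markdown: "# Example post\n\nBody text.",
};
```

An empty array from `missingFields(JSON.parse(stdout))` means the payload has all five fields.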

Resources

`references/routing-and-notes.md`: domain routing rules and operational caveats.
`scripts/url_to_markdown.mjs`: primary entrypoint.
`scripts/fetch_special_sites_http.mjs`: WeChat/Zhihu/Feishu HTTP impersonation fetcher (`cuimp` JS).
`scripts/fetch_special_sites.mjs`: two-stage extractor (HTTP-first, browser-fallback).
