url-to-markdown
URL to Markdown
Fetches any URL via Chrome CDP and converts HTML to clean markdown.
Script Directory
Important: All scripts are located in the `scripts/` subdirectory of this skill.
Agent Execution Instructions:
- Determine this SKILL.md file's directory path as `SKILL_DIR`
- Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
- Replace all `${SKILL_DIR}` in this document with the actual path
Script Reference:
| Script | Purpose |
|---|---|
| `scripts/main.ts` | CLI entry point for URL fetching |
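The path rules above can be sketched in TypeScript as follows. This is a minimal illustration, not part of the skill: `scriptPath` and `expandSkillDir` are hypothetical helper names, and the sketch assumes POSIX-style `/` separators.

```typescript
// Hypothetical helper: builds ${SKILL_DIR}/scripts/<script-name>.ts.
function scriptPath(skillDir: string, scriptName: string): string {
  // Drop trailing slashes so the joined path has exactly one separator.
  const base = skillDir.replace(/\/+$/, "");
  return `${base}/scripts/${scriptName}.ts`;
}

// Hypothetical helper: replaces every literal ${SKILL_DIR} placeholder
// in a command string with the actual directory.
function expandSkillDir(command: string, skillDir: string): string {
  const base = skillDir.replace(/\/+$/, "");
  // The placeholder is written in double quotes so it stays a literal string.
  return command.split("${SKILL_DIR}").join(base);
}
```

For example, `expandSkillDir("npx -y bun ${SKILL_DIR}/scripts/main.ts <url>", dir)` yields a command with the placeholder substituted and the `<url>` argument left for the caller to fill in.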
Features
- Chrome CDP for full JavaScript rendering
- Two capture modes: auto or wait-for-user
- Clean markdown output with metadata
- Handles login-required pages via wait mode
Usage
```bash
# Auto mode (default) - capture when page loads
npx -y bun ${SKILL_DIR}/scripts/main.ts <url>

# Wait mode - wait for user signal before capture
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait

# Save to specific file
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md
```
Options
| Option | Description |
|---|---|
| `<url>` | URL to fetch |
| `-o` | Output file path (default: auto-generated) |
| `--wait` | Wait for user signal before capturing |
| | Page load timeout (default: 30000) |
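An agent that launches the script programmatically might assemble the arguments as an array rather than a shell string, which avoids quoting problems with URLs that contain `?` or `&`. The sketch below is an assumption-laden illustration: `CaptureRequest` and `buildCaptureArgs` are hypothetical names, and only the `--wait` and `-o` flags shown in the usage examples are used.

```typescript
// Hypothetical request shape covering the options shown in the usage examples.
interface CaptureRequest {
  url: string;
  wait?: boolean;   // maps to --wait
  output?: string;  // maps to -o <path>
}

// Builds the argument list to pass to `npx` (the executable itself is not included).
function buildCaptureArgs(skillDir: string, req: CaptureRequest): string[] {
  const args = ["-y", "bun", `${skillDir}/scripts/main.ts`, req.url];
  if (req.wait) args.push("--wait");
  if (req.output !== undefined) args.push("-o", req.output);
  return args;
}
```

The resulting array can be handed to any process-spawning API that accepts an argument vector.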
Capture Modes
Auto Mode (default)
Page loads → waits for network idle → captures immediately.
Best for:
- Public pages
- Static content
- No login required
Wait Mode (`--wait`)
Page opens → user can interact (login, scroll, etc.) → user signals ready → captures.
Best for:
- Login-required pages
- Dynamic content needing interaction
- Pages with lazy loading
Agent workflow for wait mode:
- Run script with `--wait` flag
- Script outputs: `Page opened. Press Enter when ready to capture...`
- Use `AskUserQuestion` to ask user if page is ready
- When user confirms, send newline to stdin to trigger capture
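The wait-mode workflow above could be driven by something like the following sketch. It is not the skill's implementation: `WaitModeChild`, `driveWaitMode`, and the `askUser` callback are hypothetical names. The child process is abstracted behind a small interface so the logic does not depend on a particular process API, and `askUser` stands in for whatever mechanism (such as `AskUserQuestion`) collects the confirmation.

```typescript
// Hypothetical minimal view of the spawned script's stdio.
interface WaitModeChild {
  onStdout(listener: (chunk: string) => void): void;
  writeStdin(data: string): void;
}

// Text taken from the prompt the script prints in wait mode.
const READY_PROMPT = "Press Enter when ready to capture";

function driveWaitMode(
  child: WaitModeChild,
  askUser: (answer: (ready: boolean) => void) => void,
): void {
  let buffered = "";
  let asked = false;
  child.onStdout((chunk) => {
    // Output may arrive split across chunks, so match against the accumulated text.
    buffered += chunk;
    if (asked || buffered.indexOf(READY_PROMPT) === -1) return;
    asked = true; // ask only once, even if more output follows
    askUser((ready) => {
      // A newline on stdin is what triggers the capture.
      if (ready) child.writeStdin("\n");
    });
  });
}
```

If the user answers that the page is not ready, this sketch does nothing further; a real agent would likely ask again later.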
Output Format
```markdown
---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---

# Page Title

Converted markdown content...
```
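To make the layout concrete, here is a minimal sketch of how such a file could be assembled. `CaptureMetadata` and `renderCapture` are hypothetical names. Omitting optional fields when they are absent, quoting values with `JSON.stringify`, and rendering the title as a level-one heading are assumptions rather than documented behavior.

```typescript
// Hypothetical metadata shape mirroring the front matter keys above.
interface CaptureMetadata {
  url: string;
  title: string;
  description?: string;
  author?: string;
  published?: string;
  capturedAt: string; // ISO 8601 timestamp
}

function renderCapture(meta: CaptureMetadata, body: string): string {
  // JSON string syntax is also a valid YAML double-quoted scalar.
  const quote = (value: string) => JSON.stringify(value);
  const lines = ["---", `url: ${meta.url}`, `title: ${quote(meta.title)}`];
  // Assumption: optional fields are left out entirely when not available.
  if (meta.description) lines.push(`description: ${quote(meta.description)}`);
  if (meta.author) lines.push(`author: ${quote(meta.author)}`);
  if (meta.published) lines.push(`published: ${quote(meta.published)}`);
  lines.push(`captured_at: ${quote(meta.capturedAt)}`, "---", "", `# ${meta.title}`, "", body);
  return lines.join("\n");
}
```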
Mode Selection Guide
When the user requests a URL capture, help select the appropriate mode:
Suggest Auto Mode when:
- URL is public (no login wall visible)
- Content appears static
- User doesn't mention login requirements
Suggest Wait Mode when:
- User mentions needing to log in
- Site known to require authentication
- User wants to scroll/interact before capture
- Content is behind paywall
Ask user when unclear:
The page may require login or interaction before capturing.
Which mode should I use?
1. Auto - Capture immediately when loaded
2. Wait - Wait for you to interact first
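The selection rules above amount to a small decision function. The sketch below is one way to encode them. `ModeSignals` and `suggestMode` are hypothetical names, and how each signal is detected is left to the agent.

```typescript
type CaptureMode = "auto" | "wait" | "ask";

// Hypothetical signals an agent might gather from the request and the URL.
interface ModeSignals {
  userMentionedLogin?: boolean;
  siteKnownToRequireAuth?: boolean;
  userWantsToInteract?: boolean;
  behindPaywall?: boolean;
  looksPublicAndStatic?: boolean;
}

function suggestMode(s: ModeSignals): CaptureMode {
  // Any wait-mode signal wins, since auto mode would capture a login wall.
  if (s.userMentionedLogin || s.siteKnownToRequireAuth || s.userWantsToInteract || s.behindPaywall) {
    return "wait";
  }
  if (s.looksPublicAndStatic) return "auto";
  // No clear signal either way: ask the user which mode to use.
  return "ask";
}
```

Checking the wait-mode signals first is a design choice of this sketch: suggesting wait mode unnecessarily costs one extra confirmation, while suggesting auto mode wrongly produces an unusable capture.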
Output Directory
Each capture creates a file organized by domain:
url-to-markdown/
└── <domain>/
    └── <slug>.md
Path Components:
- `<domain>`: Site domain (e.g., `example.com`, `github.com`)
- `<slug>`: Generated from page title or URL path (kebab-case)
Slug Generation:
- Extract from page title (preferred) or URL path
- Convert to kebab-case, 2-6 words
- Example: "Getting Started with React" → `getting-started-with-react`
Conflict Resolution:
If `url-to-markdown/<domain>/<slug>.md` already exists:
- Append timestamp: `<slug>-YYYYMMDD-HHMMSS.md`
- Example: `getting-started.md` exists → `getting-started-20260118-143052.md`
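The slug and conflict rules above can be sketched as follows. This is a simplified illustration under stated assumptions: `slugify`, `timestampSuffix`, and `outputPath` are hypothetical names, the slug keeps only ASCII letters and digits, the two-word minimum is not enforced, and the timestamp uses UTC (the source does not say which time zone applies). File existence is injected as a function so the logic stays independent of any file system API.

```typescript
// Lowercases the text, splits on non-alphanumerics, and keeps at most maxWords words.
function slugify(text: string, maxWords = 6): string {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim()
    .split(" ")
    .filter((w) => w.length > 0);
  return words.slice(0, maxWords).join("-");
}

// Formats a date as YYYYMMDD-HHMMSS (UTC is an assumption of this sketch).
function timestampSuffix(now: Date): string {
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return (
    `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}` +
    `-${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}${pad(now.getUTCSeconds())}`
  );
}

// Returns the plain path, or a timestamped one if the plain path is taken.
function outputPath(
  domain: string,
  slug: string,
  exists: (path: string) => boolean,
  now: Date,
): string {
  const base = `url-to-markdown/${domain}/${slug}.md`;
  if (!exists(base)) return base;
  return `url-to-markdown/${domain}/${slug}-${timestampSuffix(now)}.md`;
}
```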
Error Handling
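| Error | Resolution |
|---|---|
| Chrome not found | Install Chrome or set the custom Chrome executable path variable (see Environment Variables) |
| Page timeout | Increase the page load timeout (default: 30000) |
| Capture failed | Try wait mode for complex pages |
| Empty content | Page may need JS rendering time |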
Environment Variables
| Variable | Description |
|---|---|
| | Custom Chrome executable path |
| | Custom data directory |
| | Custom Chrome profile directory |
Extension Support
Custom configurations via EXTEND.md.
Check paths (priority order):
- `.content-gen-skills/url-to-markdown/EXTEND.md` (project)
- `~/.content-gen-skills/url-to-markdown/EXTEND.md` (user)
If found, load before workflow. Extension content overrides defaults.
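The priority lookup above could be implemented along these lines. This is a sketch, not the skill's code: `findExtendFile` is a hypothetical name, the project and home directories are passed in explicitly, and the existence check is injected so the example does not depend on a specific file system API.

```typescript
// Returns the first EXTEND.md found, checking the project path before the user path.
function findExtendFile(
  projectDir: string,
  homeDir: string,
  exists: (path: string) => boolean,
): string | undefined {
  const relative = ".content-gen-skills/url-to-markdown/EXTEND.md";
  const candidates = [
    `${projectDir}/${relative}`, // project-level, highest priority
    `${homeDir}/${relative}`,    // user-level fallback (the ~ path)
  ];
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return undefined; // no extension file: defaults apply
}
```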