tavily-extract
tavily extract
Extract clean markdown or text content from one or more URLs.
Prerequisites
Requires the Tavily CLI. See tavily-cli for install and auth setup.
Quick install:
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
When to use
- You have a specific URL and want its content
- You need text from JavaScript-rendered pages
- Step 2 in the workflow: search → extract → map → crawl → research
Quick start
```bash
# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md
```
Options
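| Option | Description |
|---|---|
| `--query` | Rerank chunks by relevance to this query |
| `--chunks-per-source` | Chunks per URL (1-5, requires `--query`) |
| `--extract-depth` | `basic` or `advanced` (see Extract depth) |
| | |
| | Include image URLs |
| `--timeout` | Max wait time (1-60 seconds) |
| `-o` | Save output to file |
| `--json` | Structured JSON output |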
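The ranges in this table can be checked before a request is sent. The following is a minimal sketch: `focused_extract` is a hypothetical helper, not part of the CLI. It assumes `--chunks-per-source` and `--timeout` take plain integers (seconds, for the timeout).

```bash
# Hypothetical helper. Usage: focused_extract URL QUERY CHUNKS [TIMEOUT_SECONDS]
# Checks the documented ranges before calling tvly extract:
#   --chunks-per-source: 1-5, only together with --query
#   --timeout: 1-60 seconds (assumed to take a plain integer)
focused_extract() {
  local url="$1" query="$2" chunks="$3" timeout="${4:-}"
  local timeout_re='^([1-9]|[1-5][0-9]|60)$'

  if [[ -z "$url" || -z "$query" ]]; then
    echo "focused_extract: URL and QUERY are required" >&2
    return 2
  fi
  if ! [[ "$chunks" =~ ^[1-5]$ ]]; then
    echo "focused_extract: CHUNKS must be an integer from 1 to 5" >&2
    return 2
  fi

  local -a args=("$url" --query "$query" --chunks-per-source "$chunks" --json)
  if [[ -n "$timeout" ]]; then
    if ! [[ "$timeout" =~ $timeout_re ]]; then
      echo "focused_extract: TIMEOUT must be an integer from 1 to 60" >&2
      return 2
    fi
    args+=(--timeout "$timeout")
  fi

  tvly extract "${args[@]}"
}
```

After sourcing the file, call it as `focused_extract "https://example.com/docs" "authentication API" 3 30`. An out-of-range value returns status 2 without calling `tvly`.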
Extract depth
| Depth | When to use |
|---|---|
| `basic` | Simple pages, fast; try this first |
| `advanced` | JS-rendered SPAs, dynamic content, tables |
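The "basic first, then advanced" approach can be scripted. The following is a rough sketch. The helper name `extract_with_fallback` and the `MIN_BYTES` threshold (default 200 bytes) are hypothetical. Treating a small saved file as "content is missing" is only a heuristic, so tune it to the pages you actually extract.

```bash
# Hypothetical helper. Usage: extract_with_fallback URL OUTFILE
# Tries --extract-depth basic first and retries with advanced when the basic
# run fails or its saved output looks too small (an assumed heuristic).
extract_with_fallback() {
  local url="$1" out="$2"
  local min_bytes="${MIN_BYTES:-200}"   # arbitrary default threshold

  # Fast path: basic depth, saved to OUTFILE with -o
  if tvly extract "$url" --extract-depth basic -o "$out" &&
     [[ -s "$out" ]] &&
     (( $(wc -c < "$out") >= min_bytes )); then
    return 0
  fi

  # Fallback: advanced depth for JS-rendered or dynamic pages
  echo "extract_with_fallback: retrying $url with --extract-depth advanced" >&2
  tvly extract "$url" --extract-depth advanced -o "$out"
}
```

This costs one extra request only for pages where the basic result looks too thin. To change the cutoff for a single call, set it inline: `MIN_BYTES=500 extract_with_fallback "https://app.example.com" app.md`.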
Tips
- Max 20 URLs per request; batch larger lists into multiple calls.
- Use `--query` + `--chunks-per-source` to get only relevant content instead of full pages.
- Try `basic` first; fall back to `advanced` if content is missing.
- Set `--timeout` for slow pages (up to 60s).
- If search results already contain the content you need (via `--include-raw-content`), skip the extract step.
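For lists longer than 20 URLs, the following sketch splits the list into batches of at most 20 and makes one call per batch. It assumes `--json` writes to stdout, so each batch is redirected to its own file. `extract_in_batches` and the `batch_N.json` file names are hypothetical.

```bash
# Documented limit: at most 20 URLs per tvly extract request.
BATCH_SIZE=20

# Hypothetical helper. Usage: extract_in_batches URLS_FILE [OUT_DIR]
# URLS_FILE holds one URL per line; blank lines are skipped.
# Writes OUT_DIR/batch_1.json, batch_2.json, ... and prints the batch count.
extract_in_batches() {
  local urls_file="$1" out_dir="${2:-.}" line i n=0
  local -a urls=()

  # Read all non-empty lines (also handles a final line without a newline)
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ -n "$line" ]] && urls+=("$line")
  done < "$urls_file"

  for (( i = 0; i < ${#urls[@]}; i += BATCH_SIZE )); do
    n=$(( n + 1 ))
    # Slice at most BATCH_SIZE URLs starting at index i.
    # Assumption: --json output goes to stdout, so redirect it per batch.
    tvly extract "${urls[@]:i:BATCH_SIZE}" --json > "$out_dir/batch_${n}.json" || return 1
  done
  echo "$n"
}
```

For example, a file with 45 URLs produces three calls: 20, 20, and 5 URLs. The function stops at the first failed batch, so files for later batches are not written.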
See also
- tavily-search — find pages when you don't have a URL
- tavily-crawl — extract content from many pages on a site