tavily-extract

tavily extract

Extract clean markdown or text content from one or more URLs.

Prerequisites

Requires the Tavily CLI. See tavily-cli for install and auth setup.

Quick install:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

When to use

You have a specific URL and want its content
You need text from JavaScript-rendered pages
Step 2 in the workflow: search → extract → map → crawl → research

Quick start

Single URL

tvly extract "https://example.com/article" --json

Multiple URLs

tvly extract "https://example.com/page1" "https://example.com/page2" --json

Query-focused extraction (returns relevant chunks only)

tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

JS-heavy pages

tvly extract "https://app.example.com" --extract-depth advanced --json

Save to file

tvly extract "https://example.com/article" -o article.md

Options

| Option | Description |
| --- | --- |
| `--query` | Rerank chunks by relevance to this query |
| `--chunks-per-source` | Chunks per URL (1-5, requires `--query`) |
| `--extract-depth` | `basic` (default) or `advanced` (for JS pages) |
| `--format` | `markdown` (default) or `text` |
| `--include-images` | Include image URLs |
| `--timeout` | Max wait time (1-60 seconds) |
| `-o, --output` | Save output to file |
| `--json` | Structured JSON output |
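
The ranges in the table can be checked before a call is made. The following is a minimal sketch, assuming a hypothetical helper named `focused_extract` (it is not part of the Tavily CLI) and assuming default values of 3 chunks and a 30-second timeout chosen only for illustration. It uses only the flags listed above.

```bash
#!/usr/bin/env bash
# Sketch: validate the documented ranges, then run a query-focused extraction.
# focused_extract is a hypothetical helper, not part of the Tavily CLI.

focused_extract() {
  local url="$1" query="$2" chunks="${3:-3}" timeout="${4:-30}"

  # --chunks-per-source requires --query.
  if [ -z "$query" ]; then
    echo "error: --chunks-per-source requires --query" >&2
    return 2
  fi
  # --chunks-per-source accepts 1-5.
  if [ "$chunks" -lt 1 ] || [ "$chunks" -gt 5 ]; then
    echo "error: chunks must be between 1 and 5" >&2
    return 2
  fi
  # --timeout accepts 1-60 seconds.
  if [ "$timeout" -lt 1 ] || [ "$timeout" -gt 60 ]; then
    echo "error: timeout must be between 1 and 60 seconds" >&2
    return 2
  fi

  tvly extract "$url" --query "$query" \
    --chunks-per-source "$chunks" --timeout "$timeout" --json
}

# Usage (hypothetical URL and query):
#   focused_extract "https://example.com/docs" "authentication API" 3 30
```

Failing early in the wrapper keeps an out-of-range value from costing a request; the CLI may well report such errors itself, so treat the checks as a convenience.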

Extract depth

| Depth | When to use |
| --- | --- |
| `basic` | Simple pages, fast; try this first |
| `advanced` | JS-rendered SPAs, dynamic content, tables |
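
The basic-first approach can be scripted. This is a rough sketch, assuming that "missing content" can be approximated by a non-zero exit status or an empty output file; that check is an assumption, not documented CLI behavior. The helper name `extract_with_fallback` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: try basic extraction first, retry with advanced if the result
# looks empty. The emptiness check is an assumption; adjust it as needed.

extract_with_fallback() {
  local url="$1" out="$2"

  # First attempt: basic depth (the default), which is faster.
  if tvly extract "$url" --extract-depth basic -o "$out" && [ -s "$out" ]; then
    return 0
  fi

  # Fallback: advanced depth for JS-rendered or dynamic pages.
  tvly extract "$url" --extract-depth advanced -o "$out" && [ -s "$out" ]
}

# Usage (hypothetical URL and file name):
#   extract_with_fallback "https://app.example.com" page.md
```

The function returns non-zero when both attempts leave the output file empty, so a calling script can decide what to do next.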

Tips

Max 20 URLs per request; batch larger lists into multiple calls.
Use `--query` + `--chunks-per-source` to get only relevant content instead of full pages.
Try `basic` first, fall back to `advanced` if content is missing.
Set `--timeout` for slow pages (up to 60s).
If search results already contain the content you need (via `--include-raw-content`), skip the extract step.
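
For lists longer than 20 URLs, the batching in the first tip might look like the sketch below. It assumes a hypothetical `urls.txt` with one URL per line, hypothetical `batch-N.json` output names, and that `--json` writes its result to standard output.

```bash
#!/usr/bin/env bash
# Sketch: split a URL list into groups of at most 20 and run one
# `tvly extract` call per group. File names are hypothetical.

extract_in_batches() {
  local url_file="$1" batch_size=20
  local batch=() n=0 url

  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue   # skip blank lines
    batch+=("$url")
    if [ "${#batch[@]}" -eq "$batch_size" ]; then
      n=$((n + 1))
      # </dev/null keeps the CLI from consuming the URL list on stdin.
      tvly extract "${batch[@]}" --json < /dev/null > "batch-$n.json"
      batch=()
    fi
  done < "$url_file"

  # Flush the final, partially filled batch.
  if [ "${#batch[@]}" -gt 0 ]; then
    n=$((n + 1))
    tvly extract "${batch[@]}" --json < /dev/null > "batch-$n.json"
  fi
}

# Usage:
#   extract_in_batches urls.txt
```

Writing one file per batch keeps each JSON response intact, so results never need to be merged mid-run.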
