just-scrape
Web Scraping with just-scrape
AI-powered web scraping CLI by ScrapeGraph AI. Get an API key at dashboard.scrapegraphai.com.
Setup
Always install or run the @latest version to ensure you have the most recent features and fixes.

```bash
npm install -g just-scrape@latest   # npm
pnpm add -g just-scrape@latest      # pnpm
yarn global add just-scrape@latest  # yarn
bun add -g just-scrape@latest       # bun
npx just-scrape@latest --help       # run without installing
bunx just-scrape@latest --help      # run without installing (bun)
```

```bash
export SGAI_API_KEY="sgai-..."
```

API key resolution order: SGAI_API_KEY env var → .env file → ~/.scrapegraphai/config.json → interactive prompt (saves to config).
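The lookup order above can be sketched in shell. The env var name and config path are from the docs; the `key_source` variable is only for illustration:

```bash
# Mirror the documented resolution order: env var, then .env, then config file.
if [ -n "${SGAI_API_KEY:-}" ]; then
  key_source="environment"
elif [ -f .env ] && grep -q '^SGAI_API_KEY=' .env; then
  key_source=".env file"
elif [ -f "$HOME/.scrapegraphai/config.json" ]; then
  key_source="config file"
else
  key_source="interactive prompt"
fi
echo "API key will come from: $key_source"
```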
Command Selection
| Need | Command |
|---|---|
| Extract structured data from a known URL | smart-scraper |
| Search the web and extract from results | search-scraper |
| Convert a page to clean markdown | markdownify |
| Crawl multiple pages from a site | crawl |
| Get raw HTML | scrape |
| Automate browser actions (login, click, fill) | agentic-scraper |
| Generate a JSON schema from description | generate-schema |
| Get all URLs from a sitemap | sitemap |
| Check credit balance | credits |
| Browse past requests | history |
| Validate API key | validate |
Common Flags
All commands support --json for machine-readable output (suppresses banner, spinners, prompts).

Scraping commands share these optional flags:
- --stealth — bypass anti-bot detection (+4 credits)
- --headers <json> — custom HTTP headers as JSON string
- --schema <json> — enforce output JSON schema
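Both --headers and --schema take raw JSON strings, so shell quoting matters: single-quote the JSON so the inner double quotes reach the CLI intact. A sketch (the header values here are made up):

```bash
# Single quotes preserve the embedded double quotes for the CLI.
headers='{"Accept-Language":"en-US","X-Client":"my-bot/1.0"}'
schema='{"type":"object","properties":{"title":{"type":"string"}}}'
# Hypothetical combined invocation:
# just-scrape smart-scraper https://example.com -p "Extract the title" \
#   --headers "$headers" --schema "$schema" --json
echo "$headers"
```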
Commands
Smart Scraper
Extract structured data from any URL using AI.
```bash
just-scrape smart-scraper <url> -p <prompt>
just-scrape smart-scraper <url> -p <prompt> --schema <json>
just-scrape smart-scraper <url> -p <prompt> --scrolls <n>   # infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n>     # multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --stealth       # anti-bot (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text
```
E-commerce extraction
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"
Strict schema + scrolling
just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
  --scrolls 5
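A malformed --schema string only fails once the request is sent, so it can help to syntax-check the JSON locally first. A sketch using Python's stdlib json.tool (the schema is the one from the example above):

```bash
# Syntax-check the --schema JSON before spending credits.
schema='{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}'
if printf '%s' "$schema" | python3 -m json.tool > /dev/null 2>&1; then
  schema_ok="yes"
else
  schema_ok="no"
fi
echo "schema valid: $schema_ok"
```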
JS-heavy SPA behind anti-bot
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
  --stealth
Search Scraper
Search the web and extract structured data from results.
```bash
just-scrape search-scraper <prompt>
just-scrape search-scraper <prompt> --num-results <n>   # sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction     # markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>
just-scrape search-scraper <prompt> --stealth --headers <json>
```
Research across sources
just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10
Cheap markdown-only
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5
Structured output
just-scrape search-scraper "Top 5 cloud providers pricing" \
  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
Markdownify
Convert any webpage to clean markdown.
```bash
just-scrape markdownify <url>
just-scrape markdownify <url> --stealth         # +4 credits
just-scrape markdownify <url> --headers <json>
```

```bash
just-scrape markdownify https://blog.example.com/my-article
just-scrape markdownify https://protected.example.com --stealth
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md
```
Crawl
Crawl multiple pages and extract data from each.
```bash
just-scrape crawl <url> -p <prompt>
just-scrape crawl <url> -p <prompt> --max-pages <n>     # default 10
just-scrape crawl <url> -p <prompt> --depth <n>         # default 1
just-scrape crawl <url> --no-extraction --max-pages <n> # markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>
just-scrape crawl <url> -p <prompt> --rules <json>      # include_paths, same_domain
just-scrape crawl <url> -p <prompt> --no-sitemap
just-scrape crawl <url> -p <prompt> --stealth
```
Crawl docs site
just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3
Filter to blog pages only
just-scrape crawl https://example.com -p "Extract article titles" \
  --rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50
Raw markdown, no AI extraction (cheaper)
just-scrape crawl https://example.com --no-extraction --max-pages 10
Scrape
Get raw HTML content from a URL.
```bash
just-scrape scrape <url>
just-scrape scrape <url> --stealth            # +4 credits
just-scrape scrape <url> --branding           # extract logos/colors/fonts (+2 credits)
just-scrape scrape <url> --country-code <iso>
```

```bash
just-scrape scrape https://example.com
just-scrape scrape https://store.example.com --stealth --country-code DE
just-scrape scrape https://example.com --branding
```
Agentic Scraper
Browser automation with AI — login, click, navigate, fill forms. Steps are comma-separated strings.
```bash
just-scrape agentic-scraper <url> -s <steps>
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session   # persist browser session
```
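Because steps travel as one comma-separated string, longer flows are easier to maintain if you build that string up in the shell and pass it at call time. A sketch (the steps themselves are illustrative):

```bash
# Build the comma-separated -s argument one step at a time.
s="Fill email with user@test.com"
s="$s,Fill password with secret"
s="$s,Click Sign In"
# just-scrape agentic-scraper https://app.example.com/login -s "$s"
echo "$s"
```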
Login + extract dashboard
just-scrape agentic-scraper https://app.example.com/login \
  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
  --ai-extraction -p "Extract all dashboard metrics"
Multi-step form
just-scrape agentic-scraper https://example.com/wizard \
  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"
Persistent session across runs
just-scrape agentic-scraper https://app.example.com \
  -s "Click Settings" --use-session

Generate Schema
Generate a JSON schema from a natural language description.
```bash
just-scrape generate-schema <prompt>
just-scrape generate-schema <prompt> --existing-schema <json>
```

```bash
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"
```
Refine an existing schema
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'

Sitemap
Get all URLs from a website's sitemap.
```bash
just-scrape sitemap <url>
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
```
History
Browse request history. Interactive by default (arrow keys to navigate, select to view details).
```bash
just-scrape history <service>                   # interactive browser
just-scrape history <service> <request-id>      # specific request
just-scrape history <service> --page <n>
just-scrape history <service> --page-size <n>   # max 100
just-scrape history <service> --json
```

Services: markdownify, smartscraper, searchscraper, scrape, crawl, agentic-scraper, sitemap

```bash
just-scrape history smartscraper
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'
```
Credits & Validate
```bash
just-scrape credits
just-scrape credits --json | jq '.remaining_credits'
just-scrape validate
```

Common Patterns
Generate schema then scrape with it
```bash
just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"
```

Pipe JSON for scripting
```bash
just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while read -r url; do
  just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl
done
```

Protected sites
JS-heavy SPA behind Cloudflare
just-scrape smart-scraper https://protected.example.com -p "Extract data" --stealth
With custom cookies/headers
just-scrape smart-scraper https://example.com -p "Extract data" \
  --cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'

Credit Costs
| Feature | Extra Credits |
|---|---|
| --stealth | +4 per request |
| --branding | +2 |
| search-scraper (AI extraction) | 10 per request |
| search-scraper --no-extraction | 2 per request |
| crawl --no-extraction | 2 per page |
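As a worked example using the numbers above: a 10-page markdown-only crawl plus one search-scraper request with extraction and --stealth would cost roughly:

```bash
# crawl --no-extraction: 2 credits per page, 10 pages
crawl_cost=$((10 * 2))
# search-scraper with extraction (10 per request) plus --stealth (+4)
search_cost=$((10 + 4))
total=$((crawl_cost + search_cost))
echo "estimated total: $total credits"   # 34
```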
Environment Variables
```bash
SGAI_API_KEY=sgai-...       # API key
JUST_SCRAPE_TIMEOUT_S=300   # Request timeout in seconds (default 120)
JUST_SCRAPE_DEBUG=1         # Debug logging to stderr
```
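In a wrapper script you may want the documented default made explicit: fall back to 120 seconds when the variable is unset. A minimal POSIX-shell sketch:

```bash
# Apply the documented 120 s default when JUST_SCRAPE_TIMEOUT_S is not set.
JUST_SCRAPE_TIMEOUT_S="${JUST_SCRAPE_TIMEOUT_S:-120}"
export JUST_SCRAPE_TIMEOUT_S
echo "timeout: ${JUST_SCRAPE_TIMEOUT_S}s"
```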