just-scrape


Web Scraping with just-scrape

AI-powered web scraping CLI by ScrapeGraph AI. Get an API key at dashboard.scrapegraphai.com.

Setup

Always install or run the `@latest` version to ensure you have the most recent features and fixes.

```bash
npm install -g just-scrape@latest           # npm
pnpm add -g just-scrape@latest              # pnpm
yarn global add just-scrape@latest          # yarn
bun add -g just-scrape@latest               # bun
npx just-scrape@latest --help               # run without installing
bunx just-scrape@latest --help              # run without installing (bun)
```

```bash
export SGAI_API_KEY="sgai-..."
```

API key resolution order: `SGAI_API_KEY` env var → `.env` file → `~/.scrapegraphai/config.json` → interactive prompt (saves to config).
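The resolution order can be sketched in plain shell: an exported `SGAI_API_KEY` always wins over a project-level `.env` file. The key values below are placeholders, not real keys; just-scrape performs this lookup itself.

```bash
# Demonstrates the precedence only.
workdir=$(mktemp -d) && cd "$workdir"
printf 'SGAI_API_KEY=sgai-from-dotenv\n' > .env   # fallback if no env var is set
export SGAI_API_KEY=sgai-from-env                  # highest priority
echo "$SGAI_API_KEY"                               # prints sgai-from-env
```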

Command Selection

| Need | Command |
| --- | --- |
| Extract structured data from a known URL | `smart-scraper` |
| Search the web and extract from results | `search-scraper` |
| Convert a page to clean markdown | `markdownify` |
| Crawl multiple pages from a site | `crawl` |
| Get raw HTML | `scrape` |
| Automate browser actions (login, click, fill) | `agentic-scraper` |
| Generate a JSON schema from description | `generate-schema` |
| Get all URLs from a sitemap | `sitemap` |
| Check credit balance | `credits` |
| Browse past requests | `history` |
| Validate API key | `validate` |

Common Flags

All commands support `--json` for machine-readable output (suppresses banner, spinners, prompts).

Scraping commands share these optional flags:

- `--stealth` — bypass anti-bot detection (+4 credits)
- `--headers <json>` — custom HTTP headers as a JSON string
- `--schema <json>` — enforce an output JSON schema
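Because `--headers` and `--schema` take raw JSON strings, shell quoting is the usual failure point. A sketch (assuming `jq` is installed) that keeps the value in a single-quoted variable and sanity-checks it before passing it to a command:

```bash
# Validate the JSON locally before spending credits on a request that
# would fail to parse. The schema below is illustrative.
schema='{"type":"object","properties":{"title":{"type":"string"}}}'
if printf '%s' "$schema" | jq -e 'type == "object"' > /dev/null; then
  echo "schema ok"
fi
# then: just-scrape smart-scraper https://example.com -p "Extract the title" --schema "$schema"
```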

Commands


Smart Scraper

Extract structured data from any URL using AI.

```bash
just-scrape smart-scraper <url> -p <prompt>
just-scrape smart-scraper <url> -p <prompt> --schema <json>
just-scrape smart-scraper <url> -p <prompt> --scrolls <n>     # infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n>       # multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --stealth         # anti-bot (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text
```

E-commerce extraction

```bash
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"
```

Strict schema + scrolling

```bash
just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
  --scrolls 5
```

JS-heavy SPA behind anti-bot

```bash
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
  --stealth
```

Search Scraper

Search the web and extract structured data from results.

```bash
just-scrape search-scraper <prompt>
just-scrape search-scraper <prompt> --num-results <n>     # sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction       # markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>
just-scrape search-scraper <prompt> --stealth --headers <json>
```

Research across sources

```bash
just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10
```

Cheap markdown-only

```bash
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5
```

Structured output

```bash
just-scrape search-scraper "Top 5 cloud providers pricing" \
  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
```

Markdownify

Convert any webpage to clean markdown.

```bash
just-scrape markdownify <url>
just-scrape markdownify <url> --stealth         # +4 credits
just-scrape markdownify <url> --headers <json>
```

```bash
just-scrape markdownify https://blog.example.com/my-article
just-scrape markdownify https://protected.example.com --stealth
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md
```

Crawl

Crawl multiple pages and extract data from each.

```bash
just-scrape crawl <url> -p <prompt>
just-scrape crawl <url> -p <prompt> --max-pages <n>       # default 10
just-scrape crawl <url> -p <prompt> --depth <n>           # default 1
just-scrape crawl <url> --no-extraction --max-pages <n>   # markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>
just-scrape crawl <url> -p <prompt> --rules <json>        # include_paths, same_domain
just-scrape crawl <url> -p <prompt> --no-sitemap
just-scrape crawl <url> -p <prompt> --stealth
```

Crawl docs site

```bash
just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3
```

Filter to blog pages only

```bash
just-scrape crawl https://example.com -p "Extract article titles" \
  --rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50
```

Raw markdown, no AI extraction (cheaper)

```bash
just-scrape crawl https://example.com --no-extraction --max-pages 10
```

Scrape

Get raw HTML content from a URL.

```bash
just-scrape scrape <url>
just-scrape scrape <url> --stealth          # +4 credits
just-scrape scrape <url> --branding         # extract logos/colors/fonts (+2 credits)
just-scrape scrape <url> --country-code <iso>
```

```bash
just-scrape scrape https://example.com
just-scrape scrape https://store.example.com --stealth --country-code DE
just-scrape scrape https://example.com --branding
```

Agentic Scraper

Browser automation with AI — login, click, navigate, fill forms. Steps are comma-separated strings.

```bash
just-scrape agentic-scraper <url> -s <steps>
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session   # persist browser session
```

Login + extract dashboard

```bash
just-scrape agentic-scraper https://app.example.com/login \
  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
  --ai-extraction -p "Extract all dashboard metrics"
```

Multi-step form

```bash
just-scrape agentic-scraper https://example.com/wizard \
  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"
```

Persistent session across runs

```bash
just-scrape agentic-scraper https://app.example.com \
  -s "Click Settings" --use-session
```

Generate Schema

Generate a JSON schema from a natural language description.

```bash
just-scrape generate-schema <prompt>
just-scrape generate-schema <prompt> --existing-schema <json>
```

```bash
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"
```

Refine an existing schema

```bash
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'
```

Sitemap

Get all URLs from a website's sitemap.

```bash
just-scrape sitemap <url>
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
```

History

Browse request history. Interactive by default (arrow keys to navigate, select to view details).

```bash
just-scrape history <service>                     # interactive browser
just-scrape history <service> <request-id>        # specific request
just-scrape history <service> --page <n>
just-scrape history <service> --page-size <n>     # max 100
just-scrape history <service> --json
```

Services: `markdownify`, `smartscraper`, `searchscraper`, `scrape`, `crawl`, `agentic-scraper`, `sitemap`

```bash
just-scrape history smartscraper
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'
```

Credits & Validate

```bash
just-scrape credits
just-scrape credits --json | jq '.remaining_credits'
just-scrape validate
```
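The `--json` form composes into scripts; for example, a pre-flight balance check before a large crawl. The sample JSON below stands in for real `just-scrape credits --json` output (the field name `remaining_credits` is the one the jq example above already uses), and the threshold is arbitrary.

```bash
# Replace `sample` with: just-scrape credits --json
sample='{"remaining_credits": 42}'
remaining=$(printf '%s' "$sample" | jq '.remaining_credits')
if [ "$remaining" -lt 100 ]; then
  echo "low credits: $remaining"   # prints "low credits: 42" for the sample
fi
```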

Common Patterns


Generate schema then scrape with it

```bash
just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"
```

Pipe JSON for scripting

```bash
just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while read -r url; do
  just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl
done
```
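Once the loop finishes, the JSONL file reduces with a single jq pass. The `.result.title` path below is an assumption about the `--json` output shape; adjust it to whatever your responses actually contain.

```bash
# Two fabricated sample lines stand in for real smart-scraper output.
printf '%s\n' '{"result":{"title":"Page A"}}' '{"result":{"title":"Page B"}}' > results.jsonl
jq -r '.result.title' results.jsonl   # prints each title on its own line
```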

Protected sites


JS-heavy SPA behind Cloudflare

```bash
just-scrape smart-scraper https://protected.example.com -p "Extract data" --stealth
```

With custom cookies/headers

```bash
just-scrape smart-scraper https://example.com -p "Extract data" \
  --cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'
```

Credit Costs

| Feature | Extra Credits |
| --- | --- |
| `--stealth` | +4 per request |
| `--branding` (scrape only) | +2 |
| `search-scraper` extraction | 10 per request |
| `search-scraper --no-extraction` | 2 per request |
| `crawl --no-extraction` | 2 per page |
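Budgeting a job is then simple arithmetic, assuming the surcharges stack per request as the table implies:

```bash
pages=50
crawl_markdown=$(( pages * 2 ))        # crawl --no-extraction: 2 credits/page
search_stealth=$(( 10 + 4 ))           # search-scraper extraction + --stealth
echo "crawl: $crawl_markdown, search+stealth: $search_stealth"
# prints: crawl: 100, search+stealth: 14
```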

Environment Variables

```bash
SGAI_API_KEY=sgai-...              # API key
JUST_SCRAPE_TIMEOUT_S=300          # Request timeout in seconds (default 120)
JUST_SCRAPE_DEBUG=1                # Debug logging to stderr
```
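These can also be set per invocation rather than exported: a variable assignment prefixed to a command applies to that command only. The `sh -c` below merely echoes the variable to show the scoping; in practice the command would be `just-scrape ...`.

```bash
# The override reaches the child command; the parent shell is untouched.
JUST_SCRAPE_TIMEOUT_S=300 sh -c 'echo "timeout=$JUST_SCRAPE_TIMEOUT_S"'
echo "after: ${JUST_SCRAPE_TIMEOUT_S:-unset}"
```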