just-scrape
Web Scraping with just-scrape
AI-powered web scraping CLI by ScrapeGraph AI. Get an API key at dashboard.scrapegraphai.com.
Setup
Always install or run the @latest version to ensure you have the most recent features and fixes.

```bash
npm install -g just-scrape@latest   # npm
pnpm add -g just-scrape@latest      # pnpm
yarn global add just-scrape@latest  # yarn
bun add -g just-scrape@latest       # bun
npx just-scrape@latest --help       # run without installing
bunx just-scrape@latest --help      # run without installing (bun)
```

```bash
export SGAI_API_KEY="sgai-..."
```

API key resolution order: SGAI_API_KEY env var → .env file → ~/.scrapegraphai/config.json → interactive prompt (saves to config).
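The lookup order above can be sketched in shell. The env var name and config path are from the docs; the `key_source` variable is only for illustration:

```bash
# Mirror the documented resolution order: env var, then .env, then config file.
if [ -n "${SGAI_API_KEY:-}" ]; then
  key_source="environment"
elif [ -f .env ] && grep -q '^SGAI_API_KEY=' .env; then
  key_source=".env file"
elif [ -f "$HOME/.scrapegraphai/config.json" ]; then
  key_source="config file"
else
  key_source="interactive prompt"
fi
echo "API key will come from: $key_source"
```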
Command Selection
| Need | Command |
|---|---|
| Extract structured data from a known URL | smart-scraper |
| Search the web and extract from results | search-scraper |
| Convert a page to clean markdown | markdownify |
| Crawl multiple pages from a site | crawl |
| Get raw HTML | scrape |
| Automate browser actions (login, click, fill) | agentic-scraper |
| Generate a JSON schema from description | generate-schema |
| Get all URLs from a sitemap | sitemap |
| Check credit balance | credits |
| Browse past requests | history |
| Validate API key | validate |
Common Flags
All commands support --json for machine-readable output (suppresses banner, spinners, prompts).

Scraping commands share these optional flags:
- --stealth — bypass anti-bot detection (+4 credits)
- --headers <json> — custom HTTP headers as JSON string
- --schema <json> — enforce output JSON schema
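Both --headers and --schema take raw JSON strings, so shell quoting matters: single-quote the JSON so the inner double quotes reach the CLI intact. A sketch (the header values here are made up):

```bash
# Single quotes preserve the embedded double quotes for the CLI.
headers='{"Accept-Language":"en-US","X-Client":"my-bot/1.0"}'
schema='{"type":"object","properties":{"title":{"type":"string"}}}'
# Hypothetical combined invocation:
# just-scrape smart-scraper https://example.com -p "Extract the title" \
#   --headers "$headers" --schema "$schema" --json
echo "$headers"
```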
Commands
Smart Scraper
Extract structured data from any URL using AI.
```bash
just-scrape smart-scraper <url> -p <prompt>
just-scrape smart-scraper <url> -p <prompt> --schema <json>
just-scrape smart-scraper <url> -p <prompt> --scrolls <n>   # infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n>     # multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --stealth       # anti-bot (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text
```
E-commerce extraction
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"
Strict schema + scrolling
just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
  --scrolls 5
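A malformed --schema string only fails once the request is sent, so it can help to syntax-check the JSON locally first. A sketch using Python's stdlib json.tool (the schema is the one from the example above):

```bash
# Syntax-check the --schema JSON before spending credits.
schema='{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}'
if printf '%s' "$schema" | python3 -m json.tool > /dev/null 2>&1; then
  schema_ok="yes"
else
  schema_ok="no"
fi
echo "schema valid: $schema_ok"
```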
JS-heavy SPA behind anti-bot
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
  --stealth
Search Scraper
Search the web and extract structured data from results.
```bash
just-scrape search-scraper <prompt>
just-scrape search-scraper <prompt> --num-results <n>   # sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction     # markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>
just-scrape search-scraper <prompt> --stealth --headers <json>
```
Research across sources
just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10
Cheap markdown-only
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5
Structured output
just-scrape search-scraper "Top 5 cloud providers pricing" \
  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
Markdownify
Convert any webpage to clean markdown.
```bash
just-scrape markdownify <url>
just-scrape markdownify <url> --stealth         # +4 credits
just-scrape markdownify <url> --headers <json>
```

```bash
just-scrape markdownify https://blog.example.com/my-article
just-scrape markdownify https://protected.example.com --stealth
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md
```
Crawl
Crawl multiple pages and extract data from each.
```bash
just-scrape crawl <url> -p <prompt>
just-scrape crawl <url> -p <prompt> --max-pages <n>     # default 10
just-scrape crawl <url> -p <prompt> --depth <n>         # default 1
just-scrape crawl <url> --no-extraction --max-pages <n> # markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>
just-scrape crawl <url> -p <prompt> --rules <json>      # include_paths, same_domain
just-scrape crawl <url> -p <prompt> --no-sitemap
just-scrape crawl <url> -p <prompt> --stealth
```
Crawl docs site
just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3
Filter to blog pages only
just-scrape crawl https://example.com -p "Extract article titles" \
  --rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50
Raw markdown, no AI extraction (cheaper)
just-scrape crawl https://example.com --no-extraction --max-pages 10
Scrape
Get raw HTML content from a URL.
```bash
just-scrape scrape <url>
just-scrape scrape <url> --stealth            # +4 credits
just-scrape scrape <url> --branding           # extract logos/colors/fonts (+2 credits)
just-scrape scrape <url> --country-code <iso>
```

```bash
just-scrape scrape https://example.com
just-scrape scrape https://store.example.com --stealth --country-code DE
just-scrape scrape https://example.com --branding
```
Agentic Scraper
Browser automation with AI — login, click, navigate, fill forms. Steps are comma-separated strings.
```bash
just-scrape agentic-scraper <url> -s <steps>
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session   # persist browser session
```
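Because steps travel as one comma-separated string, longer flows are easier to maintain if you build that string up in the shell and pass it at call time. A sketch (the steps themselves are illustrative):

```bash
# Build the comma-separated -s argument one step at a time.
s="Fill email with user@test.com"
s="$s,Fill password with secret"
s="$s,Click Sign In"
# just-scrape agentic-scraper https://app.example.com/login -s "$s"
echo "$s"
```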
Login + extract dashboard
just-scrape agentic-scraper https://app.example.com/login \
  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
  --ai-extraction -p "Extract all dashboard metrics"
Multi-step form
just-scrape agentic-scraper https://example.com/wizard \
  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"
Persistent session across runs
just-scrape agentic-scraper https://app.example.com \
  -s "Click Settings" --use-session

Generate Schema
Generate a JSON schema from a natural language description.
```bash
just-scrape generate-schema <prompt>
just-scrape generate-schema <prompt> --existing-schema <json>
```

```bash
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"
```
Refine an existing schema
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'

Sitemap
Get all URLs from a website's sitemap.
```bash
just-scrape sitemap <url>
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
```
History
Browse request history. Interactive by default (arrow keys to navigate, select to view details).
```bash
just-scrape history <service>                   # interactive browser
just-scrape history <service> <request-id>      # specific request
just-scrape history <service> --page <n>
just-scrape history <service> --page-size <n>   # max 100
just-scrape history <service> --json
```

Services: markdownify, smartscraper, searchscraper, scrape, crawl, agentic-scraper, sitemap

```bash
just-scrape history smartscraper
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'
```
Credits & Validate
```bash
just-scrape credits
just-scrape credits --json | jq '.remaining_credits'
just-scrape validate
```

Common Patterns
Generate schema then scrape with it
```bash
just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"
```

Pipe JSON for scripting
```bash
just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while read -r url; do
  just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl
done
```

Protected sites
JS-heavy SPA behind Cloudflare
just-scrape smart-scraper https://protected.example.com -p "Extract data" --stealth
With custom cookies/headers
just-scrape smart-scraper https://example.com -p "Extract data" \
  --cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'

Credit Costs
| Feature | Extra Credits |
|---|---|
| --stealth | +4 per request |
| --branding | +2 |
| search-scraper (AI extraction) | 10 per request |
| search-scraper --no-extraction | 2 per request |
| crawl --no-extraction | 2 per page |
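As a worked example using the numbers above: a 10-page markdown-only crawl plus one search-scraper request with extraction and --stealth would cost roughly:

```bash
# crawl --no-extraction: 2 credits per page, 10 pages
crawl_cost=$((10 * 2))
# search-scraper with extraction (10 per request) plus --stealth (+4)
search_cost=$((10 + 4))
total=$((crawl_cost + search_cost))
echo "estimated total: $total credits"   # 34
```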
Environment Variables
```bash
SGAI_API_KEY=sgai-...       # API key
JUST_SCRAPE_TIMEOUT_S=300   # Request timeout in seconds (default 120)
JUST_SCRAPE_DEBUG=1         # Debug logging to stderr
```
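In a wrapper script you may want the documented default made explicit: fall back to 120 seconds when the variable is unset. A minimal POSIX-shell sketch:

```bash
# Apply the documented 120 s default when JUST_SCRAPE_TIMEOUT_S is not set.
JUST_SCRAPE_TIMEOUT_S="${JUST_SCRAPE_TIMEOUT_S:-120}"
export JUST_SCRAPE_TIMEOUT_S
echo "timeout: ${JUST_SCRAPE_TIMEOUT_S}s"
```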