firecrawl-monitor

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

firecrawl monitor

firecrawl 监控工具

Detect when content on a website changes and get notified by webhook or email. Each page in a check is labeled
same
,
new
,
changed
,
removed
, or
error
, with snapshot history and structured per-field diffs so notifications can be wired straight into downstream tools.
检测网站内容何时发生变化,并通过webhook或邮件接收通知。每次检查中的每个页面都会标记为
same
(无变化)、
new
(新增)、
changed
(已变更)、
removed
(已移除)或
error
(错误),同时提供快照历史和结构化字段级差异,可直接将通知接入下游工具。

When to use

使用场景

  • The user wants to know when something changes — and be notified about it — not just read what the page says right now
  • Ongoing change detection on any URL: pricing, docs, changelogs, blogs, job boards, status pages, competitor sites, regulatory pages, product availability, hiring pages, top-N rankings (HN, leaderboards, etc.)
  • "Alert me when...", "notify me when...", "email me if...", "send a webhook when...", "ping me if X changes", "track this page"
  • Anywhere the user would otherwise wire up cron + a scraper + a diff library + SMTP themselves
  • Step 5 in the workflow escalation pattern: search → scrape → map → crawl → monitor → interact
Bias toward
monitor
whenever the request implies notifications or recurrence. A single page read once =
scrape
. A single page where the user wants to be told when it changes =
monitor --page <url> --goal "..." --email|--webhook-url ...
.
  • 用户想要知道某事何时发生变化,并希望得到通知,而不只是查看页面当前的内容
  • 对任意URL进行持续变更检测:定价、文档、变更日志、博客、招聘网站、状态页面、竞品网站、监管页面、产品库存、招聘页面、Top-N排名(如HN、排行榜等)
  • “提醒我当……”“通知我当……”“如果……发邮件给我”“当……时发送webhook”“如果X变化请ping我”“跟踪这个页面”
  • 任何用户原本需要自行搭建cron+爬虫+差异库+SMTP的场景
  • 工作流升级模式的第5步:搜索→爬取→映射→抓取→监控→交互
当请求涉及通知或定期检查时,优先推荐
monitor
。一次性读取单个页面 =
scrape
。用户希望在单个页面变化时收到通知 =
monitor --page <url> --goal "..." --email|--webhook-url ...

Why use a monitor

为何使用监控工具

  • Change-detection-as-a-service. Firecrawl handles fetching, diffing, judging, and notifying — all server-side. No cron, no diff library, no SMTP setup, no snapshot DB to manage.
  • Notifications first. Webhooks (
    monitor.page
    as each page finishes,
    monitor.check.completed
    after the check is reconciled) and email summaries that only fire when something actually changed or errored. External recipients confirm via per-recipient opt-in.
  • AI noise filter via
    --goal
    .
    Set a plain-language goal and the change judge ignores formatting, whitespace, casing, punctuation, encoding, request/session IDs, cache busters, tracking params, generic metadata, and unrelated page chrome — so notifications are about content the user actually cares about, not page churn.
  • Structured per-field diffs. JSON-mode change tracking returns keyed diffs like
    plans[0].price: "$19/mo" → "$24/mo"
    instead of a wall of unified diff. Drops straight into a Slack message, CI step, or internal tool.
  • Simple page-status model. Each page in a check returns
    same
    ,
    new
    ,
    changed
    ,
    removed
    , or
    error
    . Easy to filter, easy to act on.
  • Snapshot history without infra. Point-in-time snapshots are kept for diffing via
    --retention-days
    ; no storage to provision.
  • Watch many things at once. One monitor can watch many pages or diff every page discovered by a recurring site crawl.
  • No scheduling glue. Cron normalization and
    nextRunAt
    are computed for you, with natural-language schedules supported (
    "every 30 minutes"
    ,
    "hourly"
    ,
    "daily at 9:00"
    ).
  • 变更检测即服务:Firecrawl 负责处理抓取、差异对比、判断和通知——全部在服务端完成。无需cron任务、无需差异库、无需SMTP配置、无需管理快照数据库。
  • 优先支持通知:Webhook(每个页面完成时触发
    monitor.page
    ,检查完成后触发
    monitor.check.completed
    )和邮件摘要仅在内容实际变化或出现错误时触发。外部收件人需通过每位收件人的确认链接选择接收通知。
  • 通过
    --goal
    实现AI降噪
    :设置自然语言目标后,变更判断器会忽略格式、空白字符、大小写、标点、编码、请求/会话ID、缓存清除参数、跟踪参数、通用元数据和无关页面框架——确保通知仅针对用户真正关心的内容,而非页面的无关变动。
  • 结构化字段级差异:JSON模式的变更跟踪会返回类似
    plans[0].price: "$19/mo" → "$24/mo"
    的键值对差异,而非大片的统一差异文本。可直接集成到Slack消息、CI步骤或内部工具中。
  • 简洁的页面状态模型:每次检查中的每个页面都会返回
    same
    new
    changed
    removed
    error
    状态。便于筛选和处理。
  • 无需基础设施的快照历史:通过
    --retention-days
    设置保留用于差异对比的时间点快照;无需配置存储。
  • 同时监控多个对象:一个监控任务可以监控多个页面,或者对定期网站抓取发现的每个页面进行差异对比。
  • 无需调度胶水代码:自动计算Cron规范化配置和
    nextRunAt
    时间,支持自然语言调度(如
    "every 30 minutes"
    "hourly"
    "daily at 9:00"
    )。

Quick start

快速开始

bash
undefined
bash
undefined

Single page, natural-language schedule, email alert

单页面、自然语言调度、邮件提醒

firecrawl monitor create --name "Blog" --schedule "every 30 minutes"
--goal "Alert when a new blog post is published."
--page https://example.com/blog
--email alerts@example.com
firecrawl monitor create --name "Blog" --schedule "every 30 minutes"
--goal "Alert when a new blog post is published."
--page https://example.com/blog
--email alerts@example.com

Multiple pages, one monitor

多页面、单个监控任务

firecrawl monitor create --name "Product pages" --schedule "every 30 minutes"
--goal "Alert when pricing, docs, or changelog content changes."
--scrape-urls https://example.com/pricing,https://example.com/docs,https://example.com/changelog
firecrawl monitor create --name "Product pages" --schedule "every 30 minutes"
--goal "Alert when pricing, docs, or changelog content changes."
--scrape-urls https://example.com/pricing,https://example.com/docs,https://example.com/changelog

Whole-site crawl per check (every discovered page is diffed)

每次检查抓取整个站点(发现的每个页面都会进行差异对比)

firecrawl monitor create --name "Docs site" --schedule "hourly"
--goal "Alert when any docs page is added, removed, or substantively changed."
--crawl-url https://docs.example.com
firecrawl monitor create --name "Docs site" --schedule "hourly"
--goal "Alert when any docs page is added, removed, or substantively changed."
--crawl-url https://docs.example.com

Webhook notifications

Webhook通知

firecrawl monitor create --name "Docs webhook" --schedule "every 30 minutes"
--goal "Alert when docs content changes."
--page https://example.com/docs
--webhook-url https://example.com/hook
--webhook-events monitor.page,monitor.check.completed
firecrawl monitor create --name "Docs webhook" --schedule "every 30 minutes"
--goal "Alert when docs content changes."
--page https://example.com/docs
--webhook-url https://example.com/hook
--webhook-events monitor.page,monitor.check.completed

Manage and inspect

管理和查看

firecrawl monitor list --limit 20 firecrawl monitor get <monitorId> firecrawl monitor run <monitorId> # trigger a check now firecrawl monitor checks <monitorId> # list all checks firecrawl monitor check <monitorId> <checkId> --page-status changed firecrawl monitor update <monitorId> --state paused firecrawl monitor delete <monitorId>

Subcommands: `create | list | get | update | delete | run | checks | check`.
firecrawl monitor list --limit 20 firecrawl monitor get <monitorId> firecrawl monitor run <monitorId> # 立即触发一次检查 firecrawl monitor checks <monitorId> # 列出所有检查记录 firecrawl monitor check <monitorId> <checkId> --page-status changed firecrawl monitor update <monitorId> --state paused firecrawl monitor delete <monitorId>

子命令:`create | list | get | update | delete | run | checks | check`。

Options

选项

OptionDescription
--name <name>
Monitor name (required on create)
--goal <text>
Plain-language change goal (auto-enables the AI change judge)
--schedule <text>
Natural-language schedule (
every 30 minutes
,
hourly
,
daily
)
--cron <expression>
Cron schedule (e.g.
*/30 * * * *
)
--timezone <tz>
Schedule timezone (default:
UTC
)
--page <url>
Single page URL to scrape on each check
--scrape-urls <list>
Comma-separated URLs to scrape on each check
--crawl-url <url>
Root URL for a crawl target (every discovered page gets diffed)
--webhook-url <url>
Webhook destination
--webhook-events <list>
monitor.page
,
monitor.check.completed
(comma-separated)
--email <list>
Comma-separated email recipients
--retention-days <n>
Snapshot retention window
--state <state>
active
or
paused
(update only — use
--state
, not
--status
)
--page-status <state>
Filter
check
results:
same
,
new
,
changed
,
removed
,
error
-o, --output <path>
Output file path
--pretty
Pretty-print JSON output
Minimum schedule interval is 15 minutes. Monitoring is not available for zero-data-retention teams.
选项说明
--name <name>
监控任务名称(创建时必填)
--goal <text>
自然语言描述的变更目标(自动启用AI变更判断器)
--schedule <text>
自然语言调度规则(
every 30 minutes
hourly
daily
--cron <expression>
Cron调度表达式(例如
*/30 * * * *
--timezone <tz>
调度时区(默认:
UTC
--page <url>
每次检查时要爬取的单个页面URL
--scrape-urls <list>
每次检查时要爬取的URL列表(逗号分隔)
--crawl-url <url>
抓取目标的根URL(发现的每个页面都会进行差异对比)
--webhook-url <url>
Webhook目标地址
--webhook-events <list>
触发Webhook的事件:
monitor.page
monitor.check.completed
(逗号分隔)
--email <list>
收件人邮箱列表(逗号分隔)
--retention-days <n>
快照保留天数
--state <state>
任务状态:
active
(活跃)或
paused
(暂停)(仅用于更新操作——使用
--state
,而非
--status
--page-status <state>
筛选检查结果的页面状态:
same
new
changed
removed
error
-o, --output <path>
输出文件路径
--pretty
格式化输出JSON内容
最小调度间隔为15分钟零数据保留团队无法使用监控功能

Writing a good
--goal

编写有效的
--goal

The goal is what the AI change judge uses to decide whether a page is
changed
vs
same
. Convert the user's intent into a concise 2-3 sentence goal:
  • Start with
    Alert when ...
    and state the trigger using the user's wording.
  • Restate any scope they mentioned: top N, price, role type, region, company, topic, status, or a specific entity.
  • Add an
    Ignore ...
    sentence only for intent-specific exclusions (e.g. points/comments for rankings, marketing copy for pricing, general company-page updates for job listings).
  • Do not repeat generic noise exclusions — the judge already handles whitespace, casing, punctuation, encoding, formatting-only changes, request/session IDs, cache busters, tracking params, generic metadata noise, and unrelated page chrome.
  • Don't invent page-specific sections, entities, thresholds, exclusions, or business rules unless the user mentioned them.
  • If the user is vague or asks for "any change", keep the goal broad and don't add exclusions.
User saysGood goal
top 10 hackernews stories
Alert when stories enter, leave, or change rank within the Hacker News top 10. Ignore points, comments, and timestamps. Do not alert on changes outside the top 10.
pricing changes
Alert when pricing information changes, including prices, plan names, billing periods, tiers, limits, or included features. Ignore unrelated marketing copy.
new engineering roles
Alert when a new engineering role is posted. Ignore general company-page updates unless they add, remove, or change an engineering role.
track this page
Alert when substantive visible content on this page changes.
any change
Alert when any visible page content changes, including copy, numbers, timestamps, counters, links, and layout text.
目标是AI变更判断器用于区分页面是
changed
还是
same
的依据。将用户的意图转化为简洁的2-3句话目标:
  • Alert when ...
    开头,使用用户的表述说明触发条件。
  • 重述用户提到的任何范围:Top N、价格、职位类型、地区、公司、主题、状态或特定实体。
  • 仅在需要针对特定意图排除内容时添加
    Ignore ...
    语句(例如,排名中的点赞数/评论数、定价页面中的无关营销文案、招聘列表中的通用公司页面更新)。
  • 不要重复通用的排除规则——判断器已自动处理空白字符、大小写、标点、编码、仅格式变化、请求/会话ID、缓存清除参数、跟踪参数、通用元数据噪声和无关页面框架。
  • 除非用户提及,否则不要自行添加页面特定的部分、实体、阈值、排除规则或业务逻辑。
  • 如果用户表述模糊或要求“任何变化”,则保持目标宽泛,不要添加排除规则。
用户表述有效目标
top 10 hackernews stories
Alert when stories enter, leave, or change rank within the Hacker News top 10. Ignore points, comments, and timestamps. Do not alert on changes outside the top 10.
pricing changes
Alert when pricing information changes, including prices, plan names, billing periods, tiers, limits, or included features. Ignore unrelated marketing copy.
new engineering roles
Alert when a new engineering role is posted. Ignore general company-page updates unless they add, remove, or change an engineering role.
track this page
Alert when substantive visible content on this page changes.
any change
Alert when any visible page content changes, including copy, numbers, timestamps, counters, links, and layout text.

JSON-mode change tracking (structured per-field diffs)

JSON模式变更跟踪(结构化字段级差异)

By default monitors diff each page's markdown and return a unified text diff. When the user cares about specific structured fields (price, headline, in-stock flag, items in a list), use JSON-mode change tracking. The CLI flags don't cover this — pass a JSON body via positional file or piped stdin:
bash
cat > pricing-monitor.json <<'EOF'
{
  "name": "Pricing watch",
  "goal": "Alert when plan prices or headline features change.",
  "schedule": { "text": "hourly", "timezone": "UTC" },
  "targets": [{
    "type": "scrape",
    "urls": ["https://example.com/pricing"],
    "scrapeOptions": {
      "formats": [{
        "type": "changeTracking",
        "modes": ["json"],
        "prompt": "Extract pricing tiers and headline features for each plan.",
        "schema": {
          "type": "object",
          "properties": {
            "plans": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name":     { "type": "string" },
                  "price":    { "type": "string" },
                  "features": { "type": "array", "items": { "type": "string" } }
                }
              }
            }
          }
        }
      }]
    }
  }]
}
EOF
firecrawl monitor create pricing-monitor.json
默认情况下,监控工具会对比每个页面的markdown内容并返回统一文本差异。当用户关注特定结构化字段(价格、标题、库存状态、列表项)时,使用JSON模式变更跟踪。CLI标志不支持此功能——需通过位置文件或管道输入的标准输入传递JSON体:
bash
cat > pricing-monitor.json <<'EOF'
{
  "name": "Pricing watch",
  "goal": "Alert when plan prices or headline features change.",
  "schedule": { "text": "hourly", "timezone": "UTC" },
  "targets": [{
    "type": "scrape",
    "urls": ["https://example.com/pricing"],
    "scrapeOptions": {
      "formats": [{
        "type": "changeTracking",
        "modes": ["json"],
        "prompt": "Extract pricing tiers and headline features for each plan.",
        "schema": {
          "type": "object",
          "properties": {
            "plans": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name":     { "type": "string" },
                  "price":    { "type": "string" },
                  "features": { "type": "array", "items": { "type": "string" } }
                }
              }
            }
          }
        }
      }]
    }
  }]
}
EOF
firecrawl monitor create pricing-monitor.json

or: cat pricing-monitor.json | firecrawl monitor create

or: cat pricing-monitor.json | firecrawl monitor create


Each changed page in the check response then carries a per-field diff plus a snapshot of the current full extraction:

```json
{
  "url": "https://example.com/pricing",
  "status": "changed",
  "diff": {
    "json": {
      "plans[0].price": { "previous": "$19/mo", "current": "$24/mo" },
      "plans[1].features[2]": {
        "previous": "10 GB storage",
        "current": "25 GB storage"
      }
    }
  },
  "snapshot": {
    "json": {
      "plans": [
        /* current full extraction */
      ]
    }
  }
}
Use
modes: ["json", "git-diff"]
for mixed mode — you get both
diff.json
(per-field) and
diff.text
(markdown sidecar), and the page is marked
changed
whenever either surface changed.

检查响应中每个已变更的页面会携带字段级差异以及当前完整提取内容的快照:

```json
{
  "url": "https://example.com/pricing",
  "status": "changed",
  "diff": {
    "json": {
      "plans[0].price": { "previous": "$19/mo", "current": "$24/mo" },
      "plans[1].features[2]": {
        "previous": "10 GB storage",
        "current": "25 GB storage"
      }
    }
  },
  "snapshot": {
    "json": {
      "plans": [
        /* current full extraction */
      ]
    }
  }
}
使用
modes: ["json", "git-diff"]
启用混合模式——你会同时获得
diff.json
(字段级)和
diff.text
(markdown辅助差异),只要其中任意一种检测到变化,页面就会标记为
changed

Tips

小贴士

  • Prefer one monitor over repeated one-off scrapes whenever the user wants the same URL checked more than once.
  • Use
    --state paused
    (via
    update
    ), not
    delete
    , when temporarily silencing a monitor.
  • --retention-days
    controls how long snapshots are kept for diffing. Lower it for high-frequency monitors to save storage.
  • External email recipients must opt in. First time they're added, Firecrawl sends a confirmation email and they only receive alerts after they confirm. Team-owned email addresses are auto-confirmed. Once a recipient unsubscribes, they must be re-added by the owner to get a fresh confirmation email.
  • firecrawl monitor run <id>
    triggers a check immediately — useful for smoke-testing a monitor right after creating it without waiting for the next scheduled run.
  • Filter check pages with
    --page-status changed
    (or
    new
    ,
    removed
    ,
    error
    ) to skip the noise from
    same
    pages.
  • Use
    --page-status
    (not
    --status
    )
    when filtering check pages —
    --status
    is reserved for the global CLI status flag.
  • Monitor-triggered scrapes default
    maxAge
    to
    0
    — every check performs a fresh scrape unless
    scrapeOptions.maxAge
    is set explicitly in a JSON payload.
  • 当用户需要多次检查同一URL时,优先使用一个监控任务而非重复的一次性爬取
  • 临时暂停监控任务时,使用
    update
    命令的
    --state paused
    参数,而非
    delete
    命令
  • **
    --retention-days
    **控制用于差异对比的快照保留时长。对于高频监控任务,降低此值以节省存储空间。
  • 外部邮件收件人必须选择接收。首次添加时,Firecrawl会发送确认邮件,收件人确认后才会收到提醒。团队自有邮箱地址会自动确认。一旦收件人取消订阅,必须由所有者重新添加才能获取新的确认邮件。
  • **
    firecrawl monitor run <id>
    **会立即触发一次检查——便于创建监控任务后立即进行冒烟测试,无需等待下一次调度运行。
  • 使用
    --page-status changed
    (或
    new
    removed
    error
    )筛选检查页面
    ,以跳过
    same
    页面的无关信息。
  • 筛选检查页面时使用
    --page-status
    (而非
    --status
    ——
    --status
    是全局CLI状态标志的保留参数。
  • 监控触发的爬取默认
    maxAge
    0
    ——除非在JSON payload中明确设置
    scrapeOptions.maxAge
    ,否则每次检查都会执行全新的爬取。

See also

另请参阅

  • firecrawl-scrape — one-off scrape; escalate to
    monitor
    when checks become recurring
  • firecrawl-crawl — one-off crawl; pair with
    --crawl-url
    here for recurring crawl diffs
  • firecrawl-cli — top-level workflow guide
  • firecrawl-scrape — 一次性爬取;当检查变为定期任务时升级为
    monitor
  • firecrawl-crawl — 一次性抓取;搭配此处的
    --crawl-url
    实现定期抓取差异对比
  • firecrawl-cli — 顶级工作流指南