tuning-incremental-sync-config

Tuning incremental sync config


A sync's configuration lives on the ExternalDataSchema and can be changed at any time via external-data-schemas-partial-update. Most changes are non-destructive and take effect on the next sync, but a few (switching sync_type, changing primary keys) require careful handling to avoid corrupting the synced data.

When to use this skill


  • The user wants to change how an already-connected table is synced
  • A diagnosis flagged the incremental field or primary key as wrong
  • The table is syncing too often / not often enough
  • Switching an incremental table to CDC (or vice versa)
  • The source table was changed on the other side (new columns, dropped columns) and the sync config needs to catch up
If the user is setting up a brand-new source, use setting-up-a-data-warehouse-source instead; configuration is chosen at creation time there.

Available tools


| Tool | Purpose |
| --- | --- |
| external-data-schemas-retrieve | Current sync_type, incremental_field, PKs, sync_frequency |
| external-data-schemas-incremental-fields-create | Refresh candidate incremental fields from the live source |
| external-data-schemas-partial-update | Apply the config change |
| external-data-schemas-reload | Trigger a sync with the new config |
| external-data-schemas-resync | Wipe and re-import from scratch when the change invalidates existing data |
| external-data-schemas-delete-data | Drop the synced table while keeping the schema entry |
| external-data-sources-check-cdc-prerequisites-create | Pre-flight Postgres CDC (only when switching to/from CDC) |
| external-data-sources-webhook-info-retrieve | Current webhook state (when switching to/from sync_type=webhook) |
| external-data-sources-create-webhook-create | Register a webhook after switching a schema to sync_type=webhook |
| external-data-sources-update-webhook-inputs-create | Rotate a webhook signing secret |
| external-data-sources-delete-webhook-create | Unregister webhook when switching schemas off sync_type=webhook |

The fields you can tune


From the partial-update endpoint:
| Field | Values | Notes |
| --- | --- | --- |
| sync_type | full_refresh, incremental, append, cdc, webhook | Source must support the target type; check via incremental-fields |
| incremental_field | Column name from the source | Must appear in the incremental_fields list for the schema |
| incremental_field_type | datetime, date, timestamp, integer, numeric, objectid | Must match the column's real type |
| primary_key_columns | Array of column names | Required for CDC. Used for upsert dedup on incremental |
| cdc_table_mode | consolidated, cdc_only, both | Only meaningful when sync_type=cdc |
| sync_frequency | 1min, 5min, 15min, 30min, 1hour, 6hour, 12hour, 24hour, 7day, 30day, never | Applies to all non-CDC types |
| sync_time_of_day | HH:MM:SS | When sync_frequency is daily/weekly-scale |
| should_sync | true / false | Pause the schema without deleting it |

Workflow


Step 1 — Read the current config


Always start with external-data-schemas-retrieve({id}). Understanding the current state prevents mistakes like "fixing" an incremental_field that's actually correct.
Note:
  • Current sync_type, incremental_field, incremental_field_type, primary_key_columns
  • Current status (don't tune a schema that's currently Running; wait or cancel first)
  • last_synced_at (so you can tell if the next sync worked)
  • latest_error if present (the error often tells you exactly what to change)
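The checklist above can be captured as a small helper that runs on the retrieved schema. This is a sketch under assumptions: it treats the retrieve response as a plain dict keyed by the field names listed above, and the name preflight and the sample values are hypothetical.

```python
CONFIG_KEYS = (
    "sync_type",
    "incremental_field",
    "incremental_field_type",
    "primary_key_columns",
)


def preflight(schema: dict) -> dict:
    """Summarize what to note from a retrieved schema before tuning it."""
    return {
        # The current config, so a correct value is not "fixed" by mistake.
        "config": {key: schema.get(key) for key in CONFIG_KEYS},
        # A schema that is currently Running should not be tuned.
        "safe_to_tune": schema.get("status") != "Running",
        # Baseline for telling later whether the next sync worked.
        "last_synced_at": schema.get("last_synced_at"),
        # Often says exactly what needs to change.
        "latest_error": schema.get("latest_error"),
    }


# Made-up sample data, for illustration only.
sample = {
    "sync_type": "incremental",
    "incremental_field": "updated_at",
    "incremental_field_type": "datetime",
    "primary_key_columns": ["id"],
    "status": "Running",
    "last_synced_at": "2024-01-01T00:00:00Z",
}
summary = preflight(sample)
```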

Step 2 — If changing sync_type or incremental_field, refresh candidates


Call external-data-schemas-incremental-fields-create({id}). Even though the operation name says "create", it re-reads the source and returns the current candidate fields. Use it to confirm that the field you want to set actually exists on the source and to see which sync types are now available for this table.
The response:
text
{
  "incremental_fields": [{"field": "updated_at", "type": "datetime", ...}, ...],
  "incremental_available": true,
  "append_available": true,
  "cdc_available": true,
  "full_refresh_available": true,
  "detected_primary_keys": ["id"],
  "available_columns": [...]
}
If your target incremental_field isn't in the list, tell the user: they need to either pick a different field or change the source table to add one.
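Checking a target field against the response can be sketched as follows. The response shape is the one shown above; the function name check_incremental_candidate and its return convention are hypothetical.

```python
def check_incremental_candidate(response: dict, field: str) -> tuple[bool, str]:
    """Return (True, field_type) if `field` is a candidate, else (False, reason)."""
    if not response.get("incremental_available"):
        return False, "incremental sync is not available for this table"
    for candidate in response.get("incremental_fields", []):
        if candidate.get("field") == field:
            # The reported type is what incremental_field_type should be set to.
            return True, candidate.get("type")
    return False, f"{field!r} is not in incremental_fields"


# Trimmed-down version of the response shown above.
response = {
    "incremental_fields": [{"field": "updated_at", "type": "datetime"}],
    "incremental_available": True,
}
ok, detail = check_incremental_candidate(response, "updated_at")
```

When the first element is False, the second is a message that can be relayed to the user.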

Step 3 — Apply the change


Call external-data-schemas-partial-update({id}, {...changed fields}).
Only send the fields that are actually changing. Partial update means unspecified fields stay as they are.
Examples:
json
// Switch from full_refresh to incremental
{
  "sync_type": "incremental",
  "incremental_field": "updated_at",
  "incremental_field_type": "datetime"
}

// Change sync frequency to hourly
{"sync_frequency": "1hour"}

// Fix wrong PK on a CDC table
{"primary_key_columns": ["tenant_id", "order_id"]}

// Pause a schema
{"should_sync": false}
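One way to make sure only changing fields are sent is to diff the desired config against the retrieved one. A minimal sketch, assuming both are plain dicts; changed_fields is a hypothetical helper, not part of the API.

```python
_MISSING = object()  # distinguishes "key absent" from "key set to None"


def changed_fields(current: dict, desired: dict) -> dict:
    """Keep only the entries of `desired` that differ from `current`."""
    return {
        key: value
        for key, value in desired.items()
        if current.get(key, _MISSING) != value
    }


current = {"sync_type": "full_refresh", "sync_frequency": "24hour"}
desired = {
    "sync_type": "incremental",
    "incremental_field": "updated_at",
    "incremental_field_type": "datetime",
    "sync_frequency": "24hour",  # unchanged, so it is dropped from the payload
}
payload = changed_fields(current, desired)
```

An empty result means there is nothing to send, which is a useful signal that the config was already what the user asked for.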

Step 4 — Decide whether existing data is still valid


This is the step that's easy to get wrong. Some config changes invalidate the synced data; others don't.
Changes that DON'T invalidate existing data:
  • sync_frequency, sync_time_of_day: scheduling only
  • should_sync: on/off
  • cdc_table_mode in most cases: the next sync will start writing to the new shape, but historical consolidated rows stay valid
  • Switching between incremental and full_refresh with the same incremental_field: the next sync just re-runs fresh
  • Switching to or from sync_type: "webhook" (the synced data stays valid; only the ingestion path changes). Remember to register or unregister the webhook (see sections below) alongside the sync_type change.
Changes that MAY invalidate existing data and need a resync:
  • Changing incremental_field to a different column: the high-water mark is from the old column and won't match. Without a resync you'll miss rows that were updated between the two fields' histories.
  • Changing primary_key_columns: existing rows may be deduplicated incorrectly against new PK definitions.
  • Switching from full_refresh to append: the existing rows don't have the version-history shape that append expects.
  • Switching from append to full_refresh: the opposite problem; you'll end up with duplicate historical versions.
  • Switching to/from cdc: the table shape changes fundamentally.
When the change invalidates data, the clean flow is:
  1. external-data-schemas-partial-update with the new config
  2. Warn the user this is destructive
  3. external-data-schemas-resync to wipe and re-import under the new config
Or equivalently, external-data-schemas-delete-data followed by external-data-schemas-reload. delete-data + reload is cleaner when the table is large and the user wants to start from zero.
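The two lists above can be encoded as a rough decision helper. This is only a sketch: resync_reasons is a hypothetical name, it covers just the cases the lists name, and any combination they don't mention (for example incremental to append) is treated here as not needing a resync, which is an assumption that deserves a manual check.

```python
def resync_reasons(current: dict, changes: dict) -> list[str]:
    """List the reasons a config change may invalidate the existing data."""
    reasons = []
    old_type = current.get("sync_type")
    new_type = changes.get("sync_type", old_type)
    if new_type != old_type:
        if "cdc" in (old_type, new_type):
            reasons.append("switching to/from cdc changes the table shape")
        elif {old_type, new_type} == {"full_refresh", "append"}:
            reasons.append("full_refresh and append store rows in different shapes")
    old_field = current.get("incremental_field")
    new_field = changes.get("incremental_field", old_field)
    # A field set for the first time has no old high-water mark to conflict with.
    if old_field is not None and new_field != old_field:
        reasons.append("incremental_field changed; the old high-water mark won't match")
    old_keys = current.get("primary_key_columns")
    new_keys = changes.get("primary_key_columns", old_keys)
    if new_keys != old_keys:
        reasons.append("primary_key_columns changed; dedup keys differ")
    return reasons


current = {
    "sync_type": "incremental",
    "incremental_field": "updated_at",
    "primary_key_columns": ["id"],
}
```

An empty list means none of the listed rules matched, so a plain reload should be enough; a non-empty list is what to show the user before a resync.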

Step 5 — Trigger and confirm


For non-destructive changes, call external-data-schemas-reload({id}) to pick up the new config immediately rather than waiting for the schedule.
Wait a moment, then call external-data-schemas-retrieve({id}) to confirm that status moves to Running and then Completed. Report last_synced_at and any new latest_error.
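The confirm step can be sketched as a polling loop. Several things here are assumptions: retrieve stands in for a call to external-data-schemas-retrieve and is injected by the caller, the timeout and poll interval are arbitrary, and any status other than Running or Completed is treated as terminal because the full list of status values is not given in this document.

```python
import time
from typing import Callable


def wait_for_sync(
    retrieve: Callable[[], dict],
    previous_last_synced_at,
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a sync newer than `previous_last_synced_at` completes."""
    deadline = time.monotonic() + timeout_s
    while True:
        schema = retrieve()
        status = schema.get("status")
        # Completed with a new timestamp: the triggered sync has finished.
        if status == "Completed" and schema.get("last_synced_at") != previous_last_synced_at:
            return schema
        # Anything else that is not Running/Completed is assumed to be terminal.
        if status not in ("Running", "Completed"):
            return schema
        if time.monotonic() >= deadline:
            raise TimeoutError("sync did not finish before the timeout")
        sleep(poll_s)
```

Comparing last_synced_at against the value read in Step 1 avoids mistaking the previous Completed state for the result of the new run. The caller still needs to inspect latest_error on the returned schema.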
Specific common changes


Switching full_refresh → incremental


  1. incremental-fields-create to confirm the desired field exists and incremental_available: true.
  2. partial-update: {sync_type: "incremental", incremental_field, incremental_field_type}.
  3. No data wipe needed: the next sync just switches strategy. If the source is growing fast, the next incremental sync is the cheap one.

Switching incremental → cdc (Postgres only)


  1. Run external-data-sources-check-cdc-prerequisites-create on the parent source. Only proceed if valid: true.
  2. incremental-fields-create to confirm cdc_available: true and see detected_primary_keys.
  3. partial-update: {sync_type: "cdc", primary_key_columns: [...], cdc_table_mode: "consolidated"}.
  4. Resync required: CDC tables have a different shape. Trigger external-data-schemas-resync after the update. Warn the user this wipes existing data.
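The first three steps can be folded into a helper that refuses to build the payload unless both checks pass. This is a sketch, assuming the two responses are plain dicts with the keys named above; build_cdc_payload is a hypothetical name, and falling back to detected_primary_keys when no keys are passed is a convenience choice, not documented behavior.

```python
def build_cdc_payload(prereq: dict, fields: dict, primary_keys=None) -> dict:
    """Build the partial-update payload for a switch to CDC, or raise ValueError."""
    if not prereq.get("valid"):
        raise ValueError("CDC prerequisites check did not return valid: true")
    if not fields.get("cdc_available"):
        raise ValueError("cdc_available is not true for this table")
    # Prefer keys chosen by the user; otherwise use the ones the source detected.
    keys = primary_keys or fields.get("detected_primary_keys") or []
    if not keys:
        raise ValueError("CDC requires primary_key_columns")
    return {
        "sync_type": "cdc",
        "primary_key_columns": list(keys),
        "cdc_table_mode": "consolidated",
    }


payload = build_cdc_payload(
    {"valid": True},
    {"cdc_available": True, "detected_primary_keys": ["id"]},
)
```

The payload is only half of the change: the resync in step 4, with its warning to the user, still has to follow the update.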

Fixing a stale incremental field after schema drift


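The source dropped the updated_at column, and the sync has been failing with "column does not exist".
  1. incremental-fields-create to see what fields remain.
  2. Pick a replacement (or switch to full_refresh if none are suitable).
  3. partial-update with the new field + type (or new sync_type).
  4. reload to retry. If the replacement is a different incremental_field, the old high-water mark no longer applies, so a resync may be the safer choice (see Step 4).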

Changing primary keys on a CDC table


  1. partial-update: {primary_key_columns: [...]}.
  2. Resync required: existing CDC tombstones and upsert keys won't match the new PK definition, leading to row duplication or missed updates.
  3. Warn the user, then resync.

Changing sync_frequency


  1. partial-update: {sync_frequency: "1hour"}.
  2. No reload needed: the next scheduled sync picks up the new cadence. Or reload manually if the user wants to confirm nothing broke.

Switching a schema to sync_type: "webhook"

Only works for sources that implement WebhookSource (today: Stripe) and tables where incremental-fields-create returns supports_webhooks: true.
  1. incremental-fields-create to confirm supports_webhooks: true for the table.
  2. partial-update: {sync_type: "webhook"}.
  3. If the source doesn't already have a webhook registered (check with webhook-info-retrieve), call external-data-sources-create-webhook-create({source_id}) to register it.
  4. No resync required: the schema's existing bulk-synced data stays, and the webhook becomes the primary ingestion path once the next reconciliation finishes.
  5. Keep sync_frequency set (e.g. 24hour); it acts as a safety-net reconciliation in case any webhook delivery is missed.

Switching off sync_type: "webhook"

  1. partial-update: {sync_type: "incremental"} (or whatever bulk type is appropriate) with the required incremental_field + incremental_field_type.
  2. If no other schemas on the source are still using sync_type: "webhook", call external-data-sources-delete-webhook-create({source_id}) to unregister. Leaving an orphaned webhook registered on the source side just means events will be received and dropped: not harmful, but messy.
  3. If other schemas on the source are still on webhook, leave the webhook registered; it's shared across all webhook-type schemas on the source.
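The choice between steps 2 and 3 comes down to whether any other schema on the source is still webhook-based. A minimal sketch, assuming the source's schemas are available as a list of dicts with id and sync_type keys; the helper name and the ids are hypothetical.

```python
def should_delete_webhook(schemas: list[dict], switching_schema_id: str) -> bool:
    """True if no schema other than the one being switched still uses webhook."""
    return not any(
        schema.get("sync_type") == "webhook" and schema.get("id") != switching_schema_id
        for schema in schemas
    )


# Made-up ids, for illustration only.
schemas = [
    {"id": "schema-a", "sync_type": "webhook"},
    {"id": "schema-b", "sync_type": "webhook"},
    {"id": "schema-c", "sync_type": "incremental"},
]
keep_shared = should_delete_webhook(schemas, "schema-a")
```

Excluding the schema being switched makes the check give the same answer whether it runs before or after the partial-update.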

Rotating a webhook signing secret


The source's signing secret (e.g. Stripe's whsec_...) was rotated, and payloads are now failing signature verification.
  1. Grab the new secret from the source's dashboard.
  2. external-data-sources-update-webhook-inputs-create({source_id}, {inputs: {signing_secret: "whsec_..."}}).
  3. No reload needed: the next inbound webhook payload will verify against the new secret.

Pausing a schema


  1. partial-update: {should_sync: false}. The schema stops syncing but stays configured.
  2. To resume later: partial-update: {should_sync: true}, then reload for an immediate run.

Important notes


  • Read before you write. Always retrieve the current config first. partial-update doesn't complain if you set a field to the value it already had, but you might be about to change something you didn't realize was already set.
  • Not every sync_type is available on every schema. The incremental-fields-create response tells you what's available right now, which can be different from what was available at creation (e.g. CDC may have been enabled for the team since).
  • Wipe when the shape changes. Switching sync strategy often changes the physical table. If you don't resync, you'll be mixing row shapes and queries will return garbage.
  • CDC needs prerequisites. Never switch to sync_type: "cdc" without running check-cdc-prerequisites-create first. Otherwise the sync will just fail immediately.
  • Don't touch a Running schema. If the schema is currently running, either wait for it to finish or call external-data-schemas-cancel before applying the change. Updating config mid-sync can leave the incremental high-water mark inconsistent.
  • Sync frequency is cheap to change. Encourage experimentation there. sync_type and incremental_field are expensive to change, so encourage care.
  • Webhooks are registered at the source level, not the schema level. Multiple webhook-type schemas on the same source share one webhook registration. Only delete the webhook when the last webhook-type schema on that source is being switched away, otherwise other schemas stop receiving pushes.