tuning-incremental-sync-config
Tuning incremental sync config
A sync's configuration lives on the `ExternalDataSchema` and can be changed at any time via `external-data-schemas-partial-update`. Most changes are non-destructive (they take effect on the next sync), but a few (switching `sync_type`, changing primary keys) require careful handling to avoid corrupting the synced data.

When to use this skill
- The user wants to change how an already-connected table is synced
- A diagnosis flagged the incremental field or primary key as wrong
- The table is syncing too often / not often enough
- Switching an incremental table to CDC (or vice versa)
- The source table was changed on the other side (new columns, dropped columns) and the sync config needs to catch up
If the user is setting up a brand-new source, use `setting-up-a-data-warehouse-source` instead; configuration is chosen at creation time there.

Available tools
| Tool | Purpose |
|---|---|
| `external-data-schemas-retrieve` | Current sync_type, incremental_field, PKs, sync_frequency |
| `external-data-schemas-incremental-fields-create` | Refresh candidate incremental fields from the live source |
| `external-data-schemas-partial-update` | Apply the config change |
| `external-data-schemas-reload` | Trigger a sync with the new config |
| `external-data-schemas-resync` | Wipe and re-import from scratch when the change invalidates existing data |
| `external-data-schemas-delete-data` | Drop the synced table while keeping the schema entry |
| `external-data-sources-check-cdc-prerequisites-create` | Pre-flight Postgres CDC (only when switching to/from CDC) |
| `webhook-info-retrieve` | Current webhook state (when switching to/from sync_type=webhook) |
| `external-data-sources-create-webhook-create` | Register a webhook after switching a schema to sync_type=webhook |
| `external-data-sources-update-webhook-inputs-create` | Rotate a webhook signing secret |
| `external-data-sources-delete-webhook-create` | Unregister webhook when switching schemas off sync_type=webhook |

The fields you can tune
From the partial-update endpoint:
| Field | Values | Notes |
|---|---|---|
| `sync_type` | `full_refresh`, `incremental`, `append`, `cdc`, `webhook` | Source must support the target type; check via incremental-fields |
| `incremental_field` | Column name from the source | Must appear in the schema's candidate incremental fields |
| `incremental_field_type` | e.g. `datetime` | Must match the column's real type |
| `primary_key_columns` | Array of column names | Required for CDC. Used for upsert dedup on incremental |
| `cdc_table_mode` | e.g. `consolidated` | Only meaningful when sync_type=cdc |
| `sync_frequency` | e.g. `1hour`, `24hour` | Applies to all non-CDC types |
| `sync_time_of_day` | | When sync_frequency is daily/weekly-scale |
| `should_sync` | `true` / `false` | Pause the schema without deleting it |

Workflow

Step 1: Read the current config
Always start with `external-data-schemas-retrieve({id})`. Understanding the current state prevents mistakes like "fixing" an incremental_field that's actually correct.

Note:
- Current `sync_type`, `incremental_field`, `incremental_field_type`, `primary_key_columns`
- Current `status` (don't tune a schema that's currently `Running`; wait or cancel first)
- `last_synced_at` (so you can tell if the next sync worked)
- `latest_error` if present (the error often tells you exactly what to change)
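
The Step 1 checks can be sketched as a small helper. This is only an illustration, assuming the retrieve call returns a dict with the field names listed above; `preflight_notes` and the example payload are hypothetical and not part of any API.

```python
from typing import Any


def preflight_notes(schema: dict[str, Any]) -> list[str]:
    """Collect the things worth noting before tuning a schema.

    `schema` is assumed to be the dict returned by the retrieve call.
    """
    notes: list[str] = []
    if schema.get("status") == "Running":
        notes.append("Schema is Running: wait for it to finish or cancel first.")
    if schema.get("latest_error"):
        notes.append(f"latest_error: {schema['latest_error']}")
    # Keep the old timestamp so a later retrieve can show whether a new sync ran.
    notes.append(f"baseline last_synced_at: {schema.get('last_synced_at')}")
    return notes


# Hypothetical example payload; the values are made up for illustration.
example_schema = {
    "sync_type": "incremental",
    "incremental_field": "updated_at",
    "status": "Running",
    "last_synced_at": "2024-01-01T00:00:00Z",
    "latest_error": None,
}
notes = preflight_notes(example_schema)
```

Recording `last_synced_at` up front gives a baseline to compare against after the change is applied.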

Step 2: If changing sync_type or incremental_field, refresh candidates
Call `external-data-schemas-incremental-fields-create({id})`. Even though the operation name says "create", it re-reads the source and returns the current candidate fields. Use it to confirm that the field you want to set actually exists on the source and to see which sync types are now available for this table.

The response:

```text
{
  "incremental_fields": [{"field": "updated_at", "type": "datetime", ...}, ...],
  "incremental_available": true,
  "append_available": true,
  "cdc_available": true,
  "full_refresh_available": true,
  "detected_primary_keys": ["id"],
  "available_columns": [...]
}
```

If your target `incremental_field` isn't in the list, tell the user; they need to either pick a different field or change the source table to add one.
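
The check against the response can be sketched as follows. This is a rough illustration that assumes the response is a dict shaped like the sample above; `check_target` is a hypothetical helper. It only covers the four `<sync_type>_available` flags shown in the sample (webhook availability is reported through `supports_webhooks` instead).

```python
from typing import Any, Optional


def check_target(response: dict[str, Any], sync_type: str,
                 field: Optional[str] = None) -> list[str]:
    """Return a list of problems; an empty list means the target looks valid."""
    problems: list[str] = []
    # The sample response exposes one "<sync_type>_available" flag per bulk type.
    if not response.get(f"{sync_type}_available", False):
        problems.append(f"sync_type '{sync_type}' is not available for this table")
    if field is not None:
        candidates = {c["field"] for c in response.get("incremental_fields", [])}
        if field not in candidates:
            problems.append(f"'{field}' is not a candidate incremental field")
    return problems


# Hypothetical response, trimmed to the keys this sketch reads.
sample = {
    "incremental_fields": [{"field": "updated_at", "type": "datetime"}],
    "incremental_available": True,
    "append_available": True,
    "cdc_available": False,
    "full_refresh_available": True,
    "detected_primary_keys": ["id"],
}
ok = check_target(sample, "incremental", "updated_at")
bad = check_target(sample, "cdc", "modified_at")
```

An empty list means it is reasonable to proceed; any entries are what to report back to the user.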

Step 3: Apply the change
Call `external-data-schemas-partial-update({id}, {...changed fields})`.

Only send the fields that are actually changing. Partial update means unspecified fields stay as they are.

Examples:

```json
// Switch from full_refresh to incremental
{
  "sync_type": "incremental",
  "incremental_field": "updated_at",
  "incremental_field_type": "datetime"
}
// Change sync frequency to hourly
{"sync_frequency": "1hour"}
// Fix wrong PK on a CDC table
{"primary_key_columns": ["tenant_id", "order_id"]}
// Pause a schema
{"should_sync": false}
```
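
One way to make sure only changed fields are sent is to diff the desired config against the one retrieved in Step 1. A minimal sketch, assuming both are plain dicts; `changed_fields` is a hypothetical helper and the values are illustrative.

```python
from typing import Any


def changed_fields(current: dict[str, Any], desired: dict[str, Any]) -> dict[str, Any]:
    """Keep only the keys whose desired value differs from the current one."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


# Hypothetical current config, as retrieved before the change.
current = {
    "sync_type": "full_refresh",
    "incremental_field": None,
    "incremental_field_type": None,
    "sync_frequency": "24hour",
    "should_sync": True,
}
# What the user wants; sync_frequency is unchanged and should be dropped.
desired = {
    "sync_type": "incremental",
    "incremental_field": "updated_at",
    "incremental_field_type": "datetime",
    "sync_frequency": "24hour",
}
payload = changed_fields(current, desired)
```

The resulting `payload` is what would be passed as the second argument to the partial-update call.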

Step 4: Decide whether existing data is still valid
This is the step that's easy to get wrong. Some config changes invalidate the synced data; others don't.

Changes that DON'T invalidate existing data:
- `sync_frequency`, `sync_time_of_day`: scheduling only
- `should_sync`: on/off
- `cdc_table_mode` in most cases: the next sync starts writing to the new shape, but historical consolidated rows stay valid
- Switching between `incremental` and `full_refresh` with the same `incremental_field`: the next sync just re-runs fresh
- Switching to or from `sync_type: "webhook"`: the synced data stays valid; only the ingestion path changes. Remember to register or unregister the webhook (see sections below) alongside the sync_type change.

Changes that MAY invalidate existing data and need a resync:
- Changing `incremental_field` to a different column: the high-water mark is from the old column and won't match. Without a resync you'll miss rows that were updated between the two fields' histories.
- Changing `primary_key_columns`: existing rows may be deduplicated incorrectly against new PK definitions.
- Switching from `full_refresh` to `append`: the existing rows don't have the version-history shape that append expects.
- Switching from `append` to `full_refresh`: opposite problem; you'll end up with duplicate historical versions.
- Switching to/from `cdc`: the table shape changes fundamentally.

When the change invalidates data, the clean flow is:
- `external-data-schemas-partial-update` with the new config
- Warn the user this is destructive
- `external-data-schemas-resync` to wipe and re-import under the new config

Or equivalently, `external-data-schemas-delete-data` → `external-data-schemas-reload`. `delete-data` + `reload` is cleaner when the table is large and the user wants to start from zero.
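
The two lists above can be encoded as a rough decision helper. This is a sketch of one reading of those rules, not an authoritative classifier: `may_need_resync` is a hypothetical name, transitions the lists don't mention default to `False` and still need human judgment, and an unset (`None`) current `incremental_field` is treated as "no old high-water mark", which matches the full_refresh → incremental case.

```python
from typing import Any


def may_need_resync(current: dict[str, Any], changes: dict[str, Any]) -> bool:
    """Return True when a change falls in the 'MAY invalidate existing data' list."""
    old_type = current.get("sync_type")
    new_type = changes.get("sync_type", old_type)
    if old_type != new_type:
        # Switching to/from cdc changes the table shape.
        if "cdc" in (old_type, new_type):
            return True
        # full_refresh <-> append, in either direction.
        if {old_type, new_type} == {"full_refresh", "append"}:
            return True
    old_field = current.get("incremental_field")
    new_field = changes.get("incremental_field", old_field)
    # A different column means the stored high-water mark no longer applies.
    if old_field is not None and new_field != old_field:
        return True
    if ("primary_key_columns" in changes
            and changes["primary_key_columns"] != current.get("primary_key_columns")):
        return True
    return False


# Hypothetical current configs for illustration.
incremental = {"sync_type": "incremental", "incremental_field": "updated_at",
               "primary_key_columns": ["id"]}
full_refresh = {"sync_type": "full_refresh", "incremental_field": None}
```

A `True` result is a prompt to warn the user and plan a resync, not a substitute for checking the specific change.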

Step 5: Trigger and confirm
For non-destructive changes, call `external-data-schemas-reload({id})` to pick up the new config immediately rather than waiting for the schedule.

Wait a moment, then call `external-data-schemas-retrieve({id})` to confirm `status = Running` then `Completed`. Report `last_synced_at` and any new `latest_error`.
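
The wait-then-confirm step can be sketched as a bounded polling loop. This is an illustration only: `wait_for_sync` is a hypothetical helper, `retrieve` stands in for whatever performs the retrieve call, and only the `Running` and `Completed` statuses named above are assumed.

```python
import time
from typing import Any, Callable


def wait_for_sync(retrieve: Callable[[], dict[str, Any]],
                  attempts: int = 10, delay_s: float = 5.0) -> dict[str, Any]:
    """Poll until the schema leaves Running or the attempts run out."""
    schema = retrieve()
    for _ in range(attempts - 1):
        if schema.get("status") != "Running":
            break
        time.sleep(delay_s)
        schema = retrieve()
    return schema


# Fake retrieve function that replays a fixed sequence of states.
_states = iter([
    {"status": "Running", "last_synced_at": None, "latest_error": None},
    {"status": "Running", "last_synced_at": None, "latest_error": None},
    {"status": "Completed", "last_synced_at": "2024-01-02T00:00:00Z", "latest_error": None},
])
final = wait_for_sync(lambda: next(_states), attempts=5, delay_s=0)
```

The returned dict carries the `last_synced_at` and `latest_error` values to report back to the user.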

Specific common changes

Switching full_refresh → incremental
- `incremental-fields-create` to confirm the desired field exists and `incremental_available: true`.
- `partial-update`: `{sync_type: "incremental", incremental_field, incremental_field_type}`.
- No data wipe needed: the next sync just switches strategy. If the source is growing fast, the next incremental sync is the cheap one.

Switching incremental → cdc (Postgres only)
- Run `external-data-sources-check-cdc-prerequisites-create` on the parent source. Only proceed if `valid: true`.
- `incremental-fields-create` to confirm `cdc_available: true` and see `detected_primary_keys`.
- `partial-update`: `{sync_type: "cdc", primary_key_columns: [...], cdc_table_mode: "consolidated"}`.
- Resync required: CDC tables have a different shape. Trigger `external-data-schemas-resync` after the update. Warn the user this wipes existing data.
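
The ordering of the CDC switch can be sketched as a function that refuses to build a plan until both checks pass. This is a hedged illustration: `plan_cdc_switch` is a hypothetical helper, the inputs are assumed to be the dict responses of the two checks, and it returns a list of (tool name, payload) pairs rather than calling anything.

```python
from typing import Any


def plan_cdc_switch(prereq: dict[str, Any],
                    fields: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Build the ordered steps for an incremental -> cdc switch, or raise."""
    if not prereq.get("valid"):
        raise ValueError("CDC prerequisites check did not return valid: true")
    if not fields.get("cdc_available"):
        raise ValueError("cdc_available is not true for this table")
    primary_keys = fields.get("detected_primary_keys") or []
    if not primary_keys:
        raise ValueError("no primary keys detected; ask the user which columns to use")
    update = {
        "sync_type": "cdc",
        "primary_key_columns": list(primary_keys),
        "cdc_table_mode": "consolidated",
    }
    # The resync wipes existing data, so the user should be warned before step two.
    return [
        ("external-data-schemas-partial-update", update),
        ("external-data-schemas-resync", {}),
    ]


plan = plan_cdc_switch(
    {"valid": True},
    {"cdc_available": True, "detected_primary_keys": ["id"]},
)
```

Raising before any update is built keeps a failed pre-flight from leaving the schema half-switched.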

Fixing a stale incremental field after schema drift
Source dropped the `updated_at` column. Sync has been failing with "column does not exist".
- `incremental-fields-create` to see what fields remain.
- Pick a replacement (or switch to `full_refresh` if none are suitable).
- `partial-update` with the new field + type (or new sync_type).
- `reload` to retry. Because the replacement is a different column, check Step 4 to decide whether a `resync` is needed instead.

Changing primary keys on a CDC table
- `partial-update`: `{primary_key_columns: [...]}`.
- Resync required: existing CDC tombstones and upsert keys won't match the new PK definition, leading to row duplication or missed updates.
- Warn the user that this wipes existing data, then `resync`.

Changing sync_frequency
- `partial-update`: `{sync_frequency: "1hour"}`.
- No reload needed: the next scheduled sync picks up the new cadence. Reload manually if the user wants to confirm nothing broke.

Switching a schema to sync_type: "webhook"
Only works for sources that implement `WebhookSource` (today: Stripe) and tables where `supports_webhooks: true` from `incremental-fields-create`.
- `incremental-fields-create` to confirm `supports_webhooks: true` for the table.
- `partial-update`: `{sync_type: "webhook"}`.
- If the source doesn't already have a webhook registered (check with `webhook-info-retrieve`), call `external-data-sources-create-webhook-create({source_id})` to register it.
- No resync required: the schema's existing bulk-synced data stays, and the webhook becomes the primary ingestion path once the next reconciliation finishes.
- Keep `sync_frequency` set (e.g. `24hour`); it acts as a safety-net reconciliation in case any webhook delivery is missed.

Switching off sync_type: "webhook"
- `partial-update`: `{sync_type: "incremental"}` (or whatever bulk type is appropriate) with the required `incremental_field` + `incremental_field_type`.
- If no other schemas on the source are still using `sync_type: "webhook"`, call `external-data-sources-delete-webhook-create({source_id})` to unregister. Leaving an orphaned webhook registered on the source side just means events will be received and dropped; not harmful, but messy.
- If other schemas on the source are still on webhook, leave the webhook registered; it's shared across all webhook-type schemas on the source.
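
The unregister decision can be sketched as a check over the source's schemas. A minimal illustration, assuming each schema is a dict with `id` and `sync_type` keys; `should_unregister_webhook` and the sample IDs are hypothetical.

```python
from typing import Any


def should_unregister_webhook(source_schemas: list[dict[str, Any]],
                              leaving_id: str) -> bool:
    """True only when no other schema on the source still uses sync_type 'webhook'."""
    return not any(
        schema.get("sync_type") == "webhook"
        for schema in source_schemas
        if schema.get("id") != leaving_id
    )


# Hypothetical schemas on one source; "a" is the one being switched off webhook.
shared = [
    {"id": "a", "sync_type": "webhook"},
    {"id": "b", "sync_type": "webhook"},
    {"id": "c", "sync_type": "incremental"},
]
last_one = [
    {"id": "a", "sync_type": "webhook"},
    {"id": "c", "sync_type": "incremental"},
]
keep_registered = should_unregister_webhook(shared, "a")
safe_to_delete = should_unregister_webhook(last_one, "a")
```

Excluding the schema being switched keeps the check correct whether it runs before or after the partial-update.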

Rotating a webhook signing secret
The source's signing secret (e.g. Stripe's `whsec_...`) was rotated, and payloads are now failing signature verification.
- Grab the new secret from the source's dashboard.
- `external-data-sources-update-webhook-inputs-create({source_id}, {inputs: {signing_secret: "whsec_..."}})`.
- No reload needed: the next inbound webhook payload will verify against the new secret.

Pausing a schema
- `partial-update`: `{should_sync: false}`. Schema stops syncing but stays configured.
- To resume later: `partial-update`: `{should_sync: true}`, then `reload` for an immediate run.

Important notes
- Read before you write. Always retrieve the current config first. `partial-update` doesn't complain if you set a field to the value it already had, but you might be about to change something you didn't realize was already set.
- Not every sync_type is available on every schema. The `incremental-fields-create` response tells you what's available right now, which can be different from what was available at creation (e.g. CDC may have been enabled for the team since).
- Wipe when the shape changes. Switching sync strategy often changes the physical table. If you don't resync, you'll be mixing row shapes and queries will return garbage.
- CDC needs prerequisites. Never switch to `sync_type: "cdc"` without running `check-cdc-prerequisites-create` first; otherwise the sync will fail immediately.
- Don't touch a Running schema. If the schema is currently running, either wait for it to finish or call `external-data-schemas-cancel` before applying the change. Updating config mid-sync can leave the incremental high-water mark inconsistent.
- Sync frequency is cheap to change. Encourage experimentation there. `sync_type` and `incremental_field` are expensive to change, so encourage care.
- Webhooks are registered at the source level, not the schema level. Multiple webhook-type schemas on the same source share one webhook registration. Only delete the webhook when the last webhook-type schema on that source is being switched away; otherwise other schemas stop receiving pushes.