Setting up a data warehouse source
Use this skill when the user wants to connect an external data source to PostHog's data warehouse for the first time.
The setup has a specific three-step flow (wizard → db-schema → create) — skipping steps leads to failed sources and
confused users.
When to use this skill
- The user wants to connect a new source: "connect Stripe", "import my Postgres orders table", "sync Hubspot contacts"
- The user isn't sure what source types PostHog supports
- The user has credentials but doesn't know how to structure the `schemas` payload
- The user wants guidance on which sync method to pick per table
Available tools
| Tool | Purpose |
|---|---|
| `external-data-sources-wizard` | Discover which source types exist and what fields each needs |
| `external-data-sources-db-schema` | Validate credentials and list tables with available sync methods per table |
| `external-data-sources-create` | Create the source — requires a `schemas[]` built from the db-schema response |
| `external-data-sources-check-cdc-prerequisites-create` | Postgres CDC pre-flight check (optional, only for Postgres CDC) |
| `external-data-sources-webhook-info-retrieve` | Check if a source supports webhooks and whether one has been registered |
| `external-data-sources-create-webhook-create` | Register a webhook with the external service after source creation |
| `external-data-sources-update-webhook-inputs-create` | Supply the signing secret manually when auto-registration failed |
| `external-data-sources-list` | After creation, confirm the source is listed and see its initial status |
| `external-data-schemas-list` | See per-table sync status once the source is created |
The three-step flow
Every source setup follows the same shape. Don't try to shortcut straight to `external-data-sources-create` — you need the db-schema response to build a valid `schemas` payload.

```text
┌────────────────────┐
│ 1. wizard          │  What source types exist? What fields does each need?
└────────┬───────────┘
         ▼
┌────────────────────┐
│ 2. db-schema       │  Validate creds. List tables + available sync methods per table.
└────────┬───────────┘
         ▼
┌────────────────────┐
│ 3. create          │  Send source_type + credentials + schemas[] to actually create.
└────────────────────┘
```
Workflow
Step 1 — Discover the source type
Call `external-data-sources-wizard` (no params). The response is a dict keyed by source type. Each entry describes:

- `name` — the canonical source_type string you'll pass to later calls (e.g. `"Postgres"`, `"Stripe"`, `"Hubspot"`).
- `label` / `caption` — human-readable name and description.
- `fields` — the config fields needed (host, port, database, api_key, client_id/secret, ...). Each has `name`, `type` (input, password, switch, select, file-upload), and `required`.
- `featured`, `unreleasedSource` — use to gauge readiness. Skip sources marked `unreleasedSource: true` unless the user explicitly asked for a preview.

Match the user's request to a source. If they said "Postgres", look up `Postgres`. If they said something ambiguous like "database", present the top relevant matches (Postgres, MySQL, MongoDB, BigQuery, Snowflake, Redshift) and let them pick.

For OAuth-based sources (Hubspot, Salesforce, Google Ads), the wizard entry hints at an OAuth flow. These typically need the user to authorize in the PostHog UI rather than pasting credentials — explain this and direct them to the source setup page rather than trying to collect tokens in chat. OAuth is about authentication, not about how data flows; OAuth sources still use polling bulk sync, not webhooks.

Gather the required credentials from the user. Never ask for more fields than the wizard entry says are required — asking for an unnecessary `port` when the source doesn't need one confuses users.
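As a minimal sketch, the filtering described above could look like this; the wizard-response shape here is an assumption inferred from the field names in this section, not a confirmed API contract:

```python
# Hypothetical wizard response, shaped after the fields described above
# (name, label, fields, featured, unreleasedSource); the exact structure
# returned by external-data-sources-wizard may differ.
WIZARD = {
    "Postgres": {
        "name": "Postgres",
        "label": "Postgres",
        "fields": [
            {"name": "host", "type": "input", "required": True},
            {"name": "port", "type": "input", "required": True},
            {"name": "password", "type": "password", "required": True},
            {"name": "ssh_tunnel", "type": "switch", "required": False},
        ],
        "featured": True,
    },
    "Preview": {"name": "Preview", "fields": [], "unreleasedSource": True},
}


def released_sources(wizard):
    """Source types safe to offer; skip unreleased previews by default."""
    return [k for k, v in wizard.items() if not v.get("unreleasedSource")]


def required_fields(wizard, source_type):
    """Only the fields marked required; never ask the user for more."""
    return [f["name"] for f in wizard[source_type]["fields"] if f.get("required")]


print(released_sources(WIZARD))             # ['Postgres']
print(required_fields(WIZARD, "Postgres"))  # ['host', 'port', 'password']
```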
Step 2 — Validate credentials and discover tables
Call `external-data-sources-db-schema` with `source_type` plus all credential fields. This does two things at once:

- Validates the credentials against the live source. Returns 400 with a `message` if anything is wrong (bad host, wrong password, permission denied). Show the error verbatim — it's often actionable ("password authentication failed for user 'x'").
- If valid, returns an array of table entries. Each entry:

```text
{
  "table": "orders",
  "should_sync": false,
  "rows": 1_250_000,
  "incremental_available": true,   # can do sync_type=incremental
  "append_available": true,        # can do sync_type=append
  "cdc_available": true,           # can do sync_type=cdc (null = not enabled for team)
  "supports_webhooks": false,      # can do sync_type=webhook for real-time push
  "incremental_fields": [          # candidates: usually updated_at, created_at, id
    {"field": "updated_at", "type": "datetime", "label": "updated_at", ...},
    {"field": "created_at", "type": "datetime", ...},
    {"field": "id", "type": "integer", ...}
  ],
  "detected_primary_keys": ["id"],
  "available_columns": [{"field": "id", "type": "integer", "nullable": false}, ...],
  "description": "..."
}
```

Present this to the user. Don't dump the raw JSON — summarize: which tables were found, row counts, and the default sync method recommendation per table (see sync-type decision guide).
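One way to turn the raw entries into that summary, as a sketch; treating `full_refresh` as always available is an assumption consistent with the sync-type guide in this document:

```python
def summarize_tables(tables):
    """Render db-schema table entries as a short human-readable summary."""
    lines = []
    for t in tables:
        methods = [label for label, key in (
            ("incremental", "incremental_available"),
            ("append", "append_available"),
            ("cdc", "cdc_available"),        # None means CDC not enabled for the team
            ("webhook", "supports_webhooks"),
        ) if t.get(key)]
        methods.append("full_refresh")       # assumed always available
        lines.append(f"{t['table']}: ~{t['rows']:,} rows; methods: {', '.join(methods)}")
    return "\n".join(lines)


tables = [{"table": "orders", "rows": 1_250_000,
           "incremental_available": True, "append_available": True,
           "cdc_available": None, "supports_webhooks": False}]
print(summarize_tables(tables))
# orders: ~1,250,000 rows; methods: incremental, append, full_refresh
```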
Step 3 — Confirm per-table sync configuration
For each table the user wants to sync, pick a sync_type. See the sync-type decision guide for detailed rules, but the short version is:

- Small / dimension tables (<50k rows, no natural ordering column): `full_refresh` — simple and always correct.
- Large tables with an `updated_at` / `modified_at`: `incremental` — much cheaper per sync.
- Append-only immutable tables (logs, events): `append` if available — preserves history.
- Postgres with CDC enabled and you need near-real-time: `cdc` — requires primary keys and Postgres prerequisites.
- Sources that support webhooks (currently Stripe): for near-real-time ingestion set `sync_type: "webhook"` on the tables where `supports_webhooks: true`, then register the webhook as a post-create step (see step 6 below). Tables that don't support webhooks on the same source still need a bulk sync_type.

For each schema that will use `incremental` / `append` / `cdc`, you also need:

- `incremental_field` — which column to track for high-water-mark ordering. Pick from the `incremental_fields` list returned by db-schema. Prefer `updated_at` over `created_at` (updated_at catches late-arriving updates; created_at misses them). For integer-only tables, use the monotonically increasing primary key.
- `incremental_field_type` — must match the chosen field's type (`datetime`, `timestamp`, `date`, `integer`, `numeric`, `objectid`).
- `primary_key_columns` — required for CDC. Use `detected_primary_keys` from db-schema.
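The decision rules above can be sketched as a helper over one db-schema table entry. This is an illustrative sketch, not a PostHog API; the `near_real_time` and `immutable` parameters are assumptions capturing intent the API can't know (the user wants push-style freshness; the table is append-only):

```python
def recommend_sync_type(t, *, near_real_time=False, immutable=False):
    """Apply the short-version sync-type rules to one db-schema table entry."""
    if near_real_time and t.get("supports_webhooks"):
        return "webhook"
    if near_real_time and t.get("cdc_available") and t.get("detected_primary_keys"):
        return "cdc"
    if immutable and t.get("append_available"):
        return "append"
    # Large tables with a modification timestamp are cheapest as incremental.
    has_mtime = any(f["field"] in ("updated_at", "modified_at")
                    for f in t.get("incremental_fields", []))
    if t.get("incremental_available") and has_mtime and t.get("rows", 0) >= 50_000:
        return "incremental"
    return "full_refresh"


orders = {"table": "orders", "rows": 1_250_000, "incremental_available": True,
          "incremental_fields": [{"field": "updated_at", "type": "datetime"}]}
print(recommend_sync_type(orders))  # incremental
```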
Step 4 — Pick a good prefix
The source's `prefix` is prepended to table names in HogQL. Tables end up as `{prefix}_{table_name}`.

- Default to the source type lowercased if there's only one source of that type: `stripe`, `postgres`.
- If the user already has a Postgres source, pick something distinguishing: `postgres_prod`, `postgres_analytics`.
- Use lowercase, underscore-separated. The prefix becomes part of every HogQL query the user writes.

Confirm the prefix with the user before creating — changing it later is possible but renames every table.
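A small sketch of the prefix conventions above (lowercase, underscore-separated, not already taken); the collision-handling suffix is an assumption, not PostHog behavior:

```python
import re


def suggest_prefix(source_type, existing, qualifier=""):
    """Suggest a lowercase, underscore-separated prefix that avoids collisions."""
    base = re.sub(r"[^a-z0-9]+", "_", source_type.lower()).strip("_")
    candidate = f"{base}_{qualifier}" if qualifier else base
    if candidate not in existing:
        return candidate
    # Assumed fallback: append a numeric suffix rather than reuse a taken prefix.
    n = 2
    while f"{candidate}_{n}" in existing:
        n += 1
    return f"{candidate}_{n}"


print(suggest_prefix("Postgres", set()))                 # postgres
print(suggest_prefix("Postgres", {"postgres"}, "prod"))  # postgres_prod
print(suggest_prefix("Google Ads", set()))               # google_ads
```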
Step 5 — Create the source
Call `external-data-sources-create` with:

```json
{
  "source_type": "Postgres",
  "prefix": "postgres_prod",
  "payload": {
    "host": "...",
    "port": "5432",
    "dbname": "...",
    "user": "...",
    "password": "...",
    "schema": "public",
    "schemas": [
      {
        "name": "orders",
        "should_sync": true,
        "sync_type": "incremental",
        "incremental_field": "updated_at",
        "incremental_field_type": "datetime",
        "primary_key_columns": ["id"]
      },
      {
        "name": "users",
        "should_sync": true,
        "sync_type": "full_refresh"
      },
      {
        "name": "audit_log",
        "should_sync": false
      }
    ]
  }
}
```

Rules for the `schemas` array:

- Every table returned by db-schema should be included, even ones the user doesn't want (set `should_sync: false`). Tables the user didn't mention default to `should_sync: false`.
- `sync_type` is required only when `should_sync: true`.
- `incremental_field` / `incremental_field_type` must be present when `sync_type` is `incremental` or `append`.
- `primary_key_columns` must be present when `sync_type` is `cdc`.

On success you'll get back a source with a new `id`. The first sync is triggered automatically.
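The rules for the `schemas` array can be enforced before calling create. A sketch of such a builder, assuming the db-schema table list and a per-table settings dict as inputs:

```python
def build_schemas(tables, config):
    """Build the schemas[] array from a db-schema table list.

    `config` maps table name -> settings for tables the user opted in to;
    every other table is still included, with should_sync: false.
    """
    schemas = []
    for t in tables:
        name = t["table"]
        if name not in config:
            schemas.append({"name": name, "should_sync": False})
            continue
        entry = {"name": name, "should_sync": True, **config[name]}
        st = entry.get("sync_type")
        if st is None:
            raise ValueError(f"{name}: sync_type is required when should_sync is true")
        if st in ("incremental", "append") and not (
                entry.get("incremental_field") and entry.get("incremental_field_type")):
            raise ValueError(f"{name}: incremental_field(_type) required for {st}")
        if st == "cdc" and not entry.get("primary_key_columns"):
            raise ValueError(f"{name}: primary_key_columns required for cdc")
        schemas.append(entry)
    return schemas


tables = [{"table": "orders"}, {"table": "users"}, {"table": "audit_log"}]
config = {
    "orders": {"sync_type": "incremental", "incremental_field": "updated_at",
               "incremental_field_type": "datetime"},
    "users": {"sync_type": "full_refresh"},
}
result = build_schemas(tables, config)
print([s["name"] for s in result if not s["should_sync"]])  # ['audit_log']
```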
Step 6 — Register a webhook (only when any schema is `sync_type: "webhook"`)

Webhook-type schemas don't start receiving data just by existing — the external service needs to know where to POST events, and PostHog needs to know how to verify them. This is a second call after source creation, not part of the `external-data-sources-create` payload. Do this before telling the user the setup is complete, otherwise they hear "syncs are running" while the push channel is still unregistered.

Only needed when at least one schema on the source has `sync_type: "webhook"` and `should_sync: true`. Currently only Stripe implements this flow; for everything else skip this step.

Before calling create-webhook, check `external-data-sources-webhook-info-retrieve({id})`. If it already returns `exists: true`, do NOT call create-webhook again — each successful call registers a new external endpoint and would result in duplicate deliveries.
- Call `external-data-sources-create-webhook-create({id})`. PostHog:
  - creates the HogFunction that will receive webhook POSTs,
  - builds a schema_mapping from external event types to PostHog schema ids,
  - calls the source's API (e.g. Stripe) to register the webhook URL and subscribe to the relevant events,
  - on Stripe, auto-captures the `signing_secret` and stores it securely.

  Returns `{success, webhook_url, error}`. On success report the `webhook_url` to the user for their records — but they don't need to paste it anywhere; registration is already done.
- If `success: false` with a permissions error like "API key doesn't have permission to create webhooks":
  - The HogFunction is still created, just disabled.
  - Ask the user to create the webhook manually in the source's dashboard using the returned `webhook_url`.
  - Have them copy the signing secret from the source's webhook settings.
  - Call `external-data-sources-update-webhook-inputs-create({id}, {inputs: {signing_secret: "whsec_..."}})` to store it. The HogFunction picks it up and verifies incoming payloads.
- Verify with `external-data-sources-webhook-info-retrieve({id})`. A healthy webhook has `exists: true`, `external_status.status: "enabled"`, and no `error`.

Webhooks are supplementary to bulk sync. The first load of a webhook-enabled schema is still done via polling (`initial_sync_complete` flips to true when done); after that, the webhook becomes the primary ingestion path. A webhook schema will still have a `sync_frequency` that schedules a periodic bulk refresh as a safety net. This is expected — not something to "fix".
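The branching in this step can be condensed into a small decision helper. A sketch, assuming the `{exists, external_status, error}` response shape described above:

```python
def webhook_action(info):
    """Decide the next webhook step from a webhook-info style response."""
    if not info.get("exists"):
        return "register"   # safe to call create-webhook once
    status = (info.get("external_status") or {}).get("status")
    if info.get("error") or status != "enabled":
        return "repair"     # manual webhook + signing-secret path
    return "healthy"        # never register again: that would duplicate deliveries


print(webhook_action({"exists": False}))  # register
print(webhook_action({"exists": True,
                      "external_status": {"status": "enabled"}}))  # healthy
```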
Step 7 — Confirm and explain what happens next
After creation (and, for webhook schemas, after Step 6):

- Call `external-data-schemas-list` to show the user the initial state.
- Explain: every enabled schema enters `Running`, then moves to `Completed` when the first sync finishes. First syncs can take anywhere from seconds to hours depending on row count — a multi-million-row table is fine, just slow.
- Tell them how to query: `SELECT * FROM {prefix}_{table_name} LIMIT 10` in HogQL.
- Offer to check back in a few minutes to confirm the initial syncs succeeded.
CDC setup for Postgres (optional, when requested)
If the user wants near-real-time replication from Postgres:

- Before calling db-schema, run `external-data-sources-check-cdc-prerequisites-create` with their Postgres creds. It returns `{valid, errors[]}` listing anything missing (wal_level, replication slot, publication, permissions).
- If `valid: false`, present the errors and ask the user to fix on the Postgres side. Don't try to create a CDC source that will immediately fail.
- Once prerequisites pass, proceed to db-schema and create. Set `sync_type: "cdc"` on the tables that need it, and include `primary_key_columns` for each (CDC requires them).
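A minimal sketch of gating on the pre-flight result, assuming the `{valid, errors[]}` shape described above:

```python
def cdc_ready(check):
    """Interpret a {valid, errors[]} pre-flight result before attempting CDC."""
    if check.get("valid"):
        return True, "Prerequisites met; proceed to db-schema and create."
    errors = "; ".join(check.get("errors", [])) or "unknown error"
    return False, f"Fix on the Postgres side first: {errors}"


ok, msg = cdc_ready({"valid": False, "errors": ["wal_level must be 'logical'"]})
print(ok, msg)  # False Fix on the Postgres side first: wal_level must be 'logical'
```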
Important notes
- Always validate creds with db-schema before create. The create endpoint will accept invalid creds and then fail asynchronously — the source appears in the list with status `Error` and no tables. Skipping the validation step just pushes the failure into the background.
- Present the table list before creating. Large databases may have hundreds of tables. Don't auto-select them all — row counts and relevance matter for billing. Let the user opt in explicitly.
- Don't invent schemas. Every entry in the `schemas` array must correspond to a real table from the db-schema response. You can't "also add an orders table" unless db-schema found one.
- Prefix is load-bearing. It's part of every HogQL query the user will ever write against these tables. Pick something short, descriptive, and not already taken.
- OAuth sources are different. Hubspot, Salesforce, Google Ads etc. need the user to authorize via the PostHog UI. Direct them there — don't try to collect OAuth tokens in chat.
- Webhooks are a separate step after create. Setting `sync_type: "webhook"` on a schema doesn't register the webhook — the `create-webhook` call does. Always follow create → create-webhook → webhook-info for webhook-type schemas, and never leave a webhook schema dangling without registration (it just won't receive events).
- Webhook support is source-specific and sparse. Currently only Stripe implements `WebhookSource`. Don't promise webhooks for Hubspot, Salesforce, or Postgres — they'll use polling sync.
- Row counts drive billing. Warehouse syncing is metered by rows synced. A chatty 500M-row events table synced hourly is very different from a 10k-row dimension table synced daily. Flag large tables and offer longer sync frequencies (`sync_frequency: "24hour"`) as the default.