motherduck-load-data

Use this skill when the job is getting data into MotherDuck correctly and efficiently, not just writing one ad hoc import query.

Prefer current MotherDuck loading, cloud-storage, and Postgres-endpoint loading docs first.
Use `CREATE SECRET` and cloud-storage docs for protected-object-store workflows.
Use the DuckDB database upload docs when the source is an existing local `.duckdb`, `.ddb`, or attached DuckDB database.
Keep the loading advice aligned with MotherDuck's documented posture:
- batch over streaming
- Parquet over CSV when you control the format
- dataframe, `COPY`, CTAS, or `INSERT ... SELECT` over row-by-row inserts
- native MotherDuck storage first unless DuckLake is explicitly required

Start by classifying the source: object storage or HTTPS, local file or local DuckDB, in-memory rows, or an external database.
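
As a rough illustration of that classification step, the sketch below sorts a source into the four categories by looking at its URL scheme. The names `SourceKind` and `classify_source` and the two scheme sets are hypothetical, chosen for this example; they are not part of any MotherDuck or DuckDB API, and the scheme lists are not exhaustive.

```python
from enum import Enum
from urllib.parse import urlparse


class SourceKind(Enum):
    REMOTE_OBJECT = "object storage or HTTPS"
    LOCAL_FILE = "local file or local DuckDB"
    IN_MEMORY = "in-memory rows"
    EXTERNAL_DB = "external database"


# Illustrative scheme lists (assumptions, not an exhaustive registry).
REMOTE_SCHEMES = {"s3", "gs", "gcs", "r2", "az", "azure", "abfss", "http", "https"}
DATABASE_SCHEMES = {"postgres", "postgresql", "mysql"}


def classify_source(source) -> SourceKind:
    """Classify a source as remote object, external database, local file, or in-memory rows."""
    # Anything that is not a path or URL string is treated as in-memory rows.
    if not isinstance(source, str):
        return SourceKind.IN_MEMORY
    scheme = urlparse(source).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        return SourceKind.REMOTE_OBJECT
    if scheme in DATABASE_SCHEMES:
        return SourceKind.EXTERNAL_DB
    # No recognized scheme: assume a path on the local filesystem.
    return SourceKind.LOCAL_FILE
```

Calling `classify_source("s3://my-bucket/events.parquet")` would select the remote-read path, while a plain path such as `data/local.duckdb` falls through to the local DuckDB client path.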

Prefer `CREATE TABLE AS SELECT` for first loads and `INSERT INTO ... SELECT` for appends.
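
A minimal sketch of that choice, assuming the source is a Parquet file read with DuckDB's `read_parquet` table function. The helper names (`build_load_sql`, `quote_ident`, `quote_literal`) and the example table and bucket are hypothetical; the function only builds the statement text and leaves execution to whichever client is connected to MotherDuck.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_load_sql(table: str, parquet_uri: str, first_load: bool) -> str:
    """Return CTAS for a first load, INSERT INTO ... SELECT for an append."""
    select = f"SELECT * FROM read_parquet({quote_literal(parquet_uri)})"
    if first_load:
        # First load: let the SELECT define the table schema.
        return f"CREATE TABLE {quote_ident(table)} AS {select}"
    # Append: the table already exists, so insert the new batch in one statement.
    return f"INSERT INTO {quote_ident(table)} {select}"


if __name__ == "__main__":
    # Hypothetical table name and bucket path, for illustration only.
    print(build_load_sql("raw_events", "s3://my-bucket/events/*.parquet", first_load=True))
    print(build_load_sql("raw_events", "s3://my-bucket/events/*.parquet", first_load=False))
```

Both branches move the whole batch in a single set-based statement, which is the point of preferring them over row-by-row inserts.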

For whole local DuckDB databases, use `CREATE OR REPLACE DATABASE remote_name FROM CURRENT_DATABASE()`, an attached local database, or a file path from a native DuckDB client after attaching `md:`.
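
The `CURRENT_DATABASE()` variant can be sketched as an ordered pair of statements to run from a native DuckDB client that already has the local database open. The helper `upload_current_database_sql` and its identifier check are assumptions made for this example; authentication (for instance a MotherDuck token in the environment) is assumed to be handled separately and is not shown.

```python
import re

# Conservative pattern for an unquoted database name (an assumption for this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def upload_current_database_sql(remote_name: str) -> list[str]:
    """Return the statements that copy the open local database to MotherDuck."""
    if not _IDENTIFIER.match(remote_name):
        raise ValueError(f"unsupported database name: {remote_name!r}")
    return [
        # Attach MotherDuck so the remote catalog is reachable from this session.
        "ATTACH 'md:'",
        # Copy the currently open local database into the named remote database.
        f"CREATE OR REPLACE DATABASE {remote_name} FROM CURRENT_DATABASE()",
    ]
```

Note that `CREATE OR REPLACE` overwrites an existing remote database of the same name, so the target name deserves a second look before the statements are executed.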
Use Parquet for durable bulk movement whenever you control the source format.
Treat the Postgres endpoint as a thin-client path for server-side remote reads, not for local-file or extension-driven ingestion.
Bootstrap the target MotherDuck database first when the ingestion tool does not create it automatically.
Keep raw landing minimally transformed; do typing, deduplication, and business logic in staging or modeling steps.
Keep source storage close to the MotherDuck region when you control placement.

Identify where the source data actually lives.
Choose the loading path:
- object storage or HTTPS: remote read into MotherDuck
- local file or local DuckDB: use a DuckDB client path
- in-memory rows: Arrow or dataframe bulk load first, batched inserts only as a fallback
- external database: use the appropriate scan or replication path from a DuckDB-capable environment
Land the data into a raw or staging table with minimal transformation.
Validate row counts, types, and a few business aggregates immediately after the load.
Promote into modeled tables only after the landing step is correct.
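
The validation step above could look roughly like the following sketch. It assumes `con` is any connection object whose `execute(sql)` returns a cursor with `fetchone()`, as the DuckDB Python client's connections do. The function name `validate_load` and its parameters are hypothetical, and type checks are left out to keep the example short.

```python
def _q(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def validate_load(con, table, expected_rows, required_columns=(), expected_totals=None):
    """Return a list of problems found in a freshly landed table (empty if none)."""
    problems = []

    # 1. Row count should match what the source reported.
    actual = con.execute(f"SELECT COUNT(*) FROM {_q(table)}").fetchone()[0]
    if actual != expected_rows:
        problems.append(f"row count {actual} != expected {expected_rows}")

    # 2. Columns that must be populated should contain no NULLs.
    for col in required_columns:
        nulls = con.execute(
            f"SELECT COUNT(*) FROM {_q(table)} WHERE {_q(col)} IS NULL"
        ).fetchone()[0]
        if nulls:
            problems.append(f"{nulls} NULL values in {col}")

    # 3. A few business aggregates should match known totals from the source.
    for col, expected in (expected_totals or {}).items():
        total = con.execute(f"SELECT SUM({_q(col)}) FROM {_q(table)}").fetchone()[0]
        if total != expected:
            problems.append(f"SUM({col}) = {total} != expected {expected}")

    return problems
```

An empty result is the signal to promote the landing table into modeled tables; any entry in the list means the landing step should be fixed first.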

`references/INGESTION_PATTERNS.md` for format-specific options, cloud-storage secrets, Postgres-endpoint loading tradeoffs, Python dataframe paths, and advanced ingestion patterns
