motherduck-ducklake
Use DuckLake on MotherDuck
Use this skill when the storage decision is genuinely about open table format and object-store behavior, not just about where to put another analytical table.
Source Of Truth
- Prefer current MotherDuck DuckLake docs first.
- Use the upstream DuckLake and DuckDB extension docs only to clarify extension-level behavior that MotherDuck docs reference.
- Keep the guidance aligned with the documented product posture:
  - native MotherDuck first
  - upstream DuckLake v1.0 is production-ready and supported by DuckDB 1.5.2, while MotherDuck's DuckLake docs still define the MotherDuck product surface and preview/compatibility limits
  - fully managed, BYOB, and own-compute paths are distinct
  - maintenance and compaction are explicit operations, not background magic
Default Posture
- Start with native MotherDuck storage unless there is a concrete DuckLake requirement.
- Reach for DuckLake when you need open-table-format semantics, object storage as the source of truth, BYOB, or file-aware maintenance.
- Do not recommend DuckLake just because a workload is "large"; MotherDuck's docs explicitly note native storage is often faster for reads.
- Choose the operating mode deliberately: fully managed for easiest evaluation, BYOB for customer bucket ownership, own compute only when the compute boundary matters too.
- Document the fallback to native MotherDuck storage if the DuckLake requirement is weak, unverified, or only about future portability.
- For DuckLake v1.0 features such as data inlining, sorted tables, bucket partitioning, and deletion vectors, or for extension behavior in general, verify the current MotherDuck DuckLake docs and the DuckDB/DuckLake version matrix before giving syntax guarantees.
- Do not infer MotherDuck client/runtime support from upstream DuckDB release notes alone; check the MotherDuck lifecycle docs when the exact DuckDB version matters.
- Keep the MotherDuck product surface separate from raw DuckLake-extension assumptions.
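
The posture above can be read as a small decision rule. The following is a minimal sketch of that rule, not a MotherDuck API: the `Requirements` fields, the `Storage` enum, and `choose_storage` are hypothetical names introduced here for illustration. It assumes own compute only applies on top of BYOB, and it deliberately ignores workload size.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    # Hypothetical labels for the options named in this skill.
    NATIVE = "native MotherDuck storage"
    DUCKLAKE_MANAGED = "DuckLake, fully managed"
    DUCKLAKE_BYOB = "DuckLake, BYOB with MotherDuck compute"
    DUCKLAKE_BYOB_OWN_COMPUTE = "DuckLake, BYOB with own compute"


@dataclass
class Requirements:
    # Hypothetical field names; each mirrors one trigger from the list above.
    needs_open_table_format: bool = False
    object_store_is_source_of_truth: bool = False
    needs_file_aware_maintenance: bool = False
    customer_owns_bucket: bool = False      # the BYOB trigger
    compute_boundary_matters: bool = False  # only considered together with BYOB
    is_large: bool = False                  # recorded, but never a reason by itself


def choose_storage(req: Requirements) -> Storage:
    """Return native storage unless a concrete DuckLake requirement is present."""
    wants_ducklake = (
        req.needs_open_table_format
        or req.object_store_is_source_of_truth
        or req.needs_file_aware_maintenance
        or req.customer_owns_bucket
    )
    if not wants_ducklake:
        return Storage.NATIVE
    if not req.customer_owns_bucket:
        return Storage.DUCKLAKE_MANAGED
    if req.compute_boundary_matters:
        return Storage.DUCKLAKE_BYOB_OWN_COMPUTE
    return Storage.DUCKLAKE_BYOB
```

For example, `choose_storage(Requirements(is_large=True))` returns `Storage.NATIVE`, because size alone is not a DuckLake requirement.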
Workflow
- Confirm why native MotherDuck storage is insufficient.
- Pick the operating mode: fully managed, BYOB with MotherDuck compute, or BYOB with own compute.
- Verify regional and bucket constraints before proposing BYOB.
- Define the ingestion and maintenance posture up front, including data inlining, file compaction, and cleanup expectations.
- Validate who will query the data and from which compute surface before finalizing the architecture.
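
One way to keep the workflow honest is to treat it as a checklist that must be empty before an architecture is finalized. The sketch below is illustrative only: `DuckLakeProposal`, its fields, the mode strings, and `open_questions` are hypothetical names, not part of any MotherDuck or DuckLake tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mode labels for the three operating modes named above.
VALID_MODES = {"managed", "byob", "byob-own-compute"}


@dataclass
class DuckLakeProposal:
    # One hypothetical field per workflow step; None or False means "not answered yet".
    why_native_is_insufficient: Optional[str] = None
    operating_mode: Optional[str] = None
    region_and_bucket_verified: bool = False
    ingestion_and_maintenance_plan: Optional[str] = None
    query_consumers: Optional[str] = None


def open_questions(p: DuckLakeProposal) -> list[str]:
    """List the workflow steps that are still unanswered, in workflow order."""
    gaps = []
    if not p.why_native_is_insufficient:
        gaps.append("state why native MotherDuck storage is insufficient")
    if p.operating_mode not in VALID_MODES:
        gaps.append("pick an operating mode")
    elif p.operating_mode != "managed" and not p.region_and_bucket_verified:
        # Regional and bucket checks only gate the BYOB modes.
        gaps.append("verify regional and bucket constraints for BYOB")
    if not p.ingestion_and_maintenance_plan:
        gaps.append("define ingestion, inlining, compaction, and cleanup")
    if not p.query_consumers:
        gaps.append("confirm who queries the data and from which compute surface")
    return gaps
```

A proposal is ready to finalize only when `open_questions(proposal)` returns an empty list.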
Open Next
- references/DUCKLAKE_PLAYBOOK.md for the mode decision matrix, MotherDuck-specific SQL patterns, BYOB constraints, data-inlining behavior, maintenance functions, and common DuckLake mistakes
Related Skills
- motherduck-connect for choosing native DuckDB versus Postgres-endpoint access paths
- motherduck-load-data when the real issue is ingestion rather than storage format
- motherduck-model-data when the user still needs analytical table design after the storage decision
- motherduck-build-data-pipeline when DuckLake is just one part of a broader ingestion-to-serving workflow