motherduck-ducklake

Use DuckLake on MotherDuck

Use this skill when the storage decision is genuinely about open table format and object-store behavior, not just about where to put another analytical table.

Source Of Truth

  • Prefer current MotherDuck DuckLake docs first.
  • Use the upstream DuckLake and DuckDB extension docs only to clarify extension-level behavior that MotherDuck docs reference.
  • Keep the guidance aligned with the documented product posture:
    • native MotherDuck first
    • upstream DuckLake v1.0 is production-ready and supported by DuckDB 1.5.2, while MotherDuck's DuckLake docs still define the MotherDuck product surface and preview/compatibility limits
    • fully managed, BYOB, and own-compute paths are distinct
    • maintenance and compaction are explicit operations, not background magic

Default Posture

  • Start with native MotherDuck storage unless there is a concrete DuckLake requirement.
  • Reach for DuckLake when you need open-table-format semantics, object storage as the source of truth, BYOB, or file-aware maintenance.
  • Do not recommend DuckLake just because a workload is "large"; MotherDuck's docs explicitly note native storage is often faster for reads.
  • Choose the operating mode deliberately: fully managed for easiest evaluation, BYOB for customer bucket ownership, own compute only when the compute boundary matters too.
  • Document the fallback to native MotherDuck storage if the DuckLake requirement is weak, unverified, or only about future portability.
  • For DuckLake v1.0, data inlining, sorted tables, bucket partitioning, deletion vectors, or extension behavior, verify the current MotherDuck DuckLake docs and DuckDB/DuckLake version matrix before giving syntax guarantees.
  • Do not infer MotherDuck client/runtime support from upstream DuckDB release notes alone; check the MotherDuck lifecycle docs when the exact DuckDB version matters.
  • Keep the MotherDuck product surface separate from raw DuckLake-extension assumptions.
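
The posture above can be read as a small decision rule. The following is a minimal sketch, not a MotherDuck API: the `StorageRequirements` fields, the `recommend_storage` function, and the returned labels are all hypothetical names chosen for illustration. It assumes that own compute implies BYOB, which matches the three operating modes this skill distinguishes. Workload size is deliberately not an input, because "large" alone is not a DuckLake requirement.

```python
from dataclasses import dataclass


@dataclass
class StorageRequirements:
    """Hypothetical checklist of what the user actually needs."""
    needs_open_table_format: bool = False
    needs_object_store_source_of_truth: bool = False
    needs_customer_owned_bucket: bool = False  # BYOB
    needs_file_aware_maintenance: bool = False
    needs_own_compute: bool = False            # the compute boundary matters too
    requirement_verified: bool = False         # False if weak or only about future portability


def recommend_storage(req: StorageRequirements) -> str:
    """Return an illustrative storage recommendation for the given requirements."""
    concrete_need = any([
        req.needs_open_table_format,
        req.needs_object_store_source_of_truth,
        req.needs_customer_owned_bucket,
        req.needs_file_aware_maintenance,
    ])
    # Default and fallback: native storage unless a concrete, verified need exists.
    if not concrete_need or not req.requirement_verified:
        return "native MotherDuck storage"
    # Assumption: own compute is only offered together with a customer bucket.
    if req.needs_own_compute:
        return "DuckLake, BYOB with own compute"
    if req.needs_customer_owned_bucket:
        return "DuckLake, BYOB with MotherDuck compute"
    return "DuckLake, fully managed"


if __name__ == "__main__":
    print(recommend_storage(StorageRequirements(
        needs_customer_owned_bucket=True, requirement_verified=True)))
```

An unverified requirement falls back to native storage by design, which mirrors the guidance to document that fallback rather than adopt DuckLake speculatively.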

Workflow

  1. Confirm why native MotherDuck storage is insufficient.
  2. Pick the operating mode: fully managed, BYOB with MotherDuck compute, or BYOB with own compute.
  3. Verify regional and bucket constraints before proposing BYOB.
  4. Define the ingestion and maintenance posture up front, including data inlining, file compaction, and cleanup expectations.
  5. Validate who will query the data and from which compute surface before finalizing the architecture.
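
One way to keep the five workflow steps honest is to track them as an explicit checklist before proposing an architecture. The sketch below is illustrative only: `DuckLakePlan`, its fields, and `open_steps` are hypothetical names, and the mode strings simply repeat the three modes listed in step 2. It assumes the regional and bucket check only applies once a BYOB mode has been chosen.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three operating modes named in step 2 of the workflow.
MODES = (
    "fully managed",
    "BYOB with MotherDuck compute",
    "BYOB with own compute",
)


@dataclass
class DuckLakePlan:
    """Hypothetical record of the answers collected while following the workflow."""
    why_native_insufficient: Optional[str] = None
    operating_mode: Optional[str] = None
    region_and_bucket_verified: bool = False
    maintenance_posture: Optional[str] = None  # data inlining, compaction, cleanup
    query_surfaces: Optional[str] = None       # who queries, from which compute


def open_steps(plan: DuckLakePlan) -> List[str]:
    """Return the workflow steps that are still unresolved, in order."""
    steps: List[str] = []
    if not plan.why_native_insufficient:
        steps.append("1. confirm why native storage is insufficient")
    if plan.operating_mode not in MODES:
        steps.append("2. pick the operating mode")
    elif plan.operating_mode.startswith("BYOB") and not plan.region_and_bucket_verified:
        # Only BYOB modes need the regional and bucket check.
        steps.append("3. verify regional and bucket constraints")
    if not plan.maintenance_posture:
        steps.append("4. define ingestion and maintenance posture")
    if not plan.query_surfaces:
        steps.append("5. validate who queries and from which compute surface")
    return steps


if __name__ == "__main__":
    for step in open_steps(DuckLakePlan(operating_mode="BYOB with own compute")):
        print(step)
```

An empty result means every step has an answer on record; it says nothing about whether those answers are correct, so the current MotherDuck DuckLake docs remain the check for each one.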

Open Next

  • references/DUCKLAKE_PLAYBOOK.md: for the mode decision matrix, MotherDuck-specific SQL patterns, BYOB constraints, data-inlining behavior, maintenance functions, and common DuckLake mistakes

Related Skills

  • motherduck-connect: for choosing native DuckDB versus Postgres-endpoint access paths
  • motherduck-load-data: when the real issue is ingestion rather than storage format
  • motherduck-model-data: when the user still needs analytical table design after the storage decision
  • motherduck-build-data-pipeline: when DuckLake is just one part of a broader ingestion-to-serving workflow