clickhouse-best-practices
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseClickHouse Best Practices
ClickHouse最佳实践
Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
Official docs: ClickHouse Best Practices
这是一份针对ClickHouse的全面指南,涵盖表结构设计、查询优化和数据导入。包含3个主要类别(表结构、查询、插入)的28条规则,按影响优先级排序。
官方文档: ClickHouse最佳实践
IMPORTANT: How to Apply This Skill
重要提示:如何使用此技能
Before answering ClickHouse questions, follow this priority order:
- Check for applicable rules in the directory
rules/ - If rules exist: Apply them and cite them in your response using "Per ..."
rule-name - If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
- If uncertain: Use web search for current best practices
- Always cite your source: rule name, "general ClickHouse guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
在回答ClickHouse相关问题前,请遵循以下优先级顺序:
- 检查目录中的适用规则
rules/ - 若存在对应规则: 应用规则并在回复中使用「根据...」的格式引用
规则名称 - 若不存在对应规则: 使用大语言模型的ClickHouse相关知识或搜索官方文档
- 若不确定: 搜索当前的最佳实践
- 始终注明来源: 规则名称、「通用ClickHouse指南」或具体URL
规则优先级更高的原因: ClickHouse具有特定的特性(列式存储、稀疏索引、合并树机制),通用数据库的直觉可能会产生误导。这些规则整合了经过验证的、针对ClickHouse的专属指导。
For Formal Reviews
正式审核流程
When performing a formal review of schemas, queries, or data ingestion:
当对表结构、查询语句或数据导入进行正式审核时:
Review Procedures
审核步骤
For Schema Reviews (CREATE TABLE, ALTER TABLE)
表结构审核(CREATE TABLE、ALTER TABLE)
Read these rule files in order:
- - ORDER BY is immutable
rules/schema-pk-plan-before-creation.md - - Column ordering in keys
rules/schema-pk-cardinality-order.md - - Filter column inclusion
rules/schema-pk-prioritize-filters.md - - Proper type selection
rules/schema-types-native-types.md - - Numeric type sizing
rules/schema-types-minimize-bitwidth.md - - LowCardinality usage
rules/schema-types-lowcardinality.md - - Nullable vs DEFAULT
rules/schema-types-avoid-nullable.md - - Partition count limits
rules/schema-partition-low-cardinality.md - - Partitioning purpose
rules/schema-partition-lifecycle.md
Check for:
- PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- Data types match actual data ranges
- LowCardinality applied to appropriate string columns
- Partition key cardinality bounded (100-1,000 values)
- ReplacingMergeTree has version column if used
请按顺序阅读以下规则文件:
- - ORDER BY子句不可变更
rules/schema-pk-plan-before-creation.md - - 键中的列顺序
rules/schema-pk-cardinality-order.md - - 筛选列的纳入原则
rules/schema-pk-prioritize-filters.md - - 正确选择数据类型
rules/schema-types-native-types.md - - 数值类型的大小优化
rules/schema-types-minimize-bitwidth.md - - LowCardinality的使用
rules/schema-types-lowcardinality.md - - Nullable与DEFAULT的选择
rules/schema-types-avoid-nullable.md - - 分区数量限制
rules/schema-partition-low-cardinality.md - - 分区的用途
rules/schema-partition-lifecycle.md
检查要点:
- PRIMARY KEY / ORDER BY的列顺序(从低基数到高基数)
- 数据类型与实际数据范围匹配
- 对合适的字符串列应用LowCardinality类型
- 分区键的基数控制在100-1000个值之间
- 若使用ReplacingMergeTree引擎,需包含版本列
For Query Reviews (SELECT, JOIN, aggregations)
查询审核(SELECT、JOIN、聚合操作)
Read these rule files:
- - Algorithm selection
rules/query-join-choose-algorithm.md - - Pre-join filtering
rules/query-join-filter-before.md - - ANY vs regular JOIN
rules/query-join-use-any.md - - Secondary index usage
rules/query-index-skipping-indices.md - - Filter alignment with ORDER BY
rules/schema-pk-filter-on-orderby.md
Check for:
- Filters use ORDER BY prefix columns
- JOINs filter tables before joining (not after)
- Correct JOIN algorithm for table sizes
- Skipping indices for non-ORDER BY filter columns
请阅读以下规则文件:
- - 算法选择
rules/query-join-choose-algorithm.md - - JOIN前筛选
rules/query-join-filter-before.md - - ANY JOIN与常规JOIN的选择
rules/query-join-use-any.md - - 二级索引的使用
rules/query-index-skipping-indices.md - - 筛选条件与ORDER BY的对齐
rules/schema-pk-filter-on-orderby.md
检查要点:
- 筛选条件使用ORDER BY的前缀列
- 在JOIN前对表进行筛选(而非之后)
- 根据表大小选择正确的JOIN算法
- 为非ORDER BY的筛选列设置跳数索引
For Insert Strategy Reviews (data ingestion, updates, deletes)
插入策略审核(数据导入、更新、删除)
Read these rule files:
- - Batch sizing requirements
rules/insert-batch-size.md - - UPDATE alternatives
rules/insert-mutation-avoid-update.md - - DELETE alternatives
rules/insert-mutation-avoid-delete.md - - Async insert usage
rules/insert-async-small-batches.md - - OPTIMIZE TABLE risks
rules/insert-optimize-avoid-final.md
Check for:
- Batch size 10K-100K rows per INSERT
- No ALTER TABLE UPDATE for frequent changes
- ReplacingMergeTree or CollapsingMergeTree for update patterns
- Async inserts enabled for high-frequency small batches
请阅读以下规则文件:
- - 批量大小要求
rules/insert-batch-size.md - - UPDATE的替代方案
rules/insert-mutation-avoid-update.md - - DELETE的替代方案
rules/insert-mutation-avoid-delete.md - - 异步插入的使用
rules/insert-async-small-batches.md - - OPTIMIZE TABLE的风险
rules/insert-optimize-avoid-final.md
检查要点:
- 每个INSERT语句的批量大小为10000-100000行
- 避免对频繁变更的场景使用ALTER TABLE UPDATE
- 对更新场景使用ReplacingMergeTree或CollapsingMergeTree
- 对高频小批量场景启用异步插入
Output Format
回复格式
Structure your response as follows:
undefined请按照以下结构组织回复:
undefinedRules Checked
已检查规则
- - Compliant / Violation found
rule-name-1 - - Compliant / Violation found ...
rule-name-2
- - 符合要求 / 发现违规
rule-name-1 - - 符合要求 / 发现违规 ...
rule-name-2
Findings
审核结果
Violations
违规项
- : Description of the issue
rule-name- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
- : 问题描述
rule-name- 当前情况:[代码的实际行为]
- 要求:[应有的行为]
- 修复方案:[具体修正措施]
Compliant
合规项
- : Brief note on why it's correct
rule-name
- : 符合要求的简要说明
rule-name
Recommendations
建议
[Prioritized list of changes, citing rules]
---[按优先级排序的变更列表,需引用对应规则]
---Rule Categories by Priority
规则类别优先级
| Priority | Category | Impact | Prefix | Rule Count |
|---|---|---|---|---|
| 1 | Primary Key Selection | CRITICAL | | 4 |
| 2 | Data Type Selection | CRITICAL | | 5 |
| 3 | JOIN Optimization | CRITICAL | | 5 |
| 4 | Insert Batching | CRITICAL | | 1 |
| 5 | Mutation Avoidance | CRITICAL | | 2 |
| 6 | Partitioning Strategy | HIGH | | 4 |
| 7 | Skipping Indices | HIGH | | 1 |
| 8 | Materialized Views | HIGH | | 2 |
| 9 | Async Inserts | HIGH | | 2 |
| 10 | OPTIMIZE Avoidance | HIGH | | 1 |
| 11 | JSON Usage | MEDIUM | | 1 |
| 优先级 | 类别 | 影响级别 | 前缀 | 规则数量 |
|---|---|---|---|---|
| 1 | 主键选择 | CRITICAL | | 4 |
| 2 | 数据类型选择 | CRITICAL | | 5 |
| 3 | JOIN优化 | CRITICAL | | 5 |
| 4 | 插入批量处理 | CRITICAL | | 1 |
| 5 | 避免变更操作 | CRITICAL | | 2 |
| 6 | 分区策略 | HIGH | | 4 |
| 7 | 跳数索引 | HIGH | | 1 |
| 8 | 物化视图 | HIGH | | 2 |
| 9 | 异步插入 | HIGH | | 2 |
| 10 | 避免OPTIMIZE操作 | HIGH | | 1 |
| 11 | JSON使用 | MEDIUM | | 1 |
Quick Reference
快速参考
Schema Design - Primary Key (CRITICAL)
表结构设计 - 主键(CRITICAL)
- - Plan ORDER BY before table creation (immutable)
schema-pk-plan-before-creation - - Order columns low-to-high cardinality
schema-pk-cardinality-order - - Include frequently filtered columns
schema-pk-prioritize-filters - - Query filters must use ORDER BY prefix
schema-pk-filter-on-orderby
- - 创建表前规划ORDER BY子句(不可变更)
schema-pk-plan-before-creation - - 列顺序按从低基数到高基数排列
schema-pk-cardinality-order - - 纳入频繁用于筛选的列
schema-pk-prioritize-filters - - 查询筛选条件必须使用ORDER BY的前缀列
schema-pk-filter-on-orderby
Schema Design - Data Types (CRITICAL)
表结构设计 - 数据类型(CRITICAL)
- - Use native types, not String for everything
schema-types-native-types - - Use smallest numeric type that fits
schema-types-minimize-bitwidth - - LowCardinality for <10K unique strings
schema-types-lowcardinality - - Enum for finite value sets with validation
schema-types-enum - - Avoid Nullable; use DEFAULT instead
schema-types-avoid-nullable
- - 使用原生类型,避免所有列都用String类型
schema-types-native-types - - 使用能容纳数据的最小数值类型
schema-types-minimize-bitwidth - - 对唯一值少于10000的字符串列使用LowCardinality
schema-types-lowcardinality - - 对有限值集合使用Enum类型并启用验证
schema-types-enum - - 避免使用Nullable;改用DEFAULT
schema-types-avoid-nullable
Schema Design - Partitioning (HIGH)
表结构设计 - 分区(HIGH)
- - Keep partition count 100-1,000
schema-partition-low-cardinality - - Use partitioning for data lifecycle, not queries
schema-partition-lifecycle - - Understand partition pruning trade-offs
schema-partition-query-tradeoffs - - Consider starting without partitioning
schema-partition-start-without
- - 分区数量保持在100-1000之间
schema-partition-low-cardinality - - 分区用于数据生命周期管理,而非查询优化
schema-partition-lifecycle - - 了解分区修剪的权衡
schema-partition-query-tradeoffs - - 考虑初始不使用分区
schema-partition-start-without
Schema Design - JSON (MEDIUM)
表结构设计 - JSON(MEDIUM)
- - JSON for dynamic schemas; typed columns for known
schema-json-when-to-use
- - 动态表结构使用JSON;已知结构使用类型化列
schema-json-when-to-use
Query Optimization - JOINs (CRITICAL)
查询优化 - JOIN(CRITICAL)
- - Select algorithm based on table sizes
query-join-choose-algorithm - - ANY JOIN when only one match needed
query-join-use-any - - Filter tables before joining
query-join-filter-before - - Dictionaries/denormalization vs JOIN
query-join-consider-alternatives - - join_use_nulls=0 for default values
query-join-null-handling
- - 根据表大小选择JOIN算法
query-join-choose-algorithm - - 仅需单个匹配时使用ANY JOIN
query-join-use-any - - JOIN前对表进行筛选
query-join-filter-before - - 考虑使用字典/反规范化替代JOIN
query-join-consider-alternatives - - 启用join_use_nulls=0以使用默认值
query-join-null-handling
Query Optimization - Indices (HIGH)
查询优化 - 索引(HIGH)
- - Skipping indices for non-ORDER BY filters
query-index-skipping-indices
- - 为非ORDER BY的筛选列设置跳数索引
query-index-skipping-indices
Query Optimization - Materialized Views (HIGH)
查询优化 - 物化视图(HIGH)
- - Incremental MVs for real-time aggregations
query-mv-incremental - - Refreshable MVs for complex joins
query-mv-refreshable
- - 实时聚合使用增量物化视图
query-mv-incremental - - 复杂JOIN使用可刷新物化视图
query-mv-refreshable
Insert Strategy - Batching (CRITICAL)
插入策略 - 批量处理(CRITICAL)
- - Batch 10K-100K rows per INSERT
insert-batch-size
- - 每个INSERT语句批量处理10000-100000行
insert-batch-size
Insert Strategy - Async (HIGH)
插入策略 - 异步(HIGH)
- - Async inserts for high-frequency small batches
insert-async-small-batches - - Native format for best performance
insert-format-native
- - 高频小批量场景使用异步插入
insert-async-small-batches - - 使用原生格式以获得最佳性能
insert-format-native
Insert Strategy - Mutations (CRITICAL)
插入策略 - 变更操作(CRITICAL)
- - ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-update - - Lightweight DELETE or DROP PARTITION
insert-mutation-avoid-delete
- - 使用ReplacingMergeTree替代ALTER UPDATE
insert-mutation-avoid-update - - 使用轻量DELETE或DROP PARTITION
insert-mutation-avoid-delete
Insert Strategy - Optimization (HIGH)
插入策略 - 优化(HIGH)
- - Let background merges work
insert-optimize-avoid-final
- - 让后台合并进程自动处理
insert-optimize-avoid-final
When to Apply
适用场景
This skill activates when you encounter:
- statements
CREATE TABLE - modifications
ALTER TABLE - or
ORDER BYdiscussionsPRIMARY KEY - Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
当遇到以下场景时,激活此技能:
- 语句
CREATE TABLE - 修改操作
ALTER TABLE - ORDER BY或PRIMARY KEY相关讨论
- 数据类型选择问题
- 慢查询排查
- JOIN优化请求
- 数据导入管道设计
- 更新/删除策略问题
- ReplacingMergeTree或其他专用引擎的使用
- 分区策略决策
Rule File Structure
规则文件结构
Each rule file in contains:
rules/- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with explanation
- Correct example: Best practice with explanation
- Additional context: Trade-offs, when to apply, references
rules/- YAML前置元数据:标题、影响级别、标签
- 简要说明:规则的重要性
- 错误示例:反模式及说明
- 正确示例:最佳实践及说明
- 附加上下文:权衡、适用场景、参考资料
Full Compiled Document
完整编译文档
For the complete guide with all rules expanded inline:
AGENTS.mdUse when you need to check multiple rules quickly without reading individual files.
AGENTS.md如需查看包含所有展开规则的完整指南,请参考:
AGENTS.md当需要快速检查多个规则而无需阅读单个文件时,使用。
AGENTS.md