cassandra

Apache Cassandra

Apache Cassandra is a wide-column database designed for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it a strong fit for mission-critical data.

When to Use

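  • High Write Throughput: Write capacity scales roughly linearly with node count; large clusters can ingest millions of writes per second.
  • Always On: No single point of failure. Writes can still succeed while some nodes are down, as long as enough replicas respond for the chosen consistency level (eventual consistency).
  • Multi-Region: Active-active multi-region replication is built in.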
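
Multi-region replication is configured per keyspace. A minimal sketch, assuming a hypothetical keyspace `app` and two datacenters named `dc_east` and `dc_west` (illustrative names; they must match the datacenter names the cluster itself reports):

```sql
-- Keyspace and datacenter names are illustrative assumptions.
CREATE KEYSPACE IF NOT EXISTS app
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_east': 3,   -- three replicas in the first datacenter
    'dc_west': 3    -- three replicas in the second datacenter
  };
```

Every datacenter accepts reads and writes, which is what makes the setup active-active.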

Quick Start (CQL)

```sql
CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  name text,
  email text
);

INSERT INTO users (user_id, name) VALUES (uuid(), 'Alice');
```
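
The statements above need a keyspace to run in. A minimal sketch for a single-node development setup, assuming a hypothetical keyspace named `demo`: run the first two statements before `CREATE TABLE`, and the `SELECT` after the `INSERT`.

```sql
-- 'demo' is an illustrative name; SimpleStrategy with one replica suits a local test node only.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;

-- Read the row back. An unrestricted read like this is fine for a tiny demo table,
-- but not for production data.
SELECT user_id, name, email FROM users;
```

The `email` column comes back as null because the `INSERT` did not set it.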

Core Concepts

Partition Key & Clustering Key

  • Partition Key: Hashed to a token that determines which nodes (replicas) hold the data.
  • Clustering Key: Defines the sort order of rows within a partition on disk.
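
As an illustration, here is a sketch of a table that uses both kinds of key. The table and column names are hypothetical and not part of the original quick start.

```sql
-- Hypothetical table: sensor_id is the partition key, reading_time the clustering key.
CREATE TABLE sensor_readings (
  sensor_id uuid,
  reading_time timestamp,
  temperature double,
  PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Served from a single partition; rows come back newest first.
SELECT reading_time, temperature
FROM sensor_readings
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
LIMIT 10;
```

All readings for one sensor live in one partition, so the query touches a single replica set and needs no extra sorting.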

Tunable Consistency

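You choose the consistency level per query:
  • ANY
    : Writes only. Fastest, weakest guarantee; the write succeeds even if only a hint is stored.
  • QUORUM
    : A majority of replicas must acknowledge. Balanced.
  • ALL
    : Every replica must acknowledge. Slowest, strongest consistency, but a single unavailable replica fails the request.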
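
A short sketch of how this looks in `cqlsh`, reusing the `users` table from the quick start and assuming a replication factor of 3. Note that `CONSISTENCY` is a `cqlsh` shell command rather than a CQL statement; application drivers set the level per statement through their own APIs.

```sql
-- cqlsh shell command: applies to the statements that follow in this session.
CONSISTENCY QUORUM;

-- With a replication factor of 3, QUORUM waits for 2 replicas (floor(3 / 2) + 1).
INSERT INTO users (user_id, name)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Bob');

-- A QUORUM read after a QUORUM write overlaps with the write on at least one replica.
SELECT name FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```

The general rule is that reads see the latest write when read replicas plus write replicas exceed the replication factor.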

Vector Search (5.0+)

Cassandra 5.0 adds a native vector data type and approximate nearest neighbor (ANN) search, so it can serve as a vector database for AI applications.
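
A minimal sketch of vector search on Cassandra 5.0, assuming a hypothetical `docs` table and toy 3-dimensional embeddings (real embeddings usually have hundreds of dimensions):

```sql
-- Table, index, and column names are illustrative assumptions.
CREATE TABLE docs (
  id uuid PRIMARY KEY,
  body text,
  embedding vector<float, 3>
);

-- ANN queries require a storage-attached index on the vector column.
CREATE INDEX docs_embedding_idx ON docs (embedding) USING 'sai';

INSERT INTO docs (id, body, embedding)
VALUES (uuid(), 'hello world', [0.1, 0.2, 0.3]);

-- Return the 3 rows whose embeddings are nearest to the query vector.
SELECT id, body
FROM docs
ORDER BY embedding ANN OF [0.1, 0.2, 0.25]
LIMIT 3;
```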

Best Practices (2025)

Do:
  • Query by Partition Key: Always restrict queries by partition key. Avoid full-table scans (queries that need ALLOW FILTERING) in production.
  • Use SAI (Storage Attached Indexes): New in 5.0 and generally a better choice than the older secondary indexes.
  • Denormalize: Optimize the schema for reads. It is fine to duplicate data into 3 tables to satisfy 3 different query patterns.
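
A sketch of what denormalizing and SAI can look like together. The tables, columns, and index name are hypothetical, chosen only to illustrate one table per query pattern.

```sql
-- The same user data stored twice, one table per query pattern.
CREATE TABLE users_by_id (
  user_id uuid PRIMARY KEY,
  name text,
  email text,
  country text
);

CREATE TABLE users_by_email (
  email text PRIMARY KEY,
  user_id uuid,
  name text
);

-- The application writes both tables; a logged batch ensures both writes eventually apply.
BEGIN BATCH
  INSERT INTO users_by_id (user_id, name, email, country)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'Alice', 'alice@example.com', 'US');
  INSERT INTO users_by_email (email, user_id, name)
    VALUES ('alice@example.com', 123e4567-e89b-12d3-a456-426614174000, 'Alice');
APPLY BATCH;

-- SAI index (Cassandra 5.0+) for an occasional filter on a non-key column.
CREATE INDEX users_by_id_country_idx ON users_by_id (country) USING 'sai';
SELECT user_id, name FROM users_by_id WHERE country = 'US';
```

Lookups by ID and by email each hit a single partition, while the SAI index covers the less frequent filter without a third table.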
Don't:
  • Don't rely on joins: CQL has no JOIN. Combine data in the application or denormalize.
  • Don't use large partitions: As a rule of thumb, keep partitions under 100 MB to avoid compaction issues.
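
One common way to keep partitions small is to add a time bucket to the partition key. A minimal sketch, assuming a hypothetical event table bucketed by day:

```sql
-- Hypothetical table: (device_id, day) is a composite partition key,
-- so each device gets a new partition every day instead of one that grows forever.
CREATE TABLE events_by_device_day (
  device_id uuid,
  day date,
  event_time timestamp,
  payload text,
  PRIMARY KEY ((device_id, day), event_time)
);

-- Queries must supply the full partition key: both the device and the day.
SELECT event_time, payload
FROM events_by_device_day
WHERE device_id = 123e4567-e89b-12d3-a456-426614174000
  AND day = '2025-01-15';
```

The bucket size is a design choice: pick the coarsest interval that still keeps the busiest partition comfortably under the size limit.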
