Apache Cassandra

Cassandra is a wide-column store database designed for scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.

When to Use

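High Write Throughput: A cluster can ingest millions of writes per second, and throughput grows roughly linearly as nodes are added.
Always On: No single point of failure. Writes can still succeed while some nodes are down, and replicas converge later (eventual consistency).
Multi-Region: Active-active replication across data centers/regions is built in.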
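"Always On" depends on replicas converging after they have diverged. Cassandra reconciles conflicting versions of a cell by write timestamp (last write wins). The sketch below models that idea with plain Python dictionaries. The `Replica` class, `write`, and `sync` are hypothetical simplifications. They are not Cassandra's hinted handoff, read repair, or anti-entropy repair.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical model): maps key -> (write_timestamp, value)."""
    name: str
    up: bool = True
    cells: dict = field(default_factory=dict)

    def apply(self, key, ts, value):
        # Last write wins: keep whichever version has the higher timestamp.
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)


def write(replicas, key, ts, value):
    """Send a write to every live replica; return the number of acknowledgements."""
    acks = 0
    for r in replicas:
        if r.up:
            r.apply(key, ts, value)
            acks += 1
    return acks


def sync(replicas):
    """Push every cell to every replica (a stand-in for repair)."""
    for src in replicas:
        for key, (ts, value) in list(src.cells.items()):
            for dst in replicas:
                dst.apply(key, ts, value)


a, b, c = Replica("a"), Replica("b"), Replica("c")
nodes = [a, b, c]

c.up = False                                   # one node is down
acks = write(nodes, "user:1", ts=1, value="Alice")
c.up = True                                    # node returns, still missing the write
stale = c.cells.get("user:1")
sync(nodes)                                    # replicas converge

# Concurrent updates on different replicas: the higher timestamp wins everywhere.
a.apply("user:1", 3, "Alicia")
b.apply("user:1", 2, "Ally")
sync(nodes)
```

Here the write succeeds with 2 of 3 acknowledgements while `c` is down. After `sync`, all three replicas agree on the value with the highest timestamp.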

Quick Start (CQL)

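```sql
-- Illustrative keyspace for a single-node dev cluster (name and RF=1 are assumptions)
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;

CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  name text,
  email text
);
-- uuid() generates a random (version 4) UUID on the server
INSERT INTO users (user_id, name) VALUES (uuid(), 'Alice');
```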

Core Concepts

Partition Key & Clustering Key

Partition Key: Hashed into a token that determines which nodes (replicas) store the data.
Clustering Key: Determines the on-disk sort order of rows within a partition.
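The following toy model makes the two roles concrete. The partition key is hashed to a token on a ring to pick replica nodes, and rows inside a partition stay sorted by clustering key, so range reads within one partition are cheap. Several assumptions apply. MD5 stands in for the real partitioner (Cassandra's default is Murmur3Partitioner). Each node owns a single token, with no vnodes. The node names, tokens, replication factor, and sensor schema are all hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2**64
# Hypothetical 4-node ring, one token per node (real clusters use vnodes).
NODE_TOKENS = [
    (0, "node-a"),
    (RING_SIZE // 4, "node-b"),
    (RING_SIZE // 2, "node-c"),
    (3 * RING_SIZE // 4, "node-d"),
]
TOKENS = [token for token, _ in NODE_TOKENS]


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token in [0, 2**64).

    Stand-in hash only; Murmur3Partitioner produces signed 64-bit tokens.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(partition_key: str, rf: int = 3) -> list:
    """Walk clockwise from the key's token and take the next `rf` nodes."""
    start = bisect.bisect_left(TOKENS, token_for(partition_key)) % len(TOKENS)
    return [NODE_TOKENS[(start + i) % len(TOKENS)][1] for i in range(rf)]


class Partition:
    """All rows sharing one partition key, kept sorted by clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys in sorted order
        self._rows = {}   # clustering key -> row

    def upsert(self, clustering_key, row):
        if clustering_key not in self._rows:
            bisect.insort(self._keys, clustering_key)
        self._rows[clustering_key] = row  # same key overwrites (an upsert)

    def range_read(self, start, end):
        """Rows with start <= clustering key < end, returned in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


# Hypothetical table: partition key = sensor id, clustering key = timestamp.
table = {}
for sensor, ts, temp in [("s1", 30, 21.5), ("s1", 10, 20.0), ("s1", 20, 20.7), ("s2", 15, 18.2)]:
    table.setdefault(sensor, Partition()).upsert(ts, {"temp": temp})

owners = replicas_for("s1")               # which nodes store partition "s1"
window = table["s1"].range_read(10, 30)   # sorted range read inside one partition
```

The rows were inserted out of order, yet `range_read` returns them sorted by timestamp. Every row for `"s1"` lives on the same replica set, so the read touches a single partition.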

Tunable Consistency

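You choose the consistency level per query:

`ANY`: Fastest, weakest guarantee. Writes only; a write can succeed even if it is stored only as a hint, before any replica has applied it.
`QUORUM`: A majority of replicas (floor(RF/2) + 1) must acknowledge. Balances latency and consistency.
`ALL`: Every replica must acknowledge. Slowest and strongest, but the request fails if any replica is down.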
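The arithmetic behind these levels is simple. With replication factor RF, `QUORUM` needs floor(RF/2) + 1 replicas. A read is guaranteed to see the latest acknowledged write when the read and write replica sets must overlap, i.e. R + W > RF. The helpers below are hypothetical illustrations of that rule, not a driver API. They cover a single data center and include `ONE` for comparison. `ANY` is left out because it can succeed with zero replica writes.

```python
def required_acks(level: str, rf: int) -> int:
    """Replica acknowledgements needed for a consistency level (single DC, simplified)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"level not modeled here: {level}")


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set overlaps every write set: R + W > RF."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be down while requests at this level still succeed."""
    return rf - required_acks(level, rf)


rf = 3
quorum_pair = is_strongly_consistent("QUORUM", "QUORUM", rf)  # 2 + 2 > 3
one_pair = is_strongly_consistent("ONE", "ONE", rf)           # 1 + 1 > 3 is false
```

With RF = 3, `QUORUM` reads and writes overlap while still tolerating one replica being down. That trade-off is why `QUORUM` is described as balanced.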

Vector Search (5.0+)

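Cassandra 5.0 adds native vector search: a `vector<float, n>` column type plus approximate nearest neighbor (ANN) queries backed by SAI. This lets Cassandra serve as a vector database for AI applications.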
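Conceptually, an ANN query returns the stored vectors closest to a query vector. In CQL this is roughly a `vector<float, n>` column with an SAI index, queried with `ORDER BY <column> ANN OF [...] LIMIT k`. The sketch below performs the exact version: a brute-force top-k by cosine similarity over an in-memory list. An ANN index approximates this result much faster. This is not how Cassandra's index works internally, and the document IDs and 3-dimensional embeddings are made up.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query, rows, k=2):
    """Exact nearest neighbours by cosine similarity; ANN indexes approximate this."""
    return heapq.nlargest(k, rows, key=lambda row: cosine(query, row[1]))


# Hypothetical rows: (id, embedding)
docs = [
    ("doc-1", [1.0, 0.0, 0.0]),
    ("doc-2", [0.9, 0.1, 0.0]),
    ("doc-3", [0.0, 1.0, 0.0]),
    ("doc-4", [0.0, 0.0, 1.0]),
]
hits = top_k([1.0, 0.05, 0.0], docs, k=2)
```

Brute force costs one similarity computation per stored row. That cost is what an ANN index avoids, in exchange for occasionally missing a true nearest neighbour.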

Best Practices (2025)

Do:

Query by Partition Key: Always. Avoid full-table scans (queries that require ALLOW FILTERING) in production.
Use SAI (Storage Attached Indexes): New in 5.0 and generally more efficient than the legacy secondary indexes.
Denormalize: Design tables around your read queries. It is fine to duplicate data into 3 tables to satisfy 3 different query patterns.
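Here is query-first design in miniature. The same user record is written to one table per access pattern, and each table is keyed by the column its query filters on, so every read becomes a single partition lookup. The table names and `save_user` are hypothetical. In a real cluster these would be separate CQL tables, often written together in a logged batch.

```python
# Hypothetical query tables, each modeled as {partition_key: data}.
users_by_id = {}
users_by_email = {}
users_by_country = {}   # partition: country -> {user_id: user}


def save_user(user):
    """Write the same record to every table that serves a query pattern."""
    users_by_id[user["user_id"]] = user
    users_by_email[user["email"]] = user
    users_by_country.setdefault(user["country"], {})[user["user_id"]] = user


for u in [
    {"user_id": "u1", "email": "alice@example.com", "name": "Alice", "country": "NL"},
    {"user_id": "u2", "email": "bob@example.com", "name": "Bob", "country": "NL"},
    {"user_id": "u3", "email": "chen@example.com", "name": "Chen", "country": "CN"},
]:
    save_user(u)

# Each query reads exactly one partition: no scan, no join.
by_email = users_by_email["bob@example.com"]["name"]
dutch_users = sorted(u["name"] for u in users_by_country["NL"].values())
```

The cost is write amplification and keeping the copies in sync. For example, changing a user's email also means removing the old `users_by_email` entry.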

Don't:

Don't rely on joins: Cassandra has no JOIN operation. Denormalize, or join in the application.
Don't let partitions grow large: Keep each partition under about 100MB to avoid compaction issues.
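You can check the 100MB guideline with back-of-the-envelope arithmetic: multiply rows per partition by average row size. If the result is too large, add a time bucket to the partition key, for example `(sensor_id, day)` instead of `sensor_id`. The write rate, row size, and bucket choices below are assumed figures for illustration. Real on-disk size also depends on per-row overhead and compression.

```python
# Guideline from the text: keep each partition under ~100MB.
LIMIT_BYTES = 100 * 1024 * 1024

# Candidate time buckets, finest to coarsest (hypothetical choices).
BUCKETS = {"hour": 3_600, "day": 86_400, "week": 7 * 86_400, "month": 30 * 86_400}


def partition_bytes(rows_per_second, row_bytes, bucket_seconds):
    """Rough estimate: rows landing in one partition x average row size."""
    return rows_per_second * bucket_seconds * row_bytes


def largest_safe_bucket(rows_per_second, row_bytes, limit=LIMIT_BYTES):
    """Coarsest bucket whose partitions stay under the limit, or None."""
    safe = [name for name, secs in BUCKETS.items()
            if partition_bytes(rows_per_second, row_bytes, secs) < limit]
    return safe[-1] if safe else None


def bucket_for(epoch_seconds, bucket_seconds):
    """Bucket id to add to the partition key, e.g. (sensor_id, bucket)."""
    return epoch_seconds // bucket_seconds


# One 200-byte reading per second from a single sensor (assumed figures).
per_day = partition_bytes(1, 200, BUCKETS["day"])      # 17,280,000 bytes
per_month = partition_bytes(1, 200, BUCKETS["month"])  # 518,400,000 bytes
choice = largest_safe_bucket(1, 200)
```

At this rate, a single unbucketed partition would pass 100MB in about a week. Daily buckets keep each partition near 17MB.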
