mvcc

MVCC Guide (Experimental)

Multi-Version Concurrency Control. Work in progress, not production-ready.
CRITICAL: Ignore MVCC when debugging unless the bug is MVCC-specific.

Enabling MVCC

```sql
PRAGMA journal_mode = 'experimental_mvcc';
```

Runtime configuration, not a compile-time feature flag. Per-database setting.

How It Works

Standard WAL: single version per page, readers see snapshot at read mark time.
MVCC: multiple row versions, snapshot isolation. Each transaction sees consistent snapshot at begin time.

Key Differences from WAL

| Aspect | WAL | MVCC |
|---|---|---|
| Write granularity | Every commit writes full pages | Affected rows only |
| Readers/Writers | Don't block each other | Don't block each other |
| Persistence | `.db-wal` | `.db-log` (logical log) |
| Isolation | Snapshot (page-level) | Snapshot (row-level) |

Versioning

Each row version tracks:
  • `begin` - timestamp when visible
  • `end` - timestamp when deleted/replaced
  • `btree_resident` - existed before MVCC enabled
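
The visibility rule implied by `begin` and `end` can be sketched as follows. This is a minimal illustration, not the code in `core/mvcc`: the struct layout, the `u64` timestamps, the `data` field, and the helper names `is_visible_to` and `visible_version` are all assumptions made for the example, and it ignores uncommitted versions entirely.

```rust
/// Hypothetical, simplified row version. Field names follow the list above;
/// the types and the `data` payload are assumptions for this sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct RowVersion {
    begin: u64,           // timestamp at which this version becomes visible
    end: Option<u64>,     // timestamp at which it was deleted/replaced, if any
    btree_resident: bool, // row already existed in the B-tree before MVCC was enabled
    data: String,
}

impl RowVersion {
    /// A version is visible to a snapshot taken at `snapshot_ts` when it began
    /// at or before the snapshot and had not yet ended at that time.
    fn is_visible_to(&self, snapshot_ts: u64) -> bool {
        self.begin <= snapshot_ts && self.end.map_or(true, |end| snapshot_ts < end)
    }
}

/// Returns the newest version in the chain that the snapshot can see, if any.
/// Assumes `versions` is ordered from oldest to newest.
fn visible_version(versions: &[RowVersion], snapshot_ts: u64) -> Option<&RowVersion> {
    versions.iter().rev().find(|v| v.is_visible_to(snapshot_ts))
}
```

With a chain of two versions, one valid over timestamps 1 to 5 and its replacement starting at 5, a transaction whose snapshot is at 3 reads the first version, while one whose snapshot is at 5 or later reads the replacement.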

Architecture

Database
  └─ mv_store: MvStore
      ├─ rows: SkipMap<RowID, Vec<RowVersion>>
      ├─ txs: SkipMap<TxID, Transaction>
      ├─ Storage (.db-log file)
      └─ CheckpointStateMachine
Per-connection: `mv_tx` tracks current MVCC transaction.
Shared: `MvStore` with lock-free `crossbeam_skiplist` structures.
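
The split between shared and per-connection state can be sketched like this. It is a rough model under stated assumptions: `RwLock<BTreeMap>` stands in for the lock-free `crossbeam_skiplist::SkipMap` so that only the standard library is needed, and the `Connection` type, the `clock` counter, `begin_tx`, and the use of the begin timestamp as the transaction ID are all hypothetical simplifications.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

type RowId = i64;
type TxId = u64;

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct RowVersion {
    begin: u64,
    end: Option<u64>,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Transaction {
    begin_ts: u64,
}

/// Stand-in for the shared store. The layout mirrors the diagram above, but
/// locked BTreeMaps replace the lock-free skip lists (an assumption made to
/// keep the sketch dependency-free).
#[allow(dead_code)]
#[derive(Default)]
struct MvStore {
    rows: RwLock<BTreeMap<RowId, Vec<RowVersion>>>,
    txs: RwLock<BTreeMap<TxId, Transaction>>,
    clock: AtomicU64, // hypothetical logical clock handing out timestamps
}

impl MvStore {
    /// Registers a new transaction and returns its ID.
    fn begin_tx(&self) -> TxId {
        let ts = self.clock.fetch_add(1, Ordering::SeqCst) + 1;
        self.txs.write().unwrap().insert(ts, Transaction { begin_ts: ts });
        ts
    }
}

/// Each connection shares the store but tracks its own current transaction.
struct Connection {
    mv_store: Arc<MvStore>,
    mv_tx: Option<TxId>,
}

impl Connection {
    fn new(mv_store: Arc<MvStore>) -> Self {
        Connection { mv_store, mv_tx: None }
    }

    fn begin(&mut self) -> TxId {
        let id = self.mv_store.begin_tx();
        self.mv_tx = Some(id);
        id
    }
}
```

Two connections created from clones of the same `Arc<MvStore>` register their transactions in the same `txs` map, while each one's `mv_tx` holds only its own transaction ID.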

Key Files

  • `core/mvcc/mod.rs` - Module overview
  • `core/mvcc/database/mod.rs` - Main implementation (~3000 lines)
  • `core/mvcc/cursor.rs` - Merged MVCC + B-tree cursor
  • `core/mvcc/persistent_storage/logical_log.rs` - Disk format
  • `core/mvcc/database/checkpoint_state_machine.rs` - Checkpoint logic

Checkpointing

Flushes row versions to B-tree periodically.
```sql
PRAGMA mvcc_checkpoint_threshold = <pages>;
```
Process: acquire lock → begin pager txn → write rows → commit → truncate log → fsync → release.
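
The order of those steps can be modeled in a few lines. This is only a sketch of the sequence, not the logic in `checkpoint_state_machine.rs`: the `Store` and `State` types, the `pending` map, and the `fsyncs` counter are hypothetical, in-memory maps stand in for the B-tree and the `.db-log` file, and a single `Mutex` plays the role of the checkpoint lock.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

type RowId = i64;

#[derive(Default)]
struct State {
    /// Committed changes not yet flushed: latest value per row, `None` = deleted.
    pending: BTreeMap<RowId, Option<String>>,
    /// Stand-in for the B-tree in the main database file.
    btree: BTreeMap<RowId, String>,
    /// Stand-in for the records in the logical log.
    logical_log: Vec<String>,
    /// Counts simulated fsync calls.
    fsyncs: u32,
}

struct Store {
    state: Mutex<State>,
}

/// Runs one checkpoint and returns the number of rows flushed.
fn checkpoint(store: &Store) -> usize {
    // 1. Acquire the lock; it is held until `guard` goes out of scope.
    let mut guard = store.state.lock().unwrap();
    let state = &mut *guard;

    // 2. Begin the pager transaction: stage changes on a copy of the B-tree.
    let mut staged = state.btree.clone();

    // 3. Write rows: apply each pending change to the staged copy.
    let flushed = state.pending.len();
    for (id, value) in std::mem::take(&mut state.pending) {
        match value {
            Some(v) => {
                staged.insert(id, v);
            }
            None => {
                staged.remove(&id);
            }
        }
    }

    // 4. Commit: the staged copy becomes the B-tree.
    state.btree = staged;

    // 5. Truncate the log: its contents now live in the B-tree.
    state.logical_log.clear();

    // 6. fsync (simulated here by a counter).
    state.fsyncs += 1;

    // 7. Release: the guard is dropped when the function returns.
    flushed
}
```

In this model the log is truncated only after the commit, so the B-tree already holds every row before the log records are discarded.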

Current Limitations

Not implemented:
  • Garbage collection (old versions accumulate)
  • Recovery from logical log on restart
Known issues:
  • Checkpoint blocks other transactions, even reads!
  • No spilling to disk; memory use concerns

Testing

```bash
# Run MVCC-specific tests
cargo test mvcc

# TCL tests with MVCC
make test-mvcc
```

Use `#[turso_macros::test(mvcc)]` attribute for MVCC-enabled tests.

```rust
#[turso_macros::test(mvcc)]
fn test_something() {
    // runs with MVCC enabled
}
```

References

  • `core/mvcc/mod.rs` documents data anomalies (dirty reads, lost updates, etc.)
  • Snapshot isolation vs serializability: MVCC provides the former, not the latter
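
The gap between snapshot isolation and serializability is easiest to see with write skew. The sketch below is a generic textbook model, not this project's commit logic: the `Db` and `Tx` types, the write-write conflict check in `commit`, and the on-call example are all assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Toy database: one boolean per person, `true` meaning "on call".
#[derive(Clone, Default)]
struct Db {
    rows: BTreeMap<&'static str, bool>,
}

/// A transaction reads from its private snapshot and buffers its writes.
struct Tx {
    snapshot: BTreeMap<&'static str, bool>,
    writes: BTreeMap<&'static str, bool>,
}

impl Db {
    fn begin(&self) -> Tx {
        Tx { snapshot: self.rows.clone(), writes: BTreeMap::new() }
    }

    /// Commits unless a row in the write set changed since the snapshot.
    /// Only write-write conflicts are detected; reads are never validated.
    fn commit(&mut self, tx: Tx) -> bool {
        for key in tx.writes.keys() {
            if self.rows.get(key) != tx.snapshot.get(key) {
                return false;
            }
        }
        self.rows.extend(tx.writes);
        true
    }
}

/// Application rule: someone may go off call only if at least two people
/// are on call in the snapshot, so that one always remains.
fn go_off_call(tx: &mut Tx, who: &'static str) {
    let on_call = tx.snapshot.values().filter(|on| **on).count();
    if on_call >= 2 {
        tx.writes.insert(who, false);
    }
}
```

If two transactions begin while both people are on call and each takes a different person off call, their write sets are disjoint, so both commits succeed and nobody is left on call. A serializable execution would have rejected one of them, since either order run one after the other leaves one person on call.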