memory-benchmark

Original：🇺🇸 English

Translated

How to benchmark and analyze memory usage in Turso using the memory-benchmark crate and dhat heap profiler. Use this skill whenever the user mentions memory usage, memory profiling, allocation tracking, heap analysis, memory regression, memory benchmarking, dhat, or wants to understand where memory is being allocated during SQL workloads. Also use when investigating memory growth in WAL or MVCC mode. IMPORTANT - If you modify the perf/memory crate (add profiles, change CLI flags, change output format, etc.), update this skill document to reflect those changes so it stays accurate for future agents.

3installs

Sourcetursodatabase/turso

Added on2026-04-18

NPX Install

npx skill4agent add tursodatabase/turso memory-benchmark

SKILL.md Content

View Translation Comparison →

Memory Benchmarking & Analysis

The

perf/memory

crate benchmarks memory usage of SQL workloads under WAL and MVCC journal modes. It uses

dhat

as the global allocator to track every heap allocation, and

memory-stats

for process-level RSS snapshots.

Location

Benchmark crate:
```
perf/memory/
```
Analysis script:
```
perf/memory/analyze-dhat.py
```
dhat output:
```
dhat-heap.json
```
(written to CWD after each run)

Running Benchmarks

Always run in release mode — debug builds have wildly different allocation patterns and the results are not representative of real-world usage.

bash

# Basic: single connection, WAL mode, insert-heavy workload
cargo run --release -p memory-benchmark -- --mode wal --workload insert-heavy -i 100 -b 100

# MVCC with concurrent connections
cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 100 -b 100 --connections 4

# All CLI options
cargo run --release -p memory-benchmark -- \
  --mode wal|mvcc \
  --workload insert-heavy|read-heavy|mixed|scan-heavy \
  -i <iterations> \
  -b <batch-size> \
  --connections <N> \
  --timeout <ms> \
  --cache-size <pages> \
  --format human|json|csv

Every run produces a

dhat-heap.json

in the current directory. This file contains per-allocation-site data for the entire run.

Built-in Workload Profiles

Profile	Description	Setup
`insert-heavy`	100% INSERT statements	Creates table
`read-heavy`	90% SELECT by id / 10% INSERT	Seeds 10k rows
`mixed`	50% SELECT / 50% INSERT	Seeds 10k rows
`scan-heavy`	Full table scans with LIKE	Seeds 10k rows

Profiles implement the

Profile

trait in

perf/memory/src/profile/

. To add a new workload, create a new file implementing the trait and wire it into the

WorkloadProfile

enum in

main.rs

.

Understanding the Output

The benchmark reports three categories of metrics:

RSS (process-level)

Measured via

memory-stats

crate. Includes everything: heap, mmap'd files (WAL, DB pages pulled into OS page cache), tokio runtime, etc. Snapshots are taken at phase transitions (setup -> run) and after each batch.

Baseline: RSS before any DB work (runtime overhead)
Peak: Highest RSS observed during the run
Net growth: Final RSS minus baseline — the memory attributable to the workload

Heap (dhat)

Precise allocation tracking via the

dhat

global allocator. Only counts explicit heap allocations (malloc/alloc), not mmap.

Current: Bytes still allocated at measurement time
Peak: Highest simultaneous live allocation during the entire run
Total allocs: Number of individual allocation calls
Total bytes: Cumulative bytes allocated (includes freed memory) — measures allocation pressure

Disk

File sizes after the benchmark completes:

DB file: The
```
.db
```
file
WAL file: The
```
.db-wal
```
file (WAL mode only)
Log file: The
```
.db-log
```
file (MVCC logical log only)

Analyzing dhat Output

After running a benchmark, use the analysis script to produce a readable report from

dhat-heap.json

:

bash

# Overview: top allocation sites by bytes live at global peak
python3 perf/memory/analyze-dhat.py dhat-heap.json --top 15 --modules

# Focus on a specific subsystem
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter mvcc --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter btree --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter page_cache --stacks

# Sort by different metrics
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb  # bytes at exit (leaks)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by tb  # total bytes (pressure)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by mb  # max live bytes per site

# JSON output for programmatic use
python3 perf/memory/analyze-dhat.py dhat-heap.json --json

Sort Metrics

Flag	Metric	Use when
`gb`	Bytes live at global peak (default)	Finding what dominates memory at the high-water mark
`eb`	Bytes live at exit	Finding memory leaks or things that never get freed
`tb`	Total bytes allocated	Finding allocation pressure hotspots (GC churn)
`mb`	Max bytes live per site	Finding per-site high-water marks
`tbk`	Total allocation count	Finding chatty allocators (many small allocs)

Analysis Flags

```
--top N
```
— Show top N sites (default 15)
```
--filter PATTERN
```
— Filter to sites/stacks containing substring (e.g.
```
mvcc
```
,
```
btree
```
,
```
wal
```
,
```
pager
```
)
```
--stacks
```
— Show full callstacks for top allocation sites
```
--modules
```
— Aggregate by crate/module for a high-level breakdown
```
--json
```
— Machine-readable aggregated output

Typical Workflow

When investigating memory usage or a suspected regression:

Run the benchmark with parameters matching the scenario:

bash

cargo run -p memory-benchmark -- --mode mvcc --workload mixed -i 500 -b 100 --connections 4

Get the high-level picture — which modules use the most memory:

bash

python3 perf/memory/analyze-dhat.py dhat-heap.json --modules --top 20

Drill into the hot module — e.g. if

turso_core

dominates:

bash

python3 perf/memory/analyze-dhat.py dhat-heap.json --filter turso_core --stacks --top 10

Check for leaks — anything still alive at exit that shouldn't be:

bash

python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb --top 10

Compare modes — run the same workload under WAL and MVCC and compare the reports to see the memory cost of MVCC versioning.

Concurrency Details

When

--connections > 1

:

Setup phase (schema creation, seeding) always runs on a single connection sequentially
Run phase spawns one tokio task per connection, each executing its batch concurrently
Each connection gets
```
busy_timeout
```
set (default 30s, configurable via
```
--timeout
```
)
WAL mode uses
```
BEGIN
```
, MVCC uses
```
BEGIN CONCURRENT
```
The
```
Profile
```
trait's
```
next_batch(connections)
```
returns one batch per connection with non-overlapping row IDs

Adding a New Profile

Create

perf/memory/src/profile/your_profile.rs

implementing the

Profile

trait

Add

pub mod your_profile;

to

perf/memory/src/profile/mod.rs

Add a variant to
```
WorkloadProfile
```
enum in
```
main.rs
```
Wire it into
```
create_profile()
```
in
```
main.rs
```

The

Profile

trait:

rust

pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}

Return

Phase::Setup

for schema/seeding (single batch),

Phase::Run

for measured work (one batch per connection),

Phase::Done

when finished.

Keeping This Skill Up to Date

This skill document is the source of truth for how agents use the memory benchmark tooling. If you modify the

perf/memory

crate — adding profiles, changing CLI flags, altering output format, updating the analysis script, changing the

Profile

trait, etc. — update this SKILL.md to match. Specifically:

New CLI flags: add to the "Running Benchmarks" section
New profiles: add to the "Built-in Workload Profiles" table
Changed output metrics: update the "Understanding the Output" section
New analyze-dhat.py flags or sort metrics: update the "Analyzing dhat Output" section
Changed
```
Profile
```
trait signature: update "Adding a New Profile"

Future agents rely on this document being accurate. Stale instructions cause wasted work.

memory-benchmark

NPX Install

Tags

SKILL.md Content

Memory Benchmarking & Analysis

Location

Running Benchmarks

Built-in Workload Profiles

Understanding the Output

RSS (process-level)

Heap (dhat)

Disk

Analyzing dhat Output

Sort Metrics

Analysis Flags

Typical Workflow

Concurrency Details

Adding a New Profile

Keeping This Skill Up to Date