---
name: memory-benchmark
description: How to benchmark and analyze memory usage in Turso using the memory-benchmark crate and the dhat heap profiler. Use this skill whenever the user mentions memory usage, memory profiling, allocation tracking, heap analysis, memory regression, memory benchmarking, dhat, or wants to understand where memory is being allocated during SQL workloads. Also use it when investigating memory growth in WAL or MVCC mode. IMPORTANT - If you modify the perf/memory crate (add profiles, change CLI flags, change output format, etc.), update this skill document to reflect those changes so it stays accurate for future agents.
---
Source: tursodatabase/turso
# Memory Benchmarking & Analysis
The `perf/memory` crate benchmarks memory usage of SQL workloads under WAL and MVCC journal modes. It uses `dhat` as the global allocator to track every heap allocation, and `memory-stats` for process-level RSS snapshots.

## Location
- Benchmark crate: `perf/memory/`
- Analysis script: `perf/memory/analyze-dhat.py`
- dhat output: `dhat-heap.json` (written to CWD after each run)
## Running Benchmarks

Always run in release mode — debug builds have wildly different allocation patterns and the results are not representative of real-world usage.

```bash
# Basic: single connection, WAL mode, insert-heavy workload
cargo run --release -p memory-benchmark -- --mode wal --workload insert-heavy -i 100 -b 100

# MVCC with concurrent connections
cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 100 -b 100 --connections 4

# All CLI options
cargo run --release -p memory-benchmark -- \
  --mode wal|mvcc \
  --workload insert-heavy|read-heavy|mixed|scan-heavy \
  -i <iterations> \
  -b <batch-size> \
  --connections <N> \
  --timeout <ms> \
  --cache-size <pages> \
  --format human|json|csv
```

Every run produces a `dhat-heap.json` in the current directory. This file contains per-allocation-site data for the entire run.

## Built-in Workload Profiles
| Profile | Description | Setup |
|---|---|---|
| `insert-heavy` | 100% INSERT statements | Creates table |
| `read-heavy` | 90% SELECT by id / 10% INSERT | Seeds 10k rows |
| `mixed` | 50% SELECT / 50% INSERT | Seeds 10k rows |
| `scan-heavy` | Full table scans with LIKE | Seeds 10k rows |
Profiles implement the `Profile` trait in `perf/memory/src/profile/`. To add a new workload, create a new file implementing the trait and wire it into the `WorkloadProfile` enum in `main.rs`.

## Understanding the Output
The benchmark reports three categories of metrics:
### RSS (process-level)
Measured via the `memory-stats` crate. Includes everything: heap, mmap'd files (WAL, DB pages pulled into OS page cache), tokio runtime, etc. Snapshots are taken at phase transitions (setup -> run) and after each batch.

- Baseline: RSS before any DB work (runtime overhead)
- Peak: Highest RSS observed during the run
- Net growth: Final RSS minus baseline — the memory attributable to the workload
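As a sketch of how these three numbers relate (the helper and sample values below are illustrative, not the crate's actual reporting code):

```rust
// Hypothetical RSS snapshots in bytes, taken at phase transitions and
// after each batch; the first sample is the pre-workload baseline.
fn rss_report(samples: &[u64]) -> (u64, u64, i64) {
    let baseline = samples[0];                  // RSS before any DB work
    let peak = *samples.iter().max().unwrap();  // high-water mark over the run
    // Net growth is signed: RSS can end below where it started.
    let net = *samples.last().unwrap() as i64 - baseline as i64;
    (baseline, peak, net)
}
```

Note that net growth can sit well below peak: memory that spiked mid-run and was returned to the OS does not count against the final figure (e.g. samples of 50 MiB, 120 MiB, 80 MiB give a 120 MiB peak but only 30 MiB of net growth).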
### Heap (dhat)

Precise allocation tracking via the `dhat` global allocator. Only counts explicit heap allocations (malloc/alloc), not mmap.

- Current: Bytes still allocated at measurement time
- Peak: Highest simultaneous live allocation during the entire run
- Total allocs: Number of individual allocation calls
- Total bytes: Cumulative bytes allocated (includes freed memory) — measures allocation pressure
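The difference between these four numbers is easiest to see with a toy counter (a self-contained sketch, not dhat's implementation):

```rust
/// Toy allocation tracker mirroring the four heap metrics above.
/// Illustrative only; dhat tracks this per allocation site with stacks.
#[derive(Default)]
struct HeapStats {
    current: u64,      // bytes live right now
    peak: u64,         // highest simultaneous live bytes
    total_allocs: u64, // number of individual allocation calls
    total_bytes: u64,  // cumulative bytes, including later-freed memory
}

impl HeapStats {
    fn alloc(&mut self, bytes: u64) {
        self.current += bytes;
        self.peak = self.peak.max(self.current);
        self.total_allocs += 1;
        self.total_bytes += bytes;
    }
    fn free(&mut self, bytes: u64) {
        self.current -= bytes; // freeing never shrinks peak or totals
    }
}
```

For example, alloc 100, alloc 200, free 100, alloc 50 yields current 250, peak 300, 3 total allocs, and 350 total bytes — total bytes exceeds peak whenever freed memory is reallocated, which is why it measures allocation pressure rather than footprint.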
### Disk

File sizes after the benchmark completes:

- DB file: The `.db` file
- WAL file: The `.db-wal` file (WAL mode only)
- Log file: The `.db-log` file (MVCC logical log only)
## Analyzing dhat Output

After running a benchmark, use the analysis script to produce a readable report from `dhat-heap.json`:

```bash
# Overview: top allocation sites by bytes live at global peak
python3 perf/memory/analyze-dhat.py dhat-heap.json --top 15 --modules

# Focus on a specific subsystem
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter mvcc --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter btree --stacks
python3 perf/memory/analyze-dhat.py dhat-heap.json --filter page_cache --stacks

# Sort by different metrics
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb  # bytes at exit (leaks)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by tb  # total bytes (pressure)
python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by mb  # max live bytes per site

# JSON output for programmatic use
python3 perf/memory/analyze-dhat.py dhat-heap.json --json
```

### Sort Metrics
| Flag | Metric | Use when |
|---|---|---|
| | Bytes live at global peak (default) | Finding what dominates memory at the high-water mark |
| `eb` | Bytes live at exit | Finding memory leaks or things that never get freed |
| `tb` | Total bytes allocated | Finding allocation pressure hotspots (GC churn) |
| `mb` | Max bytes live per site | Finding per-site high-water marks |
| | Total allocation count | Finding chatty allocators (many small allocs) |
### Analysis Flags
- `--top N` — Show top N sites (default 15)
- `--filter PATTERN` — Filter to sites/stacks containing substring (e.g. `mvcc`, `btree`, `wal`, `pager`)
- `--stacks` — Show full callstacks for top allocation sites
- `--modules` — Aggregate by crate/module for a high-level breakdown
- `--json` — Machine-readable aggregated output
## Typical Workflow
When investigating memory usage or a suspected regression:
1. Run the benchmark with parameters matching the scenario (in release mode, as above):

   ```bash
   cargo run --release -p memory-benchmark -- --mode mvcc --workload mixed -i 500 -b 100 --connections 4
   ```

2. Get the high-level picture — which modules use the most memory:

   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --modules --top 20
   ```

3. Drill into the hot module — e.g. if `turso_core` dominates:

   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --filter turso_core --stacks --top 10
   ```

4. Check for leaks — anything still alive at exit that shouldn't be:

   ```bash
   python3 perf/memory/analyze-dhat.py dhat-heap.json --sort-by eb --top 10
   ```

5. Compare modes — run the same workload under WAL and MVCC and compare the reports to see the memory cost of MVCC versioning.
## Concurrency Details
When `--connections > 1`:

- Setup phase (schema creation, seeding) always runs on a single connection sequentially
- Run phase spawns one tokio task per connection, each executing its batch concurrently
- Each connection gets `busy_timeout` set (default 30s, configurable via `--timeout`)
- WAL mode uses `BEGIN`, MVCC uses `BEGIN CONCURRENT`
- The `Profile` trait's `next_batch(connections)` returns one batch per connection with non-overlapping row IDs
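The non-overlapping-ID idea can be sketched as a simple range partition (illustrative only; the crate's actual partitioning may differ):

```rust
/// Hand each connection a disjoint, contiguous slice of sequential row IDs
/// so concurrent INSERTs never collide. Sketch, not the crate's code.
fn partition_ids(next_id: u64, batch_size: u64, connections: u64) -> Vec<std::ops::Range<u64>> {
    (0..connections)
        .map(|c| {
            let start = next_id + c * batch_size;
            start..start + batch_size
        })
        .collect()
}
```

With `next_id = 10`, a batch size of 5, and 3 connections, this yields the disjoint ranges `10..15`, `15..20`, and `20..25`.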
## Adding a New Profile
1. Create `perf/memory/src/profile/your_profile.rs` implementing the `Profile` trait
2. Add `pub mod your_profile;` to `perf/memory/src/profile/mod.rs`
3. Add a variant to the `WorkloadProfile` enum in `main.rs`
4. Wire it into `create_profile()` in `main.rs`
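As an illustration of these steps, a hypothetical insert-only profile might look like the following. The `Phase` and `WorkItem` stubs and the restated trait exist only to keep the sketch self-contained; the real definitions live in the crate:

```rust
// Stubs standing in for the crate's real types so this sketch compiles alone.
#[derive(Debug, PartialEq)]
pub enum Phase { Setup, Run, Done }
pub type WorkItem = String; // the crate's WorkItem carries the SQL to execute

pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}

/// Hypothetical insert-only workload: one setup batch, then `iterations`
/// run batches with non-overlapping row IDs across connections.
struct InsertOnly { iterations: usize, batch: usize, next_id: usize }

impl Profile for InsertOnly {
    fn name(&self) -> &str { "insert-only" }

    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>) {
        if self.next_id == 0 {
            // First call: Setup phase, a single batch run on one connection.
            self.next_id = 1;
            return (Phase::Setup, vec![vec![
                "CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)".into(),
            ]]);
        }
        if self.iterations == 0 {
            return (Phase::Done, vec![]);
        }
        self.iterations -= 1;
        // Run phase: one batch per connection, disjoint row IDs.
        let batches = (0..connections)
            .map(|_| {
                (0..self.batch)
                    .map(|_| {
                        let id = self.next_id;
                        self.next_id += 1;
                        format!("INSERT INTO t VALUES ({id}, 'x')")
                    })
                    .collect()
            })
            .collect();
        (Phase::Run, batches)
    }
}
```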
The `Profile` trait:

```rust
pub trait Profile {
    fn name(&self) -> &str;
    fn next_batch(&mut self, connections: usize) -> (Phase, Vec<Vec<WorkItem>>);
}
```

Return `Phase::Setup` for schema/seeding (single batch), `Phase::Run` for measured work (one batch per connection), and `Phase::Done` when finished.

## Keeping This Skill Up to Date
This skill document is the source of truth for how agents use the memory benchmark tooling. If you modify the `perf/memory` crate — adding profiles, changing CLI flags, altering output format, updating the analysis script, changing the `Profile` trait, etc. — update this SKILL.md to match. Specifically:

- New CLI flags: add to the "Running Benchmarks" section
- New profiles: add to the "Built-in Workload Profiles" table
- Changed output metrics: update the "Understanding the Output" section
- New analyze-dhat.py flags or sort metrics: update the "Analyzing dhat Output" section
- Changed `Profile` trait signature: update "Adding a New Profile"
Future agents rely on this document being accurate. Stale instructions cause wasted work.