tailslayer-dram-hedged-reads
Tailslayer: DRAM Hedged Read Library
Skill by ara.so — Daily 2026 Skills collection.
Tailslayer is a C++ library that reduces tail latency in RAM reads caused by DRAM refresh stalls. It replicates data across multiple independent DRAM channels with uncorrelated refresh schedules, issues hedged reads across all replicas simultaneously, and returns whichever result responds first — eliminating worst-case stall spikes from DRAM refresh cycles.
Works on AMD, Intel, and AWS Graviton using undocumented channel scrambling offsets.
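As a rough intuition for why replication helps, suppose each channel is unavailable for a fixed slice of every refresh interval and that channels stall independently. The sketch below computes the chance that a read finds every replica stalled at once. The function name and the independence model are assumptions made for illustration and are not part of Tailslayer.

```cpp
#include <cassert>
#include <cmath>

// Illustrative model only: assumes each channel is unavailable for
// `stall_ns` out of every `interval_ns`, independently of the others.
inline double stall_probability(double stall_ns, double interval_ns, int n_replicas) {
    assert(stall_ns >= 0.0 && interval_ns > 0.0 && n_replicas >= 1);
    const double p_single = stall_ns / interval_ns;  // one channel stalled
    return std::pow(p_single, n_replicas);           // all channels stalled together
}
```

With placeholder inputs of a 350 ns stall every 7800 ns, `stall_probability(350.0, 7800.0, 1)` is about 0.045 and `stall_probability(350.0, 7800.0, 2)` is about 0.002. These inputs are not measurements; use values from your own hardware.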
How It Works
- Data is replicated N times, each copy placed on a different DRAM channel
- Each replica is monitored by a worker pinned to a separate CPU core
- When a read is triggered (via your signal function), all replicas are read simultaneously
- Whichever channel responds first wins; the result is passed to your work function
- DRAM refresh on one channel cannot stall all channels simultaneously → tail latency is eliminated
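The steps above can be sketched with standard threads. This is a minimal illustration of the first-responder-wins idea only: two ordinary vectors stand in for replicas on separate DRAM channels, and `hedged_read`, `HedgedResult`, and the atomic claim flag are hypothetical names, not Tailslayer's implementation (which, per the description above, keeps pinned workers waiting on a signal rather than spawning threads per read).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration only; not tailslayer's implementation.
struct HedgedResult {
    int value;           // value delivered by the winning replica
    std::size_t winner;  // index of the replica that answered first
};

inline HedgedResult hedged_read(const std::array<std::vector<int>, 2>& replicas,
                                std::size_t index) {
    assert(index < replicas[0].size() && index < replicas[1].size());
    std::atomic<bool> claimed{false};
    HedgedResult result{0, 0};

    auto worker = [&](std::size_t id) {
        const int v = replicas[id][index];  // the read that may be delayed
        bool expected = false;
        // Only the first thread to finish flips the flag and publishes.
        if (claimed.compare_exchange_strong(expected, true)) {
            result = HedgedResult{v, id};
        }
    };

    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();  // joining makes the winner's write visible to the caller
    return result;
}
```

Calling `hedged_read(replicas, 1)` on two identical copies returns the stored value together with whichever replica index happened to finish first; the loser's read is simply discarded.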
Installation
Copy the header into your project

```bash
git clone https://github.com/LaurieWired/tailslayer.git
cp -r tailslayer/include/tailslayer /your/project/include/
```

Include in your code

```cpp
#include <tailslayer/hedged_reader.hpp>
```

Build the provided example

```bash
git clone https://github.com/LaurieWired/tailslayer.git
cd tailslayer
make
./tailslayer_example
```

Key API
`tailslayer::HedgedReader<T, SignalFn, WorkFn, SignalArgs, WorkArgs>`

Template parameters:

| Parameter | Description |
|---|---|
| `T` | Value type stored and read |
| `SignalFn` | Function that waits for a trigger and returns the index to read |
| `WorkFn` | Function called with the value immediately after read |
| `SignalArgs` | (optional) Compile-time arguments passed to the signal function, as `tailslayer::ArgList<...>` |
| `WorkArgs` | (optional) Compile-time arguments passed to the work function, as `tailslayer::ArgList<...>` |
Constructor optional parameters
```cpp
HedgedReader(
    uint64_t channel_offset = DEFAULT_OFFSET,  // undocumented channel scrambling offset
    uint64_t channel_bit = DEFAULT_BIT,        // bit used for channel selection
    std::size_t n_replicas = 2                 // number of DRAM channel replicas
)
```

Methods

```cpp
reader.insert(T value);   // Insert value, replicated across all channels
reader.start_workers();   // Launch per-channel worker threads (blocking)
```

Utilities
```cpp
tailslayer::pin_to_core(core_id);  // Pin calling thread to a specific core
tailslayer::CORE_MAIN              // Constant: recommended core for main thread
```
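On Linux, pinning a thread typically comes down to `pthread_setaffinity_np`. The sketch below shows that mechanism with two hypothetical helpers, `pin_current_thread` and `first_allowed_core`. Whether `tailslayer::pin_to_core` is implemented this way is an assumption; the code is Linux-only.

```cpp
#include <pthread.h>
#include <sched.h>

// Hypothetical helper, Linux only. g++ defines _GNU_SOURCE by default, which
// exposes pthread_setaffinity_np and the CPU_* macros.
inline bool pin_current_thread(int core_id) {
    if (core_id < 0 || core_id >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);  // allow exactly one core
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Returns the lowest-numbered core this thread may run on, or -1 on failure.
inline int first_allowed_core() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c) {
        if (CPU_ISSET(c, &set)) return c;
    }
    return -1;
}
```

Querying the allowed set first avoids failing inside containers or cgroups where core 0 is not available to the process.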
Minimal Usage Pattern
```cpp
#include <tailslayer/hedged_reader.hpp>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// 1. Define your signal function: waits for your event, returns the index to read
[[gnu::always_inline]] inline std::size_t my_signal() {
    // Example: busy-wait for an external flag, then return the index.
    // Both globals must be defined elsewhere in your program.
    extern volatile std::size_t g_index;
    extern volatile bool g_trigger;
    while (!g_trigger) {}
    g_trigger = false;
    return g_index;
}

// 2. Define your work function: receives the read value immediately
template <typename T>
[[gnu::always_inline]] inline void my_work(T val) {
    // Process val as fast as possible
    std::printf("Read value: %u\n", static_cast<unsigned>(val));
}

int main() {
    using T = uint8_t;

    // Pin main thread to recommended core
    tailslayer::pin_to_core(tailslayer::CORE_MAIN);

    // Construct reader with 2 replicas (default)
    tailslayer::HedgedReader<T, my_signal, my_work<T>> reader{};

    // Insert data, replicated across both DRAM channels automatically
    reader.insert(0x43);
    reader.insert(0x44);

    // Launch workers: this call blocks; workers spin until the signal fires
    reader.start_workers();
    return 0;
}
```
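The example above spins on a plain `volatile` flag, which C++ does not treat as a thread synchronization primitive, and a flag that the first waiter resets can be missed by a second waiter. One hedged alternative is a `std::atomic` sequence counter that every waiter compares against the last value it saw. `Trigger`, `fire`, and `wait` are hypothetical names, and the sketch assumes each worker calls the signal function itself, which the source does not confirm.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical trigger, not part of tailslayer. A sequence counter replaces
// the bool flag so that several waiters can each observe the same event.
struct Trigger {
    std::atomic<std::uint64_t> seq{0};
    std::atomic<std::size_t> index{0};

    // Producer: write the index first, then publish by bumping the counter.
    void fire(std::size_t i) {
        index.store(i, std::memory_order_relaxed);
        seq.fetch_add(1, std::memory_order_release);
    }

    // Waiter: spin until the counter moves past the value this waiter last saw.
    std::size_t wait(std::uint64_t& last_seen) {
        std::uint64_t now;
        while ((now = seq.load(std::memory_order_acquire)) == last_seen) {
            // busy-wait, as in the example above
        }
        last_seen = now;
        return index.load(std::memory_order_relaxed);
    }
};
```

Each waiter keeps its own `last_seen` value, so a signal function built on this would hold that counter in thread-local or per-worker state. The sketch also assumes the producer does not fire again before all waiters have read the index.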
Passing Arguments to Signal and Work Functions
Use `tailslayer::ArgList<...>` to pass compile-time integer arguments:

```cpp
#include <tailslayer/hedged_reader.hpp>
#include <cstddef>
#include <cstdint>

// Signal function with args
[[gnu::always_inline]] inline std::size_t my_signal(int threshold, int channel) {
    // use threshold and channel...
    (void)threshold;
    (void)channel;
    return 0;
}

// Work function with args
template <typename T>
[[gnu::always_inline]] inline void my_work(T val, int multiplier) {
    volatile int result = static_cast<int>(val) * multiplier;
    (void)result;
}

int main() {
    using T = uint8_t;
    tailslayer::pin_to_core(tailslayer::CORE_MAIN);

    tailslayer::HedgedReader<
        T,
        my_signal,
        my_work<T>,
        tailslayer::ArgList<10, 1>,  // args forwarded to my_signal: threshold=10, channel=1
        tailslayer::ArgList<2>       // args forwarded to my_work: multiplier=2
    > reader{};

    reader.insert(0xAB);
    reader.start_workers();
}
```
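To make the forwarding idea concrete, here is one way a list of compile-time integers can be expanded into a function call in C++17. Everything in the `sketch` namespace is hypothetical and written for illustration; it is not how `tailslayer::ArgList` is necessarily implemented.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical names, not the tailslayer API

// Carries a pack of compile-time integers as a type.
template <int... Vs>
struct ArgList {};

// Calls fn(Vs...) by deducing the pack from the ArgList tag.
template <typename Fn, int... Vs>
constexpr auto apply_args(Fn fn, ArgList<Vs...>) {
    return fn(Vs...);
}

// Variant that prepends a runtime value, as a work function would need.
template <typename Fn, typename T, int... Vs>
constexpr auto apply_args_with(Fn fn, T value, ArgList<Vs...>) {
    return fn(value, Vs...);
}

}  // namespace sketch

// Stand-ins for a signal function and a work function.
constexpr std::size_t signal_stub(int threshold, int channel) {
    return static_cast<std::size_t>(threshold + channel);
}
constexpr int work_stub(int val, int multiplier) { return val * multiplier; }

// The argument values are fixed at compile time, so the call can be too.
static_assert(sketch::apply_args(signal_stub, sketch::ArgList<10, 1>{}) == 11);
static_assert(sketch::apply_args_with(work_stub, 3, sketch::ArgList<2>{}) == 6);
```

Because the integers live in the type, the compiler can inline them as constants at the call site, which fits a hot path that should avoid loading arguments at run time.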
Custom Channel Configuration
Override channel offset, channel bit, and replica count in the constructor:

```cpp
// Example: 4 replicas, custom channel bit 8 (common for AMD/Intel)
tailslayer::HedgedReader<T, my_signal, my_work<T>> reader{
    /* channel_offset */ 0,
    /* channel_bit    */ 8,
    /* n_replicas     */ 4
};
```

Note: N-way (more than 2 replicas) hedging requires using the benchmark code in `discovery/benchmark/`. The main library header currently exposes 2 channels by default.
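To see what a channel bit means in practice, consider a deliberately simplified model in which a single address bit selects the channel. Under that assumption, two addresses that differ only in that bit sit on different channels. The helpers `channel_of` and `twin_address` are hypothetical; real memory controllers often combine several address bits, which is presumably what the channel offset parameter accounts for.

```cpp
#include <cstdint>

// Simplified model (an assumption): one address bit picks the channel.
constexpr unsigned channel_of(std::uint64_t addr, unsigned channel_bit) {
    return static_cast<unsigned>((addr >> channel_bit) & 1u);
}

// Address of the "twin" location that differs only in the channel bit.
constexpr std::uint64_t twin_address(std::uint64_t addr, unsigned channel_bit) {
    return addr ^ (std::uint64_t{1} << channel_bit);
}

// With channel bit 8, 0x1000 and 0x1100 fall on opposite channels.
static_assert(channel_of(0x1000, 8) == 0);
static_assert(twin_address(0x1000, 8) == 0x1100);
static_assert(channel_of(twin_address(0x1000, 8), 8) == 1);
```

Bits 6 through 8 lie inside the 12-bit offset of a 4 KiB page, where virtual and physical addresses agree, so offsets at this granularity can be chosen from user space without knowing the physical address.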
Running Benchmarks
Channel-hedged read benchmark (N-way)

```bash
cd discovery/benchmark
make
sudo chrt -f 99 ./hedged_read_cpp --all --channel-bit 8
```

Flags:

| Flag | Description |
|---|---|
| `--all` | Run all channel configurations |
| `--channel-bit` | Specify the DRAM channel selection bit (try 6, 7, or 8 for your platform) |
DRAM refresh spike timing probe
```bash
cd discovery
gcc -O2 -o trefi_probe trefi_probe.c
sudo ./trefi_probe
```

This measures your DRAM's tREFI refresh interval and the worst-case stall duration, which is useful for calibrating expectations.
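Once a probe or benchmark has produced per-read latencies, the tail is usually summarized with high percentiles rather than the mean. The helper below is a generic nearest-rank percentile over a vector of nanosecond samples. It is not taken from `trefi_probe.c` or the benchmark code, and the name `percentile` is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile: the smallest sample with at least pct% of the
// samples at or below it. Takes the vector by value so it can sort a copy.
inline std::uint64_t percentile(std::vector<std::uint64_t> samples, double pct) {
    assert(!samples.empty() && pct > 0.0 && pct <= 100.0);
    std::sort(samples.begin(), samples.end());
    // rank = ceil(pct / 100 * n), clamped to [1, n]
    const double exact = pct / 100.0 * static_cast<double>(samples.size());
    std::size_t rank = static_cast<std::size_t>(exact);
    if (static_cast<double>(rank) < exact) ++rank;
    rank = std::clamp<std::size_t>(rank, 1, samples.size());
    return samples[rank - 1];
}
```

Comparing `percentile(samples, 50.0)` with `percentile(samples, 99.9)` before and after enabling hedging shows whether the tail moved while the median stayed put.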
Platform Notes
| Platform | Typical Channel Bit | Notes |
|---|---|---|
| AMD (Zen) | 6 or 7 | Verify with benchmark |
| Intel | 6, 7, or 8 | Run benchmark with `--all` |
| AWS Graviton | 8 | Confirmed working |

Use `--all` in the benchmark to auto-detect the best channel bit for your system.

Common Patterns
Low-latency trading / event-driven read
```cpp
#include <tailslayer/hedged_reader.hpp>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed to be defined elsewhere in your application
extern std::vector<uint64_t> preloaded_prices;

// Pre-load order book prices into hedged reader
// Signal on market data arrival, process immediately
[[gnu::always_inline]] inline std::size_t await_market_signal() {
    extern volatile std::size_t g_book_idx;
    extern volatile bool g_tick;
    while (!g_tick) { __builtin_ia32_pause(); }  // x86-only spin hint
    g_tick = false;
    return g_book_idx;
}

template <typename T>
[[gnu::always_inline]] inline void process_price(T price) {
    // Submit order using price with minimal latency
    extern void submit_order(T);
    submit_order(price);
}

int main() {
    tailslayer::pin_to_core(tailslayer::CORE_MAIN);

    tailslayer::HedgedReader<uint64_t, await_market_signal, process_price<uint64_t>> reader{};

    for (uint64_t price : preloaded_prices) {
        reader.insert(price);
    }

    reader.start_workers();
}
```
Preloading a lookup table across channels
```cpp
// Fragment: assumes <vector>, <cstdint>, and the my_signal / my_work
// definitions from the Minimal Usage Pattern.
// Each insert automatically maps to the correct DRAM channel via address calculation.
// Access is via logical index; tailslayer manages physical placement.
tailslayer::HedgedReader<uint32_t, my_signal, my_work<uint32_t>> reader{};

std::vector<uint32_t> lut = {100, 200, 300, 400};
for (auto v : lut) {
    reader.insert(v);
}

reader.start_workers();
```
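The "insert once, read by logical index" contract can be modeled without any DRAM specifics. The class below is a hypothetical stand-in that keeps N plain vectors in lockstep; it ignores physical placement entirely and is not the library's data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of "insert once, read by logical index from any replica".
template <typename T>
class ReplicatedStore {
public:
    explicit ReplicatedStore(std::size_t n_replicas) : replicas_(n_replicas) {
        assert(n_replicas >= 1);
    }

    // Appends value to every replica and returns its logical index.
    std::size_t insert(const T& value) {
        for (auto& r : replicas_) r.push_back(value);
        return replicas_.front().size() - 1;
    }

    // Reads logical index i from the chosen replica.
    const T& read(std::size_t replica, std::size_t i) const {
        assert(replica < replicas_.size() && i < replicas_[replica].size());
        return replicas_[replica][i];
    }

    std::size_t size() const { return replicas_.front().size(); }

private:
    std::vector<std::vector<T>> replicas_;
};
```

Because every replica receives the same sequence of inserts, index `i` names the same value in each copy, which is why a caller only ever needs the logical index.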
Troubleshooting
High latency still observed
- Verify you are using the correct `--channel-bit` for your CPU. Run the benchmark with `--all`.
- Ensure workers are pinned to isolated cores (use the `isolcpus=` kernel boot parameter).
- Run with real-time scheduling: `sudo chrt -f 99 ./your_binary`
Build errors — missing headers
- Confirm `include/tailslayer/hedged_reader.hpp` is on your include path.
- Requires C++17 or later: add `-std=c++17` to your compiler flags.
Workers don't start / deadlock
- `start_workers()` is blocking. It launches threads and waits, so your signal function must eventually return.
- Ensure the signal function does not block indefinitely during testing.
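For test builds, one way to guarantee that a signal function returns is to bound its spin loop with a deadline. The helper below is a hypothetical sketch: `wait_with_deadline` is not a Tailslayer function, and a real signal function would still need to map the timeout case to some fallback index or shutdown path of your choosing.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <optional>

// Hypothetical test helper: waits for `flag`, but gives up after `timeout`.
inline std::optional<std::size_t> wait_with_deadline(
        std::atomic<bool>& flag, std::size_t index,
        std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!flag.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return std::nullopt;  // caller decides how to bail out
        }
    }
    flag.store(false, std::memory_order_relaxed);  // consume the trigger
    return index;
}
```

Checking the clock on every iteration adds latency to the wake-up path, so a bound like this is better suited to tests than to a production signal function.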
Data corruption / wrong values
- Each `insert()` replicates the value N times (one per channel). Logical indexing is handled internally; do not attempt to address replicas directly.
- Do not modify inserted data after `insert()` is called.
Platform not supported
- Tailslayer uses undocumented DRAM channel scrambling offsets. If your platform is not AMD, Intel, or Graviton, run the trefi_probe and benchmark tools to characterize refresh behavior before using the library in production.
Project Structure
```
tailslayer/
├── include/tailslayer/
│   └── hedged_reader.hpp   # Main library header (copy this)
├── tailslayer_example.cpp  # Usage example
├── discovery/
│   ├── trefi_probe.c       # DRAM refresh spike timing tool
│   └── benchmark/          # N-way channel hedging benchmark
└── Makefile
```