cupynumeric-hdf5

cuPyNumeric HDF5 I/O

Purpose

Use `legate.io.hdf5` to read and write cuPyNumeric arrays as HDF5 files. Reach for it whenever a cuPyNumeric array must land in, or load from, an `.h5` / `.hdf5` file: every rank reads and writes its own tile in parallel, so never funnel a large array through a single process.

Answer inline. Treat the snippets and rules below as complete and verified: answer save / load / stream / fence / bridge questions directly, without opening the `assets/` scripts or reading the installed `legate` source. Reach for the assets only to run a verification.

Activate

Activate when the user asks about: saving a cuPyNumeric array to an `.h5` / `.hdf5` file, loading an HDF5 dataset into a cuPyNumeric array, reading a large HDF5 dataset in chunks, producing a single file for an HPC post-processing pipeline, or speeding up HDF5 disk I/O with GPUDirect Storage.

When NOT to use

Redirect these requests elsewhere instead of reaching for `legate.io.hdf5`:
  • Route Parquet / Arrow / cuDF, raw-binary, or sharded / custom on-disk layouts to the cupynumeric-parallel-data-load skill, which owns cuPyNumeric's no-built-in-loader paths; `legate.io.hdf5` covers single-file HDF5 only.
  • Answer pure array compute with cuPyNumeric ops (FFT, matmul, reductions, slicing, linear algebra); this skill covers disk I/O only.
  • Send chunked or object-store (S3) output to a chunked format such as Zarr, not single-file HDF5.
  • Load `.npz` or pickled archives with NumPy (`np.load`), then bridge with `cn.asarray(...)`: `legate.io.hdf5` reads HDF5 only, and `cupynumeric.load` reads single `.npy` only.
  • Use h5py directly for plain HDF5 reads with no cuPyNumeric/Legate: `with h5py.File(path, "r") as f: arr = f["dataset"][:]`.
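
The routing rules above can be read as a lookup on the file extension. The sketch below is a hypothetical helper: `suggest_loader` and the hint strings it returns are illustrative names, not part of Legate or cuPyNumeric, and the table only encodes the rules listed in this section.

```python
from pathlib import Path

# Hypothetical routing table built from the rules in this section.
# The values are human-readable hints, not importable names.
_ROUTES = {
    ".h5": "legate.io.hdf5",
    ".hdf5": "legate.io.hdf5",
    ".npy": "cupynumeric.load",
    ".npz": "np.load, then cn.asarray(...)",
    ".parquet": "cupynumeric-parallel-data-load skill",
    ".arrow": "cupynumeric-parallel-data-load skill",
    ".zarr": "Zarr (chunked / object-store output)",
}


def suggest_loader(path: str) -> str:
    """Return a routing hint for `path` based on its file extension."""
    suffix = Path(path).suffix.lower()
    # Raw-binary and custom layouts have no built-in loader, so unknown
    # extensions fall through to the parallel-data-load skill.
    return _ROUTES.get(suffix, "cupynumeric-parallel-data-load skill")


print(suggest_loader("results/out.h5"))
```

Extension matching is only a first guess; a file's real format still decides which path applies.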

Prerequisites

Install h5py before importing anything from `legate.io.hdf5`:

```bash
conda install -c conda-forge h5py        # required; legate/io/hdf5.py imports it at load
```

Expect `from legate.io.hdf5 import ...` to raise `ModuleNotFoundError` until you do, because the module imports `h5py` at load time. (h5py · conda-forge build)
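
To check for h5py up front instead of hitting the `ModuleNotFoundError`, a script can probe the import path first. This is a minimal sketch using only the standard library; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if not module_available("h5py"):
    print("h5py is missing; run: conda install -c conda-forge h5py")
```

The probe only confirms that the module can be located; it does not verify that the installed build actually loads.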

API

| Function | Signature | Purpose |
|---|---|---|
| `to_file` | `to_file(array, path, dataset_name)` | Write a cuPyNumeric array / `LogicalArray` to one HDF5 file as a virtual dataset (VDS); each rank writes its own tile. |
| `from_file` | `from_file(path, dataset_name) -> LogicalArray` | Read one HDF5 dataset into a distributed array. |
| `from_file_batched` | `from_file_batched(path, dataset_name, chunk_size) -> Iterator[(LogicalArray, offsets)]` | Read a dataset in chunks; this chunks the file read, not the assembled array. |

Import all three from `legate.io.hdf5`. Always pass `dataset_name` as the full path to a single array inside the file (e.g. `"/data"` or `"/group/x"`), never a group.

Examples

Round trip

```python
import cupynumeric as cn
from legate.core import get_legate_runtime
from legate.io.hdf5 import from_file, to_file

a = cn.arange(64, dtype=cn.float32).reshape(8, 8)

# Write: pass the cuPyNumeric ndarray straight in - no manual conversion.
to_file(array=a, path="out.h5", dataset_name="/data")
get_legate_runtime().issue_execution_fence(block=True)   # needed before any external reader

# Read: from_file returns a legate LogicalArray; cn.asarray bridges it back.
b = cn.asarray(from_file("out.h5", dataset_name="/data"))
assert cn.array_equal(a, b)
```

Run `assets/hdf5_roundtrip.py` to verify (optional; not needed to answer).

Read a large file in chunks

Use `from_file_batched` to read the source file in chunks instead of pulling it into host memory all at once. It yields one `LogicalArray` per chunk plus that chunk's offsets in the global shape. Expect clipped boundary chunks (an axis of length 5 with `chunk_size=2` yields 2, 2, 1), so place each chunk by its actual shape, not the requested `chunk_size`. Note that this chunks the file read, not the result: the assembled array (`out`) still has to fit in distributed memory:

```python
import h5py
import cupynumeric as cn
from legate.core import get_legate_runtime
from legate.io.hdf5 import from_file_batched

with h5py.File("big.h5", "r") as f:          # read shape/dtype without loading data
    shape, dtype = f["data"].shape, f["data"].dtype

out = cn.empty(shape, dtype=dtype)
for chunk, (r0, c0) in from_file_batched("big.h5", "data", chunk_size=(4096, 4096)):
    out[r0:r0 + chunk.shape[0], c0:c0 + chunk.shape[1]] = cn.asarray(chunk)
get_legate_runtime().issue_execution_fence(block=True)
```

Keep every `chunk_size` entry positive and give `chunk_size` exactly one entry per dataset dimension, or `from_file_batched` raises `ValueError`. Run `assets/hdf5_batched_read.py` to verify (optional).
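
The clipping rule and the `chunk_size` checks can be modeled in plain Python. The sketch below is not Legate's implementation: `chunk_grid` is a hypothetical helper that only reproduces the arithmetic described above (offsets stepping by `chunk_size`, with the last chunk on each axis clipped to the remaining extent), and the order in which it yields chunks is an assumption.

```python
from itertools import product
from typing import Iterator, Tuple


def chunk_grid(
    shape: Tuple[int, ...], chunk_size: Tuple[int, ...]
) -> Iterator[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Yield (offsets, extents) for every chunk that tiles `shape`."""
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size needs one entry per dataset dimension")
    if any(c <= 0 for c in chunk_size):
        raise ValueError("every chunk_size entry must be positive")
    starts_per_axis = [range(0, n, c) for n, c in zip(shape, chunk_size)]
    for offsets in product(*starts_per_axis):
        # Clip a chunk where it would run past the end of an axis.
        extents = tuple(min(c, n - o) for o, c, n in zip(offsets, chunk_size, shape))
        yield offsets, extents


for offsets, extents in chunk_grid((5,), (2,)):
    print(offsets, extents)
# (0,) (2,)
# (2,) (2,)
# (4,) (1,)
```

Placing each chunk at `offsets` with its own `extents`, as the loop over `chunk.shape` does above, covers the array exactly once even when the shape is not a multiple of `chunk_size`.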

Instructions

  • Pass the cuPyNumeric ndarray directly to `to_file`: it implements `__legate_data_interface__`, which `to_file` accepts as `LogicalArrayLike`. Skip any `np.array(...)` round-trip.
  • Bridge results back with `cn.asarray(...)`. `from_file` and each `from_file_batched` chunk return a Legate `LogicalArray`; wrap it with `cn.asarray(la)` to get a cuPyNumeric ndarray (zero-copy, no host bounce).
  • Fence before any external reader. Legate I/O is asynchronous: `to_file` only queues the write. Insert `get_legate_runtime().issue_execution_fence(block=True)` before h5py, a subprocess, or another tool opens the file. Skip the fence for a `from_file` issued later in the same Legate program, because the runtime preserves that ordering.
  • Run from outside the cuPyNumeric source tree (e.g. `cd /tmp`). Python puts the cwd first on `sys.path`, so an in-tree `cupynumeric/` directory shadows the installed package (`ModuleNotFoundError: cupynumeric.install_info`).
  • Give every rank the same `path`. The program runs on every rank (SPMD), so pass `to_file` / `from_file` an identical `path` on each; a per-rank `tempfile.mkstemp()` name breaks the collective I/O. When the program creates the file itself, write it with the collective `to_file`, not a per-rank `h5py` write.
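
The source-tree shadowing described in the list above can be detected before any import fails. This is a rough sketch with a hypothetical helper, `shadows_installed_package`; it assumes that an in-tree `cupynumeric/` directory is recognizable by the absence of an `install_info.py` file, which is inferred from the error message rather than taken from the cuPyNumeric build.

```python
import os
from pathlib import Path


def shadows_installed_package(directory: str, package: str = "cupynumeric") -> bool:
    """Return True if `directory` looks like it holds a source checkout of `package`.

    Assumption: the in-tree package directory has no install_info.py,
    which is the module the ModuleNotFoundError complains about.
    """
    pkg_dir = Path(directory) / package
    return pkg_dir.is_dir() and not (pkg_dir / "install_info.py").exists()


if shadows_installed_package(os.getcwd()):
    print("Running inside the source tree; cd to a directory outside the repo first.")
```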

`to_file` behavior to plan around

  • Expect an HDF5 virtual dataset (VDS): each rank writes its own tile and the file presents them as one logical dataset.
  • Treat `to_file` as destructive: it overwrites `path` if it already exists, so guard any file you must not clobber.
  • Let `to_file` create missing parent directories; do not pre-create them.
  • Give `path` a file name (`/path/to/file.h5`), never a directory; a directory raises `ValueError`.
  • Pass a bound array (one with a known shape); `to_file` raises `ValueError` on an unbound array, meaning a Legate array created without a shape (e.g. `create_array(dtype, ndim=n)`) whose extent a producing task fills in later. cuPyNumeric ndarrays are always bound, even lazy/deferred ones, so this only affects raw `LogicalArray`s.
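
Because `to_file` overwrites silently and rejects directories, a small pre-flight check can catch both cases before the write is queued. The helper below is a hypothetical sketch (`check_output_path` is not a Legate function) that uses only the standard library.

```python
from pathlib import Path


def check_output_path(path: str, overwrite: bool = False) -> Path:
    """Validate an output path before handing it to a destructive writer."""
    target = Path(path)
    if target.is_dir():
        raise ValueError(f"{target} is a directory; pass a file name such as data.h5")
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    # Parent directories are deliberately not created here.
    return target
```

Call it once before the write, for example `check_output_path("results/data.h5")`. In a multi-rank run every rank executes the check, so treat it as a best-effort guard rather than a lock.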

GPUDirect Storage (GDS)

Always set `LEGATE_IO_USE_VFD_GDS=1` for runs that read HDF5 into GPU memory, whether or not the cluster has GPUDirect-capable storage:

```bash
export LEGATE_IO_USE_VFD_GDS=1          # set before launching
# or, with the legate driver:
legate --io-use-vfd-gds my_script.py
```

- **Read into the GPU through the GDS VFD, not the default path.** The default (POSIX) VFD stages each GPU read through zero-copy memory (ZCMEM), of which Legate reserves only 128 MB, so a GPU read of an array larger than ~128 MB aborts. The GDS VFD removes that staging buffer.
- **Leave it unset when reading into host (CPU) memory.** The VFD GDS plugin is unnecessary there and only adds overhead.
- **Keep `=1` even without GPUDirect-capable storage.** cuFile falls back to compatibility mode automatically (set `export CUFILE_ALLOW_COMPAT_MODE=true` if it is not already on), and `=1` still avoids the ZCMEM abort.
- **Attribute it correctly:** the GDS VFD is the [nv-legate/vfd-gds](https://github.com/nv-legate/vfd-gds) plugin over NVIDIA [cuFile](https://developer.nvidia.com/gpudirect-storage), **not** KvikIO (KvikIO backs Legate's Zarr/tile I/O, not HDF5). Confirm it engaged by grepping the run log for `H5FD__gds_open: Successfully opened file w/GDS VFD`.
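
When a run is launched from Python rather than a shell, the same rule (set the variable for GPU reads, leave it unset for CPU reads) can be applied to the child environment. The sketch below uses hypothetical helpers, `launch_env` and `gds_engaged`; only the variable name and the log line come from this section.

```python
import os
from typing import Dict, Mapping, Optional

# Log line quoted in this section as evidence that the GDS VFD opened a file.
GDS_OPEN_MARKER = "H5FD__gds_open: Successfully opened file w/GDS VFD"


def launch_env(
    read_into_gpu: bool, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with the GDS VFD switch set or cleared."""
    env = dict(os.environ if base is None else base)
    if read_into_gpu:
        env["LEGATE_IO_USE_VFD_GDS"] = "1"
    else:
        # CPU reads: the plugin only adds overhead, so make sure it is unset.
        env.pop("LEGATE_IO_USE_VFD_GDS", None)
    return env


def gds_engaged(log_text: str) -> bool:
    """Return True if the run log shows the GDS VFD opened a file."""
    return GDS_OPEN_MARKER in log_text
```

One way to use it is `subprocess.run(["python", "my_script.py"], env=launch_env(read_into_gpu=True))`, followed by `gds_engaged(...)` on the captured log.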

Troubleshooting

| Symptom | Cause and fix |
|---|---|
| `ModuleNotFoundError: No module named 'h5py'` on import | h5py is missing: `conda install -c conda-forge h5py`. |
| File looks empty/truncated to h5py right after `to_file` | The async write hasn't landed: add `get_legate_runtime().issue_execution_fence(block=True)` before the external read. |
| `ValueError` from `to_file` | `path` is a directory: pass a file path such as `results/data.h5`. |
| `ModuleNotFoundError: No module named 'cupynumeric.install_info'` | Running inside the source tree: `cd /tmp` (any directory outside the repo). |
| Abort/crash reading a GPU array ≳128 MB | Default 128 MB ZCMEM staging buffer: set `LEGATE_IO_USE_VFD_GDS=1` for GPU reads. |
| `from_file` returned `LogicalArray(...)` | Expected: wrap it with `cn.asarray(...)`. |
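
To judge whether a crash matches the ZCMEM row above, it helps to estimate the dataset's size from its shape and element width. This is a back-of-the-envelope sketch; the helper names are hypothetical, and reading "128 MB" as 128 MiB is an assumption.

```python
from math import prod
from typing import Tuple

# Assumption: the 128 MB ZCMEM reservation is interpreted as 128 MiB.
ZCMEM_LIMIT_BYTES = 128 * 1024 * 1024


def array_nbytes(shape: Tuple[int, ...], itemsize: int) -> int:
    """Bytes needed for a dense array with `shape` and `itemsize` bytes per element."""
    return prod(shape) * itemsize


def exceeds_zcmem(shape: Tuple[int, ...], itemsize: int, limit: int = ZCMEM_LIMIT_BYTES) -> bool:
    """Return True if the array is larger than the staging-buffer limit."""
    return array_nbytes(shape, itemsize) > limit


# float32 has 4 bytes per element.
print(array_nbytes((4096, 4096), 4), exceeds_zcmem((4096, 4096), 4))
print(array_nbytes((8192, 8192), 4), exceeds_zcmem((8192, 8192), 4))
# 67108864 False
# 268435456 True
```

The estimate is only a diagnostic: the guidance in this document is to set `LEGATE_IO_USE_VFD_GDS=1` for every GPU read regardless of size.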

Limitations & version notes

  • Import from `legate.io.hdf5` (Legate 26.01+); rewrite any `legate.core.io.hdf5` import left over from the 25.03 line (e.g. the 25.03 launch blog still shows the old path).
  • Install h5py explicitly: it ships in no default cuPyNumeric env.
  • Point `dataset_name` at a single array, never a group; traverse groups with h5py first to discover dataset paths.
  • On GPU, always read with `LEGATE_IO_USE_VFD_GDS=1` (see GPUDirect Storage): the default path aborts on GPU arrays larger than the 128 MB ZCMEM buffer. Leave it unset for CPU reads.
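
Leftover `legate.core.io.hdf5` imports can be rewritten mechanically. The sketch below is a hypothetical helper (`migrate_hdf5_imports`) that does a plain text substitution on source code; it assumes the old module path appears literally and does not parse the Python.

```python
import re

# Matches the 25.03-era module path as a whole dotted name.
_OLD_IMPORT = re.compile(r"\blegate\.core\.io\.hdf5\b")


def migrate_hdf5_imports(source: str) -> str:
    """Replace the old HDF5 module path with legate.io.hdf5 in `source`."""
    return _OLD_IMPORT.sub("legate.io.hdf5", source)


old = "from legate.core.io.hdf5 import from_file, to_file\n"
print(migrate_hdf5_imports(old), end="")
# from legate.io.hdf5 import from_file, to_file
```

Review the result before committing, since a text substitution also touches comments and strings that mention the old path.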

Verify

```bash
cd /tmp                                  # outside the cupynumeric source tree
conda install -c conda-forge h5py        # one-time, if not already present
LEGATE_CONFIG="--cpus 4" LEGATE_AUTO_CONFIG=0 python <skill>/assets/hdf5_roundtrip.py
LEGATE_CONFIG="--cpus 4" LEGATE_AUTO_CONFIG=0 python <skill>/assets/hdf5_batched_read.py
```

Expect `HDF5 ROUND TRIP OK` and `HDF5 BATCHED READ OK`. Add `--gpus 1` (and `LEGATE_IO_USE_VFD_GDS=1`) to exercise the GPU / GDS path.