earth2studio-deterministic-forecast

Earth2Studio Deterministic Forecast Skill

Guide users through building deterministic (single-member) weather forecast inference scripts using `earth2studio.run.deterministic`.

Prerequisites

Earth2Studio installed with CUDA-capable GPU
Python 3.10+, network access for model weights and data

Live Doc References

Fetch relevant docs to verify current APIs before recommending components:

Component	URL
Prognostic models	https://nvidia.github.io/earth2studio/modules/models_px.html
Data sources (analysis)	https://nvidia.github.io/earth2studio/modules/datasources_analysis.html
Data sources (forecast)	https://nvidia.github.io/earth2studio/modules/datasources_forecast.html
IO backends	https://nvidia.github.io/earth2studio/modules/io.html
`run.deterministic`	https://github.com/NVIDIA/earth2studio/blob/main/earth2studio/run.py

Workflow

1. Gather Requirements (skip what's already provided)

Time horizon (hours/days/weeks)
Variables of interest (t2m, wind, geopotential, etc.)
Region (global or specific like CONUS)
GPU/VRAM available

2. Select Model

Fetch the prognostic models page and filter by time horizon, region, and VRAM. Note the model's:

Input variables (`input_coords["variable"]`)
Time step size (`output_coords["lead_time"]`)

3. Select Data Source

The data source must provide all model input variables. Verify via the lexicon at `earth2studio/lexicon/<source>.py`. Common pairings: global models → GFS/ARCO/IFS; regional models → HRRR.
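The compatibility check above reduces to a set difference. The following is a minimal sketch, assuming the lexicon can be read as a mapping keyed by Earth2Studio variable names. `missing_variables`, `model_vars`, and `source_vocab` are hypothetical names with made-up contents, not part of the Earth2Studio API.

```python
def missing_variables(model_vars, source_vocab):
    """Return the model input variables the data source cannot provide, in model order."""
    return [v for v in model_vars if v not in source_vocab]


# Hypothetical inputs for illustration only; real values would come from the
# model's input coordinates and the data source's lexicon.
model_vars = ["t2m", "u10m", "v10m", "z500"]
source_vocab = {"t2m": "...", "u10m": "...", "v10m": "..."}

missing = missing_variables(model_vars, source_vocab)
if missing:
    print(f"Data source lacks: {missing}")
```

An empty result means the pairing is usable; any entry in the list is a reason to pick a different data source or model.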

4. Select IO Backend

Default: `ZarrBackend`. Use `NetCDF4Backend` for legacy tools, `XarrayBackend` for in-memory/small runs.

5. Calculate nsteps

`nsteps = forecast_hours // model_step_hours`

Example: a 5-day forecast with a 6 h step gives `nsteps = 120 // 6 = 20`.
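The calculation can be wrapped in a small helper. This is a sketch only: `compute_nsteps` is a hypothetical name, and raising an error when the horizon is not a multiple of the step (rather than silently truncating) is a design choice made here, not something Earth2Studio enforces.

```python
def compute_nsteps(forecast_hours: int, model_step_hours: int) -> int:
    """Number of model steps needed to cover forecast_hours exactly."""
    if model_step_hours <= 0:
        raise ValueError("model_step_hours must be positive")
    nsteps, remainder = divmod(forecast_hours, model_step_hours)
    if remainder:
        # Integer division would drop the partial step; surface it instead.
        raise ValueError(
            f"{forecast_hours} h is not a multiple of the {model_step_hours} h step"
        )
    return nsteps


print(compute_nsteps(5 * 24, 6))  # 20
```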

6. Decide: output_coords Filtering

Filter variables (pass `output_coords`) when the user requests specific variables (e.g., "t2m and wind"); this reduces output size.
Save all variables (omit `output_coords`) when the user says "all variables" or doesn't specify; this preserves the full model output.
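The decision above can be sketched as a function that returns either a variable list or `None` (meaning: omit `output_coords`). The `ALIASES` table and `select_output_variables` are hypothetical helpers for illustration; the only mapping taken from this guide is that a wind request needs both `u10m` and `v10m`.

```python
from collections import OrderedDict

# Hypothetical aliases for user phrasing; extend as needed.
ALIASES = {"wind": ["u10m", "v10m"], "wind speed": ["u10m", "v10m"]}


def select_output_variables(requested):
    """Return a de-duplicated variable list, or None to save every variable."""
    if not requested:
        return None
    selected = []
    for name in requested:
        for var in ALIASES.get(name, [name]):
            if var not in selected:
                selected.append(var)
    return selected


variables = select_output_variables(["t2m", "wind"])
# In a real script the list would be wrapped as np.array(variables).
output_coords = OrderedDict({"variable": variables}) if variables else OrderedDict()
print(output_coords)
```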

7. Generate Script

```python
from collections import OrderedDict

import numpy as np
import torch

from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic

model = <ModelClass>.load_model(<ModelClass>.load_default_package())
data = <DataSourceClass>()
io = <IOBackendClass>("<output_path>")

# Include output_coords ONLY if user requested specific variables
output_coords = OrderedDict({"variable": np.array(["t2m", "u10m"])})

io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # omit if saving all variables
    device=torch.device("cuda"),
)
```

8. Manual Loop Alternative

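When the user explicitly requests a manual implementation (NOT using `earth2studio.run.deterministic`), follow this checklist in order:

fetch_data - Get initial conditions, passing the model's input variables and lead times: `x, coords = fetch_data(data, time, variable=ic["variable"], lead_time=ic["lead_time"], device=device)`, where `ic` holds the model's input coordinates
Setup total_coords - Build coordinate arrays for the time and lead_time dimensions
io.add_array - Initialize the IO backend with total_coords before the loop
create_iterator - Create the prognostic iterator: `model_iter = model.create_iterator(x, coords)`
Loop through nsteps - `for step, (x, coords) in enumerate(model_iter):` and break once `step >= nsteps`
map_coords - Filter output variables if needed: `x_out, coords_out = map_coords(x, coords, output_coords)`
split_coords - Prepare for the IO write by splitting along the variable dimension: `xs, coords_out, names = split_coords(x_out, coords_out)`
io.write - Write each step to the backend: `io.write(xs, coords_out, names)`

The call shapes above follow `run.py` as understood at the time of writing; confirm them against the live source before generating code.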
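Where the `break` sits relative to the write determines how many lead times end up in the output, and the `lead_time` array in total_coords has to match that count. The toy below illustrates this with plain Python only. It assumes the prognostic iterator's first yield is the initial state (lead time 0), which should be checked against the model's `create_iterator`; `toy_iterator` and both loop functions are hypothetical stand-ins, not Earth2Studio code.

```python
from itertools import count


def toy_iterator(step_hours):
    """Stand-in for a prognostic iterator: yields lead times 0, step, 2*step, ..."""
    for i in count():
        yield i * step_hours


def break_before_write(nsteps, step_hours):
    """Check the step counter first, then write."""
    written = []
    for step, lead in enumerate(toy_iterator(step_hours)):
        if step >= nsteps:
            break
        written.append(lead)  # stands in for map_coords / split_coords / io.write
    return written


def break_after_write(nsteps, step_hours):
    """Write first, then check the step counter."""
    written = []
    for step, lead in enumerate(toy_iterator(step_hours)):
        written.append(lead)
        if step >= nsteps:
            break
    return written


before = break_before_write(20, 6)
after = break_after_write(20, 6)
print(len(before), before[-1])  # 20 114
print(len(after), after[-1])    # 21 120
```

Under the stated assumption, only the second form reaches the full 120 h horizon for `nsteps = 20`, so the `lead_time` coordinate would need `nsteps + 1` entries in that case.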

9. Explain Next Steps

How to change forecast time or run multiple initializations
How to read output (`xr.open_zarr(...)`)
Point to diagnostic workflow for post-processing
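For the multiple-initialization case, one option is to generate the ISO 8601 strings programmatically and pass the whole list as the `time` argument. This is a sketch using only the standard library; `init_times` is a hypothetical helper, and the start date is an arbitrary example.

```python
from datetime import datetime, timedelta


def init_times(start: str, count: int, every_hours: int) -> list[str]:
    """ISO 8601 initialization times, evenly spaced from start."""
    first = datetime.fromisoformat(start)  # raises ValueError on malformed input
    return [
        (first + timedelta(hours=every_hours * i)).strftime("%Y-%m-%dT%H:%M:%S")
        for i in range(count)
    ]


times = init_times("2024-01-01T00:00:00", count=3, every_hours=12)
print(times)
# ['2024-01-01T00:00:00', '2024-01-01T12:00:00', '2024-01-02T00:00:00']
```

Parsing the start string first also acts as a format check, so a malformed time fails before any model weights are downloaded.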

Ownership

Owns: Model selection, data source compatibility, IO backend selection, nsteps calculation, generating `earth2studio.run.deterministic` scripts.

Does not own: Ensemble workflows, diagnostics, data-only fetch, installation, model training.

Troubleshooting

See `references/troubleshooting.md` for common errors and solutions.

Reminders

Always fetch live docs before recommending models or data sources - APIs change between releases
Verify lexicon compatibility - Model input variables must exist in the data source's VOCAB
Use `load_default_package()` - This is the standard pattern for loading model weights
Time format is ISO 8601 - Use `"YYYY-MM-DDTHH:MM:SS"` format for the `time` argument
Wind speed needs both components - If the user asks for "wind speed", include both `u10m` and `v10m`
nsteps is integer division - `nsteps = total_hours // model_step_hours`
ZarrBackend is the default - Only suggest alternatives if the user has specific requirements
GPU is required - All prognostic models require CUDA; CPU inference is not supported
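On the wind speed reminder: speed is the magnitude of the (u, v) vector, which is why neither component alone is enough. A scalar sketch using only the standard library follows; `wind_speed` is a hypothetical helper, and on gridded output the same formula would be applied element-wise with an array library.

```python
import math


def wind_speed(u10m: float, v10m: float) -> float:
    """10 m wind speed from its eastward (u) and northward (v) components."""
    return math.hypot(u10m, v10m)


print(wind_speed(3.0, 4.0))  # 5.0
```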
