earth2studio-deterministic-forecast

Earth2Studio Deterministic Forecast Skill

Guide users through building deterministic (single-member) weather forecast inference scripts using earth2studio.run.deterministic.

Prerequisites

  • Earth2Studio installed on a machine with a CUDA-capable GPU
  • Python 3.10+ and network access to download model weights and data

Live Doc References

Workflow

1. Gather Requirements (skip what's already provided)

  • Time horizon (hours/days/weeks)
  • Variables of interest (t2m, wind, geopotential, etc.)
  • Region (global or specific like CONUS)
  • GPU/VRAM available

2. Select Model

Fetch the prognostic models page and filter by time horizon, region, and VRAM. Note the model's:
  • Input variables (input_coords["variable"])
  • Time step size (output_coords["lead_time"])
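
The filtering step above can be sketched as a simple predicate over a catalog. Everything below is an assumption for illustration: `ModelSpec`, `shortlist`, and the catalog entries are hypothetical placeholders, not real Earth2Studio models or measured memory figures. The real values should come from the live prognostic models page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical catalog entry; fill these in from the live docs."""

    name: str
    region: str             # "global" or a regional tag such as "conus"
    step_hours: int         # model time step
    max_horizon_hours: int  # longest lead time the model is suited for
    min_vram_gb: int        # rough memory needed for inference


def shortlist(catalog, horizon_hours, region, vram_gb):
    """Return the catalog entries that satisfy all three requirements."""
    return [
        m
        for m in catalog
        if m.region == region
        and m.max_horizon_hours >= horizon_hours
        and m.min_vram_gb <= vram_gb
    ]


# Placeholder entries, NOT real Earth2Studio models or measured numbers.
CATALOG = [
    ModelSpec("global_model_a", "global", 6, 240, 16),
    ModelSpec("global_model_b", "global", 6, 360, 40),
    ModelSpec("regional_model_c", "conus", 1, 48, 24),
]

candidates = shortlist(CATALOG, horizon_hours=120, region="global", vram_gb=24)
print([m.name for m in candidates])
```

Keeping `step_hours` on each entry is convenient because the same value feeds the nsteps calculation later in the workflow.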

3. Select Data Source

The data source must provide all model input variables. Verify this against the source's lexicon at earth2studio/lexicon/<source>.py. Common pairings: global models → GFS/ARCO/IFS; regional models → HRRR.
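
The compatibility check amounts to a set-membership test. The sketch below assumes the model's input variables and the data source's vocabulary have already been collected into a list and a dict. `missing_variables` is a hypothetical helper, and the variable names and vocabulary contents are placeholders rather than an actual lexicon.

```python
def missing_variables(model_variables, source_vocab):
    """Return the model input variables the data source cannot provide, in order."""
    return [v for v in model_variables if v not in source_vocab]


# Placeholder inputs. In a real check, model_variables would come from the
# model's input coordinates and source_vocab from the data source's lexicon.
model_variables = ["t2m", "u10m", "v10m", "z500"]
source_vocab = {"t2m": "...", "u10m": "...", "v10m": "..."}

missing = missing_variables(model_variables, source_vocab)
if missing:
    print(f"Data source is missing: {missing}")
else:
    print("Data source covers all model inputs")
```

An empty result means the pairing is usable. Any listed variable means a different data source (or model) is needed.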

4. Select IO Backend

Default: ZarrBackend. Use NetCDF4Backend for legacy tools and XarrayBackend for in-memory or small runs.

5. Calculate nsteps

nsteps = forecast_hours // model_step_hours
nsteps must be an integer, so the forecast length should be a whole multiple of the model step.
Example: 5-day forecast with 6h step → nsteps = 120 // 6 = 20
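
A minimal sketch of the calculation follows. `compute_nsteps` is a hypothetical helper, and rejecting horizons that are not a whole multiple of the step is a design choice made here (stricter than plain integer division, which would silently shorten the forecast).

```python
def compute_nsteps(forecast_hours: int, model_step_hours: int) -> int:
    """Number of model steps needed to cover forecast_hours."""
    if model_step_hours <= 0:
        raise ValueError("model_step_hours must be positive")
    nsteps, remainder = divmod(forecast_hours, model_step_hours)
    if remainder:
        # Plain // would drop the remainder and end the forecast early.
        raise ValueError(
            f"{forecast_hours}h is not a multiple of the {model_step_hours}h step"
        )
    return nsteps


print(compute_nsteps(5 * 24, 6))  # 5-day forecast with a 6h step
```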

6. Decide: output_coords Filtering

  • Filter variables (pass output_coords) when the user requests specific variables (e.g., "t2m and wind") - reduces output size
  • Save all variables (omit output_coords) when the user says "all variables" or doesn't specify - preserves full model output
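
The decision can be sketched as a helper that either builds the variable filter or returns nothing. `build_output_coords` and the `ALIASES` table are hypothetical, and the plain list stands in for the NumPy array that an actual script would pass.

```python
from collections import OrderedDict

# Hypothetical mapping from loose user phrasing to variable IDs.
ALIASES = {"wind": ["u10m", "v10m"], "wind speed": ["u10m", "v10m"]}


def build_output_coords(requested):
    """Return an output_coords mapping, or None to keep the full model output."""
    if not requested:
        return None  # "all variables" or nothing specified: omit output_coords
    variables = []
    for name in requested:
        for var in ALIASES.get(name, [name]):
            if var not in variables:  # de-duplicate while preserving order
                variables.append(var)
    return OrderedDict({"variable": variables})


# Only add the keyword argument when a filter was actually requested.
run_kwargs = {"nsteps": 20}
coords = build_output_coords(["t2m", "wind"])
if coords is not None:
    run_kwargs["output_coords"] = coords
print(run_kwargs)
```

Returning `None` rather than an empty mapping makes the "omit the argument" case explicit at the call site.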

7. Generate Script

python
from collections import OrderedDict

import numpy as np
import torch
from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic

# Load the default model package, then set up the data source and IO backend
model = <ModelClass>.load_model(<ModelClass>.load_default_package())
data = <DataSourceClass>()
io = <IOBackendClass>("<output_path>")

# Include output_coords ONLY if user requested specific variables
output_coords = OrderedDict({"variable": np.array(["t2m", "u10m"])})

io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # omit if saving all variables
    device=torch.device("cuda"),
)

8. Manual Loop Alternative

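When the user explicitly requests a manual implementation (NOT using earth2studio.run.deterministic), follow this checklist in order:
  1. fetch_data - Get initial conditions for the model's input variables and lead times:
    ic = model.input_coords(); x, coords = fetch_data(data, time, variable=ic["variable"], lead_time=ic["lead_time"], device=device)
  2. Setup total_coords - Build coordinate arrays for the time and lead_time dimensions; lead_time needs nsteps + 1 entries because the initial state is written too
  3. io.add_array - Initialize the IO backend with total_coords before the loop
  4. create_iterator - Create the prognostic iterator:
    model_iter = model.create_iterator(x, coords)
  5. Loop through nsteps - The iterator yields the initial state first (step 0), so run steps 6 to 8 on every iteration and break only after step nsteps has been written:
    for step, (x, coords) in enumerate(model_iter): ...; if step == nsteps: break
  6. map_coords - Filter output variables if needed:
    x_out, coords_out = map_coords(x, coords, output_coords)
  7. split_coords - Prepare for IO write by splitting along the variable dimension:
    xs, coords_out, var_names = split_coords(x_out, coords_out)
  8. io.write - Write each step to the backend:
    io.write(xs, coords_out, var_names)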
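
The loop-termination logic is easy to get off by one, so here is a library-free sketch of it. It assumes the prognostic iterator yields the initial state as step 0 and never stops on its own. `stub_iterator` and `run_loop` are hypothetical stand-ins, and the 6-hour step is only an example value.

```python
from datetime import timedelta
from itertools import count


def stub_iterator(step):
    """Stand-in for a prognostic iterator: yields lead time 0 first, then never stops."""
    for i in count():
        yield i * step


def run_loop(nsteps, step=timedelta(hours=6)):
    """Collect the lead times a manual loop would write for nsteps forecast steps."""
    written = []
    for i, lead_time in enumerate(stub_iterator(step)):
        # Stands in for map_coords / split_coords / io.write on this step.
        written.append(lead_time)
        if i == nsteps:  # stop once the nsteps-th forecast step is written
            break
    return written


lead_times = run_loop(nsteps=2)
print([lt.total_seconds() / 3600 for lt in lead_times])
```

Under these assumptions a run with nsteps forecast steps writes nsteps + 1 entries, which is why the lead_time coordinate built before the loop needs that many values.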

9. Explain Next Steps

  • How to change forecast time or run multiple initializations
  • How to read output (xr.open_zarr(...))
  • Point to the diagnostic workflow for post-processing
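
For the multiple-initializations case, one possible approach is to generate the list of ISO 8601 timestamps up front and pass it as the time argument. `init_times` is a hypothetical helper, and the start date and 12-hour spacing are arbitrary example values.

```python
from datetime import datetime, timedelta


def init_times(start, count, every_hours):
    """Return `count` ISO 8601 timestamps spaced `every_hours` apart, beginning at `start`."""
    first = datetime.fromisoformat(start)
    return [
        (first + timedelta(hours=i * every_hours)).strftime("%Y-%m-%dT%H:%M:%S")
        for i in range(count)
    ]


times = init_times("2024-01-01T00:00:00", count=3, every_hours=12)
print(times)
```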

Ownership

Owns: Model selection, data source compatibility, IO backend selection, nsteps calculation, generating earth2studio.run.deterministic scripts.
Does not own: Ensemble workflows, diagnostics, data-only fetch, installation, model training.

Troubleshooting

See references/troubleshooting.md for common errors and solutions.

Reminders

  • Always fetch live docs before recommending models or data sources - APIs change between releases
  • Verify lexicon compatibility - Model input variables must exist in data source's VOCAB
  • Use load_default_package() - This is the standard pattern for loading model weights
  • Time format is ISO 8601 - Use "YYYY-MM-DDTHH:MM:SS" format for the time argument
  • Wind speed needs both components - If user asks for "wind speed", include both u10m and v10m
  • nsteps is integer division - nsteps = total_hours // model_step_hours
  • ZarrBackend is the default - Only suggest alternatives if user has specific requirements
  • GPU is required - All prognostic models require CUDA; CPU inference is not supported