xarray - Multi-Dimensional Geoscience Data

Quick Reference

```python
import xarray as xr

# Read
ds = xr.open_dataset('data.nc')

# Access data
temp = ds['temperature']    # DataArray
values = temp.values        # numpy array
df = ds.to_dataframe()      # pandas DataFrame

# Structure info
print(ds)                   # Overview
print(ds.dims)              # Dimensions
print(ds.data_vars)         # Variables

# Write
ds.to_netcdf('output.nc')
```
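To try the access and inspection calls above without a file on disk, here is a minimal sketch against a small in-memory Dataset. The variable name `temperature` and all coordinate values are illustrative assumptions. It also prints `ds.sizes`, which maps each dimension name to its length.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small in-memory Dataset standing in for xr.open_dataset('data.nc')
times = pd.date_range('2020-01-01', periods=4, freq='D')
ds = xr.Dataset(
    {'temperature': (('time', 'lat'), np.arange(8.0).reshape(4, 2))},
    coords={'time': times, 'lat': [10.0, 20.0]},
)

temp = ds['temperature']      # DataArray with labeled dims
values = temp.values          # plain numpy array, labels dropped
df = ds.to_dataframe()        # one row per (time, lat) pair

print(dict(ds.sizes))         # dimension names and lengths
print(list(ds.data_vars))     # variable names
print(df.head())
```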

Key Classes

Class	Purpose
`Dataset`	Collection of aligned DataArrays sharing coordinates (like a NetCDF file)
`DataArray`	Single variable with labeled dimensions
`Coordinates`	Dimension labels (time, lat, lon)

Essential Operations

Select Data

```python
# By coordinate value
temp_jan = ds['temperature'].sel(time='2020-01-15')
# Slice bounds follow the stored order of the coordinate:
# if lat runs north-to-south, use slice(30, -30) instead
temp_region = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(-60, 60))

# Nearest value (the point coordinates here are illustrative)
temp_point = ds['temperature'].sel(lat=40.0, lon=-105.0, method='nearest')

# By index
temp_first = ds['temperature'].isel(time=0)
```
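Selection behaves differently depending on how a grid is stored. Here is a small sketch on a hypothetical grid whose latitude runs north to south, as in many reanalysis products. It shows `method='nearest'` snapping to the closest grid point, and a `slice` written in the wrong order returning an empty selection rather than raising an error.

```python
import numpy as np
import xarray as xr

# Hypothetical grid with latitude stored north-to-south
lat = np.array([60.0, 30.0, 0.0, -30.0, -60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
temp = xr.DataArray(
    np.arange(20.0).reshape(5, 4),
    dims=('lat', 'lon'),
    coords={'lat': lat, 'lon': lon},
)

# Nearest-neighbour lookup: 41.5 snaps to the 30 row, 95 to the 90 column
point = temp.sel(lat=41.5, lon=95.0, method='nearest')

# Slice bounds follow the stored order: slice(-30, 30) on a descending axis is empty
empty = temp.sel(lat=slice(-30, 30))
band = temp.sel(lat=slice(30, -30))    # keeps 30, 0, -30

# Positional indexing ignores labels entirely
first_row = temp.isel(lat=0)           # the 60 degrees north row
```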

Compute Statistics

```python
import numpy as np

temp = ds['temperature']
temp_mean_time = temp.mean(dim='time')           # Spatial map
temp_mean_space = temp.mean(dim=['lat', 'lon'])  # Time series

# Area-weighted mean
weights = np.cos(np.deg2rad(ds.lat))
temp_weighted = temp.weighted(weights).mean(dim=['lat', 'lon'])
```
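You can see what the cosine-latitude weighting does by comparing it with the unweighted mean on a toy field, and by checking it against the formula `weighted()` applies, sum(w·x) / sum(w). The field values below are made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical field: 1.0 in the three lower-latitude rows, 0.0 at high latitudes
lat = np.array([-80.0, -40.0, 0.0, 40.0, 80.0])
lon = np.array([0.0, 120.0, 240.0])
field = xr.DataArray(
    np.repeat(np.array([0.0, 1.0, 1.0, 1.0, 0.0])[:, None], 3, axis=1),
    dims=('lat', 'lon'),
    coords={'lat': lat, 'lon': lon},
)

weights = np.cos(np.deg2rad(field.lat))      # high-latitude rows count for less
weighted = field.weighted(weights).mean(dim=['lat', 'lon'])
unweighted = field.mean(dim=['lat', 'lon'])

# Reproduce sum(w * x) / sum(w) by hand; weights broadcast across every lon
manual = (field * weights).sum() / (weights.sum() * field.sizes['lon'])
```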

GroupBy and Resample

```python
temp = ds['temperature']

# Temporal aggregations
monthly_mean = temp.groupby('time.month').mean()   # one value per calendar month (12)
annual_mean = temp.groupby('time.year').mean()     # one value per year

# Climatology and anomalies
climatology = temp.groupby('time.month').mean('time')
anomalies = temp.groupby('time.month') - climatology

# Resample time series
monthly = temp.resample(time='1ME').mean()               # month-end bins; use '1M' with pandas < 2.2
rolling_30d = temp.rolling(time=30, center=True).mean()  # 30-step window (30 days for daily data)
```
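You can check the climatology/anomaly pattern on synthetic data. After subtracting the monthly climatology, each calendar month's anomalies should average to zero. The seasonal signal and noise below are illustrative assumptions. The resample uses the `'MS'` (month-start) alias, which labels each bin by the first day of the month and avoids the `'M'`/`'ME'` difference between pandas versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of hypothetical daily data: a seasonal cycle plus noise
times = pd.date_range('2020-01-01', '2021-12-31', freq='D')
rng = np.random.default_rng(0)
seasonal = 10 * np.sin(2 * np.pi * times.dayofyear.values / 365.25)
temp = xr.DataArray(seasonal + rng.normal(size=times.size),
                    dims='time', coords={'time': times})

climatology = temp.groupby('time.month').mean('time')   # 12 values along 'month'
anomalies = temp.groupby('time.month') - climatology    # same length as temp

# Each calendar month's anomalies average to ~0 by construction
check = anomalies.groupby('time.month').mean('time')

# 24 monthly means, each labelled by the first day of its month
monthly = temp.resample(time='MS').mean()
```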

Create New Dataset

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2020-01-01', periods=365, freq='D')
lats = np.arange(-89.5, 90, 1.0)     # 180 cell-centre latitudes at 1-degree spacing
lons = np.arange(-179.5, 180, 1.0)   # 360 cell-centre longitudes (no duplicated -180/180 column)

da = xr.DataArray(
    data=np.random.randn(365, 180, 360),
    dims=['time', 'lat', 'lon'],
    coords={'time': times, 'lat': lats, 'lon': lons},
    attrs={'units': 'degC', 'long_name': 'Temperature'}
)

ds = xr.Dataset({'temperature': da})
ds.to_netcdf('output.nc')
```

Masking

```python
temp_warm = temp.where(temp > 20)                 # Mask by condition
temp_clipped = temp.where(temp > 0, 0)            # Replace negative with 0

tropics = (ds.lat > -23.5) & (ds.lat < 23.5)
temp_tropics = temp.where(tropics, drop=True)     # Mask by coordinate
```
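The difference between masking and dropping shows up in the output shapes. In this sketch on a hypothetical 5×2 grid, `where` without `drop` keeps every cell and fills the rest with NaN. With `drop=True`, it removes the latitude rows that fall entirely outside the condition.

```python
import numpy as np
import xarray as xr

lat = np.array([-45.0, -15.0, 0.0, 15.0, 45.0])
temp = xr.DataArray(
    np.array([[-5.0, 2.0], [22.0, 25.0], [28.0, 30.0], [24.0, 19.0], [-1.0, 8.0]]),
    dims=('lat', 'lon'),
    coords={'lat': lat, 'lon': [0.0, 180.0]},
)

temp_warm = temp.where(temp > 20)           # values <= 20 become NaN, shape unchanged
temp_clipped = temp.where(temp > 0, 0)      # values <= 0 replaced by 0

tropics = (temp.lat > -23.5) & (temp.lat < 23.5)
kept = temp.where(tropics)                  # same shape, rows outside the band are NaN
dropped = temp.where(tropics, drop=True)    # rows outside the band removed
```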

Large Datasets (Dask)

```python
# Open with chunking (lazy loading; requires dask)
ds = xr.open_dataset('large_file.nc', chunks={'time': 100})
ds = xr.open_mfdataset('data_*.nc', chunks='auto')

# Operations are lazy until .compute()
result = ds['temperature'].mean(dim='time').compute()
```
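This sketch confirms that the work is deferred. It chunks an in-memory array with `.chunk()`, which has the same effect as passing `chunks=` when opening a file. It then inspects the underlying data type before and after `.compute()`. It assumes `dask` is installed, and the array contents are arbitrary.

```python
import numpy as np
import xarray as xr
import dask.array as da  # assumes dask is installed

temp = xr.DataArray(np.ones((120, 4, 4)), dims=('time', 'lat', 'lon'))
temp = temp.chunk({'time': 30})          # four chunks of 30 steps along time

lazy = temp.mean(dim='time')             # builds a task graph; .data is a dask array
result = lazy.compute()                  # runs the graph; .data is now a numpy array
```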

When to Use vs Alternatives

Tool	Best For	Limitations
xarray	Labeled multi-dim arrays, NetCDF/Zarr, Dask integration	Learning curve for newcomers from numpy
iris	Met Office climate workflows, UGRID mesh support	Smaller community, UK-centric conventions
CDO	Fast command-line climate data operations	Not Python-native, limited custom analysis
NCO	Quick NetCDF file manipulation and arithmetic	Command-line only, no visualization

Use xarray when you need labeled dimension handling, seamless NetCDF/Zarr I/O, groupby/resample operations, or Dask-based parallel processing of large datasets.

Consider alternatives when you need fast one-off command-line operations on NetCDF files (use CDO/NCO), or you work within the Met Office ecosystem with UGRID meshes (use iris).

Common Workflows

Climate data analysis with temporal aggregation

1. Open the NetCDF dataset with `xr.open_dataset()` (pass `chunks=` if it is large)
2. Inspect dimensions, coordinates, and variables with `print(ds)`
3. Select the region of interest with `.sel(lat=slice(...), lon=slice(...))`
4. Compute the monthly climatology with `.groupby('time.month').mean('time')`
5. Calculate anomalies by subtracting the climatology from the data
6. Compute the area-weighted spatial mean using cosine-of-latitude weights
7. Resample to the desired temporal resolution (monthly, annual)
8. Save the results to NetCDF with `.to_netcdf()`
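The workflow above can be sketched end to end on synthetic data. This is a minimal illustration, not the contents of `scripts/climate_analysis.py`. The three-year daily dataset, the region bounds, and the annual `'YS'` (year-start) resample are assumptions. The final `.to_netcdf()` call is left as a comment so the sketch runs without a NetCDF backend.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Step 1 stand-in: hypothetical 3-year daily dataset instead of xr.open_dataset(...)
times = pd.date_range('2000-01-01', '2002-12-31', freq='D')
lat = np.arange(-60.0, 61.0, 30.0)        # ascending: -60, -30, 0, 30, 60
lon = np.arange(0.0, 360.0, 90.0)         # 0, 90, 180, 270
rng = np.random.default_rng(42)
cycle = 10 * np.sin(2 * np.pi * times.dayofyear.values / 365.25)
data = cycle[:, None, None] + rng.normal(size=(times.size, lat.size, lon.size))
ds = xr.Dataset(
    {'temperature': (('time', 'lat', 'lon'), data)},
    coords={'time': times, 'lat': lat, 'lon': lon},
)

# Step 2: inspect
print(ds)

# Step 3: region of interest (lat is ascending here, so slice low -> high)
region = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(0, 180))

# Steps 4-5: monthly climatology and anomalies
climatology = region.groupby('time.month').mean('time')
anomalies = region.groupby('time.month') - climatology

# Step 6: area-weighted spatial mean -> a single time series
weights = np.cos(np.deg2rad(anomalies.lat))
series = anomalies.weighted(weights).mean(dim=['lat', 'lon'])

# Step 7: annual means in year-start bins
annual = series.resample(time='YS').mean()

# Step 8 (omitted so no NetCDF backend is needed):
# annual.to_netcdf('anomalies_annual.nc')
```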

Common Issues

Issue	Solution
Memory error	Use `chunks=` for lazy loading
Time decoding fails	Open with `decode_times=False`, then decode manually (e.g. correct the time `units`/`calendar` attributes and call `xr.decode_cf()`)
Missing coordinates	Check `ds.coords` and `ds.dims`
Alignment errors	Check that coordinate values match exactly (small floating-point differences in lat/lon count as mismatches)
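Two of the issues above can be reproduced in memory. The first part mimics a file opened with `decode_times=False`, which yields raw numbers plus CF `units`/`calendar` attributes, and decodes it with `xr.decode_cf`. The second shows how a tiny floating-point difference in a coordinate silently drops data under the default inner join used in arithmetic. `xr.align(..., join='exact')` raises an error in that case instead. All values are illustrative.

```python
import numpy as np
import xarray as xr

# What a file opened with decode_times=False typically looks like:
# raw numbers on the time axis plus CF 'units'/'calendar' attributes
raw = xr.Dataset(
    {'temperature': ('time', np.array([1.0, 2.0, 3.0]))},
    coords={'time': ('time', np.array([0, 31, 59]),
                     {'units': 'days since 2001-01-01', 'calendar': 'standard'})},
)
# Fix any malformed attribute here (e.g. a misspelled units string), then decode
decoded = xr.decode_cf(raw)   # time becomes datetime64: 2001-01-01, 02-01, 03-01

# Coordinates that differ only by float noise do not align
a = xr.DataArray([1.0, 2.0], dims='lat', coords={'lat': [10.0, 20.0]})
b = xr.DataArray([1.0, 2.0], dims='lat', coords={'lat': [10.0, 20.000001]})
total = a + b                 # default inner join: only lat=10.0 survives, silently

try:
    xr.align(a, b, join='exact')   # refuses to align mismatched labels
    raised = False
except ValueError:
    raised = True
```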

References

I/O Formats - NetCDF, Zarr, and other formats
Computation - Aggregation and analysis methods

Scripts

scripts/climate_analysis.py - Climate data analysis
