xarray

xarray - Multi-Dimensional Geoscience Data

Quick Reference

python
import xarray as xr

# Read
ds = xr.open_dataset('data.nc')

# Access data
temp = ds['temperature']    # DataArray
values = temp.values        # numpy array
df = ds.to_dataframe()      # pandas DataFrame

# Structure info
print(ds)                   # Overview
print(ds.dims)              # Dimensions
print(ds.data_vars)         # Variables

# Write
ds.to_netcdf('output.nc')

Key Classes

| Class | Purpose |
|---|---|
| Dataset | Collection of aligned DataArrays (like a NetCDF file) |
| DataArray | Single variable with labeled dimensions |
| Coordinates | Dimension labels (time, lat, lon) |
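How the three classes fit together can be sketched on a toy grid. The variable names, grid sizes, and the extra `elevation` variable below are illustrative assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy grid: 3 time steps, 2 latitudes, 4 longitudes.
times = pd.date_range('2020-01-01', periods=3, freq='D')
lats = [-10.0, 10.0]
lons = [0.0, 90.0, 180.0, 270.0]

temperature = xr.DataArray(
    np.zeros((3, 2, 4)),
    dims=['time', 'lat', 'lon'],
    coords={'time': times, 'lat': lats, 'lon': lons},
    name='temperature',
)
# A second variable that shares only the spatial dimensions.
elevation = xr.DataArray(
    np.ones((2, 4)),
    dims=['lat', 'lon'],
    coords={'lat': lats, 'lon': lons},
    name='elevation',
)

# The Dataset holds both variables on their shared coordinates.
ds = xr.Dataset({'temperature': temperature, 'elevation': elevation})

print(type(ds['temperature']).__name__)  # DataArray
print(ds['temperature'].dims)            # ('time', 'lat', 'lon')
print(sorted(ds.coords))                 # ['lat', 'lon', 'time']
```

Indexing a Dataset by variable name returns a DataArray that carries its own coordinates with it, which is why selection and aggregation can be written in terms of labels rather than axis numbers.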

Essential Operations

Select Data

python
# By coordinate value
temp_jan = ds['temperature'].sel(time='2020-01-15')
temp_region = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(-60, 60))

# Nearest value
temp_point = ds['temperature'].sel(lat=35.5, lon=-120.3, method='nearest')

# By index
temp_first = ds['temperature'].isel(time=0)
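The selection calls above can be tried on synthetic data. The grid below is an assumption chosen so the results are easy to check by eye. The sketch also shows a common pitfall: on a descending latitude axis, the slice bounds have to be given in descending order as well.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2020-01-01', periods=31, freq='D')
lats = np.arange(-90.0, 91.0, 30.0)     # -90, -60, ..., 90 (ascending)
lons = np.arange(-180.0, 180.0, 60.0)   # -180, -120, ..., 120

temp = xr.DataArray(
    np.random.default_rng(0).normal(15.0, 5.0, (31, 7, 6)),
    dims=['time', 'lat', 'lon'],
    coords={'time': times, 'lat': lats, 'lon': lons},
    name='temperature',
)

# Label-based slices include both endpoints.
region = temp.sel(lat=slice(-30, 30), lon=slice(-60, 60))
print(region.lat.values)

# With descending latitudes, an ascending slice selects nothing;
# reverse the bounds to match the axis order.
temp_desc = temp.sortby('lat', ascending=False)
empty = temp_desc.sel(lat=slice(-30, 30))
fixed = temp_desc.sel(lat=slice(30, -30))
print(empty.sizes['lat'], fixed.sizes['lat'])

# method='nearest' snaps to the closest grid point.
point = temp.sel(lat=35.5, lon=-120.3, method='nearest')
print(float(point.lat), float(point.lon))
```

If a file's latitude order is unknown, sorting with `sortby('lat')` before slicing is one way to make the selection independent of how the data was stored.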

Compute Statistics

python
temp = ds['temperature']
temp_mean_time = temp.mean(dim='time')           # Spatial map
temp_mean_space = temp.mean(dim=['lat', 'lon'])  # Time series

# Area-weighted mean
import numpy as np
weights = np.cos(np.deg2rad(ds.lat))
temp_weighted = temp.weighted(weights).mean(dim=['lat', 'lon'])
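Why the cosine weights matter can be seen on a small field where the answer is known by hand. The three-latitude grid and the temperature values are assumptions made up for this check.

```python
import numpy as np
import xarray as xr

lats = np.array([-60.0, 0.0, 60.0])
lons = np.array([0.0, 120.0, 240.0])

# Hypothetical field: 0 degC at the equator, 10 degC at +/-60 degrees.
values = np.array([[10.0] * 3, [0.0] * 3, [10.0] * 3])
temp = xr.DataArray(values, dims=['lat', 'lon'],
                    coords={'lat': lats, 'lon': lons})

# cos(latitude) approximates relative grid-cell area: 0.5, 1.0, 0.5.
weights = np.cos(np.deg2rad(temp.lat))

plain = temp.mean(dim=['lat', 'lon'])
weighted = temp.weighted(weights).mean(dim=['lat', 'lon'])

print(float(plain))      # about 6.67
print(float(weighted))   # about 5.0
```

The unweighted mean treats every grid row equally and so over-represents high latitudes, where cells on a regular lat/lon grid cover less area. The weighted result is (0.5·10 + 1·0 + 0.5·10) / 2 = 5.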

GroupBy and Resample

python
temp = ds['temperature']

# Temporal aggregations
monthly_mean = temp.groupby('time.month').mean()
annual_mean = temp.groupby('time.year').mean()

# Climatology and anomalies
climatology = temp.groupby('time.month').mean('time')
anomalies = temp.groupby('time.month') - climatology

# Resample time series
monthly = temp.resample(time='1M').mean()                # pandas >= 2.2 prefers the alias 'ME'
rolling_30d = temp.rolling(time=30, center=True).mean()  # 30 time steps (30 days for daily data)
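The climatology and anomaly pattern can be checked on a synthetic series. The two-year daily sine wave below is an assumption chosen so the result is predictable. The resample step uses the month-start alias `'MS'`, which sidesteps the `'M'`/`'ME'` naming difference between pandas versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of hypothetical daily data with a pure seasonal cycle.
times = pd.date_range('2019-01-01', '2020-12-31', freq='D')
doy = times.dayofyear.to_numpy()
temp = xr.DataArray(
    10.0 * np.sin(2 * np.pi * doy / 365.25),
    dims=['time'],
    coords={'time': times},
    name='temperature',
)

# Mean for each calendar month, then the departure from that mean.
climatology = temp.groupby('time.month').mean('time')
anomalies = temp.groupby('time.month') - climatology

# By construction the anomalies average to zero within each calendar month.
months = times.month.to_numpy()
max_abs_monthly_mean = max(
    abs(anomalies.values[months == m].mean()) for m in range(1, 13)
)
print(climatology.sizes['month'], max_abs_monthly_mean)

# Monthly means labeled at the start of each month.
monthly = temp.resample(time='MS').mean()
print(monthly.sizes['time'])
```

`groupby('time.month')` pools the same calendar month across all years (12 groups), while `resample` keeps each month of each year separate (24 values here).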

Create New Dataset

python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2020-01-01', periods=365, freq='D')
lats = np.linspace(-90, 90, 180)
lons = np.linspace(-180, 180, 360)

da = xr.DataArray(
    data=np.random.randn(365, 180, 360),
    dims=['time', 'lat', 'lon'],
    coords={'time': times, 'lat': lats, 'lon': lons},
    attrs={'units': 'degC', 'long_name': 'Temperature'}
)

ds = xr.Dataset({'temperature': da})
ds.to_netcdf('output.nc')

Masking

python
temp_warm = temp.where(temp > 20)                 # Mask by condition
temp_clipped = temp.where(temp > 0, 0)            # Replace non-positive values (and NaN) with 0

tropics = (ds.lat > -23.5) & (ds.lat < 23.5)
temp_tropics = temp.where(tropics, drop=True)    # Mask by coordinate
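The three masking forms behave differently with respect to shape and missing values. A small sketch on made-up values (the 3×2 grid is an assumption) makes the differences visible:

```python
import numpy as np
import xarray as xr

temp = xr.DataArray(
    np.array([[-5.0, 12.0], [25.0, np.nan], [30.0, 18.0]]),
    dims=['lat', 'lon'],
    coords={'lat': [-40.0, 0.0, 40.0], 'lon': [0.0, 180.0]},
)

# Cells failing the condition become NaN; the shape is unchanged.
warm = temp.where(temp > 20)

# Cells failing the condition take the fill value; NaN > 0 is False,
# so missing values are filled with 0 as well.
clipped = temp.where(temp > 0, 0)

# drop=True removes rows that are masked everywhere.
tropics = (temp.lat > -23.5) & (temp.lat < 23.5)
tropical = temp.where(tropics, drop=True)

print(warm.shape, int(warm.count()))
print(clipped.values.tolist())
print(tropical.lat.values)
```

If missing values should stay missing when clipping, one option is `temp.clip(min=0)`, which leaves NaN untouched.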

Large Datasets (Dask)

python
# Open with chunking (lazy loading)
ds = xr.open_dataset('large_file.nc', chunks={'time': 100})
ds = xr.open_mfdataset('data_*.nc', chunks='auto')

# Operations are lazy until .compute()
result = ds['temperature'].mean(dim='time').compute()
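The lazy behavior can be observed without a large file by chunking an in-memory dataset. This sketch assumes dask is installed; the dataset itself is synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2020-01-01', periods=365, freq='D')
ds = xr.Dataset(
    {'temperature': (('time', 'lat', 'lon'), np.ones((365, 4, 8)))},
    coords={
        'time': times,
        'lat': np.linspace(-45, 45, 4),
        'lon': np.arange(0, 360, 45),
    },
)

# Split the time axis into blocks of 100 steps (requires dask).
chunked = ds.chunk({'time': 100})
print(chunked['temperature'].chunks[0])   # (100, 100, 100, 65)

# This only builds a task graph; no arithmetic has run yet.
lazy_mean = chunked['temperature'].mean(dim='time')

# .compute() executes the graph and returns an in-memory result.
result = lazy_mean.compute()
print(result.shape)
```

Chunk sizes are a tuning choice: chunks that are too small add scheduling overhead, while chunks that are too large can exhaust memory.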

When to Use vs Alternatives

| Tool | Best For | Limitations |
|---|---|---|
| xarray | Labeled multi-dim arrays, NetCDF/Zarr, Dask integration | Learning curve for newcomers from numpy |
| iris | Met Office climate workflows, UGRID mesh support | Smaller community, UK-centric conventions |
| CDO | Fast command-line climate data operations | Not Python-native, limited custom analysis |
| NCO | Quick NetCDF file manipulation and arithmetic | Command-line only, no visualization |

Use xarray when you need labeled dimension handling, seamless NetCDF/Zarr I/O, groupby/resample operations, or Dask-based parallel processing of large datasets.

Consider alternatives when you need fast one-off command-line operations on NetCDF files (use CDO/NCO), or you work within the Met Office ecosystem with UGRID meshes (use iris).

Common Workflows

Climate data analysis with temporal aggregation

  • Open NetCDF dataset with xr.open_dataset() (use chunks= if large)
  • Inspect dimensions, coordinates, and variables with print(ds)
  • Select region of interest with .sel(lat=slice(), lon=slice())
  • Compute climatology with .groupby('time.month').mean('time')
  • Calculate anomalies by subtracting climatology from data
  • Compute area-weighted spatial mean using cosine latitude weights
  • Resample to desired temporal resolution (monthly, annual)
  • Save results to NetCDF with .to_netcdf()
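The workflow steps can be strung together roughly as follows. This is a sketch under stated assumptions: a synthetic dataset stands in for the opened file, the region bounds are arbitrary, and the final write is left as a comment because it needs a NetCDF backend and a real output path.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Step 1 stand-in: synthetic data instead of xr.open_dataset('data.nc').
times = pd.date_range('2018-01-01', '2019-12-31', freq='D')
lats = np.arange(-87.5, 90.0, 5.0)
lons = np.arange(-177.5, 180.0, 5.0)
rng = np.random.default_rng(42)
ds = xr.Dataset(
    {'temperature': (('time', 'lat', 'lon'),
                     rng.normal(15.0, 3.0, (times.size, lats.size, lons.size)))},
    coords={'time': times, 'lat': lats, 'lon': lons},
)

# Step 2: inspect dimensions, coordinates, and variables.
print(ds)

# Step 3: select the region of interest (bounds are illustrative).
temp = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(-60, 60))

# Steps 4-5: monthly climatology, then anomalies relative to it.
climatology = temp.groupby('time.month').mean('time')
anomalies = temp.groupby('time.month') - climatology

# Step 6: area-weighted spatial mean with cosine-latitude weights.
weights = np.cos(np.deg2rad(temp.lat))
anomaly_series = anomalies.weighted(weights).mean(dim=['lat', 'lon'])

# Step 7: resample the daily series to monthly means ('MS' = month start).
monthly = anomaly_series.resample(time='MS').mean()
print(monthly.sizes['time'])

# Step 8 (not run here): monthly.to_netcdf('anomalies.nc')  # hypothetical path
```

Selecting the region before computing the climatology keeps the intermediate arrays small, which matters more as the dataset grows.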

Common Issues

| Issue | Solution |
|---|---|
| Memory error | Use chunks= for lazy loading |
| Time decoding fails | Open with decode_times=False, then decode manually |
| Missing coordinates | Check ds.coords and ds.dims |
| Alignment errors | Check coordinate values match |
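Two of these fixes can be sketched in code. The first part assumes a dataset that was opened with decode_times=False, so its time axis is plain numbers with a CF-style units attribute; the values are made up. The second part shows how nearly-equal coordinates silently shrink a result, and how a strict alignment check exposes the mismatch.

```python
import numpy as np
import xarray as xr

# Hypothetical contents after open_dataset(..., decode_times=False).
raw = xr.Dataset(
    {'temperature': (('time',), np.array([14.0, 15.0, 16.0]))},
    coords={'time': ('time', np.array([0, 1, 2]),
                     {'units': 'days since 2000-01-01'})},
)

# Repair raw['time'].attrs here if the units string is malformed,
# then let xarray apply the CF decoding.
decoded = xr.decode_cf(raw)
print(decoded.time.values)

# Alignment: labels that differ slightly are treated as different.
a = xr.DataArray([1.0, 2.0, 3.0], dims='lat',
                 coords={'lat': [0.0, 10.0, 20.0]})
b = xr.DataArray([1.0, 2.0, 3.0], dims='lat',
                 coords={'lat': [0.0, 10.0, 20.000001]})

# Arithmetic keeps only exactly matching labels (inner join).
total = a + b
print(total.sizes['lat'])

# join='exact' raises instead of silently dropping points.
try:
    xr.align(a, b, join='exact')
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```

When the mismatch is only floating-point noise, one option is to copy the coordinate from one array onto the other with `b.assign_coords(lat=a.lat)` before combining them.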

References

  • I/O Formats - NetCDF, Zarr, and other formats
  • Computation - Aggregation and analysis methods

Scripts

  • scripts/climate_analysis.py - Climate data analysis