xarray - Multi-Dimensional Geoscience Data
Quick Reference
```python
import xarray as xr

# Read
ds = xr.open_dataset('data.nc')

# Access data
temp = ds['temperature']   # DataArray
values = temp.values       # numpy array
df = ds.to_dataframe()     # pandas DataFrame

# Structure info
print(ds)             # Overview
print(ds.dims)        # Dimensions
print(ds.data_vars)   # Variables

# Write
ds.to_netcdf('output.nc')
```
Key Classes
| Class | Purpose |
|---|---|
| `Dataset` | Collection of aligned DataArrays (like a NetCDF file) |
| `DataArray` | Single variable with labeled dimensions |
| `Coordinates` | Dimension labels (time, lat, lon) |
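To make the relationship between these classes concrete, here is a minimal sketch on synthetic data. The variable names (`temperature`, `pressure`) and all values are made up for illustration and do not come from any real file.

```python
import numpy as np
import xarray as xr

# A DataArray: one variable with named dimensions and coordinate labels.
temperature = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=["lat", "lon"],
    coords={"lat": [10.0, 20.0], "lon": [0.0, 5.0, 10.0]},
    name="temperature",
)

# A Dataset: a dict-like collection of DataArrays that share coordinates.
ds = xr.Dataset({"temperature": temperature, "pressure": temperature * 100})

print(type(ds["temperature"]).__name__)  # DataArray
print(list(ds.data_vars))                # names of the data variables
print(sorted(ds.coords))                 # ['lat', 'lon']
```

Indexing the Dataset by name returns a DataArray, and both variables are addressed through the same `lat`/`lon` labels.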
Essential Operations
Select Data
```python
# By coordinate value
temp_jan = ds['temperature'].sel(time='2020-01-15')
temp_region = ds['temperature'].sel(lat=slice(-30, 30), lon=slice(-60, 60))

# Nearest value
temp_point = ds['temperature'].sel(lat=35.5, lon=-120.3, method='nearest')

# By index
temp_first = ds['temperature'].isel(time=0)
```
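Two details of label-based selection are easy to miss: `.sel` slices include both endpoints, and a slice must run in the same direction as the coordinate. The sketch below illustrates both on a small synthetic grid; the grid spacing and values are assumptions chosen only to keep the output easy to check.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 10-degree grid (hypothetical data, for illustration only).
temp = xr.DataArray(
    np.zeros((3, 5, 7)),
    dims=["time", "lat", "lon"],
    coords={
        "time": pd.date_range("2020-01-14", periods=3, freq="D"),
        "lat": np.arange(-20.0, 21.0, 10.0),   # -20, -10, 0, 10, 20
        "lon": np.arange(-30.0, 31.0, 10.0),   # -30, ..., 30
    },
)

# Label-based slices include both endpoints, unlike positional slices.
region = temp.sel(lat=slice(-10, 10), lon=slice(-20, 20))
print(dict(region.sizes))  # lat has 3 points, lon has 5

# method='nearest' snaps to the closest grid point.
point = temp.sel(lat=7.2, lon=-13.9, method="nearest")
print(float(point.lat), float(point.lon))  # 10.0 -10.0

# A descending coordinate needs a descending slice.
flipped = temp.sortby("lat", ascending=False)
print(flipped.sel(lat=slice(10, -10)).sizes["lat"])  # 3
```

Many NetCDF files store latitude from north to south, so an empty result from `.sel(lat=slice(-30, 30))` is often a sign that the slice bounds need to be reversed.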
Compute Statistics
```python
import numpy as np

temp = ds['temperature']
temp_mean_time = temp.mean(dim='time')            # Spatial map
temp_mean_space = temp.mean(dim=['lat', 'lon'])   # Time series

# Area-weighted mean (grid-cell area on a regular lat/lon grid scales with cos(latitude))
weights = np.cos(np.deg2rad(ds.lat))
temp_weighted = temp.weighted(weights).mean(dim=['lat', 'lon'])
```
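The reason for the cosine weights is that cells on a regular latitude/longitude grid shrink toward the poles, so an unweighted mean over-counts high latitudes. The following sketch, on a synthetic field whose value equals |lat|, compares the two means and checks the weighted result against the formula sum(w * x) / sum(w). The grid and field are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

lat = np.arange(-80.0, 81.0, 20.0)
lon = np.arange(0.0, 360.0, 90.0)

# Synthetic field that grows toward the poles (value = |lat|), constant in lon.
field = xr.DataArray(
    np.abs(lat)[:, None] * np.ones(lon.size),
    dims=["lat", "lon"],
    coords={"lat": lat, "lon": lon},
)

weights = np.cos(np.deg2rad(field.lat))
weights.name = "weights"

plain = field.mean(dim=["lat", "lon"])
weighted = field.weighted(weights).mean(dim=["lat", "lon"])

# Same result by hand: sum(w * x) / sum(w), with w broadcast over lon.
w2d = weights.broadcast_like(field)
manual = (field * w2d).sum() / w2d.sum()

print(float(plain), float(weighted), float(manual))
```

Because the large values sit at high latitudes where the weights are small, the weighted mean comes out lower than the plain mean.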
GroupBy and Resample
```python
temp = ds['temperature']

# Temporal aggregations
monthly_mean = temp.groupby('time.month').mean()
annual_mean = temp.groupby('time.year').mean()

# Climatology and anomalies
climatology = temp.groupby('time.month').mean('time')
anomalies = temp.groupby('time.month') - climatology

# Resample time series
monthly = temp.resample(time='1M').mean()                 # pandas >= 2.2 prefers 'ME' over 'M'
rolling_30d = temp.rolling(time=30, center=True).mean()   # 30 time steps, i.e. 30 days for daily data
```
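A quick way to sanity-check a climatology/anomaly calculation is to confirm that the anomalies average to roughly zero within each calendar month. The sketch below does this on a synthetic two-year daily series with a sinusoidal seasonal cycle. The series is an assumption for illustration, and the resample step uses the month-start alias `'MS'` because the month-end alias is spelled differently across pandas versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily data with a seasonal cycle (hypothetical values).
time = pd.date_range("2019-01-01", "2020-12-31", freq="D")
doy = time.dayofyear.to_numpy()
temp = xr.DataArray(
    15 + 10 * np.sin(2 * np.pi * doy / 365.25),
    dims=["time"],
    coords={"time": time},
    name="temperature",
)

climatology = temp.groupby("time.month").mean("time")   # one value per calendar month
anomalies = temp.groupby("time.month") - climatology    # same length as temp

# Within each calendar month the anomalies should average to ~0.
months = anomalies["time"].dt.month.values
check = [anomalies.values[months == m].mean() for m in range(1, 13)]
print(climatology.sizes["month"])   # 12
print(max(abs(c) for c in check))   # tiny floating-point residual

# Monthly means labelled at the start of each month.
monthly = temp.resample(time="MS").mean()
print(monthly.sizes["time"])        # 24
```

`groupby('time.month')` pools all Januaries together (12 groups), whereas `resample` keeps each month of each year separate (24 values here).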
Create New Dataset
```python
import numpy as np
import pandas as pd

times = pd.date_range('2020-01-01', periods=365, freq='D')
lats = np.linspace(-90, 90, 180)
lons = np.linspace(-180, 180, 360)

da = xr.DataArray(
    data=np.random.randn(365, 180, 360),
    dims=['time', 'lat', 'lon'],
    coords={'time': times, 'lat': lats, 'lon': lons},
    attrs={'units': 'degC', 'long_name': 'Temperature'}
)

ds = xr.Dataset({'temperature': da})
ds.to_netcdf('output.nc')
```
Masking
```python
temp = ds['temperature']

temp_warm = temp.where(temp > 20)               # Mask by condition (others become NaN)
temp_clipped = temp.where(temp > 0, 0)          # Replace negative with 0

tropics = (ds.lat > -23.5) & (ds.lat < 23.5)
temp_tropics = temp.where(tropics, drop=True)   # Mask by coordinate
```
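The three forms of `.where` behave differently with respect to shape and fill value. A small sketch on a five-point synthetic latitude profile (values invented for illustration) shows the difference:

```python
import numpy as np
import xarray as xr

lat = np.array([-40.0, -20.0, 0.0, 20.0, 40.0])
temp = xr.DataArray(
    np.array([-5.0, 18.0, 27.0, 22.0, 3.0]),
    dims=["lat"],
    coords={"lat": lat},
)

warm = temp.where(temp > 20)        # shape kept, non-matching cells become NaN
clipped = temp.where(temp > 0, 0)   # shape kept, non-matching cells become 0

tropics = (temp.lat > -23.5) & (temp.lat < 23.5)
trop = temp.where(tropics, drop=True)   # labels outside the mask are removed

print(int(warm.count()))    # number of non-NaN cells
print(clipped.values)
print(trop.lat.values)
```

Only `drop=True` changes the array's size; the other two forms keep the original grid, which matters when the result is later combined with unmasked data.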
Large Datasets (Dask)
```python
# Open with chunking (lazy loading)
ds = xr.open_dataset('large_file.nc', chunks={'time': 100})
ds = xr.open_mfdataset('data_*.nc', chunks='auto')

# Operations are lazy until .compute()
result = ds['temperature'].mean(dim='time').compute()
```
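To see what "lazy" means in practice without a large file on disk, the sketch below chunks a small in-memory Dataset, which should behave like one opened with `chunks=`. It assumes the optional `dask` dependency is installed; the data and chunk size are arbitrary.

```python
import numpy as np
import xarray as xr

# Synthetic in-memory data, chunked to mimic a file opened with chunks=...
# (requires the optional dask dependency).
ds = xr.Dataset(
    {"temperature": (("time", "lat"), np.ones((200, 4)))},
    coords={"time": np.arange(200), "lat": [0.0, 10.0, 20.0, 30.0]},
).chunk({"time": 100})

lazy = ds["temperature"].mean(dim="time")   # builds a task graph, no computation yet
print(lazy.chunks is not None)              # True: still backed by a dask array

result = lazy.compute()                     # runs the graph
print(type(result.data).__name__)           # ndarray
```

Chaining several operations before a single `.compute()` lets Dask schedule them together rather than materializing each intermediate result.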
When to Use vs Alternatives
| Tool | Best For | Limitations |
|---|---|---|
| xarray | Labeled multi-dim arrays, NetCDF/Zarr, Dask integration | Learning curve for newcomers from numpy |
| iris | Met Office climate workflows, UGRID mesh support | Smaller community, UK-centric conventions |
| CDO | Fast command-line climate data operations | Not Python-native, limited custom analysis |
| NCO | Quick NetCDF file manipulation and arithmetic | Command-line only, no visualization |

Use xarray when you need labeled dimension handling, seamless NetCDF/Zarr I/O, groupby/resample operations, or Dask-based parallel processing of large datasets.

Consider alternatives when you need fast one-off command-line operations on NetCDF files (use CDO/NCO), or when you work within the Met Office ecosystem with UGRID meshes (use iris).
Common Workflows
Climate data analysis with temporal aggregation
- Open the NetCDF dataset with `xr.open_dataset()` (use `chunks=` if large)
- Inspect dimensions, coordinates, and variables with `print(ds)`
- Select the region of interest with `.sel(lat=slice(), lon=slice())`
- Compute the climatology with `.groupby('time.month').mean('time')`
- Calculate anomalies by subtracting the climatology from the data
- Compute the area-weighted spatial mean using cosine latitude weights
- Resample to the desired temporal resolution (monthly, annual)
- Save results to NetCDF with `.to_netcdf()`
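The steps above can be strung together roughly as follows. This is a sketch on synthetic data rather than a reference implementation: the random field stands in for a real file, the grid and region bounds are arbitrary, the output name in the final comment is hypothetical, and the save step is left commented out so the example has no file side effects.

```python
import numpy as np
import pandas as pd
import xarray as xr

# 1. Stand-in for xr.open_dataset(...): two years of synthetic daily data.
rng = np.random.default_rng(0)
time = pd.date_range("2019-01-01", "2020-12-31", freq="D")
lat = np.arange(-60.0, 61.0, 30.0)
lon = np.arange(-90.0, 91.0, 45.0)
ds = xr.Dataset(
    {"temperature": (("time", "lat", "lon"),
                     rng.normal(15.0, 5.0, (time.size, lat.size, lon.size)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# 2-3. Inspect, then select a region.
print(ds)
temp = ds["temperature"].sel(lat=slice(-30, 30), lon=slice(-45, 45))

# 4-5. Monthly climatology and anomalies.
climatology = temp.groupby("time.month").mean("time")
anomalies = temp.groupby("time.month") - climatology

# 6. Area-weighted spatial mean with cosine-latitude weights.
weights = np.cos(np.deg2rad(temp.lat))
series = anomalies.weighted(weights).mean(dim=["lat", "lon"])

# 7. Resample the daily series to annual means (labelled at year start).
annual = series.resample(time="YS").mean()
print(annual.values)

# 8. Save (hypothetical output name).
# annual.to_dataset(name="temperature_anomaly").to_netcdf("anomaly_annual.nc")
```

Selecting the region before computing the climatology keeps every later step working on the smaller array.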
Common Issues
| Issue | Solution |
|---|---|
| Memory error | Use `chunks=` when opening so data loads lazily with Dask |
| Time decoding fails | Set `decode_times=False` when opening |
| Missing coordinates | Check `ds.coords` |
| Alignment errors | Check coordinate values match |
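Alignment problems are often silent: arithmetic between two arrays keeps only the labels they share, so coordinates that differ by floating-point noise can yield an empty result rather than an error. The sketch below reproduces this with invented values and shows one possible way to surface and fix it; copying coordinates across is only appropriate when the two grids are known to be the same.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims="lat", coords={"lat": [0.0, 0.1, 0.2]})

# Same grid in principle, but the labels differ by floating-point noise.
b = xr.DataArray(
    [10.0, 20.0, 30.0],
    dims="lat",
    coords={"lat": np.array([0.0, 0.1, 0.2]) + 1e-9},
)

# Arithmetic aligns on the intersection of labels, so nothing overlaps here.
total = a + b
print(total.sizes["lat"])  # 0

# join='exact' turns the silent mismatch into an error.
try:
    xr.align(a, b, join="exact")
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)  # True

# One possible fix: copy the trusted coordinate values onto the other array.
b_fixed = b.assign_coords(lat=a.lat.values)
print((a + b_fixed).values)  # [11. 22. 33.]
```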
References
- I/O Formats - NetCDF, Zarr, and other formats
- Computation - Aggregation and analysis methods
Scripts
- scripts/climate_analysis.py - Climate data analysis