carto-spatial-autocorrelation
Spatial Autocorrelation with Moran's I
Builds CARTO Workflows that measure spatial autocorrelation using Moran's I, determining whether a variable exhibits clustering, dispersion, or randomness, and classifying each location into HH/HL/LH/LL quadrants.
Prerequisites: Load `carto-create-workflow` for the development process, JSON structure, and validation commands.
When to use Moran's I vs Getis-Ord Gi*:
- Moran's I: "Is there clustering?" + classify into cluster types (HH, HL, LH, LL) + identify spatial outliers (HL, LH)
- Getis-Ord Gi*: "Where are the hotspots/coldspots?" + magnitude of clustering (z-scores)
Instructions
A Moran's I workflow follows this pipeline:
Source Data -> (Filter) -> Spatial Indexing (H3) -> Aggregation -> Moran's I -> (Filter Significant) -> Save
Step 1: Load Source Data
Use `native.gettablebyname`. The input table typically contains point geometries or pre-indexed grid data.
Success: Node outputs a table with a geometry column (e.g. `geom`) or an existing spatial index column.
Step 2: Filter (if needed)
Use `native.wheresimplified` or `native.where` to narrow the dataset (e.g. filter by category, date range, non-null values).
Success: Output contains only the subset relevant to the analysis.
Step 3: Spatial Indexing
Convert point geometries to H3 cells using `native.h3frompoint`.
Resolution guidance -- higher resolution = smaller cells = more local patterns:
| Resolution | Cell size | Use case |
|---|---|---|
| H3 res 7 | ~1.4 km edge (~5 km² area) | District/city-level patterns |
| H3 res 8 | ~0.5 km edge (~0.7 km² area) | Neighborhood-level |
| H3 res 9 | ~0.2 km edge (~0.1 km² area) | Street-level (used in Berlin POI tutorial) |
Success: Every row has a spatial index column (e.g. `h3`).
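To get a feel for how resolution drives grid size, here is a rough sketch that estimates how many cells a study area needs at each resolution. The average cell areas are rounded figures from the published H3 resolution table, the Berlin area is approximate, and `estimate_cell_count` is a hypothetical helper, not part of CARTO or H3.

```python
import math

# Rounded average H3 cell areas in km^2 (from the public H3 resolution table).
AVG_CELL_AREA_KM2 = {7: 5.161, 8: 0.737, 9: 0.105}


def estimate_cell_count(study_area_km2: float, resolution: int) -> int:
    """Rough number of H3 cells needed to cover an area at a given resolution."""
    return math.ceil(study_area_km2 / AVG_CELL_AREA_KM2[resolution])


berlin_km2 = 891.0  # approximate city area, used only for illustration
estimates = {res: estimate_cell_count(berlin_km2, res) for res in AVG_CELL_AREA_KM2}
for res, cells in sorted(estimates.items()):
    print(f"res {res}: about {cells} cells")
```

Each step up in resolution multiplies the cell count by roughly seven, so it is worth running this kind of estimate before choosing a resolution for a large area.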
Step 4: Aggregate per Cell
Use `native.groupby` to produce one row per cell with a numeric value:
- Group by: the spatial index column (`h3`)
- Aggregation: `geoid,count` (or `value_col,sum` / `value_col,avg`)
Success: Output has exactly one row per unique cell with a numeric column (e.g. `geoid_count`).
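The aggregation above amounts to a count per cell. The sketch below shows the same logic in plain Python on made-up rows. It only illustrates the expected output shape and is not how `native.groupby` is implemented; `count_per_cell` and the sample cell IDs are hypothetical.

```python
from collections import Counter


def count_per_cell(rows, index_col="h3"):
    """Return one row per unique cell with a geoid_count column."""
    counts = Counter(row[index_col] for row in rows)
    return [{index_col: cell, "geoid_count": n} for cell, n in sorted(counts.items())]


# Made-up input: three features falling into two cells.
rows = [
    {"geoid": "a", "h3": "cell_1"},
    {"geoid": "b", "h3": "cell_1"},
    {"geoid": "c", "h3": "cell_2"},
]
aggregated = count_per_cell(rows)
print(aggregated)
```

The result has exactly one row per unique cell and a numeric column, which is what the Moran's I step expects as its value input.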
Step 5: Run Moran's I
Use `native.moransi` with:
| Input | Description | Default |
|---|---|---|
| | Column with H3/Quadbin indexes | |
| `valuecol` | Numeric column to test for autocorrelation | |
| | K-ring neighborhood radius (in hops) | |
| `decay` | Distance decay function for spatial weights | |
Decay options: `uniform`, `inverse`, `inverse_square`, `exponential`.
- `uniform`: Equal weight to all neighbors within the k-ring
- `exponential`: Weight decreases exponentially with distance (used in Berlin POI tutorial)
K-ring size: Larger = broader neighborhood = smoother global patterns. Smaller = more localized assessment. The choice of neighborhood size significantly affects results.
Success: Output contains `index`, `morans_i`, `p_value`, and `quadrant` columns for every cell. (See the Provider casing note in Gotchas: Snowflake surfaces these UPPERCASE.)
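To make the roles of neighborhood and decay concrete, here is a minimal textbook-style sketch of local Moran's I on a toy chain of six cells. It is not the Analytics Toolbox implementation. The weight formulas per decay option are assumptions, the chain stands in for a k-ring of size 1, and p-values (usually obtained by permutation testing) are left out.

```python
import math
from statistics import mean, pstdev

# Assumed weight formulas per decay option; the component may define them differently.
DECAY = {
    "uniform": lambda d: 1.0,
    "inverse": lambda d: 1.0 / d,
    "inverse_square": lambda d: 1.0 / d ** 2,
    "exponential": lambda d: math.exp(-d),
}


def local_morans_i(values, neighbors, decay="uniform"):
    """values: {cell: number}; neighbors: {cell: {neighbor_cell: hop_distance}}."""
    xs = list(values.values())
    mu, sigma = mean(xs), pstdev(xs)
    z = {cell: (v - mu) / sigma for cell, v in values.items()}
    weight = DECAY[decay]
    out = {}
    for cell, nbrs in neighbors.items():
        w = {n: weight(d) for n, d in nbrs.items()}
        # Row-standardized spatial lag: weighted mean of the neighbors' z-scores.
        lag = sum(w[n] * z[n] for n in w) / sum(w.values())
        quadrant = ("H" if z[cell] >= 0 else "L") + ("H" if lag >= 0 else "L")
        out[cell] = {"morans_i": z[cell] * lag, "quadrant": quadrant}
    return out


# Toy data: high values on the left of the chain a-b-c-d-e-f, low on the right.
values = {"a": 10, "b": 9, "c": 10, "d": 1, "e": 2, "f": 1}
neighbors = {
    "a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1, "d": 1},
    "d": {"c": 1, "e": 1}, "e": {"d": 1, "f": 1}, "f": {"e": 1},
}
result = local_morans_i(values, neighbors)
for cell, stats in result.items():
    print(cell, stats["quadrant"], round(stats["morans_i"], 3))
```

Cells inside each group get a positive local value (`HH` or `LL`), while the two cells at the boundary get a negative value and an outlier label. Widening the neighborhood or changing the decay changes each cell's lag, which is why those two choices affect the results so strongly.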
Step 6: Filter Significant Results (recommended)
Use `native.where` to keep only statistically significant cells. Quadrant classification is only meaningful for significant cells.
Common filters:
- `p_value < 0.05` -- all significant cells (95% confidence)
- `p_value < 0.05 AND quadrant = 'HH'` -- high-value clusters only
- `p_value < 0.05 AND (quadrant = 'HL' OR quadrant = 'LH')` -- spatial outliers only
Success: Only cells with statistically meaningful spatial patterns remain.
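The filters above can be sanity-checked on a few made-up rows before wiring them into the workflow. This sketch mirrors the same conditions in Python; `significant` is a hypothetical helper and the rows are invented.

```python
def significant(rows, alpha=0.05, quadrants=None):
    """Keep rows with p_value below alpha, optionally restricted to some quadrants."""
    kept = []
    for row in rows:
        if row["p_value"] >= alpha:
            continue
        if quadrants is not None and row["quadrant"] not in quadrants:
            continue
        kept.append(row)
    return kept


# Invented Moran's I output rows.
rows = [
    {"index": "cell_1", "p_value": 0.01, "quadrant": "HH"},
    {"index": "cell_2", "p_value": 0.20, "quadrant": "HH"},
    {"index": "cell_3", "p_value": 0.03, "quadrant": "LH"},
]
all_significant = significant(rows)
outliers = significant(rows, quadrants={"HL", "LH"})
print([r["index"] for r in all_significant], [r["index"] for r in outliers])
```

Note that `cell_2` is dropped even though it is labelled `HH`, because its p-value does not pass the threshold.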
Step 7: Save
Use `native.saveastable` to persist results. The H3/Quadbin column is directly visualizable in CARTO Builder without geometry conversion.
Success: Validated workflow that can be uploaded via `carto workflows create`.
Output Columns
| Column | Meaning |
|---|---|
| `index` | Spatial index cell ID (H3 or Quadbin) |
| `morans_i` | Local Moran's I value -- positive = similar neighbors, negative = dissimilar neighbors |
| `p_value` | Statistical significance -- lower = more confident |
| `quadrant` | Cluster classification: `HH`, `HL`, `LH`, `LL` |
The engine declares these lowercase. See the Provider casing note in Gotchas for Snowflake.
Interpreting Results
Global Moran's I (overall pattern):
- > 0 = spatial clustering (similar values near each other)
- < 0 = spatial dispersion (dissimilar values near each other)
- Near 0 = spatial randomness
Local quadrants (per-cell classification):
| Quadrant | Meaning | Interpretation |
|---|---|---|
| HH | High value surrounded by high values | Cluster core |
| LL | Low value surrounded by low values | Low-value cluster |
| HL | High value surrounded by low values | Spatial outlier (high anomaly) |
| LH | Low value surrounded by high values | Spatial outlier (low anomaly) |
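The sign convention for the global statistic can be checked with a small sketch. It computes the standard global Moran's I with binary adjacency weights on a six-cell chain, once for a clustered pattern and once for an alternating one. The function name and data are illustrative and not part of CARTO.

```python
from statistics import mean


def global_morans_i(values, neighbors):
    """Global Moran's I with binary weights; neighbors[i] lists the cells adjacent to i."""
    mu = mean(list(values.values()))
    dev = {cell: v - mu for cell, v in values.items()}
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    w_total = sum(len(nbrs) for nbrs in neighbors.values())
    spread = sum(d * d for d in dev.values())
    return (len(values) / w_total) * (cross / spread)


cells = ["a", "b", "c", "d", "e", "f"]
# Each cell neighbors the cells immediately before and after it on the chain.
neighbors = {
    c: [cells[k] for k in (i - 1, i + 1) if 0 <= k < len(cells)]
    for i, c in enumerate(cells)
}
clustered = dict(zip(cells, [10, 10, 10, 1, 1, 1]))
alternating = dict(zip(cells, [10, 1, 10, 1, 10, 1]))

i_clustered = global_morans_i(clustered, neighbors)
i_alternating = global_morans_i(alternating, neighbors)
print(round(i_clustered, 3), round(i_alternating, 3))
```

The clustered pattern gives a positive value and the alternating pattern a negative one, matching the interpretation above.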
Gotchas
- Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE, so reference `H3`, `INDEX`, `MORANS_I`, `P_VALUE`, `QUADRANT`, `GEOID_COUNT` in expressions. See `carto-create-workflow/references/providers/<provider>.md` for casing rules and SQL dialect equivalents.
- The Moran's I component requires the Analytics Toolbox. Always run `carto workflows verify-remote --connection <conn>` to ensure the AT path is resolved. `carto workflows validate` is offline and cannot resolve AT location.
- The output column is named `index`, not `h3` or `quadbin`. If you need to join back to original data, rename it (e.g. with `native.renamecolumn`). This is the same behavior as Getis-Ord.
- The `valuecol` must be numeric. If you are counting features, the group-by step must produce a count column -- do not pass the raw index column as the value.
- Resolution too high + large area = very many cells, which can be slow or hit memory limits. Start with a moderate resolution and refine.
- Moran's I is sensitive to the definition of neighborhood. Both k-ring size and decay function choice materially affect results. Document your choices and consider testing alternatives.
- Quadrant classification is only meaningful for statistically significant cells. Always filter by `p_value` before interpreting quadrants -- non-significant cells may show any quadrant label by chance.
- The decay input parameter is named `decay` (not `kernel`). Check the component schema if unsure.
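The casing rule in the first gotcha can be captured in a small helper when filter expressions are generated for several providers. This is a hypothetical sketch, not a CARTO utility: `column_name` and `significance_filter` are invented names, and the provider list simply mirrors the one given above.

```python
# Providers that this skill documents with lowercase column names.
LOWERCASE_PROVIDERS = {"bigquery", "databricks", "postgres", "redshift"}


def column_name(name: str, provider: str) -> str:
    """Return the column name in the casing the provider surfaces."""
    provider = provider.lower()
    if provider == "snowflake":
        return name.upper()
    if provider in LOWERCASE_PROVIDERS:
        return name.lower()
    raise ValueError(f"unknown provider: {provider}")


def significance_filter(provider: str, alpha: float = 0.05) -> str:
    """Build the significance filter expression for a provider."""
    return f"{column_name('p_value', provider)} < {alpha}"


print(significance_filter("bigquery"))
print(significance_filter("snowflake"))
```

Keeping the casing logic in one place avoids scattering uppercase and lowercase variants of the same expression across workflow definitions.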
Reference Templates
| Resource | Description |
|---|---|
| BQ Tutorial | Computing spatial autocorrelation of POI locations in Berlin (BigQuery) |
| SF Tutorial | Same tutorial for Snowflake |
| Workflow template | "Computing the spatial auto-correlation of point of interest locations" (available in CARTO Workspace) |
Common Variations
| Variant | How |
|---|---|
| Pre-indexed data | Skip Step 3 if data already has H3/Quadbin column |
| Polygon input instead of points | Use an H3 polyfill component instead of `native.h3frompoint` |
| Complete grid (no gaps) | Polyfill study area boundary first, then enrich with data (same approach as hotspot analysis) |
| Combine with Getis-Ord | Run both analyses on the same aggregated grid, then join results for a richer picture |
| Filter to outliers only | Keep cells matching `p_value < 0.05 AND (quadrant = 'HL' OR quadrant = 'LH')` |