carto-spatial-autocorrelation

Spatial Autocorrelation with Moran's I

Builds CARTO Workflows that measure spatial autocorrelation using Moran's I, determining whether a variable exhibits clustering, dispersion, or randomness, and classifying each location into HH/HL/LH/LL quadrants.
Prerequisites: Load carto-create-workflow for the development process, JSON structure, and validation commands.
When to use Moran's I vs Getis-Ord Gi*:
  • Moran's I: "Is there clustering?" + classify into cluster types (HH, HL, LH, LL) + identify spatial outliers (HL, LH)
  • Getis-Ord Gi*: "Where are the hotspots/coldspots?" + magnitude of clustering (z-scores)

Instructions

A Moran's I workflow follows this pipeline:
Source Data -> (Filter) -> Spatial Indexing (H3) -> Aggregation -> Moran's I -> (Filter Significant) -> Save
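As a rough illustration of the pipeline above, the following Python sketch lists the component names used in the steps of this skill, in order, and drops the optional stages when they are not needed. The function name, its flags, and the list-of-strings representation are illustrative assumptions; this is not the workflow JSON format, which carto-create-workflow defines.

```python
def moran_pipeline(filter_input=False, pre_indexed=False, keep_significant=True):
    """Return the ordered component names for a Moran's I workflow (illustrative)."""
    steps = ["native.gettablebyname"]        # Step 1: load source data
    if filter_input:
        steps.append("native.where")         # Step 2: optional pre-filter
    if not pre_indexed:
        steps.append("native.h3frompoint")   # Step 3: spatial indexing (H3)
    steps.append("native.groupby")           # Step 4: one row per cell
    steps.append("native.moransi")           # Step 5: Moran's I
    if keep_significant:
        steps.append("native.where")         # Step 6: keep significant cells
    steps.append("native.saveastable")       # Step 7: save
    return steps


print(" -> ".join(moran_pipeline()))
```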

Step 1: Load Source Data

Use native.gettablebyname. The input table typically contains point geometries or pre-indexed grid data.
Success: Node outputs a table with a geometry column (e.g. geom) or an existing spatial index column.

Step 2: Filter (if needed)

Use native.wheresimplified or native.where to narrow the dataset (e.g. filter by category, date range, non-null values).
Success: Output contains only the subset relevant to the analysis.

Step 3: Spatial Indexing

Convert point geometries to H3 cells using native.h3frompoint.
Resolution guidance -- higher resolution = smaller cells = more local patterns:
| Resolution | Approximate cell size | Use case |
| --- | --- | --- |
| H3 res 7 | ~5.2 km² area (edge roughly 1.2-1.4 km) | District/city-level patterns |
| H3 res 8 | ~0.74 km² area (edge roughly 0.5 km) | Neighborhood-level |
| H3 res 9 | ~0.11 km² area (edge roughly 0.2 km) | Street-level (used in Berlin POI tutorial) |
Success: Every row has a spatial index column (e.g. h3).
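To get a feel for how resolution drives the number of cells, the sketch below divides a study-area size by the average H3 cell area at each resolution. The average areas are approximate figures from the H3 resolution statistics, the 891 km² used for Berlin is a rough value, and the helper name is hypothetical. Treat the result as an order-of-magnitude estimate only.

```python
# Approximate average H3 hexagon areas in km^2 (each step is ~7x finer).
AVG_CELL_AREA_KM2 = {7: 5.16, 8: 0.737, 9: 0.105}


def estimate_cell_count(area_km2, resolution):
    """Rough number of H3 cells needed to cover an area at a resolution."""
    return round(area_km2 / AVG_CELL_AREA_KM2[resolution])


# Rough study area for Berlin (assumed value, ~891 km^2).
for res in (7, 8, 9):
    print(res, estimate_cell_count(891, res))
```

Going from resolution 7 to 9 multiplies the cell count by roughly 49, which is why starting at a moderate resolution and refining is usually the safer path.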

Step 4: Aggregate per Cell

Use native.groupby to produce one row per cell with a numeric value:
  • Group by: the spatial index column (h3)
  • Aggregation: geoid,count (or value_col,sum / value_col,avg)
Success: Output has exactly one row per unique cell with a numeric column (e.g. geoid_count).
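Conceptually, the count aggregation reduces many rows to one row per cell. The sketch below mimics that in plain Python. The sample cell IDs are made-up placeholders rather than real H3 indexes, and the output column name geoid_count simply follows the example above.

```python
from collections import Counter

# Each input row is (geoid, h3); the h3 values are placeholder strings.
rows = [
    ("a", "cell_1"),
    ("b", "cell_1"),
    ("c", "cell_2"),
    ("d", "cell_1"),
]

# Group by the spatial index column and count rows per cell.
counts = Counter(h3 for _geoid, h3 in rows)
aggregated = [{"h3": cell, "geoid_count": n} for cell, n in sorted(counts.items())]

print(aggregated)
```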

Step 5: Run Moran's I

Use native.moransi with:
| Input | Description | Default |
| --- | --- | --- |
| indexcol | Column with H3/Quadbin indexes | h3 |
| valuecol | Numeric column to test for autocorrelation | geoid_count |
| size | K-ring neighborhood radius (in hops) | 3 |
| decay | Distance decay function for spatial weights | uniform |
Decay options: uniform, inverse, inverse_square, exponential.
  • uniform: Equal weight to all neighbors within the k-ring
  • exponential: Weight decreases exponentially with distance (used in Berlin POI tutorial)
K-ring size: Larger = broader neighborhood = smoother global patterns. Smaller = more localized assessment. The choice of neighborhood size significantly affects results.
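The combined effect of size and decay can be pictured as a weight assigned to each neighbor by its hop distance. The formulas below are common textbook forms and are an assumption here; the exact definitions used by native.moransi should be checked against the component schema.

```python
import math

# Hypothetical weight functions of hop distance d (d >= 1); not confirmed
# to match the Analytics Toolbox implementation.
DECAY = {
    "uniform": lambda d: 1.0,
    "inverse": lambda d: 1.0 / d,
    "inverse_square": lambda d: 1.0 / d**2,
    "exponential": lambda d: math.exp(-d),
}


def ring_weights(size, decay):
    """Weight per hop distance 1..size for a k-ring of the given size."""
    return {d: DECAY[decay](d) for d in range(1, size + 1)}


for name in DECAY:
    print(name, {d: round(w, 3) for d, w in ring_weights(3, name).items()})
```

Under these assumed forms, uniform lets distant cells in the k-ring count as much as adjacent ones, while the other options concentrate the weight on the nearest rings.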
Success: Output contains index, morans_i, p_value, and quadrant columns for every cell. (See the Provider casing note in Gotchas: Snowflake surfaces these UPPERCASE.)
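For intuition about what the morans_i and quadrant columns represent, here is a minimal sketch of a local Moran's I on a toy one-dimensional chain of cells, with uniform weights over immediate neighbors. It uses the standard textbook statistic and omits the significance test that produces p_value. The neighbor structure and function name are assumptions, and the component's internal normalization may differ.

```python
def local_morans_i(values, neighbors):
    """Textbook local Moran's I with uniform, row-standardized weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    m2 = sum(d * d for d in z) / n                 # population variance
    results = []
    for i, nbrs in enumerate(neighbors):
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # mean deviation of neighbors
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append({"index": i, "morans_i": z[i] * lag / m2, "quadrant": quadrant})
    return results


# Toy chain of 6 cells: high values on the left, low values on the right.
values = [10, 9, 10, 1, 2, 1]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

for row in local_morans_i(values, neighbors):
    print(row)
```

In this toy data the two cells at the boundary between the high and low groups receive HL and LH labels with negative morans_i, while the interior cells are HH or LL with positive values.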

Step 6: Filter Significant Results (recommended)

Use native.where to keep only statistically significant cells. Quadrant classification is only meaningful for significant cells.
Common filters:
  • p_value < 0.05 -- all significant cells (95% confidence)
  • p_value < 0.05 AND quadrant = 'HH' -- high-value clusters only
  • p_value < 0.05 AND (quadrant = 'HL' OR quadrant = 'LH') -- spatial outliers only
Success: Only cells with statistically meaningful spatial patterns remain.
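The three common filters can be mirrored on a handful of rows to check the logic before wiring it into the workflow. The rows below are invented sample values, not output from a real run.

```python
# Invented sample rows shaped like the Moran's I output.
rows = [
    {"index": "cell_1", "morans_i": 1.8, "p_value": 0.01, "quadrant": "HH"},
    {"index": "cell_2", "morans_i": 0.2, "p_value": 0.40, "quadrant": "HH"},
    {"index": "cell_3", "morans_i": -0.9, "p_value": 0.03, "quadrant": "HL"},
    {"index": "cell_4", "morans_i": 1.1, "p_value": 0.02, "quadrant": "LL"},
]

ALPHA = 0.05  # significance threshold used in the filters above

significant = [r for r in rows if r["p_value"] < ALPHA]
high_clusters = [r for r in significant if r["quadrant"] == "HH"]
outliers = [r for r in significant if r["quadrant"] in ("HL", "LH")]

print([r["index"] for r in significant])
print([r["index"] for r in high_clusters])
print([r["index"] for r in outliers])
```

Note that cell_2 carries an HH label but is dropped, because its p_value is above the threshold.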

Step 7: Save

Use native.saveastable to persist results. The H3/Quadbin column is directly visualizable in CARTO Builder without geometry conversion.
Success: Validated workflow that can be uploaded via carto workflows create.

Output Columns

| Column | Meaning |
| --- | --- |
| index | Spatial index cell ID (H3 or Quadbin) |
| morans_i | Local Moran's I value -- positive = similar neighbors, negative = dissimilar neighbors |
| p_value | Statistical significance -- lower = more confident |
| quadrant | Cluster classification: HH, HL, LH, or LL |
The engine declares these column names in lowercase. See the Provider casing note in Gotchas for Snowflake.

Interpreting Results

Global Moran's I (overall pattern):
  • > 0 = spatial clustering (similar values near each other)
  • < 0 = spatial dispersion (dissimilar values near each other)
  • Near 0 = spatial randomness
Local quadrants (per-cell classification):
| Quadrant | Meaning | Interpretation |
| --- | --- | --- |
| HH | High value surrounded by high values | Cluster core |
| LL | Low value surrounded by low values | Low-value cluster |
| HL | High value surrounded by low values | Spatial outlier (high anomaly) |
| LH | Low value surrounded by high values | Spatial outlier (low anomaly) |
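The sign convention for the global statistic can be checked on toy data. The sketch below computes a textbook global Moran's I with binary neighbor weights on a chain of six cells, once with values grouped together and once with values alternating. The function name and data are illustrative assumptions and do not reflect how the component computes its output.

```python
def global_morans_i(values, neighbors):
    """Textbook global Moran's I with binary (0/1) neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                   # deviations from the mean
    s0 = sum(len(nbrs) for nbrs in neighbors)        # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in enumerate(neighbors) for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in z)


# Chain of 6 cells where each cell neighbors the adjacent ones.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

clustered = global_morans_i([10, 9, 10, 1, 2, 1], chain)
alternating = global_morans_i([10, 1, 10, 1, 10, 1], chain)
print(round(clustered, 3), round(alternating, 3))
```

The grouped values give a positive result (clustering) and the alternating values give a negative one (dispersion), matching the interpretation rules above.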

Gotchas

  • Provider casing & SQL dialect. This skill documents columns in lowercase (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE, so reference H3, INDEX, MORANS_I, P_VALUE, QUADRANT, GEOID_COUNT in expressions. See carto-create-workflow/references/providers/<provider>.md for casing rules and SQL dialect equivalents.
  • The Moran's I component requires the Analytics Toolbox (AT). Always run carto workflows verify-remote --connection <conn> to ensure the AT path is resolved. carto workflows validate is offline and cannot resolve the AT location.
  • The output column is named index, not h3 or quadbin. If you need to join back to original data, rename it (e.g. with native.renamecolumn). This is the same behavior as Getis-Ord.
  • The valuecol must be numeric. If you are counting features, the group-by step must produce a count column -- do not pass the raw index column as the value.
  • Resolution too high + large area = very many cells, which can be slow or hit memory limits. Start with a moderate resolution and refine.
  • Moran's I is sensitive to the definition of neighborhood. Both k-ring size and decay function choice materially affect results. Document your choices and consider testing alternatives.
  • Quadrant classification is only meaningful for statistically significant cells. Always filter by p_value before interpreting quadrants -- non-significant cells may show any quadrant label by chance.
  • The decay input parameter is named decay (not kernel). Check the component schema if unsure.
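The casing gotcha can be captured in a small helper when filter expressions are generated programmatically. The helper names and the single "Snowflake means uppercase" rule are simplifying assumptions for illustration; the provider reference files remain the authority on casing and dialect.

```python
def column_for(provider, name):
    """Return the identifier casing to use in expressions (illustrative rule)."""
    return name.upper() if provider.lower() == "snowflake" else name


def significance_filter(provider, alpha=0.05):
    """Build the Step 6 filter expression with provider-appropriate casing."""
    return f"{column_for(provider, 'p_value')} < {alpha}"


print(significance_filter("bigquery"))   # lowercase column name
print(significance_filter("snowflake"))  # uppercase column name
```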

Reference Templates

| Resource | Description |
| --- | --- |
| BQ Tutorial | Computing spatial autocorrelation of POI locations in Berlin (BigQuery) |
| SF Tutorial | Same tutorial for Snowflake |
| Workflow template | "Computing the spatial auto-correlation of point of interest locations" (available in CARTO Workspace) |

Common Variations

| Variant | How |
| --- | --- |
| Pre-indexed data | Skip Step 3 if data already has H3/Quadbin column |
| Polygon input instead of points | Use native.h3polyfill instead of native.h3frompoint |
| Complete grid (no gaps) | Polyfill study area boundary first, then enrich with data (same approach as hotspot analysis) |
| Combine with Getis-Ord | Run both analyses on the same aggregated grid, then join results for a richer picture |
| Filter to outliers only | Keep HL and LH quadrants to find anomalous locations |
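The complete-grid variant matters because a count aggregation only produces rows for cells that contain data, so empty cells are otherwise missing from the grid. The sketch below shows the idea in plain Python: start from every cell in the study area and fill cells without data with zero. The cell IDs are placeholders, the helper name is hypothetical, and zero-filling is an assumption that suits counts but not necessarily averages.

```python
def complete_grid(study_area_cells, counts, fill=0):
    """One row per study-area cell, filling cells without data with `fill`."""
    return [
        {"h3": cell, "geoid_count": counts.get(cell, fill)}
        for cell in sorted(study_area_cells)
    ]


# Placeholder cell IDs standing in for a polyfilled study area.
study_area = {"cell_1", "cell_2", "cell_3", "cell_4"}
counts = {"cell_1": 3, "cell_3": 1}  # only cells that contained points

grid = complete_grid(study_area, counts)
print(grid)
```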