carto-spatial-enrichment

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Spatial Enrichment in CARTO Workflows

CARTO工作流中的空间富集

Prerequisites: Load
carto-create-workflow
for the development process.
This skill covers the universal pattern for enriching spatial data with demographics, risk scores, or any variable from a spatial features dataset.

前置条件:开发过程中需加载
carto-create-workflow
本技能涵盖了使用人口统计数据、风险评分或空间特征数据集中的任意变量来富集空间数据的通用模式。

Instructions

操作步骤

Follow the 5-step universal enrichment pattern. Each step maps to one or more workflow components.
遵循以下5步通用富集模式,每个步骤对应一个或多个工作流组件。

Step 1: Load source data

步骤1:加载源数据

Load the business entities (stores, points, polygons) that need enrichment.
  • Points or polygons from a table: use a
    ReadTable
    source node
  • Custom geometry: use
    native.tablefromgeojson
    to inline a GeoJSON polygon
加载需要进行富集的业务实体(门店、点、多边形)。
  • 来自表格的点或多边形:使用
    ReadTable
    源节点
  • 自定义几何图形:使用
    native.tablefromgeojson
    嵌入GeoJSON多边形

Step 2: Define target area

步骤2:定义目标区域

Choose one method based on the use case:
MethodComponentWhen to use
Buffer
native.buffer
Distance-based area (e.g. 1km around each store)
Isochrones
native.isolines
Drive-time or walk-time areas
Direct polygon(none needed)Source data is already polygons
Direct points(none needed)Skip to Step 4 with
native.enrichpoints
根据使用场景选择以下方法之一:
方法组件使用场景
缓冲区
native.buffer
基于距离的区域(例如每个门店周边1公里范围)
等时线
native.isolines
驾车或步行可达区域
直接使用多边形(无需组件)源数据本身已是多边形
直接使用点(无需组件)跳过步骤3,直接使用
native.enrichpoints

Step 3: Spatial indexing (polyfill)

步骤3:空间索引(多边形填充)

Convert areas into a grid for enrichment. This step is required when using grid-based enrichment (
native.h3enrich
) or manual JOIN.
  • H3 grid (most common):
    native.h3polyfill
    -- set
    resolution
    to match the enrichment dataset
  • Quadbin grid:
    native.quadbinpolyfill
    -- set
    resolution
    to match the enrichment dataset
Key decision -- index type: Use H3 unless the enrichment data is natively in Quadbin.
将区域转换为网格以进行富集。当使用基于网格的富集(
native.h3enrich
)或手动JOIN时,此步骤为必填项。
  • H3网格(最常用):
    native.h3polyfill
    —— 设置
    resolution
    以匹配富集数据集
  • Quadbin网格:
    native.quadbinpolyfill
    —— 设置
    resolution
    以匹配富集数据集
关键决策——索引类型:除非富集数据原生为Quadbin格式,否则默认使用H3。

Step 4: Enrich

步骤4:执行富集

Two approaches, each with different column naming:
A) CARTO ENRICH procedures (recommended for Data Observatory or spatial features data):
  • native.h3enrich
    -- enrich an H3 grid
  • native.enrichpoints
    -- enrich points directly (skip Step 3)
  • native.enrichpolygons
    -- enrich polygons directly (skip Step 3)
  • Output columns are named
    {variable}_{aggregation}
    (e.g.
    population_sum
    ,
    air_quality_avg
    )
B) Manual JOIN on the spatial index column:
  • Use
    native.join
    with the H3/Quadbin column as the join key
  • Output columns from the secondary table get a
    _joined
    suffix
  • Default is INNER JOIN (silently drops unmatched cells)
Aggregation method guidance:
  • SUM
    -- population counts, totals
  • MAX
    /
    MIN
    -- risk scores, thresholds
  • AVG
    -- quality metrics, indices
有两种方法,各自的列命名规则不同:
A) CARTO ENRICH 流程(推荐用于数据观测站或空间特征数据):
  • native.h3enrich
    —— 富集H3网格
  • native.enrichpoints
    —— 直接富集点数据(跳过步骤3)
  • native.enrichpolygons
    —— 直接富集多边形数据(跳过步骤3)
  • 输出列命名格式为
    {variable}_{aggregation}
    (例如
    population_sum
    air_quality_avg
B) 基于空间索引列的手动JOIN
  • 使用
    native.join
    ,以H3/Quadbin列作为连接键
  • 从次级表输出的列会添加
    _joined
    后缀
  • 默认使用INNER JOIN(自动丢弃未匹配的单元格)
聚合方法指南
  • SUM
    —— 人口统计、总计类数据
  • MAX
    /
    MIN
    —— 风险评分、阈值类数据
  • AVG
    —— 质量指标、指数类数据

Step 5: Save results

步骤5:保存结果

Use
native.saveastable
to persist the enriched output.
If the goal is per-entity enrichment (e.g. population per store), add a second JOIN + GROUP BY to aggregate grid-level results back to the source entity level.
Success: The workflow loads source data, defines areas, indexes to a grid (if needed), enriches with the target variables using the correct aggregation, and saves the result. Column names in downstream references match the enrichment method used.

使用
native.saveastable
保存富集后的输出结果。
如果目标是实现按实体富集(例如每个门店的人口数据),需添加第二次JOIN + GROUP BY操作,将网格级别的结果聚合回源实体级别。
成功标准:工作流需完成源数据加载、区域定义、网格索引(如需要)、使用正确聚合方式富集目标变量,并保存结果。下游引用中的列名需与所使用的富集方法匹配。

Gotchas

注意事项

  • Provider casing & SQL dialect. This skill documents output column names in lowercase (
    population_sum
    ,
    air_quality_avg
    ,
    <column>_joined
    , etc.) — BigQuery / Databricks / Postgres / Redshift convention. On Snowflake, unquoted identifiers surface UPPERCASE — reference them as
    POPULATION_SUM
    ,
    AIR_QUALITY_AVG
    ,
    <COLUMN>_JOINED
    . See
    carto-create-workflow/references/providers/<provider>.md
    for casing rules and SQL dialect equivalents.
  • Resolution alignment is critical. The polyfill resolution MUST match the enrichment dataset's native resolution (e.g. H3 resolution 8 with resolution 8 spatial features). A mismatch produces zero JOIN matches with NO error.
  • Manual JOIN drops unmatched cells.
    native.join
    defaults to INNER JOIN, silently dropping cells with no enrichment data. Use LEFT JOIN if completeness matters.
  • Deduplicate after polyfill. Use
    SELECT DISTINCT
    or GROUP BY on the index column to remove duplicate cells. If you need to preserve source identity (e.g. which store each cell came from), set
    includecols: true
    in the polyfill node.
  • Column naming differs by method. ENRICH procedures produce
    {variable}_{aggregation}
    columns. Manual JOIN produces
    {column}_joined
    columns. Plan downstream SQL references accordingly.
  • Buffer distance is in meters. Isoline range units depend on type: seconds for time-based, meters for distance-based.
  • Re-aggregation needed for entity-level results. After grid enrichment, data is at the cell level. To get per-store or per-location totals, add a second JOIN + GROUP BY step to roll cell-level values back to the source entity.

  • 提供商大小写规则与SQL方言:本技能文档中的输出列名为小写格式(
    population_sum
    air_quality_avg
    <column>_joined
    等)—— 这是BigQuery / Databricks / Postgres / Redshift的约定。在Snowflake中,未加引号的标识符会显示为大写——需以
    POPULATION_SUM
    AIR_QUALITY_AVG
    <COLUMN>_JOINED
    的形式引用。请查看
    carto-create-workflow/references/providers/<provider>.md
    获取大小写规则和SQL方言对应说明。
  • 分辨率对齐至关重要:多边形填充的分辨率必须与富集数据集的原生分辨率匹配(例如H3分辨率8对应分辨率8的空间特征数据)。不匹配会导致JOIN无匹配结果且无错误提示。
  • 手动JOIN会丢弃未匹配单元格
    native.join
    默认使用INNER JOIN,会自动丢弃无富集数据的单元格。如果需要保证完整性,请使用LEFT JOIN。
  • 多边形填充后需去重:对索引列使用
    SELECT DISTINCT
    或GROUP BY来移除重复单元格。如果需要保留源实体标识(例如每个单元格来自哪个门店),请在多边形填充节点中设置
    includecols: true
  • 列命名因方法而异:ENRICH流程生成
    {variable}_{aggregation}
    格式的列,手动JOIN生成
    {column}_joined
    格式的列。请据此规划下游SQL引用。
  • 缓冲区距离单位为米:等时线范围单位取决于类型:基于时间的为秒,基于距离的为米。
  • 实体级结果需重新聚合:网格富集后的数据为单元格级别。如需获取每个门店或位置的总计数据,需添加第二次JOIN + GROUP BY步骤,将单元格级别的值汇总回源实体级别。

Reference Templates

参考模板

Templates included in this skill folder (from the CARTO Workflows template repository):
FilePatternDescription
enrich_grid.jsonGeoJSON polygon -> H3 polyfill -> ENRICH_GRIDEnrich a custom area with sociodemographic H3 data
enrich_points.jsonFilter points -> ENRICH_POINTSEnrich point locations with polygon-based risk data
estimate_population_around_retail_stores.jsonPoints -> Buffer -> H3 polyfill -> JOIN -> GROUP BYFull entity-level enrichment with re-aggregation

本技能文件夹中包含以下模板(来自CARTO工作流模板仓库):
文件模式描述
enrich_grid.jsonGeoJSON多边形 -> H3多边形填充 -> ENRICH_GRID使用社会人口统计H3数据富集自定义区域
enrich_points.json过滤点 -> ENRICH_POINTS使用基于多边形的风险数据富集点位置
estimate_population_around_retail_stores.json点 -> 缓冲区 -> H3多边形填充 -> JOIN -> GROUP BY包含重新聚合的完整实体级富集流程

Common Variations

常见变体

VariationSteps usedKey differences
Enrich points directly1 -> 4 -> 5Skip grid; use
native.enrichpoints
Enrich polygons directly1 -> 4 -> 5Skip grid; use
native.enrichpolygons
Buffer + grid enrichment1 -> 2 -> 3 -> 4 -> 5
native.buffer
then polyfill then enrich
Isochrone + grid enrichment1 -> 2 -> 3 -> 4 -> 5
native.isolines
then polyfill then enrich
Re-aggregate to source entity1 -> 2 -> 3 -> 4 -> JOIN + GROUP BY -> 5Add second JOIN to map cells back to source entities
变体使用步骤核心差异
直接富集点数据1 -> 4 -> 5跳过网格;使用
native.enrichpoints
直接富集多边形数据1 -> 4 -> 5跳过网格;使用
native.enrichpolygons
缓冲区+网格富集1 -> 2 -> 3 -> 4 -> 5先使用
native.buffer
,再进行多边形填充,最后富集
等时线+网格富集1 -> 2 -> 3 -> 4 -> 5先使用
native.isolines
,再进行多边形填充,最后富集
重新聚合至源实体1 -> 2 -> 3 -> 4 -> JOIN + GROUP BY -> 5添加第二次JOIN,将单元格映射回源实体