carto-composite-scoring

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Use this skill whenever the user wants to create a composite score, index, or multi-variable ranking in a CARTO Workflow.

Prerequisites: Load

carto-create-workflow

for the development process.

当用户想要在CARTO Workflow中创建综合得分、指数或多变量排名时，即可使用本技能。

前提条件：开发过程中需加载

carto-create-workflow

。

Instructions

操作步骤

Step 1: Determine the scoring approach

步骤1：确定评分方法

Ask the user the following decision tree:

"Do you have a target/outcome variable?" (e.g. revenue, sales, crime rate)
- Yes → Supervised method using
```
native.spatialcompositesupervised
```
"No target, but do you have expert knowledge of variable importance?"
- Yes → Unsupervised method with
```
CUSTOM_WEIGHTS
```
  using
```
native.spatialcompositeunsupervised
```
"No target, no weights?"
- → Unsupervised method with
```
ENTROPY
```
  or
```
FIRST_PC
```
  using
```
native.spatialcompositeunsupervised
```

Success: You have identified which component and scoring method to use before designing the pipeline.

向用户询问以下决策树问题：

"您是否有目标/结果变量？"（例如收入、销售额、犯罪率）
- 是 → 使用
```
native.spatialcompositesupervised
```
  的有监督方法
"无目标变量，但您是否具备变量重要性的专业知识？"
- 是 → 使用
```
native.spatialcompositeunsupervised
```
  并搭配
```
CUSTOM_WEIGHTS
```
  的无监督方法
"无目标变量，也无权重？"
- → 使用
```
native.spatialcompositeunsupervised
```
  并搭配
```
ENTROPY
```
  或
```
FIRST_PC
```
  的无监督方法

成功标志：在设计流水线前，您已确定要使用的组件和评分方法。

Step 2: Build the pipeline

步骤2：构建流水线

Supervised pipeline (

native.spatialcompositesupervised

)

有监督流水线（

native.spatialcompositesupervised

）

Load the spatial features dataset (pre-indexed at H3 or Quadbin)
Load the target/outcome dataset
Join both datasets on the spatial index column
Select only the relevant feature columns (drop spatial index column and geometry from feature selection — pass only actual feature variables)

Run

native.spatialcompositesupervised

with parameters:

```
model_type
```
:
```
LINEAR_REG
```
```
bucketize
```
:
```
EQUAL_INTERVALS_ZERO_CENTERED
```
```
n_buckets
```
:
```
5
```
```
outlier_removal
```
:
```
true
```
```
r_squared_threshold
```
:
```
0.4
```

Output: composite score based on regression residuals (identifies areas that over/under-perform relative to the model)

Success: The workflow joins features with the target variable, selects only numeric feature columns, and produces a residual-based score per spatial cell.

加载空间要素数据集（已预先以H3或Quadbin索引）
加载目标/结果数据集
基于空间索引列关联两个数据集
仅选择相关要素列（从要素选择中移除空间索引列和几何列——仅传入实际要素变量）

使用以下参数运行

native.spatialcompositesupervised

：

```
model_type
```
:
```
LINEAR_REG
```
```
bucketize
```
:
```
EQUAL_INTERVALS_ZERO_CENTERED
```
```
n_buckets
```
:
```
5
```
```
outlier_removal
```
:
```
true
```
```
r_squared_threshold
```
:
```
0.4
```

输出：基于回归残差的综合得分（识别相对于模型表现超出/低于预期的区域）

成功标志：工作流将要素与目标变量关联，仅选择数值型要素列，并为每个空间单元生成基于残差的得分。

Unsupervised pipeline (

native.spatialcompositeunsupervised

)

无监督流水线（

native.spatialcompositeunsupervised

）

Load the spatial features dataset
Select only the relevant feature columns
Encode any categorical/ordinal variables to numeric using
```
native.casewhen
```
(e.g. "Low_density_urban" → 4, "High_density_urban" → 2)
Optionally reverse variables where higher = worse by multiplying by -1 in the SELECT query passed to the component
Run
```
native.spatialcompositeunsupervised
```
with parameters:
- ```
scoring_method
```
  :
```
CUSTOM_WEIGHTS
```
  /
```
ENTROPY
```
  /
```
FIRST_PC
```
- ```
weights
```
  : required if
```
CUSTOM_WEIGHTS
```
  . Wire format is a JSON-encoded string of triples
  [[<column>, <weight>, <reverse>], ...]
  —
```
column
```
  is the feature column name,
```
weight
```
  is a numeric weight (normalized internally to sum to 1),
```
reverse
```
  is a boolean (
```
true
```
  flips the variable so higher = worse becomes higher = better without needing a CASE WHEN). Example:
```
'[["population_density", 0.5, false], ["accident_rate", 0.5, true]]'
```
  . Not an object map.
- ```
scaling
```
  :
```
RANKING
```
- ```
aggregation
```
  :
```
LINEAR
```
- ```
output
```
  :
```
RETURN_RANGE
```
  with range
```
[0, 1]
```
Output: composite index score per location

Success: All input variables are numeric, variable directions are aligned (higher = better for the score), and the output is a normalized score per spatial cell.

加载空间要素数据集
仅选择相关要素列
使用
```
native.casewhen
```
将任何分类/有序变量编码为数值（例如 "Low_density_urban" → 4，"High_density_urban" → 2）
（可选）在传入组件的SELECT查询中，将数值越高代表结果越差的变量乘以-1进行反转
使用以下参数运行
```
native.spatialcompositeunsupervised
```
：
- ```
scoring_method
```
  :
```
CUSTOM_WEIGHTS
```
  /
```
ENTROPY
```
  /
```
FIRST_PC
```
- ```
weights
```
  : 若使用
```
CUSTOM_WEIGHTS
```
  则为必填项。格式为JSON编码的三元组字符串
  [[<column>, <weight>, <reverse>], ...]
  ——
```
column
```
  为要素列名称，
```
weight
```
  为数值权重（内部会归一化至总和为1），
```
reverse
```
  为布尔值（
```
true
```
  会反转变量，无需CASE WHEN即可将“数值越高越差”转为“数值越高越好”）。示例：
```
'[["population_density", 0.5, false], ["accident_rate", 0.5, true]]'
```
  。注意不是对象映射。
- ```
scaling
```
  :
```
RANKING
```
- ```
aggregation
```
  :
```
LINEAR
```
- ```
output
```
  :
```
RETURN_RANGE
```
  ，范围为
```
[0, 1]
```
输出：每个位置的综合指数得分

成功标志：所有输入变量均为数值型，变量方向已对齐（得分越高代表结果越好），且输出为每个空间单元的归一化得分。

Gotchas

注意事项

Provider casing & SQL dialect. This skill uses lowercase column names (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE — reference
```
POPULATION_DENSITY
```
,
```
ACCIDENT_RATE
```
, etc. in weights, expressions, and downstream SQL. See
```
carto-create-workflow/references/providers/<provider>.md
```
for casing rules and SQL dialect equivalents.
All input variables must be numeric. Ordinal strings (e.g. "low"/"medium"/"high") must be manually encoded via CASE WHEN before passing to the component.
Variable direction matters. If "higher is worse" for a variable, multiply by -1 before scoring. Forgetting this inverts the score meaning.
Supervised R-squared threshold (default 0.4) is permissive. If model fit is poor, the residual-based score is mostly noise. Inspect model diagnostics.
Custom weights are normalized internally to sum to 1. The absolute values do not matter, only the ratios.
Supervised scores are residuals, not raw values. The score identifies areas that DEVIATE from the model, not areas with the highest raw values.
Drop the spatial index column and geometry from the feature selection — only pass actual feature variables to the scoring component.

服务商大小写规则与SQL方言：本技能采用小写列名（BigQuery / Databricks / Postgres / Redshift的惯例）。在Snowflake中，未加引号的标识符会显示为大写——在权重、表达式和下游SQL中需引用
```
POPULATION_DENSITY
```
、
```
ACCIDENT_RATE
```
等名称。可查看
```
carto-create-workflow/references/providers/<provider>.md
```
获取大小写规则和SQL方言对应关系。
所有输入变量必须为数值型：有序字符串（例如"low"/"medium"/"high"）必须在传入组件前通过CASE WHEN手动编码。
变量方向至关重要：若某个变量“数值越高代表结果越差”，需在评分前乘以-1。遗漏此步骤会导致得分含义反转。
有监督R平方阈值（默认0.4）较为宽松。若模型拟合度差，基于残差的得分大多为噪声。需检查模型诊断信息。
自定义权重会在内部归一化至总和为1。绝对值无关紧要，仅比例有效。
有监督得分是残差，而非原始数值。得分识别的是与模型偏离的区域，而非原始数值最高的区域。
从要素选择中移除空间索引列和几何列——仅将实际要素变量传入评分组件。

Reference Templates

参考模板

Both examples use Milan spatial features data at Quadbin resolution 18.

Template	Component	File
Supervised — Identifying resilient neighbourhoods	`native.spatialcompositesupervised`	composite-score-supervised.json
Unsupervised — Market potential scoring	`native.spatialcompositeunsupervised`	composite-score-unsupervised.json

两个示例均使用Quadbin分辨率18的米兰空间要素数据。

模板	组件	文件
有监督——识别韧性社区	`native.spatialcompositesupervised`	composite-score-supervised.json
无监督——市场潜力评分	`native.spatialcompositeunsupervised`	composite-score-unsupervised.json

Common Variations

常见变体

Variation	Approach
Risk index (flood, crime, etc.)	Unsupervised with `CUSTOM_WEIGHTS` ; reverse variables where higher = safer
Market potential / site scoring	Unsupervised with `CUSTOM_WEIGHTS` or `ENTROPY` ; weight demand-side variables higher
Resilience index	Supervised with outcome variable (e.g. revenue change); residuals reveal over/under-performers
Data-driven index (no domain expertise)	Unsupervised with `FIRST_PC` or `ENTROPY` to let variance drive the weights
Human development / composite indicator	Unsupervised with `CUSTOM_WEIGHTS` and `RANKING` scaling for ordinal-safe aggregation

变体	实现方法
风险指数（洪水、犯罪等）	使用 `CUSTOM_WEIGHTS` 的无监督方法；反转“数值越高越安全”的变量
市场潜力/场地评分	使用 `CUSTOM_WEIGHTS` 或 `ENTROPY` 的无监督方法；提高需求侧变量的权重
韧性指数	使用带结果变量（例如收入变化）的有监督方法；残差可揭示表现超出/低于预期的对象
数据驱动型指数（无领域专业知识）	使用 `FIRST_PC` 或 `ENTROPY` 的无监督方法，让方差决定权重
人类发展/综合指标	使用 `CUSTOM_WEIGHTS` 和 `RANKING` 缩放的无监督方法，实现有序安全聚合

carto-composite-scoring

Original

Translation

Instructions

操作步骤

Step 1: Determine the scoring approach

步骤1：确定评分方法

Step 2: Build the pipeline

步骤2：构建流水线

Supervised pipeline (
`native.spatialcompositesupervised`
)

有监督流水线（
`native.spatialcompositesupervised`
）

Unsupervised pipeline (
`native.spatialcompositeunsupervised`
)

无监督流水线（
`native.spatialcompositeunsupervised`
）

Gotchas

注意事项

Reference Templates

参考模板

Common Variations

常见变体

carto-composite-scoring

Original

Translation

Instructions

操作步骤

Step 1: Determine the scoring approach

步骤1：确定评分方法

Step 2: Build the pipeline

步骤2：构建流水线

Supervised pipeline (native.spatialcompositesupervised)

有监督流水线（native.spatialcompositesupervised）

Unsupervised pipeline (native.spatialcompositeunsupervised)

无监督流水线（native.spatialcompositeunsupervised）

Gotchas

注意事项

Reference Templates

参考模板

Common Variations

常见变体

Supervised pipeline (
`native.spatialcompositesupervised`
)

有监督流水线（
`native.spatialcompositesupervised`
）

Unsupervised pipeline (
`native.spatialcompositeunsupervised`
)

无监督流水线（
`native.spatialcompositeunsupervised`
）