carto-composite-scoring
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseUse this skill whenever the user wants to create a composite score, index, or multi-variable ranking in a CARTO Workflow.
Prerequisites: Load for the development process.
carto-create-workflow当用户想要在CARTO Workflow中创建综合得分、指数或多变量排名时,即可使用本技能。
前提条件:开发过程中需加载。
carto-create-workflowInstructions
操作步骤
Step 1: Determine the scoring approach
步骤1:确定评分方法
Ask the user the following decision tree:
- "Do you have a target/outcome variable?" (e.g. revenue, sales, crime rate)
- Yes → Supervised method using
native.spatialcompositesupervised
- Yes → Supervised method using
- "No target, but do you have expert knowledge of variable importance?"
- Yes → Unsupervised method with using
CUSTOM_WEIGHTSnative.spatialcompositeunsupervised
- Yes → Unsupervised method with
- "No target, no weights?"
- → Unsupervised method with or
ENTROPYusingFIRST_PCnative.spatialcompositeunsupervised
- → Unsupervised method with
Success: You have identified which component and scoring method to use before designing the pipeline.
向用户询问以下决策树问题:
- "您是否有目标/结果变量?"(例如收入、销售额、犯罪率)
- 是 → 使用的有监督方法
native.spatialcompositesupervised
- 是 → 使用
- "无目标变量,但您是否具备变量重要性的专业知识?"
- 是 → 使用并搭配
native.spatialcompositeunsupervised的无监督方法CUSTOM_WEIGHTS
- 是 → 使用
- "无目标变量,也无权重?"
- → 使用并搭配
native.spatialcompositeunsupervised或ENTROPY的无监督方法FIRST_PC
- → 使用
成功标志:在设计流水线前,您已确定要使用的组件和评分方法。
Step 2: Build the pipeline
步骤2:构建流水线
Supervised pipeline (native.spatialcompositesupervised
)
native.spatialcompositesupervised有监督流水线(native.spatialcompositesupervised
)
native.spatialcompositesupervised- Load the spatial features dataset (pre-indexed at H3 or Quadbin)
- Load the target/outcome dataset
- Join both datasets on the spatial index column
- Select only the relevant feature columns (drop spatial index column and geometry from feature selection — pass only actual feature variables)
- Run with parameters:
native.spatialcompositesupervised- :
model_typeLINEAR_REG - :
bucketizeEQUAL_INTERVALS_ZERO_CENTERED - :
n_buckets5 - :
outlier_removaltrue - :
r_squared_threshold0.4
- Output: composite score based on regression residuals (identifies areas that over/under-perform relative to the model)
Success: The workflow joins features with the target variable, selects only numeric feature columns, and produces a residual-based score per spatial cell.
- 加载空间要素数据集(已预先以H3或Quadbin索引)
- 加载目标/结果数据集
- 基于空间索引列关联两个数据集
- 仅选择相关要素列(从要素选择中移除空间索引列和几何列——仅传入实际要素变量)
- 使用以下参数运行:
native.spatialcompositesupervised- :
model_typeLINEAR_REG - :
bucketizeEQUAL_INTERVALS_ZERO_CENTERED - :
n_buckets5 - :
outlier_removaltrue - :
r_squared_threshold0.4
- 输出:基于回归残差的综合得分(识别相对于模型表现超出/低于预期的区域)
成功标志:工作流将要素与目标变量关联,仅选择数值型要素列,并为每个空间单元生成基于残差的得分。
Unsupervised pipeline (native.spatialcompositeunsupervised
)
native.spatialcompositeunsupervised无监督流水线(native.spatialcompositeunsupervised
)
native.spatialcompositeunsupervised- Load the spatial features dataset
- Select only the relevant feature columns
- Encode any categorical/ordinal variables to numeric using (e.g. "Low_density_urban" → 4, "High_density_urban" → 2)
native.casewhen - Optionally reverse variables where higher = worse by multiplying by -1 in the SELECT query passed to the component
- Run with parameters:
native.spatialcompositeunsupervised- :
scoring_method/CUSTOM_WEIGHTS/ENTROPYFIRST_PC - : required if
weights. Wire format is a JSON-encoded string of triplesCUSTOM_WEIGHTS—[[<column>, <weight>, <reverse>], ...]is the feature column name,columnis a numeric weight (normalized internally to sum to 1),weightis a boolean (reverseflips the variable so higher = worse becomes higher = better without needing a CASE WHEN). Example:true. Not an object map.'[["population_density", 0.5, false], ["accident_rate", 0.5, true]]' - :
scalingRANKING - :
aggregationLINEAR - :
outputwith rangeRETURN_RANGE[0, 1]
- Output: composite index score per location
Success: All input variables are numeric, variable directions are aligned (higher = better for the score), and the output is a normalized score per spatial cell.
- 加载空间要素数据集
- 仅选择相关要素列
- 使用将任何分类/有序变量编码为数值(例如 "Low_density_urban" → 4,"High_density_urban" → 2)
native.casewhen - (可选)在传入组件的SELECT查询中,将数值越高代表结果越差的变量乘以-1进行反转
- 使用以下参数运行:
native.spatialcompositeunsupervised- :
scoring_method/CUSTOM_WEIGHTS/ENTROPYFIRST_PC - : 若使用
weights则为必填项。格式为JSON编码的三元组字符串CUSTOM_WEIGHTS——[[<column>, <weight>, <reverse>], ...]为要素列名称,column为数值权重(内部会归一化至总和为1),weight为布尔值(reverse会反转变量,无需CASE WHEN即可将“数值越高越差”转为“数值越高越好”)。示例:true。注意不是对象映射。'[["population_density", 0.5, false], ["accident_rate", 0.5, true]]' - :
scalingRANKING - :
aggregationLINEAR - :
output,范围为RETURN_RANGE[0, 1]
- 输出:每个位置的综合指数得分
成功标志:所有输入变量均为数值型,变量方向已对齐(得分越高代表结果越好),且输出为每个空间单元的归一化得分。
Gotchas
注意事项
- Provider casing & SQL dialect. This skill uses lowercase column names (BigQuery / Databricks / Postgres / Redshift convention). On Snowflake, unquoted identifiers surface UPPERCASE — reference ,
POPULATION_DENSITY, etc. in weights, expressions, and downstream SQL. SeeACCIDENT_RATEfor casing rules and SQL dialect equivalents.carto-create-workflow/references/providers/<provider>.md - All input variables must be numeric. Ordinal strings (e.g. "low"/"medium"/"high") must be manually encoded via CASE WHEN before passing to the component.
- Variable direction matters. If "higher is worse" for a variable, multiply by -1 before scoring. Forgetting this inverts the score meaning.
- Supervised R-squared threshold (default 0.4) is permissive. If model fit is poor, the residual-based score is mostly noise. Inspect model diagnostics.
- Custom weights are normalized internally to sum to 1. The absolute values do not matter, only the ratios.
- Supervised scores are residuals, not raw values. The score identifies areas that DEVIATE from the model, not areas with the highest raw values.
- Drop the spatial index column and geometry from the feature selection — only pass actual feature variables to the scoring component.
- 服务商大小写规则与SQL方言:本技能采用小写列名(BigQuery / Databricks / Postgres / Redshift的惯例)。在Snowflake中,未加引号的标识符会显示为大写——在权重、表达式和下游SQL中需引用、
POPULATION_DENSITY等名称。可查看ACCIDENT_RATE获取大小写规则和SQL方言对应关系。carto-create-workflow/references/providers/<provider>.md - 所有输入变量必须为数值型:有序字符串(例如"low"/"medium"/"high")必须在传入组件前通过CASE WHEN手动编码。
- 变量方向至关重要:若某个变量“数值越高代表结果越差”,需在评分前乘以-1。遗漏此步骤会导致得分含义反转。
- 有监督R平方阈值(默认0.4)较为宽松。若模型拟合度差,基于残差的得分大多为噪声。需检查模型诊断信息。
- 自定义权重会在内部归一化至总和为1。绝对值无关紧要,仅比例有效。
- 有监督得分是残差,而非原始数值。得分识别的是与模型偏离的区域,而非原始数值最高的区域。
- 从要素选择中移除空间索引列和几何列——仅将实际要素变量传入评分组件。
Reference Templates
参考模板
Both examples use Milan spatial features data at Quadbin resolution 18.
| Template | Component | File |
|---|---|---|
| Supervised — Identifying resilient neighbourhoods | | composite-score-supervised.json |
| Unsupervised — Market potential scoring | | composite-score-unsupervised.json |
两个示例均使用Quadbin分辨率18的米兰空间要素数据。
| 模板 | 组件 | 文件 |
|---|---|---|
| 有监督——识别韧性社区 | | composite-score-supervised.json |
| 无监督——市场潜力评分 | | composite-score-unsupervised.json |
Common Variations
常见变体
| Variation | Approach |
|---|---|
| Risk index (flood, crime, etc.) | Unsupervised with |
| Market potential / site scoring | Unsupervised with |
| Resilience index | Supervised with outcome variable (e.g. revenue change); residuals reveal over/under-performers |
| Data-driven index (no domain expertise) | Unsupervised with |
| Human development / composite indicator | Unsupervised with |
| 变体 | 实现方法 |
|---|---|
| 风险指数(洪水、犯罪等) | 使用 |
| 市场潜力/场地评分 | 使用 |
| 韧性指数 | 使用带结果变量(例如收入变化)的有监督方法;残差可揭示表现超出/低于预期的对象 |
| 数据驱动型指数(无领域专业知识) | 使用 |
| 人类发展/综合指标 | 使用 |