# crm-data-quality

Read `bulk-operations/SKILL.md` first: JSONL piping, batch read, pagination, and dry-run/digest/confirm gating apply to every command below.

## Property discovery


Don't guess property names. List them:

```bash
hubspot properties list --type contacts --format table
hubspot properties list --type contacts | jq -c 'select(.type=="enumeration") | {name, label}'
```

Same for `--type companies`, `deals`, or any custom type (`hubspot objects types`).
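Discovery output can also feed later commands directly. This is a minimal sketch. It assumes each row of `hubspot properties list` is one JSON object carrying `name` and `type`, as listed in section 4. `props_csv` is a hypothetical helper name, and the sample rows are hand-written, not real CLI output.

```bash
# Hypothetical helper: read `properties list` JSONL on stdin and print the
# enumeration property names as one comma-separated string (for --properties).
props_csv() {
  jq -rs 'map(select(.type == "enumeration") | .name) | join(",")'
}

# Hand-written sample rows; real rows also carry fieldType and groupName.
printf '%s\n' \
  '{"name":"lifecyclestage","label":"Lifecycle Stage","type":"enumeration"}' \
  '{"name":"email","label":"Email","type":"string"}' \
  '{"name":"hs_lead_status","label":"Lead Status","type":"enumeration"}' \
  | props_csv
# prints: lifecyclestage,hs_lead_status

# With the real CLI (flags as used elsewhere in this skill):
#   hubspot objects search --type contacts --filter "email" \
#     --properties "email,$(hubspot properties list --type contacts | props_csv)"
```

The selection lives entirely in jq, so the same helper should work unchanged on `--type companies` or `deals` output.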

## 1. Find incomplete records


`!name` = NOT_HAS_PROPERTY (missing or empty). Bare `name` = HAS_PROPERTY. Within one `--filter`, chain conditions with `AND`; multiple `--filter` flags are OR'd.

```bash
hubspot objects search --type contacts --filter "!email" --properties firstname,lastname,company
hubspot objects search --type contacts --filter "!phone AND !mobilephone" --properties email
hubspot objects search --type contacts --filter "!hubspot_owner_id" --properties email,lifecyclestage
```

For >100 results, use the pagination loop from `bulk-operations`.
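The `!name` rule (missing *or* empty) is easy to mirror client-side. This helps when re-checking a file already pulled with the pagination loop, such as `/tmp/contacts.jsonl`. This is a sketch. It assumes records shaped `{id, properties:{...}}`, as in the jq snippets in this skill. `missing_all` is a hypothetical helper that needs jq 1.6+ for `--args`. It approximates the server-side filter rather than reproducing it.

```bash
# Hypothetical helper: print the id of every record where ALL named
# properties are missing, null, or "" (client-side analogue of
# --filter "!a AND !b"). Property names are passed as positional args.
missing_all() {
  jq -r --args '
    . as $rec
    | select(all($ARGS.positional[]; ($rec.properties[.] // "") == ""))
    | .id
  ' "$@"
}

printf '%s\n' \
  '{"id":"101","properties":{"email":"a@example.com","phone":"","mobilephone":null}}' \
  '{"id":"102","properties":{"email":"b@example.com","phone":"555-0100"}}' \
  '{"id":"103","properties":{"email":"c@example.com"}}' \
  | missing_all phone mobilephone
# prints 101 and 103, one per line
```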

## 2. Normalize field values


Search → reshape with `jq` → pipe into `update`. Always run with `--dry-run` first; `bulk-operations` covers digest/confirm escalation for >100 rows. Reshape patterns: `bulk-operations/resources/json-patterns.md`.

```bash
# Collapse spellings into one canonical value
hubspot objects search --type contacts --filter "company~acme" \
  | jq -c '{id, properties:{company:"Acme Corporation"}}' \
  | hubspot objects update --type contacts --dry-run
```
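When several spellings need to collapse at once, a lookup table in jq avoids running one search per variant. It also skips rows that would not change. This is a minimal sketch. `COMPANY_MAP` and `canonicalize_company` are hypothetical. Keys are matched after `ascii_downcase` only, with no whitespace trimming. Unmapped values are simply dropped from the update stream.

```bash
# Hypothetical variant map: lowercased spelling -> canonical value.
COMPANY_MAP='{"acme":"Acme Corporation","acme corp":"Acme Corporation","acme corp.":"Acme Corporation","acme inc":"Acme Corporation"}'

# Emit an update row only when the company maps to a canonical value that
# differs from the current one; everything else is dropped.
canonicalize_company() {
  jq -c --argjson map "$COMPANY_MAP" '
    (.properties.company // "") as $raw
    | $map[$raw | ascii_downcase] as $canon
    | select($canon != null and $canon != $raw)
    | {id, properties: {company: $canon}}
  '
}

printf '%s\n' \
  '{"id":"1","properties":{"company":"ACME corp"}}' \
  '{"id":"2","properties":{"company":"Acme Corporation"}}' \
  '{"id":"3","properties":{"company":"Acme Widgets"}}' \
  | canonicalize_company
# prints: {"id":"1","properties":{"company":"Acme Corporation"}}

# Real pipeline:
#   hubspot objects search --type contacts --filter "company~acme" --properties company \
#     | canonicalize_company \
#     | hubspot objects update --type contacts --dry-run
```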

```bash
# Lowercase emails (read, reshape, write)
hubspot objects search --type contacts --filter "email" --properties email \
  | jq -c '{id, properties:{email: (.properties.email | ascii_downcase)}}' \
  | hubspot objects update --type contacts --dry-run
```
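The lowercase pattern rewrites every contact that has an email, including ones that are already lowercase. A variant that emits only rows whose value actually changes keeps the dry-run preview, and the >100-row digest gate, focused on real edits. This is a sketch with a hypothetical `lowercase_changed` helper. Note that `ascii_downcase` leaves non-ASCII characters untouched.

```bash
# Hypothetical helper: emit an update row only if lowercasing changes the email.
lowercase_changed() {
  jq -c '
    (.properties.email // "") as $e
    | ($e | ascii_downcase) as $lc
    | select($e != $lc)
    | {id, properties: {email: $lc}}
  '
}

printf '%s\n' \
  '{"id":"7","properties":{"email":"Jane.Doe@Example.com"}}' \
  '{"id":"8","properties":{"email":"bob@example.com"}}' \
  | lowercase_changed
# prints: {"id":"7","properties":{"email":"jane.doe@example.com"}}
```

In the real pipeline it replaces the `jq` stage between `search` and `update --dry-run`.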

## 3. Dedupe with `hubspot objects merge`


The secondary record is folded into the primary and then deleted. This is irreversible. Dry-run/digest/confirm gating applies.

```bash
# Single pair
hubspot objects merge --type contacts --primary 149 --secondary 425 --dry-run
hubspot objects merge --type contacts --primary 149 --secondary 425   # execute (≤100 pairs)
```

Bulk: pipe JSONL `{"primary":"...","secondary":"..."}` on stdin (omit `--primary`/`--secondary`).

**Pagination required.** `objects search` caps at 100 rows per call, and `jq -s` can only group the records it is given. A raw `search` returns a single page, so running the snippet below against it will silently miss every duplicate that is not entirely within that page. Collect the full set first with the pagination loop from `bulk-operations/SKILL.md` (write to `/tmp/contacts.jsonl`), then dedupe from the file:

```bash
# /tmp/contacts.jsonl produced by the pagination loop (bulk-operations/SKILL.md).
# Contacts with no email are dropped first so they are not all grouped under null.
# The lowest ID in each group becomes the primary; the rest are merged into it.
jq -s -c '
  map(select((.properties.email // "") != ""))
  | group_by(.properties.email)[]
  | select(length > 1)
  | sort_by(.id | tonumber)
  | .[0].id as $p
  | .[1:][]
  | {primary: $p, secondary: .id}
' /tmp/contacts.jsonl \
  | hubspot objects merge --type contacts --dry-run | tee /tmp/merge-preview.jsonl
```

For >100 pairs, lift `digest` and `impact.records_affected` from the `BulkData` line and re-pipe the same producer with `--digest`/`--confirm` (see `bulk-operations`).
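Because merges cannot be undone, it is worth checking the pairing logic on a few hand-written records before pointing it at real data. This is a sketch with made-up IDs and emails, assuming the `{id, properties:{email}}` shape. `merge_pairs` is a hypothetical wrapper. It drops contacts without an email before grouping. Without that guard, `group_by` would collect every email-less contact into a single `null` group and pair them for merging.

```bash
# Hypothetical wrapper: JSONL contacts on stdin -> {primary, secondary} pairs.
merge_pairs() {
  jq -s -c '
    map(select((.properties.email // "") != ""))
    | group_by(.properties.email)[]
    | select(length > 1)
    | sort_by(.id | tonumber)
    | .[0].id as $p
    | .[1:][]
    | {primary: $p, secondary: .id}
  '
}

printf '%s\n' \
  '{"id":"425","properties":{"email":"pat@example.com"}}' \
  '{"id":"149","properties":{"email":"pat@example.com"}}' \
  '{"id":"512","properties":{"email":"pat@example.com"}}' \
  '{"id":"300","properties":{"email":"sam@example.com"}}' \
  '{"id":"601","properties":{}}' \
  '{"id":"602","properties":{"email":""}}' \
  | merge_pairs
# prints:
# {"primary":"149","secondary":"425"}
# {"primary":"149","secondary":"512"}
```

Grouping is case-sensitive, so running the section 2 lowercase pass first avoids missing `Pat@…` / `pat@…` pairs.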

## 4. Audit properties


`hubspot properties list` (and `get`, `batch-read`) emits `{name, label, type, fieldType, groupName}` per row. Enum option values are not currently exposed by the CLI; read them off a real record (`hubspot objects search ... --properties <enum>`) or from the HubSpot UI.
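Since option values are not exposed, one workaround is to tally the values that actually occur on records. This is a sketch, assuming search output shaped `{id, properties:{...}}`. `tally_values` is a hypothetical helper, and `lifecyclestage` is only an example property. It reveals options that are in use, not the full option list. Beyond 100 records it needs paginated input.

```bash
# Hypothetical helper: count each observed value of property $1, most common first.
# Missing, null, and "" values are reported together as "(missing)".
tally_values() {
  local prop="$1"
  jq -s -c --arg p "$prop" '
    map(.properties[$p] // "" | if . == "" then "(missing)" else . end)
    | group_by(.)
    | map({value: .[0], count: length})
    | sort_by(.count)
    | reverse
    | .[]
  '
}

printf '%s\n' \
  '{"id":"1","properties":{"lifecyclestage":"lead"}}' \
  '{"id":"2","properties":{"lifecyclestage":"customer"}}' \
  '{"id":"3","properties":{"lifecyclestage":"lead"}}' \
  | tally_values lifecyclestage
# prints {"value":"lead","count":2} then {"value":"customer","count":1}
```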

```bash
# Count properties per group (HubSpot groups standard fields; custom groups stand out)
hubspot properties list --type contacts | jq -rs 'group_by(.groupName) | map({group: .[0].groupName, count: length}) | .[]'

# All enumeration properties
hubspot properties list --type contacts | jq -c 'select(.type=="enumeration") | {name, label, fieldType}'

# Create a DQ flag property, then set it via the normalize pattern in section 2
hubspot properties create --type contacts --name dq_missing_phone --label "DQ: Missing Phone" --prop-type string --field-type text
```
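Setting the flag follows the section 2 pattern: search, reshape, then run a dry-run update. This is a sketch with a hypothetical `set_flag` helper. The value `yes` is an assumed local convention; a free-text `string` property accepts any value, and HubSpot does not prescribe one.

```bash
# Hypothetical helper: turn each search row into an update row that sets
# property $1 to value $2, keeping only the record id.
set_flag() {
  jq -c --arg name "$1" --arg value "$2" '{id, properties: {($name): $value}}'
}

printf '%s\n' '{"id":"101","properties":{"email":"a@example.com"}}' \
  | set_flag dq_missing_phone yes
# prints: {"id":"101","properties":{"dq_missing_phone":"yes"}}

# Real pipeline, reusing the section 1 filter:
#   hubspot objects search --type contacts --filter "!phone AND !mobilephone" --properties email \
#     | set_flag dq_missing_phone yes \
#     | hubspot objects update --type contacts --dry-run
```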

## Recovery


Merge is irreversible. After any merge, `hubspot history --since 1h` captures the audit trail. If a merge went in the wrong direction, restore the secondary from the recycle bin in the HubSpot UI.