data-enrichment

Prereq: read `bulk-operations/SKILL.md` first: JSONL piping, dry-run/digest, history, and rate-limit hygiene live there. This skill is the upsert-by-natural-key workflow on top.

The core move: upsert, not search-then-create


`hubspot objects upsert --type X --id-property <natural-key>` reads JSONL on stdin and creates-or-updates each row in one CLI call per record, keyed by a property (email for contacts, domain for companies). No race window, no branching. Do not loop `search` → empty? → `create`.

Per line in: `{"id":"jane@example.com","properties":{"firstname":"Jane","jobtitle":"VP"}}`

Per line out: `{"id":"123","ok":true,"data":{...,"new":true|false}}` or `{"ok":false,"error":{...}}`. Order matches input.

CSV/JSONL → upsert stream


Reshape with `jq`, preview with `--dry-run`, then execute. Always lowercase the natural key, because the CRM match is exact. Confirm available property names with `hubspot properties list --type contacts`; never hard-code a list. See `bulk-operations/resources/json-patterns.md` for reshape idioms.
```bash
# CSV → JSONL (any tool); example using csvkit
csvjson external.csv | jq -c '.[]' > external.jsonl

# Preview
cat external.jsonl \
  | jq -c '{id:(.email|ascii_downcase), properties:{firstname:.first, lastname:.last, jobtitle:.title, company:.company}}' \
  | hubspot objects upsert --type contacts --id-property email --dry-run | head

# Execute (same pipeline, drop --dry-run, capture results)
cat external.jsonl \
  | jq -c '{id:(.email|ascii_downcase), properties:{firstname:.first, lastname:.last, jobtitle:.title, company:.company}}' \
  | hubspot objects upsert --type contacts --id-property email \
  | tee /tmp/upsert.results.jsonl
```

Companies: swap `--type companies --id-property domain` and reshape with `.domain|ascii_downcase` as `id`.
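The reshape above reads the columns `email`, `first`, `last`, `title`, and `company`, and `ascii_downcase` fails on a missing (null) key. A pre-flight header check can catch that before anything is piped to the CLI. The following is a minimal sketch, not part of the `hubspot` CLI: the helper name `require_columns` is hypothetical, and it assumes a plain comma-separated header row with no quoted names or embedded commas.

```bash
#!/usr/bin/env bash
# require_columns FILE COL [COL...]
# Returns 0 if every COL appears in the first line of FILE, 1 otherwise.
# Hypothetical helper; assumes unquoted header names with no embedded commas.
require_columns() {
  local file=$1 col missing=0 header
  shift
  # First line only, CR stripped, one column name per line.
  header=$(head -n 1 "$file" | tr -d '\r' | tr ',' '\n')
  for col in "$@"; do
    # -x: match the whole line, -F: treat the name as a fixed string.
    if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
      echo "missing column: $col" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (columns taken from the reshape in this section):
#   require_columns external.csv email first last title company || exit 1
```

If the source file uses different column names, it is usually simpler to adjust the `jq` reshape than to rename columns in the CSV.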

Handle per-record OK / error output


Split with `jq`, inspect failure modes, retry just the failures after fixing the inputs:

```bash
jq -c 'select(.ok==true)'  /tmp/upsert.results.jsonl > /tmp/upsert.ok.jsonl
jq -c 'select(.ok==false)' /tmp/upsert.results.jsonl > /tmp/upsert.failed.jsonl
jq -r '.error.status' /tmp/upsert.failed.jsonl | sort | uniq -c   # status → count
jq -r '.data.new'    /tmp/upsert.ok.jsonl     | sort | uniq -c   # created vs updated
```

429s: split the input and rerun smaller chunks (see `bulk-operations` rate-limit notes). 400s usually mean a bad property name or invalid enum value; fix the reshape, then rerun the failed inputs.
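Error lines carry no `id`, but output order matches input, so the inputs behind the failures can be recovered by line position. The following is a minimal sketch under stated assumptions: the reshaped input was saved to a file (the path `/tmp/upsert.input.jsonl` is hypothetical), the CLI prints one compact JSON result per input line, and a failure contains the literal text `"ok":false`. The helper names `failed_inputs` and `run_in_chunks` are illustrative and not part of the CLI.

```bash
#!/usr/bin/env bash
# failed_inputs INPUT RESULTS
# Prints each INPUT line whose same-numbered RESULTS line is a failure.
# Refuses to pair by position if the two files differ in line count.
failed_inputs() {
  local input=$1 results=$2 n_in n_out
  n_in=$(wc -l < "$input")
  n_out=$(wc -l < "$results")
  if [ "$n_in" -ne "$n_out" ]; then
    echo "line count mismatch: $n_in inputs vs $n_out results" >&2
    return 1
  fi
  # grep -n prints "LINENO:text"; keep the numbers, then have awk print
  # the lines with those numbers from the input file.
  grep -n '"ok":false' "$results" | cut -d: -f1 \
    | awk 'NR==FNR { want[$1]; next } FNR in want' - "$input"
}

# run_in_chunks FILE LINES CMD [ARGS...]
# Splits FILE into pieces of at most LINES lines and feeds each piece to
# CMD on stdin, one piece at a time, in the original order.
run_in_chunks() {
  local file=$1 lines=$2 tmpdir chunk
  shift 2
  tmpdir=$(mktemp -d) || return 1
  split -l "$lines" "$file" "$tmpdir/chunk."
  for chunk in "$tmpdir"/chunk.*; do
    [ -e "$chunk" ] || continue   # empty input produces no chunks
    "$@" < "$chunk"
  done
  rm -rf "$tmpdir"
}

# Usage (chunk size 50 is an arbitrary starting point, not a documented limit):
#   failed_inputs /tmp/upsert.input.jsonl /tmp/upsert.results.jsonl > /tmp/upsert.retry.jsonl
#   run_in_chunks /tmp/upsert.retry.jsonl 50 \
#     hubspot objects upsert --type contacts --id-property email
```

For 400s, the recovered lines still need the reshape fix applied before the rerun; the positional pairing only identifies which inputs failed.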

Destructive-op safety


`upsert` itself is non-destructive, but write-back can clobber populated fields. Always `--dry-run` first and spot-check. For bulk delete or overwrite of existing data, follow the dry-run → digest → confirm flow in `bulk-operations/SKILL.md`. Recovery: `hubspot history --since 1h`.

Match without upsert: OR-search → update


When you only want to read matches (no write-back), or the natural key isn't a CRM property, use repeated `--filter` flags; each flag is one OR group.

Verified cap: 5 OR groups per call. 6+ returns `400 too many filterGroups (count: N, max allowed: 5)`. Chunk 5 at a time:
```bash
# emails.txt: one lowercased email per line
xargs -n5 < emails.txt | while read -r e1 e2 e3 e4 e5; do
  args=()
  # The last chunk may hold fewer than 5 emails; skip the empty slots.
  for e in "$e1" "$e2" "$e3" "$e4" "$e5"; do
    [ -n "$e" ] && args+=(--filter "email=$e")
  done
  hubspot objects search --type contacts "${args[@]}" --properties email,firstname,company
done > /tmp/matches.jsonl

jq -c '{id, properties:{lifecyclestage:"marketingqualifiedlead"}}' /tmp/matches.jsonl \
  | hubspot objects update --type contacts --dry-run
```

For larger keyed enrichments, prefer `upsert`: one pipeline, no chunking math.
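The search loop assumes `emails.txt` already holds one lowercased email per line. One way to produce it from a raw export is sketched below, using only standard tools. The helper name `normalize_keys` and the file name `raw_emails.txt` are hypothetical; the sketch strips carriage returns and surrounding whitespace, lowercases, drops blank lines, and removes duplicates while keeping first-seen order.

```bash
#!/usr/bin/env bash
# normalize_keys: read raw keys on stdin (one per line), write cleaned keys.
# Hypothetical helper, not part of the hubspot CLI.
normalize_keys() {
  tr -d '\r' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | tr '[:upper:]' '[:lower:]' \
    | awk 'NF && !seen[$0]++'   # skip blank lines, keep first occurrence only
}

# Usage:
#   normalize_keys < raw_emails.txt > emails.txt
```

Deduplicating up front also keeps the chunk count down, since each repeated email would otherwise occupy one of the 5 OR groups in a call.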