benchmark-due-diligence

Benchmark Due Diligence


Take a benchmark the user envies — a founder, KOL, company, or product whose success looks suspiciously shiny — and produce a teardown that ends in "what this means for ME", not a neutral report. The deliverable answers three questions a balanced briefing never does: How much of this success is real vs marketing bubble? How much is replicable method vs luck/timing? And what, specifically, can the commissioner do with it?

This is the adversarial, decision-oriented cousin of `deep-research`. Where deep-research builds a trustworthy picture of the world, this skill assumes the picture is inflated until proven otherwise and converts the survivors into the commissioner's own moves.


CRITICAL: run inline, never `context: fork`


This skill is an orchestrator: it spawns parallel collection + verification agents (via the `Workflow` tool, or `Task` agents) and may invoke other skills (`deep-research`, `osint-investigate`, `qcc`). Subagents cannot spawn subagents or call skills, so setting `context: fork` would silently break the entire fan-out. Do not add a `context` field. (`osint-investigate` documents the same constraint; it's a hard runtime rule, not a preference.)
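A pre-flight check can make the "no `context` field" rule harder to break by accident. The sketch below assumes the skill is defined in a text file whose metadata sits between two `---` lines as top-level `key: value` pairs; that layout, and the names `frontmatter_keys` and `assert_runs_inline`, are illustrative assumptions rather than anything the runtime prescribes.

```python
def frontmatter_keys(skill_text: str) -> set[str]:
    """Return the top-level keys found between the first pair of '---' lines."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no metadata block at all (assumed layout, see lead-in)
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Only unindented "key: value" lines count as top-level fields.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def assert_runs_inline(skill_text: str) -> None:
    """Raise if the skill definition declares a `context` field."""
    if "context" in frontmatter_keys(skill_text):
        raise ValueError("remove the `context` field: this orchestrator must run inline")


# Hypothetical skill definition used only for illustration.
EXAMPLE = """---
name: benchmark-due-diligence
description: adversarial teardown of a benchmark
---
body text
"""
assert_runs_inline(EXAMPLE)  # passes silently: no `context` key is present
```

The check looks at the field name rather than the value `fork`, because the rule above is to omit the field entirely.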


The one rule that protects the commissioner: two injection channels


Everything the agents see flows through exactly two channels. Keeping them separate is the single most important discipline in this skill:

| Channel | Content | Injected into |
| --- | --- | --- |
| FACTS | Already-verified public facts about the benchmark (relationships, who-owns-what, the headline claim flagged `⚠️ to-verify`) | Every agent: collection, verification, synthesis |
| COMMISSIONER_CONTEXT | The commissioner's private reality: real resources, client names, strategic intent, what they can actually leverage | Only the final mapping agent (Phase 4) |

Why this split is non-negotiable: collection and verification agents take their input and run external `WebSearch` on it. If the commissioner's client names or strategy leak into those prompts, they get searched on the open web, which is a privacy breach. The mapping phase genuinely needs "who is the commissioner"; the collection phase must never see it. Encode this in the orchestration (see `references/workflow_orchestration_template.md`) rather than relying on remembering it mid-run.
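One way to encode the split, rather than remember it, is to route every prompt through a single builder that only attaches the private channel for the mapping phase. This is a minimal sketch, not the contents of the referenced template: the phase names, `Channels`, `build_prompt`, `assert_no_leak`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channels:
    facts: str                 # public, already-verified facts about the benchmark
    commissioner_context: str  # private; must never reach an agent that searches the web


# Phases whose agents may run external searches (names are assumptions).
PUBLIC_PHASES = {"collection", "verification", "synthesis"}


def build_prompt(phase: str, task: str, channels: Channels) -> str:
    """Assemble a prompt; only the mapping phase receives COMMISSIONER_CONTEXT."""
    parts = [f"FACTS:\n{channels.facts}"]
    if phase == "mapping":
        parts.append(f"COMMISSIONER_CONTEXT:\n{channels.commissioner_context}")
    elif phase not in PUBLIC_PHASES:
        raise ValueError(f"unknown phase: {phase}")
    parts.append(f"TASK:\n{task}")
    return "\n\n".join(parts)


def assert_no_leak(prompt: str, private_terms: list[str]) -> None:
    """Fail loudly if any private term appears in a prompt bound for a searching agent."""
    leaked = [t for t in private_terms if t.lower() in prompt.lower()]
    if leaked:
        raise RuntimeError(f"private terms leaked into a searchable prompt: {leaked}")


# Hypothetical sample data.
channels = Channels(
    facts="Subject X claims 0 -> 1M users (⚠️ to-verify)",
    commissioner_context="Client: Acme Accelerator",
)
collection_prompt = build_prompt("collection", "Find third-party user counts.", channels)
mapping_prompt = build_prompt("mapping", "Map the playbook onto real resources.", channels)
assert_no_leak(collection_prompt, ["Acme Accelerator"])  # passes: the private channel was never attached
```

Running `assert_no_leak` on every non-mapping prompt is a belt-and-braces step: it also catches private terms that slipped into the task text or into FACTS by hand.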


Phase 0 — nail the foundation by evidence, not appearance (do this BEFORE any agent)


The fastest way to waste a 12-agent fan-out is to build it on a foundation you inferred from appearances. Two failure modes recur and both have burned real runs:

Inferring relationships between entities from names/domains. "Their content lives at `academy.example.com`, and they're the founder, so they must own that community", when in reality they were just an invited guest. A shared domain, a similar name, or co-occurrence is an observation, not ownership. Verify with an authoritative source before treating any A↔B relationship as fact.
Treating the commissioner's client as the commissioner's asset. If the commissioner does service work for an accelerator/brand, that accelerator is the client's asset — the commissioner can't leverage its audience or capital. Mapping the benchmark's playbook onto resources the commissioner doesn't actually control produces castles in the air.

So before fanning out, establish by evidence (not vibes):

The benchmark's real entity graph — who owns whom, who merely partners/guests. Don't reason from names.
The headline-claim attribution: the benchmark's whole narrative usually rests on one trophy stat ("took product X from 0 → 1M users"). Are they the founder, or the departed growth lead? This is the #1 to-verify target; write it into FACTS with a `⚠️`.
What the commissioner truly controls — separate owned assets from client/partner assets.

Write the results into `FACTS` (public half) and `COMMISSIONER_CONTEXT` (private half). A shaky foundation makes every downstream agent confidently wrong.
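The "evidence, not appearance" rule can be sketched as a filter over candidate relationships: an edge enters FACTS as fact only when it carries an authoritative source, and everything else is written out flagged `⚠️ to-verify`. The `Relationship` type, the set of authoritative source kinds, and the sample URLs below are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical source kinds treated as authoritative for ownership claims.
AUTHORITATIVE = {"registry_filing", "official_statement"}


@dataclass(frozen=True)
class Relationship:
    subject: str
    relation: str       # e.g. "owns", "guest_of"
    target: str
    evidence_url: str   # empty string means the edge was inferred, not sourced
    evidence_kind: str  # e.g. "registry_filing", "shared_domain"


def split_foundation(rels):
    """Separate evidence-backed relationships from ones inferred from appearances."""
    facts, to_verify = [], []
    for r in rels:
        verified = bool(r.evidence_url) and r.evidence_kind in AUTHORITATIVE
        (facts if verified else to_verify).append(r)
    return facts, to_verify


def render_facts(facts, to_verify) -> str:
    """Render the public FACTS channel, flagging unverified edges."""
    lines = [f"{r.subject} {r.relation} {r.target} (source: {r.evidence_url})" for r in facts]
    lines += [f"⚠️ to-verify: {r.subject} {r.relation} {r.target}" for r in to_verify]
    return "\n".join(lines)


# Hypothetical sample: one sourced edge, one inferred from a shared domain.
RELS = [
    Relationship("Founder A", "owns", "Company B",
                 "https://registry.example.com/b", "registry_filing"),
    Relationship("Founder A", "owns", "academy.example.com community",
                 "", "shared_domain"),
]
facts, to_verify = split_foundation(RELS)
print(render_facts(facts, to_verify))
```

An inferred edge is kept, not dropped, so that a verification agent can later confirm or falsify it instead of never seeing it.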


The four-phase orchestration


Use the `Workflow` tool (preferred for its deterministic fan-out; see the ready-to-fill template in `references/workflow_orchestration_template.md`) or `Task` agents. Scale the agent count to how thorough the user wants the run to be: a few dimensions for a quick read, 6+ with multi-vote verification for a deep audit.

Phase 1 + 2 — collect → verify, per dimension, as a pipeline (each dimension verifies the moment its collection finishes; no global barrier):

Collection agent: objective stance. Every finding carries a source URL and a `source_kind` (`对象自述/营销`, the subject's own account or marketing; `第三方独立信源`, an independent third-party source; or `混合`, mixed). Anything not found goes in `gaps` and is never filled by guessing.
Verification agent: adversarial, default-skeptical stance. Grade every claim `L1–L4` and rule `坐实 / 大体可信 / 存疑 / 证伪-水分` (confirmed / largely credible / doubtful / falsified-inflated). The job is to actively hunt falsifying evidence, especially for the headline claims (the trophy stat, "#1 ranking", funding amount, user counts). `bubble_summary` names the biggest water (inflation) in that dimension.

Grading rubric, `source_kind`, verdicts, and both JSON schemas → `references/evidence_grading_rubric.md`.
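The "no global barrier" pipeline can be illustrated with `asyncio`: each dimension chains its own collect and verify steps, so a fast dimension is verified while slower ones are still collecting. This is only a sketch of the scheduling shape. The `collect` and `verify` stubs stand in for real agent calls, and the `verdicts` key is a placeholder, not the schema from the rubric file.

```python
import asyncio


async def collect(dimension: str) -> dict:
    """Stub for a collection agent; a real run would call the agent here."""
    await asyncio.sleep(0)
    return {"dimension": dimension, "findings": [], "gaps": []}


async def verify(collected: dict) -> dict:
    """Stub for a verification agent that grades what collection returned."""
    await asyncio.sleep(0)
    return {"dimension": collected["dimension"], "verdicts": [], "bubble_summary": ""}


async def run_dimension(dimension: str) -> dict:
    # Verification starts as soon as THIS dimension's collection finishes.
    return await verify(await collect(dimension))


async def run_pipeline(dimensions: list[str]) -> list[dict]:
    # All dimensions run concurrently; results come back in input order.
    return await asyncio.gather(*(run_dimension(d) for d in dimensions))


results = asyncio.run(run_pipeline(["subject background", "corporate base"]))
```

The alternative, gathering every collection result first and then starting verification, would make the slowest dimension gate all the others.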

Typical dimensions (tailor to the benchmark type — person / company / product):

Subject background + headline-claim attribution (the #1 bubble target)
Corporate base — entity, founding, funding/valuation
Core product/business real metrics — user counts, revenue, rankings, awards, cross-verified against third parties
Playbook teardown — platform matrix, persona, content types, how they borrow other people's audiences, how personal IP funnels to the product
Comparison sample — a structurally-similar peer or parallel path
Sector + how this class of playbook usually wins and usually fails

Phase 3 — synthesis: due-diligence conclusion (single agent, consumes all verdicts):

Real relationship map (correcting the common misreadings from Phase 0)
Bubble-busting table — claim | evidence level | verdict | one-line basis, sorted by most-water-first
Playbook teardown — concrete, copyable actions
Attribution breakdown (the core) — what share of the success is product vs market-timing vs personal-IP-marketing vs operations? Give % ranges with reasons, and explicitly split replicable method from luck / timing / non-transferable endowment.
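The "most-water-first" ordering of the bubble-busting table could be produced with a simple verdict ranking. The sketch assumes the four verdicts run from most to least inflated as `证伪-水分`, `存疑`, `大体可信`, `坐实`; that ordering, the row fields, and the sample rows are illustrative assumptions, and the real definitions live in the grading rubric.

```python
# Assumed ordering: lower rank means more water (inflation).
WATER_RANK = {"证伪-水分": 0, "存疑": 1, "大体可信": 2, "坐实": 3}


def bubble_table(rows: list[dict]) -> str:
    """Render claim | evidence level | verdict | basis, sorted most-water-first."""
    # sorted() is stable, so rows with the same verdict keep their input order.
    ordered = sorted(rows, key=lambda r: WATER_RANK[r["verdict"]])
    header = "claim | evidence level | verdict | one-line basis"
    body = [f"{r['claim']} | {r['level']} | {r['verdict']} | {r['basis']}" for r in ordered]
    return "\n".join([header, *body])


# Hypothetical rows for illustration only.
ROWS = [
    {"claim": "Company is registered", "level": "L1", "verdict": "坐实",
     "basis": "registry record found"},
    {"claim": "#1 ranking in category", "level": "L4", "verdict": "证伪-水分",
     "basis": "no ranking source names them"},
    {"claim": "1M users", "level": "L3", "verdict": "存疑",
     "basis": "only self-reported figures"},
]
print(bubble_table(ROWS))
```

An unknown verdict raises `KeyError` instead of being sorted silently, which keeps a malformed verification result from hiding at the bottom of the table.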

Phase 4 — synthesis: what this means for the commissioner (single agent; consumes Phase 3 + COMMISSIONER_CONTEXT):

Resource-mapping table — benchmark's playbook elements × the commissioner's real resources; tag each cell ✅ borrow-able / ⚠️ not-replicable (luck/timing) / 🔄 already-doing / 🚫 bubble-don't-copy, one line each
Landing points — exactly how the commissioner uses it (their to-B service / their own IP / their tooling)
Action list + open questions (what's still unconfirmed)

Attribution weighting and the four-tag mapping framework → `references/attribution_and_resource_mapping.md`.
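The resource-mapping table can be sketched as tagged cells plus one guard that ties back to Phase 0: a cell may only be marked borrow-able if the resource is something the commissioner actually owns. The `Tag` enum mirrors the four tags above; `Cell`, `validate`, `render`, and the sample assets are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    BORROWABLE = "✅ borrow-able"
    NOT_REPLICABLE = "⚠️ not-replicable (luck/timing)"
    ALREADY_DOING = "🔄 already-doing"
    BUBBLE = "🚫 bubble-don't-copy"


@dataclass(frozen=True)
class Cell:
    element: str   # one element of the benchmark's playbook
    resource: str  # one of the commissioner's resources
    tag: Tag
    note: str      # the one-line justification


def validate(cells: list[Cell], owned_assets: set[str]) -> None:
    """Reject borrow-able cells that lean on a client's or partner's asset."""
    for c in cells:
        if c.tag is Tag.BORROWABLE and c.resource not in owned_assets:
            raise ValueError(f"{c.resource!r} is not an owned asset; cannot tag as borrow-able")


def render(cells: list[Cell]) -> str:
    return "\n".join(f"{c.element} × {c.resource} | {c.tag.value} | {c.note}" for c in cells)


# Hypothetical commissioner data (this belongs in COMMISSIONER_CONTEXT only).
OWNED = {"own newsletter", "in-house tooling"}
CELLS = [
    Cell("weekly teardown posts", "own newsletter", Tag.BORROWABLE,
         "same format fits the existing list"),
    Cell("early-platform algorithm boost", "own newsletter", Tag.NOT_REPLICABLE,
         "window has closed"),
]
validate(CELLS, OWNED)
print(render(CELLS))
```

The guard targets the client-vs-asset trap directly: a cell that maps a playbook element onto a client's audience fails validation rather than appearing in the action list.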


Don't rebuild what already exists


This skill's edge is the adversarial bubble-busting + attribution + commissioner-mapping layers. The plumbing underneath is not novel — reuse it:

Fan-out collection / source governance: borrow the lead-agent + subagent pattern from `deep-research`. (What's unique here is the skeptical verification stance and the L1–L4 bubble grading, not the parallelism.)
Person-subject identity / footprint checks: invoke `osint-investigate` (ACH hypothesis matrix, Bellingcat-style pivots) rather than re-deriving identity attribution.
Mainland-China corporate registration / funding: invoke the `qcc` family of skills for 工商 (business-registration) data.
Social-platform playbook data: the `agent-reach` CLI covers B站 (Bilibili) / 小红书 (Xiaohongshu) / 抖音 (Douyin) / YouTube / X.


Read before you run


`references/evidence_discipline_traps.md`: the recurring traps (inferring relationships from appearances, headline-claim attribution, client-vs-asset, foundation-before-fan-out, grade-don't-binary, privacy leak) with real teardown war-stories. Read this first; it's where runs actually break.
`references/evidence_grading_rubric.md`: L1–L4, source_kind, verdicts, collection/verification schemas.
`references/attribution_and_resource_mapping.md`: attribution weighting + four-tag mapping + landing-point framework.
`references/workflow_orchestration_template.md`: a ready-to-fill `Workflow` script with the FACTS / COMMISSIONER_CONTEXT injection split already wired in.


Next Step


After the due-diligence conclusion is ready, suggest the natural follow-on (opt-in, never auto-run):

Due-diligence teardown is done.

Options:
A) Render it as a shareable PDF report — pdf-creator (Recommended if this goes to a partner/team)
B) One dimension needs deeper neutral background — deep-research on that sub-topic
C) No thanks — the markdown teardown is enough
