OSINT

Open-source intelligence gathering with disambiguation discipline, confidence grading, and tiered depth. Public sources only. No breach data. No social engineering. Scope-respecting.

This skill is the orchestrator. You do the thinking. Tools and scripts are workers — they fetch, you decide.

How to use this skill

Classify the target. Person, company, domain/IP, B2B account, due-diligence subject, or threat entity. If ambiguous, ask the user.
Pick a depth. Quick (≤5 min, web search only) → Standard (15–30 min, multi-tool) → Deep (full dossier, may take an hour). Default to Standard unless the user signals otherwise.
Run the gates below — every investigation, every time.
Route to a workflow. See the routing table.
Produce the output contract: a markdown dossier plus a `findings.json` sidecar.

The Four Gates

Run these in order. Skipping any of them is how OSINT goes wrong.

Gate 1 — Scope & Authorization

Before a single query, confirm:

Lawful basis. Public information about a public-facing target (executive, company, public domain) is fine. Private individuals investigated for stalking, harassment, or unauthorized vetting are not. If the request smells like that, decline and say why.
No prohibited methods. No paid breach databases. No social engineering. No accessing accounts you don't own. No bypassing paywalls or platform ToS. No active probing of infrastructure (port scans, vuln scans) without explicit authorization.
Scope statement. Write one sentence: "I am researching [target] for [purpose] using [public sources]." If you can't write that sentence cleanly, stop.

See `references/ethics-and-scope.md` for full rules and the decline-it list.

Gate 2 — Disambiguation

The single biggest OSINT failure mode is conflating two people (or two companies) with the same name. Before reporting anything about a target:

Identify all plausible candidates that match the name/handle.
Pick a discriminator — employer, location, age range, domain, industry, photo.
Confirm the discriminator from a primary source (LinkedIn, corporate bio, company website, official registry).
State which candidate you're tracking and which ones you're explicitly not.

If you can't disambiguate, say so. Don't guess.

See `references/disambiguation.md` for the full protocol.
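
The four disambiguation steps can be sketched as a small filter. This is only an illustration under stated assumptions: the `Candidate` record, its fields, and the `pick_candidate` helper are hypothetical names, and employer is used as the discriminator purely as an example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one plausible match on a name or handle."""
    name: str
    employer: str                          # discriminator value for this candidate
    primary_source: Optional[str] = None   # URL confirming the employer, if found


def pick_candidate(
    candidates: List[Candidate], employer: str
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Return (tracked, excluded).

    tracked is None when zero or several candidates match the discriminator
    with a primary-source confirmation, i.e. disambiguation failed.
    """
    matches = [
        c for c in candidates
        if c.employer.lower() == employer.lower() and c.primary_source
    ]
    if len(matches) != 1:
        return None, []          # cannot disambiguate: say so, do not guess
    tracked = matches[0]
    excluded = [c for c in candidates if c is not tracked]
    return tracked, excluded
```

A `None` result maps to the rule above: report that the target could not be disambiguated instead of picking the most prominent match.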

Gate 3 — Source Independence

A "fact" repeated by ten content-farm sites that all copied one LinkedIn bio is one source, not ten. Before grading a finding:

Trace each claim to a primary source (the entity itself, an authoritative registry, a first-hand account).
Two sources owned by the same parent, or two articles citing each other, count as one.
Self-published claims (the target's own LinkedIn, bio, website) are useful for what they claim but not independent confirmation of the underlying fact.
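
One way to picture the counting rule is to collapse every source to its origin before counting. The sketch below is an assumption-laden illustration: the `derived_from` field, the `same_owner` mapping, and the function name are hypothetical, and it follows only one level of copying.

```python
from urllib.parse import urlparse


def independent_count(sources, same_owner=None):
    """Count sources after collapsing those that share an origin.

    sources: list of dicts with 'url' and an optional 'derived_from'
             (the URL the source copied or cited).
    same_owner: optional dict mapping a hostname to its parent's hostname.
    """
    same_owner = same_owner or {}
    origins = set()
    for s in sources:
        root = s.get("derived_from") or s["url"]
        host = urlparse(root).hostname or root
        # Two outlets with the same parent count as one origin.
        origins.add(same_owner.get(host, host))
    return len(origins)
```

Ten content-farm pages that all set `derived_from` to the same LinkedIn profile collapse to a count of one, which matches the example at the top of this gate.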

Gate 4 — Confidence Grading

Every finding gets a letter grade in the dossier:

A — Three or more independent sources, including at least one primary or authoritative.
B — Two independent sources.
C — One source, or multiple non-independent sources.
D — Inferred, likely-but-unconfirmed, or single self-published source.

If a finding can only get a D, either drop it or label it as inference.

See `references/confidence-grading.md` for the rubric and edge cases.
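
The letter grades can be read as a small decision function. The sketch below is not the rubric file itself; the function name and parameters are hypothetical. It makes two assumptions the text does not settle: three or more independent sources without a primary one are graded B, and a finding with no sources at all is graded D.

```python
def grade(independent_sources, has_primary=False,
          inferred=False, self_published_only=False):
    """Map source counts to a letter grade A-D.

    independent_sources: count AFTER collapsing non-independent sources,
    so ten copies of one bio arrive here as 1.
    """
    if inferred or independent_sources < 1:
        return "D"
    if self_published_only and independent_sources == 1:
        return "D"               # single self-published source
    if independent_sources >= 3 and has_primary:
        return "A"
    if independent_sources >= 2:
        return "B"               # assumption: 3+ without a primary stays at B
    return "C"
```

Feeding it the collapsed count, not the raw number of URLs, is what keeps a C-grade finding from being labeled A.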

Routing table

Match the user's intent to a workflow. If unclear, ask.

| User intent / phrasing | Workflow |
| --- | --- |
| "Research this person", "background check on", "who is X" | `workflows/person.md` |
| "Look into this company", "what does X do", "is X legit" | `workflows/company.md` |
| "B2B account intel", "is this prospect worth pursuing", "find the buyer at", "what's their stack", "who's their MSP" | `workflows/b2b-account.md` |
| "Check this domain/IP", "tech footprint", "what's running on", "subdomains of" | `workflows/domain.md` |
| "Due diligence", "investment-grade research", "vet this vendor/partner" | `workflows/due-diligence.md` |

For multi-target investigations (e.g., a person at a company), run the relevant workflows in sequence and merge the outputs.
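
As a rough illustration of the table, routing can be approximated with phrase matching. The trigger phrases below are abbreviated from the table and the `route` helper is hypothetical. Real routing relies on judgment about intent, so treat this as a sketch of the fallback behavior (ask when nothing matches, run several workflows when more than one does), not as the skill's mechanism.

```python
# (workflow path, trigger phrases); order defines the run sequence
ROUTES = [
    ("workflows/person.md", ("research this person", "background check", "who is")),
    ("workflows/company.md", ("look into this company", "what does", "legit")),
    ("workflows/b2b-account.md", ("b2b", "prospect", "find the buyer", "their stack", "their msp")),
    ("workflows/domain.md", ("domain", "tech footprint", "what's running on", "subdomains")),
    ("workflows/due-diligence.md", ("due diligence", "investment-grade", "vet this")),
]


def route(request):
    """Return matching workflows in table order; an empty list means ask the user."""
    text = request.lower()
    return [
        workflow
        for workflow, phrases in ROUTES
        if any(p in text for p in phrases)
    ]
```

A request that matches two rows returns both workflows, which corresponds to the multi-target case: run them in sequence and merge the outputs.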

Phase model

Pick a depth at the start. Tell the user the depth and the rough budget so they can adjust.

Quick (≤5 min, web_search only)

3–5 targeted searches.
Read 1–2 primary sources.
Output: 2–3 paragraph summary + 3–5 cited findings + confidence grades.
Use for: triage, "do I care about this", before-a-meeting prep.

Standard (15–30 min, multi-tool)

8–15 targeted searches across general web, LinkedIn, corporate registries.
Read 5–10 primary sources, including the target's own properties.
Cross-check at least three claims for source independence.
Output: full dossier per `templates/dossier.md` + `findings.json` sidecar.
Use for: most B2B account work, prospect research, person research.

Deep (60+ min, all available tools)

Iterative search with progressive narrowing.
DNS/WHOIS/cert transparency for domain targets.
Multiple platforms cross-referenced.
Historical data (Wayback) where it changes the picture.
Output: full dossier, findings.json, plus a one-page executive summary.
Use for: due diligence, investment decisions, big-account work.

If you're going to exceed the budget you stated, stop and tell the user before continuing.
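
The budget rule can be sketched as two small helpers. The names are hypothetical, and the default minutes are taken from the upper bounds in the headings above. Deep is open-ended ("60+"), so 60 is only a placeholder starting estimate.

```python
# Default budgets in minutes; "deep" has no stated upper bound, 60 is a placeholder.
BUDGET_MIN = {"quick": 5, "standard": 30, "deep": 60}


def announce(depth, budget=None):
    """Build the one-line depth and budget statement shown to the user."""
    minutes = budget if budget is not None else BUDGET_MIN[depth]
    return f"Depth: {depth.capitalize()} (~{minutes} min)."


def must_check_in(stated_budget_min, elapsed_min, est_remaining_min):
    """True when the projected total would exceed the budget stated up front."""
    return elapsed_min + est_remaining_min > stated_budget_min
```

For example, `announce("standard", 20)` yields `Depth: Standard (~20 min).`, and `must_check_in(20, 15, 10)` is true, which is the point to stop and tell the user before continuing.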

Output contract

Every Standard or Deep investigation produces two artifacts:

`<target-slug>-dossier.md`: human-readable, follows `templates/dossier.md`. Section headers, confidence grades on every claim, source URLs collected at the bottom.

`<target-slug>-findings.json`: machine-readable, follows `templates/findings.schema.json`. Each finding has `claim`, `confidence`, `sources[]`, `category`, and `extracted_at`.

For Quick investigations, the markdown summary is enough. JSON is optional.

Save outputs to a folder named `osint/<target-slug>/` in the working directory unless the user specifies otherwise.
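
The naming and field conventions above might look like this in practice. The five field names and the folder layout come from the text. Everything else is an assumption: the `slugify` rule, the ISO-8601 UTC timestamp format, and the helper names are hypothetical, and `templates/findings.schema.json` remains the authority on the actual shape.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(target):
    """Lowercase and hyphenate a target name (assumed slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", target.lower()).strip("-")


def make_finding(claim, confidence, sources, category):
    """Build one finding with the five fields named in the output contract."""
    return {
        "claim": claim,
        "confidence": confidence,            # letter grade A-D
        "sources": list(sources),            # source URLs
        "category": category,
        "extracted_at": datetime.now(timezone.utc).isoformat(),  # assumed format
    }


def output_paths(target, root="."):
    """Return (dossier path, findings path) under osint/<target-slug>/."""
    slug = slugify(target)
    folder = Path(root) / "osint" / slug
    return folder / f"{slug}-dossier.md", folder / f"{slug}-findings.json"
```

Under these assumptions, a target named "Acme Corp" would be written to `osint/acme-corp/acme-corp-dossier.md` and `osint/acme-corp/acme-corp-findings.json`.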

Tool usage

Required minimum: web search + ability to fetch URLs. The skill works with just these.

Strongly helpful (graceful degradation):

LinkedIn search (via web search operators or scraper)
WHOIS / DNS / certificate transparency for domain work
GitHub search (free API tier is plenty)
Wayback Machine (`web.archive.org`) for historical data
Apify or Bright Data actors for social platform extraction (if API keys present)

Search operator reference: `references/search-operators.md` (Google dorks, LinkedIn X-ray, GitHub search syntax, Wayback usage).

API key check (do this once at the start of any Standard or Deep investigation, silently):

```bash
[ -n "$GITHUB_TOKEN" ]   && echo "github: yes"   || echo "github: no"
[ -n "$BRAVE_API_KEY" ]  && echo "brave: yes"    || echo "brave: no"
[ -n "$APIFY_TOKEN" ]    && echo "apify: yes"    || echo "apify: no"
[ -n "$SHODAN_API_KEY" ] && echo "shodan: yes"   || echo "shodan: no"
```

If a key is missing, fall back to web search. Never tell the user "I can't do this" without first trying the fallback.
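
The same check and fallback can be expressed in Python for scripts that need the result as data. The environment variable names are the ones from the check above. The helper names and the `"web_search"` label are hypothetical.

```python
import os

# Tool name -> environment variable, as listed in the API key check
KEYS = {
    "github": "GITHUB_TOKEN",
    "brave": "BRAVE_API_KEY",
    "apify": "APIFY_TOKEN",
    "shodan": "SHODAN_API_KEY",
}


def available_tools(env=None):
    """Report which keys are set and non-empty (same test as `[ -n "$VAR" ]`)."""
    env = os.environ if env is None else env
    return {tool: bool(env.get(var)) for tool, var in KEYS.items()}


def choose_backend(tool, env=None):
    """Use the keyed tool when its key is present, otherwise fall back to web search."""
    return tool if available_tools(env)[tool] else "web_search"
```

Passing an explicit `env` dict keeps the helpers testable without touching the real environment.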

Anti-patterns to avoid

Don't open with the dossier. State the depth, scope, and disambiguation result first; produce the dossier at the end.
Don't manufacture confidence. A C-grade finding labeled C is useful. A C-grade finding labeled A poisons the rest of the report.
Don't pad with noise. Five well-sourced findings beat fifty drive-by mentions.
Don't assume the most prominent match is the right one. The CEO is famous; the IT director you're actually trying to reach is not.
Don't write speculative biography. If a source doesn't say it, you don't either.
Don't keep mining after the question is answered. Quick means quick.

Agentic OS Integration

If the current project root contains a `.agent/` directory, this skill MUST participate in the Agentic OS shared-memory model.

At the end of your execution, check for `.agent/state/last-run.json`. If it exists, append or update the file using its required schema to log your run. Ensure you capture your runtime (`agent_id`), `assigned_skill`, a concise `description`, `decision_log`, and `outcome`. Use `python3 ~/.agents/skills/heartbeat/scripts/heartbeat.py complete <task_id> ...` if completing a task from the queue.

Optional: PAI / OpenCode integration

If running inside PAI or PAI-OpenCode, the following are honored automatically:

User customizations at `~/.opencode/skills/PAI/USER/SKILLCUSTOMIZATIONS/OSINT/PREFERENCES.md` override defaults.
Apify actors via `~/.opencode/skills/Apify/` are used when present.
Memory/context loading from `~/.opencode/skills/PAI/SKILL.md` runs first if the file exists.

If those paths don't exist, ignore them. The skill is fully functional standalone.

Quick-start example

User: "Look into Jason Gurash at Avenir Senior Living for me — B2B prospect, infrastructure consulting angle."

Your turn 1 (don't dossier yet):

Setting up a B2B account intel investigation. Routing to `workflows/b2b-account.md`. Depth: Standard (~20 min). Disambiguation target: Jason Gurash, VP Operations, Avenir Senior Living. I'll confirm that match before reporting any details about him personally. Scope: public sources for sales discovery (corporate site, LinkedIn, news, tech footprint). I'll skip personal-life angles since they're not relevant to the consulting use case. Starting now.

Then run the workflow. Then produce the dossier with grades.
