Dossier Collect

Recursive parallel investigation that builds a graph-structured dossier on a seed entity.

When to use

You have a seed (a username, file, symbol, ADR-id, URL, or concept) and want to expand outward, discovering every connected entity with provenance per claim — rather than answering a specific question. For specific questions, use deep-research. For multi-step plans, use goal-plan.

Steps

  1. Detect seed type — classify as one of: username (handle), file (path), symbol (code identifier), adr (ADR-NNN), url, or concept (free text).
  2. Pick sources — match the source matrix to the seed type. Default: all applicable.
  3. Start trajectory — call mcp__claude-flow__hooks_intelligence_trajectory-start with task dossier:<slug>.
  4. Round 0 fan-out — issue ALL source queries in ONE message. Examples:
    • For username: WebSearch, WebFetch on github.com/<user>, mcp__claude-flow__memory_search_unified
    • For adr: Read the ADR file, Grep references, mcp__claude-flow__memory_search namespace adr
    • For symbol: Grep, Glob, mcp__claude-flow__embeddings_search
  5. Extract entities — from each hit, surface entities (people, repos, files, adrs, urls, terms). Lightweight regex + heuristics; no LLM extraction unless ambiguous.
  6. De-dup — drop entities already in the dossier. If --exact is unset, also drop entities whose embedding cosine similarity to an existing node is ≥ 0.92.
  7. Round k recursion — for each new entity (capped at --max-breadth per source), recurse to step 4 until depth ≥ --max-depth OR the budget is exhausted.
  8. Aggregate — build a { nodes, edges } graph. Each node carries { id, type, attrs, sources: [...] }. Each edge carries { from, to, kind, source, confidence }.
  9. Render artifacts:
    • <slug>.md — executive summary, entity table, Mermaid graph, source-provenance footnotes
    • <slug>.json — machine-readable graph
    • Default location: v3/docs/examples/dossiers/<slug>/
  10. Persist — store via mcp__claude-flow__memory_store, namespace dossier, key <slug>.
  11. End trajectory — call mcp__claude-flow__hooks_intelligence_trajectory-end with success status.
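The "lightweight regex + heuristics" classification of step 1 can be sketched as follows. The exact patterns and their precedence are assumptions for illustration, not the command's actual rules — handles and single-word concepts in particular can overlap, which is where the doc's "unless ambiguous" escape hatch would apply:

```python
import re

def detect_seed_type(seed: str) -> str:
    """Hypothetical seed-type classifier (step 1). Checks run from most to
    least specific; anything unmatched falls through to free-text concept."""
    if re.fullmatch(r"ADR-\d+", seed, re.IGNORECASE):
        return "adr"                       # ADR-NNN identifier
    if re.match(r"https?://", seed):
        return "url"
    if "/" in seed or re.search(r"\.\w{1,4}$", seed):
        return "file"                      # path-like: slash or short extension
    if re.fullmatch(r"[A-Za-z_]\w*(::|\.)[A-Za-z_]\w*", seed):
        return "symbol"                    # qualified identifier, e.g. Foo.bar
    if re.fullmatch(r"@?[A-Za-z0-9][A-Za-z0-9-]{0,38}", seed):
        return "username"                  # GitHub-style handle
    return "concept"                       # fallback: free text
```

A real implementation would likely also consult the filesystem (does the path exist?) before committing to file.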

Output schema (JSON)

```json
{
  "seed": "ruvnet",
  "seedType": "username",
  "depth": 2,
  "truncated": false,
  "generatedAt": "ISO-8601",
  "nodes": [
    { "id": "ruvnet", "type": "username", "attrs": { "...": "..." }, "sources": ["WebSearch", "github.com"] }
  ],
  "edges": [
    { "from": "ruvnet", "to": "ruflo", "kind": "owns", "source": "github.com", "confidence": "high" }
  ],
  "stats": { "nodesByType": {}, "sourcesUsed": [], "tokensSpent": 0 }
}
```
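A dossier in this shape can be sanity-checked before rendering. This minimal sketch assumes a few invariants that are plausible but not stated by the schema above — that edge endpoints must reference node ids, and that confidence is a low/medium/high enum:

```python
def validate_dossier(d: dict) -> list[str]:
    """Return a list of consistency errors; an empty list means well-formed.
    The checked invariants are assumptions about the schema, not a spec."""
    errors = []
    nodes = d.get("nodes", [])
    ids = {n["id"] for n in nodes}
    if len(ids) != len(nodes):
        errors.append("duplicate node ids")
    for e in d.get("edges", []):
        if e["from"] not in ids or e["to"] not in ids:
            errors.append(f"dangling edge {e['from']}->{e['to']}")
        if e.get("confidence") not in {"low", "medium", "high"}:
            errors.append(f"unknown confidence {e.get('confidence')!r}")
    if not isinstance(d.get("truncated"), bool):
        errors.append("truncated must be a boolean")
    return errors
```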

Budget discipline

  • If --budget-usd is set, track approximate cost via the trajectory. On exhaustion: emit a partial dossier with truncated: true and the entities still queued.
  • BFS expansion only — finish round k before round k+1.
  • Never silently truncate. Always mark and record what was skipped.
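Steps 4–7 together with these budget rules amount to breadth-first expansion with a hard stop. A minimal sketch, under stated assumptions: the `expand` and `cost_of` callbacks are hypothetical stand-ins for the per-source queries and the trajectory's cost tracking, and the embedding-based de-dup is reduced to an exact-match check:

```python
from collections import deque

def collect(seed, expand, max_depth=2, max_breadth=10,
            budget=None, cost_of=lambda e: 1):
    """Strict BFS over entities: a FIFO queue guarantees round k finishes
    before round k+1. On budget exhaustion, never silently truncate —
    return a partial result with truncated=True and what was still queued."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    nodes, spent = [], 0
    while frontier:
        entity, depth = frontier.popleft()
        if budget is not None and spent + cost_of(entity) > budget:
            return {"nodes": nodes, "truncated": True,
                    "queued": [e for e, _ in frontier] + [entity]}
        spent += cost_of(entity)
        nodes.append(entity)
        if depth >= max_depth:
            continue
        for nb in expand(entity)[:max_breadth]:   # cap fan-out per entity
            if nb not in seen:                    # exact de-dup only; the
                seen.add(nb)                      # cosine ≥ 0.92 check is omitted
                frontier.append((nb, depth + 1))
    return {"nodes": nodes, "truncated": False, "queued": []}
```

Because the queue is FIFO, a depth-k entity can never be expanded before every depth-(k-1) entity has been, which is exactly the "finish round k before round k+1" rule.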

Examples

/ruflo-goals:dossier-collect ruvnet
/ruflo-goals:dossier-collect ADR-097 --max-depth 1
/ruflo-goals:dossier-collect "src/memory/hnsw.ts" --sources codebase,git,memory
/ruflo-goals:dossier-collect "ruflo-goals" --max-breadth 5 --budget-usd 1