Dossier Collect

Recursive parallel investigation that builds a graph-structured dossier on a seed entity.

When to use

You have a seed (a username, file, symbol, ADR-id, URL, or concept) and want to expand outward, discovering every connected entity with provenance per claim — rather than answering a specific question. For specific questions, use deep-research. For multi-step plans, use goal-plan.

Steps

  1. Detect seed type — classify as one of: username (handle), file (path), symbol (code identifier), adr (ADR-NNN), url, or concept (free text).
  2. Pick sources — match the source matrix to the seed type. Default: all applicable.
  3. Start trajectory — call mcp__claude-flow__hooks_intelligence_trajectory-start with task dossier:<slug>.
  4. Round 0 fan-out — issue ALL source queries in ONE message. Examples:
    • For username: WebSearch, WebFetch on github.com/<user>, mcp__claude-flow__memory_search_unified
    • For adr: Read the ADR file, Grep references, mcp__claude-flow__memory_search namespace adr
    • For symbol: Grep, Glob, mcp__claude-flow__embeddings_search
  5. Extract entities — from each hit, surface entities (people, repos, files, adrs, urls, terms). Lightweight regex + heuristics; no LLM extraction unless ambiguous.
  6. De-dup — drop entities already in the dossier. If --exact is unset, also drop entities whose embedding cosine similarity to an existing node is ≥ 0.92.
  7. Round k recursion — for each new entity (capped at --max-breadth per source), recurse to step 4 until depth ≥ --max-depth OR the budget is exhausted.
  8. Aggregate — build a { nodes, edges } graph. Each node carries { id, type, attrs, sources: [...] }. Each edge carries { from, to, kind, source, confidence }.
  9. Render artifacts:
    • <slug>.md — executive summary, entity table, Mermaid graph, source-provenance footnotes
    • <slug>.json — machine-readable graph
    • Default location: v3/docs/examples/dossiers/<slug>/
  10. Persist — store via mcp__claude-flow__memory_store, namespace dossier, key <slug>.
  11. End trajectory — call mcp__claude-flow__hooks_intelligence_trajectory-end with success status.
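The "lightweight regex + heuristics" classification of step 1 can be sketched as follows. The exact patterns and their precedence are assumptions for illustration, not the command's actual rules — handles and single-word concepts in particular can overlap, which is where the doc's "unless ambiguous" escape hatch would apply:

```python
import re

def detect_seed_type(seed: str) -> str:
    """Hypothetical seed-type classifier (step 1). Checks run from most to
    least specific; anything unmatched falls through to free-text concept."""
    if re.fullmatch(r"ADR-\d+", seed, re.IGNORECASE):
        return "adr"                       # ADR-NNN identifier
    if re.match(r"https?://", seed):
        return "url"
    if "/" in seed or re.search(r"\.\w{1,4}$", seed):
        return "file"                      # path-like: slash or short extension
    if re.fullmatch(r"[A-Za-z_]\w*(::|\.)[A-Za-z_]\w*", seed):
        return "symbol"                    # qualified identifier, e.g. Foo.bar
    if re.fullmatch(r"@?[A-Za-z0-9][A-Za-z0-9-]{0,38}", seed):
        return "username"                  # GitHub-style handle
    return "concept"                       # fallback: free text
```

A real implementation would likely also consult the filesystem (does the path exist?) before committing to file.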

Output schema (JSON)

```json
{
  "seed": "ruvnet",
  "seedType": "username",
  "depth": 2,
  "truncated": false,
  "generatedAt": "ISO-8601",
  "nodes": [
    { "id": "ruvnet", "type": "username", "attrs": { "...": "..." }, "sources": ["WebSearch", "github.com"] }
  ],
  "edges": [
    { "from": "ruvnet", "to": "ruflo", "kind": "owns", "source": "github.com", "confidence": "high" }
  ],
  "stats": { "nodesByType": {}, "sourcesUsed": [], "tokensSpent": 0 }
}
```
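A dossier in this shape can be sanity-checked before rendering. This minimal sketch assumes a few invariants that are plausible but not stated by the schema above — that edge endpoints must reference node ids, and that confidence is a low/medium/high enum:

```python
def validate_dossier(d: dict) -> list[str]:
    """Return a list of consistency errors; an empty list means well-formed.
    The checked invariants are assumptions about the schema, not a spec."""
    errors = []
    nodes = d.get("nodes", [])
    ids = {n["id"] for n in nodes}
    if len(ids) != len(nodes):
        errors.append("duplicate node ids")
    for e in d.get("edges", []):
        if e["from"] not in ids or e["to"] not in ids:
            errors.append(f"dangling edge {e['from']}->{e['to']}")
        if e.get("confidence") not in {"low", "medium", "high"}:
            errors.append(f"unknown confidence {e.get('confidence')!r}")
    if not isinstance(d.get("truncated"), bool):
        errors.append("truncated must be a boolean")
    return errors
```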

Budget discipline

  • If --budget-usd is set, track approximate cost via the trajectory. On exhaustion: emit a partial dossier with truncated: true and the entities still queued.
  • BFS expansion only — finish round k before round k+1.
  • Never silently truncate. Always mark and record what was skipped.
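Steps 4–7 together with these budget rules amount to breadth-first expansion with a hard stop. A minimal sketch, under stated assumptions: the `expand` and `cost_of` callbacks are hypothetical stand-ins for the per-source queries and the trajectory's cost tracking, and the embedding-based de-dup is reduced to an exact-match check:

```python
from collections import deque

def collect(seed, expand, max_depth=2, max_breadth=10,
            budget=None, cost_of=lambda e: 1):
    """Strict BFS over entities: a FIFO queue guarantees round k finishes
    before round k+1. On budget exhaustion, never silently truncate —
    return a partial result with truncated=True and what was still queued."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    nodes, spent = [], 0
    while frontier:
        entity, depth = frontier.popleft()
        if budget is not None and spent + cost_of(entity) > budget:
            return {"nodes": nodes, "truncated": True,
                    "queued": [e for e, _ in frontier] + [entity]}
        spent += cost_of(entity)
        nodes.append(entity)
        if depth >= max_depth:
            continue
        for nb in expand(entity)[:max_breadth]:   # cap fan-out per entity
            if nb not in seen:                    # exact de-dup only; the
                seen.add(nb)                      # cosine ≥ 0.92 check is omitted
                frontier.append((nb, depth + 1))
    return {"nodes": nodes, "truncated": False, "queued": []}
```

Because the queue is FIFO, a depth-k entity can never be expanded before every depth-(k-1) entity has been, which is exactly the "finish round k before round k+1" rule.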

Examples

/ruflo-goals:dossier-collect ruvnet
/ruflo-goals:dossier-collect ADR-097 --max-depth 1
/ruflo-goals:dossier-collect "src/memory/hnsw.ts" --sources codebase,git,memory
/ruflo-goals:dossier-collect "ruflo-goals" --max-breadth 5 --budget-usd 1