Deep Research Skill

Trigger

Activate this skill when the user wants to:
  • "Research a topic", "literature review", "find papers about", "survey papers on"
  • "Deep dive into [topic]", "what's the state of the art in [topic]"
  • Uses the /research <topic> slash command

Overview

This skill conducts systematic academic literature reviews in 6 phases, producing structured notes, a curated paper database, and a synthesized final report. Output is organized by phase for clarity.
Installation: ~/.claude/skills/deep-research/ — scripts, references, and this skill definition.
Output: /Users/lingzhi/Code/deep-research-output/{slug}/
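The {slug} placeholder is not defined in this document; a minimal sketch of how a topic string could be turned into a directory slug (the exact rule the skill uses is an assumption):

```python
import re

def topic_slug(topic: str) -> str:
    """Lowercase the topic and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")

# topic_slug("Long-Horizon Reasoning Agents") -> "long-horizon-reasoning-agents"
```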

CRITICAL: Strict Sequential Phase Execution

You MUST execute all 6 phases in strict order: 1 → 2 → 3 → 4 → 5 → 6. NEVER skip any phase.
This is the single most important rule of this skill. Violations include:
  • ❌ Jumping from Phase 2 to Phase 5/6 (skipping Deep Dive and Code)
  • ❌ Writing synthesis or report before completing Phase 3 deep reading
  • ❌ Producing a final report based only on abstracts/titles from search results
  • ❌ Combining or merging phases (e.g., doing "Phase 3-5 together")

Phase Gate Protocol

Before starting Phase N+1, you MUST verify that Phase N's required output files exist on disk. If they don't exist, you have NOT completed that phase.

Phase gate → required output files:
  • 1 → 2: phase1_frontier/frontier.md exists AND contains ≥10 papers
  • 2 → 3: phase2_survey/survey.md exists AND paper_db.jsonl has 35-80 papers
  • 3 → 4: phase3_deep_dive/selection.md AND phase3_deep_dive/deep_dive.md exist AND deep_dive.md contains detailed notes for ≥8 papers
  • 4 → 5: phase4_code/code_repos.md exists AND contains ≥3 repositories
  • 5 → 6: phase5_synthesis/synthesis.md AND phase5_synthesis/gaps.md exist

After completing each phase, print a phase completion checkpoint:
✅ Phase N complete. Output: [list files written]. Proceeding to Phase N+1.
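A gate check can be automated before advancing. A minimal sketch of the 2 → 3 gate, assuming paper_db.jsonl holds one JSON record per line:

```python
from pathlib import Path

def paper_count(jsonl_text: str) -> int:
    """Count non-empty JSONL records (one JSON object per line)."""
    return sum(1 for line in jsonl_text.splitlines() if line.strip())

def check_gate_2_to_3(topic_dir: str) -> bool:
    """Gate 2 -> 3: survey.md exists AND paper_db.jsonl has 35-80 papers."""
    root = Path(topic_dir)
    db = root / "paper_db.jsonl"
    if not (root / "phase2_survey" / "survey.md").exists() or not db.exists():
        return False
    return 35 <= paper_count(db.read_text()) <= 80
```

The other gates follow the same pattern with their own file lists and thresholds.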

Why Every Phase Matters

  • Phase 3 (Deep Dive) is where you actually READ papers — without it, your synthesis is superficial and based only on abstracts
  • Phase 4 (Code & Tools) grounds the research in practical implementations — without it, you miss the open-source ecosystem
  • Phase 5 (Synthesis) requires deep knowledge from Phase 3 — you cannot synthesize papers you haven't read
  • Phase 6 (Report) assembles content from ALL prior phases — it should cite specific findings from Phase 3 notes

Paper Quality Policy

Peer-reviewed conference papers take priority over arXiv preprints. Many arXiv papers have not undergone peer review and may contain unverified claims.

Source Priority (highest to lowest)

  1. Top AI conferences: NeurIPS, ICLR, ICML, ACL, EMNLP, NAACL, AAAI, IJCAI, CVPR, KDD, CoRL
  2. Peer-reviewed journals: JMLR, TACL, Nature, Science, etc.
  3. Workshop papers: NeurIPS/ICML workshops (lower bar but still reviewed)
  4. arXiv preprints with high citations: Likely high-quality but unverified
  5. Recent arXiv preprints: Use cautiously, note "preprint" status explicitly
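The priority list above can be expressed as a small scoring helper; the 100-citation cutoff separating tiers 4 and 5 is an illustrative assumption, not from this document:

```python
TOP_CONFERENCES = {"NeurIPS", "ICLR", "ICML", "ACL", "EMNLP", "NAACL",
                   "AAAI", "IJCAI", "CVPR", "KDD", "CoRL"}
JOURNALS = {"JMLR", "TACL", "Nature", "Science"}

def source_tier(venue: str, is_preprint: bool, citations: int = 0) -> int:
    """Map a paper to priority tier 1 (highest) through 5, per the list above."""
    if not is_preprint:
        if venue in TOP_CONFERENCES:
            return 1
        if venue in JOURNALS:
            return 2
        return 3  # workshops and other peer-reviewed venues
    # 100-citation cutoff between tiers 4 and 5 is an assumption
    return 4 if citations >= 100 else 5
```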

When to Use arXiv Papers

  • As supplementary evidence alongside peer-reviewed work
  • For very recent results (< 3 months old) not yet at conferences
  • When a peer-reviewed version doesn't exist yet — note (preprint) in citations
  • For survey/review papers (these are useful even without peer review)

Search Tools (by priority)

1. paper_finder (primary — conference papers only)

Location: /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py
Searches ai-paper-finder.info (HuggingFace Space) for published conference papers. Supports filtering by conference + year. Outputs JSONL with BibTeX.

```bash
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode scrape --config <config.yaml>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode download --jsonl <results.jsonl>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --list-venues
```

Config example:

```yaml
searches:
  - query: "long horizon reasoning agent"
    num_results: 100
    venues:
      neurips: [2024, 2025]
      iclr: [2024, 2025, 2026]
      icml: [2024, 2025]
output:
  root: /Users/lingzhi/Code/deep-research-output/{slug}/phase1_frontier/search_results
  overwrite: true
```

2. search_semantic_scholar.py (supplementary — citation data + broader coverage)

Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_semantic_scholar.py
Supports --peer-reviewed-only and --top-conferences filters. API key: /Users/lingzhi/Code/keys.md (field S2_API_Key)

3. search_arxiv.py (supplementary — latest preprints)

Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_arxiv.py
For searching recent papers not yet published at conferences. Mark citations with (preprint).

Other Scripts

All scripts live in ~/.claude/skills/deep-research/scripts/.
  • download_papers.py: --jsonl, --output-dir, --max-downloads, --sort-by-citations
  • extract_pdf.py: --pdf, --pdf-dir, --output-dir, --sections-only
  • paper_db.py: subcommands merge, search, filter, tag, stats, add, export
  • bibtex_manager.py: --jsonl, --output, --keys-only
  • compile_report.py: --topic-dir

WebFetch Mode (no Bash)

  1. Paper discovery: WebSearch + WebFetch to query Semantic Scholar/arXiv APIs
  2. Paper reading: WebFetch on ar5iv HTML or the Read tool on downloaded PDFs
  3. Writing: the Write tool for JSONL, notes, report files
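For step 1, the arXiv query URL can be built ahead of the WebFetch call; parameter names follow the public arXiv Atom API:

```python
from urllib.parse import urlencode

def arxiv_query_url(query: str, max_results: int = 10) -> str:
    """Build an arXiv API query URL (Atom XML response) for use with WebFetch."""
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest first, for frontier scans
        "sortOrder": "descending",
    }
    return "https://export.arxiv.org/api/query?" + urlencode(params)
```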

6-Phase Workflow

Phase 1: Frontier

Search the latest conference proceedings and preprints to understand current trends.
  1. Write phase1_frontier/paper_finder_config.yaml targeting the latest 1-2 years
  2. Run paper_finder scrape
  3. WebSearch for latest accepted paper lists
  4. Identify trending directions, key breakthroughs
→ Output: phase1_frontier/frontier.md, phase1_frontier/search_results/

Phase 2: Survey

Build a comprehensive landscape with a broader time range. Target 35-80 papers after filtering.
  1. Write phase2_survey/paper_finder_config.yaml covering 2023-2025
  2. Run paper_finder + Semantic Scholar + arXiv
  3. Merge all results: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py merge
  4. Filter to the 35-80 most relevant: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py filter --min-score 0.80 --max-papers 70
  5. Cluster by theme, write survey notes
→ Output: phase2_survey/survey.md, phase2_survey/search_results/, paper_db.jsonl
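The merge step in paper_db.py is not shown here; a sketch of the merge-and-dedupe it presumably performs, keyed by the paper-ID convention from Key Conventions (arxiv_id when available, else paperId, else title):

```python
import json

def merge_records(jsonl_streams):
    """Merge streams of JSONL lines, de-duplicating by paper ID:
    arxiv_id when available, else Semantic Scholar paperId, else title."""
    seen, merged = set(), []
    for stream in jsonl_streams:
        for line in stream:
            if not line.strip():
                continue
            rec = json.loads(line)
            key = (rec.get("arxiv_id") or rec.get("paperId")
                   or rec.get("title", "").lower())
            if key and key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged
```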

Phase 3: Deep Dive ⚠️ DO NOT SKIP

This phase is MANDATORY. You must actually READ 8-15 full papers, not just their abstracts.
  1. Select 8-15 papers from paper_db.jsonl with rationale → write phase3_deep_dive/selection.md
  2. Download PDFs: python download_papers.py --jsonl paper_db.jsonl --output-dir phase3_deep_dive/papers/ --sort-by-citations --max-downloads 15
  3. For EACH selected paper, read the full text (PDF via Read or HTML via WebFetch on ar5iv)
  4. Write detailed structured notes per paper (see note-format.md template): problem, contributions, methodology, experiments, limitations, connections
  5. Write ALL notes → phase3_deep_dive/deep_dive.md
Phase 3 Gate: deep_dive.md must contain detailed notes for ≥8 papers, each with methodology and experiment sections filled in. Abstract-only summaries do NOT count.
→ Output: phase3_deep_dive/selection.md, phase3_deep_dive/deep_dive.md, phase3_deep_dive/papers/

Phase 4: Code & Tools ⚠️ DO NOT SKIP

This phase is MANDATORY. You must survey the open-source ecosystem.
  1. Extract GitHub URLs from papers read in Phase 3
  2. WebSearch for implementations: "site:github.com {method name}", "site:paperswithcode.com {topic}"
  3. For each repo found: record URL, stars, language, last updated, documentation quality
  4. Search for related benchmarks and datasets
  5. Write → phase4_code/code_repos.md (must contain ≥3 repositories)
Phase 4 Gate: code_repos.md must exist and contain at least 3 repositories with metadata.
→ Output: phase4_code/code_repos.md
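For step 3, repo metadata can be looked up via the GitHub REST API (GET https://api.github.com/repos/{owner}/{repo}, fields stargazers_count, language, pushed_at). A sketch of extracting owner/repo from URLs found in papers:

```python
import re

def parse_github_url(url: str):
    """Extract (owner, repo) from a GitHub URL found in a paper, or None."""
    m = re.match(r"https?://github\.com/([^/]+)/([^/#?]+)", url)
    if not m:
        return None
    owner, repo = m.group(1), m.group(2)
    return owner, repo.removesuffix(".git")
```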

Phase 5: Synthesis (REQUIRES Phase 3 + 4 complete)

Cross-paper analysis. Weight peer-reviewed findings higher. This phase MUST build on the detailed notes from Phase 3 and the code landscape from Phase 4. Taxonomy, comparative tables, gap analysis.
Before starting: Verify phase3_deep_dive/deep_dive.md and phase4_code/code_repos.md exist. If not, go back and complete those phases first.
→ Output: phase5_synthesis/synthesis.md, phase5_synthesis/gaps.md

Phase 6: Compilation (REQUIRES Phase 1-5 complete)

Assemble the final report from ALL prior phase outputs. Mark preprint citations with a (preprint) suffix.
Before starting: Verify ALL phase outputs exist:
  • phase1_frontier/frontier.md
  • phase2_survey/survey.md
  • phase3_deep_dive/deep_dive.md
  • phase4_code/code_repos.md
  • phase5_synthesis/synthesis.md + gaps.md
If ANY are missing, go back and complete the missing phase(s) first.
→ Output: phase6_report/report.md, phase6_report/references.bib

Output Directory

output/{topic-slug}/
├── paper_db.jsonl                    # Master database (accumulated)
├── phase1_frontier/
│   ├── paper_finder_config.yaml
│   ├── search_results/
│   └── frontier.md
├── phase2_survey/
│   ├── paper_finder_config.yaml
│   ├── search_results/
│   └── survey.md
├── phase3_deep_dive/
│   ├── papers/
│   ├── selection.md
│   └── deep_dive.md
├── phase4_code/
│   └── code_repos.md
├── phase5_synthesis/
│   ├── synthesis.md
│   └── gaps.md
└── phase6_report/
    ├── report.md
    └── references.bib
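The tree above can be created up front so every phase has a place to write immediately. A minimal sketch; scaffold is a hypothetical helper, not one of the shipped scripts:

```python
from pathlib import Path

PHASE_DIRS = [
    "phase1_frontier/search_results",
    "phase2_survey/search_results",
    "phase3_deep_dive/papers",
    "phase4_code",
    "phase5_synthesis",
    "phase6_report",
]

def scaffold(output_root: str, slug: str) -> Path:
    """Create the per-topic output tree shown above; idempotent."""
    topic_dir = Path(output_root) / slug
    for d in PHASE_DIRS:
        (topic_dir / d).mkdir(parents=True, exist_ok=True)
    return topic_dir
```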

Key Conventions

  • Paper IDs: Use arxiv_id when available, otherwise Semantic Scholar paperId
  • Citations: [@key] format, key = firstAuthorYearWord (e.g., [@vaswani2017attention])
  • JSONL schema: title, authors, abstract, year, venue, venue_normalized, peer_reviewed, citationCount, paperId, arxiv_id, pdf_url, tags, source
  • Preprint marking: Always note (preprint) when citing non-peer-reviewed work
  • Incremental saves: Each phase writes to disk immediately
  • Paper count: Target 35-80 papers in the final paper_db.jsonl (use paper_db.py filter)
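A sketch of the firstAuthorYearWord key convention; the stop-word list used to pick the title word is an illustrative assumption:

```python
import re

def citation_key(first_author_last: str, year: int, title: str) -> str:
    """firstAuthorYearWord, e.g. ('Vaswani', 2017, 'Attention Is All You Need')
    -> 'vaswani2017attention'. Stop-word list is an illustrative assumption."""
    stop = {"a", "an", "the", "on", "of", "for", "in", "is", "are",
            "all", "you", "need", "to", "with"}
    words = [w for w in re.findall(r"[a-z]+", title.lower()) if w not in stop]
    return f"{first_author_last.lower()}{year}{words[0] if words else 'paper'}"
```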

References

  • /Users/lingzhi/.claude/skills/deep-research/references/workflow-phases.md — Detailed 6-phase methodology
  • /Users/lingzhi/.claude/skills/deep-research/references/note-format.md — Note templates, BibTeX format, report structure
  • /Users/lingzhi/.claude/skills/deep-research/references/api-reference.md — arXiv, Semantic Scholar, ar5iv API guide

Related Skills

  • Downstream: literature-search, literature-review, citation-management
  • See also: novelty-assessment, survey-generation