Deep Research Skill

Trigger

Activate this skill when the user wants to:
  • "Research a topic", "literature review", "find papers about", "survey papers on"
  • "Deep dive into [topic]", "what's the state of the art in [topic]"
  • Uses the /research <topic> slash command

Overview

This skill conducts systematic academic literature reviews in 6 phases, producing structured notes, a curated paper database, and a synthesized final report. Output is organized by phase for clarity.
Installation: ~/.claude/skills/deep-research/ — scripts, references, and this skill definition.
Output: /Users/lingzhi/Code/deep-research-output/{slug}/
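The {slug} placeholder is not defined in this document; a minimal sketch of how a topic string could be turned into a directory slug (the exact rule the skill uses is an assumption):

```python
import re

def topic_slug(topic: str) -> str:
    """Lowercase the topic and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")

# topic_slug("Long-Horizon Reasoning Agents") -> "long-horizon-reasoning-agents"
```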

CRITICAL: Strict Sequential Phase Execution

You MUST execute all 6 phases in strict order: 1 → 2 → 3 → 4 → 5 → 6. NEVER skip any phase.
This is the single most important rule of this skill. Violations include:
  • ❌ Jumping from Phase 2 to Phase 5/6 (skipping Deep Dive and Code)
  • ❌ Writing synthesis or report before completing Phase 3 deep reading
  • ❌ Producing a final report based only on abstracts/titles from search results
  • ❌ Combining or merging phases (e.g., doing "Phase 3-5 together")

Phase Gate Protocol

Before starting Phase N+1, you MUST verify that Phase N's required output files exist on disk. If they don't exist, you have NOT completed that phase.

Phase gate → required output files:
  • 1 → 2: phase1_frontier/frontier.md exists AND contains ≥10 papers
  • 2 → 3: phase2_survey/survey.md exists AND paper_db.jsonl has 35-80 papers
  • 3 → 4: phase3_deep_dive/selection.md AND phase3_deep_dive/deep_dive.md exist AND deep_dive.md contains detailed notes for ≥8 papers
  • 4 → 5: phase4_code/code_repos.md exists AND contains ≥3 repositories
  • 5 → 6: phase5_synthesis/synthesis.md AND phase5_synthesis/gaps.md exist

After completing each phase, print a phase completion checkpoint:
✅ Phase N complete. Output: [list files written]. Proceeding to Phase N+1.
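A gate check can be automated before advancing. A minimal sketch of the 2 → 3 gate, assuming paper_db.jsonl holds one JSON record per line:

```python
from pathlib import Path

def paper_count(jsonl_text: str) -> int:
    """Count non-empty JSONL records (one JSON object per line)."""
    return sum(1 for line in jsonl_text.splitlines() if line.strip())

def check_gate_2_to_3(topic_dir: str) -> bool:
    """Gate 2 -> 3: survey.md exists AND paper_db.jsonl has 35-80 papers."""
    root = Path(topic_dir)
    db = root / "paper_db.jsonl"
    if not (root / "phase2_survey" / "survey.md").exists() or not db.exists():
        return False
    return 35 <= paper_count(db.read_text()) <= 80
```

The other gates follow the same pattern with their own file lists and thresholds.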

Why Every Phase Matters

  • Phase 3 (Deep Dive) is where you actually READ papers — without it, your synthesis is superficial and based only on abstracts
  • Phase 4 (Code & Tools) grounds the research in practical implementations — without it, you miss the open-source ecosystem
  • Phase 5 (Synthesis) requires deep knowledge from Phase 3 — you cannot synthesize papers you haven't read
  • Phase 6 (Report) assembles content from ALL prior phases — it should cite specific findings from Phase 3 notes

Paper Quality Policy

Peer-reviewed conference papers take priority over arXiv preprints. Many arXiv papers have not undergone peer review and may contain unverified claims.

Source Priority (highest to lowest)

  1. Top AI conferences: NeurIPS, ICLR, ICML, ACL, EMNLP, NAACL, AAAI, IJCAI, CVPR, KDD, CoRL
  2. Peer-reviewed journals: JMLR, TACL, Nature, Science, etc.
  3. Workshop papers: NeurIPS/ICML workshops (lower bar but still reviewed)
  4. arXiv preprints with high citations: Likely high-quality but unverified
  5. Recent arXiv preprints: Use cautiously, note "preprint" status explicitly
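The priority list above can be expressed as a small scoring helper; the 100-citation cutoff separating tiers 4 and 5 is an illustrative assumption, not from this document:

```python
TOP_CONFERENCES = {"NeurIPS", "ICLR", "ICML", "ACL", "EMNLP", "NAACL",
                   "AAAI", "IJCAI", "CVPR", "KDD", "CoRL"}
JOURNALS = {"JMLR", "TACL", "Nature", "Science"}

def source_tier(venue: str, is_preprint: bool, citations: int = 0) -> int:
    """Map a paper to priority tier 1 (highest) through 5, per the list above."""
    if not is_preprint:
        if venue in TOP_CONFERENCES:
            return 1
        if venue in JOURNALS:
            return 2
        return 3  # workshops and other peer-reviewed venues
    # 100-citation cutoff between tiers 4 and 5 is an assumption
    return 4 if citations >= 100 else 5
```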

When to Use arXiv Papers

  • As supplementary evidence alongside peer-reviewed work
  • For very recent results (< 3 months old) not yet at conferences
  • When a peer-reviewed version doesn't exist yet — note (preprint) in citations
  • For survey/review papers (these are useful even without peer review)

Search Tools (by priority)

1. paper_finder (primary — conference papers only)

Location: /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py
Searches ai-paper-finder.info (HuggingFace Space) for published conference papers. Supports filtering by conference + year. Outputs JSONL with BibTeX.

```bash
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode scrape --config <config.yaml>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode download --jsonl <results.jsonl>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --list-venues
```

Config example:

```yaml
searches:
  - query: "long horizon reasoning agent"
    num_results: 100
    venues:
      neurips: [2024, 2025]
      iclr: [2024, 2025, 2026]
      icml: [2024, 2025]
output:
  root: /Users/lingzhi/Code/deep-research-output/{slug}/phase1_frontier/search_results
  overwrite: true
```

2. search_semantic_scholar.py (supplementary — citation data + broader coverage)

Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_semantic_scholar.py
Supports --peer-reviewed-only and --top-conferences filters. API key: /Users/lingzhi/Code/keys.md (field S2_API_Key)

3. search_arxiv.py (supplementary — latest preprints)

Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_arxiv.py
For searching recent papers not yet published at conferences. Mark citations with (preprint).

Other Scripts

All scripts live in ~/.claude/skills/deep-research/scripts/.
  • download_papers.py: --jsonl, --output-dir, --max-downloads, --sort-by-citations
  • extract_pdf.py: --pdf, --pdf-dir, --output-dir, --sections-only
  • paper_db.py: subcommands merge, search, filter, tag, stats, add, export
  • bibtex_manager.py: --jsonl, --output, --keys-only
  • compile_report.py: --topic-dir

WebFetch Mode (no Bash)

  1. Paper discovery: WebSearch + WebFetch to query Semantic Scholar/arXiv APIs
  2. Paper reading: WebFetch on ar5iv HTML or the Read tool on downloaded PDFs
  3. Writing: the Write tool for JSONL, notes, report files
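For step 1, the arXiv query URL can be built ahead of the WebFetch call; parameter names follow the public arXiv Atom API:

```python
from urllib.parse import urlencode

def arxiv_query_url(query: str, max_results: int = 10) -> str:
    """Build an arXiv API query URL (Atom XML response) for use with WebFetch."""
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest first, for frontier scans
        "sortOrder": "descending",
    }
    return "https://export.arxiv.org/api/query?" + urlencode(params)
```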

6-Phase Workflow

Phase 1: Frontier

Search the latest conference proceedings and preprints to understand current trends.
  1. Write phase1_frontier/paper_finder_config.yaml targeting the latest 1-2 years
  2. Run paper_finder scrape
  3. WebSearch for latest accepted paper lists
  4. Identify trending directions, key breakthroughs
→ Output: phase1_frontier/frontier.md, phase1_frontier/search_results/

Phase 2: Survey

Build a comprehensive landscape with a broader time range. Target 35-80 papers after filtering.
  1. Write phase2_survey/paper_finder_config.yaml covering 2023-2025
  2. Run paper_finder + Semantic Scholar + arXiv
  3. Merge all results: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py merge
  4. Filter to the 35-80 most relevant: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py filter --min-score 0.80 --max-papers 70
  5. Cluster by theme, write survey notes
→ Output: phase2_survey/survey.md, phase2_survey/search_results/, paper_db.jsonl
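The merge step in paper_db.py is not shown here; a sketch of the merge-and-dedupe it presumably performs, keyed by the paper-ID convention from Key Conventions (arxiv_id when available, else paperId, else title):

```python
import json

def merge_records(jsonl_streams):
    """Merge streams of JSONL lines, de-duplicating by paper ID:
    arxiv_id when available, else Semantic Scholar paperId, else title."""
    seen, merged = set(), []
    for stream in jsonl_streams:
        for line in stream:
            if not line.strip():
                continue
            rec = json.loads(line)
            key = (rec.get("arxiv_id") or rec.get("paperId")
                   or rec.get("title", "").lower())
            if key and key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged
```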

Phase 3: Deep Dive ⚠️ DO NOT SKIP

This phase is MANDATORY. You must actually READ 8-15 full papers, not just their abstracts.
  1. Select 8-15 papers from paper_db.jsonl with rationale → write phase3_deep_dive/selection.md
  2. Download PDFs: python download_papers.py --jsonl paper_db.jsonl --output-dir phase3_deep_dive/papers/ --sort-by-citations --max-downloads 15
  3. For EACH selected paper, read the full text (PDF via Read or HTML via WebFetch on ar5iv)
  4. Write detailed structured notes per paper (see note-format.md template): problem, contributions, methodology, experiments, limitations, connections
  5. Write ALL notes → phase3_deep_dive/deep_dive.md
Phase 3 Gate: deep_dive.md must contain detailed notes for ≥8 papers, each with methodology and experiment sections filled in. Abstract-only summaries do NOT count.
→ Output: phase3_deep_dive/selection.md, phase3_deep_dive/deep_dive.md, phase3_deep_dive/papers/

Phase 4: Code & Tools ⚠️ DO NOT SKIP

This phase is MANDATORY. You must survey the open-source ecosystem.
  1. Extract GitHub URLs from papers read in Phase 3
  2. WebSearch for implementations: "site:github.com {method name}", "site:paperswithcode.com {topic}"
  3. For each repo found: record URL, stars, language, last updated, documentation quality
  4. Search for related benchmarks and datasets
  5. Write → phase4_code/code_repos.md (must contain ≥3 repositories)
Phase 4 Gate: code_repos.md must exist and contain at least 3 repositories with metadata.
→ Output: phase4_code/code_repos.md
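For step 3, repo metadata can be looked up via the GitHub REST API (GET https://api.github.com/repos/{owner}/{repo}, fields stargazers_count, language, pushed_at). A sketch of extracting owner/repo from URLs found in papers:

```python
import re

def parse_github_url(url: str):
    """Extract (owner, repo) from a GitHub URL found in a paper, or None."""
    m = re.match(r"https?://github\.com/([^/]+)/([^/#?]+)", url)
    if not m:
        return None
    owner, repo = m.group(1), m.group(2)
    return owner, repo.removesuffix(".git")
```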

Phase 5: Synthesis (REQUIRES Phase 3 + 4 complete)

Cross-paper analysis. Weight peer-reviewed findings higher. This phase MUST build on the detailed notes from Phase 3 and the code landscape from Phase 4. Taxonomy, comparative tables, gap analysis.
Before starting: Verify phase3_deep_dive/deep_dive.md and phase4_code/code_repos.md exist. If not, go back and complete those phases first.
→ Output: phase5_synthesis/synthesis.md, phase5_synthesis/gaps.md

Phase 6: Compilation (REQUIRES Phase 1-5 complete)

Assemble the final report from ALL prior phase outputs. Mark preprint citations with a (preprint) suffix.
Before starting: Verify ALL phase outputs exist:
  • phase1_frontier/frontier.md
  • phase2_survey/survey.md
  • phase3_deep_dive/deep_dive.md
  • phase4_code/code_repos.md
  • phase5_synthesis/synthesis.md + gaps.md
If ANY are missing, go back and complete the missing phase(s) first.
→ Output: phase6_report/report.md, phase6_report/references.bib

Output Directory

output/{topic-slug}/
├── paper_db.jsonl                    # Master database (accumulated)
├── phase1_frontier/
│   ├── paper_finder_config.yaml
│   ├── search_results/
│   └── frontier.md
├── phase2_survey/
│   ├── paper_finder_config.yaml
│   ├── search_results/
│   └── survey.md
├── phase3_deep_dive/
│   ├── papers/
│   ├── selection.md
│   └── deep_dive.md
├── phase4_code/
│   └── code_repos.md
├── phase5_synthesis/
│   ├── synthesis.md
│   └── gaps.md
└── phase6_report/
    ├── report.md
    └── references.bib
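The tree above can be created up front so every phase has a place to write immediately. A minimal sketch; scaffold is a hypothetical helper, not one of the shipped scripts:

```python
from pathlib import Path

PHASE_DIRS = [
    "phase1_frontier/search_results",
    "phase2_survey/search_results",
    "phase3_deep_dive/papers",
    "phase4_code",
    "phase5_synthesis",
    "phase6_report",
]

def scaffold(output_root: str, slug: str) -> Path:
    """Create the per-topic output tree shown above; idempotent."""
    topic_dir = Path(output_root) / slug
    for d in PHASE_DIRS:
        (topic_dir / d).mkdir(parents=True, exist_ok=True)
    return topic_dir
```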

Key Conventions

  • Paper IDs: Use arxiv_id when available, otherwise Semantic Scholar paperId
  • Citations: [@key] format, key = firstAuthorYearWord (e.g., [@vaswani2017attention])
  • JSONL schema: title, authors, abstract, year, venue, venue_normalized, peer_reviewed, citationCount, paperId, arxiv_id, pdf_url, tags, source
  • Preprint marking: Always note (preprint) when citing non-peer-reviewed work
  • Incremental saves: Each phase writes to disk immediately
  • Paper count: Target 35-80 papers in the final paper_db.jsonl (use paper_db.py filter)
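A sketch of the firstAuthorYearWord key convention; the stop-word list used to pick the title word is an illustrative assumption:

```python
import re

def citation_key(first_author_last: str, year: int, title: str) -> str:
    """firstAuthorYearWord, e.g. ('Vaswani', 2017, 'Attention Is All You Need')
    -> 'vaswani2017attention'. Stop-word list is an illustrative assumption."""
    stop = {"a", "an", "the", "on", "of", "for", "in", "is", "are",
            "all", "you", "need", "to", "with"}
    words = [w for w in re.findall(r"[a-z]+", title.lower()) if w not in stop]
    return f"{first_author_last.lower()}{year}{words[0] if words else 'paper'}"
```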

References

  • /Users/lingzhi/.claude/skills/deep-research/references/workflow-phases.md — Detailed 6-phase methodology
  • /Users/lingzhi/.claude/skills/deep-research/references/note-format.md — Note templates, BibTeX format, report structure
  • /Users/lingzhi/.claude/skills/deep-research/references/api-reference.md — arXiv, Semantic Scholar, ar5iv API guide

Related Skills

  • Downstream: literature-search, literature-review, citation-management
  • See also: novelty-assessment, survey-generation