literature-review

Literature Review

Systematic discovery, extraction, and synthesis of academic research on a defined topic.

When to use this skill vs. others

| Need | Skill |
| --- | --- |
| Survey a research area, synthesize multiple papers | literature-review (this skill) |
| Critique a single paper's methodology and claims | research-critique |
| Audit a manuscript's formatting, structure, citations | manuscript-review |
| Verify a manuscript's numbers trace to code | manuscript-provenance |
| Search arXiv for papers matching a query | arxiv-search (utility) |

Workflow

Phase 1: Scope Definition

Before searching, establish the review boundaries:
  1. Research question — What specific question does the review answer? Vague topics produce vague reviews. "What techniques exist for X" is weaker than "How do methods for X compare on metric Y across domains Z?"
  2. Inclusion criteria — Define what counts:
    • Date range (e.g., 2020–present)
    • Publication type (peer-reviewed, preprints, both)
    • Domains/categories (e.g., cs.CL, cs.AI)
    • Minimum relevance threshold
  3. Exclusion criteria — Define what does not count:
    • Tangentially related work
    • Non-primary sources (blog posts, tutorials) unless explicitly included
    • Duplicate or superseded versions
  4. Expected output — What form should the review take? Narrative synthesis, tabular comparison, gap analysis, annotated bibliography, or related-work section?
Present the scope to the user for confirmation before proceeding.
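One way to make the scope concrete before asking for confirmation is to hold it in a small structure and render it as text. The sketch below is illustrative only: `ReviewScope`, its field names, and the sample values are assumptions, not part of this skill.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewScope:
    """Hypothetical container for the Phase 1 scope (names are illustrative)."""

    research_question: str
    year_from: int
    publication_types: list[str]
    categories: list[str]
    exclusions: list[str] = field(default_factory=list)
    output_format: str = "narrative synthesis"

    def summary(self) -> str:
        """Render the scope as plain text to show the user for confirmation."""
        return "\n".join(
            [
                f"Research question: {self.research_question}",
                f"Date range: {self.year_from}-present",
                f"Publication types: {', '.join(self.publication_types)}",
                f"Categories: {', '.join(self.categories)}",
                f"Exclusions: {', '.join(self.exclusions) or 'none'}",
                f"Expected output: {self.output_format}",
            ]
        )


# Sample values are placeholders taken from the examples in the list above.
scope = ReviewScope(
    research_question="How do methods for X compare on metric Y across domains Z?",
    year_from=2020,
    publication_types=["peer-reviewed", "preprints"],
    categories=["cs.CL", "cs.AI"],
    exclusions=["blog posts", "tutorials", "superseded versions"],
)
print(scope.summary())
```

Keeping the scope in one object also gives later phases a single place to read the inclusion and exclusion criteria from.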

Phase 2: Search & Discovery

Execute searches across available sources. Use multiple queries with varying specificity to avoid single-query blind spots.
Primary source: arXiv (via arxiv-search utility)
bash
uv run --with arxiv python scripts/arxiv_search.py "QUERY" --max-results 30 --sort-by relevance
Vary queries systematically:
  • Broad topic query: "retrieval augmented generation"
  • Field-scoped query: ti:retrieval AND abs:generation AND cat:cs.CL
  • Author-anchored query (when key authors are known): au:lewis AND abs:retrieval
  • Recency query: same terms with --sort-by submitted
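The query variants can be generated rather than typed by hand. The following is a minimal sketch that only assembles command lines for the script invocation shown above. `build_search_commands` and its parameters are hypothetical names, and it assumes the script accepts exactly the flags used in that invocation.

```python
import shlex


def build_search_commands(topic, title_term, abstract_term, category, author=None):
    """Return one arxiv_search.py command (as an argument list) per query variant."""
    variants = [
        (topic, "relevance"),  # broad topic query
        (f"ti:{title_term} AND abs:{abstract_term} AND cat:{category}", "relevance"),
    ]
    if author:  # author-anchored query, only when key authors are known
        variants.append((f"au:{author} AND abs:{title_term}", "relevance"))
    variants.append((topic, "submitted"))  # recency query: same terms, different sort

    base = ["uv", "run", "--with", "arxiv", "python", "scripts/arxiv_search.py"]
    return [
        base + [query, "--max-results", "30", "--sort-by", sort_by]
        for query, sort_by in variants
    ]


commands = build_search_commands(
    "retrieval augmented generation", "retrieval", "generation", "cs.CL", author="lewis"
)
for command in commands:
    print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

Returning argument lists instead of strings avoids quoting mistakes if the commands are later passed to `subprocess.run`.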
Secondary sources (via web search/fetch):
  • Semantic Scholar API: https://api.semanticscholar.org/graph/v1/paper/search?query=QUERY&limit=20&fields=title,authors,abstract,year,citationCount,externalIds
  • Google Scholar (via web search): site:scholar.google.com QUERY
  • Connected Papers (for citation graph exploration): https://www.connectedpapers.com/search?q=QUERY
Snowball strategy:
  • Forward snowball: find papers that cite a key paper (Semantic Scholar citations endpoint)
  • Backward snowball: follow the references of key papers
  • Use citation count as a signal for influence, not quality
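The Semantic Scholar URLs for keyword search and snowballing can be built with the standard library so that queries are URL-encoded correctly. This sketch only constructs URLs and makes no network calls. The `ARXIV:` ID prefix, the `/citations` and `/references` paths, and the field list on those endpoints are assumptions to check against the current API documentation; `search_url` and `snowball_url` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.semanticscholar.org/graph/v1/paper"
SEARCH_FIELDS = "title,authors,abstract,year,citationCount,externalIds"


def search_url(query, limit=20):
    """Build the keyword search URL, with the query URL-encoded."""
    params = urlencode(
        {"query": query, "limit": limit, "fields": SEARCH_FIELDS}, safe=","
    )
    return f"{API_ROOT}/search?{params}"


def snowball_url(arxiv_id, direction, limit=100):
    """Build a forward (citations) or backward (references) URL for one paper."""
    endpoints = {"forward": "citations", "backward": "references"}
    paper_id = quote(f"ARXIV:{arxiv_id}", safe=":")  # assumed ID format
    params = urlencode({"fields": "title,year,citationCount", "limit": limit}, safe=",")
    return f"{API_ROOT}/{paper_id}/{endpoints[direction]}?{params}"


print(search_url("retrieval augmented generation"))
print(snowball_url("2301.07041", "forward"))
print(snowball_url("2301.07041", "backward"))
```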

Phase 3: Screening & Filtering

For each discovered paper, apply the inclusion/exclusion criteria from Phase 1.
Produce a screening table:
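| # | ID | Title | Authors | Year | Relevant? | Reason |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2301.07041 | Paper Title | Author et al. | 2023 | Yes | Directly addresses RQ |
| 2 | 2302.12345 | Other Paper | Author B | 2023 | No | Tangential: focuses on X, not Y |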
Rules:
  • Screen on title + abstract first. Read full paper only for borderline cases.
  • When uncertain, include. It is cheaper to drop a paper later than to miss it.
  • Track exclusion reasons — they inform the review's limitations section.
  • Flag papers that appear in multiple search queries as likely high-relevance.
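The screening rules above can be sketched in a few lines. This is a rough illustration, not the skill's implementation: keyword matching stands in for a real relevance judgment, and `multi_query_hits`, `screen`, and the sample records are hypothetical.

```python
from collections import Counter


def multi_query_hits(results_by_query):
    """Return IDs of papers returned by two or more distinct queries."""
    counts = Counter()
    for paper_ids in results_by_query.values():
        counts.update(set(paper_ids))  # count each paper once per query
    return {paper_id for paper_id, n in counts.items() if n >= 2}


def screen(paper, year_from, key_terms):
    """Return (include, reason) from title + abstract; borderline papers are kept."""
    if paper["year"] < year_from:
        return False, f"Published before {year_from}"
    text = f"{paper['title']} {paper['abstract']}".lower()
    matches = sum(term.lower() in text for term in key_terms)
    if matches == 0:
        return False, "Tangential: no key term in title or abstract"
    if matches < len(key_terms):
        # When uncertain, include: dropping later is cheaper than missing a paper.
        return True, "Borderline: partial match, read full paper"
    return True, "Directly addresses RQ"


# Placeholder data reusing the IDs from the screening table above.
results_by_query = {
    "broad": ["2301.07041", "2302.12345"],
    "field-scoped": ["2301.07041"],
}
paper = {
    "id": "2301.07041",
    "title": "Paper Title on retrieval",
    "abstract": "We study generation.",
    "year": 2023,
}
print(multi_query_hits(results_by_query))
print(screen(paper, 2020, ["retrieval", "generation"]))
```

Returning the reason alongside the decision keeps the exclusion log that the limitations section later depends on.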

Phase 4: Data Extraction

For each included paper, extract a structured record:
yaml
- id: "2301.07041"
  title: "Paper Title"
  authors: ["Author One", "Author Two"]
  year: 2023
  venue: "NeurIPS 2023"
  research_question: "How does X affect Y?"
  methodology: "Controlled experiment with N=1000"
  key_findings:
    - "Finding 1 with quantitative result"
    - "Finding 2 with effect size"
  limitations: "Single-domain evaluation"
  relevance_to_rq: "Directly compares methods A and B on metric Y"
  citation_count: 142
Extraction discipline:
  • Record what the paper demonstrates, not what it claims to demonstrate.
  • Distinguish empirical findings (data-backed) from interpretive claims (author's framing).
  • Note methodology details that enable cross-paper comparison (datasets, metrics, baselines).
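  • If the paper is available via pdf_url, read the full text for extraction. Do not extract from abstracts alone for included papers, except in the situations listed under Edge Cases (no full-text access, or a "quick" review); in those cases, flag the record as abstract-only.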
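A completeness check on each record catches gaps before synthesis. The sketch below mirrors the field names of the YAML record above as a plain Python dict, which avoids a YAML dependency. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and treating every field as required is an assumption.

```python
REQUIRED_FIELDS = (
    "id", "title", "authors", "year", "venue", "research_question",
    "methodology", "key_findings", "limitations", "relevance_to_rq",
    "citation_count",
)


def missing_fields(record):
    """List required fields that are absent or empty in an extraction record."""
    # A citation_count of 0 is a valid value, so only None, "" and [] count as empty.
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "", [])]


# Same placeholder values as the YAML record above.
record = {
    "id": "2301.07041",
    "title": "Paper Title",
    "authors": ["Author One", "Author Two"],
    "year": 2023,
    "venue": "NeurIPS 2023",
    "research_question": "How does X affect Y?",
    "methodology": "Controlled experiment with N=1000",
    "key_findings": ["Finding 1 with quantitative result", "Finding 2 with effect size"],
    "limitations": "Single-domain evaluation",
    "relevance_to_rq": "Directly compares methods A and B on metric Y",
    "citation_count": 142,
}
incomplete = dict(record, methodology="", key_findings=[])

print(missing_fields(record))
print(missing_fields(incomplete))
```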

Phase 5: Synthesis

Transform extracted records into structured analysis. The synthesis method depends on the output format requested in Phase 1.
Thematic synthesis — Group papers by theme, approach, or finding:
  • Identify recurring themes across papers
  • Note where papers agree, disagree, or address different aspects
  • Highlight methodological trends (what approaches are gaining/losing traction)
Comparative synthesis — Build comparison tables:
| Method | Paper(s) | Dataset | Metric | Result | Limitations |
| --- | --- | --- | --- | --- | --- |
| Method A | [1], [3] | D1 | F1 | 0.85 | Domain-specific |
| Method B | [2], [4] | D1, D2 | F1 | 0.82 | Requires X |
Chronological synthesis — Map the evolution of the field:
  • What was the state of knowledge at time T?
  • What shifted and why?
  • Where is the field heading?
Gap analysis — Identify what is missing:
  • Questions raised but not answered by existing work
  • Methodological gaps (no one has tried approach X on problem Y)
  • Domain gaps (studied in domain A but not B)
  • Contradictions between studies that remain unresolved
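For comparative synthesis, per-paper results can be grouped into table rows mechanically. This is a minimal sketch under assumed input: each item is a hypothetical dict with `method`, `ref`, `dataset`, `metric`, and `score` keys, and the sample values are the placeholders from the comparison table above. When papers report different scores for the same method and metric, the sketch prints a range instead of an average so the disagreement stays visible.

```python
from collections import defaultdict


def comparison_rows(results):
    """Group per-paper results by (method, metric) into comparison-table rows."""
    grouped = defaultdict(list)
    for item in results:
        grouped[(item["method"], item["metric"])].append(item)

    rows = []
    for (method, metric), items in sorted(grouped.items()):
        refs = ", ".join(f"[{ref}]" for ref in sorted(i["ref"] for i in items))
        datasets = ", ".join(sorted({i["dataset"] for i in items}))
        scores = [i["score"] for i in items]
        low, high = min(scores), max(scores)
        # Report a range when papers disagree instead of averaging it away.
        result = f"{low:.2f}" if low == high else f"{low:.2f}-{high:.2f}"
        rows.append((method, refs, datasets, metric, result))
    return rows


# Placeholder values matching the comparison table above.
results = [
    {"method": "Method A", "ref": 1, "dataset": "D1", "metric": "F1", "score": 0.85},
    {"method": "Method B", "ref": 2, "dataset": "D1", "metric": "F1", "score": 0.82},
    {"method": "Method A", "ref": 3, "dataset": "D1", "metric": "F1", "score": 0.85},
    {"method": "Method B", "ref": 4, "dataset": "D2", "metric": "F1", "score": 0.82},
]
for row in comparison_rows(results):
    print(" | ".join(row))
```

Scores measured on different datasets are not directly comparable, so a row that spans several datasets should be read as a summary, not a ranking.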

Phase 6: Output

Produce the review document in the format specified in Phase 1.
Standard structure for a narrative review:
  1. Introduction — Research question, scope, and motivation for the review
  2. Search methodology — Databases searched, queries used, inclusion/exclusion criteria, screening results (N found → N screened → N included)
  3. Findings — Thematic or chronological synthesis of included papers
  4. Discussion — Cross-cutting analysis, trends, contradictions, gaps
  5. Limitations of this review — Search scope restrictions, potential biases, papers not accessible
  6. References — Full citation list for all included papers
Standard structure for a tabular review:
  1. Summary table (all included papers with key metadata)
  2. Comparison matrix (methods × metrics × results)
  3. Gap analysis table (questions × coverage)
  4. Reference list
Citation format: Default to Author et al. (Year) in-text with full references at the end. Adapt to the user's specified format (APA, Chicago, IEEE) if requested.
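The default in-text format and the screening-flow line from the search methodology section can both be generated from the extraction records. In this sketch, `in_text_citation` and `screening_flow` are hypothetical helpers. Taking the last word of each name as the surname and spelling out one or two authors in full are assumptions, and the counts passed to `screening_flow` are made-up illustration values.

```python
def in_text_citation(authors, year):
    """Format an in-text citation in the default 'Author et al. (Year)' style."""
    # Naive surname extraction: the last whitespace-separated token of each name.
    surnames = [name.split()[-1] for name in authors]
    if len(surnames) == 1:
        return f"{surnames[0]} ({year})"
    if len(surnames) == 2:
        return f"{surnames[0]} and {surnames[1]} ({year})"
    return f"{surnames[0]} et al. ({year})"


def screening_flow(found, screened, included):
    """Format the 'N found → N screened → N included' line."""
    return f"{found} found → {screened} screened → {included} included"


print(in_text_citation(["Author One", "Author Two", "Author Three"], 2023))
print(in_text_citation(["Author One", "Author Two"], 2023))
print(screening_flow(42, 30, 12))
```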

Quality Checks

Before delivering the review, verify:
  • Every included paper has a structured extraction record
  • Every claim in the synthesis is traceable to at least one extracted finding
  • The gap analysis identifies at least one concrete research opportunity
  • The review acknowledges its own limitations (search scope, access, biases)
  • Citation format is consistent throughout
  • No paper is cited that was excluded during screening
  • Contradictions between papers are noted, not silently resolved by picking a side
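Several of these checks are mechanical and can be run over the review's working data. The sketch below covers three of them: missing extraction records, untraceable claims, and citations of excluded papers. The data shapes are assumptions: `records` is a list of dicts with an `id` key, and each claim is a hypothetical dict with `text` and `sources` (paper IDs).

```python
def quality_issues(included_ids, excluded_ids, records, claims):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    issues = []
    record_ids = {record["id"] for record in records}
    for paper_id in sorted(set(included_ids) - record_ids):
        issues.append(f"No extraction record for included paper {paper_id}")
    for claim in claims:
        if not claim["sources"]:
            issues.append(f"Untraceable claim: {claim['text']}")
        for paper_id in claim["sources"]:
            if paper_id in excluded_ids:
                issues.append(f"Claim cites excluded paper {paper_id}: {claim['text']}")
    return issues


# Placeholder data reusing the IDs from the screening table.
included_ids = ["2301.07041"]
excluded_ids = ["2302.12345"]
records = [{"id": "2301.07041"}]
claims = [
    {"text": "Method A outperforms Method B on D1", "sources": ["2301.07041"]},
    {"text": "Approach X is losing traction", "sources": []},
    {"text": "X generalizes to Y", "sources": ["2302.12345"]},
]
for issue in quality_issues(included_ids, excluded_ids, records, claims):
    print(issue)
```

The remaining checks, such as whether contradictions are noted or a gap is concrete, still need a human or model read of the text.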

Edge Cases

| Situation | Adaptation |
| --- | --- |
| Very few papers found (<5) | The field may be nascent. Note this explicitly. Broaden search terms or check if the topic goes by different terminology. Consider adjacent fields. |
| Too many papers found (>100) | Tighten inclusion criteria. Consider limiting to top venues, recent years, or high-citation papers. Produce a scoping review rather than an exhaustive review. |
| User provides a paper list instead of a topic | Skip Phase 2 (Search). Start from Phase 3 (Screening) with the provided list. |
| User wants a related-work section for their own paper | Tailor synthesis to position the user's contribution. Organize by approaches the user's work builds on, alternatives it competes with, and gaps it fills. |
| No full-text access to key papers | Extract from abstracts and note the limitation. Do not fabricate methodology details. Flag which papers were abstract-only in the extraction records. |
| Interdisciplinary topic | Search across multiple category prefixes. Note when different fields use different terminology for the same concept. |
| User asks for a "quick" literature review | Reduce Phase 2 to a single search query, Phase 3 to title-only screening, Phase 4 to abstract-only extraction. Label the output as a preliminary survey, not a systematic review. |
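The two volume-based rows of the table reduce to a threshold check on the number of papers found. A minimal sketch, using the thresholds from the table; the function name and the returned wording are illustrative, not prescribed by this skill.

```python
def volume_adaptation(n_found):
    """Suggest an adaptation based on how many papers the search returned."""
    if n_found < 5:
        return "broaden: note the field may be nascent, widen terms, check adjacent fields"
    if n_found > 100:
        return "tighten: narrow inclusion criteria, produce a scoping review"
    return "proceed: continue to screening with the current scope"


for count in (3, 40, 250):
    print(count, "->", volume_adaptation(count))
```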