scholar-deep-research


Scholar Deep Research


End-to-end academic research workflow that turns a question into a cited, structured report. Built for depth: multi-source federation, transparent ranking, citation chasing, and a mandatory self-critique pass before the report ships.

When to use


Explicit triggers: "literature review", "research report", "state of the art", "survey the field", "what's known about X", "deep research on Y", "systematic review", "scoping review", "compare papers on Z".
Proactive triggers (use without being asked):
  • User asks a factual question whose honest answer is "it depends on the literature"
  • User frames a research plan and needs the background section
  • User is drafting a paper intro/related-work and hasn't yet scoped prior work
  • User proposes a method and asks whether it's novel
Do not use when: a single known paper answers the question, the user wants a tutorial (not a survey), or they're debugging code.

Guiding principles


  1. Scripts over vibes. Every search, dedupe, rank, and export step runs through a script in `scripts/`. The same input should produce the same output. Do not improvise ranking or counting by eye.
  2. Sources are federated, not singular. OpenAlex is the primary backbone (free, 240M+ works, no key). arXiv (CS/ML/physics preprints), Crossref (DOI metadata), PubMed (biomedical), DBLP (CS conferences/journals), bioRxiv (life-sci preprints via Europe PMC), and Exa (open-web, requires `EXA_API_KEY`) fill gaps. Semantic Scholar is also script-driven — `build_citation_graph.py --source s2|both` is the spine path for Phase 4, with better CS / arXiv / cross-disciplinary coverage than OpenAlex; the two graphs disagree more than you'd expect. The asta MCP tools (`mcp__asta__*`) and Brave Search are skin — used opportunistically for relevance ranking or non-academic context, never on the critical path. If MCP times out, research continues.
  3. State is persistent. Everything goes through `research_state.json`: queries run, papers seen, decisions made, phase progress. Research becomes resumable and auditable.
  4. Citations are anchors, not decorations. Every non-trivial claim in the draft carries `[^id]` where `id` matches a paper in state. Unanchored claims are treated as hallucinations and fail the gate.
  5. Saturation, not exhaustion, is the stop signal. A phase ends when a new round of search adds <20% novel papers AND no new paper has >100 citations.
  6. Self-critique is a phase, not a checkbox. Phase 6 reads the draft with adversarial intent. Its output goes into the report appendix.
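Principle 5's stop rule is concrete enough to sketch. A minimal illustration in Python (hypothetical helper, not the actual `research_state.py` code):

```python
def is_saturated(round_papers, known_ids, new_pct=20.0, max_citations=100):
    """Principle 5: a phase ends when a new search round adds <20% novel
    papers AND no novel paper has more than 100 citations.
    round_papers: list of {"id": ..., "citations": ...} from this round."""
    if not round_papers:
        return True
    novel = [p for p in round_papers if p["id"] not in known_ids]
    pct_novel = 100.0 * len(novel) / len(round_papers)
    heavy_hitter = any(p["citations"] > max_citations for p in novel)
    return pct_novel < new_pct and not heavy_hitter
```

A single well-cited newcomer keeps the phase open even when the novelty percentage is low.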

The 8-phase workflow (Phase 0..7)


```
Phase 0: Scope         → decompose question, pick archetype, init state
Phase 1: Discovery     → multi-source search, dedupe
Phase 2: Triage        → rank, select top-N for deep read
Phase 3: Deep read     → extract evidence per paper
Phase 4: Chasing       → citation graph (forward + backward)
Phase 5: Synthesis     → cluster by theme, map tensions
Phase 6: Self-critique → adversarial review, gap finding
Phase 7: Report        → render archetype template, export bibliography
```

Each phase writes to `research_state.json` before advancing. If the user pauses or a session crashes, the next run reads the state and picks up from the last completed phase.

Phase 0 — Scope


Before searching anything, decompose the question.
  1. Restate the question in one sentence. Surface ambiguities.
  2. PICO-style decomposition (or equivalent for non-biomedical fields):
    • Population / Problem — what system, species, setting, or phenomenon?
    • Intervention / Independent var — what method, factor, or manipulation?
    • Comparison — against what baseline or alternative?
    • Outcome — what is being measured or claimed?
  3. Pick an archetype that matches user intent (see `references/report_templates.md`):
    • `literature_review` — what is known about X (default)
    • `systematic_review` — rigorous PRISMA-lite, comparison of many studies on one narrow question
    • `scoping_review` — what has been studied and how (breadth over depth)
    • `comparative_analysis` — X vs Y, head-to-head
    • `grant_background` — narrative background + gap for a proposal
  4. Draft keyword clusters — 3-5 Boolean clusters covering synonyms, acronyms, and variant spellings. Include a "negative" cluster (terms to exclude).
  5. Initialize state:

```bash
python scripts/research_state.py --state research_state.json init \
  --question "<restated question>" \
  --archetype literature_review
```

(`--state` is top-level and applies to every subcommand; `init` itself takes `--question`, `--archetype`, and optional `--force`.)
When in doubt about archetype, ask the user. The choice shapes everything downstream.
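Step 4's clusters are just data. One way they might compose into a single Boolean query string, sketched in Python (illustrative only; the search scripts take plain `--query` strings, and this helper is hypothetical):

```python
def build_query(clusters, negative=()):
    """OR terms within a cluster, AND clusters together, and exclude
    the negative cluster with NOT."""
    parts = ["(" + " OR ".join(f'"{t}"' for t in cluster) + ")"
             for cluster in clusters]
    query = " AND ".join(parts)
    if negative:
        query += " NOT (" + " OR ".join(f'"{t}"' for t in negative) + ")"
    return query

# Example clusters: synonyms + acronyms per concept, plus exclusions.
clusters = [
    ["federated learning", "FL"],
    ["differential privacy", "DP-SGD"],
]
```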

Phase 1 — Discovery


Run searches across all available sources, in parallel where the source can take it. OpenAlex is primary; the others fill gaps.
Where parallelism actually pays off. The right place to fan out is Phase 3 (one agent per paper to read PDFs concurrently — see `references/agent_prompts/phase3_deep_read.md`). At Phase 1 the bottleneck is the upstream API, not local compute, and parallel fan-out across the same source mostly buys 429s and sticky cooldowns. The skill's bias should be: parallel between different sources, serial within one source. Concretely:
  • Parallel-friendly: OpenAlex (polite-pool, very tolerant), Crossref (polite-pool), Exa (paid quota), bioRxiv (Europe PMC).
  • Self-serialised (file-locked, automatic): arXiv (≥3s/req), PubMed (≥0.34s/req without `NCBI_API_KEY`, ≥0.10s with), DBLP (1s buffer to avoid SSL EOF flakes).
The serialised sources use a per-source file lock under `${SCHOLAR_CACHE_DIR:-.scholar_cache}/rate/<source>.lock`, so even N parallel `search_arxiv.py` invocations from the same agent will queue automatically and sleep the right gap — no agent-side coordination required. Parallel calls don't speed those sources up, but they don't error either.

Primary (no API key, always available):

```bash
python scripts/search_openalex.py --query "<cluster 1>" --limit 50 --state research_state.json
python scripts/search_openalex.py --query "<cluster 2>" --limit 50 --state research_state.json
```

Domain-specific (use when relevant):

```bash
python scripts/search_arxiv.py --query "<cluster>" --limit 50 --state research_state.json    # CS/ML/physics preprints
python scripts/search_dblp.py --query "<cluster>" --limit 50 --state research_state.json     # CS gold-standard bibliography (no abstracts)
python scripts/search_pubmed.py --query "<cluster>" --limit 50 --state research_state.json   # biomedical (PubMed)
python scripts/search_biorxiv.py --query "<cluster>" --limit 50 --state research_state.json  # life-sci preprints (bioRxiv + medRxiv via Europe PMC)
python scripts/search_crossref.py --query "<cluster>" --limit 50 --state research_state.json # DOI-backed metadata
```

Open-web coverage (optional, requires `EXA_API_KEY`) — finds material the scholarly APIs miss: lab sites, institutional PDFs, conference mirrors, preprints parked outside arXiv, NGO/government reports.

```bash
python scripts/search_exa.py --query "<cluster>" --limit 50 --state research_state.json
```

Dedupe across sources (DOI-first, title-similarity fallback):

```bash
python scripts/dedupe_papers.py --state research_state.json
```
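The DOI-first, title-fallback merge can be sketched as follows (a hypothetical helper; `dedupe_papers.py`'s actual normalisation and similarity threshold may differ):

```python
from difflib import SequenceMatcher

def dedupe(papers, title_threshold=0.9):
    """DOI-first dedupe with a title-similarity fallback: papers sharing
    a normalised DOI merge immediately; DOI-less papers merge when their
    titles are near-identical."""
    kept, seen_dois = [], set()
    for p in papers:
        doi = (p.get("doi") or "").lower().removeprefix("https://doi.org/")
        if doi:
            if doi in seen_dois:
                continue                      # exact DOI duplicate
            seen_dois.add(doi)
            kept.append(p)
            continue
        title = p.get("title", "").lower().strip()
        if any(SequenceMatcher(None, title,
                               k.get("title", "").lower().strip()).ratio()
               >= title_threshold for k in kept):
            continue                          # near-identical title
        kept.append(p)
    return kept
```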

**MCP enrichment (optional, run if available):** call `mcp__asta__search_papers_by_relevance` and `mcp__asta__snippet_search` and feed results via `scripts/research_state.py ingest`. If the MCP call errors or times out, do not retry — move on.

**Iterate.** Read the state file. Are there keyword gaps? Are there authors appearing 3+ times whose other work you haven't pulled? Run another round. Stop when saturation hits — **every source, not just the last one queried:**

```bash
python scripts/research_state.py saturation --state research_state.json
python scripts/dedupe_papers.py --state research_state.json
```

Returns `{ "per_source": {...}, "overall_saturated": true/false, ... }`.


`overall_saturated` is true only when every queried source has run at least `--min-rounds` (default 2) rounds AND each is individually below the new-paper percentage and new-citation thresholds. A source that has been queried only once cannot be declared saturated, which rules out the failure mode where a single quiet source falsely ends discovery. Use `--source openalex` to check one source in isolation.

**Budget caps and broad-topic escape hatches.** Phase 1 has two hard caps to prevent runaway agents: `SCHOLAR_PHASE1_MAX_ROUNDS` (default 10 rounds per source) and `SCHOLAR_PHASE1_MAX_REQUESTS_PER_SOURCE` (default 20 ingests per source). Hitting either returns `phase1_budget_exhausted` with a `next:` hint. For genuinely broad topics that cross subfields (e.g. CS-ML topics with multiple keyword clusters), the saturation thresholds can also fail to converge under the defaults — relax them with `SCHOLAR_SATURATION_NEW_PCT` (default 20.0), `SCHOLAR_SATURATION_MAX_CITATIONS` (default 100), and `SCHOLAR_SATURATION_NEW_AUTHORS_PCT` / `SCHOLAR_SATURATION_NEW_VENUES_PCT`. These env vars are honored both by `python scripts/research_state.py saturation` *and* by the G2 gate, so raising them lets the gate accept "good enough" coverage on topics where the default is unreachable.
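A sketch of how those knobs might resolve at runtime, using the documented names and defaults (hypothetical helper, not the scripts' own code):

```python
import os

def phase1_knobs():
    """Resolve Phase 1 budget caps and saturation thresholds from the
    environment, falling back to the documented defaults."""
    def env_float(name, default):
        return float(os.environ.get(name, default))
    return {
        "new_pct": env_float("SCHOLAR_SATURATION_NEW_PCT", 20.0),
        "max_citations": env_float("SCHOLAR_SATURATION_MAX_CITATIONS", 100),
        "max_rounds": int(env_float("SCHOLAR_PHASE1_MAX_ROUNDS", 10)),
        "max_requests": int(env_float("SCHOLAR_PHASE1_MAX_REQUESTS_PER_SOURCE", 20)),
    }
```

Because both the `saturation` subcommand and the G2 gate read the same variables, relaxing them once relaxes both checks consistently.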


Phase 2 — Triage


Rank the deduplicated corpus and pick the top-N for deep reading.
```bash
python scripts/rank_papers.py \
  --state research_state.json \
  --question "<phase 0 question>" \
  --alpha 0.4 --beta 0.3 --gamma 0.2 --delta 0.1 \
  --top 20
```
The formula is transparent — the script prints it and writes the components to state so the report can cite its own methodology:

```
score = α·relevance + β·log10(citations+1)/3 + γ·recency_decay(half-life=5yr) + δ·venue_prior
```

Defaults target a literature review. For a scoping review prefer higher α (relevance) and lower β (citations). For a systematic review of a narrow question, lower α and higher β.
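The formula reads directly as code. A minimal sketch, assuming `recency_decay` is exponential with a 5-year half-life (the script's exact decay shape is its own; treat this as illustrative):

```python
import math
from datetime import date

def score(paper, alpha=0.4, beta=0.3, gamma=0.2, delta=0.1, half_life_yr=5.0):
    """score = α·relevance + β·log10(citations+1)/3
             + γ·recency_decay(half-life=5yr) + δ·venue_prior"""
    age_yr = max(0, date.today().year - paper["year"])
    recency = 0.5 ** (age_yr / half_life_yr)   # assumed exponential decay
    return (alpha * paper["relevance"]
            + beta * math.log10(paper["citations"] + 1) / 3
            + gamma * recency
            + delta * paper["venue_prior"])
```

Note the citation term saturates: the `/3` divisor means 999 citations already push it to its full β weight, so raw citation counts cannot drown out relevance.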
Write the top-N selection to state:
```bash
python scripts/research_state.py select --state research_state.json --top 20
```
Triage the selection into deep / skim / defer tiers before advancing. Phase 3 fan-out is the most expensive stage of the workflow; not every selected paper deserves a full agent dispatch:
```bash
python scripts/skim_papers.py --state research_state.json \
  --deep-ratio 0.5 --skim-ratio 0.5
```
Defaults split the top-N evenly: top half → `deep` (agent dispatch in Phase 3), bottom half → `skim` (abstract-derived evidence stub auto-filled, `depth=shallow`). For tighter budgets, use `--deep-ratio 0.3 --skim-ratio 0.5` — the remaining 20% gets `tier=defer` and is removed from `selected_ids` (still queryable as candidates for citation chase).
The script emits `data.deep_tier_preview` listing the deep-tier papers by triage_score. Show this to the user before advancing so they can hand-override before agents fan out (re-run with different ratios, or manually re-rank in state). Triage is required before G3 passes — the gate's `triage_applied` check rejects the advance otherwise.
Optional but recommended — prefetch deep-tier PDFs before agent fan-out:

```bash
python scripts/prefetch_pdfs.py --state research_state.json \
  --tier deep --concurrency 4
```
Fetches every deep-tier paper's PDF into `${SCHOLAR_CACHE_DIR:-.scholar_cache}/pdfs/<id-hash>/` via `paper-fetch` (with Unpaywall fallback), in parallel waves, and writes `pdf_path`/`pdf_status`/`pdf_source`/`pdf_bytes` per paper. Phase 3 agents then read the local file directly instead of each running its own download — agent context stays focused on reading + reasoning, not on retrying paywalls.
Failures land as `pdf_status='failed'` with a `pdf_failure_code` (`paper_fetch_error`, `no_open_access_pdf`, `pdf_download_failed`, …); papers without a DOI get `pdf_status='no_doi'`. Phase 3 agents check `pdf_path` first and only fall back to `extract_pdf.py --doi` if the prefetched path is missing. Re-running prefetch is cheap: papers with an existing `pdf_path` on disk are skipped (`pdf_status='cached'`).
Human-in-loop for paywalled PDFs. When automatic fetch fails (paywall, OA chain exhausted, no DOI), surface a hand-fetch list to the user via `--emit-manifest` (read-only):
```bash
python scripts/prefetch_pdfs.py --state research_state.json --emit-manifest
```

Returns `{ needs_user_download: [{id, doi, title, drop_at, alt_urls}, ...] }`.


The user downloads each PDF (institutional VPN, ResearchGate, etc.) and drops it at the listed `drop_at` path (any `*.pdf` filename in that subdir works). On the next normal `prefetch_pdfs.py` run, dropped files are auto-absorbed as `pdf_source='user_provided'` without re-fetching.
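The auto-absorb step amounts to a glob over each paper's drop directory; a sketch under assumed field names (`id_hash` is illustrative, matching the `<id-hash>` path component above):

```python
import glob, os

def absorb_user_pdfs(papers, cache_dir=".scholar_cache"):
    """Pick up hand-dropped PDFs: any *.pdf inside a paper's drop_at
    subdir is absorbed as pdf_source='user_provided' without re-fetching."""
    for p in papers:
        if p.get("pdf_status") in ("ok", "cached"):
            continue                          # already have a local file
        drop_dir = os.path.join(cache_dir, "pdfs", p["id_hash"])
        hits = glob.glob(os.path.join(drop_dir, "*.pdf"))
        if hits:
            p.update(pdf_path=hits[0], pdf_status="cached",
                     pdf_source="user_provided")
    return papers
```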

Skip prefetch entirely when paper-fetch is not installed AND you don't want Unpaywall traffic — Phase 3 agents will then download per-paper inside their own contexts (slower, noisier, but functionally identical).


Phase 3 — Deep read (parallel agent fan-out)


Phase 3 splits by tier:
  • `tier=skim` — `apply_triage()` already wrote an abstract-derived evidence stub with `depth=shallow`. No further action needed.
  • `tier=deep` — dispatch one agent per paper, in parallel waves of 8–10. Each agent reads the PDF, writes structured evidence back to state, and returns one JSON line. The host's main context never sees the full PDF text.
The agent prompt template lives at `references/agent_prompts/phase3_deep_read.md`. Load it once, instantiate per paper, and dispatch all N tool_use calls in a single message so they fan out concurrently. Per-agent contract:
  • Input: `paper_id`, `doi`, `pdf_url`, `abstract`, `question`, `state_path`
  • Action: `extract_pdf.py --doi <doi> --output <tmp>` → read text → write `evidence --depth full`
  • Output: one line `{"paper_id": "...", "status": "ok"|"evidence_unavailable", ...}`
The state CLI is exclusive-locked, so N agents writing concurrent `evidence` calls are serialized automatically — no coordination needed.

After all wave(s) complete, verify deep-tier coverage:


```bash
python scripts/research_state.py advance --state research_state.json \
  --to 4 --check-only
```

If `deep_tier_full_evidence` is failing, dispatch a follow-up wave for the missing ids only. If a paper's full text is genuinely unreachable (paywall, exhausted OA chain), the agent should write a `depth=shallow` record with `method` starting `evidence_unavailable:` per the prompt's failure-mode section — that record satisfies `depth_marks_valid` without inflating the deep-tier coverage count.

**Manual fallback (no agents available).** Hosts that cannot dispatch parallel agents (some non-CC platforms) can run Phase 3 sequentially in the main session: for each `tier=deep` paper, `extract_pdf.py --doi <doi>` then `research_state.py evidence --id <pid> --depth full ...`. Slower and burns more context, but the gate logic is identical.
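The sequential fallback amounts to two commands per paper; a sketch that builds the command lists (the trailing `evidence` flags are elided here exactly as in the text, and `/tmp` output paths are illustrative):

```python
def manual_phase3_commands(deep_papers, state="research_state.json"):
    """Build the per-paper command pairs for the manual Phase 3 fallback:
    one extract_pdf.py call, then one research_state.py evidence write."""
    cmds = []
    for p in deep_papers:
        txt = f"/tmp/{p['id']}.txt"
        cmds.append(["python", "scripts/extract_pdf.py",
                     "--doi", p["doi"], "--output", txt])
        cmds.append(["python", "scripts/research_state.py",
                     "--state", state, "evidence",
                     "--id", p["id"], "--depth", "full"])
    return cmds
```

Run each pair with `subprocess.run(cmd, check=True)` in order; the exclusive lock on the state CLI makes the serial writes safe by construction.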

Phase 4 — Citation chasing


Take the top 5-10 highest-ranked papers and expand the graph.

Preview the request count first — this is the most expensive command:

```bash
python scripts/build_citation_graph.py \
  --state research_state.json \
  --seed-top 8 --direction both --depth 1 --dry-run
```

Run with an idempotency key so a retry after a network blip is free:

```bash
python scripts/build_citation_graph.py \
  --state research_state.json \
  --seed-top 8 --direction both --depth 1 \
  --idempotency-key "chase-$(date -u +%Y%m%dT%H%M)"
```

The script pulls backward references (what did this paper cite?) and forward citations (who cited this paper?), deduplicates against existing state, and writes new candidate papers with `discovered_via: citation_chase`. Run rank + deep read again on any new high-scoring additions.

**Dual backend.** `--source openalex|s2|both` (default `both`). OpenAlex covers most fields well; Semantic Scholar (S2) has better CS / arXiv / cross-disciplinary coverage. The two graphs disagree more than you'd expect — running both then deduping by id surfaces real coverage gaps. S2 needs a DOI / arXiv id / PMID on each seed (it doesn't accept OpenAlex ids); seeds without one skip the S2 backend. `S2_API_KEY` env var raises the S2 quota; without it the public quota of ~1 req/s applies.

**Idempotency.** When `--idempotency-key <k>` is set, the first successful run writes `{response, signature}` to `.scholar_cache/<hash>.json`. A retried run with the same key replays the cached response without re-hitting OpenAlex or re-mutating state. Reusing the same key with different arguments returns `idempotency_key_mismatch` rather than silently serving stale data. Cache directory: `SCHOLAR_CACHE_DIR` env var, default `.scholar_cache/`.
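The cache-and-signature dance described above can be sketched like this (a minimal sketch of the pattern, not the script's own code; the hash layout is assumed):

```python
import hashlib, json, os

def idempotent_run(key, args, fn,
                   cache_dir=os.environ.get("SCHOLAR_CACHE_DIR",
                                            ".scholar_cache")):
    """Replay a cached response for a repeated idempotency key; reject
    key reuse with different arguments instead of serving stale data."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir,
                        hashlib.sha256(key.encode()).hexdigest() + ".json")
    signature = hashlib.sha256(
        json.dumps(args, sort_keys=True).encode()).hexdigest()
    if os.path.exists(path):
        cached = json.load(open(path))
        if cached["signature"] != signature:
            return {"error": "idempotency_key_mismatch"}
        return cached["response"]          # replay, no re-fetch, no mutation
    response = fn(args)
    json.dump({"response": response, "signature": signature}, open(path, "w"))
    return response
```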

**Special case — a highly cited paper has never been challenged.** If rank says a paper is top-3 by citations but no critiques appear in the corpus, search explicitly for `"<first author> <year>" critique OR limitations OR reanalysis OR failed replication`. This is the confirmation-bias backstop.

Phase 5 — Synthesis


No scripts here — this is where the agent earns its keep. Cluster and structure:
  1. Thematic clustering. Group the top-N into 3-6 themes that map onto the report outline. Themes should be orthogonal: a paper can be primary to one, secondary to at most one other.
  2. Tension map. Where do papers disagree? For each disagreement, note: which papers, on what, and whether the disagreement is empirical (different data), methodological (different tools), or theoretical (different framings).
  3. Timeline. When relevant, a chronological arc: seminal paper → consolidation → refinement → current frontier.
  4. Venn / gap. What has been studied well, partially, and not at all? The gap is the pivot for Phase 7.

Phase 6 — Self-critique


This is not optional. Load `assets/prompts/self_critique.md` and run the full checklist against your draft (still unpublished). The checklist covers:
  • Single-source claims (any claim backed by only one paper?)
  • Citation/recency skew (is the latest-2-years window covered?)
  • Venue bias (is the corpus dominated by one journal/venue?)
  • Author bias (does one lab dominate the citations?)
  • Untested high-citation papers (anyone cite a paper without reading a critique?)
  • Contradictions buried (any tension in Phase 5 that got glossed over?)
  • Archetype fit (does the structure match the chosen archetype?)
  • Unanchored claims (any statement without a `[^id]` anchor?)
Write findings to `research_state.json` under `self_critique` and fix blockers before Phase 7. Findings go into the report appendix verbatim — the reader deserves to see what the research process doubted itself about.

Phase 7 — Report


Render an archetype scaffold from state, then fill the agent-prose slots and validate anchors:

```bash
# Generate the scaffold — fills header, themes, tensions, methodology
# appendix, self-critique appendix, and bibliography anchor index from
# state. Leaves <!-- AGENT: ... --> placeholders for prose.
python scripts/render_report.py --state research_state.json
# → reports/<slug>_<YYYYMMDD>.md by default; pass --output PATH to override.

# After filling in the prose, lint every [^id] anchor against
# state.papers. Catches typo'd anchors before the report ships.
python scripts/render_report.py --state research_state.json \
  --lint reports/<slug>_<YYYYMMDD>.md

# Export bibliography in the user's preferred format
python scripts/export_bibtex.py --state research_state.json --format bibtex --output refs.bib
python scripts/export_bibtex.py --state research_state.json --format csl-json --output refs.json
```

The scaffold's body uses `[^id]` anchors (the paper id from state). The
bibliography section at the bottom carries one definition per selected
paper. The lint mode flags `unknown_anchors_used` (typos) and
`undefined_in_text` (anchors with no footnote definition); both are
blockers. `unused_definitions` is a soft signal — selected papers that
ended up not cited inline.
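The three lint signals are set differences over used versus defined anchors; a sketch (hypothetical helper, with an assumed anchor-id character set):

```python
import re

def lint_anchors(report_md, selected_ids):
    """Check every [^id] anchor: used-but-unknown (typos), used-but-
    undefined (no footnote definition), and defined-but-never-cited
    (the soft signal)."""
    defined = set(re.findall(r"^\[\^([\w:.-]+)\]:", report_md, flags=re.M))
    used = set(re.findall(r"\[\^([\w:.-]+)\](?!:)", report_md))
    return {
        "unknown_anchors_used": sorted(used - set(selected_ids)),
        "undefined_in_text": sorted(used - defined),
        "unused_definitions": sorted(set(selected_ids) - used),
    }
```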

**Save path convention:** `reports/<slug>_<YYYYMMDD>.md`. The skill does not write outside the working directory unless the user specifies a path.

Report archetype selection


| Archetype | When to use | Primary output shape |
|---|---|---|
| `literature_review` | User wants to know what's established about a topic | Thematic sections + synthesis + gap |
| `systematic_review` | Narrow question, many studies, need rigorous comparison | PRISMA-lite flow + extraction table + pooled findings |
| `scoping_review` | Broad topic, "what has been studied?" | Coverage map + methods inventory + research gap |
| `comparative_analysis` | "A vs B" — methods, models, approaches | Axes of comparison + per-axis verdict + recommendation |
| `grant_background` | Narrative for a proposal introduction | Problem significance + what's known + what's missing + why our approach |

Templates live in `assets/templates/<archetype>.md`. Load only the one you need.

Scripts reference

脚本参考

| Script | Purpose |
| --- | --- |
| `research_state.py` | Init, read, write, query the state file. Central to every phase. |
| `search_openalex.py` | Primary search (no key, 240M works, citation counts). |
| `search_arxiv.py` | arXiv API — preprints and CS/ML/physics. |
| `search_crossref.py` | Crossref REST — authoritative DOI metadata. |
| `search_pubmed.py` | NCBI E-utilities — biomedical corpus with MeSH. |
| `search_exa.py` | Exa neural web search (optional, key-gated) — open-web coverage the scholarly APIs miss. |
| `dedupe_papers.py` | DOI normalization + title similarity merging across sources. |
| `rank_papers.py` | Transparent scoring formula. Prints the formula and per-paper components. |
| `skim_papers.py` | Phase-3 triage. Splits selected papers into `deep`/`skim`/`defer` tiers on cheap deterministic signals, refines `selected_ids`, auto-fills evidence stubs for the skim tier. Runs at the close of Phase 2 before G3. |
| `prefetch_pdfs.py` | Optional. Pulls deep-tier PDFs into a stable cache via paper-fetch (with Unpaywall fallback) before Phase 3 agent fan-out. Concurrent (`--concurrency`), idempotent on re-run, fail-soft per paper. Writes `pdf_path`/`pdf_status` per paper so agents read a local file instead of re-downloading. |
| `build_citation_graph.py` | Forward/backward snowballing via OpenAlex. |
| `extract_pdf.py` | Full-text extraction (pypdf). Accepts `--input`, `--url`, or `--doi`. DOI mode resolves via the paper-fetch skill if installed, falls back to Unpaywall. Safe on scanned PDFs (skips, emits warning). |
| `export_bibtex.py` | BibTeX / CSL-JSON / RIS export from state. |
| `render_report.py` | Phase 7 — render an archetype scaffold from `state.themes`/`state.tensions`/`state.queries`/`state.ranking`/`state.self_critique`, with `<!-- AGENT: ... -->` slots for prose. `--lint <report.md>` validates every `[^id]` anchor against `state.papers`. |
All scripts accept `--help` and `--schema`, emit a structured JSON envelope on stdout, and use `research_state.json` as the single source of truth. Every script is idempotent on the state file (network-layer idempotency is P1 work).
| 脚本 | 用途 |
| --- | --- |
| `research_state.py` | 初始化、读取、写入、查询状态文件。是所有阶段的核心。 |
| `search_openalex.py` | 核心检索工具(无需密钥,包含2.4亿+研究成果,提供引用计数)。 |
| `search_arxiv.py` | arXiv API——预印本和计算机科学/机器学习/物理学领域文献。 |
| `search_crossref.py` | Crossref REST——权威DOI元数据。 |
| `search_pubmed.py` | NCBI E-utilities——生物医学论文库,支持MeSH术语。 |
| `search_exa.py` | Exa神经网页搜索(可选,需密钥)——检索学术API未覆盖的开放网页内容。 |
| `dedupe_papers.py` | DOI标准化+跨数据源标题相似度合并。 |
| `rank_papers.py` | 透明评分公式。会打印公式和单篇论文的各评分分量。 |
| `skim_papers.py` | 阶段3筛选工具。基于低成本确定性信号将筛选出的论文分为`deep`/`skim`/`defer`层级,优化`selected_ids`,自动填充快速浏览层级的证据stub。在阶段2结束后、G3审核前执行。 |
| `prefetch_pdfs.py` | 可选工具。在阶段3 Agent调度前,通过paper-fetch(Unpaywall为备选)将深度阅读层级的PDF下载到稳定缓存。支持并发(`--concurrency`),重复运行时幂等,单篇论文下载失败不影响整体。为每篇论文写入`pdf_path`/`pdf_status`,以便Agent读取本地文件而非重新下载。 |
| `build_citation_graph.py` | 通过OpenAlex进行正向/反向滚雪球式检索。 |
| `extract_pdf.py` | 全文提取(基于pypdf)。接受`--input`、`--url`或`--doi`参数。DOI模式优先通过paper-fetch工具(若已安装)解析,备选方案为Unpaywall。对扫描版PDF安全(会跳过并发出警告)。 |
| `export_bibtex.py` | 从状态文件导出BibTeX/CSL-JSON/RIS格式的参考文献。 |
| `render_report.py` | 阶段7——从`state.themes`/`state.tensions`/`state.queries`/`state.ranking`/`state.self_critique`渲染模板框架,包含`<!-- AGENT: ... -->`占位符供撰写内容。`--lint <report.md>`验证所有`[^id]`标记是否与`state.papers`匹配。 |
所有脚本均接受 `--help` 和 `--schema` 参数,在标准输出中输出结构化JSON包,并以 `research_state.json` 为唯一数据源。所有脚本对状态文件均具备幂等性(网络层幂等性为优先级1的工作)。

CLI contract, env vars, and state schema

CLI约定、环境变量与状态文件 schema

Three details that agents discover by running scripts and reading the JSON envelopes — kept out of the body to save context. Load on demand:
  • `references/cli_contract.md` — the success/failure envelope shape, exit codes, `--schema` introspection, and idempotency cache semantics.
  • `references/env_vars.md` — the trust-boundary env vars (`SCHOLAR_*`, `NCBI_API_KEY`, `EXA_API_KEY`, `S2_API_KEY`, `PAPER_FETCH_SCRIPT`). Agents should never set these — surface to the user when a script reports a missing one.
  • `references/state_schema.md` — the `research_state.json` shape. Prefer `python scripts/research_state.py --schema` for the live, machine-readable version.
以下细节可通过运行脚本和读取JSON包获取——未放入主体内容以节省上下文。按需加载:
  • `references/cli_contract.md` ——成功/失败包结构、退出码、`--schema`自省、幂等缓存语义。
  • `references/env_vars.md` ——信任边界环境变量(`SCHOLAR_*`、`NCBI_API_KEY`、`EXA_API_KEY`、`S2_API_KEY`、`PAPER_FETCH_SCRIPT`)。Agent不应设置这些变量——当脚本报告缺失时,告知用户。
  • `references/state_schema.md` ——`research_state.json`的结构。优先使用 `python scripts/research_state.py --schema` 获取实时的机器可读版本。

Completion gates

阶段推进审核门

Each phase transition has a gate (G1..G7). Advance ONLY via:

```bash
python scripts/research_state.py --state <path> advance                # advance by 1
python scripts/research_state.py --state <path> advance --check-only   # preview only
```

The gate predicates are enforced in `scripts/_gates.py`. Direct `set --field phase` is rejected — the `phase` field is no longer settable. If the gate fails, the envelope lists the failing checks by name so you know exactly what's missing.
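A hypothetical gate predicate in the style the envelope implies: it returns the names of failing checks rather than a bare boolean. The enforced versions live in `scripts/_gates.py` and may differ:

```python
def g3_ready(state: dict) -> list[str]:
    """Sketch of the G3 predicate described below; empty list means the gate passes."""
    failures = []
    if not state.get("ranking"):
        failures.append("ranking_recorded")
    selected = state.get("selected_ids") or []
    if not selected:
        failures.append("selected_ids_non_empty")
    papers = state.get("papers", {})
    if any("score_components" not in papers.get(pid, {}) for pid in selected):
        failures.append("score_components_present")
    if not state.get("triage_complete"):
        failures.append("triage_complete")
    return failures
```

Returning named failures is what lets `advance --check-only` tell you exactly which box is unticked.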
| Target | Gate (enforced) |
| --- | --- |
| G1 (→ 1) | Question set, archetype valid, state initialized. `≥3 keyword clusters` is host-checked. |
| G2 (→ 2) | `overall_saturated == true` across all queried sources AND ≥3 distinct sources in `state.queries`. |
| G3 (→ 3) | `state.ranking` recorded; `selected_ids` non-empty; every selected paper has `score_components`; `state.triage_complete=true` (run `skim_papers.py`). |
| G4 (→ 4) | All selected papers have `depth ∈ {full, shallow}` AND every `tier=deep` paper either (a) has `depth=full`, or (b) has `depth=shallow` with `evidence.method` starting with one of two documented escape-hatch prefixes: `evidence_unavailable:` (PDF unreachable — paywall, exhausted OA chain, scanned) or `topic_mismatch:` (PDF read fully but off-topic — a Phase 2 ranking false-positive). Skim-tier `depth=shallow` is by design and does not block. |
| G5 (→ 5) | ≥1 query whose `source` contains `citation_chase` (any backend layout — `openalex_citation_chase`, `s2_citation_chase`, or the default dual `openalex_s2_citation_chase`) AND `hits > 0`. |
| G6 (→ 6) | `len(themes) ≥ 3` AND (`len(tensions) ≥ 1` OR a critique finding mentioning "no tensions"). |
| G7 (→ 7) | `state.self_critique.appendix` non-empty; `len(resolved) ≥ len(findings)`. |
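The per-paper G4 rule can be sketched as follows. This is a reading of the gate description, not the enforced code in `scripts/_gates.py`:

```python
# The two documented escape-hatch prefixes for deep-tier papers without full depth.
ESCAPE_PREFIXES = ("evidence_unavailable:", "topic_mismatch:")

def g4_paper_ok(paper: dict) -> bool:
    depth = paper.get("depth")
    if depth not in ("full", "shallow"):
        return False                      # depth must be recorded either way
    if paper.get("tier") != "deep":
        return True                       # skim/defer: shallow is by design
    if depth == "full":
        return True
    method = (paper.get("evidence") or {}).get("method", "")
    return method.startswith(ESCAPE_PREFIXES)
```

The escape hatches are deliberately string-prefixed so the reason survives in the report's audit trail.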
每个阶段过渡均设有审核门(G1..G7)。仅可通过以下命令推进:

```bash
python scripts/research_state.py --state <path> advance                # 推进1个阶段
python scripts/research_state.py --state <path> advance --check-only   # 仅预览审核结果
```

审核门的判定条件在 `scripts/_gates.py` 中强制执行。直接通过 `set --field phase` 修改阶段会被拒绝——`phase`字段不可手动设置。若审核失败,返回的包会列出失败的检查项名称,明确告知缺失内容。
| 目标阶段 | 审核门(强制执行) |
| --- | --- |
| G1(→ 1) | 已设置问题、模板有效、状态已初始化。`≥3个关键词集群`由主机检查。 |
| G2(→ 2) | 所有检索过的数据源 `overall_saturated == true`,且 `state.queries` 中包含≥3个不同数据源。 |
| G3(→ 3) | 已记录 `state.ranking`;`selected_ids` 非空;每篇筛选出的论文均有 `score_components`;`state.triage_complete=true`(已执行 `skim_papers.py`)。 |
| G4(→ 4) | 所有筛选出的论文 `depth ∈ {full, shallow}`,且每篇 `tier=deep` 的论文要么(a)`depth=full`,要么(b)`depth=shallow` 且 `evidence.method` 以以下两个文档化逃生前缀之一开头:`evidence_unavailable:`(PDF无法获取——付费墙、OA链用尽、扫描版)或 `topic_mismatch:`(已完整读取PDF但偏离主题——阶段2排序误判)。快速浏览层级的 `depth=shallow` 为设计预期,不会阻塞。 |
| G5(→ 5) | 至少有一个查询的 `source` 包含 `citation_chase`(任何后端类型——`openalex_citation_chase`、`s2_citation_chase` 或默认双后端 `openalex_s2_citation_chase`),且 `hits > 0`。 |
| G6(→ 6) | `len(themes) ≥ 3`,且(`len(tensions) ≥ 1` 或审查结果提及“无分歧”)。 |
| G7(→ 7) | `state.self_critique.appendix` 非空;`len(resolved) ≥ len(findings)`。 |

Enrichment with MCP tools

MCP工具增强

Semantic Scholar is not part of this enrichment layer — it is reached through the script path (`build_citation_graph.py --source s2|both`) and is a first-class Phase 4 backend, not enrichment. The MCP tools below are the genuine skin layer: they may time out, get renamed, or be absent entirely, and no phase output depends on them.
If the session has asta or Brave Search MCP tools available, use them as enrichment:
  • `mcp__asta__search_papers_by_relevance` — good for dense relevance ranking on top of the script searches
  • `mcp__asta__get_citations` — lighter-weight than `build_citation_graph.py` for spot-checking a single seed paper
  • `mcp__asta__snippet_search` — grep-like search across abstracts
  • Brave Search — non-academic sources (blog posts, press releases, preprint discussion)
Treat MCP tools as unreliable by design — they may time out or be unavailable. Never place a phase-critical step behind an MCP call. Scripts are the spine; MCP is the skin.
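The spine/skin rule reduces to a fail-soft wrapper: enrichment may add to a script result, but its failure must never subtract from it. A minimal sketch (the function and field names are hypothetical):

```python
def with_fallback(enrichment, script_result: dict) -> dict:
    """Attempt MCP enrichment; on ANY failure, return the script result unchanged."""
    try:
        extra = enrichment()
    except Exception:
        return script_result              # timeout/rename/absence: fail soft
    return {**script_result, "enrichment": extra}
```

The broad `except Exception` is deliberate here: an enrichment path that can raise anything must never propagate into a phase gate.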
Semantic Scholar不属于此增强层——它通过脚本路径(`build_citation_graph.py --source s2|both`)访问,是阶段4的一等后端,而非增强工具。以下MCP工具属于真正的辅助层:可能超时、被重命名或不可用,且任何阶段输出均不依赖它们。
若会话中有asta或Brave Search MCP工具可用,可将其作为增强工具使用:
  • `mcp__asta__search_papers_by_relevance` ——在脚本检索基础上进行精准相关性排序
  • `mcp__asta__get_citations` ——比 `build_citation_graph.py` 更轻量,适合抽查单篇种子论文
  • `mcp__asta__snippet_search` ——类似grep的摘要检索
  • Brave Search——非学术来源(博客文章、新闻稿、预印本讨论)
默认将MCP工具视为不可靠——可能超时或不可用。切勿将阶段核心步骤依赖于MCP调用。脚本是核心骨架;MCP是辅助皮肤。

Pitfalls (short list; see `references/pitfalls.md` for detail)

常见陷阱(简短列表;详见 `references/pitfalls.md`)
  1. Treating the first page of search results as "the literature" — run multiple keyword clusters and chase citations.
  2. Unanchored claims — every non-trivial statement in the report needs a `[^id]` pointing to a paper in state.
  3. Confirmation bias — actively search for critiques of top-cited papers; see Phase 4 special case.
  4. Preprint conflation — arXiv/bioRxiv are preprints; tag them as such in the report and weight evidence accordingly. Lint-safe convention: place the anchor and marker separately — `[^id] *(preprint)*`, not `[^id, preprint]` (commas inside footnote brackets break Markdown parsing and the `render_report.py --lint` check).
  5. Venue monoculture — if >60% of top-N come from one journal/venue, broaden sources.
  6. Author monoculture — same for a single lab or author.
  7. Recency collapse — the last 2 years matter for "state of the art" framings; check explicit coverage.
  8. Stale MCP tool names — MCP servers rename tools; always list available tools before assuming names. Script paths are stable; MCP names are not.
  9. Single-shot search — budget for ≥3 search rounds per cluster, not one.
  10. Skipping self-critique — the temptation to ship a clean draft is exactly when Phase 6 catches the most.
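Pitfall 5 is mechanically checkable. A sketch of the >60% venue test; the `venue` field name is an assumption about the state records:

```python
from collections import Counter

def venue_monoculture(papers: list[dict], threshold: float = 0.6):
    """Return the dominant venue if it exceeds the threshold share, else None."""
    venues = [p.get("venue") for p in papers if p.get("venue")]
    if not venues:
        return None
    venue, count = Counter(venues).most_common(1)[0]
    return venue if count / len(venues) > threshold else None
```

Pitfall 6 is the same check keyed on author or lab instead of venue.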
  1. 将检索结果第一页视为“全部文献”——需执行多轮关键词集群检索并追踪引用。
  2. 无依据结论——报告中所有非平凡陈述均需附带指向状态文件中论文的`[^id]`标记。
  3. 确认偏误——主动检索高引用论文的批评内容;详见阶段4的特殊情况。
  4. 预印本混淆——arXiv/bioRxiv为预印本;在报告中标记,并相应权衡证据权重。安全的lint约定:将标记和说明分开——`[^id] *(preprint)*`,而非`[^id, preprint]`(脚注括号内的逗号会破坏Markdown解析和`render_report.py --lint`检查)。
  5. 期刊单一化——若Top-N论文中>60%来自同一期刊/会议,需扩大数据源范围。
  6. 作者单一化——单一实验室或作者主导的情况同理。
  7. 时效性缺失——“前沿现状”场景需覆盖近2年的文献;检查是否有明确覆盖。
  8. MCP工具名称过时——MCP服务器会重命名工具;使用前务必列出可用工具。脚本路径稳定;MCP名称不稳定。
  9. 单次检索——每个关键词集群需预留≥3轮检索的预算,而非仅1轮。
  10. 跳过自我审查——急于发布整洁草稿时,正是阶段6发现问题最多的时候。

Example interaction

示例交互

A complete walk-through (CRISPR base editing for DMD — Phase 0 question restate through Phase 7 report and bibliography) lives in `references/example_run.md`. Read it once when you want to see what a healthy run looks like end-to-end; it's not load-bearing for routine sessions.
完整的流程演示(针对DMD的CRISPR碱基编辑——从阶段0问题重述到阶段7报告和参考文献)位于 `references/example_run.md`。当你想了解健康运行的端到端流程时,可阅读一次;日常会话无需依赖该内容。

References

参考文档

Modular documentation, loaded only when needed:
  • `references/search_strategies.md` — Boolean clusters, PICO, snowballing, saturation math
  • `references/source_selection.md` — which database for which question
  • `references/quality_assessment.md` — CRAAP, journal tier, retraction check, preprint handling
  • `references/report_templates.md` — the 5 archetypes with section-by-section guidance
  • `references/pitfalls.md` — long-form version of the pitfalls list with examples
  • `references/cli_contract.md` — JSON envelope shape, exit codes, `--schema` introspection, idempotency cache
  • `references/env_vars.md` — trust-boundary configuration (SCHOLAR_*, NCBI_API_KEY, EXA_API_KEY, S2_API_KEY, PAPER_FETCH_SCRIPT)
  • `references/state_schema.md` — `research_state.json` shape and ID-normalization rules
  • `references/example_run.md` — full end-to-end example (CRISPR base editing for DMD)
  • `references/agent_prompts/phase3_deep_read.md` — per-paper prompt for parallel agent fan-out in Phase 3
模块化文档,按需加载:
  • `references/search_strategies.md` ——布尔关键词集群、PICO框架、滚雪球检索、饱和计算
  • `references/source_selection.md` ——不同问题对应的数据库选择
  • `references/quality_assessment.md` ——CRAAP原则、期刊层级、撤稿检查、预印本处理
  • `references/report_templates.md` ——5种模板的分章节指导
  • `references/pitfalls.md` ——陷阱列表的详细版本,附带示例
  • `references/cli_contract.md` ——JSON包结构、退出码、`--schema`自省、幂等缓存
  • `references/env_vars.md` ——信任边界配置(SCHOLAR_*、NCBI_API_KEY、EXA_API_KEY、S2_API_KEY、PAPER_FETCH_SCRIPT)
  • `references/state_schema.md` ——`research_state.json`结构和ID标准化规则
  • `references/example_run.md` ——完整端到端示例(针对DMD的CRISPR碱基编辑)
  • `references/agent_prompts/phase3_deep_read.md` ——阶段3并行Agent调度的单篇论文提示模板