semanticscholar-skill


# Semantic Scholar Search Workflow

Search academic papers via the Semantic Scholar API using a structured 4-phase workflow.

**Critical rule:** NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled inside `s2.py` automatically.

## Phase 1: Understand & Plan

Parse the user's intent and choose a search strategy:

### Decision Tree

| User wants... | Strategy | Function |
|---|---|---|
| Broad topic exploration | Relevance search | `search_relevance()` |
| Precise technical terms, exact phrases | Bulk search with boolean operators | `search_bulk()` with `build_bool_query()` |
| Specific passages or methods | Snippet search | `search_snippets()` |
| Known paper by title | Title match | `match_title()` |
| Known paper by DOI/PMID/ArXiv | Direct lookup | `get_paper()` |
| Papers citing a known work | Citation traversal | `get_citations()` |
| Related to one paper | Single-seed recommendations | `find_similar()` |
| Related to multiple papers | Multi-seed recommendations | `recommend()` |
| Find a researcher | Author search | `search_authors()` |
| Researcher's profile | Author details | `get_author()` |
| Researcher's publications | Author papers | `get_author_papers()` |

### Query Construction Rules

- **Ambiguous terms** (e.g., "stem cells" could mean mesenchymal or stem-like T cells): use `build_bool_query()` with exact phrases and exclusions. Example: `build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])`
- **Multi-context queries** (e.g., "topic X in cancer AND autoimmunity"): plan separate searches, then deduplicate with `deduplicate()`
- **Broad topics**: use `search_relevance()` with filters (year, venue, fieldsOfStudy, minCitationCount)

### Plan Filters

| Filter | Use when |
|---|---|
| `year="2020-"` | Recent work only |
| `publication_date="2024-01-01:2024-06-30"` | Precise date range (YYYY-MM-DD) |
| `fields_of_study="Medicine"` | Restrict to domain |
| `min_citations=10` | Only established papers |
| `pub_types="Review"` | Find reviews/meta-analyses |
| `pub_types="ClinicalTrial"` | Clinical trials only |
| `open_access=True` | Only open access papers |

**Checkpoint:** Before proceeding, verify: (1) the search strategy matches user intent, (2) the filters are appropriate, (3) the query is specific enough to avoid irrelevant results.

## Phase 2: Execute Search

Write ONE Python script that begins with the standard prelude below, then runs all searches:

```python
# --- Standard prelude (use in every script) ---
import sys, os, glob

_candidates = [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
    *glob.glob(os.path.expanduser("~/.claude/plugins/**/semanticscholar-skill"), recursive=True),
    *glob.glob(os.path.expanduser("~/.codex/skills/semanticscholar-skill")),
    ".",
]
SKILL_DIR = next((p for p in _candidates if os.path.isfile(os.path.join(p, "s2.py"))), None)
if SKILL_DIR is None:
    raise RuntimeError("Cannot locate semanticscholar-skill (s2.py not found)")
sys.path.insert(0, SKILL_DIR)
from s2 import *
# --- end prelude ---
```

```python
# Build precise query
q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "IBD"],
    excluded=["mesenchymal"],
)
papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine")
papers = deduplicate(papers)
print(format_results(papers, "Stem-like CD4 T cells in IBD"))
```

Save to `/tmp/s2_search.py`, then run with `python3 /tmp/s2_search.py` in a single Bash call. Rate limiting, retries, and backoff are automatic inside `s2.py`.

**Checkpoint:** Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting.

## Worked Examples

Each example below assumes the standard prelude from Phase 2 is at the top of the script.

**Example 1: Author workflow** — "Find papers by Yann LeCun on self-supervised learning"

```python
authors = search_authors("Yann LeCun", max_results=5)
print(format_authors(authors))

# Use the first match's ID to get their papers
author_id = authors[0]["authorId"]
papers = get_author_papers(author_id, max_results=50)

# Filter locally for topic
ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()]
print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning"))
```

**Example 2: Citation chain with intent** — "Who cited the Transformer paper and how did they use it?"

```python
paper = get_paper("DOI:10.48550/arXiv.1706.03762")
print(f"Title: {paper['title']}, Citations: {paper['citationCount']}")

# Citation envelopes carry contextsWithIntent — keep them, don't flatten.
citing = get_citations(paper["paperId"], max_results=50)
citing.sort(key=lambda c: (c.get("citingPaper") or {}).get("citationCount", 0), reverse=True)
print(format_citations(citing, max_items=10))  # renders intent labels + context snippet
```

**Example 3: Multi-seed recommendations with BibTeX export** — "Find papers like these two but not about NLP"

```python
recs = recommend(
    positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"],
    negative_ids=["ARXIV:1706.03762"],
    limit=20,
)
print(format_results(recs, "Vision papers like Deep Learning & ViT, excluding NLP"))

# Export BibTeX for top results
bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles")
print(export_bibtex(bib_data))
```

## Phase 3: Summarize & Present

- Use `format_results()` for consistent output (summary table + top-10 details)
- If the user's language is Chinese, present summaries in Chinese
- Always note the total results count and the search strategy used
- Highlight the most relevant papers based on the user's specific question

## Phase 4: User Interaction Loop

After presenting results, always offer these options:

1. **Translate** — titles/summaries to Chinese (or another language)
2. **Details** — full abstract for specific paper numbers
3. **Refine** — narrow or expand the search with different terms/filters
4. **Similar** — find papers similar to a specific result (`find_similar()`)
5. **Citations** — who cited a specific paper and how (`get_citations()` + `format_citations()` for intent labels)
6. **Export** — save results via `export_bibtex()`, `export_markdown()`, or `export_json()`
7. **Done** — end the search session

Loop until the user says done. Each follow-up uses the same single-script pattern.


## API Quick Reference

### Helper Module (`s2.py`)
Use the standard prelude from Phase 2 at the top of every script, then call any of the functions below. The module's docstring (`help(s2)`, or read `s2.py`) lists each function by phase with a one-line summary.

### Paper Search Functions

| Function | Purpose | Max Results |
|---|---|---|
| `search_relevance(query, **filters)` | Simple broad search | 1,000 |
| `search_bulk(query, sort=..., **filters)` | Boolean precise search | 10,000,000 |
| `search_snippets(query, paper_ids=, authors=, inserted_before=, **filters)` | Full-text passage search | 1,000 |
| `match_title(title)` | Exact title match | 1 |
| `paper_autocomplete(query)` | Query-completion suggestions | |
| `get_paper(paper_id)` | Single paper details | |
| `get_citations(paper_id, max_results, publication_date=)` | Who cited this | 10,000 |
| `get_references(paper_id, max_results)` | What this cites | 10,000 |
| `find_similar(paper_id, limit, pool)` | Single-seed recommendations | 500 |
| `recommend(positive_ids, negative_ids, limit)` | Multi-seed recommendations | 500 |
| `batch_papers(ids, fields)` | Batch lookup (≤500) | |

### Author Functions

| Function | Purpose | Max Results |
|---|---|---|
| `search_authors(query, max_results)` | Find researchers by name | 1,000 |
| `get_author(author_id)` | Author profile (affiliations, h-index) | |
| `get_author_papers(author_id, max_results, publication_date=)` | Author's publications | 10,000 |
| `get_paper_authors(paper_id, max_results)` | Paper's author list | 1,000 |
| `batch_authors(ids, fields)` | Batch author lookup (≤1,000) | |

### Filter Parameters (kwargs)

snake_case kwargs are translated to S2 camelCase params automatically (`fields_of_study` → `fieldsOfStudy`, `min_citations` → `minCitationCount`, `publication_date` → `publicationDateOrYear`, `pub_types` → `publicationTypes`, `open_access` → `openAccessPdf`). Use snake_case here.

Available filters: `year`, `publication_date`, `venue`, `fields_of_study`, `min_citations`, `pub_types`, `open_access`

- `year`: `"2020-"`, `"-2019"`, `"2016-2020"`
- `publication_date`: `"2024-01-01:2024-06-30"` (YYYY-MM-DD range, open-ended OK)
- `pub_types`: `Review`, `JournalArticle`, `Conference`, `ClinicalTrial`, `MetaAnalysis`, `Dataset`, `Book`, `CaseReport`, `Editorial`, `LettersAndComments`, `News`, `Study`, `BookSection`
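The kwarg translation amounts to a rename pass. The mapping below is taken from the list above; `to_s2_params` itself is a hypothetical sketch of the behavior, not the s2.py code:

```python
# snake_case -> Semantic Scholar camelCase, per the documented mapping.
SNAKE_TO_S2 = {
    "fields_of_study": "fieldsOfStudy",
    "min_citations": "minCitationCount",
    "publication_date": "publicationDateOrYear",
    "pub_types": "publicationTypes",
    "open_access": "openAccessPdf",
}

def to_s2_params(**kwargs):
    """Rename snake_case kwargs to S2 param names; pass others through."""
    return {SNAKE_TO_S2.get(k, k): v for k, v in kwargs.items()}

params = to_s2_params(year="2020-", min_citations=10, pub_types="Review")
print(params)  # → {'year': '2020-', 'minCitationCount': 10, 'publicationTypes': 'Review'}
```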

### Boolean Query Syntax (bulk search only)

| Syntax | Example | Meaning |
|---|---|---|
| `"..."` | `"deep learning"` | Exact phrase |
| `+` | `+transformer` | Must include |
| `-` | `-survey` | Exclude |
| `\|` | `CNN \| RNN` | OR |
| `*` | `neuro*` | Prefix wildcard |
| `()` | `(CNN \| RNN) +attention` | Grouping |

Use `build_bool_query(phrases, required, excluded, or_terms)` to construct queries safely.
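Based on the syntax table, a combined query string might be assembled like this. `build_bool_query_sketch` is a hypothetical reconstruction for illustration, not the actual `build_bool_query()` from s2.py:

```python
def build_bool_query_sketch(phrases=(), required=(), excluded=(), or_terms=()):
    """Assemble a bulk-search query string using the documented operators."""
    parts = [f'"{p}"' for p in phrases]        # exact phrases
    parts += [f"+{t}" for t in required]       # must include
    parts += [f"-{t}" for t in excluded]       # exclude
    if or_terms:
        parts.append("(" + " | ".join(or_terms) + ")")  # grouped OR
    return " ".join(parts)

q = build_bool_query_sketch(
    phrases=["stem-like T cells"],
    required=["CD4"],
    excluded=["mesenchymal"],
    or_terms=["IBD", "colitis"],
)
print(q)  # → "stem-like T cells" +CD4 -mesenchymal (IBD | colitis)
```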

### Output Functions

| Function | Purpose |
|---|---|
| `format_table(papers, max_rows=30)` | Markdown summary table |
| `format_details(papers, max_papers=10)` | Detailed entries with TLDR/abstract |
| `format_citations(citations, max_items=10)` | Citation envelopes with intent labels + context snippet |
| `format_results(papers, query_desc)` | Combined: summary + table + details |
| `format_authors(authors, max_rows=20)` | Author table (name, affiliations, h-index) |
| `export_bibtex(papers)` | BibTeX entries (requires `citationStyles` field) |
| `export_markdown(papers, query_desc)` | Full markdown report saved to file |
| `export_json(papers, path)` | JSON export saved to file |
| `deduplicate(papers)` | Remove duplicates by paperId |

### Supported ID Formats

`DOI:10.1038/...`, `ARXIV:2106.15928`, `PMID:19872477`, `PMCID:PMC2323569`, `CorpusId:215416146`, `ACL:2020.acl-main.447`, `DBLP:conf/acl/...`, `MAG:3015453090`, `URL:https://...`
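A quick sanity check on ID prefixes can be sketched locally. The prefix set comes from the list above; `id_prefix` is a hypothetical helper (the real API may accept other spellings, and bare Semantic Scholar hex IDs carry no prefix):

```python
# Prefixes from the supported-ID list above.
SUPPORTED_PREFIXES = {
    "DOI", "ARXIV", "PMID", "PMCID", "CorpusId", "ACL", "DBLP", "MAG", "URL",
}

def id_prefix(paper_id):
    """Return the recognized prefix of a prefixed ID, or None for a bare ID."""
    prefix, sep, _ = paper_id.partition(":")
    return prefix if sep and prefix in SUPPORTED_PREFIXES else None

print(id_prefix("DOI:10.1038/nature14539"))  # → DOI
print(id_prefix("649def34f8be52c8b66281af98ae884c09aef38b"))  # → None
```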

### Paper Fields

Default: `title,year,citationCount,authors,venue,externalIds,tldr`

Additional: `abstract`, `references`, `citations`, `openAccessPdf`, `publicationDate`, `publicationVenue`, `fieldsOfStudy`, `s2FieldsOfStudy`, `journal`, `isOpenAccess`, `referenceCount`, `influentialCitationCount`, `citationStyles`, `embedding`, `textAvailability`

Author fields: `name`, `affiliations`, `paperCount`, `citationCount`, `hIndex`, `homepage`, `externalIds`, `papers`

### Rate Limiting

Handled automatically by `s2.py`: a 1.1 s gap between requests, exponential backoff (2s → 4s → 8s → 16s → 32s, max 60s) on 429/504 errors, up to 5 retries.
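The documented schedule works out to the delays below. This is a sketch of the schedule as described, not the actual retry code in s2.py:

```python
# Exponential backoff: start at 2 s, double each retry, cap at 60 s.
def backoff_delays(base=2.0, cap=60.0, retries=5):
    """Return the sleep duration before each retry attempt."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

print(backoff_delays())  # → [2.0, 4.0, 8.0, 16.0, 32.0]
```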

### Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| `HTTPError 403` | Missing or invalid API key | Verify `S2_API_KEY` is set: `echo $S2_API_KEY` |
| `HTTPError 429` after 5 retries | Sustained rate limit exceeded | Wait 60 s, reduce `max_results`, or split into smaller batches |
| `ModuleNotFoundError: s2` | Skill directory not on path | Verify the skill is installed at `~/.claude/skills/`, `~/.openclaw/skills/`, or as a Claude Code plugin under `~/.claude/plugins/` |
| `ModuleNotFoundError: requests` | `requests` not installed | `pip install requests` or `uv pip install requests` |
| 0 results returned | Query too specific or filters too narrow | Broaden the query, remove filters, or try `search_relevance()` instead of `search_bulk()` |
| `KeyError: 'data'` | Endpoint returned an error object | Check `r.get("message")` for API error details |
| `tldr` field is empty | Not all papers have a TLDR | Fall back to the `abstract` field; bulk search never returns `tldr` |