semanticscholar-skill
# Semantic Scholar Search Workflow
Search academic papers via the Semantic Scholar API using a structured 4-phase workflow.
**Critical rule:** NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled automatically inside `s2.py`.
## Phase 1: Understand & Plan
Parse the user's intent and choose a search strategy:
### Decision Tree
| User wants... | Strategy | Function |
|---|---|---|
| Broad topic exploration | Relevance search | `search_relevance()` |
| Precise technical terms, exact phrases | Bulk search with boolean operators | `search_bulk()` |
| Specific passages or methods | Snippet search | — |
| Known paper by title | Title match | — |
| Known paper by DOI/PMID/ArXiv | Direct lookup | `get_paper()` |
| Papers citing a known work | Citation traversal | `get_citations()` |
| Related to one paper | Single-seed recommendations | `find_similar()` |
| Related to multiple papers | Multi-seed recommendations | `recommend()` |
| Find a researcher | Author search | `search_authors()` |
| Researcher's profile | Author details | — |
| Researcher's publications | Author papers | `get_author_papers()` |
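The decision tree above can be mirrored as a small lookup table — a hypothetical routing sketch, where the intent keys are invented labels and the function names come from this skill's API reference:

```python
# Hypothetical routing sketch for the decision tree above. The intent keys
# are invented labels; the function names come from this skill's reference.
STRATEGY_BY_INTENT = {
    "broad_topic": "search_relevance",     # relevance search
    "exact_terms": "search_bulk",          # boolean bulk search
    "known_id": "get_paper",               # DOI/PMID/ArXiv lookup
    "citing_papers": "get_citations",      # citation traversal
    "similar_to_one": "find_similar",      # single-seed recommendations
    "similar_to_many": "recommend",        # multi-seed recommendations
    "find_author": "search_authors",       # author search
    "author_papers": "get_author_papers",  # researcher's publications
}

def pick_strategy(intent):
    """Fall back to broad relevance search for unrecognized intents."""
    return STRATEGY_BY_INTENT.get(intent, "search_relevance")
```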
### Query Construction Rules
- Ambiguous terms (e.g., "stem cells" could mean mesenchymal or stem-like T cells): Use `build_bool_query()` with exact phrases and exclusions
  - Example: `build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])`
- Multi-context queries (e.g., "topic X in cancer AND autoimmunity"): Plan separate searches, deduplicate with `deduplicate()`
- Broad topics: Use `search_relevance()` with filters (year, venue, fieldsOfStudy, minCitationCount)
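As a rough illustration of what a builder like `build_bool_query()` produces — a minimal sketch, assuming quoted phrases for exact match, `+` for required terms, `-` for exclusions, and `|` for alternatives; the actual `s2.py` helper may format terms differently:

```python
def build_bool_query(phrases=(), required=(), excluded=(), or_terms=()):
    """Assemble a bulk-search boolean query string (illustrative sketch)."""
    parts = [f'"{p}"' for p in phrases]                 # exact phrases
    parts += [f"+{t}" for t in required]                # must include
    parts += [f"-{t}" for t in excluded]                # must exclude
    if or_terms:
        parts.append("(" + " | ".join(or_terms) + ")")  # any of these
    return " ".join(parts)

q = build_bool_query(phrases=["stem-like T cells"],
                     required=["CD4", "TCF7"],
                     excluded=["mesenchymal"])
# q == '"stem-like T cells" +CD4 +TCF7 -mesenchymal'
```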
### Plan Filters
| Filter | Use when |
|---|---|
| `year` | Recent work only |
| `publication_date` | Precise date range (YYYY-MM-DD) |
| `fields_of_study` | Restrict to domain |
| `min_citations` | Only established papers |
| `pub_types="Review,MetaAnalysis"` | Find reviews/meta-analyses |
| `pub_types="ClinicalTrial"` | Clinical trials only |
| `open_access` | Only open access papers |
**Checkpoint:** Before proceeding, verify: (1) search strategy matches user intent, (2) filters are appropriate, (3) query is specific enough to avoid irrelevant results.
## Phase 2: Execute Search
Write ONE Python script that begins with the standard prelude below, then runs all searches:
```python
# --- Standard prelude (use in every script) ---
import sys, os, glob
_candidates = [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
    *glob.glob(os.path.expanduser("~/.claude/plugins/**/semanticscholar-skill"), recursive=True),
    *glob.glob(os.path.expanduser("~/.codex/skills/semanticscholar-skill")),
    ".",
]
SKILL_DIR = next((p for p in _candidates if os.path.isfile(os.path.join(p, "s2.py"))), None)
if SKILL_DIR is None:
    raise RuntimeError("Cannot locate semanticscholar-skill (s2.py not found)")
sys.path.insert(0, SKILL_DIR)
from s2 import *
# --- end prelude ---

# Build precise query
q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "IBD"],
    excluded=["mesenchymal"]
)
papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine")
papers = deduplicate(papers)
print(format_results(papers, "Stem-like CD4 T cells in IBD"))
```
Save to `/tmp/s2_search.py`, then run with `python3 /tmp/s2_search.py` in a single Bash call. Rate limiting, retries, and backoff are automatic inside `s2.py`.
**Checkpoint:** Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting.

## Worked Examples
Each example below assumes the standard prelude from Phase 2 is at the top of the script.
### Example 1: Author workflow — "Find papers by Yann LeCun on self-supervised learning"

```python
authors = search_authors("Yann LeCun", max_results=5)
print(format_authors(authors))

# Use the first match's ID to get their papers
author_id = authors[0]["authorId"]
papers = get_author_papers(author_id, max_results=50)

# Filter locally for topic
ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()]
print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning"))
```
### Example 2: Citation chain with intent — "Who cited the Transformer paper and how did they use it?"

```python
paper = get_paper("DOI:10.48550/arXiv.1706.03762")
print(f"Title: {paper['title']}, Citations: {paper['citationCount']}")

# Citation envelopes carry contextsWithIntent — keep them, don't flatten.
citing = get_citations(paper["paperId"], max_results=50)
citing.sort(key=lambda c: (c.get("citingPaper") or {}).get("citationCount", 0), reverse=True)
print(format_citations(citing, max_items=10))  # renders intent labels + context snippet
```
### Example 3: Multi-seed recommendations with BibTeX export — "Find papers like these two but not about NLP"

```python
recs = recommend(
    positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"],
    negative_ids=["ARXIV:1706.03762"],
    limit=20
)
print(format_results(recs, "Vision papers like Deep Learning & ViT, excluding NLP"))

# Export BibTeX for top results
bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles")
print(export_bibtex(bib_data))
```
## Phase 3: Summarize & Present
- Use `format_results()` for consistent output (summary table + top-10 details)
- If the user's language is Chinese, present summaries in Chinese
- Always note the total results count and the search strategy used
- Highlight the most relevant papers based on the user's specific question
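For orientation, here is a toy sketch of the kind of summary table `format_results()` renders — the real formatting lives in `s2.py`; this helper name and its column choices are illustrative, though the fields shown are the skill's defaults:

```python
def render_summary_table(papers, title):
    """Render a markdown summary table for a list of paper dicts (sketch only)."""
    lines = [
        f"## {title}",
        "",
        "| # | Title | Year | Citations | Venue |",
        "|---|---|---|---|---|",
    ]
    for i, p in enumerate(papers, 1):
        lines.append(
            f"| {i} | {p.get('title', '?')} | {p.get('year', '?')} "
            f"| {p.get('citationCount', 0)} | {p.get('venue', '') or '—'} |"
        )
    return "\n".join(lines)
```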
## Phase 4: User Interaction Loop
After presenting results, always offer these options:

- **Translate** — titles/summaries to Chinese (or another language)
- **Details** — full abstract for specific paper numbers
- **Refine** — narrow or expand the search with different terms/filters
- **Similar** — find papers similar to a specific result (`find_similar()`)
- **Citations** — who cited a specific paper and how (`get_citations()` + `format_citations()` for intent labels)
- **Export** — save results via `export_bibtex()`, `export_markdown()`, or `export_json()`
- **Done** — end the search session

Loop until the user says done. Each follow-up uses the same single-script pattern.
## API Quick Reference
### Helper Module (`s2.py`)

Use the standard prelude from Phase 2 at the top of every script. Then call any of the functions below — the module's docstring (`help(s2)`, or read `s2.py`) lists each by phase with one-line summaries.

### Paper Search Functions
| Function | Purpose | Max Results |
|---|---|---|
| `search_relevance()` | Simple broad search | 1,000 |
| `search_bulk()` | Boolean precise search | 10,000,000 |
| — | Full-text passage search | 1,000 |
| — | Exact title match | 1 |
| — | Query-completion suggestions | — |
| `get_paper()` | Single paper details | — |
| `get_citations()` | Who cited this | 10,000 |
| — | What this cites | 10,000 |
| `find_similar()` | Single-seed recommendations | 500 |
| `recommend()` | Multi-seed recommendations | 500 |
| `batch_papers()` | Batch lookup (≤500) | — |
### Author Functions
| Function | Purpose | Max Results |
|---|---|---|
| `search_authors()` | Find researchers by name | 1,000 |
| — | Author profile (affiliations, h-index) | — |
| `get_author_papers()` | Author's publications | 10,000 |
| — | Paper's author list | 1,000 |
| — | Batch author lookup (≤1,000) | — |
### Filter Parameters (kwargs)
snake_case kwargs are translated to S2 camelCase params automatically (`fields_of_study` → `fieldsOfStudy`, `min_citations` → `minCitationCount`, `publication_date` → `publicationDateOrYear`, `pub_types` → `publicationTypes`, `open_access` → `openAccessPdf`). Supported kwargs: `year`, `publication_date`, `venue`, `fields_of_study`, `min_citations`, `pub_types`, `open_access`. Use snake_case here.

- `year`: `"2020-"`, `"-2019"`, `"2016-2020"`
- `publication_date`: YYYY-MM-DD range, open-ended OK, e.g. `"2024-01-01:2024-06-30"`
- `pub_types`: `Review`, `JournalArticle`, `Conference`, `ClinicalTrial`, `MetaAnalysis`, `Dataset`, `Book`, `CaseReport`, `Editorial`, `LettersAndComments`, `News`, `Study`, `BookSection`
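The snake_case-to-camelCase translation can be pictured as a simple rename pass — a sketch under the assumption that `s2.py` does roughly this; the helper name `to_s2_params` is invented, not an actual `s2.py` function:

```python
# Sketch of the kwarg renaming described above; `to_s2_params` is an
# invented name, not an actual s2.py helper.
_KWARG_MAP = {
    "fields_of_study": "fieldsOfStudy",
    "min_citations": "minCitationCount",
    "publication_date": "publicationDateOrYear",
    "pub_types": "publicationTypes",
    "open_access": "openAccessPdf",
}

def to_s2_params(**kwargs):
    """Rename snake_case kwargs to S2's camelCase query parameters."""
    return {_KWARG_MAP.get(key, key): value for key, value in kwargs.items()}

params = to_s2_params(year="2020-", min_citations=50, fields_of_study="Medicine")
# params == {"year": "2020-", "minCitationCount": 50, "fieldsOfStudy": "Medicine"}
```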
### Boolean Query Syntax (bulk search only)
| Syntax | Example | Meaning |
|---|---|---|
| `"..."` | `"stem-like T cells"` | Exact phrase |
| `+term` | `+CD4` | Must include |
| `-term` | `-mesenchymal` | Exclude |
| `\|` | `CNN \| RNN` | OR |
| `*` | `immun*` | Prefix wildcard |
| `(...)` | `(CNN \| RNN) +attention` | Grouping |

Use `build_bool_query(phrases, required, excluded, or_terms)` to construct queries safely.

### Output Functions
| Function | Purpose |
|---|---|
| — | Markdown summary table |
| — | Detailed entries with TLDR/abstract |
| `format_citations()` | Citation envelopes with intent labels + context snippet |
| `format_results()` | Combined: summary + table + details |
| `format_authors()` | Author table (name, affiliations, h-index) |
| `export_bibtex()` | BibTeX entries (requires the `citationStyles` field) |
| `export_markdown()` | Full markdown report saved to file |
| `export_json()` | JSON export saved to file |
| `deduplicate()` | Remove duplicates by paperId |
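`deduplicate()` is described as removing duplicates by paperId; a minimal sketch of that behavior (the `s2.py` version may differ in detail):

```python
def deduplicate(papers):
    """Keep the first occurrence of each paperId; pass through items without one."""
    seen = set()
    unique = []
    for paper in papers:
        pid = paper.get("paperId")
        if pid in seen:
            continue  # already kept an earlier copy of this paper
        if pid is not None:
            seen.add(pid)
        unique.append(paper)
    return unique
```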
### Supported ID Formats
`DOI:10.1038/...`, `ARXIV:2106.15928`, `PMID:19872477`, `PMCID:PMC2323569`, `CorpusId:215416146`, `ACL:2020.acl-main.447`, `DBLP:conf/acl/...`, `MAG:3015453090`, `URL:https://...`

### Paper Fields
Default: `title`, `year`, `citationCount`, `authors`, `venue`, `externalIds`, `tldr`

Additional: `abstract`, `references`, `citations`, `openAccessPdf`, `publicationDate`, `publicationVenue`, `fieldsOfStudy`, `s2FieldsOfStudy`, `journal`, `isOpenAccess`, `referenceCount`, `influentialCitationCount`, `citationStyles`, `embedding`, `textAvailability`

Author fields: `name`, `affiliations`, `paperCount`, `citationCount`, `hIndex`, `homepage`, `externalIds`, `papers`

### Rate Limiting
Handled automatically by `s2.py`: a 1.1s gap between requests, exponential backoff (2s→4s→8s→16s→32s, max 60s) on 429/504 errors, up to 5 retries.
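The retry behavior described above can be sketched as follows — an illustrative reimplementation, not `s2.py`'s actual code; `send` stands in for any zero-argument HTTP call returning an object with a `status_code`:

```python
import time

def request_with_backoff(send, max_retries=5, base_delay=2.0, max_delay=60.0):
    """Retry on 429/504 with exponential backoff: 2s, 4s, 8s, 16s, 32s (capped at 60s)."""
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in (429, 504):
            return response  # success, or a non-retryable error
        if attempt < max_retries:
            time.sleep(min(base_delay * (2 ** attempt), max_delay))
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```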
### Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| — | Missing or invalid API key | Verify the API key |
| 429 after 5 retries | Sustained rate limit exceeded | Wait 60s, reduce request volume |
| — | Skill directory not on path | Verify the skill is installed at one of the prelude's candidate paths |
| 0 results returned | Query too specific or filters too narrow | Broaden the query, remove filters, try `search_relevance()` |
| — | Endpoint returned an error object | Check the error message in the response |
| — | Not all papers have TLDR | Fall back to `abstract` |