Semantic Scholar Search Workflow
Search academic papers via the Semantic Scholar API using a structured 4-phase workflow.
Critical rule: NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled automatically inside `s2.py`.
Phase 1: Understand & Plan
Parse the user's intent and choose a search strategy:
Decision Tree
| User wants... | Strategy | Function |
|---|---|---|
| Broad topic exploration | Relevance search | `search_relevance` |
| Precise technical terms, exact phrases | Bulk search with boolean operators | `search_bulk` with `build_bool_query` |
| Specific passages or methods | Snippet search | `search_snippets` |
| Known paper by title | Title match | |
| Known paper by DOI/PMID/ArXiv | Direct lookup | `get_paper` |
| Papers citing a known work | Citation traversal | `get_citations` |
| Related to one paper | Single-seed recommendations | `find_similar` |
| Related to multiple papers | Multi-seed recommendations | `recommend` |
| Find a researcher | Author search | `search_authors` |
| Researcher's profile | Author details | |
| Researcher's publications | Author papers | `get_author_papers` |
Query Construction Rules
- Ambiguous terms (e.g., "stem cells" could mean mesenchymal or stem-like T cells): Use `build_bool_query` with exact phrases and exclusions
  - Example: `build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])`
- Multi-context queries (e.g., "topic X in cancer AND autoimmunity"): Plan separate searches, then merge with `deduplicate()`
- Broad topics: Use `search_relevance` with filters (year, venue, fieldsOfStudy, minCitationCount)
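For multi-context queries, the merge-and-dedupe step can be pictured with plain dicts — a self-contained sketch where `dedupe_by_id` is a local stand-in for the skill's `deduplicate()` and the dicts mimic API results:

```python
# Sketch of the multi-context pattern: run one search per context, then
# merge and de-duplicate by paperId. dedupe_by_id is a local stand-in
# for the skill's deduplicate().
def dedupe_by_id(papers):
    seen, unique = set(), []
    for p in papers:
        pid = p.get("paperId")
        if pid not in seen:
            seen.add(pid)
            unique.append(p)
    return unique

cancer_hits = [{"paperId": "a1", "title": "Tregs in cancer"},
               {"paperId": "b2", "title": "Shared biology"}]
autoimmune_hits = [{"paperId": "b2", "title": "Shared biology"},
                   {"paperId": "c3", "title": "Tregs in autoimmunity"}]
merged = dedupe_by_id(cancer_hits + autoimmune_hits)
print([p["paperId"] for p in merged])  # ['a1', 'b2', 'c3']
```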
Plan Filters
| Filter | Use when |
|---|---|
| `year="2022-"` | Recent work only |
| `publication_date="2024-01-01:2024-06-30"` | Precise date range (YYYY-MM-DD) |
| `fields_of_study="Medicine"` | Restrict to domain |
| `min_citation_count=50` | Only established papers |
| `pub_types="Review"` | Find reviews/meta-analyses |
| `pub_types="ClinicalTrial"` | Clinical trials only |
| `open_access_pdf=True` | Only open access papers |
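Date-range filters are easy to mistype. A quick pre-flight check of the `YYYY-MM-DD:YYYY-MM-DD` format — `valid_date_range` is a hypothetical local helper, not part of the skill:

```python
import re

# Hypothetical pre-flight check for the publication_date filter format:
# "YYYY-MM-DD:YYYY-MM-DD", with either side omissible for open-ended ranges.
_DATE = r"\d{4}-\d{2}-\d{2}"
_RANGE = re.compile(rf"^({_DATE})?:({_DATE})?$")

def valid_date_range(s):
    return bool(_RANGE.match(s)) and s != ":"

print(valid_date_range("2024-01-01:2024-06-30"))  # True
print(valid_date_range("2024-01-01:"))            # True (open-ended)
print(valid_date_range("2024/01/01"))             # False
```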
Checkpoint: Before proceeding, verify: (1) search strategy matches user intent, (2) filters are appropriate, (3) query is specific enough to avoid irrelevant results.
Phase 2: Execute Search
Write ONE Python script that begins with the standard prelude below, then runs all searches:
```python
# --- Standard prelude (use in every script) ---
import sys, os, glob
_candidates = [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
    *glob.glob(os.path.expanduser("~/.claude/plugins/**/semanticscholar-skill"), recursive=True),
    *glob.glob(os.path.expanduser("~/.codex/skills/semanticscholar-skill")),
    ".",
]
SKILL_DIR = next((p for p in _candidates if os.path.isfile(os.path.join(p, "s2.py"))), None)
if SKILL_DIR is None:
    raise RuntimeError("Cannot locate semanticscholar-skill (s2.py not found)")
sys.path.insert(0, SKILL_DIR)
from s2 import *
# --- end prelude ---

# Build precise query
q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "IBD"],
    excluded=["mesenchymal"]
)
papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine")
papers = deduplicate(papers)
print(format_results(papers, "Stem-like CD4 T cells in IBD"))
```
Save to `/tmp/s2_search.py`, then run with `python3 /tmp/s2_search.py` in a single Bash call. Rate limiting, retries, and backoff are automatic inside `s2.py`.
Checkpoint: Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting.
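One way to make the broaden-on-zero-results rule systematic is a relaxation ladder: retry with progressively fewer filters. A self-contained sketch, where `run` stands in for any of the skill's search functions and `fake` simulates one:

```python
# Relaxation ladder: try the strict query first, then drop filters one
# at a time until something comes back. run is a stand-in for a real
# search function such as search_bulk.
def search_with_fallback(run, query, attempts):
    for kwargs in attempts:
        results = run(query, **kwargs)
        if results:
            return results, kwargs
    return [], {}

attempts = [
    {"year": "2020-", "fields_of_study": "Medicine", "min_citation_count": 50},
    {"year": "2020-", "fields_of_study": "Medicine"},  # drop citation floor
    {},                                                # no filters at all
]

# Fake runner: only succeeds once all filters are removed.
fake = lambda q, **kw: ["paper"] if not kw else []
results, used = search_with_fallback(fake, "rare topic", attempts)
print(results, used)  # ['paper'] {}
```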
Worked Examples
Each example below assumes the standard prelude from Phase 2 is at the top of the script.
Example 1: Author workflow — "Find papers by Yann LeCun on self-supervised learning"
```python
authors = search_authors("Yann LeCun", max_results=5)
print(format_authors(authors))

# Use the first match's ID to get their papers
author_id = authors[0]["authorId"]
papers = get_author_papers(author_id, max_results=50)

# Filter locally for topic
ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()]
print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning"))
```
Example 2: Citation chain with intent — "Who cited the Transformer paper and how did they use it?"
```python
paper = get_paper("DOI:10.48550/arXiv.1706.03762")
print(f"Title: {paper['title']}, Citations: {paper['citationCount']}")

# Citation envelopes carry contextsWithIntent — keep them, don't flatten.
citing = get_citations(paper["paperId"], max_results=50)
citing.sort(key=lambda c: (c.get("citingPaper") or {}).get("citationCount", 0), reverse=True)
print(format_citations(citing, max_items=10))  # renders intent labels + context snippet
```
Example 3: Multi-seed recommendations with BibTeX export — "Find papers like these two but not about NLP"
```python
recs = recommend(
    positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"],
    negative_ids=["ARXIV:1706.03762"],
    limit=20
)
print(format_results(recs, "Vision papers like Deep Learning & ViT, excluding NLP"))

# Export BibTeX for top results
bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles")
print(export_bibtex(bib_data))
```
Phase 3: Summarize & Present
- Use `format_results` for consistent output (summary table + top-10 details)
- If user's language is Chinese, present summaries in Chinese
- Always note total results count and search strategy used
- Highlight most relevant papers based on the user's specific question
Phase 4: User Interaction Loop
After presenting results, always offer these options:
- Translate — titles/summaries to Chinese (or other language)
- Details — full abstract for specific paper numbers
- Refine — narrow or expand search with different terms/filters
- Similar — find papers similar to a specific result (`find_similar`)
- Citations — who cited a specific paper and how (`get_citations` + `format_citations` for intent labels)
- Export — save results via `export_bibtex`, `export_markdown`, or `export_json`
- Done — end search session
Loop until user says done. Each follow-up uses the same single-script pattern.
API Quick Reference
Helper Module (`s2.py`)
Use the standard prelude from Phase 2 at the top of every script. Then call any of the functions below — the module's docstring (`import s2; help(s2)`, or read `s2.py` directly) lists each by phase with one-line summaries.
Paper Search Functions
| Function | Purpose | Max Results |
|---|---|---|
| `search_relevance(query, **filters)` | Simple broad search | 1,000 |
| `search_bulk(query, sort=..., **filters)` | Boolean precise search | 10,000,000 |
| `search_snippets(query, paper_ids=, authors=, inserted_before=, **filters)` | Full-text passage search | 1,000 |
| | Exact title match | 1 |
| `paper_autocomplete(query)` | Query-completion suggestions | — |
| `get_paper(paper_id)` | Single paper details | — |
| `get_citations(paper_id, max_results, publication_date=)` | Who cited this | 10,000 |
| `get_references(paper_id, max_results)` | What this cites | 10,000 |
| `find_similar(paper_id, limit, pool)` | Single-seed recommendations | 500 |
| `recommend(positive_ids, negative_ids, limit)` | Multi-seed recommendations | 500 |
| `batch_papers(ids, fields)` | Batch lookup (≤500) | — |
Author Functions
| Function | Purpose | Max Results |
|---|---|---|
| `search_authors(query, max_results)` | Find researchers by name | 1,000 |
| | Author profile (affiliations, h-index) | — |
| `get_author_papers(author_id, max_results, publication_date=)` | Author's publications | 10,000 |
| `get_paper_authors(paper_id, max_results)` | Paper's author list | 1,000 |
| `batch_authors(ids, fields)` | Batch author lookup (≤1000) | — |
Filter Parameters (kwargs)
snake_case kwargs are translated to S2 camelCase params automatically (`fields_of_study` → `fieldsOfStudy`, `publication_date` → `publicationDateOrYear`, `min_citation_count` → `minCitationCount`, `pub_types` → `publicationTypes`, `open_access_pdf` → `openAccessPdf`). Use snake_case here.
- `year`: `"2020"`, `"2018-2022"`, `"2015-"`
- `publication_date`: `"2024-01-01:2024-06-30"` (YYYY-MM-DD range, open-ended OK)
- `pub_types`: `Review`, `JournalArticle`, `CaseReport`, `ClinicalTrial`, `Conference`, `Dataset`, `Editorial`, `LettersAndComments`, `MetaAnalysis`, `News`, `Study`, `Book`, `BookSection`
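The translation can be pictured as a key-rename pass over kwargs. The mapping below is illustrative only — `s2.py` owns the authoritative table:

```python
# Sketch of the snake_case -> camelCase kwarg translation.
# The mapping here is illustrative; s2.py owns the authoritative table.
SNAKE_TO_CAMEL = {
    "fields_of_study": "fieldsOfStudy",
    "publication_date": "publicationDateOrYear",
    "min_citation_count": "minCitationCount",
    "pub_types": "publicationTypes",
    "open_access_pdf": "openAccessPdf",
}

def to_api_params(**kwargs):
    # Keys without a mapping (e.g. year) pass through unchanged.
    return {SNAKE_TO_CAMEL.get(k, k): v for k, v in kwargs.items()}

params = to_api_params(year="2020-", fields_of_study="Medicine", min_citation_count=50)
print(params)  # {'year': '2020-', 'fieldsOfStudy': 'Medicine', 'minCitationCount': 50}
```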
Boolean Query Syntax (bulk search only)
| Syntax | Example | Meaning |
|---|---|---|
| `"..."` | `"stem-like T cells"` | Exact phrase |
| `+` | `+CD4` | Must include |
| `-` | `-mesenchymal` | Exclude |
| `\|` | `colitis \| IBD` | OR |
| `*` | `immun*` | Prefix wildcard |
| `( )` | `(colitis \| IBD) +CD4` | Grouping |
Use `build_bool_query(phrases, required, excluded, or_terms)` to construct safely.
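As a rough illustration of how those pieces could combine into one query string (`sketch_bool_query` is a hypothetical local reimplementation; the skill's actual output may differ):

```python
# Hypothetical sketch of how a boolean query could be assembled from the
# pieces build_bool_query accepts: quoted phrases, +required, -excluded,
# and a parenthesized OR group.
def sketch_bool_query(phrases=(), required=(), excluded=(), or_terms=()):
    parts = [f'"{p}"' for p in phrases]
    parts += [f"+{t}" for t in required]
    parts += [f"-{t}" for t in excluded]
    if or_terms:
        parts.append("(" + " | ".join(or_terms) + ")")
    return " ".join(parts)

q = sketch_bool_query(phrases=["stem-like T cells"],
                      required=["CD4"],
                      excluded=["mesenchymal"],
                      or_terms=["colitis", "IBD"])
print(q)  # "stem-like T cells" +CD4 -mesenchymal (colitis | IBD)
```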
Output Functions
| Function | Purpose |
|---|---|
| `format_table(papers, max_rows=30)` | Markdown summary table |
| `format_details(papers, max_papers=10)` | Detailed entries with TLDR/abstract |
| `format_citations(citations, max_items=10)` | Citation envelopes with intent labels + context snippet |
| `format_results(papers, query_desc)` | Combined: summary + table + details |
| `format_authors(authors, max_rows=20)` | Author table (name, affiliations, h-index) |
| `export_bibtex(papers)` | BibTeX entries (requires `citationStyles` field) |
| `export_markdown(papers, query_desc)` | Full markdown report saved to file |
| `export_json(papers, path)` | JSON export saved to file |
| `deduplicate(papers)` | Remove duplicates by paperId |
Supported ID Formats
`DOI:`, `ARXIV:`, `PMID:`, `PMCID:`, `CorpusId:`, `ACL:`, `MAG:`, `URL:` prefixes, or a bare Semantic Scholar paperId.
Paper Fields
Default: `title,year,citationCount,authors,venue,externalIds,tldr`
Additional: `abstract`, `url`, `corpusId`, `referenceCount`, `influentialCitationCount`, `isOpenAccess`, `openAccessPdf`, `fieldsOfStudy`, `s2FieldsOfStudy`, `publicationTypes`, `publicationDate`, `publicationVenue`, `journal`, `citationStyles`, `embedding`
Author fields: `name`, `url`, `affiliations`, `homepage`, `paperCount`, `citationCount`, `hIndex`, `externalIds`
Rate Limiting
Handled automatically by `s2.py`: 1.1s gap between requests, exponential backoff (2s→4s→8s→16s→32s, max 60s) on 429/504 errors, up to 5 retries.
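The documented schedule can be reproduced in a few lines — a sketch only; the real retry logic lives in `s2.py`:

```python
# Sketch of the documented backoff schedule: exponential doubling from
# 2s, capped at 60s, for up to 5 retries on 429/504 responses.
def backoff_delays(retries=5, base=2.0, cap=60.0):
    return [min(base * (2 ** i), cap) for i in range(retries)]

print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0]
```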
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `403 Forbidden` | Missing or invalid API key | Verify the API key environment variable is set |
| `429` after 5 retries | Sustained rate limit exceeded | Wait 60s, reduce `max_results`, or split into smaller batches |
| `ModuleNotFoundError: s2` | Skill directory not on path | Verify the skill is installed at `~/.claude/skills/semanticscholar-skill`, `~/.openclaw/skills/semanticscholar-skill`, or as a Claude Code plugin under `~/.claude/plugins/` |
| `ModuleNotFoundError: requests` | `requests` not installed | `pip install requests` or `pip3 install requests` |
| 0 results returned | Query too specific or filters too narrow | Broaden query, remove filters, try `search_relevance` instead of `search_bulk` |
| Error object in response | Endpoint returned error object | Check the `error` message for API error details |
| `tldr` field is empty | Not all papers have TLDR | Fall back to `abstract` field; bulk search never returns `tldr` |
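The TLDR fallback from the last row can be written as a tiny helper — a sketch assuming paper dicts shaped like API results, where `tldr` (when present) is an object with a `text` key:

```python
# Sketch of the TLDR -> abstract fallback: prefer the tldr text, then
# the abstract, then a placeholder. Paper dicts mimic API results.
def summary_text(paper):
    tldr = paper.get("tldr") or {}
    return tldr.get("text") or paper.get("abstract") or "(no summary)"

print(summary_text({"tldr": {"text": "Short take."}}))  # Short take.
print(summary_text({"abstract": "Full abstract."}))     # Full abstract.
print(summary_text({}))                                 # (no summary)
```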