literature-search-openalex
OpenAlex Skill
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://developers.openalex.org/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
- `.env` file: Make sure the `.env` file exists in your home directory. Create one if it does not exist.
- `OPENALEX_API_KEY` (optional but recommended): Enables the OpenAlex Premium API with higher rate limits. The skill works without it (using the free "polite pool"). If the variable is missing from `.env`, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting `ENV_FILE` with the resolved literal path to the `.env` file:

  ```bash
  printf "Enter OpenAlex API key (typing hidden): " && read -s key && echo && echo "OPENALEX_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
  ```

  The scripts load credentials automatically via `dotenv`. NEVER read, print, or inspect the `.env` file or its variables (e.g. no `cat`, `grep`, `echo`, `printenv`, or `os.environ.get` on keys). Credentials must stay out of the agent's context. See the Rate Limits section for more details.
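Below is a minimal Python sketch of the notification and `.env` steps. It assumes the agent can run Python with the skill directory as a known path. The notice wording is hypothetical, because the skill only says what the notice must convey. The timestamp format (ISO 8601, UTC) is an arbitrary choice. `ensure_env_file` only creates `~/.env` when it is missing. It never opens the file for reading, which keeps credentials out of context as required above.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical wording: the skill specifies the content, not the exact text.
NOTICE = (
    "Please review the OpenAlex terms at https://developers.openalex.org/ "
    "and always check the license of any paper retrieved by this skill "
    "for restrictions."
)


def ensure_license_notification(skill_dir: Path) -> bool:
    """Show the notice once; return True if it was shown on this call."""
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # already notified in an earlier session
    print(NOTICE)  # (1) prominently notify the user
    stamp = datetime.now(timezone.utc).isoformat()
    # (2) record the notification text and a timestamp
    marker.write_text(f"{stamp}\n{NOTICE}\n", encoding="utf-8")
    return True


def ensure_env_file(home: Path) -> Path:
    """Create ~/.env if missing without reading its contents.

    Path.touch only updates the timestamp of an existing file.
    """
    env_file = home / ".env"
    env_file.touch(exist_ok=True)
    return env_file
```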
Core Rules
- List Sources. If this skill is used, state this in the output AND list the URLs of all papers used to produce the output.
- Resolve before filter. NEVER filter by name. Always `resolve` a name to an ID first, then use that ID in `--filter`.
- Use the CLI only. Never call the API via `curl`/`urllib`. The CLI handles retries and rate limiting.
- No fabrication. Never invent OpenAlex IDs or DOIs. Use `resolve`/`get` to look them up. Report empty results accurately.
- API key. If a command returns 401/429 or you need high-volume queries, follow the prerequisite instructions above to help the user add `OPENALEX_API_KEY` to the `.env` file. Keys are at OpenAlex.org → account settings.
- Keep output small. Always use `--select` and `--per-page 5–10` for overview queries. Redirect `filter` output to a file (`> results.json`), then slim it with `jq` before reading it into context.
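Where `jq` is unavailable, the same slimming step can be done in Python. The sketch below also collects the URLs that the List Sources rule asks for. It makes three assumptions:

- The saved file has a top-level `results` array, which is what the `jq` filter in Common Workflows reads via `.results[]`.
- `id` and `doi` hold OpenAlex and DOI URLs, as they do in the OpenAlex API.
- A field excluded by `--select` comes back as `None`. Include `doi` in `--select` if you want DOI URLs.

```python
import json
from pathlib import Path


def slim_works(path: Path) -> list:
    """Keep a few fields per work, mirroring the jq one-liner."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return [
        {
            "id": w.get("id"),
            "title": w.get("display_name"),
            "year": w.get("publication_year"),
            "citations": w.get("cited_by_count"),
        }
        for w in data.get("results", [])
    ]


def source_urls(path: Path) -> list:
    """URLs to list as sources: the DOI URL if present, else the OpenAlex ID URL."""
    data = json.loads(path.read_text(encoding="utf-8"))
    urls = []
    for w in data.get("results", []):
        url = w.get("doi") or w.get("id")
        if url and url not in urls:  # keep order, drop duplicates
            urls.append(url)
    return urls
```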
Rate Limits
- With key: ~10 req/s, $1/day free budget.
- Without key: Very limited, $0.01/day budget.
| Operation | Cost |
|---|---|
| Singleton | Free |
| `--filter` | $0.0001 |
| `--search` | $0.001 |
|  | $0.01 |
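The daily budgets above convert into request counts by simple division. This sketch assumes the whole budget goes to a single price tier and ignores the ~10 req/s cap. `Decimal` keeps prices like $0.0001 exact, whereas floats would introduce rounding error.

```python
from decimal import Decimal

# Per-request prices (USD) from the table above; singleton requests are free.
PRICES = [Decimal("0.0001"), Decimal("0.001"), Decimal("0.01")]
# Daily budgets (USD) from the bullets above.
DAILY_BUDGETS = {"with key": Decimal("1"), "without key": Decimal("0.01")}


def max_requests(budget: Decimal, price: Decimal) -> int:
    """Number of requests at `price` that fit in `budget`, rounded down."""
    return int(budget // price)


for label, budget in DAILY_BUDGETS.items():
    counts = {str(p): max_requests(budget, p) for p in PRICES}
    print(label, counts)
# with key {'0.0001': 10000, '0.001': 1000, '0.01': 100}
# without key {'0.0001': 100, '0.001': 10, '0.01': 1}
```

Without a key, the budget covers only about 10 searches a day. That is one concrete reason to set up `OPENALEX_API_KEY`.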
CLI Reference
`uv run scripts/openalex_cli.py [--api-key KEY] <command> [flags]`

Entity types (shared across commands): `works`, `authors`, `sources`, `institutions`, `topics`, `domains`, `fields`, `subfields`, `sdgs`, `countries`, `continents`, `languages`, `keywords`, `publishers`, `funders`, `work-types`, `source-types`, `institution-types`, `licenses`.

Commands
- `resolve <entity> <query>`: Name → ID candidates. Returns `id`, `display_name`, and `hint`. Use `--per-page N` for more candidates.
- `get <entity> <id>`: Full metadata for one entity. Accepts a short ID (`W2741809807`), a full URL, or a DOI URL. Use `--select` to limit fields.
- `filter <entity>`: Search/filter entities. Key flags:
  - `--search <query>`: Full-text search (10× the cost of `--filter`).
  - `--filter <expr>`: Filter expressions. Use `,` for AND and `|` for OR.
  - `--sort <field:dir>`: Sort results (e.g., `cited_by_count:desc`).
  - `--select <fields>`: Limit the fields returned in the output.
  - `--group-by <field>`: Aggregate results by a specific field.
  - `--per-page <N>`: Number of results per page (default 25, max 100).
  - `--page <N>`: Page number to retrieve.
  - `--sample <N>`: Get a random sample of up to 10,000 results.
  - `--seed <N>`: Seed for reproducible sampling.
- `download-pdf <work-id> <output-path>`: Download a PDF (requires API key). Falls back to alternative locations if the primary `pdf_url` fails. Whenever you download a PDF, verify that it is not empty or corrupted.
- `rate-limit`: Check current rate limit status (requires API key).
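One way to do the post-download check is a header/trailer heuristic. A real PDF starts with `%PDF-` and ends with an `%%EOF` marker, while a failed download is often empty or an HTML error page saved under a `.pdf` name. This is a sketch, not a full validator. It does not parse the file, so some structurally damaged PDFs will still pass. `looks_like_valid_pdf` is a hypothetical helper name.

```python
from pathlib import Path


def looks_like_valid_pdf(path: Path) -> bool:
    """Heuristic: the file exists, is non-empty, and has PDF header/trailer markers."""
    if not path.is_file():
        return False
    size = path.stat().st_size
    if size == 0:
        return False  # empty download
    with path.open("rb") as f:
        head = f.read(5)
        # %%EOF should sit near the end; allow some trailing bytes.
        f.seek(max(0, size - 1024))
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail
```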
Search Tips
- If `resolve` returns no matches, try alternate spellings or abbreviations.
- If `--search` returns 0 results, try broader terms (max 3 retries).
- If `resolve` returns multiple candidates, present them to the user with `display_name` and `hint` for manual selection.
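When `resolve` returns several candidates, a numbered menu built from `display_name` and `hint` keeps the choice with the user. The sketch assumes each candidate is a dict with `id`, `display_name`, and `hint` keys, which are the fields `resolve` is documented to return. How the CLI nests these in its JSON output is not specified here. `format_candidates` and `pick` are hypothetical helper names.

```python
def format_candidates(candidates: list) -> str:
    """Numbered menu of resolve candidates for manual selection."""
    lines = []
    for i, c in enumerate(candidates, start=1):
        hint = c.get("hint") or "no hint"
        lines.append(f"{i}. {c.get('display_name')} ({hint}) [{c.get('id')}]")
    return "\n".join(lines)


def pick(candidates: list, choice: int) -> str:
    """Return the ID of the user's 1-based choice, rejecting out-of-range input."""
    if not 1 <= choice <= len(candidates):
        raise ValueError(f"choice must be between 1 and {len(candidates)}")
    return candidates[choice - 1]["id"]
```

The ID returned by `pick` is what goes into `--filter`, in line with the resolve-before-filter rule.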
Entity References
Consult `references/` for valid filter, sort, and group-by fields per entity:
- Works, Authors, Sources
- Institutions, Topics, Taxonomy
- Geo & Language, Publishers & Funders
- Type Values
Common Workflows
Author's works (resolve → filter)

```bash
uv run scripts/openalex_cli.py resolve authors "Geoffrey Hinton"
# Use the author ID returned by resolve in the filter below.
uv run scripts/openalex_cli.py filter works \
  --filter "authorships.author.id:A5108093963" \
  --sort "cited_by_count:desc" --per-page 10 > papers.json
jq '[.results[] | {id, title: .display_name, year: .publication_year, citations: .cited_by_count}]' papers.json
```
DOI lookup

```bash
uv run scripts/openalex_cli.py get works "https://doi.org/10.1038/s41586-021-03819-2"
```
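`get` accepts a short ID, a full URL, or a DOI URL, but users often paste a bare DOI or a `doi:`-prefixed one. The sketch below turns those inputs into accepted forms. It refuses anything that looks like a name, since names should go through `resolve` instead. It assumes short IDs are one entity letter followed by digits (as in `W2741809807`). `to_get_id` is a hypothetical helper name.

```python
import re

# Assumed short-ID shape: one entity letter followed by digits, e.g. W2741809807.
SHORT_ID = re.compile(r"[A-Za-z]\d+")


def to_get_id(raw: str) -> str:
    """Normalize user input into a short ID, full URL, or DOI URL for `get`."""
    s = raw.strip()
    if s.lower().startswith(("http://", "https://")):
        return s  # full OpenAlex URL or DOI URL: pass through unchanged
    if s.lower().startswith("doi:"):
        s = s[4:].strip()
    if s.startswith("10."):
        return "https://doi.org/" + s  # bare DOI -> DOI URL
    if SHORT_ID.fullmatch(s):
        return s.upper()
    raise ValueError(f"{raw!r} is not an OpenAlex ID or DOI; use resolve instead")
```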
Bulk DOI lookup (up to 100)

```bash
uv run scripts/openalex_cli.py filter works \
  --filter "doi:10.1234/a|10.1234/b|10.1234/c" --per-page 100 > results.json
```
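For more than 100 DOIs, one approach is to split them into batches of at most 100. Each batch is OR-ed together with `|`, which matches `--per-page 100` so every batch fits on a single page. The sketch strips `https://doi.org/` and `doi:` prefixes to get the bare form used in the example above. It lowercases DOIs before removing duplicates, since DOI names are case-insensitive. `doi_filters` is a hypothetical helper name.

```python
def doi_filters(dois: list, batch_size: int = 100) -> list:
    """Build one --filter value per batch of at most `batch_size` DOIs."""
    prefixes = ("https://doi.org/", "http://doi.org/", "doi:")
    cleaned = []
    for raw in dois:
        d = raw.strip().lower()  # DOI names are case-insensitive
        for prefix in prefixes:
            if d.startswith(prefix):
                d = d[len(prefix):]
                break
        if d:
            cleaned.append(d)
    unique = list(dict.fromkeys(cleaned))  # de-duplicate, keep first-seen order
    return [
        "doi:" + "|".join(unique[i:i + batch_size])
        for i in range(0, len(unique), batch_size)
    ]
```

Pass each returned string as a quoted `--filter` argument, because the shell would otherwise treat `|` as a pipe.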
Institutional impact by year

```bash
uv run scripts/openalex_cli.py resolve institutions "MIT"
# Use the institution ID returned by resolve in the filter below.
uv run scripts/openalex_cli.py filter works \
  --filter "authorships.institutions.id:I63966007" \
  --group-by "publication_year" > mit_by_year.json
```
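To turn `mit_by_year.json` into a year → count mapping, this sketch assumes the CLI passes through the OpenAlex API's `group_by` array unchanged. In that array, each entry carries `key`, `key_display_name`, and `count`. Non-numeric keys, such as an unknown-year bucket if one is present, are skipped. `counts_by_year` is a hypothetical helper name.

```python
import json
from pathlib import Path


def counts_by_year(path: Path) -> dict:
    """Map publication year -> work count from a --group-by result, oldest first."""
    data = json.loads(path.read_text(encoding="utf-8"))
    counts = {}
    for group in data.get("group_by", []):
        key = str(group.get("key", ""))
        if key.isdigit():  # skip non-year buckets
            counts[int(key)] = group.get("count", 0)
    return dict(sorted(counts.items()))
```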
Random sample

```bash
uv run scripts/openalex_cli.py filter works \
  --filter "publication_year:2023,is_oa:true" \
  --sample 100 --seed 42 > results.json
```
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 401 | Unauthorized | Help the user add an API key to `.env` |
| 403 | Plan upgrade needed | Inform the user; see https://openalex.org/pricing |
| 404 | Not found | Verify the ID; try `resolve` first |
| 429 | Rate limited | Wait and retry; suggest adding an API key to `.env` |

Known premium-only filters: `from_updated_date`, `to_updated_date`.

Never fabricate results on empty responses. Report them accurately and suggest alternate search terms.
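The empty-response rule and the Search Tips retry limit fit into a simple loop. Try the original term, then up to three broader ones. If every attempt comes back empty, say so and list what was tried. `run_search` is a hypothetical callback, for example a wrapper that runs the CLI's `filter --search` and returns the parsed `results` list. The sketch itself makes no API calls.

```python
from typing import Callable, List, Optional, Tuple


def search_with_fallbacks(
    terms: List[str],
    run_search: Callable[[str], List[dict]],
    max_retries: int = 3,
) -> Tuple[Optional[str], List[dict], str]:
    """Try terms[0], then up to `max_retries` broader terms, in order.

    Returns (term_used, results, report). If nothing matches, results is
    empty and the report states that plainly instead of inventing anything.
    """
    tried = []
    for term in terms[: 1 + max_retries]:
        tried.append(term)
        results = run_search(term)
        if results:
            return term, results, f"{len(results)} result(s) for {term!r}"
    report = "No results for: " + ", ".join(repr(t) for t in tried)
    return None, [], report
```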