literature-search-openalex

OpenAlex Skill

Prerequisites

  1. uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, (1) prominently notify the user to check the terms at https://developers.openalex.org/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file, recording the notification text and a timestamp.
  3. .env file: Make sure the .env file exists in your home directory. Create one if it does not exist.
  4. OPENALEX_API_KEY (optional but recommended): Enables the OpenAlex Premium API with higher rate limits. The skill works without it (using the free "polite pool"). If the variable is missing from .env, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting ENV_FILE with the resolved literal path to the .env file:
    printf "Enter OpenAlex API key (typing hidden): " && read -s key && echo && echo "OPENALEX_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
    The scripts load credentials automatically via dotenv. NEVER read, print, or inspect the .env file or its variables (e.g. no cat, grep, echo, printenv, or os.environ.get on keys). Credentials must stay out of the agent's context. See the Rate Limits section for more details.
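
The User Notification step (item 2) could be implemented roughly as follows. This is a minimal sketch, not the skill's own code: the function name ensure_license_notification and the exact notice wording are assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

# Notice text paraphrased from the prerequisite above (wording is an assumption).
NOTICE = (
    "Check the OpenAlex terms at https://developers.openalex.org/ and always "
    "check the license of each retrieved paper for any restrictions."
)


def ensure_license_notification(skill_dir: Path) -> bool:
    """Return True if the user was notified on this call (first run only)."""
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # already notified on an earlier run
    print(f"NOTICE: {NOTICE}")  # (1) notify the user prominently
    stamp = datetime.now(timezone.utc).isoformat()
    # (2) record the notification text and timestamp
    marker.write_text(f"{stamp}\n{NOTICE}\n", encoding="utf-8")
    return True
```

Note that the sketch touches only LICENSE_NOTIFICATION.txt and never the .env file, in line with the credential rule above.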

Core Rules

  1. List Sources. If this skill is used, ensure this is mentioned in the output AND list the URLs of all papers that were used in producing the output.
  2. Resolve before filter. NEVER filter by name. Always resolve a name to an ID first, then use that ID in --filter.
  3. Use the CLI only. Never call the API via curl / urllib. The CLI handles retries and rate limiting.
  4. No fabrication. Never invent OpenAlex IDs or DOIs. Use resolve / get to look them up. Report empty results accurately.
  5. API key. If a command returns 401/429 or you need high-volume queries, follow the prerequisite instructions above to help the user add OPENALEX_API_KEY to the .env file. Keys are at OpenAlex.org → account settings.
  6. Keep output small. Always use --select and --per-page 5–10 for overview queries. Redirect filter output to a file (> results.json), then slim it with jq before reading it into context.
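
Rule 1 (List Sources) could be handled with a small formatting helper like the one below. It is a sketch under assumptions: format_sources is a hypothetical name, and each work record is assumed to carry id, doi, and display_name keys (for example because they were requested with --select).

```python
def format_sources(works: list[dict]) -> str:
    """Build a sources section with one URL per work, skipping duplicates."""
    lines = ["Sources (retrieved with the OpenAlex skill):"]
    seen = set()
    for work in works:
        # Prefer the DOI URL; fall back to the OpenAlex URL stored in "id".
        url = work.get("doi") or work.get("id")
        if not url or url in seen:
            continue
        seen.add(url)
        title = work.get("display_name") or "(untitled)"
        lines.append(f"- {title}: {url}")
    return "\n".join(lines)
```

The helper only reformats records that were actually returned by the CLI, so it cannot introduce IDs or DOIs that were not looked up (rule 4).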

Rate Limits

  • With key: ~10 req/s, $1/day free budget.
  • Without key: Very limited, $0.01/day budget.

| Operation | Cost (USD per call) |
|---|---|
| Singleton get | Free |
| filter | $0.0001 |
| --search / resolve | $0.001 |
| download-pdf | $0.01 |
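
To see how far the daily budget goes, the table can be turned into a quick estimate. The sketch below assumes the listed costs apply per call; estimate_cost and fits_budget are hypothetical helper names, and the "search" key stands for a filter call that uses --search.

```python
from decimal import Decimal

# Per-call costs in USD, copied from the table above.
COST = {
    "get": Decimal("0"),
    "filter": Decimal("0.0001"),
    "search": Decimal("0.001"),   # filter with --search
    "resolve": Decimal("0.001"),
    "download-pdf": Decimal("0.01"),
}


def estimate_cost(calls: dict[str, int]) -> Decimal:
    """Sum the cost of a planned batch, e.g. {"resolve": 1, "filter": 20}."""
    return sum((COST[op] * n for op, n in calls.items()), Decimal("0"))


def fits_budget(calls: dict[str, int], daily_budget: str = "1.00") -> bool:
    """True if the batch stays within the daily budget ($1.00 with a key)."""
    return estimate_cost(calls) <= Decimal(daily_budget)
```

For example, one resolve, 20 filter calls, and 5 PDF downloads come to 0.001 + 0.002 + 0.05 = $0.053, well inside the $1/day budget but more than five times the $0.01/day available without a key.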

CLI Reference

uv run scripts/openalex_cli.py [--api-key KEY] <command> [flags]
Entity types (shared across commands): works, authors, sources, institutions, topics, domains, fields, subfields, sdgs, countries, continents, languages, keywords, publishers, funders, work-types, source-types, institution-types, licenses
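
When the CLI is driven from a script, the entity type can be checked before any call is made. The following is an illustrative sketch only: cli_args is a hypothetical helper, and it covers the commands that take an entity argument (resolve, get, filter).

```python
# Entity types copied from the list above.
ENTITY_TYPES = {
    "works", "authors", "sources", "institutions", "topics", "domains",
    "fields", "subfields", "sdgs", "countries", "continents", "languages",
    "keywords", "publishers", "funders", "work-types", "source-types",
    "institution-types", "licenses",
}


def cli_args(command: str, entity: str, *rest: str) -> list[str]:
    """Build an argument list for the CLI, rejecting unknown entity types."""
    if entity not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity!r}")
    return ["uv", "run", "scripts/openalex_cli.py", command, entity, *rest]
```

The resulting list can be handed to subprocess.run(args, capture_output=True, text=True, check=True); passing a list rather than a shell string avoids quoting problems with names that contain spaces.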

Commands

resolve <entity> <query>: Name → ID candidates. Returns id, display_name, hint. Use --per-page N for more candidates.
get <entity> <id>: Full metadata for one entity. Accepts short ID (W2741809807), full URL, or DOI URL. Use --select to limit fields.
filter <entity>: Search/filter entities. Key flags are:
  • --search <query>: Full-text search (10× cost of --filter)
  • --filter <expr>: Filter expressions. Use "," for AND and "|" for OR.
  • --sort <field:dir>: Sort results (e.g., cited_by_count:desc)
  • --select <fields>: Limit the fields returned in the output.
  • --group-by <field>: Aggregate results by a specific field.
  • --per-page <N>: Number of results per page (default 25, max 100).
  • --page <N>: Specify the page number to retrieve.
  • --sample <N>: Get a random sample of up to 10,000 results.
  • --seed <N>: Seed for reproducible sampling.
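
The AND/OR syntax of --filter can be assembled programmatically. The helper below is a minimal sketch (build_filter is a hypothetical name, not part of the CLI): conditions are joined with a comma for AND, and a list of values for one field is joined with a pipe for OR.

```python
def build_filter(conditions: dict[str, object]) -> str:
    """Join conditions with ',' (AND); list values are joined with '|' (OR)."""
    parts = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            # The examples in this document write booleans in lowercase.
            value = "true" if value else "false"
        parts.append(f"{field}:{value}")
    return ",".join(parts)
```

For instance, build_filter({"publication_year": 2023, "is_oa": True}) yields publication_year:2023,is_oa:true, the same expression used in the random-sample workflow.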
download-pdf <work-id> <output-path>: Download PDF (requires API key). Falls back to alternative pdf_url locations if primary fails. Whenever you download a PDF, verify it is not empty or corrupted.
rate-limit: Check current rate limit status (requires API key).
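
The download-pdf entry asks for a check that the file is not empty or corrupted. One cheap way to approximate that is sketched below; looks_like_pdf is a hypothetical helper and only a heuristic (it looks for the PDF header and end-of-file marker), so a passing file is not guaranteed to be fully valid.

```python
from pathlib import Path


def looks_like_pdf(path: Path, tail_bytes: int = 2048) -> bool:
    """Heuristic check: file exists, is non-empty, has PDF start/end markers."""
    if not path.is_file():
        return False
    size = path.stat().st_size
    if size == 0:
        return False
    with path.open("rb") as fh:
        head = fh.read(1024)                 # header should be near the start
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()                     # EOF marker should be near the end
    return b"%PDF-" in head and b"%%EOF" in tail
```

This catches the common failure modes: a zero-byte file, an HTML error page saved under a .pdf name, and a download that was cut off before the trailer.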

Search Tips

  • If resolve returns no matches, try alternate spellings or abbreviations.
  • If --search returns 0 results, try broader terms (max 3 retries).
  • If resolve returns multiple candidates, present them to the user with display_name and hint for manual selection.
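
The "broader terms, max 3 retries" tip can be expressed as a bounded loop. In this sketch, run_search is a hypothetical callable standing in for whatever invokes the CLI with --search and returns the list of results; the list of progressively broader queries is supplied by the caller.

```python
from typing import Callable, Iterable, Optional


def search_with_fallbacks(
    run_search: Callable[[str], list],
    queries: Iterable[str],
    max_retries: int = 3,
) -> tuple[Optional[str], list]:
    """Try queries from narrowest to broadest; stop at the first non-empty result."""
    for attempt, query in enumerate(queries):
        if attempt > max_retries:  # first attempt plus at most max_retries retries
            break
        results = run_search(query)
        if results:
            return query, results
    return None, []  # nothing found: report the empty result as-is
```

Returning (None, []) rather than a guess keeps the behaviour consistent with the no-fabrication rule: an empty result is reported as empty.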

Entity References

Consult references/ for valid filter, sort, and group-by fields per entity:
  • Works
  • Authors
  • Sources
  • Institutions
  • Topics
  • Taxonomy
  • Geo & Language
  • Publishers & Funders
  • Type Values

Common Workflows

Author's works (resolve → filter)

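# 1. Resolve the name to candidate author IDs
uv run scripts/openalex_cli.py resolve authors "Geoffrey Hinton"
# 2. Filter works by the ID returned in step 1, most cited first
uv run scripts/openalex_cli.py filter works \
  --filter "authorships.author.id:A5108093963" \
  --sort "cited_by_count:desc" --per-page 10 > papers.json
# 3. Slim the saved output before reading it into context
jq '[.results[] | {id, title: .display_name, year: .publication_year, citations: .cited_by_count}]' papers.json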
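
Where jq is not available, the same projection can be done in Python. This is a sketch that mirrors the jq expression above; slim_results is a hypothetical helper, and it assumes the saved file has the results array that the jq expression reads.

```python
import json


def slim_results(raw: str) -> list[dict]:
    """Mirror the jq projection: keep id, title, year, and citation count."""
    payload = json.loads(raw)
    return [
        {
            "id": work.get("id"),
            "title": work.get("display_name"),
            "year": work.get("publication_year"),
            "citations": work.get("cited_by_count"),
        }
        for work in payload.get("results", [])
    ]
```

A typical call would be slim_results(Path("papers.json").read_text(encoding="utf-8")), with Path imported from pathlib.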

DOI lookup

uv run scripts/openalex_cli.py get works "https://doi.org/10.1038/s41586-021-03819-2"

Bulk DOI lookup (up to 100)

uv run scripts/openalex_cli.py filter works \
  --filter "doi:10.1234/a|10.1234/b|10.1234/c" --per-page 100 > results.json
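
For more than 100 DOIs, the list has to be split into batches, one filter call each. The helper below is a minimal sketch (doi_filters is a hypothetical name) that produces one doi:a|b|c expression per batch, matching the --per-page 100 limit.

```python
def doi_filters(dois: list[str], batch_size: int = 100) -> list[str]:
    """Return one 'doi:a|b|c' filter expression per batch of DOIs."""
    unique = list(dict.fromkeys(dois))  # drop duplicates, keep original order
    return [
        "doi:" + "|".join(unique[i:i + batch_size])
        for i in range(0, len(unique), batch_size)
    ]
```

Each returned string goes into its own filter call via --filter, so 250 DOIs need three calls.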

Institutional impact by year

uv run scripts/openalex_cli.py resolve institutions "MIT"
uv run scripts/openalex_cli.py filter works \
  --filter "authorships.institutions.id:I63966007" \
  --group-by "publication_year" > mit_by_year.json
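
The grouped output can then be reduced to a year-to-count mapping. This sketch assumes the CLI passes through the OpenAlex API's grouped response, in which a group_by array holds entries with key and count; that output shape is an assumption and should be checked against the actual file. counts_by_year is a hypothetical helper.

```python
import json


def counts_by_year(raw: str) -> dict[int, int]:
    """Map publication year to work count, sorted by year."""
    payload = json.loads(raw)
    counts = {}
    for group in payload.get("group_by", []):
        key = str(group.get("key", ""))
        if key.isdigit():  # skip non-year buckets such as "unknown"
            counts[int(key)] = group.get("count", 0)
    return dict(sorted(counts.items()))
```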

Random sample

uv run scripts/openalex_cli.py filter works \
  --filter "publication_year:2023,is_oa:true" \
  --sample 100 --seed 42 > results.json

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 401 | Unauthorized | Help user add API key to .env (see prereqs) |
| 403 | Plan upgrade needed | Inform user; see https://openalex.org/pricing |
| 404 | Not found | Verify ID; try resolve first |
| 429 | Rate limited | Wait and retry; suggest adding API key to .env |

Known premium-only filters: from_updated_date, to_updated_date.

Never fabricate results on empty responses. Report accurately and suggest alternate search terms.
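
The error table maps naturally onto a lookup. The following sketch is illustrative only (ACTIONS, next_action, and should_retry are hypothetical names); it encodes the table as data so that an unexpected status code is reported rather than silently handled.

```python
# Status code -> recommended action, copied from the table above.
ACTIONS = {
    401: "Help the user add OPENALEX_API_KEY to .env (see Prerequisites).",
    403: "Inform the user that a plan upgrade is needed: https://openalex.org/pricing",
    404: "Verify the ID; try resolve first.",
    429: "Wait and retry; suggest adding an API key to .env.",
}


def next_action(status: int) -> str:
    """Return the recommended action for an HTTP status code."""
    return ACTIONS.get(status, f"Unexpected status {status}; report it to the user.")


def should_retry(status: int) -> bool:
    """Only rate limiting (429) is worth retrying after a wait."""
    return status == 429
```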