arxiv

arXiv Research

Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.

Quick Reference

Action	Command
Search papers	`curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"`
Get specific paper	`curl "https://export.arxiv.org/api/query?id_list=2402.03300"`
Read abstract (web)	`web_extract(urls=["https://arxiv.org/abs/2402.03300"])`
Read full paper (PDF)	`web_extract(urls=["https://arxiv.org/pdf/2402.03300"])`

Searching Papers

The API returns Atom XML. Parse with `grep` or `sed`, or pipe through `python3` for clean output.

Basic search

bash

curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"

Clean output (parse XML to readable format)

bash

curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f'   Authors: {authors}')
    print(f'   Published: {published} | Categories: {cats}')
    print(f'   Abstract: {summary}...')
    print(f'   PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"

Search Query Syntax

Prefix	Searches	Example
`all:`	All fields	`all:transformer+attention`
`ti:`	Title	`ti:large+language+models`
`au:`	Author	`au:vaswani`
`abs:`	Abstract	`abs:reinforcement+learning`
`cat:`	Category	`cat:cs.AI`
`co:`	Comment	`co:accepted+NeurIPS`

Boolean operators

# AND (default when using +)
search_query=all:transformer+attention

# OR
search_query=all:GPT+OR+all:BERT

# AND NOT
search_query=all:language+model+ANDNOT+all:vision

# Exact phrase (encode the quotes as %22 when the URL sits inside a double-quoted shell string)
search_query=ti:"chain+of+thought"

# Combined
search_query=au:hinton+AND+cat:cs.LG
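
The prefixes and operators above combine into a single `search_query` string. As a minimal sketch, assuming a hypothetical helper named `build_search_url` (it is not part of this skill), `urllib.parse.urlencode` can do the escaping so that spaces become `+` and phrase quotes become `%22`:

```python
from urllib.parse import urlencode

# Endpoint used by the curl examples in this document.
ARXIV_API = "https://export.arxiv.org/api/query"


def build_search_url(query, start=0, max_results=5, sort_by=None, sort_order=None):
    """Return an arXiv API URL for `query` (hypothetical helper, illustration only).

    Write the query with plain spaces, e.g. 'au:hinton AND cat:cs.LG'.
    urlencode turns spaces into '+' and double quotes into '%22'.
    """
    params = {"search_query": query, "start": start, "max_results": max_results}
    if sort_by:
        params["sortBy"] = sort_by
    if sort_order:
        params["sortOrder"] = sort_order
    return f"{ARXIV_API}?{urlencode(params)}"


# Build (but do not send) two of the queries shown above.
print(build_search_url('ti:"chain of thought"'))
print(build_search_url("all:language model ANDNOT all:vision", max_results=10))
```

`urlencode` also escapes `:` as `%3A`, which is ordinary percent-encoding that a server decodes before interpreting the query. The resulting URL can be passed to `curl -s` unchanged.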

Sort and Pagination

Parameter	Options
`sortBy`	`relevance` , `lastUpdatedDate` , `submittedDate`
`sortOrder`	`ascending` , `descending`
`start`	Result offset (0-based)
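`max_results`	Number of results per request (default 10; the API caps a query at 30000 results, and the arXiv API manual asks for slices of at most 2000 per call)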

bash

# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
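
`start` and `max_results` together page through a result set that is larger than one response. The sketch below only computes the paging windows; `page_windows` is a hypothetical helper name, and the page size of 100 is an arbitrary choice for illustration. Each window would correspond to one request, spaced according to the Rate Limits table.

```python
def page_windows(total, page_size=100):
    """Yield (start, max_results) pairs that cover `total` results in order.

    Hypothetical helper: it performs no network access, it only does the
    offset arithmetic for the `start` and `max_results` parameters.
    """
    if total < 0 or page_size <= 0:
        raise ValueError("total must be >= 0 and page_size must be > 0")
    for start in range(0, total, page_size):
        # The last window may be shorter than page_size.
        yield start, min(page_size, total - start)


# Query-string fragments for fetching 250 results, 100 at a time.
for start, size in page_windows(250, page_size=100):
    print(f"start={start}&max_results={size}")
```

Appending each fragment to a search URL, together with a fixed `sortBy` and `sortOrder`, should keep the pages consistent between requests.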

Fetching Specific Papers

bash

# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"

# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"

BibTeX Generation

After fetching metadata for a paper, generate a BibTeX entry:

bash

curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None: sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f'  title     = {{{title}}},')
print(f'  author    = {{{authors}}},')
print(f'  year      = {{{year}}},')
print(f'  eprint    = {{{raw_id}}},')
print(f'  archivePrefix = {{arXiv}},')
print(f'  primaryClass  = {{{primary}}},')
print(f'  url       = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"

Reading Paper Content

After finding a paper, read it:

# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])

# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])

For local PDF processing, see the `ocr-and-documents` skill.

Common Categories

Category	Field
`cs.AI`	Artificial Intelligence
`cs.CL`	Computation and Language (NLP)
`cs.CV`	Computer Vision
`cs.LG`	Machine Learning
`cs.CR`	Cryptography and Security
`stat.ML`	Machine Learning (Statistics)
`math.OC`	Optimization and Control
`physics.comp-ph`	Computational Physics

Full list: https://arxiv.org/category_taxonomy

Helper Script

The `scripts/search_arxiv.py` script handles XML parsing and provides clean output:

bash

python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345

No dependencies — uses only Python stdlib.

Semantic Scholar (Citations, Related Papers, Author Profiles)

arXiv doesn't provide citation data or recommendations. Use the Semantic Scholar API for that — free, no key needed for basic use (1 req/sec), returns JSON.

Get paper details + citations

bash

# By arXiv ID
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300?fields=title,authors,citationCount,referenceCount,influentialCitationCount,year,abstract" | python3 -m json.tool

# By Semantic Scholar paper ID or DOI
curl -s "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example?fields=title,citationCount"
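
The paper endpoints in this section share one URL shape: a prefixed paper ID, an optional sub-resource such as `citations` or `references`, and a comma-separated `fields` list. A small sketch of that shape, assuming a hypothetical helper `s2_paper_url` that only builds the string and sends nothing:

```python
from urllib.parse import quote

# Base path taken from the curl examples in this document.
S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def s2_paper_url(paper_id, fields, relation=None, limit=None):
    """Build a Semantic Scholar paper URL (hypothetical helper, illustration only).

    paper_id: prefixed ID such as 'arXiv:2402.03300' or 'DOI:10.1234/example'.
    relation: optional sub-resource, e.g. 'citations' or 'references'.
    """
    # Keep ':' and '/' literal so the result matches the curl examples.
    path = f"{S2_GRAPH}/paper/{quote(paper_id, safe=':/')}"
    if relation:
        path += f"/{relation}"
    query = f"fields={','.join(fields)}"
    if limit is not None:
        query += f"&limit={limit}"
    return f"{path}?{query}"


print(s2_paper_url("arXiv:2402.03300", ["title", "citationCount"]))
print(s2_paper_url("arXiv:2402.03300", ["title", "year"], relation="citations", limit=10))
```

Keeping the field list in one place makes it easier to request only what is needed, which keeps responses small.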

Get citations OF a paper (who cited it)

bash

curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool

Get references FROM a paper (what it cites)

bash

curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool

Search papers (alternative to arXiv search, returns JSON)

bash

curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool

Get paper recommendations

bash

curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool
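
For use from Python rather than curl, the same POST can be prepared with the standard library. This is a sketch only: `build_recommendation_request` is a hypothetical helper name, and the request is constructed but deliberately not sent.

```python
import json
from urllib.request import Request

# Endpoint copied from the curl example above.
RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"


def build_recommendation_request(positive_ids, negative_ids=()):
    """Prepare (without sending) the recommendations POST request.

    Hypothetical helper: the JSON body mirrors the curl example above.
    """
    body = json.dumps(
        {"positivePaperIds": list(positive_ids), "negativePaperIds": list(negative_ids)}
    ).encode("utf-8")
    return Request(
        RECS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_recommendation_request(["arXiv:2402.03300"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the example runs offline.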

Author profile

bash

curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool

Useful Semantic Scholar fields

`title`, `authors`, `year`, `abstract`, `citationCount`, `referenceCount`, `influentialCitationCount`, `isOpenAccess`, `openAccessPdf`, `fieldsOfStudy`, `publicationVenue`, `externalIds` (contains arXiv ID, DOI, etc.)

Complete Research Workflow

1. Discover: `python scripts/search_arxiv.py "your topic" --sort date --max 10`
2. Assess impact: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"`
3. Read abstract: `web_extract(urls=["https://arxiv.org/abs/ID"])`
4. Read full paper: `web_extract(urls=["https://arxiv.org/pdf/ID"])`
5. Find related work: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"`
6. Get recommendations: POST to Semantic Scholar recommendations endpoint
7. Track authors: `curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"`

Rate Limits

API	Rate	Auth
arXiv	~1 req / 3 seconds	None needed
Semantic Scholar	1 req / second	None (100/sec with API key)
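
One way to stay inside these limits from a script is to enforce a minimum gap between calls. The sketch below is an assumption about how a caller might do that; the `Throttle` class is hypothetical, and the 3.0 and 1.0 second intervals simply restate the table above.

```python
import time


class Throttle:
    """Block so that successive wait() calls are at least min_interval seconds apart.

    Hypothetical helper. `clock` and `sleep` are injectable so the behaviour
    can be checked without real delays.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Intervals restate the table above: arXiv about 3 s, Semantic Scholar 1 s.
arxiv_throttle = Throttle(3.0)
s2_throttle = Throttle(1.0)
```

Calling `arxiv_throttle.wait()` immediately before each arXiv request (and `s2_throttle.wait()` before each Semantic Scholar request) spaces the calls out; the first call returns without sleeping.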

Notes

arXiv returns Atom XML — use the helper script or parsing snippet for clean output
Semantic Scholar returns JSON; pipe through `python3 -m json.tool` for readability
arXiv IDs: old format (`hep-th/0601001`) vs new (`2402.03300`)
PDF: `https://arxiv.org/pdf/{id}`, Abstract: `https://arxiv.org/abs/{id}`
HTML (when available): `https://arxiv.org/html/{id}`
For local PDF processing, see the `ocr-and-documents` skill
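
The ID formats and URL patterns in these notes can be captured in a few lines. This is a sketch under stated assumptions: the two regular expressions are an approximation of the ID shapes shown above, not an official grammar, and `id_style` and `arxiv_urls` are hypothetical helper names.

```python
import re

# Approximate patterns (assumptions, not an official specification).
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")               # e.g. 2402.03300
OLD_STYLE = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")   # e.g. hep-th/0601001


def id_style(arxiv_id):
    """Classify an arXiv ID as 'new', 'old', or 'unknown'."""
    if NEW_STYLE.match(arxiv_id):
        return "new"
    if OLD_STYLE.match(arxiv_id):
        return "old"
    return "unknown"


def arxiv_urls(arxiv_id):
    """Return the abstract, PDF, and HTML URLs for an ID, per the patterns above."""
    base = "https://arxiv.org"
    return {
        "abs": f"{base}/abs/{arxiv_id}",
        "pdf": f"{base}/pdf/{arxiv_id}",
        "html": f"{base}/html/{arxiv_id}",  # only exists for some papers
    }


for example in ("2402.03300", "hep-th/0601001"):
    print(example, id_style(example), arxiv_urls(example)["pdf"])
```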

ID Versioning

`arxiv.org/abs/1706.03762` always resolves to the latest version
`arxiv.org/abs/1706.03762v1` points to a specific immutable version
When generating citations, preserve the version suffix you actually read to prevent citation drift (a later version may substantially change content)
The API `<id>` field returns the versioned URL (e.g., `http://arxiv.org/abs/1706.03762v7`)
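
To keep the version you actually read, it helps to separate the base ID from the version suffix explicitly. A minimal sketch, assuming a hypothetical helper `split_arxiv_id` that accepts either a bare ID or the URL found in the API `<id>` field:

```python
import re


def split_arxiv_id(id_or_url):
    """Return (base_id, version) where version is an int, or None if absent.

    Hypothetical helper. Accepts a bare ID ('1706.03762v7') or the URL from
    the API <id> field ('http://arxiv.org/abs/1706.03762v7').
    """
    raw = id_or_url.strip().split("/abs/")[-1]
    # Lazy base, then an optional trailing 'v<digits>' suffix.
    match = re.fullmatch(r"(.+?)(?:v(\d+))?", raw)
    base, version = match.group(1), match.group(2)
    return base, (int(version) if version else None)


base, version = split_arxiv_id("http://arxiv.org/abs/1706.03762v7")
print(base, version)
# Rebuild a pinned URL for a citation.
print(f"https://arxiv.org/abs/{base}v{version}")
```

Storing `base` and `version` separately lets a citation pin the version while other lookups use the unversioned ID.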

Withdrawn Papers

Papers can be withdrawn after submission. When this happens:

The `<summary>` field contains a withdrawal notice (look for "withdrawn" or "retracted")
Metadata fields may be incomplete
Always check the summary before treating a result as a valid paper
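
The check described above can be sketched as a keyword filter over the Atom feed. Everything named here is an assumption for illustration: `looks_withdrawn` and `split_withdrawn` are hypothetical helpers, and `SAMPLE` is a made-up feed with placeholder IDs, not real API output.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.w3.org/2005/Atom"}
WITHDRAWAL_HINTS = ("withdrawn", "retracted")  # keywords named in this section


def looks_withdrawn(summary):
    """Heuristic: True if the summary text mentions a withdrawal keyword."""
    text = (summary or "").lower()
    return any(hint in text for hint in WITHDRAWAL_HINTS)


def split_withdrawn(atom_xml):
    """Return (valid, withdrawn) lists of (id, title) parsed from Atom XML text."""
    root = ET.fromstring(atom_xml)
    valid, withdrawn = [], []
    for entry in root.findall("a:entry", NS):
        # findtext with a default tolerates entries with missing fields.
        paper_id = entry.findtext("a:id", default="", namespaces=NS).strip()
        title = " ".join(entry.findtext("a:title", default="", namespaces=NS).split())
        summary = entry.findtext("a:summary", default="", namespaces=NS)
        (withdrawn if looks_withdrawn(summary) else valid).append((paper_id, title))
    return valid, withdrawn


# Made-up sample feed for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>Made-up valid paper</title>
    <summary>We study things.</summary>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/0000.00002v2</id>
    <title>Made-up withdrawn paper</title>
    <summary>This paper has been withdrawn by the author.</summary>
  </entry>
</feed>"""

valid, withdrawn = split_withdrawn(SAMPLE)
print("valid:", valid)
print("withdrawn:", withdrawn)
```

A substring match can produce false positives, for example an abstract that discusses retracted studies, so flagged entries are worth a manual look rather than silent removal.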
