arxiv-paper-translator
arXiv Paper Translator
Translate academic papers from arXiv by downloading the LaTeX source, translating the content while preserving its structure, and generating a translated PDF and, optionally, a technical report.
Workflow Overview
- Download & Extract - Get LaTeX source from arXiv
- Translate - Translate English narrative content to Chinese following LaTeX-specific rules
- REVIEW PHASE - MUST be completed before compiling
- CJK Support & Localize Labels - Add xeCJK, localize labels
- Compile .tex Files - Generate translated PDF using XeLaTeX
- Report - Create technical summary document
Prerequisites
Check the local XeLaTeX installation:
```bash
xelatex --version
```
If it is not installed, make sure Docker is installed and available:
```bash
docker --version
```
This skill requires XeLaTeX to compile the translated PDFs. If XeLaTeX is not installed locally, Docker is used instead.
We recommend the xu-cheng/latex-docker Docker images, e.g. the full TeX Live distribution (linux/amd64 only).
NOTICE: ghcr.1ms.run is a mirror of ghcr.io.
```bash
docker pull ghcr.1ms.run/xu-cheng/texlive-debian:20260101 --platform linux/amd64
# Equivalent pull directly from ghcr.io:
# docker pull ghcr.io/xu-cheng/texlive-debian:20260101 --platform linux/amd64
```
**If neither local XeLaTeX nor Docker is installed, STOP trying to run this skill.**
Then ask the user: "XeLaTeX or Docker is required to compile translated PDFs. Which one do you want to use? I'll help you set it up."
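The availability check above can be sketched as a small POSIX shell function. `detect_tex_backend` is a hypothetical name, not part of this skill's files; like `xelatex --version` and `docker --version`, it only checks whether the commands are on `PATH`, not whether the Docker daemon is actually running.

```bash
# Hypothetical helper: decide which compile backend is available.
# Prints "xelatex", "docker", or "none" (none = stop and ask the user).
detect_tex_backend() {
  if command -v xelatex >/dev/null 2>&1; then
    echo xelatex
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo none
  fi
}
```

For example, `backend=$(detect_tex_backend)`; if it prints `none`, stop and ask the user as described above.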
Step 1: Download LaTeX Source
Extract ARXIV_ID from user input.
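A minimal sketch of this extraction, assuming the user supplies either a bare new-style identifier (`YYMM.NNNNN`, optionally with a version suffix such as `v2`) or an arxiv.org URL that contains one. `extract_arxiv_id` is a hypothetical helper: old-style identifiers such as `hep-th/9901001` are not handled, and it relies on `grep -o`, which GNU and BSD grep both provide.

```bash
# Hypothetical helper: pull a new-style arXiv ID (e.g. 2206.04655 or
# 2206.04655v2) out of free-form user input such as an abs/pdf URL.
# Returns non-zero if no ID is found.
extract_arxiv_id() {
  _id=$(printf '%s\n' "$1" | grep -oE '[0-9]{4}\.[0-9]{4,5}(v[0-9]+)?' | head -n 1)
  [ -n "$_id" ] || return 1
  printf '%s\n' "$_id"
}
```

For example, `ARXIV_ID=$(extract_arxiv_id "https://arxiv.org/abs/2206.04655")` sets `ARXIV_ID` to `2206.04655`; a version suffix in the input is kept as part of the ID.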
Download and extract source code from arXiv:
```bash
# Download the LaTeX source (replace ARXIV_ID with the user-specified paper ID)
ARXIV_ID="2206.04655"
mkdir -p arXiv_${ARXIV_ID}
wget -q https://arxiv.org/e-print/${ARXIV_ID} -O arXiv_${ARXIV_ID}/paper_source.tar.gz
mkdir -p arXiv_${ARXIV_ID}/paper_source
tar -xzf arXiv_${ARXIV_ID}/paper_source.tar.gz -C arXiv_${ARXIV_ID}/paper_source
```
**Verify extraction:**
```bash
# List files to understand the structure (fall back to find if tree is not installed)
tree arXiv_${ARXIV_ID}/paper_source || find arXiv_${ARXIV_ID}/paper_source -type f
```
Step 2: Translate LaTeX Files
IMPORTANT: Before translating, read references/translation_guidelines.md for detailed rules.
Translation Workflow
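Step 2.1. Copy all files from `paper_source/` to `paper_cn/`:
Option 1 - Using cp (standard):
```bash
cd arXiv_${ARXIV_ID}
mkdir -p paper_cn
# "paper_source/." also copies hidden files (e.g. .latexmkrc) that "paper_source/*" would skip
cp -R paper_source/. paper_cn/
```
Option 2 - Using rsync (better for incremental sync):
```bash
cd arXiv_${ARXIV_ID}
mkdir -p paper_cn
# Note: re-running without --ignore-existing overwrites files that were already translated
rsync -av paper_source/ paper_cn/
```
All .tex files in `paper_cn/` will be translated in place later.
Step 2.2. Gather Context (MANDATORY):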
Before ANY translation, you MUST extract:
- Paper Title: from `\title{...}` in the main file
- Abstract: from `\begin{abstract}...\end{abstract}` or `\abstract{...}` in the main file
- Paper Structure: list all sections and the .tex file that contains each
- Key Terminologies: build a terminology table from the paper content

If you do not know how to translate a glossary entry or term, ASK the user for its definition.
This information is REQUIRED for translation tasks.
Read references/translation_prompt.md for the prompt template.
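For the Paper Structure item, a rough sketch that lists every `\section`, `\subsection`, and `\subsubsection` (starred or not, with or without a short-title option) together with its file and line number. `list_sections` is a hypothetical helper that assumes each sectioning command starts on a single line; commented-out commands are listed too, so treat the output as a starting point rather than a LaTeX parse.

```bash
# Hypothetical helper: list sectioning commands in every .tex file under $1
# as "file:line:text", sorted by file name and then by line number.
list_sections() {
  find "$1" -type f -name '*.tex' \
    -exec grep -HnE '\\(sub)*section\*?(\[[^]]*])?\{' {} + |
    sort -t: -k1,1 -k2,2n
}
```

For example, `list_sections paper_cn/` prints lines such as `paper_cn/main.tex:42:\section{Introduction}`.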
Step 2.3. Dispatch Translation Tasks
Identify files to translate:
- Find main file (contains \documentclass{...}, usually main.tex, paper.tex, template.tex, etc.)
- Filter the .tex files that need translation (skip macro-only files, if any, or follow the user's file selection)
- Create list of files to translate
Translation Strategy:
- Translate the main file first (sequentially)
  - Builds shared terminology context
  - Ensures consistency for the other files
- Translate the other files:
  - If 3+ files: dispatch in parallel
  - If 1-2 files: translate sequentially
Each translation Task:
- Task type: general-purpose subagent
- Input: file path in the `paper_cn/` directory
- Action: Read file → Translate → Edit file (update the file content with the translated text)
- Must follow references/translation_prompt.md
- Must use the gathered context (title, abstract, structure, terminologies)
Example command to find the main .tex file:
```bash
find paper_cn/ -name "*.tex" -exec grep -l '\\documentclass' {} \; | head -1
```
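Building on that command, a sketch that prints the files in the order the Translation Strategy asks for: the main file first, then the remaining .tex files. `translation_order` is a hypothetical helper. Unlike the one-liner, it ignores commented-out `\documentclass` lines and sorts the results so the choice of main file is deterministic; filtering out macro-only or user-excluded files is still left to judgment.

```bash
# Hypothetical helper: print .tex files under $1 in translation order.
# Line 1 is the main file (first file, in sorted order, with an uncommented
# \documentclass); the remaining .tex files follow in sorted order.
translation_order() {
  _main=$(find "$1" -type f -name '*.tex' \
    -exec grep -l '^[[:space:]]*\\documentclass' {} + | sort | head -n 1)
  if [ -z "$_main" ]; then
    echo "translation_order: no main file found under $1" >&2
    return 1
  fi
  printf '%s\n' "$_main"
  # Everything except the main file; "|| true" keeps a single-file paper from failing.
  find "$1" -type f -name '*.tex' | sort | grep -vxF -- "$_main" || true
}
```

The first output line is the file to translate sequentially; the rest can be dispatched as described above.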
Step 3: Review Translation
After all translation Tasks are completed, you MUST review the translated content following references/review_checklist.md to verify:
- File Completeness Check
- LaTeX Command Spelling
- CJK Catcode Issues
- Translation Quality Check
- Content Spot-Check
Perform fixes as needed based on review findings.
CRITICAL: Before proceeding to Step 4, you must confirm:
- All review checks completed
- Any issues identified and fixed
- Translation quality verified
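Two items on the checklist lend themselves to quick scripted checks. A minimal sketch, assuming `paper_source/` holds the original files and `paper_cn/` the translations: `missing_files` lists files that did not make it into `paper_cn/` (File Completeness Check), and `new_commands` lists LaTeX control sequences that occur in a translated file but never in its original (LaTeX Command Spelling). All helper names are hypothetical. Expect some legitimate hits in the main file, such as `\setCJKmainfont` once Step 4 is done; this also does not catch CJK catcode problems such as `\xmax概率`, because only the ASCII part `\xmax` is extracted.

```bash
# Hypothetical review helpers (POSIX sh + coreutils).

# File Completeness Check: relative paths present in $1 but missing from $2.
missing_files() {
  _src_list=$(mktemp) _dst_list=$(mktemp)
  (cd "$1" && find . -type f | sort) > "$_src_list"
  (cd "$2" && find . -type f | sort) > "$_dst_list"
  comm -23 "$_src_list" "$_dst_list"
  rm -f "$_src_list" "$_dst_list"
}

# All distinct control sequences (backslash + ASCII letters) in a file.
latex_commands() {
  grep -oE '\\[A-Za-z]+' "$1" | sort -u
}

# LaTeX Command Spelling: commands in translated file $2 absent from original $1.
new_commands() {
  _orig_cmds=$(mktemp) _tran_cmds=$(mktemp)
  latex_commands "$1" > "$_orig_cmds"
  latex_commands "$2" > "$_tran_cmds"
  comm -13 "$_orig_cmds" "$_tran_cmds"
  rm -f "$_orig_cmds" "$_tran_cmds"
}
```

From inside `arXiv_${ARXIV_ID}`, run `missing_files paper_source paper_cn`, then `new_commands paper_source/<file>.tex paper_cn/<file>.tex` for each translated file.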
Step 4: Add Chinese Support
IMPORTANT: Follow references/chinese_support.md to configure CJK fonts and localize labels.
Modify the main .tex file to load the xeCJK package and set the CJK fonts.
e.g. for the Fandol fonts (included in the TeX Live Docker image):
```latex
\usepackage{xeCJK}
\setCJKmainfont{FandolSong}[ItalicFont=FandolKai] % Song: body text; \emph uses Kai
\setCJKsansfont{FandolHei}  % Hei: headings, \textsf
\setCJKmonofont{FandolFang} % FangSong: code, \texttt
```
If running locally, ask the user for their font preference before configuring. Check the available fonts with `fc-list :lang=zh family`.
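One way to apply the preamble change non-interactively, sketched under the assumption of a POSIX shell and awk. The helper name `add_xecjk_preamble` is hypothetical. It inserts the Fandol lines immediately before the first uncommented `\begin{document}` rather than right after `\documentclass`, because a `\documentclass` with options can span several lines; it leaves an already-configured file untouched and does not localize labels (follow references/chinese_support.md for that).

```bash
# Hypothetical helper: add the xeCJK + Fandol font setup to main file $1,
# just before the first uncommented \begin{document}.
# No-op if \usepackage{xeCJK} is already present; fails if there is
# no \begin{document} line.
add_xecjk_preamble() {
  if grep -q '\\usepackage{xeCJK}' "$1"; then
    return 0
  fi
  _tmp=$(mktemp) || return 1
  if awk '
    !done && /^[ \t]*\\begin[{]document[}]/ {
      print "\\usepackage{xeCJK}"
      print "\\setCJKmainfont{FandolSong}[ItalicFont=FandolKai]"
      print "\\setCJKsansfont{FandolHei}"
      print "\\setCJKmonofont{FandolFang}"
      done = 1
    }
    { print }
    END { exit !done }
  ' "$1" > "$_tmp"; then
    cat "$_tmp" > "$1"   # write back in place, keeping the original file permissions
    rm -f "$_tmp"
  else
    rm -f "$_tmp"
    return 1
  fi
}
```

Running it twice is safe: the second call sees `\usepackage{xeCJK}` and changes nothing.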
Step 5: Compile Translated PDF
Option 1: Local XeLaTeX
```bash
# Run inside the translated source directory
cd arXiv_${ARXIV_ID}/paper_cn

# Basic compilation
xelatex main.tex

# If the paper has a bibliography (recommended approach)
xelatex main.tex
bibtex main
xelatex main.tex
xelatex main.tex
```
Or use `latexmk` for automated compilation:
```bash
latexmk -xelatex main.tex
```
Option 2: Docker with TeX Live
```bash
# Change the working directory to arXiv_${ARXIV_ID}
cd /path/to/arXiv_${ARXIV_ID}
docker run --rm --platform linux/amd64 \
  -v "$(pwd)/paper_cn":/workspace \
  -w /workspace \
  ghcr.1ms.run/xu-cheng/texlive-debian:20260101 \
  latexmk -xelatex main.tex
```
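The two options can be combined into one fallback that mirrors the Prerequisites rule (local XeLaTeX first, Docker otherwise, stop if neither). A sketch only: `compile_translated` is a hypothetical name, it assumes `latexmk` is installed alongside a local `xelatex`, and it reuses the mirror image tag from the Prerequisites section. `-interaction=nonstopmode` is passed through by latexmk so that a LaTeX error does not wait for terminal input.

```bash
# Hypothetical wrapper: compile the translated sources in $1 (e.g. paper_cn)
# with main file $2 (default main.tex). Uses local latexmk/xelatex when both
# are on PATH, otherwise the TeX Live Docker image; fails if neither exists.
compile_translated() {
  _dir=$1
  _main=${2:-main.tex}
  if command -v latexmk >/dev/null 2>&1 && command -v xelatex >/dev/null 2>&1; then
    (cd "$_dir" && latexmk -xelatex -interaction=nonstopmode "$_main")
  elif command -v docker >/dev/null 2>&1; then
    docker run --rm --platform linux/amd64 \
      -v "$(cd "$_dir" && pwd)":/workspace \
      -w /workspace \
      ghcr.1ms.run/xu-cheng/texlive-debian:20260101 \
      latexmk -xelatex -interaction=nonstopmode "$_main"
  else
    echo "compile_translated: neither XeLaTeX nor Docker is available" >&2
    return 1
  fi
}
```

From `arXiv_${ARXIV_ID}`, this would be invoked as `compile_translated paper_cn main.tex`.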
Step 6: Generate Technical Report
If the user requests a technical summary, spawn a subagent that follows references/summary_prompt.md to create it from the assets/report_template.md template.
Save the report as `arXiv_${ARXIV_ID}/technical_report.md`.
Final Deliverables
- Translated PDF: `paper_cn/<main-file>.pdf`
- Technical report (only if requested in Step 6): `arXiv_${ARXIV_ID}/technical_report.md`
- TeX source: `paper_cn/` directory with all translated LaTeX files
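A quick way to confirm the deliverables exist before reporting back, sketched as a hypothetical `check_deliverables` helper. It assumes the layout above (`paper_cn/` and `technical_report.md` inside `arXiv_${ARXIV_ID}`) and treats the technical report as optional, since Step 6 only runs on request.

```bash
# Hypothetical helper: report which deliverables exist under an
# arXiv_<ID> directory ($1) for main file stem $2 (default "main").
# Returns non-zero if the translated .tex or PDF is missing or empty.
check_deliverables() {
  _root=$1
  _stem=${2:-main}
  _status=0
  for _f in "$_root/paper_cn/$_stem.tex" "$_root/paper_cn/$_stem.pdf"; do
    if [ -s "$_f" ]; then
      echo "ok       $_f"
    else
      echo "MISSING  $_f"
      _status=1
    fi
  done
  # The technical report is optional (only produced if the user asked for it).
  if [ -f "$_root/technical_report.md" ]; then
    echo "ok       $_root/technical_report.md"
  else
    echo "skipped  $_root/technical_report.md (not requested?)"
  fi
  return "$_status"
}
```

For example, `check_deliverables arXiv_${ARXIV_ID} main`.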
Common Issues & Solutions
| Issue | Solution |
|---|---|
| Downloaded file is a single .tex, not a .tar.gz | |
| Main file not named main.tex | |
| Compilation fails with encoding error | |
| Command misspelling | See review checklist step 2: diff the command sets to find typos |
| Undefined control sequence - `\xmax概率` | xeCJK catcode issue: insert a separator between the custom macro and the following CJK text |
| Undefined control sequence - `\chinese{弋}` | The original uses the CJK package's `\chinese` macro; add `\newcommand{\chinese}[1]{#1}` after the xeCJK config to prevent the catcode issue |
| Custom .sty/.cls files | Copy to |
| Problems in translated tables | Mixed CJK/Latin characters may cause xeCJK font-switching errors |
| Undefined references | Ensure ALL referenced files are present in |
References
- Translation rules: references/translation_guidelines.md
- Translation prompt: references/translation_prompt.md
- Review checklist: references/review_checklist.md
- Chinese support: references/chinese_support.md
- Report template: assets/report_template.md
- Makefile template (optional): assets/Makefile.template