docx-to-markdown

Convert Microsoft Word (.docx) documents to Markdown format.

Installation Required

bash
cd .claude/skills/docx-to-markdown
npm install

Dependencies: mammoth, turndown, @truto/turndown-plugin-gfm

Quick Start

Basic conversion

node .claude/skills/docx-to-markdown/scripts/convert.cjs \
  --file ./document.docx

Custom output path

node .claude/skills/docx-to-markdown/scripts/convert.cjs \
  --file ./doc.docx \
  --output ./output/doc.md

Extract images to directory

node .claude/skills/docx-to-markdown/scripts/convert.cjs \
  --file ./doc.docx \
  --output ./output/doc.md \
  --images ./output/images/

CLI Options

| Option | Required | Description |
| --- | --- | --- |
| --file <path> | Yes | Input DOCX file |
| --output <path> | No | Output Markdown path (default: input name + .md) |
| --images <dir> | No | Directory for extracted images (default: inline base64) |
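As a rough illustration of how the three options above fit together, here is a minimal sketch that parses them with Node's built-in `util.parseArgs`. It is not the actual contents of `convert.cjs`; the function name `parseCliOptions` is hypothetical, and the default output rule is an assumption (the table's "input name + .md" is read here as replacing the `.docx` extension).

```javascript
const { parseArgs } = require('node:util');
const path = require('node:path');

// Hypothetical helper: parses the three documented flags.
// This is an illustrative sketch, not the real convert.cjs logic.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      file: { type: 'string' },
      output: { type: 'string' },
      images: { type: 'string' },
    },
  });

  // --file is the only required option.
  if (!values.file) {
    throw new Error('Missing required option: --file <path>');
  }

  // ASSUMPTION: "input name + .md" means swapping the .docx extension
  // for .md; the real script may append .md to the full name instead.
  const parsed = path.parse(values.file);
  const defaultOutput = path.join(parsed.dir, `${parsed.name}.md`);

  return {
    file: values.file,
    output: values.output ?? defaultOutput,
    // null stands for the documented default: images inlined as base64.
    images: values.images ?? null,
  };
}
```

Calling `parseCliOptions(process.argv.slice(2))` would apply it to the real command line. Because `parseArgs` is strict by default, an unrecognized flag raises an error instead of being silently ignored.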

Output Format (JSON)

json
{
  "success": true,
  "input": "/path/to/input.docx",
  "output": "/path/to/output.md",
  "wordCount": 1523,
  "images": 5,
  "warnings": ["Some formatting may be simplified"]
}
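A calling script might consume this report roughly as follows. This is a sketch under stated assumptions: the report is taken to be the only thing the converter prints to stdout, a failed run is assumed to set `success` to `false`, and the names `summarizeReport` and `convertAndSummarize` are hypothetical.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical helper: turns the JSON report into a one-line summary.
function summarizeReport(jsonText) {
  const report = JSON.parse(jsonText);
  // ASSUMPTION: failures are signalled by success === false.
  if (!report.success) {
    throw new Error(`Conversion failed for ${report.input}`);
  }
  const warnings = report.warnings ?? [];
  return `${report.output}: ${report.wordCount} words, ` +
    `${report.images} images, ${warnings.length} warning(s)`;
}

// Hypothetical wrapper: runs the converter and resolves with the summary.
// ASSUMPTION: stdout contains only the JSON report.
function convertAndSummarize(docxPath) {
  const script = '.claude/skills/docx-to-markdown/scripts/convert.cjs';
  return new Promise((resolve, reject) => {
    execFile('node', [script, '--file', docxPath], (err, stdout) => {
      if (err) return reject(err);
      try {
        resolve(summarizeReport(stdout));
      } catch (parseError) {
        reject(parseError);
      }
    });
  });
}
```

Keeping the parsing in its own function makes it possible to check the report handling without running a conversion.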

Supported Elements

  • Headings (H1-H6)
  • Paragraphs and emphasis (bold, italic, strikethrough)
  • Ordered and unordered lists
  • Tables (GFM format)
  • Links
  • Images (extracted or base64)
  • Code blocks (requires Word "Code" style)
  • Blockquotes

Known Limitations

  • Nested lists: Numbering may reset in deeply nested lists
  • Nested tables: Inner tables are flattened
  • Code blocks: Require explicit Word style mapping ("Code" or "Code Block")
  • Complex formatting: Some advanced formatting may be simplified
  • Footnotes: Converted but may lose some formatting

Google Docs Support

Export your Google Doc as DOCX first, then convert:
  1. In Google Docs: File → Download → Microsoft Word (.docx)
  2. Run this converter on the downloaded file

Troubleshooting

  • Dependencies not found: Run npm install in the skill directory
  • Empty output: Ensure the DOCX contains actual text (not just images)
  • Code blocks not detected: Apply the "Code" or "Code Block" style to the code in Word

IMPORTANT Task Planning Notes

  • Always plan the work and break it into many small todo tasks
  • Always add a final review todo task to check the completed work and identify any fixes or enhancements needed