docx

DOCX creation, editing, and analysis

Overview

A .docx file is a ZIP archive containing XML files.

Quick Reference

| Task | Approach |
|------|----------|
| Read/analyze content | pandoc, or unpack for raw XML |
| Create new document | Use docx-js - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

Converting .doc to .docx

Legacy .doc files must be converted before editing:
bash
python scripts/office/soffice.py --headless --convert-to docx document.doc

Reading Content

Text extraction with tracked changes

pandoc --track-changes=all document.docx -o output.md

Raw XML access

python scripts/office/unpack.py document.docx unpacked/

Converting to Images

bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page

Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):
bash
python scripts/accept_changes.py input.docx output.docx


Creating New Documents

Generate .docx files with JavaScript, then validate. Install:
npm install -g docx

Setup

javascript
const fs = require('fs'); // required by fs.writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
bash
python scripts/office/validate.py doc.docx

Page Size

javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]
Common page sizes (DXA units, 1440 DXA = 1 inch):

| Paper | Width | Height | Content Width (1" margins) |
|-------|-------|--------|----------------------------|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |

Landscape orientation: docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
javascript
size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
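
The page-size arithmetic above can be sketched as a small helper. This is an illustrative sketch only: `DXA_PER_INCH`, `inchesToDxa`, and `contentWidthDxa` are hypothetical names, not part of docx-js, and the helper assumes portrait dimensions are passed in, as the landscape note describes.

```javascript
// Hypothetical helpers (not part of docx-js) for the DXA arithmetic above.
const DXA_PER_INCH = 1440;

// Convert inches to DXA.
function inchesToDxa(inches) {
  return Math.round(inches * DXA_PER_INCH);
}

// Usable width between the left and right margins.
// Assumes width/height are PORTRAIT dimensions (short edge = width);
// in landscape the long edge (height) becomes the horizontal extent.
function contentWidthDxa({ width, height, landscape = false, left = 1440, right = 1440 }) {
  const horizontal = landscape ? height : width;
  return horizontal - left - right;
}

const letter = { width: inchesToDxa(8.5), height: inchesToDxa(11) };
console.log(contentWidthDxa(letter));                         // 9360
console.log(contentWidthDxa({ ...letter, landscape: true })); // 12960
```

The result is the value to use for a full-width table in that section.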

Styles (Override Built-in Headings)

Use Arial as the default font (universally supported). Keep titles black for readability.
javascript
const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});

Lists (NEVER use unicode bullets)

javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })  // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
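
Since each reference numbers independently, a document with several numbered lists that should each restart at 1 needs one config entry per list. A minimal sketch, assuming a hypothetical helper `numberedListConfigs` and a `numbers-N` naming scheme (neither comes from docx-js); the enums are passed in rather than hard-coded:

```javascript
// Hypothetical helper: one numbering config entry per list, so each list
// restarts at 1. LevelFormat and AlignmentType are passed in (for example
// from require('docx')) so the sketch does not hard-code enum values.
function numberedListConfigs(count, { LevelFormat, AlignmentType }) {
  return Array.from({ length: count }, (_, i) => ({
    reference: `numbers-${i + 1}`, // unique reference = independent sequence
    levels: [{
      level: 0,
      format: LevelFormat.DECIMAL,
      text: "%1.",
      alignment: AlignmentType.LEFT,
      style: { paragraph: { indent: { left: 720, hanging: 360 } } },
    }],
  }));
}
```

The returned array would go under `numbering: { config: [...] }`, with the paragraphs of the first list using `reference: "numbers-1"`, the second `"numbers-2"`, and so on.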

Tables

CRITICAL: Tables need dual widths - set both columnWidths on the table AND width on each cell. Without both, tables render incorrectly on some platforms.
javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})
Table width calculation:
Always use WidthType.DXA; WidthType.PERCENTAGE breaks in Google Docs.
javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360]  // Must sum to table width
Width rules:
  • Always use WidthType.DXA, never WidthType.PERCENTAGE (incompatible with Google Docs)
  • Table width must equal the sum of columnWidths
  • Cell width must match the corresponding columnWidths entry
  • Cell margins are internal padding - they reduce the content area rather than adding to the cell width
  • For full-width tables: use content width (page width minus left and right margins)
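
The width rules above come down to integer arithmetic that is easy to get wrong by a few DXA. A minimal sketch, with hypothetical helper names `splitColumns` and `widthsMatch` (not part of docx-js), that derives column widths summing exactly to the table width:

```javascript
// Hypothetical helper: split a table width (DXA) into integer column widths
// that sum exactly to the total, in proportion to the given ratios.
function splitColumns(totalDxa, ratios) {
  const ratioSum = ratios.reduce((a, b) => a + b, 0);
  const widths = ratios.map(r => Math.floor((totalDxa * r) / ratioSum));
  // Give any rounding remainder to the last column so the sum stays exact.
  const used = widths.reduce((a, b) => a + b, 0);
  widths[widths.length - 1] += totalDxa - used;
  return widths;
}

// Check the rule "table width = sum of columnWidths" before building the table.
function widthsMatch(tableWidthDxa, columnWidths) {
  return columnWidths.reduce((a, b) => a + b, 0) === tableWidthDxa;
}

console.log(splitColumns(9360, [1, 1]));      // [ 4680, 4680 ]
console.log(splitColumns(9026, [1, 1, 1]));   // [ 3008, 3008, 3010 ]
console.log(widthsMatch(9360, [7000, 2360])); // true
```

The same array can feed both `columnWidths` on the table and the `width` of each cell, which keeps the two in agreement.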

Images

javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})

Page Breaks

javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })

Table of Contents

javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })

Headers/Footers

javascript
sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]

Critical Rules for docx-js

  • Set page size explicitly - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
  • Landscape: pass portrait dimensions - docx-js swaps width/height internally; pass short edge as width, long edge as height, and set orientation: PageOrientation.LANDSCAPE
  • Never use \n - use separate Paragraph elements
  • Never use unicode bullets - use LevelFormat.BULLET with numbering config
  • PageBreak must be in Paragraph - standalone creates invalid XML
  • ImageRun requires type - always specify png/jpg/etc
  • Always set table width with DXA - never use WidthType.PERCENTAGE (breaks in Google Docs)
  • Tables need dual widths - columnWidths array AND cell width, both must match
  • Table width = sum of columnWidths - for DXA, ensure they add up exactly
  • Always add cell margins - use margins: { top: 80, bottom: 80, left: 120, right: 120 } for readable padding
  • Use ShadingType.CLEAR - never SOLID for table shading
  • TOC requires HeadingLevel only - no custom styles on heading paragraphs
  • Override built-in styles - use exact IDs: "Heading1", "Heading2", etc.
  • Include outlineLevel - required for TOC (0 for H1, 1 for H2, etc.)
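
The "never use \n" rule can be applied mechanically when the source text arrives as one multi-line string. A minimal sketch, assuming a hypothetical helper `paragraphsFromText`; the `Paragraph` and `TextRun` constructors are passed in (for example from `require('docx')`) so the sketch stays self-contained:

```javascript
// Hypothetical helper: turn multi-line text into one Paragraph per line
// instead of embedding "\n" in a TextRun.
function paragraphsFromText(text, { Paragraph, TextRun }) {
  return text
    .split(/\r?\n/) // handle both Unix and Windows line endings
    .map(line => new Paragraph({ children: [new TextRun(line)] }));
}
```

The returned array can be spread into a section's `children`. Empty lines become empty paragraphs, which may or may not be wanted; filter them first if not.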

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

bash
python scripts/office/unpack.py document.docx unpacked/
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (&#x201C; etc.) so they survive editing. Use --merge-runs false to skip run merging.

Step 2: Edit XML

Edit files in unpacked/word/. See XML Reference below for patterns.
Use "Claude" as the author for tracked changes and comments, unless the user explicitly requests use of a different name.
Use the Edit tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.
CRITICAL: Use smart quotes for new content. When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
xml
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
| Entity | Character |
|--------|-----------|
| &#x2018; | ‘ (left single) |
| &#x2019; | ’ (right single / apostrophe) |
| &#x201C; | “ (left double) |
| &#x201D; | ” (right double) |
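
The entity mapping above, together with ordinary XML escaping, can be sketched as a small function. This only illustrates the mapping: `toXmlText` and `SMART_QUOTES` are hypothetical names, and straight quotes are deliberately left untouched because choosing a left or right quote depends on context.

```javascript
// Hypothetical helper: escape text for use inside <w:t> or as pre-escaped
// comment text. XML metacharacters are escaped first, then curly quotes are
// written as numeric entities.
const SMART_QUOTES = {
  "\u2018": "&#x2018;", // left single
  "\u2019": "&#x2019;", // right single / apostrophe
  "\u201C": "&#x201C;", // left double
  "\u201D": "&#x201D;", // right double
};

function toXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/[\u2018\u2019\u201C\u201D]/g, ch => SMART_QUOTES[ch]);
}

console.log(toXmlText("Here\u2019s a quote: \u201CHello\u201D & more"));
// Here&#x2019;s a quote: &#x201C;Hello&#x201D; &amp; more
```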
Adding comments: Use comment.py to handle boilerplate across multiple XML files (text must be pre-escaped XML):
bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
Then add markers to document.xml (see Comments in XML Reference).

Step 3: Pack

bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
Validates with auto-repair, condenses XML, and creates DOCX. Use --validate false to skip.
Auto-repair will fix:
  • durableId >= 0x7FFFFFFF (regenerates valid ID)
  • Missing xml:space="preserve" on <w:t> with whitespace
Auto-repair won't fix:
  • Malformed XML, invalid element nesting, missing relationships, schema violations

Common Pitfalls

  • Replace entire <w:r> elements: When adding tracked changes, replace the whole <w:r>...</w:r> block with <w:del>...<w:ins>... as siblings. Don't inject tracked change tags inside a run.
  • Preserve <w:rPr> formatting: Copy the original run's <w:rPr> block into your tracked change runs to maintain bold, font size, etc.

XML Reference

Schema Compliance

  • Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, <w:rPr> last
  • Whitespace: Add xml:space="preserve" to <w:t> with leading/trailing spaces
  • RSIDs: Must be 8-digit hex (e.g., 00AB1234)
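
The three rules above can be expressed as simple checks. A minimal sketch with hypothetical names (`PPR_ORDER`, `pPrOrderOk`, `needsSpacePreserve`, `isValidRsid`); it covers only the six `<w:pPr>` children listed here, not the full schema, and ignores any other elements:

```javascript
// Order of the <w:pPr> children listed above; other elements are ignored here.
const PPR_ORDER = ["w:pStyle", "w:numPr", "w:spacing", "w:ind", "w:jc", "w:rPr"];

// True if the listed children appear in the order given above.
function pPrOrderOk(childTags) {
  const ranks = childTags.map(t => PPR_ORDER.indexOf(t)).filter(r => r !== -1);
  return ranks.every((r, i) => i === 0 || ranks[i - 1] < r);
}

// True if a <w:t> holding this text needs xml:space="preserve".
function needsSpacePreserve(text) {
  return /^\s|\s$/.test(text);
}

// True if the value is an 8-digit hex RSID such as 00AB1234.
function isValidRsid(value) {
  return /^[0-9A-Fa-f]{8}$/.test(value);
}
```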

Tracked Changes

Insertion:
xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>
Deletion:
xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
Inside <w:del>: Use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.
Minimal edits - only mark what changes:
xml
<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
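
To make the shape of a minimal edit concrete, the same pattern can be written as a string builder. This is purely illustrative of the structure (the workflow described here edits the XML directly); `escapeXml`, `run`, and `trackedReplace` are hypothetical names, and the author value is assumed to need no escaping.

```javascript
// Hypothetical string builders for the minimal-edit pattern above.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// One run; rPr is the original run's <w:rPr>...</w:rPr> block (or "").
function run(text, tag, rPr = "") {
  const preserve = /^\s|\s$/.test(text) ? ' xml:space="preserve"' : "";
  return `<w:r>${rPr}<${tag}${preserve}>${escapeXml(text)}</${tag}></w:r>`;
}

// Replace oldText with newText, leaving the text before and after unmarked.
function trackedReplace({ before, oldText, newText, after, date, firstId, author = "Claude", rPr = "" }) {
  const attrs = id => `w:id="${id}" w:author="${author}" w:date="${date}"`;
  return [
    run(before, "w:t", rPr),
    `<w:del ${attrs(firstId)}>${run(oldText, "w:delText", rPr)}</w:del>`,
    `<w:ins ${attrs(firstId + 1)}>${run(newText, "w:t", rPr)}</w:ins>`,
    run(after, "w:t", rPr),
  ].join("\n");
}
```

Passing the original run's `<w:rPr>` block as `rPr` carries the formatting into every generated run, and the whitespace check adds `xml:space="preserve"` where the split leaves leading or trailing spaces.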
Deleting entire paragraphs/list items - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add <w:del/> inside <w:pPr><w:rPr>:
xml
<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr>  <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>
Without the <w:del/> in <w:pPr><w:rPr>, accepting changes leaves an empty paragraph/list item.
Rejecting another author's insertion - nest deletion inside their insertion:
xml
<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>
Restoring another author's deletion - add insertion after (don't modify their deletion):
xml
<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>

Comments

After running comment.py (see Step 2), add markers to document.xml. For replies, use --parent flag and nest markers inside the parent's.
CRITICAL: <w:commentRangeStart> and <w:commentRangeEnd> are siblings of <w:r>, never inside <w:r>.
xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
  <w:commentRangeStart w:id="1"/>
  <w:r><w:t>text</w:t></w:r>
  <w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
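
For a single comment without replies, the marker layout shown first above can be sketched as a string builder. `wrapWithComment` is a hypothetical name; the sketch only assembles the markers and does not handle the nested reply layout.

```javascript
// Hypothetical builder: wrap sibling content (runs, <w:del>, <w:ins>) in
// comment range markers and append the reference run. The markers are emitted
// alongside the content, never inside a <w:r>.
function wrapWithComment(innerXml, id) {
  return [
    `<w:commentRangeStart w:id="${id}"/>`,
    innerXml,
    `<w:commentRangeEnd w:id="${id}"/>`,
    `<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="${id}"/></w:r>`,
  ].join("\n");
}

console.log(wrapWithComment("<w:r><w:t>text</w:t></w:r>", 0));
```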

Images

  1. Add image file to word/media/
  2. Add relationship to word/_rels/document.xml.rels:
xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
  3. Add content type to [Content_Types].xml:
xml
<Default Extension="png" ContentType="image/png"/>
  4. Reference in document.xml:
xml
<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/>  <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
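
The `cx`/`cy` values are in EMUs, so pixel or inch sizes need converting first. A minimal sketch with hypothetical helper names; the pixel conversion assumes 96 DPI, which is an assumption of this sketch and not something the format mandates.

```javascript
// 914400 EMU = 1 inch (as noted in the XML comment above).
const EMU_PER_INCH = 914400;
// Assumes 96 pixels per inch, giving 9525 EMU per pixel.
const EMU_PER_PIXEL_96DPI = EMU_PER_INCH / 96;

function inchesToEmu(inches) {
  return Math.round(inches * EMU_PER_INCH);
}

function pixelsToEmu(px) {
  return Math.round(px * EMU_PER_PIXEL_96DPI);
}

// Build the <wp:extent> element for an image of the given pixel size.
function extentXml(widthPx, heightPx) {
  return `<wp:extent cx="${pixelsToEmu(widthPx)}" cy="${pixelsToEmu(heightPx)}"/>`;
}

console.log(extentXml(96, 96)); // <wp:extent cx="914400" cy="914400"/>
```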

Dependencies

  • pandoc: Text extraction
  • docx: npm install -g docx (new documents)
  • LibreOffice: PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
  • Poppler: pdftoppm for images