docx-format-replicator


DOCX Format Replicator


Overview


Extract formatting information from existing Word documents (.docx) and use it to generate new documents with identical formatting but different content. This skill enables creating document templates, maintaining consistent formatting across multiple documents, and replicating complex Word document structures.

When to Use This Skill


Use this skill when the user:
  • Wants to extract formatting from an existing Word document
  • Needs to create multiple documents with the same format
  • Has a template document and wants to generate similar documents with new content
  • Asks to "replicate", "copy format", "use the same style", or "create a document like"
  • Mentions document templates, corporate standards, or format consistency

Workflow


Step 1: Extract Format from Template


Extract formatting information from an existing Word document to create a reusable format configuration.
bash
python scripts/extract_format.py <template.docx> <output.json>
Example:
bash
python scripts/extract_format.py "HY研制任务书.docx" format_template.json
What Gets Extracted:
  • Style definitions (fonts, sizes, colors, alignment)
  • Paragraph and character styles
  • Numbering schemes (1, 1.1, 1.1.1, etc.)
  • Table structures and styles
  • Header and footer configurations
Output: JSON file containing all format information (see references/format_config_schema.md for details)
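To get a feel for what a template contains before running the extractor, the styles it declares can be listed directly. The sketch below is not how extract_format.py works (its internals are not shown here). It relies only on the fact that a .docx file is a ZIP package whose styles live in word/styles.xml. The function name list_style_ids is a hypothetical helper introduced for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/styles.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_style_ids(docx_path):
    """Return (style_id, style_type) pairs declared in word/styles.xml.

    docx_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/styles.xml"))
    styles = []
    for style in root.iter(f"{{{W_NS}}}style"):
        style_id = style.get(f"{{{W_NS}}}styleId")
        style_type = style.get(f"{{{W_NS}}}type")  # paragraph, character, table, numbering
        styles.append((style_id, style_type))
    return styles
```

Calling list_style_ids("HY研制任务书.docx") would return the style IDs available in that template, which can then be compared with what ends up in the extracted JSON.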

Step 2: Prepare Content Data


Create a JSON file with the actual content for the new document. The content must follow the structure defined in references/content_data_schema.md.
Content Structure:
json
{
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "version": "1.0",
    "date": "2025-01-15"
  },
  "sections": [
    {
      "type": "heading",
      "content": "Section Title",
      "level": 1,
      "number": "1"
    },
    {
      "type": "paragraph",
      "content": "Paragraph text content."
    },
    {
      "type": "table",
      "rows": 3,
      "cells": [
        ["Header 1", "Header 2"],
        ["Data 1", "Data 2"]
      ]
    }
  ]
}
Supported Section Types:
  • heading - Headings with optional numbering
  • paragraph - Text paragraphs
  • table - Tables with configurable rows and columns
  • page_break - Page breaks
See assets/example_content.json for a complete example.
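Content files can also be produced from Python rather than written by hand. The following is a minimal sketch that checks only what this README states (a "metadata" object, a "sections" list, and the four supported section types) and then writes the file. The authoritative rules are in references/content_data_schema.md. The helper names check_content and save_content are hypothetical, and the bare {"type": "page_break"} shape is an assumption.

```python
import json

# The four section types listed above
SUPPORTED_TYPES = {"heading", "paragraph", "table", "page_break"}


def check_content(data):
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    if not isinstance(data.get("metadata"), dict):
        problems.append("missing or non-object 'metadata'")
    sections = data.get("sections")
    if not isinstance(sections, list):
        problems.append("missing or non-list 'sections'")
        return problems
    for index, section in enumerate(sections):
        kind = section.get("type") if isinstance(section, dict) else None
        if kind not in SUPPORTED_TYPES:
            problems.append(f"sections[{index}]: unsupported type {kind!r}")
    return problems


def save_content(data, path):
    """Write data to path as JSON, refusing to write if the basic checks fail."""
    problems = check_content(data)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII text (such as Chinese titles) readable
        json.dump(data, handle, ensure_ascii=False, indent=2)


content = {
    "metadata": {"title": "Document Title", "author": "Author Name",
                 "version": "1.0", "date": "2025-01-15"},
    "sections": [
        {"type": "heading", "content": "Section Title", "level": 1, "number": "1"},
        {"type": "paragraph", "content": "Paragraph text content."},
        {"type": "page_break"},  # assumed shape: no fields beyond "type"
    ],
}
```

save_content(content, "new_content.json") would then produce a file that can be passed to the generation step.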

Step 3: Generate New Document


Generate a new Word document using the extracted format and prepared content.
bash
python scripts/generate_document.py <format.json> <content.json> <output.docx>
Example:
bash
python scripts/generate_document.py format_template.json new_content.json output_document.docx
Result: A new .docx file with the format from the template applied to the new content.

Complete Example Workflow


User asks: "I have a research task document. I need to create 5 more documents with the same format but different content."
  1. Extract the format:
bash
python scripts/extract_format.py research_task_template.docx template_format.json
  2. Create content files for each new document (content1.json, content2.json, etc.)
  3. Generate documents:
bash
python scripts/generate_document.py template_format.json content1.json document1.docx
python scripts/generate_document.py template_format.json content2.json document2.docx
# ... repeat for all documents

Common Use Cases


Corporate Document Templates

Extract format from a company template and generate reports, proposals, or specifications with consistent branding.
bash
# One-time: Extract company template
python scripts/extract_format.py "Company Template.docx" company_format.json

# For each new document:
python scripts/generate_document.py company_format.json new_report.json "Monthly Report.docx"

Technical Documentation Series

Create multiple technical documents (specifications, test plans, manuals) with identical formatting.
bash
# Extract from specification template
python scripts/extract_format.py spec_template.docx spec_format.json

# Generate multiple specs
python scripts/generate_document.py spec_format.json product_a_spec.json "Product A Spec.docx"
python scripts/generate_document.py spec_format.json product_b_spec.json "Product B Spec.docx"

Research Task Documents


The included example template (assets/hy_template_format.json) demonstrates a complete research task document format with:
  • Approval/review table in header
  • Multi-level numbering (1, 1.1, 1.1.1)
  • Technical specification tables
  • Structured sections
Use this as a starting point for similar technical documents.

Advanced Usage


Customizing Extraction


Modify scripts/extract_format.py to extract additional properties not covered by default:
  • Custom XML elements
  • Advanced table features (merged cells, borders)
  • Embedded objects
  • Custom properties

Extending Content Types


Add new section types in scripts/generate_document.py:
  • Images with captions
  • Bulleted or numbered lists
  • Footnotes and endnotes
  • Custom content blocks
See references/content_data_schema.md for extension guidelines.
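One common way to make a generator extensible is a dispatch table that maps each section type to a handler function. The sketch below illustrates that pattern only. It is not taken from generate_document.py, whose structure is not shown here. The "bullet_list" type, its "items" field, and all function names are hypothetical. The doc argument is assumed to behave like a python-docx Document (add_heading and add_paragraph are real python-docx methods), and "List Bullet" is a style from python-docx's default template that may be absent from a custom format.

```python
def add_heading(doc, section):
    # Document.add_heading(text, level) in python-docx
    doc.add_heading(section["content"], level=section.get("level", 1))


def add_paragraph(doc, section):
    doc.add_paragraph(section["content"])


def add_bullet_list(doc, section):
    # Hypothetical new type: {"type": "bullet_list", "items": ["a", "b"]}
    for item in section["items"]:
        doc.add_paragraph(item, style="List Bullet")


# Registering a new section type means adding one entry here
HANDLERS = {
    "heading": add_heading,
    "paragraph": add_paragraph,
    "bullet_list": add_bullet_list,
}


def render_sections(doc, sections):
    """Apply the matching handler to each section, failing loudly on unknown types."""
    for section in sections:
        handler = HANDLERS.get(section["type"])
        if handler is None:
            raise ValueError(f"unsupported section type: {section['type']!r}")
        handler(doc, section)
```

Raising on unknown types, rather than skipping them, keeps a typo in a content file from silently dropping part of the document.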

Batch Processing


Create a wrapper script to generate multiple documents:
python
import subprocess
import sys

format_file = "template_format.json"
content_files = ["content1.json", "content2.json", "content3.json"]

for i, content_file in enumerate(content_files, 1):
    output = f"document_{i}.docx"
    # sys.executable reuses the current interpreter;
    # check=True stops the batch if generate_document.py exits with an error
    subprocess.run(
        [sys.executable, "scripts/generate_document.py",
         format_file, content_file, output],
        check=True,
    )
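When the number of content files varies, the list can be built from a directory instead of being hard-coded. This is a sketch under assumptions: the directory layout, the choice to name each output after its content file, and the helper names build_jobs and run_jobs are all illustrative, not part of the skill.

```python
import subprocess
import sys
from pathlib import Path


def build_jobs(content_dir, output_dir):
    """Pair every *.json file in content_dir with a .docx path of the same stem."""
    jobs = []
    # sorted() gives a stable order; note it is lexicographic (content10 before content2)
    for content_path in sorted(Path(content_dir).glob("*.json")):
        output_path = Path(output_dir) / (content_path.stem + ".docx")
        jobs.append((content_path, output_path))
    return jobs


def run_jobs(format_file, jobs):
    """Invoke generate_document.py once per (content, output) pair."""
    for content_path, output_path in jobs:
        subprocess.run(
            [sys.executable, "scripts/generate_document.py",
             str(format_file), str(content_path), str(output_path)],
            check=True,  # stop at the first failing document
        )
```

For example, run_jobs("template_format.json", build_jobs("contents", "out")) would generate one document per content file, provided the out directory already exists.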

Dependencies


The scripts require:
  • Python 3.7+
  • python-docx library: pip install python-docx
No additional dependencies are needed for the core functionality.
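Both requirements can be checked from Python before running the scripts. This is a small sketch, with check_environment as a hypothetical helper name. It relies on python-docx being importable under the module name docx.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of unmet requirements; an empty list means both are satisfied."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required")
    # python-docx installs a module named "docx"; find_spec looks it up without importing it
    if importlib.util.find_spec("docx") is None:
        problems.append("python-docx is not installed (pip install python-docx)")
    return problems
```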

Resources


scripts/


  • extract_format.py - Extract formatting from Word documents
  • generate_document.py - Generate new documents from format + content
Both scripts include built-in help:
bash
python scripts/extract_format.py --help
python scripts/generate_document.py --help

references/


  • format_config_schema.md - Complete schema for format configuration files
  • content_data_schema.md - Complete schema for content data files
Read these for detailed information on file structures and available options.

assets/


  • hy_template_format.json - Example extracted format from a technical research task document
  • example_content.json - Example content data showing all section types
Use these as references when creating your own format and content files.

Troubleshooting


Missing styles in output: Ensure style IDs in content data match those in format config. Check format.json for available style IDs.
Table formatting issues: Verify table dimensions (rows/columns) match between content data and format config. See format_config_schema.md for table structure.
Font not displaying correctly: Some fonts may not be available on all systems. Check that referenced fonts are installed.
Dependencies missing: Install required Python packages:
bash
pip install python-docx

Tips


  1. Test with examples first: Use the included hy_template_format.json and example_content.json to understand the workflow before extracting your own formats.
  2. Start simple: Begin with basic headings and paragraphs, then add tables and complex formatting.
  3. Validate JSON: Use a JSON validator to check content data files before generating documents.
  4. Keep format configs: Store extracted format configurations for reuse across multiple projects.
  5. Version control: Track both format configs and content data in version control for reproducible document generation.
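For tip 3, Python's standard json module is enough to catch syntax errors and report where they are. This is a minimal sketch, with find_json_error as a hypothetical helper name. It checks JSON syntax only, not conformance to the content schema.

```python
import json


def find_json_error(text):
    """Return None if text parses as JSON, else a 'line X, column Y: message' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None
```

Reading a content file with open(path, encoding="utf-8").read() and passing the text to find_json_error points to the first syntax problem, such as a trailing comma, before generate_document.py is run.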