docx-format-replicator
# DOCX Format Replicator
## Overview
Extract formatting information from existing Word documents (.docx) and use it to generate new documents with identical formatting but different content. This skill enables creating document templates, maintaining consistent formatting across multiple documents, and replicating complex Word document structures.
## When to Use This Skill
Use this skill when the user:
- Wants to extract formatting from an existing Word document
- Needs to create multiple documents with the same format
- Has a template document and wants to generate similar documents with new content
- Asks to "replicate", "copy format", "use the same style", or "create a document like"
- Mentions document templates, corporate standards, or format consistency
## Workflow

### Step 1: Extract Format from Template
Extract formatting information from an existing Word document to create a reusable format configuration.
```bash
python scripts/extract_format.py <template.docx> <output.json>
```

Example:

```bash
python scripts/extract_format.py "HY研制任务书.docx" format_template.json
```

**What Gets Extracted:**
- Style definitions (fonts, sizes, colors, alignment)
- Paragraph and character styles
- Numbering schemes (1, 1.1, 1.1.1, etc.)
- Table structures and styles
- Header and footer configurations
**Output:** A JSON file containing all format information (see `references/format_config_schema.md` for details).
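Under the hood, a `.docx` file is a ZIP archive, and its style definitions live in `word/styles.xml` (WordprocessingML). The sketch below is not the bundled script, which may well use `python-docx` instead; it is a stdlib-only illustration, assuming a standard OOXML package, of where style IDs, names, fonts, and sizes come from. The helper name `read_styles` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/styles.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_styles(docx_path):
    """Return {styleId: {...}} for every <w:style> in word/styles.xml.

    Hypothetical helper for illustration; only a few properties are read.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))

    styles = {}
    for style in root.findall("w:style", NS):
        style_id = style.get(f"{{{W}}}styleId")
        name = style.find("w:name", NS)
        fonts = style.find("w:rPr/w:rFonts", NS)
        size = style.find("w:rPr/w:sz", NS)
        styles[style_id] = {
            "type": style.get(f"{{{W}}}type"),  # paragraph, character, table, numbering
            "name": name.get(f"{{{W}}}val") if name is not None else None,
            "font": fonts.get(f"{{{W}}}ascii") if fonts is not None else None,
            # w:sz is measured in half-points, so divide by 2 to get points
            "size_pt": int(size.get(f"{{{W}}}val")) / 2 if size is not None else None,
        }
    return styles
```

Calling `read_styles("template.docx")` on a real document returns one entry per style, including Word's built-in ones; a fuller extractor would also follow `w:basedOn` inheritance and read paragraph properties (`w:pPr`).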
### Step 2: Prepare Content Data
Create a JSON file with the actual content for the new document. The content must follow the structure defined in `references/content_data_schema.md`.

**Content Structure:**
```json
{
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "version": "1.0",
    "date": "2025-01-15"
  },
  "sections": [
    {
      "type": "heading",
      "content": "Section Title",
      "level": 1,
      "number": "1"
    },
    {
      "type": "paragraph",
      "content": "Paragraph text content."
    },
    {
      "type": "table",
      "rows": 3,
      "cells": [
        ["Header 1", "Header 2"],
        ["Data 1", "Data 2"]
      ]
    }
  ]
}
```

**Supported Section Types:**
- `heading`: Headings with optional numbering
- `paragraph`: Text paragraphs
- `table`: Tables with configurable rows and columns
- `page_break`: Page breaks
See `assets/example_content.json` for a complete example.
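Before running the generator, it can help to sanity-check a content file. The sketch below checks only what the example above implies: a `sections` list whose entries use one of the four supported `type` values, headings and paragraphs with `content`, headings with an integer `level`, and tables with a list-of-lists `cells`. The authoritative rules live in `references/content_data_schema.md`, so treat `validate_content` as a hypothetical starting point, not the tool's own validator.

```python
import json

SUPPORTED_TYPES = {"heading", "paragraph", "table", "page_break"}


def validate_content(data):
    """Return a list of human-readable problems (empty if none were found)."""
    sections = data.get("sections")
    if not isinstance(sections, list):
        return ["'sections' must be a list"]

    problems = []
    for i, section in enumerate(sections):
        if not isinstance(section, dict):
            problems.append(f"sections[{i}]: must be an object")
            continue
        kind = section.get("type")
        if kind not in SUPPORTED_TYPES:
            problems.append(f"sections[{i}]: unsupported type {kind!r}")
        elif kind in ("heading", "paragraph") and not section.get("content"):
            problems.append(f"sections[{i}]: {kind} needs non-empty 'content'")
        elif kind == "heading" and not isinstance(section.get("level"), int):
            problems.append(f"sections[{i}]: heading needs an integer 'level'")
        elif kind == "table":
            cells = section.get("cells")
            if not isinstance(cells, list) or not all(isinstance(r, list) for r in cells):
                problems.append(f"sections[{i}]: table 'cells' must be a list of rows")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python validate_content.py content.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for problem in validate_content(json.load(f)):
            print(problem)
```

Reporting at most one problem per section keeps the output short; run it again after each fix.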
### Step 3: Generate New Document
Generate a new Word document using the extracted format and prepared content.
```bash
python scripts/generate_document.py <format.json> <content.json> <output.docx>
```

Example:

```bash
python scripts/generate_document.py format_template.json new_content.json output_document.docx
```

**Result:** A new `.docx` file with the template's formatting applied to the new content.
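How each section type becomes Word content is defined by `scripts/generate_document.py` itself. As a rough sketch of the idea (not the script's actual code), the function below dispatches on `type` using methods that python-docx's `Document` object provides: `add_heading`, `add_paragraph`, `add_table`, and `add_page_break`. Three assumptions are built in: it ignores the format config entirely, it prefixes each heading with its `number` as plain text, and it sizes tables from `cells` rather than from the `rows` field.

```python
def render_sections(doc, sections):
    """Append content sections to `doc`, e.g. a python-docx Document.

    Hypothetical sketch: styling from the format config is NOT applied here.
    """
    for section in sections:
        kind = section["type"]
        if kind == "heading":
            text = section["content"]
            if section.get("number"):
                # Assumption: the number is written as a plain-text prefix
                text = f"{section['number']} {text}"
            doc.add_heading(text, level=section.get("level", 1))
        elif kind == "paragraph":
            doc.add_paragraph(section["content"])
        elif kind == "table":
            cells = section["cells"]
            # Assumption: table size comes from the cell data, not the "rows" field
            n_cols = max((len(row) for row in cells), default=0)
            table = doc.add_table(rows=len(cells), cols=n_cols)
            for r, row in enumerate(cells):
                for c, value in enumerate(row):
                    table.cell(r, c).text = str(value)
        elif kind == "page_break":
            doc.add_page_break()
        else:
            raise ValueError(f"unsupported section type: {kind!r}")
```

With python-docx installed, something like `doc = docx.Document(); render_sections(doc, content["sections"]); doc.save("out.docx")` would exercise it. Because the function only calls methods on `doc`, it can also be tested against a small stub object.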
## Complete Example Workflow
User asks: "I have a research task document. I need to create 5 more documents with the same format but different content."
1. Extract the format:
   ```bash
   python scripts/extract_format.py research_task_template.docx template_format.json
   ```
2. Create content files for each new document (`content1.json`, `content2.json`, etc.).
3. Generate the documents:
   ```bash
   python scripts/generate_document.py template_format.json content1.json document1.docx
   python scripts/generate_document.py template_format.json content2.json document2.docx
   # ... repeat for all documents
   ```
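Creating one content file per document can be scripted when the variable parts come from a list. This is a minimal sketch that assumes every document shares the same outline and only a title and one paragraph differ. `write_content_files` and its `titles`/`bodies` inputs are hypothetical.

```python
import json
from pathlib import Path


def write_content_files(titles, bodies, out_dir="."):
    """Write content1.json, content2.json, ... and return their paths."""
    paths = []
    for i, (title, body) in enumerate(zip(titles, bodies), 1):
        content = {
            "metadata": {"title": title, "version": "1.0"},
            "sections": [
                {"type": "heading", "content": title, "level": 1, "number": "1"},
                {"type": "paragraph", "content": body},
            ],
        }
        path = Path(out_dir) / f"content{i}.json"
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file
        path.write_text(json.dumps(content, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

Each returned path can then be passed to `generate_document.py` together with `template_format.json`.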
## Common Use Cases

### Corporate Document Templates
Extract format from a company template and generate reports, proposals, or specifications with consistent branding.
```bash
# One-time: extract the company template
python scripts/extract_format.py "Company Template.docx" company_format.json

# For each new document:
python scripts/generate_document.py company_format.json new_report.json "Monthly Report.docx"
```
### Technical Documentation Series
Create multiple technical documents (specifications, test plans, manuals) with identical formatting.
```bash
# Extract from the specification template
python scripts/extract_format.py spec_template.docx spec_format.json

# Generate multiple specs
python scripts/generate_document.py spec_format.json product_a_spec.json "Product A Spec.docx"
python scripts/generate_document.py spec_format.json product_b_spec.json "Product B Spec.docx"
```
### Research Task Documents
The included example template (`assets/hy_template_format.json`) demonstrates a complete research task document format with:
- Approval/review table in the header
- Multi-level numbering (1, 1.1, 1.1.1)
- Technical specification tables
- Structured sections
Use this as a starting point for similar technical documents.
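Multi-level numbering such as 1, 1.1, 1.1.1 follows a simple counter rule: entering a level increments that level's counter and resets every deeper one. If you would rather not type each heading's `number` by hand, a hypothetical helper like the one below could fill it in from `level` before generation. This is a convenience sketch, not something the bundled scripts are documented to do.

```python
def assign_heading_numbers(sections):
    """Set 'number' on each heading section from its 'level' (1-based)."""
    counters = []
    for section in sections:
        if section.get("type") != "heading":
            continue
        level = section.get("level", 1)
        # Grow the counter stack to the current depth, then drop deeper levels
        while len(counters) < level:
            counters.append(0)
        del counters[level:]
        counters[level - 1] += 1
        section["number"] = ".".join(str(n) for n in counters)
    return sections
```

For headings at levels 1, 2, 2, 3, 1 this assigns "1", "1.1", "1.2", "1.2.1", "2". Skipping a level (1 straight to 3) yields a zero in between, such as "1.0.1".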
## Advanced Usage

### Customizing Extraction
Modify `scripts/extract_format.py` to extract additional properties not covered by default:
- Custom XML elements
- Advanced table features (merged cells, borders)
- Embedded objects
- Custom properties
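As an example of such an extension, merged cells are encoded in each table cell's properties: `w:gridSpan` for horizontal merges and `w:vMerge` for vertical ones (a bare `<w:vMerge/>` continues the merge from the cell above). A stdlib-only sketch that lists them from the XML of `word/document.xml`; `find_merged_cells` is a hypothetical helper, not part of the bundled script.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def find_merged_cells(document_xml):
    """Return (table, row, cell, grid_span, v_merge) for each merged cell.

    `document_xml` is the raw bytes/str of word/document.xml.
    Indices are positions of <w:tc> elements, not logical grid columns.
    """
    root = ET.fromstring(document_xml)
    merged = []
    for t, tbl in enumerate(root.iter(f"{{{W}}}tbl")):
        for r, tr in enumerate(tbl.findall("w:tr", NS)):
            for c, tc in enumerate(tr.findall("w:tc", NS)):
                span = tc.find("w:tcPr/w:gridSpan", NS)
                vmerge = tc.find("w:tcPr/w:vMerge", NS)
                grid_span = int(span.get(f"{{{W}}}val")) if span is not None else 1
                # A <w:vMerge/> without w:val means "continue the merge above"
                v_merge = vmerge.get(f"{{{W}}}val", "continue") if vmerge is not None else None
                if grid_span > 1 or v_merge is not None:
                    merged.append((t, r, c, grid_span, v_merge))
    return merged
```

To use it on a real file, read the XML with `zipfile.ZipFile(path).read("word/document.xml")`.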
### Extending Content Types
Add new section types in `scripts/generate_document.py`, such as:
- Images with captions
- Bulleted or numbered lists
- Footnotes and endnotes
- Custom content blocks
See `references/content_data_schema.md` for extension guidelines.
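One way to keep new section types isolated is a handler registry keyed by `type`. The sketch assumes a python-docx-style `doc` object. The `bullet_list` type, its `items` field, and the `register` decorator are all hypothetical. It relies on `add_paragraph(text, style=...)` and the "List Bullet" style present in python-docx's default template, which a custom template may not define.

```python
SECTION_HANDLERS = {}


def register(section_type):
    """Decorator that maps a section 'type' string to a handler function."""
    def wrap(func):
        SECTION_HANDLERS[section_type] = func
        return func
    return wrap


@register("paragraph")
def _paragraph(doc, section):
    doc.add_paragraph(section["content"])


@register("bullet_list")  # hypothetical type: {"type": "bullet_list", "items": [...]}
def _bullet_list(doc, section):
    for item in section["items"]:
        doc.add_paragraph(item, style="List Bullet")


def render(doc, sections):
    """Dispatch each section to its registered handler."""
    for section in sections:
        handler = SECTION_HANDLERS.get(section["type"])
        if handler is None:
            raise ValueError(f"no handler for section type {section['type']!r}")
        handler(doc, section)
```

Adding a type then means writing one decorated function, without touching the dispatch loop.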
### Batch Processing
Create a wrapper script to generate multiple documents:
```python
import subprocess
import sys

format_file = "template_format.json"
content_files = ["content1.json", "content2.json", "content3.json"]

# Generate document_1.docx, document_2.docx, ... one per content file
for i, content_file in enumerate(content_files, 1):
    output = f"document_{i}.docx"
    subprocess.run(
        [sys.executable, "scripts/generate_document.py",
         format_file, content_file, output],
        check=True,  # raise CalledProcessError if a generation fails
    )
```
## Dependencies
The scripts require:
- Python 3.7+
- The `python-docx` library:
  ```bash
  pip install python-docx
  ```
No additional dependencies are needed for the core functionality.
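A quick way to confirm these requirements before running the scripts. The `python-docx` package is imported under the module name `docx`, so the sketch checks for that. Note that it only confirms some module named `docx` is importable. `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of unmet requirements (empty when everything is present)."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required")
    # The python-docx package installs the importable module 'docx'
    if importlib.util.find_spec("docx") is None:
        problems.append("python-docx is missing: pip install python-docx")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```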
## Resources

### scripts/
- extract_format.py - Extract formatting from Word documents
- generate_document.py - Generate new documents from format + content
Both scripts include built-in help:
```bash
python scripts/extract_format.py --help
python scripts/generate_document.py --help
```
### references/
- format_config_schema.md - Complete schema for format configuration files
- content_data_schema.md - Complete schema for content data files
Read these for detailed information on file structures and available options.
### assets/
- hy_template_format.json - Example extracted format from a technical research task document
- example_content.json - Example content data showing all section types
Use these as references when creating your own format and content files.
## Troubleshooting
**Missing styles in output:** Ensure that the style IDs in the content data match those in the format config. Check `format.json` for the available style IDs.

**Table formatting issues:** Verify that table dimensions (rows/columns) match between the content data and the format config. See `references/format_config_schema.md` for the table structure.

**Font not displaying correctly:** Some fonts may not be available on every system. Check that the referenced fonts are installed.

**Dependencies missing:** Install the required Python package:

```bash
pip install python-docx
```
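For the missing-styles case, a quick cross-check of style IDs can narrow things down. The exact key names are defined in `references/format_config_schema.md` and `references/content_data_schema.md`. This sketch assumes, hypothetically, that the format config has a top-level `styles` object keyed by style ID and that content sections may name a style in a `style` field; adjust both keys (exposed as parameters) to the real schemas.

```python
import json


def find_unknown_styles(format_path, content_path,
                        styles_key="styles", section_style_key="style"):
    """Return (section_index, style_id) pairs whose style ID is not in the format config.

    styles_key and section_style_key are assumptions about the two schemas.
    """
    with open(format_path, encoding="utf-8") as f:
        known = set(json.load(f).get(styles_key, {}))  # keys of the styles object
    with open(content_path, encoding="utf-8") as f:
        sections = json.load(f).get("sections", [])
    return [
        (i, s[section_style_key])
        for i, s in enumerate(sections)
        if section_style_key in s and s[section_style_key] not in known
    ]
```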
## Tips
- **Test with examples first:** Use the included `hy_template_format.json` and `example_content.json` to understand the workflow before extracting your own formats.
- **Start simple:** Begin with basic headings and paragraphs, then add tables and complex formatting.
- **Validate JSON:** Use a JSON validator to check content data files before generating documents.
- **Keep format configs:** Store extracted format configurations for reuse across multiple projects.
- **Version control:** Track both format configs and content data in version control for reproducible document generation.
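For the JSON validation tip, Python's standard `json` module is enough to pinpoint syntax errors, because `json.JSONDecodeError` carries the line and column of the problem. A minimal sketch; `check_json_syntax` is a hypothetical name.

```python
import json


def check_json_syntax(text):
    """Return None if `text` parses as JSON, else 'line X, column Y: message'."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_json.py content1.json content2.json ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            error = check_json_syntax(f.read())
        print(f"{path}: {error or 'OK'}")
```

This only checks syntax. Whether the structure matches the content schema is a separate check.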