image-to-editable-ppt
Image to Editable PPT
Overview
This skill reconstructs visual slide inputs into an object-level editable PowerPoint `.pptx`.
The input can be a single image, multiple images, a PDF, or an image-only PPT/PPTX. The output is always `.pptx`. The goal is not to wrap a full-page screenshot in a PPT. Instead, readable text should become native PowerPoint text boxes wherever possible, basic structure should become native shapes, and complex visual elements should become independent image assets, with the manifest, previews, and validation reports keeping the result inspectable and reworkable.
Default trade-off: object-level editability comes first. Slightly rough visuals are better than a full-page raster passed off as an editable PPT.
Hard Constraints
- Every source page must be reconstructed by a page subagent, including single-image inputs.
- The main agent does not reconstruct pages; it only orchestrates.
- There is no parent-agent-only mode, no sequential fallback, and no low-fidelity fallback. If no page subagent is available, stop and do not start page reconstruction.
- All image generation, image editing, background repair, transparent bitmap assets, and asset sheets must go through the `$imagegen` skill.
- The default path for `$imagegen` is the built-in `image_gen`. Do not call the Image API directly from this skill.
- If a page needs `$imagegen` but `$imagegen` or the built-in `image_gen` is unavailable, stop that page and report the blocker. Do not fake assets.
- The original full-page `source.png` with editable text overlaid on it is a failure mode, not a fallback.
- Only basic primitives and simple structural objects may use native PPT shapes. When a non-text visual object is uncertain, default to redrawing it as an independent asset with `$imagegen`.
- Before writing the manifest, the page worker must calibrate an inventory of text font sizes, visual objects, background strategy, and shape corners. Default font sizes and default rounded corners must not be guessed by eye.
- Key states can only be advanced by scripts. An agent must not hand-write JSON to mark a page, an imagegen job, or a run as completed.
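As a rough illustration of the `source.png` constraint above, the sketch below flags assets that are byte-identical to the original source image. The helper names `sha256_of` and `raw_source_reused` are hypothetical and not part of this skill's scripts; a byte-level comparison is only an assumed first check and would not catch a re-encoded or resized copy of the source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def raw_source_reused(source_png: Path, asset_paths: list[Path]) -> list[Path]:
    """Return the assets whose bytes are identical to the source image.

    Hypothetical helper: a non-empty result suggests the page may be
    relying on the untouched full-page source instead of rebuilt objects.
    """
    source_hash = sha256_of(source_png)
    return [asset for asset in asset_paths if sha256_of(asset) == source_hash]
```

A page worker could run a check like this over the images it is about to reference, then treat any hit as a reason to revisit the background strategy rather than ship the page.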
Visible Progress Plan
During a normal run, the main agent must keep a user-visible checklist with exactly one active step at a time:
- Prepare input and task directory.
- Dispatch page reconstruction.
- Reconstruct page objects.
- Inspect and repair pages.
- Assemble and validate PPTX.
Completion criteria:
- `Prepare input and task directory`: `deck_manifest.json`, `page_jobs.json`, `pages/page_NNN/source.png`, and `notes_manifest.json` exist.
- `Dispatch page reconstruction`: the main agent spawns page subagents in batches according to `max_concurrent_pages`, and every spawned page is recorded as dispatched by `record_page_dispatch.py`. If no further subagents can be spawned, stop here and report the blocker.
- `Reconstruct page objects`: for every page, the page worker has produced `manifest.json`, `page.pptx`, `preview.png`, `split_assets_contact.png`, `validation.json`, and `page_result.json`.
- `Inspect and repair pages`: every page has been recorded through `record_page_result.py` and the repair queue is empty; report a blocker when a page cannot be repaired.
- `Assemble and validate PPTX`: `final/<origin>_edited.pptx` and `final/validation.json` exist.
Do not mark a step complete just because the chat says it is done. Completion requires real files or script-advanced state.
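The file-based criterion for the page reconstruction step can be sketched as a simple existence check. The file names are taken from the completion criteria above; the function name `missing_page_outputs` is hypothetical, and treating an empty file as missing is an assumption of this sketch. The skill itself advances page state through `record_page_result.py`, which this does not replace.

```python
from pathlib import Path

# Output names listed in the completion criteria for each page.
REQUIRED_PAGE_OUTPUTS = (
    "manifest.json",
    "page.pptx",
    "preview.png",
    "split_assets_contact.png",
    "validation.json",
    "page_result.json",
)


def missing_page_outputs(page_dir: Path) -> list[str]:
    """Return required output names that are absent or empty in page_dir.

    Assumption: a zero-byte file is treated the same as a missing one.
    """
    missing = []
    for name in REQUIRED_PAGE_OUTPUTS:
        path = page_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty result means the files exist, not that the page is accepted; acceptance still depends on the recorded result and the QA checks.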
Default Workflow
- Run `prepare_deck_run.py` to create the run directory, normalize the input, and generate the deck/page manifests and page requests.
- Run `page_job_status.py` to see the pages awaiting dispatch, the active dispatches, and the available dispatch slots.
- The main agent spawns ordinary Codex worker subagents in batches according to `max_concurrent_pages`. Do not spawn more than the runtime concurrency limit at once.
- Immediately after spawning, run `record_page_dispatch.py` to record the dispatch.
- Each page worker works only inside its own page directory and completes the page-level build, preview, contact sheet, and validation.
- After a page worker returns, the main agent runs `record_page_result.py` to check files, paths, and hashes and to advance the page state.
- Run `page_job_status.py` again. If any pages are still pending or repair_needed, dispatch the next batch.
- If a page has problems, run `queue_page_repairs.py` to generate repair items, then dispatch repair workers in batches.
- Once all pages are accepted, run `finalize_deck_run.py` to assemble the final PPTX, copy the notes, and run deck validation and the QA summary.
The normal main entry point is `prepare_deck_run.py`. The old input normalization script is no longer kept as a public entry point or as a compatibility wrapper.
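The batching rule in the workflow above comes down to simple slot arithmetic. The sketch below is illustrative only: `next_dispatch_batch` is a hypothetical name, and in the actual skill the pending pages, active dispatches, and free slots are reported by `page_job_status.py` rather than computed by the agent.

```python
def next_dispatch_batch(pending_pages, active_count, max_concurrent_pages):
    """Pick the pages to spawn now without exceeding the concurrency limit.

    pending_pages: ordered list of page ids still awaiting dispatch.
    active_count: number of page workers currently dispatched.
    max_concurrent_pages: the configured concurrency limit.
    """
    if max_concurrent_pages < 1:
        raise ValueError("max_concurrent_pages must be at least 1")
    free_slots = max(0, max_concurrent_pages - active_count)
    return list(pending_pages[:free_slots])
```

With three pending pages, one active worker, and a limit of two, only the first pending page is returned; the rest wait until a slot frees up.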
Generation Delegation
Before using `$imagegen`, you must read and follow:
`${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/SKILL.md`
This skill only composes `$imagegen`; it does not redefine the image generation API rules.
Common cases where a page needs `$imagegen`:
- Text, icons, or foreground objects sit on a complex background, which calls for source-preserving foreground removal plus localized background restoration.
- Icons, pictograms, badges, stickers, hand-drawn marks, stylized arrows, decorative symbols, and similar elements need to become independent assets.
- A chroma-key asset sheet needs to be generated, then keyed out, split, and made transparent locally.
- A clean base or a foreground asset needs targeted repair.
Generated images that the project actually uses must be copied into the page directory and recorded in the page-local `imagegen-jobs.json`. Do not let the manifest reference images that exist only under `$CODEX_HOME/generated_images/...`.
Complex backgrounds preserve source identity by default. `$imagegen` may be used to generate a clean background, but the source must serve as the edit target and as a strongly constraining reference for the repair or reconstruction; generating a new background that is "similar but different" is not allowed. When little is occluded, prefer local repair. When much is occluded or a full clean base is needed, the original composition, perspective, positions of the main objects, colors, lighting, and background detail must still be preserved, and the fidelity strategy must be recorded in the manifest's `background_strategy`.
Subagent Dispatch
Page subagents are the only executors of page reconstruction. The main agent does not reconstruct pages.
Each page worker must receive a self-contained prompt that includes at least:
- The run dir, page id, page dir, and source image (absolute paths).
- Allowed write scope: the current page dir only.
- Forbidden write scope: the deck manifest, the notes manifest, the final deck, and other pages.
- Required references: `page-decision-tree.md`, `imagegen-integration.md`, `manifest-schema.md`, `qa-rubric.md`.
- Required reading: `$imagegen/SKILL.md`.
- Required outputs and return format.
The page worker prompt template is in `prompts/page-worker.md`.
If a page subagent cannot be spawned, stop and report the blocker. Do not fall back to reconstructing pages sequentially.
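The write-scope rule in the prompt contract above can be expressed as a path containment check. This is a minimal sketch under the assumption that write scope is enforced by comparing resolved paths; `is_write_allowed` is a hypothetical helper, not one of the skill's scripts.

```python
from pathlib import Path


def is_write_allowed(target: Path, page_dir: Path) -> bool:
    """True only if target resolves to a location strictly inside page_dir.

    Resolving both paths first means "../" segments and symlinks cannot be
    used to reach the deck manifest, the final deck, or another page.
    """
    resolved_target = target.resolve()
    resolved_page = page_dir.resolve()
    return resolved_page in resolved_target.parents
```

Under this check `pages/page_001/manifest.json` is writable by the worker for `page_001`, while `deck_manifest.json` in the run dir and anything under `pages/page_002` are not.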
Rules
- Text: all readable text should become visible native PPT text boxes. Hidden, transparent, 1 pt, off-canvas, or metadata-only text does not count as editable text.
- Font size: first estimate from the source glyph height, the container height, and the text density on the same line, then scale against the preview. When unsure, err on the small side rather than the large side. The manifest must record `quality_checks.font_size_calibrated=true`.
- Structure: only basic primitives may use native PPT shapes, for example lines, rectangles, circles, table rules, axes, simple bar blocks, and basic containers. Rectangular containers default to `rect`; use `roundRect` only when the source clearly has rounded corners, and record `source_corner_radius_px` or `corner_reason`.
- Foreground: icons, pictograms, logo-like marks, hand-drawn marks, stickers, badges, complex arrows, and decorative elements are redrawn as independent assets with `$imagegen` by default. If a small visual object in the source must stay highly consistent, contains no readable text, and would lose its identity when redrawn, it may be cropped out as an independent source-derived raster asset with its source region recorded. This is not a full-page screenshot fallback.
- Background: solid colors, simple gradients, regular grids, and ordinary cards can be rebuilt natively or by script. When a complex photo, texture, or illustration is occluded by foreground objects, use `$imagegen` for inpainting/restoration.
- Asset sheet: use sparse chroma-key asset sheets by default to reduce the number of generation calls. Spacing between elements takes priority; elements must not be crowded, touch each other, or cast shadows on each other.
- Provenance: every final raster asset must have a provenance record. An original source crop must not be used as the default visual asset.
- QA: deterministic validation is necessary but not sufficient. `preview.png` and `split_assets_contact.png` must also be inspected.
- Repair: fix the smallest failing scope. Do not rebuild an entire page for one text box or one icon.
- State: key states in `page_jobs.json` and `imagegen-jobs.json` must be advanced by scripts.
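The Text rule above excludes several kinds of nominally present text. The sketch below shows one way such a filter could look. The `TextBox` fields and the `counts_as_editable` function are hypothetical stand-ins for whatever the manifest actually records; metadata-only text is assumed never to reach this structure, and "off-canvas" is assumed to mean the box lies entirely outside the slide.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical, simplified view of a text box; units are arbitrary
    but must match the slide dimensions passed to the check."""
    text: str
    font_pt: float
    left: float
    top: float
    width: float
    height: float
    hidden: bool = False
    opacity: float = 1.0  # 0.0 means fully transparent


def counts_as_editable(box: TextBox, slide_width: float, slide_height: float) -> bool:
    """Apply the exclusions from the Text rule to a single text box."""
    if not box.text.strip():
        return False  # nothing readable to edit
    if box.hidden or box.opacity <= 0.0:
        return False  # hidden or fully transparent text
    if box.font_pt <= 1.0:
        return False  # 1 pt (or smaller) text is not usable
    entirely_off_canvas = (
        box.left + box.width <= 0
        or box.top + box.height <= 0
        or box.left >= slide_width
        or box.top >= slide_height
    )
    return not entirely_off_canvas
```

A check like this only covers the mechanical exclusions; whether the text is the right size and in the right place is still judged against `preview.png`.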
Acceptance Criteria
- The output is a valid `.pptx`.
- A single image produces 1 page; multiple images produce 1 page per image; page N of a PDF maps to output page N; page N of a PPT/PPTX maps to output page N.
- PPT/PPTX speaker notes are copied verbatim page by page. They are not translated, not summarized, and not handed to a page worker for rewriting.
- Every page has `manifest.json`, `page.pptx`, `preview.png`, `split_assets_contact.png`, `validation.json`, and `page_result.json`.
- Every page has records of its source image size, text inventory, object decisions, asset provenance, and known limits.
- Every page has its dispatch recorded by `record_page_dispatch.py` and its result recorded by `record_page_result.py`.
- The final deck has `final/<origin>_edited.pptx` and `final/validation.json`.
- If a blocker occurs, the final reply must state the stage at which it occurred, the evidence path, and the reason the run is incomplete. Such a run must not be described as a normal completion.
Reference Map
- `references/architecture.md`: responsibility boundaries, run/page directory structure, ownership principles.
- `references/state-machine.md`: run/page/imagegen state machines and the rules for script-driven state advancement.
- `references/subagent-contract.md`: prompt contracts and return formats for page workers and repair workers.
- `references/imagegen-integration.md`: how to compose `$imagegen`, including clean base, asset sheet, transparency processing, and recording.
- `references/page-decision-tree.md`: page analysis, background strategy, and the boundary between foreground and structural objects.
- `references/manifest-schema.md`: first version of the deck/page/imagegen JSON schema.
- `references/qa-rubric.md`: QA standards for structure, text, assets, background, and visuals.
- `references/repair-policy.md`: repair queue, minimum rework scope, and blocker determination.
- `references/script-contracts.md`: script responsibilities, inputs and outputs, and permitted callers.
- `prompts/page-worker.md`: prompt for regular page reconstruction workers.
- `prompts/page-repair-worker.md`: prompt for page repair workers.
- `prompts/imagegen-clean-base.md`: prompt for clean base generation/editing.
- `prompts/imagegen-asset-sheet.md`: prompt for sparse asset sheets.
- `prompts/imagegen-repair.md`: prompt for targeted imagegen repair.