ppt-image-first-workflow
Skill by ara.so — Daily 2026 Skills collection.
A conversation-first, image-first PPT workflow skill that takes a vague presentation request through structured stages: content baseline → style preview → plan lock → generation → review. Pages are rendered as full-image visuals via GPT Image 2 and packaged into PPTX containers — not drawn as native editable PowerPoint objects.
What This Project Does
ppt-image-first:
- Collects minimal intake info (purpose, audience, page count, materials, identity anchors)
- Builds a `content_report.md` if user materials are thin
- Aligns style boundaries with 3 short questions
- Generates real image previews (cover, TOC, body pages) across multiple style directions
- Iterates on style until user confirms
- Runs a "style reverse inference" check to lock stable visual traits
- Produces planning artifacts: `design_spec.md`, `slide_blueprint.md`, `spec_lock.md`
- Generates final per-page images via GPT Image 2
- Packages images into a `.pptx` file
- Runs a structured review-and-retouch loop
Output type: Image-first PPTX — each slide is a full-page rendered image. Text/shapes inside slides are NOT individually editable PowerPoint objects.
Installation
```bash
# Clone the repository
git clone https://github.com/NyxTides/ppt-image-first.git
cd ppt-image-first

# Install Python dependencies
pip install -r requirements.txt

# Copy the skill file into your agent's skill directory
# For Claude Code:
cp SKILL.md ~/.claude/skills/ppt-image-first.md

# For Codex CLI:
cp SKILL.md ~/.codex/skills/ppt-image-first.md

# For Opencode:
cp SKILL.md ~/.opencode/skills/ppt-image-first.md
```
Environment Variables
```bash
# Required: OpenAI API key for GPT Image 2 generation
export OPENAI_API_KEY=your_key_here

# Optional: output directory for generated files (default: ./output)
export PPT_OUTPUT_DIR=./my_decks

# Optional: default aspect ratio (default: 16:9)
export PPT_ASPECT_RATIO=16:9
```
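On the Python side, these variables could be read once with their documented defaults. The following is a minimal sketch only: `Settings` and `load_settings` are hypothetical names, not part of the repository.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    output_dir: str
    aspect_ratio: str


def load_settings(env=None) -> Settings:
    """Read the three variables above, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # The key is the only required variable, so fail before any API call.
        raise RuntimeError("OPENAI_API_KEY is required for image generation")
    return Settings(
        api_key=api_key,
        output_dir=env.get("PPT_OUTPUT_DIR", "./output"),
        aspect_ratio=env.get("PPT_ASPECT_RATIO", "16:9"),
    )
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.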
---
Project Structure
```text
ppt-image-first/
├─ SKILL.md                               # Agent skill definition
├─ references/
│  ├─ workflow.md                         # Full stage-by-stage workflow spec
│  ├─ conversation_framework.md           # Intake + confirmation dialogue rules
│  └─ preview-flow.md                     # Image preview generation logic
├─ templates/
│  ├─ content_report_reference.md         # Template: content baseline doc
│  ├─ design_spec_reference.md            # Template: visual design spec
│  ├─ slide_blueprint_reference.md        # Template: per-page blueprint
│  └─ spec_lock_reference.md              # Template: execution constraints
└─ assets/
   ├─ preview_shell/index.html            # Style comparison UI shell
   ├─ candidate_picker_shell/index.html   # Multi-candidate selection UI
   └─ review_shell/index.html             # Review & retouch UI shell
```
Workflow Stages
Stage 1 — Intake & Baseline Judgment
Collect only essential info. Do NOT present a long form.
```python
INTAKE_FIELDS = [
    "purpose",          # defense / product pitch / research report / training
    "audience",         # professor panel / investors / internal team
    "page_count_hint",  # rough number or duration ("20 slides" / "10 min talk")
    "materials",        # what the user already has
    "identity_anchor",  # school / company / lab / brand name
]
```
After intake, output a baseline judgment (2–4 sentences) and pause at 需求确认 (requirements confirmation).
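Whether intake is complete enough to write the baseline judgment can be checked mechanically. This is a rough sketch: `missing_intake_fields` and `ready_for_baseline` are hypothetical helpers, and it assumes intake answers are held in a plain dict keyed by the field names above.

```python
INTAKE_FIELDS = ["purpose", "audience", "page_count_hint", "materials", "identity_anchor"]


def missing_intake_fields(intake: dict) -> list[str]:
    """Return the intake fields that are absent or blank, in INTAKE_FIELDS order."""
    return [f for f in INTAKE_FIELDS if not str(intake.get(f) or "").strip()]


def ready_for_baseline(intake: dict) -> bool:
    """True once every essential field has a non-empty answer."""
    return not missing_intake_fields(intake)
```

The list of missing fields can drive follow-up questions one at a time, which keeps the conversation short instead of presenting a form.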
Stage 1.25 — Content Baseline (content_report.md)
If user materials are thin (topic only, or scattered notes), generate a structured content report before any style work.
```python
# content_report.md structure
CONTENT_REPORT_SECTIONS = [
    "core_thesis",           # The one central claim or narrative spine
    "key_sections",          # 4–7 logical sections with bullet points
    "data_and_evidence",     # Stats, facts, examples to reference
    "narrative_arc",         # How sections connect: problem → solution → proof
    "slide_count_estimate",  # Recommended page breakdown per section
]
```
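As an illustration, the section list can seed an empty `content_report.md` outline for the agent to fill in. The heading layout below is an assumption, not the format of `templates/content_report_reference.md`, and `render_content_report_skeleton` is a hypothetical helper.

```python
CONTENT_REPORT_SECTIONS = [
    "core_thesis",
    "key_sections",
    "data_and_evidence",
    "narrative_arc",
    "slide_count_estimate",
]


def render_content_report_skeleton(topic: str) -> str:
    """Build an empty outline with one second-level heading per section."""
    lines = [f"# Content Report: {topic}", ""]
    for key in CONTENT_REPORT_SECTIONS:
        title = key.replace("_", " ").title()  # "core_thesis" -> "Core Thesis"
        lines += [f"## {title}", "", "TODO", ""]
    return "\n".join(lines)
```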
Stage 1.5 — Style Boundary Alignment
Ask exactly 3 short questions:
1. Overall tone: light / dark / neutral middle?
2. Direction: conventional professional / visually distinctive?
3. How many style directions to preview first? (recommend 2–3)
Stage 2 — Style Proposals & Previews
Generate N style directions. For each direction, produce real image previews:
```python
PREVIEW_PAGES_PER_DIRECTION = [
    "cover_page",  # Title + identity anchor
    "toc_page",    # Table of contents / agenda
    "body_page",   # Representative content page
]
```
Use `assets/preview_shell/index.html` to display comparisons.
Stage 2.5 — Style Refinement (optional)
If user wants to iterate on one direction rather than lock in, continue from that direction only. Do NOT force a final decision.
Stage 2.75 — Style Reverse Inference
After user selects a direction, analyze the confirmed preview images and extract:
```python
STYLE_INFERENCE_CATEGORIES = {
    "must_continue": [],   # Traits clearly present, clearly liked
    "confirm_extend": [],  # Traits that worked here, check if wanted deck-wide
    "do_not_lock": [],     # Accidental/contextual traits, not repeatable rules
}
```
Stage 3 — Planning Artifacts
Generate in order:
```python
PLANNING_ARTIFACTS = [
    "design_spec.md",      # Global visual rationale + continuity constraints
    "slide_blueprint.md",  # Per-page: intent, content payload, visual strategy
    "spec_lock.md",        # What CAN and CANNOT change during generation
]
```
Pause at 生成前确认 (pre-generation confirmation) before proceeding.
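Before the pre-generation pause, it may help to verify that all three artifacts were actually written, in order. This is a sketch under the assumption that they live together in one planning directory. `next_missing_artifact` is a hypothetical helper.

```python
from pathlib import Path

PLANNING_ARTIFACTS = ["design_spec.md", "slide_blueprint.md", "spec_lock.md"]


def next_missing_artifact(plan_dir):
    """Return the first artifact (in generation order) that is absent or empty, else None."""
    for name in PLANNING_ARTIFACTS:
        path = Path(plan_dir) / name
        if not path.is_file() or path.stat().st_size == 0:
            return name
    return None
```

A `None` result means the agent can present the plan for confirmation. Any other result names the artifact to generate next.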
Stage 4 — Generation
Ask the user:
Generate mode:
A) One final image per slide (faster)
B) Multiple candidates per slide, then pick (slower, more control)
If B, use `assets/candidate_picker_shell/index.html` before finalizing.
Stage 5 — Review & Retouch
Use `assets/review_shell/index.html`. Structured feedback format:
```python
REVIEW_FEEDBACK_SCHEMA = {
    "slide_index": int,
    "issue_type": "visual | content | layout | consistency",
    "description": str,
    "suggested_fix": str,  # optional
}
```
Core Python Usage
Generating a Slide Image via GPT Image 2
```python
import base64
from pathlib import Path

import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from env


def generate_slide_image(
    prompt: str,
    slide_index: int,
    output_dir: str = "./output/slides",
    size: str = "1536x1024",  # landscape; "1792x1024" is not a valid size for gpt-image-1
) -> Path:
    """Generate a single slide image using GPT Image 2."""
    response = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        n=1,
        size=size,
    )
    # The image comes back base64-encoded; decode it to raw PNG bytes.
    image_b64 = response.data[0].b64_json
    image_bytes = base64.b64decode(image_b64)

    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    slide_path = out_path / f"slide_{slide_index:02d}.png"
    slide_path.write_bytes(image_bytes)
    print(f"[slide {slide_index}] saved → {slide_path}")
    return slide_path
```
Building PPTX from Slide Images
```python
from pathlib import Path

from pptx import Presentation
from pptx.util import Inches


def build_pptx_from_images(
    image_paths: list[Path],
    output_path: str = "./output/deck.pptx",
    width_inches: float = 13.33,  # 16:9 widescreen
    height_inches: float = 7.5,
) -> Path:
    """Package a list of full-page slide images into a PPTX file."""
    prs = Presentation()
    prs.slide_width = Inches(width_inches)
    prs.slide_height = Inches(height_inches)
    blank_layout = prs.slide_layouts[6]  # blank layout, no placeholders

    for idx, img_path in enumerate(image_paths):
        slide = prs.slides.add_slide(blank_layout)
        # Setting both width and height stretches the image to fill the slide.
        slide.shapes.add_picture(
            str(img_path),
            left=Inches(0),
            top=Inches(0),
            width=Inches(width_inches),
            height=Inches(height_inches),
        )
        print(f"[pptx] added slide {idx + 1}: {img_path.name}")

    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    prs.save(str(out))
    print(f"[pptx] saved → {out}")
    return out
```
Full Pipeline: Images → PPTX
```python
from pathlib import Path

# Assumes generate_slide_image and build_pptx_from_images (defined above) are in scope.


def run_generation_pipeline(
    slide_prompts: list[str],
    deck_title: str = "deck",
    output_dir: str = "./output",
) -> Path:
    """
    Given a list of per-slide prompts, generate images and package into PPTX.
    slide_prompts should come from slide_blueprint.md, one prompt per page.
    """
    slides_dir = Path(output_dir) / "slides"
    image_paths = []
    for i, prompt in enumerate(slide_prompts):
        path = generate_slide_image(
            prompt=prompt,
            slide_index=i + 1,
            output_dir=str(slides_dir),
        )
        image_paths.append(path)

    pptx_path = build_pptx_from_images(
        image_paths=image_paths,
        output_path=f"{output_dir}/{deck_title}.pptx",
    )
    return pptx_path


# Example usage
if __name__ == "__main__":
    prompts = [
        # From slide_blueprint.md, generated by the workflow
        "Cover slide for a meteorology thesis defense. Title: 'Urban Heat Island Effects in Coastal Cities'. "
        "University name: Coastal Institute of Atmospheric Science. Dark navy background, white typography, "
        "subtle cloud texture, professional academic style. 16:9 widescreen.",
        "Table of contents slide. Sections: 1. Background 2. Data & Methods 3. Results 4. Discussion 5. Conclusion. "
        "Same dark navy color scheme, numbered list with clear hierarchy, minimal decorative elements.",
        "Body slide: 'Key Findings'. Three main data points as large stat callouts: +2.3°C average temp increase, "
        "67% of monitored stations affected, 15-year trend data. Dark navy background, accent color teal, "
        "clean data-forward layout.",
    ]
    output = run_generation_pipeline(
        slide_prompts=prompts,
        deck_title="meteorology-defense",
        output_dir="./output",
    )
    print(f"Done: {output}")
```
Generating Multiple Candidates per Slide
```python
from pathlib import Path


def generate_slide_candidates(
    prompt: str,
    slide_index: int,
    n_candidates: int = 3,
    output_dir: str = "./output/candidates",
) -> list[Path]:
    """Generate N candidate images for one slide for user selection."""
    paths = []
    for c in range(n_candidates):
        path = generate_slide_image(
            prompt=prompt,
            slide_index=slide_index,
            output_dir=f"{output_dir}/slide_{slide_index:02d}",
        )
        # Rename to include the candidate index; replace() overwrites files left by an earlier run.
        new_path = path.parent / f"candidate_{c + 1}.png"
        path.replace(new_path)
        paths.append(new_path)
        print(f"[candidate {c + 1}/{n_candidates}] slide {slide_index}")
    return paths
```
Planning Artifact Templates
design_spec.md (minimal structure)
```markdown
# Design Spec

## Global Direction
[1–2 sentences on visual identity and rationale]

## Color Palette
- Primary: #______
- Secondary: #______
- Accent: #______
- Background: #______

## Typography
- Heading: [font / weight / size range]
- Body: [font / weight / size range]

## Layout Principles
- [Grid / alignment rules]
- [Spacing conventions]
- [What should appear on every slide vs. never]

## Continuity Constraints
- [What MUST remain consistent across all slides]
- [What is allowed to vary]
```
slide_blueprint.md (per-page entry)
```markdown
## Slide 03 — Key Findings

Intent: Deliver the three most important statistical results as scannable callouts.

Content payload:
- Stat 1: +2.3°C average increase
- Stat 2: 67% of stations affected
- Stat 3: 15-year trend confirmed

Visual strategy: Large number callouts, minimal prose, accent color on numbers.

Carry-through elements: Logo bottom-left, slide number bottom-right, dark navy bg.

Generation prompt:
Body slide titled 'Key Findings'. Three large stat callouts: '+2.3°C', '67%', '15 Years'. Dark navy background, teal accent on numbers, white body text, clean grid layout, 16:9.
```
spec_lock.md (minimal structure)
```markdown
# Spec Lock

## Locked (do not change)
- Background color: dark navy #0A1628
- Logo placement: bottom-left corner
- Slide number placement: bottom-right
- Heading font: [confirmed font]

## Flexible (may vary per page)
- Accent color intensity
- Layout grid (2-col vs. 3-col for body pages)
- Illustration vs. data visualization choice

## Do Not Fabricate
- Speaker's name, institutional affiliation
- Statistics not present in content_report.md
- Dates, locations, citation details

## Generation Strategy
- Mode: single final per slide (or: multi-candidate then pick)
- Retouch allowed: yes, via review_shell feedback loop
```
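Because the locked fields have to reach every generation prompt, one option is to extract them from `spec_lock.md` programmatically. The sketch below assumes section headings are marked with `#` and locked items are `- ` bullets. `locked_rules` and `style_prefix` are hypothetical helpers, not functions shipped with the skill.

```python
def locked_rules(spec_lock_text: str) -> list[str]:
    """Collect bullet items under the heading whose title starts with 'Locked'."""
    rules, in_locked = [], False
    for raw in spec_lock_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Any heading switches the section; only 'Locked ...' turns collection on.
            in_locked = line.lstrip("#").strip().lower().startswith("locked")
        elif in_locked and line.startswith("- "):
            rules.append(line[2:].strip())
    return rules


def style_prefix(spec_lock_text: str) -> str:
    """Join the locked rules into a one-line prefix for slide prompts."""
    return "[STYLE LOCK] " + ". ".join(locked_rules(spec_lock_text)) + "."
```

Deriving the prefix from the file, rather than retyping it, keeps the prompts aligned with whatever the user confirmed.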
---
Common Patterns
Pattern: Thin Materials → Content First
```python
# When user provides only a topic, not full content:
# 1. Generate content_report.md BEFORE any style work
# 2. Use content_report.md as the source for all slide prompts
# 3. Never generate style previews from an empty premise

def should_generate_content_report(user_materials: str) -> bool:
    """Heuristic: if materials are under ~200 words, build content baseline first."""
    return len(user_materials.split()) < 200
```
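Whitespace splitting counts an unspaced Chinese paragraph as a single word, so the heuristic above would flag long Chinese materials as thin. A possible variant, assuming one CJK ideograph roughly equals one word, is sketched here. `approx_word_count` and `materials_are_thin` are hypothetical names.

```python
import re

# Matches a single ideograph in the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def approx_word_count(text: str) -> int:
    """Count whitespace-separated tokens plus one unit per CJK ideograph."""
    cjk = len(_CJK.findall(text))
    # Blank out the ideographs so they are not counted again as tokens.
    other = len(_CJK.sub(" ", text).split())
    return cjk + other


def materials_are_thin(text: str, threshold: int = 200) -> bool:
    """Same ~200-word threshold as above, applied to the approximate count."""
    return approx_word_count(text) < threshold
```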
Pattern: Style Preview Shell Integration
```python
import json
import subprocess
from pathlib import Path


def launch_preview_shell(preview_images: dict[str, list[Path]]) -> None:
    """
    Write preview manifest and open the preview shell in browser.
    preview_images: {"direction_A": [cover, toc, body], "direction_B": [...]}
    """
    manifest = {
        direction: [str(p) for p in pages]
        for direction, pages in preview_images.items()
    }
    shell_dir = Path("assets/preview_shell")
    manifest_path = shell_dir / "preview_manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))

    # Open in default browser
    subprocess.run(["open", str(shell_dir / "index.html")])  # macOS
    # subprocess.run(["xdg-open", str(shell_dir / "index.html")])  # Linux
```
Pattern: Review Feedback → Retouch Prompt
```python
def feedback_to_retouch_prompt(
    original_prompt: str,
    feedback: dict,
) -> str:
    """Convert structured review feedback into an updated generation prompt."""
    issue = feedback["description"]
    fix = feedback.get("suggested_fix", "")
    retouch_instruction = f"REVISION: {issue}"
    if fix:
        retouch_instruction += f" Fix: {fix}"
    return f"{original_prompt}\n\n{retouch_instruction}"
```
Troubleshooting
Image generation returns an error
```python
# Check: OPENAI_API_KEY is set
import os
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not set"

# Check: size parameter is valid for gpt-image-1
# Valid sizes: "1024x1024", "1536x1024", "1024x1536", "auto"
# For 16:9 slides, use "1536x1024" (landscape, 3:2; the closest valid size to 16:9)
```
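The size check can also run before any request is sent. This is a small sketch built from the valid-size list above: `pick_size` and `check_size` are hypothetical helpers, and the orientation mapping is an assumption about how a slide aspect ratio should map to an image size.

```python
VALID_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}


def pick_size(aspect_ratio: str = "16:9") -> str:
    """Map a W:H aspect ratio to the listed size with the same orientation."""
    w, h = (float(part) for part in aspect_ratio.split(":"))
    if w > h:
        return "1536x1024"  # landscape
    if w < h:
        return "1024x1536"  # portrait
    return "1024x1024"      # square


def check_size(size: str) -> str:
    """Raise early instead of letting the API reject an unsupported size."""
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size {size!r}; expected one of {sorted(VALID_SIZES)}")
    return size
```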
PPTX images appear blurry
```python
# Use the highest resolution size available
# Then let python-pptx scale to fill the slide; do NOT upscale manually
size = "1536x1024"  # use this instead of "1024x1024" for landscape slides
```
Style consistency breaks across slides
Root cause: prompt drift, where each slide prompt diverges from the locked spec.
Fix:
1. Prepend spec_lock.md's "Locked" section to EVERY slide prompt
2. Use a prompt prefix template:
```python
STYLE_PREFIX = """
[STYLE LOCK] Dark navy #0A1628 background. Heading font: Inter Bold.
Logo bottom-left. Slide number bottom-right. Teal accent #2DD4BF on highlights.
DO NOT add gradients, textures, or decorative borders not in this spec.
"""
full_prompt = STYLE_PREFIX + slide_specific_prompt
```
content_report.md content ends up in slides verbatim
This is a workflow sequencing error.
content_report.md is a SOURCE document, not a script.
The slide_blueprint.md should ADAPT content into slide-appropriate payloads.
Each slide blueprint entry must go through:
content_report → narrative selection → slide payload → generation prompt
Never pipe content_report text directly into image generation prompts.
Review shell not loading images
```bash
# Images must be accessible from the shell's local path
# Serve the output directory locally if needed:
cd output && python -m http.server 8080
# Then open assets/review_shell/index.html with base path set to localhost:8080
```
---
Key Constraints to Respect
| Constraint | Rule |
|---|---|
| Confirmation gates | There are 3 mandatory pause points: requirements confirm, pre-generation confirm, review. Do not skip them. |
| Preview = real images | Never substitute text mockups, ASCII art, or placeholder boxes for image previews. |
| Content before style | If materials are thin, `content_report.md` must be generated before any style work. |
| `spec_lock.md` is binding | Fields in the "Locked" section must be enforced in every generation prompt. |
| No fabrication | Never invent names, statistics, affiliations, dates, or citations not present in user materials or `content_report.md`. |
| Review is mandatory | After first full generation, always enter the review loop — do not present output as final by default. |
| Aspect ratio | Default is 16:9. Do not change unless user explicitly requests it. |
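One way to make the confirmation gates hard to skip is to track them explicitly. The sketch below is illustrative only: the gate identifiers and the `GateTracker` class are hypothetical names for the three pause points listed in the table.

```python
GATES = ["requirements_confirm", "pre_generation_confirm", "review"]


class GateTracker:
    """Tracks which mandatory pause points the user has explicitly confirmed."""

    def __init__(self):
        self._confirmed = []

    def confirm(self, gate: str) -> None:
        """Accept a confirmation only for the next gate in sequence."""
        done = len(self._confirmed)
        expected = GATES[done] if done < len(GATES) else None
        if gate != expected:
            raise RuntimeError(f"cannot confirm {gate!r}; next gate is {expected!r}")
        self._confirmed.append(gate)

    def may_generate(self) -> bool:
        """Final generation is allowed only after the first two gates have passed."""
        return len(self._confirmed) >= 2
```

Calling `confirm` out of order raises immediately, so a skipped gate surfaces as an error and never passes silently.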
References
- `references/workflow.md`: Complete stage-by-stage workflow specification
- `references/conversation_framework.md`: Intake rules and confirmation dialogue patterns
- `references/preview-flow.md`: Image preview generation and shell integration details
- `templates/`: Reference templates for all four planning artifacts