ppt-image-first-workflow

Skill by ara.so — Daily 2026 Skills collection.
A conversation-first, image-first PPT workflow skill that takes a vague presentation request through structured stages: content baseline → style preview → plan lock → generation → review. Pages are rendered as full-image visuals via GPT Image 2 and packaged into PPTX containers — not drawn as native editable PowerPoint objects.

What This Project Does

ppt-image-first is a multi-stage workflow orchestrator, not a template stamper. It:
  1. Collects minimal intake info (purpose, audience, page count, materials, identity anchors)
  2. Builds a content_report.md if the user's materials are thin
  3. Aligns style boundaries with 3 short questions
  4. Generates real image previews (cover, TOC, body pages) across multiple style directions
  5. Iterates on style until the user confirms
  6. Runs a "style reverse inference" check to lock stable visual traits
  7. Produces planning artifacts: design_spec.md, slide_blueprint.md, spec_lock.md
  8. Generates final per-page images via GPT Image 2
  9. Packages the images into a .pptx file
  10. Runs a structured review-and-retouch loop
Output type: Image-first PPTX. Each slide is a full-page rendered image; text and shapes inside slides are NOT individually editable PowerPoint objects.

Installation

```bash
# Clone the repository

# Install Python dependencies
pip install -r requirements.txt

# Copy the skill file into your agent's skill directory
# For Claude Code:
cp SKILL.md ~/.claude/skills/ppt-image-first.md

# For Codex CLI:
cp SKILL.md ~/.codex/skills/ppt-image-first.md

# For Opencode:
cp SKILL.md ~/.opencode/skills/ppt-image-first.md
```

Environment Variables

```bash
# Required: OpenAI API key for GPT Image 2 generation
export OPENAI_API_KEY=your_key_here

# Optional: output directory for generated files (default: ./output)
export PPT_OUTPUT_DIR=./my_decks

# Optional: default aspect ratio (default: 16:9)
export PPT_ASPECT_RATIO=16:9
```
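The pipeline functions further down hard-code ./output and a 16:9 slide size instead of reading these variables. Below is a minimal sketch of a settings loader that applies the defaults listed above. `load_settings`, `PptSettings`, and the fixed 7.5 in slide height are assumptions; they are not part of the skill.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Assumption: slide height stays at 7.5 in (the widescreen height used by
# build_pptx_from_images) and the width is derived from PPT_ASPECT_RATIO.
SLIDE_HEIGHT_IN = 7.5


@dataclass(frozen=True)
class PptSettings:
    api_key: Optional[str]
    output_dir: Path
    slide_width_in: float
    slide_height_in: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> PptSettings:
    """Read the documented environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    ratio = env.get("PPT_ASPECT_RATIO", "16:9")
    try:
        # Exactly two numeric parts are required; anything else raises ValueError.
        w, h = (float(part) for part in ratio.split(":"))
    except ValueError as exc:
        raise ValueError(f"PPT_ASPECT_RATIO must look like '16:9', got {ratio!r}") from exc
    if w <= 0 or h <= 0:
        raise ValueError(f"PPT_ASPECT_RATIO must be positive, got {ratio!r}")
    return PptSettings(
        api_key=env.get("OPENAI_API_KEY"),
        output_dir=Path(env.get("PPT_OUTPUT_DIR", "./output")),
        slide_width_in=round(SLIDE_HEIGHT_IN * w / h, 3),
        slide_height_in=SLIDE_HEIGHT_IN,
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to test.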

---

Project Structure

```text
ppt-image-first/
├─ SKILL.md                          # Agent skill definition
├─ references/
│  ├─ workflow.md                    # Full stage-by-stage workflow spec
│  ├─ conversation_framework.md      # Intake + confirmation dialogue rules
│  └─ preview-flow.md                # Image preview generation logic
├─ templates/
│  ├─ content_report_reference.md    # Template: content baseline doc
│  ├─ design_spec_reference.md       # Template: visual design spec
│  ├─ slide_blueprint_reference.md   # Template: per-page blueprint
│  └─ spec_lock_reference.md         # Template: execution constraints
└─ assets/
   ├─ preview_shell/index.html       # Style comparison UI shell
   ├─ candidate_picker_shell/index.html  # Multi-candidate selection UI
   └─ review_shell/index.html        # Review & retouch UI shell
```

Workflow Stages

Stage 1 — Intake & Baseline Judgment

Collect only essential info. Do NOT present a long form.
```python
INTAKE_FIELDS = [
    "purpose",          # defense / product pitch / research report / training
    "audience",         # professor panel / investors / internal team
    "page_count_hint",  # rough number or duration ("20 slides" / "10 min talk")
    "materials",        # what the user already has
    "identity_anchor",  # school / company / lab / brand name
]
```
After intake, output a baseline judgment (2–4 sentences) and pause at 需求确认 (requirements confirmation).
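One way to keep intake short is to ask only about fields that are still missing, and only a couple at a time. This is a sketch. The question wording, `missing_intake_fields`, `next_intake_questions`, and the two-questions-per-turn limit are all hypothetical.

```python
# Keys mirror INTAKE_FIELDS; the question wording is illustrative only.
INTAKE_QUESTIONS = {
    "purpose": "What is the deck for (defense, product pitch, research report, training)?",
    "audience": "Who will see it (professor panel, investors, internal team)?",
    "page_count_hint": "Roughly how many slides, or how long is the talk?",
    "materials": "What materials do you already have?",
    "identity_anchor": "Which school, company, lab, or brand should it carry?",
}


def missing_intake_fields(answers: dict[str, str]) -> list[str]:
    """Return intake fields that are absent or blank, in INTAKE_FIELDS order."""
    return [field for field in INTAKE_QUESTIONS if not answers.get(field, "").strip()]


def next_intake_questions(answers: dict[str, str], max_questions: int = 2) -> list[str]:
    """Ask a couple of short follow-ups per turn instead of presenting a long form."""
    return [INTAKE_QUESTIONS[f] for f in missing_intake_fields(answers)[:max_questions]]
```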

Stage 1.25 — Content Baseline (content_report.md)

If user materials are thin (topic only, or scattered notes), generate a structured content report before any style work.
```python
# content_report.md structure
CONTENT_REPORT_SECTIONS = [
    "core_thesis",           # The one central claim or narrative spine
    "key_sections",          # 4–7 logical sections with bullet points
    "data_and_evidence",     # Stats, facts, examples to reference
    "narrative_arc",         # How sections connect: problem → solution → proof
    "slide_count_estimate",  # Recommended page breakdown per section
]
```
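This sketch renders the content baseline so that empty sections stay visibly empty instead of being filled with invented material. The `## section_name` heading layout and the TODO marker are assumptions. The skill's actual template is templates/content_report_reference.md.

```python
CONTENT_REPORT_SECTIONS = [
    "core_thesis", "key_sections", "data_and_evidence", "narrative_arc", "slide_count_estimate",
]


def render_content_report(topic: str, sections: dict[str, str]) -> str:
    """Render content_report.md, marking unfilled sections instead of inventing content."""
    lines = [f"# Content Report: {topic}", ""]
    for name in CONTENT_REPORT_SECTIONS:
        body = sections.get(name, "").strip()
        lines.append(f"## {name}")
        # An explicit TODO forces the gap to be confirmed with the user, not fabricated.
        lines.append(body if body else "TODO: confirm with user")
        lines.append("")
    return "\n".join(lines)
```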

Stage 1.5 — Style Boundary Alignment

Ask exactly 3 short questions:
1. Overall tone: light / dark / neutral middle?
2. Direction: conventional professional / visually distinctive?
3. How many style directions to preview first? (recommend 2–3)

Stage 2 — Style Proposals & Previews

Generate N style directions. For each direction, produce real image previews:
```python
PREVIEW_PAGES_PER_DIRECTION = [
    "cover_page",     # Title + identity anchor
    "toc_page",       # Table of contents / agenda
    "body_page",      # Representative content page
]
```
Use assets/preview_shell/index.html to display the comparisons.
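The preview step is a small cross product in which every style direction renders the same three pages. The sketch below builds that job list. It keeps the per-page content brief fixed so that only the style varies between directions. `PreviewJob`, `build_preview_jobs`, and the prompt wording are hypothetical.

```python
from dataclasses import dataclass

PREVIEW_PAGES_PER_DIRECTION = ["cover_page", "toc_page", "body_page"]


@dataclass(frozen=True)
class PreviewJob:
    direction: str
    page: str
    prompt: str
    filename: str


def build_preview_jobs(directions: dict[str, str], page_briefs: dict[str, str]) -> list[PreviewJob]:
    """One job per (style direction, preview page); content is shared, style differs."""
    jobs = []
    for direction, style_desc in directions.items():
        for page in PREVIEW_PAGES_PER_DIRECTION:
            prompt = f"{page_briefs[page]} Style: {style_desc}. 16:9 widescreen slide."
            jobs.append(PreviewJob(direction, page, prompt, f"{direction}/{page}.png"))
    return jobs
```

Holding the content constant makes the side-by-side comparison in the preview shell a comparison of style alone.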

Stage 2.5 — Style Refinement (optional)

If the user wants to keep iterating on one direction rather than lock it in, continue from that direction only. Do NOT force a final decision.

Stage 2.75 — Style Reverse Inference

After the user selects a direction, analyze the confirmed preview images and extract:
```python
STYLE_INFERENCE_CATEGORIES = {
    "must_continue":    [],  # Traits clearly present, clearly liked
    "confirm_extend":   [],  # Traits that worked here, check if wanted deck-wide
    "do_not_lock":      [],  # Accidental/contextual traits, not repeatable rules
}
```
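This sketch shows how the three inference buckets might feed spec_lock.md. It assumes that `must_continue` traits go straight to Locked and that `confirm_extend` traits are locked only after the user approves them; unapproved ones stay Flexible. It also assumes `do_not_lock` traits are dropped. `draft_spec_lock` is a hypothetical helper.

```python
def draft_spec_lock(
    inference: dict[str, list[str]],
    approved_extensions: set[str],
) -> dict[str, list[str]]:
    """Map reverse-inference buckets onto the Locked / Flexible sections of spec_lock.md."""
    locked = list(inference.get("must_continue", []))
    flexible = []
    for trait in inference.get("confirm_extend", []):
        if trait in approved_extensions:
            locked.append(trait)    # user confirmed it should hold deck-wide
        else:
            flexible.append(trait)  # worked once, but may vary per page
    # do_not_lock traits are deliberately excluded from both lists.
    return {"locked": locked, "flexible": flexible}
```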

Stage 3 — Planning Artifacts

Generate in order:
```python
PLANNING_ARTIFACTS = [
    "design_spec.md",      # Global visual rationale + continuity constraints
    "slide_blueprint.md",  # Per-page: intent, content payload, visual strategy
    "spec_lock.md",        # What CAN and CANNOT change during generation
]
```
Pause at 生成前确认 (pre-generation confirmation) before proceeding.
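Before pausing at the pre-generation confirmation, the agent can check that all three planning artifacts actually exist. This is a minimal sketch. It assumes the artifacts live together in one project directory, and `ready_for_pregeneration_confirm` is hypothetical.

```python
from pathlib import Path

PLANNING_ARTIFACTS = ["design_spec.md", "slide_blueprint.md", "spec_lock.md"]


def ready_for_pregeneration_confirm(project_dir: Path) -> tuple[bool, list[str]]:
    """Return (ready, missing): every artifact must exist and contain non-blank text."""
    missing = []
    for name in PLANNING_ARTIFACTS:
        path = project_dir / name
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(name)
    return (not missing, missing)
```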

Stage 4 — Generation

Ask the user:
Generate mode:
A) One final image per slide (faster)
B) Multiple candidates per slide, then pick (slower, more control)
If B, use assets/candidate_picker_shell/index.html before finalizing.

Stage 5 — Review & Retouch

Use assets/review_shell/index.html for the review loop. Structured feedback format:
```python
REVIEW_FEEDBACK_SCHEMA = {
    "slide_index": int,
    "issue_type": "visual | content | layout | consistency",
    "description": str,
    "suggested_fix": str,  # optional
}
```
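Because `issue_type` is written as a pipe-separated string, feedback coming back from the review shell is easy to get wrong. The following is a hedged validator for a single feedback entry. It assumes `slide_index` is 1-based, matching the slide_01.png file naming in the generation code. `validate_feedback` is a hypothetical name.

```python
ISSUE_TYPES = {"visual", "content", "layout", "consistency"}


def validate_feedback(item: dict, slide_count: int) -> list[str]:
    """Return the problems with one REVIEW_FEEDBACK_SCHEMA entry (empty list = valid)."""
    problems = []
    idx = item.get("slide_index")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(idx, int) or isinstance(idx, bool) or not 1 <= idx <= slide_count:
        problems.append(f"slide_index must be an int in 1..{slide_count}")
    if item.get("issue_type") not in ISSUE_TYPES:
        problems.append(f"issue_type must be one of {sorted(ISSUE_TYPES)}")
    if not str(item.get("description", "")).strip():
        problems.append("description is required")
    if "suggested_fix" in item and not isinstance(item["suggested_fix"], str):
        problems.append("suggested_fix must be a string when present")
    return problems
```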

Core Python Usage

Generating a Slide Image via GPT Image 2

```python
import base64
from pathlib import Path

import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_slide_image(
    prompt: str,
    slide_index: int,
    output_dir: str = "./output/slides",
    size: str = "1536x1024",  # landscape size accepted by gpt-image-1 (3:2, closest to 16:9)
    model: str = "gpt-image-1",  # the workflow calls this step "GPT Image 2"; use the model ID your account exposes
) -> Path:
    """Generate a single full-page slide image and save it as a PNG."""
    response = client.images.generate(
        model=model,
        prompt=prompt,
        n=1,
        size=size,
    )

    # gpt-image-1 returns base64-encoded image data rather than a URL
    image_bytes = base64.b64decode(response.data[0].b64_json)

    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    slide_path = out_path / f"slide_{slide_index:02d}.png"
    slide_path.write_bytes(image_bytes)

    print(f"[slide {slide_index}] saved → {slide_path}")
    return slide_path
```

Building PPTX from Slide Images

```python
from pathlib import Path

from pptx import Presentation
from pptx.util import Inches

def build_pptx_from_images(
    image_paths: list[Path],
    output_path: str = "./output/deck.pptx",
    width_inches: float = 13.333,  # 16:9 widescreen
    height_inches: float = 7.5,
) -> Path:
    """Package a list of full-page slide images into a PPTX file."""
    prs = Presentation()
    prs.slide_width = Inches(width_inches)
    prs.slide_height = Inches(height_inches)

    blank_layout = prs.slide_layouts[6]  # "Blank" layout of the default template: no placeholders

    for idx, img_path in enumerate(image_paths):
        slide = prs.slides.add_slide(blank_layout)
        # Giving both width and height stretches the picture to the full slide,
        # so an image whose aspect ratio differs from the slide's is distorted.
        slide.shapes.add_picture(
            str(img_path),
            left=Inches(0),
            top=Inches(0),
            width=Inches(width_inches),
            height=Inches(height_inches),
        )
        print(f"[pptx] added slide {idx + 1}: {img_path.name}")

    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    prs.save(str(out))
    print(f"[pptx] saved → {out}")
    return out
```
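The landscape size gpt-image-1 produces, 1536x1024, has a 3:2 aspect ratio, while the slide is 16:9. Stretching the image to both slide dimensions therefore distorts it. One option is to center-crop each image to 16:9 before packaging. The sketch below does that with Pillow, which python-pptx already depends on. `center_crop_box` and `crop_to_ratio` are hypothetical helpers. On a 1536x1024 image the crop removes 80 px from the top and 80 px from the bottom, so prompts should keep important content away from those edges.

```python
from pathlib import Path


def center_crop_box(width: int, height: int, ratio: float = 16 / 9) -> tuple[int, int, int, int]:
    """Largest centered (left, top, right, bottom) box with the target aspect ratio."""
    if width / height > ratio:
        # Too wide: keep the full height and trim the sides.
        new_w = round(height * ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Too tall (or exact): keep the full width and trim top and bottom.
    new_h = round(width / ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def crop_to_ratio(src: Path, dst: Path, ratio: float = 16 / 9) -> Path:
    """Crop a generated slide image so it can fill the slide without distortion."""
    from PIL import Image  # imported here so center_crop_box works without Pillow installed

    with Image.open(src) as img:
        img.crop(center_crop_box(img.width, img.height, ratio)).save(dst)
    return dst
```

Run crop_to_ratio on each generated PNG, then pass the cropped paths to build_pptx_from_images. If no edge content may be lost, letterboxing (padding instead of cropping) is the alternative.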

Full Pipeline: Images → PPTX

```python
from pathlib import Path

# Uses generate_slide_image() and build_pptx_from_images() defined above.

def run_generation_pipeline(
    slide_prompts: list[str],
    deck_title: str = "deck",
    output_dir: str = "./output",
) -> Path:
    """
    Given a list of per-slide prompts, generate images and package them into a PPTX.
    slide_prompts should come from slide_blueprint.md, one prompt per page.
    """
    slides_dir = Path(output_dir) / "slides"
    image_paths = []

    for i, prompt in enumerate(slide_prompts, start=1):
        path = generate_slide_image(
            prompt=prompt,
            slide_index=i,
            output_dir=str(slides_dir),
        )
        image_paths.append(path)

    pptx_path = build_pptx_from_images(
        image_paths=image_paths,
        output_path=str(Path(output_dir) / f"{deck_title}.pptx"),
    )
    return pptx_path
```

```python
# Example usage
if __name__ == "__main__":
    prompts = [
        # From slide_blueprint.md (generated by the workflow).
        # Each image request is independent, so phrases like "Same dark navy color scheme"
        # only hold if the style is restated in every prompt (see the STYLE_PREFIX fix
        # under Troubleshooting).
        "Cover slide for a meteorology thesis defense. Title: 'Urban Heat Island Effects in Coastal Cities'. "
        "University name: Coastal Institute of Atmospheric Science. Dark navy background, white typography, "
        "subtle cloud texture, professional academic style. 16:9 widescreen.",

        "Table of contents slide. Sections: 1. Background 2. Data & Methods 3. Results 4. Discussion 5. Conclusion. "
        "Same dark navy color scheme, numbered list with clear hierarchy, minimal decorative elements.",

        "Body slide: 'Key Findings'. Three main data points as large stat callouts: +2.3°C average temp increase, "
        "67% of monitored stations affected, 15-year trend data. Dark navy background, accent color teal, "
        "clean data-forward layout.",
    ]

    output = run_generation_pipeline(
        slide_prompts=prompts,
        deck_title="meteorology-defense",
        output_dir="./output",
    )
    print(f"Done: {output}")
```

Generating Multiple Candidates per Slide

```python
from pathlib import Path

# Uses generate_slide_image() defined above.

def generate_slide_candidates(
    prompt: str,
    slide_index: int,
    n_candidates: int = 3,
    output_dir: str = "./output/candidates",
) -> list[Path]:
    """Generate N candidate images for one slide for user selection."""
    paths = []
    for c in range(n_candidates):
        path = generate_slide_image(
            prompt=prompt,
            slide_index=slide_index,
            output_dir=f"{output_dir}/slide_{slide_index:02d}",
        )
        # Rename to include the candidate index. Path.replace() overwrites an existing
        # target on every platform, whereas Path.rename() raises on Windows.
        new_path = path.parent / f"candidate_{c + 1}.png"
        path.replace(new_path)
        paths.append(new_path)
        print(f"[candidate {c + 1}/{n_candidates}] slide {slide_index}")
    return paths
```
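After the user picks a candidate in assets/candidate_picker_shell/index.html, the chosen file has to land where the packaging step expects it. This document does not describe how the picker reports the choice, so the sketch below simply takes the 1-based candidate number as an argument. `promote_candidate` is hypothetical and assumes the directory layout written by generate_slide_candidates.

```python
import shutil
from pathlib import Path


def promote_candidate(
    slide_index: int,
    choice: int,
    candidates_dir: str = "./output/candidates",
    slides_dir: str = "./output/slides",
) -> Path:
    """Copy the picked candidate to the final slide path used for PPTX packaging."""
    src = Path(candidates_dir) / f"slide_{slide_index:02d}" / f"candidate_{choice}.png"
    if not src.is_file():
        raise FileNotFoundError(f"no candidate {choice} for slide {slide_index}: {src}")
    dst = Path(slides_dir) / f"slide_{slide_index:02d}.png"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)  # copy, so the other candidates remain for later review
    return dst
```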

Planning Artifact Templates

design_spec.md (minimal structure)

```markdown
# Design Spec

## Global Direction
[1–2 sentences on visual identity and rationale]

## Color Palette
- Primary: #______
- Secondary: #______
- Accent: #______
- Background: #______

## Typography
- Heading: [font / weight / size range]
- Body: [font / weight / size range]

## Layout Principles
- [Grid / alignment rules]
- [Spacing conventions]
- [What should appear on every slide vs. never]

## Continuity Constraints
- [What MUST remain consistent across all slides]
- [What is allowed to vary]
```

slide_blueprint.md (per-page entry)

```markdown
## Slide 03 — Key Findings

Intent: Deliver the three most important statistical results as scannable callouts.
Content payload:
- Stat 1: +2.3°C average increase
- Stat 2: 67% of stations affected
- Stat 3: 15-year trend confirmed

Visual strategy: Large number callouts, minimal prose, accent color on numbers.
Carry-through elements: Logo bottom-left, slide number bottom-right, dark navy bg.
Generation prompt:
Body slide titled 'Key Findings'. Three large stat callouts: '+2.3°C', '67%', '15 Years'. Dark navy background, teal accent on numbers, white body text, clean grid layout, 16:9.
```

spec_lock.md (minimal structure)

```markdown
# Spec Lock

## Locked (do not change)
- Background color: dark navy #0A1628
- Logo placement: bottom-left corner
- Slide number placement: bottom-right
- Heading font: [confirmed font]

## Flexible (may vary per page)
- Accent color intensity
- Layout grid (2-col vs. 3-col for body pages)
- Illustration vs. data visualization choice

## Do Not Fabricate
- Speaker's name, institutional affiliation
- Statistics not present in content_report.md
- Dates, locations, citation details

## Generation Strategy
- Mode: single final per slide (or: multi-candidate then pick)
- Retouch allowed: yes, via review_shell feedback loop
```

---

Common Patterns

Pattern: Thin Materials → Content First

```python
# When the user provides only a topic, not full content:
# 1. Generate content_report.md BEFORE any style work
# 2. Use content_report.md as the source for all slide prompts
# 3. Never generate style previews from an empty premise

def should_generate_content_report(user_materials: str) -> bool:
    """Heuristic: if materials are under ~200 words, build the content baseline first."""
    # str.split() counts whitespace-separated tokens, so unspaced CJK text
    # (e.g. Chinese notes) is undercounted and will usually trigger the report.
    return len(user_materials.split()) < 200
```

Pattern: Style Preview Shell Integration

```python
import json
import webbrowser
from pathlib import Path

def launch_preview_shell(preview_images: dict[str, list[Path]]) -> None:
    """
    Write the preview manifest and open the preview shell in the default browser.
    preview_images: {"direction_A": [cover, toc, body], "direction_B": [...]}
    """
    manifest = {
        direction: [str(p) for p in pages]
        for direction, pages in preview_images.items()
    }

    shell_dir = Path("assets/preview_shell")
    manifest_path = shell_dir / "preview_manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")

    # webbrowser works on macOS, Linux, and Windows; as_uri() requires an absolute path
    webbrowser.open((shell_dir / "index.html").resolve().as_uri())
```

Pattern: Review Feedback → Retouch Prompt

```python
def feedback_to_retouch_prompt(
    original_prompt: str,
    feedback: dict,
) -> str:
    """Convert structured review feedback into an updated generation prompt."""
    issue = feedback["description"]
    fix = feedback.get("suggested_fix", "")

    retouch_instruction = f"REVISION: {issue}"
    if fix:
        retouch_instruction += f" Fix: {fix}"

    return f"{original_prompt}\n\n{retouch_instruction}"
```

Troubleshooting

Image generation returns an error

```python
# Check: OPENAI_API_KEY is set
import os
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not set"

# Check: the size parameter is valid for gpt-image-1
# Valid sizes: "1024x1024", "1536x1024", "1024x1536", "auto"
# For 16:9 slides, use "1536x1024" (landscape; 3:2 is the closest available ratio to 16:9)
```
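The same checks can also run as a preflight step that reports every problem at once. Returning a list instead of using assert keeps the checks active under python -O, which strips assert statements. This is a sketch: `preflight` is a hypothetical helper, and its size set is copied from the list above.

```python
import os
from typing import Mapping, Optional

GPT_IMAGE_1_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}


def preflight(size: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Collect configuration problems before spending any API calls."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY not set")
    if size not in GPT_IMAGE_1_SIZES:
        problems.append(f"size {size!r} not accepted by gpt-image-1; use one of {sorted(GPT_IMAGE_1_SIZES)}")
    return problems
```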

PPTX images appear blurry

```python
# Use the highest-resolution landscape size available, then let PowerPoint scale the
# picture to fill the slide; do NOT upscale the image manually.
# (python-pptx only sets the picture's display size; it does not resample pixels.)
size = "1536x1024"  # use this instead of "1024x1024" for landscape slides
```

Style consistency breaks across slides

Root cause: prompt drift — each slide prompt diverges from the locked spec.
Fix:
  1. Prepend spec_lock.md's "Locked" section to EVERY slide prompt
  2. Use a prompt prefix template:

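```python
STYLE_PREFIX = """
[STYLE LOCK] Dark navy #0A1628 background. Heading font: Inter Bold.
Logo bottom-left. Slide number bottom-right. Teal accent #2DD4BF on highlights.
DO NOT add gradients, textures, or decorative borders not in this spec.
"""

# Example per-slide prompt, taken from the slide_blueprint.md entry for slide 03
slide_specific_prompt = "Body slide titled 'Key Findings'. Three large stat callouts: '+2.3°C', '67%', '15 Years'."
full_prompt = STYLE_PREFIX + slide_specific_prompt
```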
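Instead of hand-copying the Locked values into a STYLE_PREFIX string, the prefix can be derived from spec_lock.md itself, so the prompt cannot drift from the lock file. This sketch assumes spec_lock.md uses markdown headings and `-` bullets, as in the minimal template. `locked_rules` and `with_style_lock` are hypothetical.

```python
import re


def locked_rules(spec_lock_md: str) -> list[str]:
    """Collect the bullet items under the 'Locked' heading of spec_lock.md."""
    rules, in_locked = [], False
    for line in spec_lock_md.splitlines():
        if line.lstrip().startswith("#"):
            # Every heading starts a new section; only 'Locked ...' turns collection on.
            in_locked = line.lstrip("# ").lower().startswith("locked")
        elif in_locked:
            match = re.match(r"\s*[-*•]\s+(.*\S)", line)
            if match:
                rules.append(match.group(1))
    return rules


def with_style_lock(slide_prompt: str, spec_lock_md: str) -> str:
    """Prepend every locked rule so each slide prompt carries the same constraints."""
    rules = locked_rules(spec_lock_md)
    if not rules:
        return slide_prompt
    prefix = "[STYLE LOCK] " + " ".join(rule.rstrip(".") + "." for rule in rules)
    return f"{prefix}\n{slide_prompt}"
```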

content_report.md content ends up in slides verbatim

This is a workflow sequencing error.
content_report.md is a SOURCE document, not a script.
The slide_blueprint.md should ADAPT content into slide-appropriate payloads.
Each slide blueprint entry must go through:
  content_report → narrative selection → slide payload → generation prompt
Never pipe content_report text directly into image generation prompts.
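The content_report → narrative selection → slide payload → generation prompt chain can be enforced in code by building prompts only from a slide-sized payload object and never from report text. This is a sketch. `SlidePayload`, its fields, and the 5-item and 12-word limits are illustrative assumptions, not values taken from the skill.

```python
from dataclasses import dataclass, field


@dataclass
class SlidePayload:
    """A slide_blueprint.md entry after content has been selected and condensed."""

    title: str
    payload: list[str]  # short, slide-sized items, never paragraphs copied from the report
    visual_strategy: str
    carry_through: list[str] = field(default_factory=list)

    def to_prompt(self, max_items: int = 5, max_words_per_item: int = 12) -> str:
        """Compose the generation prompt, rejecting payloads that look like report prose."""
        if len(self.payload) > max_items:
            raise ValueError(f"{len(self.payload)} payload items; split them across slides")
        for item in self.payload:
            if len(item.split()) > max_words_per_item:
                raise ValueError(f"payload item reads like report prose: {item[:40]!r}")
        parts = [
            f"Slide titled '{self.title}'.",
            "Content: " + "; ".join(self.payload) + ".",
            f"Visual strategy: {self.visual_strategy}.",
        ]
        if self.carry_through:
            parts.append("Carry through: " + ", ".join(self.carry_through) + ".")
        return " ".join(parts)
```

Raising an error, rather than silently truncating, sends overlong content back to the blueprint stage, where it can be condensed deliberately.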

Review shell not loading images

```bash
# Images must be accessible from the shell's local path.
# Serve the output directory locally if needed:
cd output && python -m http.server 8080

# Then open assets/review_shell/index.html with its base path set to localhost:8080
```

---

Key Constraints to Respect

| Constraint | Rule |
| --- | --- |
| Confirmation gates | There are 3 mandatory pause points: requirements confirmation, pre-generation confirmation, and review. Do not skip them. |
| Preview = real images | Never substitute text mockups, ASCII art, or placeholder boxes for image previews. |
| Content before style | If materials are thin, content_report.md must be generated before any style work begins. |
| spec_lock.md is binding | Fields in the "Locked" section must be enforced in every generation prompt. |
| No fabrication | Never invent names, statistics, affiliations, dates, or citations not present in the user's materials or content_report.md. |
| Review is mandatory | After the first full generation, always enter the review loop; do not present output as final by default. |
| Aspect ratio | Default is 16:9. Do not change it unless the user explicitly requests it. |
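The three confirmation gates behave like a small ordered state machine. Each gate must be cleared in sequence, generation requires the second gate, and presenting output as final requires the third. This is a minimal sketch, and `Gate` and `WorkflowState` are hypothetical names.

```python
from enum import Enum


class Gate(Enum):
    REQUIREMENTS = "requirements confirmation"      # after intake (Stage 1)
    PRE_GENERATION = "pre-generation confirmation"  # after planning artifacts (Stage 3)
    REVIEW = "review"                               # after the first full generation (Stage 5)


class WorkflowState:
    """Tracks which mandatory gates the user has cleared, in order."""

    ORDER = [Gate.REQUIREMENTS, Gate.PRE_GENERATION, Gate.REVIEW]

    def __init__(self) -> None:
        self.cleared: list[Gate] = []

    def confirm(self, gate: Gate) -> None:
        """Clear `gate`, refusing to skip ahead or to repeat a gate."""
        if len(self.cleared) == len(self.ORDER):
            raise RuntimeError("all gates already cleared")
        expected = self.ORDER[len(self.cleared)]
        if gate is not expected:
            raise RuntimeError(f"cannot confirm {gate.value!r}; next gate is {expected.value!r}")
        self.cleared.append(gate)

    def can_generate(self) -> bool:
        return Gate.PRE_GENERATION in self.cleared

    def can_present_as_final(self) -> bool:
        return Gate.REVIEW in self.cleared
```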

References

  • references/workflow.md: complete stage-by-stage workflow specification
  • references/conversation_framework.md: intake rules and confirmation dialogue patterns
  • references/preview-flow.md: image preview generation and shell integration details
  • templates/: reference templates for all four planning artifacts