Director — Story-to-Video Production Pipeline

Overview

The Director skill orchestrates a complete short film production from a text story. It breaks the story into scenes, generates start/end frames for each, creates video clips from frame pairs, and concatenates everything into a final video.
Pipeline: Story Planning → Z-Image Hero + Character Refs → Qwen Edit Chain (all frames) → WAN 2.2 FLF Video Clips → ffmpeg Concatenation
Key architectural decisions:
  • 1 hero frame + edit chain for character consistency (NEVER independent Z-Image per scene)
  • Inter-scene frame continuity: Scene N's end frame IS Scene N+1's start frame (same image file, no edit gap)
  • Character reference images fed into Qwen Edit's extra image slots
  • State file persists to disk for context compaction survival
  • Each scene is independently retryable without affecting others
  • clear_vram between every model family switch

CRITICAL: Character Consistency

Independent Z-Image generations per scene produce different-looking characters. This was the #1 problem discovered during testing. The solution:
  1. Generate ONE hero frame with Z-Image — establishes the main character, setting, and lighting
  2. Generate character reference images — close-up portraits of each character, key props, and the background
  3. ALL other scene frames are created via Qwen Edit chain from the hero, with character refs in extra image slots
  4. This ensures the same face, clothing, and environment across every frame

8-Phase Pipeline

Phase 1: Story Planning       → Break story into scenes (Claude reasoning, no ComfyUI)
Phase 2: Hero + Refs          → Z-Image: 1 hero frame + character ref portraits + background ref
Phase 3: Hero Review          → Visual verify hero and refs, user approves
Phase 4: Edit Chain           → Qwen Edit: chain ALL scene frames from hero (with char refs in slots 2-3)
Phase 5: Frame Review         → Visual verify all frames, approve/reject/retry
Phase 6: Video Clips          → WAN 2.2 FLF dual Hi-Lo (one clip per scene)
Phase 7: Video Review         → Preview each clip
Phase 8: Final Assembly       → ffmpeg concat all clips into one MP4

State File Format

Saved at ~/code/comfyui-mcp/workflows/director_state_{project_id}.json. Updated after every edit or phase completion.
json
{
  "project_id": "story_20260216_143022",
  "created": "2026-02-16T14:30:22Z",
  "story": "Original user story text",
  "current_phase": 4,
  "orientation": "portrait",
  "hero_frame": { "file": "director_hero_00001_.png", "seed": 428571, "approved": true },
  "character_refs": {
    "man": "director_ref_man.png",
    "cat": "director_ref_cat.png",
    "woman": "director_ref_woman.png",
    "background": "director_ref_bedroom.png"
  },
  "scenes": [
    {
      "id": 1,
      "description": "Brief scene description",
      "edit_prompt_start": "Qwen Edit instruction to create start frame from source",
      "edit_prompt_end": "Qwen Edit instruction to create end frame from source",
      "edit_source_start": "hero",
      "edit_source_end": "hero",
      "video_prompt": "WAN motion description",
      "start_frame": { "file": "director_s1_start_00001_.png", "seed": 12345, "approved": true },
      "end_frame": { "file": "director_hero_00001_.png", "seed": null, "approved": true },
      "video_clip": { "file": "director_s1_00001.mp4", "seed": 11111, "approved": false },
      "status": "video_pending"
    }
  ],
  "final_video": null,
  "settings": {
    "start_frame_resolution": [832, 1472],
    "video_resolution": [480, 720],
    "video_frames": 81,
    "video_fps": 16
  }
}
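A minimal sketch of reading and updating this file from the orchestration side, assuming only the Python standard library; the helper names are illustrative, not part of the skill:
python
import json
from pathlib import Path

STATE_DIR = Path.home() / "code" / "comfyui-mcp" / "workflows"  # save path above

def load_state(project_id: str) -> dict:
    """Read the persisted state so a fresh context can resume the project."""
    return json.loads((STATE_DIR / f"director_state_{project_id}.json").read_text())

def save_state(state: dict) -> None:
    """Persist after every edit or phase completion, per the rule above."""
    path = STATE_DIR / f"director_state_{state['project_id']}.json"
    path.write_text(json.dumps(state, indent=2))

def mark_approved(state: dict, scene_id: int, which: str) -> None:
    """Flip approval on a scene's 'start_frame', 'end_frame', or 'video_clip'."""
    scene = next(s for s in state["scenes"] if s["id"] == scene_id)
    scene[which]["approved"] = True
    save_state(state)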

Models Used Per Phase

Phase             Model Family    Key Models                                                 VRAM
2: Hero + Refs    Z-Image         redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors        ~17GB
4: Edit Chain     Qwen Edit       qwen_image_edit_2511_bf16.safetensors + Lightning LoRA     ~17-18GB
6: Video Clips    WAN 2.2 I2V     Remix NSFW Hi+Lo (built-in lightning)                      ~22-24GB
CRITICAL: clear_vram between every model family switch.

Phase 1: Story Planning

Break the story into 2-6 scenes. For each scene, identify:
  • description: What happens (1-2 sentences)
  • start frame: What the opening frame looks like
  • end frame: What the closing frame looks like
  • video_prompt: Motion description for FLF transition
Identify a hero frame — the single most representative scene image that establishes the main character and setting. This hero will anchor all other frames via Qwen Edit.
Also identify which character reference images are needed (portraits of each character, key props, background).
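As a concrete illustration, a Phase 1 plan for the example story used throughout this document might look like the following (hypothetical values; the keys mirror the state file format above):
python
# Hypothetical Phase 1 output; only the planning fields are filled in here,
# using the same keys as the state file's "scenes" entries.
plan = {
    "hero": "man and cat lying on a bed, warm amber lamplight",  # establishing shot
    "character_refs": ["man", "cat", "woman", "background"],
    "scenes": [
        {
            "id": 1,
            "description": "The man dozes alone; his cat hops up and settles beside him.",
            "edit_prompt_start": "Remove the cat so the man lies alone on the bed.",
            "edit_source_start": "hero",
            "video_prompt": "The cat walks into frame and curls up next to the man.",
        },
        {
            "id": 2,
            "description": "The cat transforms into a woman sitting on the bed.",
            "edit_prompt_end": "Replace the cat with a woman sitting on the bed.",
            "edit_source_end": "hero",
            "video_prompt": "The cat smoothly transforms, gradually reshaping into a woman.",
        },
    ],
}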

CRITICAL: Inter-Scene Frame Continuity

The end frame of Scene N must be the EXACT same image file as the start frame of Scene N+1. Do NOT create separate Qwen-edited start frames for subsequent scenes — this causes visible jumps at scene boundaries when the videos are concatenated.
The frame chain for video generation:
Scene 1: S1_start (unique)        → hero (end)
Scene 2: hero (= S1 end)          → S2_end
Scene 3: S2_end (= S2 end)        → S3_end
Scene 4: S3_end (= S3 end)        → S4_end
Scene 5: S4_end (= S4 end)        → S5_end
Only Scene 1 needs a unique start frame. All other scenes inherit their start from the previous scene's end.
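A sketch of how the (start, end) pairs for FLF generation fall out of this rule; an illustrative helper, not part of the skill:
python
def flf_pairs(scenes: list[dict]) -> list[tuple[str, str]]:
    """Build one (start_file, end_file) pair per scene. Scene N > 1 reuses the
    previous scene's end frame as its start frame: the same file on disk."""
    pairs = []
    for i, scene in enumerate(scenes):
        start = (scene["start_frame"]["file"] if i == 0          # only unique start
                 else scenes[i - 1]["end_frame"]["file"])        # inherited, no new edit
        pairs.append((start, scene["end_frame"]["file"]))
    return pairs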

Edit Chain Planning

The edit chain produces only end frames (plus Scene 1's unique start frame). Map which end frame derives from which source:
  • Some end frames edit directly from the hero
  • Later end frames may chain from earlier end frames
  • Keep chains shallow (max 4-5 deep) to minimize drift
Example chain:
Hero (man+cat on bed)
  ├─ S1 Start: edit hero → remove cat, man alone
  ├─ S2 End: edit hero → replace cat with woman
  │    └─ S3 End: edit S2End → both sit up, man startled
  │         └─ S4 End: edit S3End → sitting close, warm smiles
  │              └─ S5 End: edit S4End → warm embrace
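A quick illustrative check of the shallow-chain rule; `sources` mirrors the example chain above, mapping each frame to the frame it is edited from:
python
# Mirrors the example chain above: each frame maps to its edit source.
sources = {"S1_start": "hero", "S2_end": "hero",
           "S3_end": "S2_end", "S4_end": "S3_end", "S5_end": "S4_end"}

def depth(frame: str) -> int:
    """Edit-chain distance from the hero (the hero itself is depth 0)."""
    return 0 if frame == "hero" else 1 + depth(sources[frame])

for frame in sources:
    assert depth(frame) <= 5, f"{frame} is too deep; branch back to the hero"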

Phase 2: Hero + Character References — Z-Image

Generate with Z-Image RedCraft DX1 (10 steps, CFG 1, euler/simple):
  1. Hero frame: The establishing shot with main character + key elements
  2. Character ref portraits: Close-up of each character (man, woman, animal, etc.)
  3. Background ref: The setting without characters
For character refs, add terms to the negative prompt that exclude the wrong subjects (e.g., "woman, female" when generating the man's portrait).

Hero Frame Workflow Template

json
{
  "1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors" }},
  "2": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["1", 1], "text": "<hero_prompt>" }, "_meta": { "title": "Positive" }},
  "3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["1", 1], "text": "3D, ai generated, semi realistic, illustrated, drawing, comic, digital painting, 3D model, blender, video game screenshot, render, smooth textures, CGI, text, writing, subtitle, watermark, logo, blurry, low quality, jpeg artifacts, grainy" }, "_meta": { "title": "Negative" }},
  "4": { "class_type": "EmptyLatentImage", "inputs": { "width": 832, "height": 1472, "batch_size": 1 }},
  "5": { "class_type": "KSampler", "inputs": {
    "model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0], "latent_image": ["4", 0],
    "seed": 42, "steps": 10, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
  }},
  "6": { "class_type": "VAEDecode", "inputs": { "samples": ["5", 0], "vae": ["1", 2] }},
  "7": { "class_type": "SaveImage", "inputs": { "images": ["6", 0], "filename_prefix": "director_hero" }}
}
Queue hero + all refs while Z-Image checkpoint is loaded (same checkpoint, different prompts).
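In the skill itself this goes through the comfyui-mcp tools; as a rough sketch of the same idea against ComfyUI's stock HTTP API (POST /prompt), assuming a local server and the template above saved to disk:
python
import copy, json, requests

COMFY = "http://127.0.0.1:8188"                     # assumed local ComfyUI server
template = json.load(open("hero_workflow.json"))    # the Z-Image template above

jobs = {                                            # filename_prefix -> prompt text
    "director_hero":        "<hero_prompt>",
    "director_ref_man":     "<man portrait prompt>",
    "director_ref_cat":     "<cat portrait prompt>",
    "director_ref_bedroom": "<empty bedroom prompt>",
}

# ComfyUI keeps the checkpoint cached between queued prompts, so the hero and
# all refs run off a single Z-Image load.
for prefix, prompt in jobs.items():
    wf = copy.deepcopy(template)
    wf["2"]["inputs"]["text"] = prompt              # Positive node
    wf["7"]["inputs"]["filename_prefix"] = prefix   # SaveImage node
    r = requests.post(f"{COMFY}/prompt", json={"prompt": wf})
    print(prefix, "queued as", r.json()["prompt_id"])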

Phase 3: Hero Review

Show hero frame and all character refs. User approves or requests regeneration with new seed.

Phase 4: Edit Chain — Qwen Image Edit

CRITICAL: Consistency Rules for Edit Prompts

  1. Always explicitly anchor clothing: "The man wears his grey t-shirt" in EVERY prompt
  2. Always state what doesn't change: "Same bedroom, same warm lighting, same clothing"
  3. Use strong emotion words: "extremely shocked and startled" >> "surprised"
  4. Include proportionality: "Her head and body should be proportional and natural looking"
  5. Prevent head enlargement: In embrace/close-up poses, Qwen Edit tends to enlarge heads. Add explicit: "do not enlarge her head, keep the same small natural size as in the original image"
  6. Describe the transformation, not just the end state
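One way to make rules 1-5 hard to forget is to append a standing anchor string to every scene-specific instruction; a hedged, illustrative helper:
python
# Illustrative helper: every edit prompt ends with the same consistency
# anchors, so no individual prompt can drop them. Trim the anchors to the
# characters actually present in the scene.
ANCHORS = ("The man wears his grey t-shirt. "
           "Same bedroom, same warm lighting, same clothing. "
           "Her head and body should be proportional and natural looking; "
           "do not enlarge her head, keep the same small natural size "
           "as in the original image.")

def edit_prompt(transformation: str) -> str:
    """Describe the transformation first (rule 6), then pin what must not change."""
    return f"{transformation.rstrip('.')}. {ANCHORS}"

print(edit_prompt("The man sits up, extremely shocked and startled, "
                  "as the woman appears beside him"))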

Workflow Template (with Character Reference Slots)

json
{
  "1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwen_image_edit_2511_bf16.safetensors", "weight_dtype": "default" }},
  "2": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors", "strength_model": 1 }},
  "3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
  "4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
  "5": { "class_type": "LoadImage", "inputs": { "image": "<source_scene.png>" }, "_meta": { "title": "Source Scene" }},
  "5b": { "class_type": "LoadImage", "inputs": { "image": "<character_ref.png>" }, "_meta": { "title": "Character Ref" }},
  "5c": { "class_type": "LoadImage", "inputs": { "image": "<background_ref.png>" }, "_meta": { "title": "Background Ref" }},
  "6": { "class_type": "TextEncodeQwenImageEditPlusAdvance_lrzjason", "inputs": {
    "clip": ["3", 0], "prompt": "<edit_prompt>", "vae": ["4", 0],
    "vl_resize_image1": ["5", 0],
    "vl_resize_image2": ["5b", 0],
    "vl_resize_image3": ["5c", 0],
    "target_size": 1024, "target_vl_size": 384,
    "upscale_method": "lanczos", "crop_method": "pad"
  }},
  "7": { "class_type": "ConditioningZeroOut", "inputs": { "conditioning": ["6", 0] }},
  "8": { "class_type": "KSampler", "inputs": {
    "model": ["2", 0], "positive": ["6", 0], "negative": ["7", 0], "latent_image": ["6", 1],
    "seed": 42, "steps": 4, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
  }},
  "9": { "class_type": "VAEDecode", "inputs": { "samples": ["8", 0], "vae": ["4", 0] }},
  "10": { "class_type": "SaveImage", "inputs": { "images": ["9", 0], "filename_prefix": "director_s1_start" }}
}
Key: slots 5b and 5c — feed the character reference and background reference into vl_resize_image2 and vl_resize_image3. This helps the vision encoder maintain character appearance across edits.

Chain Execution

Edits are sequential — each depends on the previous output:
  1. Run edit, wait for completion
  2. upload_image the output
  3. Use uploaded output as source for next edit
  4. Update state file after each edit
Independent edits (both from hero) can run in parallel.
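A sketch of that loop under the same assumptions as the queueing sketch above (stock ComfyUI HTTP endpoints stand in for the MCP tools; node ids match the edit template above):
python
import copy, json, time, requests

COMFY = "http://127.0.0.1:8188"                             # assumed local server
edit_template = json.load(open("qwen_edit_workflow.json"))  # the template above

def wait_for(prompt_id: str, save_node: str = "10") -> str:
    """Poll /history until the queued job finishes; return the saved filename."""
    while True:
        hist = requests.get(f"{COMFY}/history/{prompt_id}").json()
        if prompt_id in hist:
            return hist[prompt_id]["outputs"][save_node]["images"][0]["filename"]
        time.sleep(2)

def upload(path: str) -> str:
    """Stage a finished output as an input image (stock /upload/image endpoint),
    mirroring the upload_image step above."""
    with open(path, "rb") as f:
        return requests.post(f"{COMFY}/upload/image", files={"image": f}).json()["name"]

source = "director_hero_00001_.png"                         # already an input image
for prefix, prompt in [("director_s2_end", "<S2 edit prompt>"),
                       ("director_s3_end", "<S3 edit prompt>")]:
    wf = copy.deepcopy(edit_template)
    wf["5"]["inputs"]["image"] = source                     # Source Scene slot
    wf["6"]["inputs"]["prompt"] = prompt
    wf["10"]["inputs"]["filename_prefix"] = prefix
    pid = requests.post(f"{COMFY}/prompt", json={"prompt": wf}).json()["prompt_id"]
    source = upload(f"<ComfyUI_output_dir>/{wait_for(pid)}")  # next edit's source
    # ...update the state file here (step 4)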

Timing

  • First edit: ~87s (model loading)
  • Subsequent edits: ~30-40s each (models cached)

Phase 5: Frame Review

For each frame, show via Read for visual inspection. User approves or provides feedback. Re-run individual edits without redoing the whole chain.

Phase 6: Video Clip Generation — WAN 2.2 FLF Dual Hi-Lo

Workflow Template

(Same as wan-flf-video skill — Remix NSFW Hi+Lo, 4-stack LoRA, ImageResizeKJv2, dual KSamplerAdvanced)
Key settings:
  • Portrait: width=480, height=720
  • 81 frames, 16fps = ~5 seconds per clip
  • uni_pc/beta sampler, CFG 1, 4 total steps (Hi: 0→2, Lo: 2→4)
  • ModelSamplingSD3 shift=5 on both UNETs

Morph LoRA

For transformation scenes (e.g., cat→woman), add morph LoRA to Hi/Lo Common stacks:
  • wan2.2_i2v_magical_morph_highnoise.safetensors → Hi Common slot 1 (strength 1.0)
  • wan2.2_i2v_magical_morph_lownoise.safetensors → Lo Common slot 1 (strength 1.0)
Use 1.0 strength — tested without sparkle issues. Lower values (0.7-0.85) produce weaker morph effects that may look like a dissolve rather than a true morph.

Per-Scene Changes

Swap per scene: start/end image filenames, positive prompt text, noise_seed, filename_prefix.
All 5 clips can be queued at once — they run sequentially in ComfyUI, sharing loaded models.
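The wan-flf-video template is not reproduced in this document, so its node ids are unknown here; a hedged sketch that patches by _meta title instead (the titles "Start Frame", "End Frame", "Positive", "Sampler Hi", and "Save" are assumptions for illustration):
python
import copy, json, requests

COMFY = "http://127.0.0.1:8188"                          # assumed local server
flf_template = json.load(open("wan_flf_workflow.json"))  # wan-flf-video template
state = json.load(open("<state_file>.json"))             # see State File Format

def node(wf: dict, title: str) -> dict:
    """Look a node up by its _meta title so the sketch avoids hard-coding ids
    from a template this document does not reproduce."""
    return next(n for n in wf.values() if n.get("_meta", {}).get("title") == title)

# Queue every clip at once; ComfyUI runs them back to back with models cached.
for scene in state["scenes"]:
    wf = copy.deepcopy(flf_template)
    node(wf, "Start Frame")["inputs"]["image"] = scene["start_frame"]["file"]
    node(wf, "End Frame")["inputs"]["image"] = scene["end_frame"]["file"]
    node(wf, "Positive")["inputs"]["text"] = scene["video_prompt"]
    node(wf, "Sampler Hi")["inputs"]["noise_seed"] = scene["video_clip"]["seed"]
    node(wf, "Save")["inputs"]["filename_prefix"] = f"director_s{scene['id']}"
    requests.post(f"{COMFY}/prompt", json={"prompt": wf})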

Phase 7: Video Review

Report each clip's filename. User previews externally.

Phase 8: Final Assembly — ffmpeg Concat

bash
cd "<ComfyUI_output_dir>"
printf "file 'director_s1_00001.mp4'\nfile 'director_s2_00001.mp4'\n..." > concat_list.txt
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy director_final_{project_id}.mp4
All clips share resolution/codec/framerate — copy-concat works without re-encoding.
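Equivalently, the concat list can be generated straight from the state file so scene order is never hand-typed; an illustrative Python sketch of the same bash steps:
python
import json, subprocess
from pathlib import Path

state = json.loads(Path("<state_file>.json").read_text())  # see State File Format
out_dir = Path("<ComfyUI_output_dir>")

# One line per clip, in scene order; ffmpeg's concat demuxer reads this list.
clips = [s["video_clip"]["file"] for s in state["scenes"]]
(out_dir / "concat_list.txt").write_text("".join(f"file '{c}'\n" for c in clips))

# -c copy avoids re-encoding, which is safe because every clip shares
# resolution, codec, and framerate.
subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "concat_list.txt",
                "-c", "copy", f"director_final_{state['project_id']}.mp4"],
               cwd=out_dir, check=True)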

Resumption Protocol

After context compaction:
  1. Read state file AND director_session_notes.md if it exists
  2. Check current_phase and per-scene status
  3. Skip approved assets, continue from incomplete point
  4. clear_vram before loading the model family for the current phase
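A sketch of steps 1-3; which assets count as approved is read straight off the per-scene records in the state file (path placeholder as in the State File Format section):
python
import json
from pathlib import Path

state = json.loads(Path("<state_file>.json").read_text())

notes = Path("director_session_notes.md")
if notes.exists():
    print(notes.read_text())                      # step 1: session notes survive too

phase = state["current_phase"]                    # step 2
pending = [s["id"] for s in state["scenes"]       # step 3: skip approved work
           if not (s.get("video_clip") or {}).get("approved")]
print(f"Resume at phase {phase}; scenes with unapproved clips: {pending}")
# Step 4: clear_vram (MCP tool), then load this phase's model family.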

Timing Estimates (RTX 4090)

Phase                             Per Scene    5 Scenes
Hero + Refs (Z-Image)             ~10s each    ~50s (one-time)
Edit Chain (Qwen 4-step)          ~35s each    ~280s (8 edits)
Video Clip (WAN FLF 81 frames)    ~140s        ~700s
VRAM swaps (3x clear_vram)        ~30s each    ~90s
Total generation                               ~19 min

Storytelling Props for Continuity

Use distinctive visual elements that transfer between characters/forms to create narrative connections:
  • A colored collar on an animal → becomes a choker/necklace on the human form
  • Eye color matching between animal and human
  • Distinctive clothing or accessories that persist across scenes
  • These "continuity props" reinforce the story visually

Prompt Engineering for Edit Chains

DO

  • "The man wears his grey t-shirt" (anchor clothing every time)
  • "Same bedroom, same warm amber lamplight, same white sheets"
  • "Extremely shocked, jaw dropped, eyes wide in total disbelief"
  • "Her head and body proportional and natural looking"

DON'T

  • Assume clothing/setting will be preserved automatically
  • Use mild emotion words ("surprised" → use "extremely shocked" instead)
  • Chain more than 5-6 edits deep without branching back to hero
  • Assume head proportions stay correct in embrace/hug poses — always add explicit size anchoring

WAN Video Prompts

  • Use motion verbs: "walks", "turns", "reaches", "sits up", "leans in"
  • AVOID: "magical", "enchanted", "mystical" (causes sparkle effects)
  • USE: "smoothly transforms", "seamlessly reshapes", "gradually"
  • Include scale cues: "grows into", "expands upward"