comfyui-character-gen

ComfyUI Character Generation Expert

Build production-ready ComfyUI workflows for consistent character generation across image, video, and voice modalities.

Quick Decision: Which Approach?

  • Starting from reference images (like 3D renders)? → InfiniteYou (state-of-the-art 2025) or InstantID + IP-Adapter (proven, lower VRAM)
  • Need highest identity fidelity? → FLUX.2 (NEW 2026: up to 10 ref images) or PuLID Flux II (no model pollution)
  • Want iterative editing without retraining? → FLUX Kontext (context-aware, maintains consistency across edits)
  • Creating video content? → LTX-2 (NEW 2026: 4K production-ready), Wan 2.2 MoE (film-level), or FramePack (60-sec on 6GB!)
  • Need voice for character? → TTS Audio Suite (unified platform, 23 languages) or F5-TTS Cross-Lingual (NEW 2026)

Core Workflow Patterns

Pattern 1: Zero-Shot Character Generation (No Training)

Best for: Quick iteration, 3D-to-photorealism conversion, limited reference images
Load Reference Face → InstantID + IP-Adapter FaceID → ControlNet Pose → KSampler → FaceDetailer → Upscale
Critical settings:
  • CFG: 4-5 (prevents burning with InstantID)
  • Resolution: 1016×1016 (avoids watermark artifacts)
  • IP-Adapter weight: 0.6-0.8
  • InstantID noise injection: 35% to negative
See references/workflows.md for complete node configurations.
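The critical settings above can be sketched as a partial ComfyUI API-format payload. This is illustrative only: the node class names `ApplyInstantID` and `IPAdapterFaceID` and the input key names are assumptions that may differ from the exact class types registered by your installed node packs, so check them against references/workflows.md before use.

```python
# Sketch of Pattern 1's critical settings in ComfyUI's API-format dict
# (node id -> {"class_type", "inputs"}). Class names are placeholders.
pattern1_settings = {
    "3": {  # main sampler
        "class_type": "KSampler",
        "inputs": {
            "cfg": 4.5,   # keep CFG in 4-5 to prevent burning with InstantID
            "steps": 28,
            "denoise": 1.0,
        },
    },
    "5": {  # latent canvas
        "class_type": "EmptyLatentImage",
        # 1016x1016 avoids watermark artifacts per the notes above
        "inputs": {"width": 1016, "height": 1016, "batch_size": 1},
    },
    "7": {  # identity adapter (illustrative class name)
        "class_type": "IPAdapterFaceID",
        "inputs": {"weight": 0.7},  # recommended range 0.6-0.8
    },
    "8": {  # InstantID apply node (illustrative class name)
        "class_type": "ApplyInstantID",
        "inputs": {"noise_injection_negative": 0.35},  # 35% to negative
    },
}

for node in pattern1_settings.values():
    print(node["class_type"], node["inputs"])
```

A full payload would also wire in the checkpoint loader, ControlNet pose, FaceDetailer, and upscale nodes described in the pipeline line above.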

Pattern 2: LoRA + Identity Methods (Maximum Consistency)

Best for: Production work, character series, video generation base
Train LoRA → Load LoRA + Checkpoint → Add InstantID/PuLID → Generate → FaceDetailer → ReActor (optional) → Upscale
Training requirements:
  • 15-30 images, varied poses/expressions/lighting
  • Unique trigger word (e.g., "sage_character")
  • See references/lora-training.md for full parameters
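One mechanical part of the training requirements, writing per-image caption files that lead with the trigger word, can be sketched with the standard library. The directory layout and caption template are placeholder assumptions; adapt them to your Kohya/AI-Toolkit dataset format.

```python
# Sketch: LoRA dataset prep. Writes one <image>.txt caption per reference
# image, each starting with the unique trigger word so the trained LoRA
# binds the identity to that token.
import tempfile
from pathlib import Path

TRIGGER = "sage_character"  # unique token unlikely to collide with real words

def write_captions(dataset_dir: str, description: str) -> list[Path]:
    """Create a caption file next to every .png in dataset_dir."""
    written = []
    for img in sorted(Path(dataset_dir).glob("*.png")):
        caption = img.with_suffix(".txt")
        caption.write_text(f"{TRIGGER}, {description}\n")
        written.append(caption)
    return written

# Demo with a temporary directory standing in for the 15-30 image set:
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        (Path(d) / f"ref_{i}.png").touch()  # placeholder images
    files = write_captions(d, "photo of a person, varied pose and lighting")
    print(len(files), "captions written")
```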

Pattern 3: Video Generation Pipeline

Best for: Talking heads, character animation, promotional content
Generate/Load Hero Image → Wan 2.1 I2V OR AnimateDiff → FaceDetailer per frame → Frame Interpolation → Video Combine
Model selection:
  • Wan 2.1 14B: Best quality, 24GB+ VRAM, slower
  • Wan 2.1 1.3B: 8GB VRAM, good quality, faster
  • AnimateDiff Lightning: Fastest, best for iteration
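The model-selection notes above reduce to a simple VRAM-based decision, sketched below. The thresholds mirror this section's figures and are starting points, not hard limits.

```python
# Sketch: pick a video backend from available VRAM, per the notes above.
def pick_video_model(vram_gb: float, prioritize_speed: bool = False) -> str:
    if prioritize_speed:
        return "AnimateDiff Lightning"  # fastest, best for iteration
    if vram_gb >= 24:
        return "Wan 2.1 14B"            # best quality, slower
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"           # good quality, faster
    return "AnimateDiff Lightning"      # lowest-VRAM fallback

print(pick_video_model(24))  # Wan 2.1 14B
print(pick_video_model(8))   # Wan 2.1 1.3B
print(pick_video_model(6))   # AnimateDiff Lightning
```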

Pattern 4: Talking Head with Voice

Best for: Character dialogue, presentations, social content
Two approaches available:
Approach 1 (Image → Talking Head):
Character Portrait → Generate Audio → SadTalker/LivePortrait → CodeFormer Enhancement → Final Video

Approach 2 (Video → Add Voice):
Existing Video → Generate Audio → Wav2Lip Lip-Sync → CodeFormer Enhancement → Final Video
See references/talking-head-workflows.md for complete workflows and references/voice-synthesis.md for voice creation options.

Model Recommendations (2026 Updated)

Image Generation

| Use Case | Model | Notes |
| --- | --- | --- |
| Best photorealism | FLUX.1-dev | Slow but superior quality |
| Multi-reference consistency | FLUX.2 | NEW 2026: Up to 10 ref images, strong identity preservation |
| Fast iteration | RealVisXL V5.0 | Good balance of speed/quality |
| Character editing | FLUX Kontext | Context-aware, maintains consistency across edits |
| Iterative refinement | FLUX Kontext Pro/Max | 8x faster than GPT-Image (API) |

Identity Preservation (2026 State-of-Art)

| Method | Best For | VRAM | Notes |
| --- | --- | --- | --- |
| FLUX.2 | Multi-reference consistency | 24GB+ | NEW 2026: Up to 10 ref images, branded content |
| InfiniteYou | Highest identity match | 24GB | ICCV 2025 Highlight, SIM/AES variants |
| FLUX Kontext | Iterative editing | 12-32GB | Built-in consistency, no retraining |
| PuLID Flux II | Dual characters, no pollution | 24-40GB | Contrastive alignment solves model pollution |
| AuraFace | Commercial identity encoding | 12GB | NEW 2026: Open-source ArcFace alternative |
| InstantID | Style transfer, 3D→realistic | 12GB | Maintenance mode but still excellent |
| IP-Adapter FaceID | Speed, lower VRAM | 6GB+ | Good baseline approach |
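The table above can be expressed as data for quick filtering by VRAM budget. The figures below are the table's rough minimums (taking the low end of each range), not exact requirements.

```python
# Sketch: the identity-preservation table as a dict, plus a helper that
# lists which methods fit a given VRAM budget.
IDENTITY_METHODS = {
    "FLUX.2":            {"min_vram_gb": 24, "best_for": "multi-reference consistency"},
    "InfiniteYou":       {"min_vram_gb": 24, "best_for": "highest identity match"},
    "FLUX Kontext":      {"min_vram_gb": 12, "best_for": "iterative editing"},
    "PuLID Flux II":     {"min_vram_gb": 24, "best_for": "dual characters, no pollution"},
    "AuraFace":          {"min_vram_gb": 12, "best_for": "commercial identity encoding"},
    "InstantID":         {"min_vram_gb": 12, "best_for": "style transfer, 3D-to-realistic"},
    "IP-Adapter FaceID": {"min_vram_gb": 6,  "best_for": "speed, lower VRAM"},
}

def methods_for(vram_gb: float) -> list[str]:
    """Return every method whose VRAM floor fits the budget."""
    return [name for name, info in IDENTITY_METHODS.items()
            if info["min_vram_gb"] <= vram_gb]

print(methods_for(12))  # the 12GB-and-under options
```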

Video Generation

| Model | Quality | Speed | VRAM | Notes |
| --- | --- | --- | --- | --- |
| LTX-2 | ★★★★★ | Medium | 16GB+ | NEW 2026: First open-source 4K audio+video, production-ready |
| Wan 2.2 MoE | ★★★★★ | Slow | 24GB+ | Film-level aesthetics, first+last frame control |
| FramePack | ★★★★★ | Medium | 6GB | 60-sec videos, VRAM-invariant breakthrough |
| Wan 2.1 1.3B | ★★★★ | Medium | 8GB+ | Consumer-friendly |
| AnimateDiff V3 | ★★★ | Fast | 8GB | Motion/camera LoRAs, infinite length |

Voice/TTS

| Tool | License | Quality | Features |
| --- | --- | --- | --- |
| TTS Audio Suite | Multi | ★★★★★ | Unified platform, 23 languages, emotion control |
| F5-TTS | MIT | ★★★★ | Zero-shot from <15 sec samples, Cross-Lingual 2026 |
| Chatterbox | MIT | ★★★★★ | Paralinguistic tags (`[laugh]`, `[sigh]`), 4 voices |
| IndexTTS-2 | MIT | ★★★★ | 8-emotion vector control |
| ElevenLabs | Commercial | ★★★★★ | Production quality (API) |

Essential Custom Nodes

Install via ComfyUI-Manager:
ComfyUI-Manager              # Must install first
ComfyUI_IPAdapter_plus       # IP-Adapter and FaceID
ComfyUI_InstantID            # InstantID workflow
ComfyUI-Impact-Pack          # FaceDetailer
ComfyUI-ReActor              # Face swapping
ComfyUI-AnimateDiff-Evolved  # Video generation
ComfyUI-VideoHelperSuite     # Video I/O
comfyui_controlnet_aux       # Pose/depth preprocessors
ComfyUI_UltimateSDUpscale    # Tiled upscaling
ComfyUI-Frame-Interpolation  # Smooth video

RTX 50 Series Optimization (NEW 2026)

With 32GB VRAM on RTX 5090, run most workflows without optimization. ComfyUI v0.8.1 adds major RTX 50 Series enhancements:
Launch flags: --highvram --fp8_e4m3fn-unet
NEW v0.8.1 Features:
  • NVFP4/NVFP8 precision formats: 3x faster performance, 60% VRAM reduction on RTX 50 Series
  • Weight streaming: Uses system RAM when VRAM exhausted, enables larger models on mid-range GPUs
  • Enable tiled VAE for 8K+ upscaling
  • Batch 4× 1024×1024 generations in parallel
  • Run Wan 2.2 14B + LTX-2 natively
  • Use FP8 quantization for FLUX (50% VRAM reduction)

Workflow Generation Process

When building a workflow for a user:
  1. Clarify the goal: Image only? Video? With voice? What's the source material?
  2. Select the pipeline pattern from above based on requirements
  3. Generate the workflow following node configurations in references/workflows.md
  4. Include model downloads with exact filenames and paths from references/models.md
  5. Provide parameter recommendations specific to their hardware/use case

Reference Files

  • references/research-log.md - Latest techniques: InfiniteYou, FLUX Kontext, PuLID Flux II, Wan 2.2 MoE, FramePack, FLUX.2, LTX-2.3, Wan 2.6, Qwen3-TTS, and more
  • references/models.md - Complete model list with HuggingFace/Civitai links, file paths, and compatibility notes
  • references/workflows.md - Detailed node-by-node workflow templates for each pattern
  • references/lora-training.md - LoRA training guide with Kohya/AI-Toolkit parameters
  • references/voice-synthesis.md - Voice cloning, TTS, and lip-sync pipeline details
  • references/talking-head-workflows.md - Complete talking head workflows: Image→Talking Head (SadTalker, LivePortrait) and Video→Add Voice (Wav2Lip) with production scripts
  • references/evolution.md - Update sources, changelog, and user-specific learnings

Skill Evolution

This skill is designed to evolve. When helping the user:
Before starting a workflow:
  • Check if new models have dropped that might be better (search HuggingFace/Civitai if uncertain)
  • Consider if user's past successes/failures inform the approach
After completing a workflow:
  • Note what worked well or poorly for future reference
  • If user discovers better settings, update the relevant reference file
Proactive updates:
  • When the user mentions a new model or technique, research and integrate it
  • Periodically suggest checking for updates to key dependencies
See references/evolution.md for monitoring sources and update protocols.

Example: 3D Render to Photorealistic Character

For converting stylized 3D renders (like game/VN characters) to photorealistic images:
Recommended approach: InstantID + IP-Adapter FaceID on FLUX
1. Load 3D render reference (best quality, front-facing)
2. Apply InstantID (extracts identity + facial keypoints)
3. Apply IP-Adapter FaceID Plus V2 (weight 0.7)
4. Use FLUX.1-dev checkpoint
5. Prompt: "photorealistic portrait, detailed skin texture, natural lighting, [character description]"
6. CFG: 4-5, Steps: 25-30
7. FaceDetailer pass (denoise 0.35)
8. Upscale with 4x-UltraSharp
This converts the stylized look to photorealism while preserving the core identity features.
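The example's key parameters can be gathered in one place with a small sanity check that mirrors the recommended ranges. The dict keys and the check itself are illustrative conventions, not a fixed schema.

```python
# Sketch: the 3D-render-to-photorealistic settings from the steps above,
# with a validator that rejects values outside the recommended ranges.
render_to_photo = {
    "checkpoint": "FLUX.1-dev",
    "ip_adapter_weight": 0.7,        # IP-Adapter FaceID Plus V2
    "cfg": 4.5,                      # keep within 4-5
    "steps": 28,                     # keep within 25-30
    "face_detailer_denoise": 0.35,
    "upscaler": "4x-UltraSharp",
    "prompt": "photorealistic portrait, detailed skin texture, natural lighting",
}

def sanity_check(cfg: dict) -> bool:
    """True if CFG, steps, and adapter weight sit in the recommended ranges."""
    return (4 <= cfg["cfg"] <= 5
            and 25 <= cfg["steps"] <= 30
            and cfg["ip_adapter_weight"] <= 0.8)

print(sanity_check(render_to_photo))  # True
```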