comfyui-character-gen
ComfyUI Character Generation Expert
Build production-ready ComfyUI workflows for consistent character generation across image, video, and voice modalities.
Quick Decision: Which Approach?
Starting from reference images (like 3D renders)?
→ InfiniteYou (state-of-the-art 2025) or InstantID + IP-Adapter (proven, lower VRAM)
Need highest identity fidelity?
→ FLUX.2 (NEW 2026: up to 10 ref images) or PuLID Flux II (no model pollution)
Want iterative editing without retraining?
→ FLUX Kontext (context-aware, maintains consistency across edits)
Creating video content?
→ LTX-2 (NEW 2026: 4K production-ready), Wan 2.2 MoE (film-level), or FramePack (60-sec on 6GB!)
Need voice for character?
→ TTS Audio Suite (unified platform, 23 languages) or F5-TTS Cross-Lingual (NEW 2026)
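The decision guide above can be sketched as a small helper that maps a stated goal to the suggested tooling. A minimal sketch — the keyword matching and return strings are illustrative conveniences, not a fixed API:

```python
def suggest_approach(goal: str, low_vram: bool = False) -> str:
    """Map a stated goal to the tooling suggested in the decision guide.

    Keyword choices are illustrative; adjust to your own phrasing.
    """
    g = goal.lower()
    if "reference" in g or "3d render" in g:
        # InfiniteYou is the 2025 SOTA; InstantID + IP-Adapter needs less VRAM
        return "InstantID + IP-Adapter" if low_vram else "InfiniteYou"
    if "fidelity" in g or "identity" in g:
        return "FLUX.2 (up to 10 ref images) or PuLID Flux II"
    if "edit" in g:
        return "FLUX Kontext"
    if "video" in g:
        # FramePack runs 60-second clips on 6GB-class cards
        return "FramePack (6GB)" if low_vram else "LTX-2 or Wan 2.2 MoE"
    if "voice" in g:
        return "TTS Audio Suite or F5-TTS Cross-Lingual"
    return "Start with Pattern 1 (zero-shot) and iterate"
```

Useful as a first triage step before picking one of the workflow patterns below.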
Core Workflow Patterns
Pattern 1: Zero-Shot Character Generation (No Training)
Best for: Quick iteration, 3D-to-photorealism conversion, limited reference images
Load Reference Face → InstantID + IP-Adapter FaceID → ControlNet Pose → KSampler → FaceDetailer → Upscale
Critical settings:
- CFG: 4-5 (prevents burning with InstantID)
- Resolution: 1016×1016 (avoids watermark artifacts)
- IP-Adapter weight: 0.6-0.8
- InstantID noise injection: 35% to negative
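When driving ComfyUI through its HTTP API (POST /prompt on the local server), the critical settings above are worth collecting in one validated place. A sketch only — the dict keys and helper name are this document's shorthand, and the real payload would carry a full node graph:

```python
import json

# Critical settings for Pattern 1, collected from the list above.
PATTERN1_SETTINGS = {
    "cfg": 4.5,               # keep within 4-5 to prevent burning with InstantID
    "width": 1016,            # 1016x1016 avoids watermark artifacts
    "height": 1016,
    "ip_adapter_weight": 0.7, # 0.6-0.8 range
    "instantid_noise_to_negative": 0.35,  # 35% noise injection to the negative
}

def build_prompt_payload(settings: dict) -> str:
    """Wrap settings in the {"prompt": ...} envelope that ComfyUI's
    POST /prompt endpoint expects. The actual node graph (node-id ->
    {"class_type", "inputs"} entries) is omitted in this sketch."""
    if not 4.0 <= settings["cfg"] <= 5.0:
        raise ValueError("CFG outside the safe InstantID range (4-5)")
    return json.dumps({"prompt": settings})
```

Guarding the CFG range in code catches the most common cause of burned outputs before a generation is queued.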
See references/workflows.md for complete node configurations.
Pattern 2: LoRA + Identity Methods (Maximum Consistency)
Best for: Production work, character series, video generation base
Train LoRA → Load LoRA + Checkpoint → Add InstantID/PuLID → Generate → FaceDetailer → ReActor (optional) → Upscale
Training requirements:
- 15-30 images, varied poses/expressions/lighting
- Unique trigger word (e.g., "sage_character")
- See references/lora-training.md for full parameters
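The dataset requirements above are easy to sanity-check before a training run. A sketch assuming a Kohya-style layout with sidecar `.txt` caption files next to each image (other trainers use different conventions):

```python
import pathlib

def check_lora_dataset(folder: str, trigger: str) -> list:
    """Flag common LoRA training-set problems: image count outside the
    15-30 target, and caption files missing the unique trigger word."""
    root = pathlib.Path(folder)
    images = [p for p in root.iterdir()
              if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}]
    problems = []
    if not 15 <= len(images) <= 30:
        problems.append(f"expected 15-30 images, found {len(images)}")
    for img in images:
        cap = img.with_suffix(".txt")  # Kohya-style sidecar caption
        if not cap.exists() or trigger not in cap.read_text():
            problems.append(f"{img.name}: missing caption or trigger word")
    return problems
```

An empty return list means the set meets the count and trigger-word requirements; pose/expression/lighting variety still needs a human eye.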
Pattern 3: Video Generation Pipeline
Best for: Talking heads, character animation, promotional content
Generate/Load Hero Image → Wan 2.1 I2V OR AnimateDiff → FaceDetailer per frame → Frame Interpolation → Video Combine
Model selection:
- Wan 2.1 14B: Best quality, 24GB+ VRAM, slower
- Wan 2.1 1.3B: 8GB VRAM, good quality, faster
- AnimateDiff Lightning: Fastest, best for iteration
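The model-selection notes above reduce to a VRAM lookup; a sketch with thresholds taken from the list (model names as plain strings):

```python
def pick_video_model(vram_gb: int, prioritize_speed: bool = False) -> str:
    """Choose a Pattern 3 video model from the selection notes above."""
    if prioritize_speed and vram_gb >= 8:
        return "AnimateDiff Lightning"  # fastest, best for iteration
    if vram_gb >= 24:
        return "Wan 2.1 14B"            # best quality, slower
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"           # good quality, faster
    return "FramePack"                  # 6GB-class fallback (see decision guide)
```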
Pattern 4: Talking Head with Voice
Best for: Character dialogue, presentations, social content
Two approaches available:
Approach 1 (Image → Talking Head):
Character Portrait → Generate Audio → SadTalker/LivePortrait → CodeFormer Enhancement → Final Video
Approach 2 (Video → Add Voice):
Existing Video → Generate Audio → Wav2Lip Lip-Sync → CodeFormer Enhancement → Final Video
See references/talking-head-workflows.md for complete workflows and references/voice-synthesis.md for voice creation options.
Model Recommendations (2026 Updated)
Image Generation
| Use Case | Model | Notes |
|---|---|---|
| Best photorealism | FLUX.1-dev | Slow but superior quality |
| Multi-reference consistency | FLUX.2 | NEW 2026: Up to 10 ref images, strong identity preservation |
| Fast iteration | RealVisXL V5.0 | Good balance speed/quality |
| Character editing | FLUX Kontext | Context-aware, maintains consistency across edits |
| Iterative refinement | FLUX Kontext Pro/Max | 8x faster than GPT-Image (API) |
Identity Preservation (2026 State-of-Art)
| Method | Best For | VRAM | Notes |
|---|---|---|---|
| FLUX.2 | Multi-reference consistency | 24GB+ | NEW 2026: Up to 10 ref images, branded content |
| InfiniteYou | Highest identity match | 24GB | ICCV 2025 Highlight, SIM/AES variants |
| FLUX Kontext | Iterative editing | 12-32GB | Built-in consistency, no retraining |
| PuLID Flux II | Dual characters, no pollution | 24-40GB | Contrastive alignment solves model pollution |
| AuraFace | Commercial identity encoding | 12GB | NEW 2026: Open-source ArcFace alternative |
| InstantID | Style transfer, 3D→realistic | 12GB | Maintenance mode but still excellent |
| IP-Adapter FaceID | Speed, lower VRAM | 6GB+ | Good baseline approach |
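Filtering the table above by available VRAM is a one-liner worth keeping around. The numbers below are transcribed from the table's VRAM column, taking each entry's lower bound as the minimum:

```python
# Minimum VRAM (GB) per identity method, transcribed from the table above.
IDENTITY_METHODS = {
    "FLUX.2": 24,
    "InfiniteYou": 24,
    "FLUX Kontext": 12,
    "PuLID Flux II": 24,
    "AuraFace": 12,
    "InstantID": 12,
    "IP-Adapter FaceID": 6,
}

def methods_for_vram(vram_gb: int) -> list:
    """Identity-preservation methods that fit the given VRAM budget."""
    return sorted(m for m, need in IDENTITY_METHODS.items() if need <= vram_gb)
```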
Video Generation
| Model | Quality | Speed | VRAM | Notes |
|---|---|---|---|---|
| LTX-2 | ★★★★★ | Medium | 16GB+ | NEW 2026: First open-source 4K audio+video, production-ready |
| Wan 2.2 MoE | ★★★★★ | Slow | 24GB+ | Film-level aesthetics, first+last frame control |
| FramePack | ★★★★★ | Medium | 6GB | 60-sec videos, VRAM-invariant breakthrough |
| Wan 2.1 1.3B | ★★★★ | Medium | 8GB+ | Consumer-friendly |
| AnimateDiff V3 | ★★★ | Fast | 8GB | Motion/camera LoRAs, infinite length |
Voice/TTS
| Tool | License | Quality | Features |
|---|---|---|---|
| TTS Audio Suite | Multi | ★★★★★ | Unified platform, 23 languages, emotion control |
| F5-TTS | MIT | ★★★★ | Zero-shot from <15 sec samples, Cross-Lingual 2026 |
| Chatterbox | MIT | ★★★★★ | Paralinguistic tags |
| IndexTTS-2 | MIT | ★★★★ | 8-emotion vector control |
| ElevenLabs | Commercial | ★★★★★ | Production quality (API) |
Essential Custom Nodes
必备自定义节点
Install via ComfyUI-Manager:
ComfyUI-Manager # Must install first
ComfyUI_IPAdapter_plus # IP-Adapter and FaceID
ComfyUI_InstantID # InstantID workflow
ComfyUI-Impact-Pack # FaceDetailer
ComfyUI-ReActor # Face swapping
ComfyUI-AnimateDiff-Evolved # Video generation
ComfyUI-VideoHelperSuite # Video I/O
comfyui_controlnet_aux # Pose/depth preprocessors
ComfyUI_UltimateSDUpscale # Tiled upscaling
ComfyUI-Frame-Interpolation # Smooth video
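After installing through ComfyUI-Manager, a short script can confirm each package actually landed. A sketch that assumes the standard layout where each custom node lives in its own folder under `custom_nodes/` inside the ComfyUI root (adjust the root path for your install):

```python
import pathlib

# The node packages listed above, by their repository/folder names.
REQUIRED_NODES = [
    "ComfyUI-Manager", "ComfyUI_IPAdapter_plus", "ComfyUI_InstantID",
    "ComfyUI-Impact-Pack", "ComfyUI-ReActor", "ComfyUI-AnimateDiff-Evolved",
    "ComfyUI-VideoHelperSuite", "comfyui_controlnet_aux",
    "ComfyUI_UltimateSDUpscale", "ComfyUI-Frame-Interpolation",
]

def missing_nodes(comfy_root: str) -> list:
    """Return required custom-node packages not present under custom_nodes."""
    nodes_dir = pathlib.Path(comfy_root) / "custom_nodes"
    installed = ({p.name for p in nodes_dir.iterdir() if p.is_dir()}
                 if nodes_dir.is_dir() else set())
    return [n for n in REQUIRED_NODES if n not in installed]
```

Run it after installation and before loading a workflow, so missing-node errors surface as a readable list instead of red nodes in the graph.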
RTX 50 Series Optimization (NEW 2026)
With 32GB VRAM on RTX 5090, run most workflows without optimization. ComfyUI v0.8.1 adds major RTX 50 Series enhancements:
Launch flags: --highvram --fp8_e4m3fn-unet
NEW v0.8.1 Features:
- NVFP4/NVFP8 precision formats: 3x faster performance, 60% VRAM reduction on RTX 50 Series
- Weight streaming: Uses system RAM when VRAM exhausted, enables larger models on mid-range GPUs
- Enable tiled VAE for 8K+ upscaling
- Batch 4× 1024×1024 generations in parallel
- Run Wan 2.2 14B + LTX-2 natively
- Use FP8 quantization for FLUX (50% VRAM reduction)
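A launch-flag chooser can encode the advice above. Both flags are real ComfyUI CLI options mentioned in this section; the 24GB cutoff for `--highvram` is a judgment call in this sketch, not an official threshold:

```python
def launch_flags(vram_gb: int) -> list:
    """Suggest ComfyUI launch flags for a given VRAM budget.

    The 24GB threshold for --highvram is an assumption; the flag keeps
    models resident in VRAM, which only pays off on large cards.
    """
    flags = []
    if vram_gb >= 24:
        flags.append("--highvram")
    # FP8 e4m3fn UNet weights roughly halve UNet VRAM use (see FLUX note above)
    flags.append("--fp8_e4m3fn-unet")
    return flags
```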
Workflow Generation Process
When building a workflow for a user:
1. Clarify the goal: Image only? Video? With voice? What's the source material?
2. Select the pipeline pattern from above based on requirements
3. Generate the workflow following node configurations in references/workflows.md
4. Include model downloads with exact filenames and paths from references/models.md
5. Provide parameter recommendations specific to their hardware/use case
Reference Files
- references/research-log.md - Latest techniques: InfiniteYou, FLUX Kontext, PuLID Flux II, Wan 2.2 MoE, FramePack, FLUX.2, LTX-2.3, Wan 2.6, Qwen3-TTS, and more
- references/models.md - Complete model list with HuggingFace/Civitai links, file paths, and compatibility notes
- references/workflows.md - Detailed node-by-node workflow templates for each pattern
- references/lora-training.md - LoRA training guide with Kohya/AI-Toolkit parameters
- references/voice-synthesis.md - Voice cloning, TTS, and lip-sync pipeline details
- references/talking-head-workflows.md - Complete talking head workflows: Image→Talking Head (SadTalker, LivePortrait) and Video→Add Voice (Wav2Lip) with production scripts
- references/evolution.md - Update sources, changelog, and user-specific learnings
Skill Evolution
This skill is designed to evolve. When helping the user:
Before starting a workflow:
- Check if new models have dropped that might be better (search HuggingFace/Civitai if uncertain)
- Consider if user's past successes/failures inform the approach
After completing a workflow:
- Note what worked well or poorly for future reference
- If user discovers better settings, update the relevant reference file
Proactive updates:
- When the user mentions a new model or technique, research and integrate it
- Periodically suggest checking for updates to key dependencies
See references/evolution.md for monitoring sources and update protocols.
Example: 3D Render to Photorealistic Character
For converting stylized 3D renders (like game/VN characters) to photorealistic images:
Recommended approach: InstantID + IP-Adapter FaceID on FLUX
1. Load 3D render reference (best quality, front-facing)
2. Apply InstantID (extracts identity + facial keypoints)
3. Apply IP-Adapter FaceID Plus V2 (weight 0.7)
4. Use FLUX.1-dev checkpoint
5. Prompt: "photorealistic portrait, detailed skin texture, natural lighting, [character description]"
6. CFG: 4-5, Steps: 25-30
7. FaceDetailer pass (denoise 0.35)
8. Upscale with 4x-UltraSharp
This converts the stylized look to photorealism while preserving the core identity features.
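The eight steps above can be captured as a reusable preset. The keys here are this document's shorthand, not ComfyUI node class names, and the single values are picked from the middle of the ranges given:

```python
# Preset for the 3D-render-to-photorealistic recipe above.
PRESET_3D_TO_PHOTO = {
    "checkpoint": "FLUX.1-dev",
    "identity": {"instantid": True, "ip_adapter_faceid_plus_v2": 0.7},
    "prompt": ("photorealistic portrait, detailed skin texture, "
               "natural lighting, [character description]"),
    "cfg": 4.5,                     # stay within 4-5
    "steps": 28,                    # 25-30
    "face_detailer_denoise": 0.35,  # light pass to keep identity intact
    "upscaler": "4x-UltraSharp",
}

def validate_preset(p: dict) -> None:
    """Assert the preset stays within the ranges the recipe specifies."""
    assert 4.0 <= p["cfg"] <= 5.0, "CFG must stay in 4-5"
    assert 25 <= p["steps"] <= 30, "steps must stay in 25-30"
    assert 0.6 <= p["identity"]["ip_adapter_faceid_plus_v2"] <= 0.8

validate_preset(PRESET_3D_TO_PHOTO)
```

Keeping the ranges as assertions means a tweaked preset fails loudly instead of silently drifting into InstantID-burn territory.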