comfyui-character-gen
ComfyUI Character Generation Expert
Build production-ready ComfyUI workflows for consistent character generation across image, video, and voice modalities.
Quick Decision: Which Approach?
Starting from reference images (like 3D renders)?
→ InfiniteYou (state-of-the-art 2025) or InstantID + IP-Adapter (proven, lower VRAM)
Need highest identity fidelity?
→ FLUX.2 (NEW 2026: up to 10 ref images) or PuLID Flux II (no model pollution)
Want iterative editing without retraining?
→ FLUX Kontext (context-aware, maintains consistency across edits)
Creating video content?
→ LTX-2 (NEW 2026: 4K production-ready), Wan 2.2 MoE (film-level), or FramePack (60-sec on 6GB!)
Need voice for character?
→ TTS Audio Suite (unified platform, 23 languages) or F5-TTS Cross-Lingual (NEW 2026)
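The decision guide above can be sketched as a small helper that maps a stated goal to the suggested tooling. A minimal sketch — the keyword matching and return strings are illustrative conveniences, not a fixed API:

```python
def suggest_approach(goal: str, low_vram: bool = False) -> str:
    """Map a stated goal to the tooling suggested in the decision guide.

    Keyword choices are illustrative; adjust to your own phrasing.
    """
    g = goal.lower()
    if "reference" in g or "3d render" in g:
        # InfiniteYou is the 2025 SOTA; InstantID + IP-Adapter needs less VRAM
        return "InstantID + IP-Adapter" if low_vram else "InfiniteYou"
    if "fidelity" in g or "identity" in g:
        return "FLUX.2 (up to 10 ref images) or PuLID Flux II"
    if "edit" in g:
        return "FLUX Kontext"
    if "video" in g:
        # FramePack runs 60-second clips on 6GB-class cards
        return "FramePack (6GB)" if low_vram else "LTX-2 or Wan 2.2 MoE"
    if "voice" in g:
        return "TTS Audio Suite or F5-TTS Cross-Lingual"
    return "Start with Pattern 1 (zero-shot) and iterate"
```

Useful as a first triage step before picking one of the workflow patterns below.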
Core Workflow Patterns
Pattern 1: Zero-Shot Character Generation (No Training)
Best for: Quick iteration, 3D-to-photorealism conversion, limited reference images
Load Reference Face → InstantID + IP-Adapter FaceID → ControlNet Pose → KSampler → FaceDetailer → Upscale
Critical settings:
- CFG: 4-5 (prevents burning with InstantID)
- Resolution: 1016×1016 (avoids watermark artifacts)
- IP-Adapter weight: 0.6-0.8
- InstantID noise injection: 35% to negative
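When driving ComfyUI through its HTTP API (POST /prompt on the local server), the critical settings above are worth collecting in one validated place. A sketch only — the dict keys and helper name are this document's shorthand, and the real payload would carry a full node graph:

```python
import json

# Critical settings for Pattern 1, collected from the list above.
PATTERN1_SETTINGS = {
    "cfg": 4.5,               # keep within 4-5 to prevent burning with InstantID
    "width": 1016,            # 1016x1016 avoids watermark artifacts
    "height": 1016,
    "ip_adapter_weight": 0.7, # 0.6-0.8 range
    "instantid_noise_to_negative": 0.35,  # 35% noise injection to the negative
}

def build_prompt_payload(settings: dict) -> str:
    """Wrap settings in the {"prompt": ...} envelope that ComfyUI's
    POST /prompt endpoint expects. The actual node graph (node-id ->
    {"class_type", "inputs"} entries) is omitted in this sketch."""
    if not 4.0 <= settings["cfg"] <= 5.0:
        raise ValueError("CFG outside the safe InstantID range (4-5)")
    return json.dumps({"prompt": settings})
```

Guarding the CFG range in code catches the most common cause of burned outputs before a generation is queued.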
See references/workflows.md for complete node configurations.
Pattern 2: LoRA + Identity Methods (Maximum Consistency)
Best for: Production work, character series, video generation base
Train LoRA → Load LoRA + Checkpoint → Add InstantID/PuLID → Generate → FaceDetailer → ReActor (optional) → Upscale
Training requirements:
- 15-30 images, varied poses/expressions/lighting
- Unique trigger word (e.g., "sage_character")
- See references/lora-training.md for full parameters
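The dataset requirements above are easy to sanity-check before a training run. A sketch assuming a Kohya-style layout with sidecar `.txt` caption files next to each image (other trainers use different conventions):

```python
import pathlib

def check_lora_dataset(folder: str, trigger: str) -> list:
    """Flag common LoRA training-set problems: image count outside the
    15-30 target, and caption files missing the unique trigger word."""
    root = pathlib.Path(folder)
    images = [p for p in root.iterdir()
              if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}]
    problems = []
    if not 15 <= len(images) <= 30:
        problems.append(f"expected 15-30 images, found {len(images)}")
    for img in images:
        cap = img.with_suffix(".txt")  # Kohya-style sidecar caption
        if not cap.exists() or trigger not in cap.read_text():
            problems.append(f"{img.name}: missing caption or trigger word")
    return problems
```

An empty return list means the set meets the count and trigger-word requirements; pose/expression/lighting variety still needs a human eye.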
Pattern 3: Video Generation Pipeline
Best for: Talking heads, character animation, promotional content
Generate/Load Hero Image → Wan 2.1 I2V OR AnimateDiff → FaceDetailer per frame → Frame Interpolation → Video Combine
Model selection:
- Wan 2.1 14B: Best quality, 24GB+ VRAM, slower
- Wan 2.1 1.3B: 8GB VRAM, good quality, faster
- AnimateDiff Lightning: Fastest, best for iteration
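The model-selection notes above reduce to a VRAM lookup; a sketch with thresholds taken from the list (model names as plain strings):

```python
def pick_video_model(vram_gb: int, prioritize_speed: bool = False) -> str:
    """Choose a Pattern 3 video model from the selection notes above."""
    if prioritize_speed and vram_gb >= 8:
        return "AnimateDiff Lightning"  # fastest, best for iteration
    if vram_gb >= 24:
        return "Wan 2.1 14B"            # best quality, slower
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"           # good quality, faster
    return "FramePack"                  # 6GB-class fallback (see decision guide)
```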
Pattern 4: Talking Head with Voice
Best for: Character dialogue, presentations, social content
Two approaches available:
Approach 1 (Image → Talking Head):
Character Portrait → Generate Audio → SadTalker/LivePortrait → CodeFormer Enhancement → Final Video
Approach 2 (Video → Add Voice):
Existing Video → Generate Audio → Wav2Lip Lip-Sync → CodeFormer Enhancement → Final Video
See references/talking-head-workflows.md for complete workflows and references/voice-synthesis.md for voice creation options.
Model Recommendations (2026 Updated)
Image Generation
| Use Case | Model | Notes |
|---|---|---|
| Best photorealism | FLUX.1-dev | Slow but superior quality |
| Multi-reference consistency | FLUX.2 | NEW 2026: Up to 10 ref images, strong identity preservation |
| Fast iteration | RealVisXL V5.0 | Good balance speed/quality |
| Character editing | FLUX Kontext | Context-aware, maintains consistency across edits |
| Iterative refinement | FLUX Kontext Pro/Max | 8x faster than GPT-Image (API) |
Identity Preservation (2026 State-of-Art)
| Method | Best For | VRAM | Notes |
|---|---|---|---|
| FLUX.2 | Multi-reference consistency | 24GB+ | NEW 2026: Up to 10 ref images, branded content |
| InfiniteYou | Highest identity match | 24GB | ICCV 2025 Highlight, SIM/AES variants |
| FLUX Kontext | Iterative editing | 12-32GB | Built-in consistency, no retraining |
| PuLID Flux II | Dual characters, no pollution | 24-40GB | Contrastive alignment solves model pollution |
| AuraFace | Commercial identity encoding | 12GB | NEW 2026: Open-source ArcFace alternative |
| InstantID | Style transfer, 3D→realistic | 12GB | Maintenance mode but still excellent |
| IP-Adapter FaceID | Speed, lower VRAM | 6GB+ | Good baseline approach |
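Filtering the table above by available VRAM is a one-liner worth keeping around. The numbers below are transcribed from the table's VRAM column, taking each entry's lower bound as the minimum:

```python
# Minimum VRAM (GB) per identity method, transcribed from the table above.
IDENTITY_METHODS = {
    "FLUX.2": 24,
    "InfiniteYou": 24,
    "FLUX Kontext": 12,
    "PuLID Flux II": 24,
    "AuraFace": 12,
    "InstantID": 12,
    "IP-Adapter FaceID": 6,
}

def methods_for_vram(vram_gb: int) -> list:
    """Identity-preservation methods that fit the given VRAM budget."""
    return sorted(m for m, need in IDENTITY_METHODS.items() if need <= vram_gb)
```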
Video Generation
| Model | Quality | Speed | VRAM | Notes |
|---|---|---|---|---|
| LTX-2 | ★★★★★ | Medium | 16GB+ | NEW 2026: First open-source 4K audio+video, production-ready |
| Wan 2.2 MoE | ★★★★★ | Slow | 24GB+ | Film-level aesthetics, first+last frame control |
| FramePack | ★★★★★ | Medium | 6GB | 60-sec videos, VRAM-invariant breakthrough |
| Wan 2.1 1.3B | ★★★★ | Medium | 8GB+ | Consumer-friendly |
| AnimateDiff V3 | ★★★ | Fast | 8GB | Motion/camera LoRAs, infinite length |
Voice/TTS
| Tool | License | Quality | Features |
|---|---|---|---|
| TTS Audio Suite | Multi | ★★★★★ | Unified platform, 23 languages, emotion control |
| F5-TTS | MIT | ★★★★ | Zero-shot from <15 sec samples, Cross-Lingual 2026 |
| Chatterbox | MIT | ★★★★★ | Paralinguistic tags |
| IndexTTS-2 | MIT | ★★★★ | 8-emotion vector control |
| ElevenLabs | Commercial | ★★★★★ | Production quality (API) |
Essential Custom Nodes
必备自定义节点
Install via ComfyUI-Manager:
ComfyUI-Manager # Must install first
ComfyUI_IPAdapter_plus # IP-Adapter and FaceID
ComfyUI_InstantID # InstantID workflow
ComfyUI-Impact-Pack # FaceDetailer
ComfyUI-ReActor # Face swapping
ComfyUI-AnimateDiff-Evolved # Video generation
ComfyUI-VideoHelperSuite # Video I/O
comfyui_controlnet_aux # Pose/depth preprocessors
ComfyUI_UltimateSDUpscale # Tiled upscaling
ComfyUI-Frame-Interpolation # Smooth video
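After installing through ComfyUI-Manager, a short script can confirm each package actually landed. A sketch that assumes the standard layout where each custom node lives in its own folder under `custom_nodes/` inside the ComfyUI root (adjust the root path for your install):

```python
import pathlib

# The node packages listed above, by their repository/folder names.
REQUIRED_NODES = [
    "ComfyUI-Manager", "ComfyUI_IPAdapter_plus", "ComfyUI_InstantID",
    "ComfyUI-Impact-Pack", "ComfyUI-ReActor", "ComfyUI-AnimateDiff-Evolved",
    "ComfyUI-VideoHelperSuite", "comfyui_controlnet_aux",
    "ComfyUI_UltimateSDUpscale", "ComfyUI-Frame-Interpolation",
]

def missing_nodes(comfy_root: str) -> list:
    """Return required custom-node packages not present under custom_nodes."""
    nodes_dir = pathlib.Path(comfy_root) / "custom_nodes"
    installed = ({p.name for p in nodes_dir.iterdir() if p.is_dir()}
                 if nodes_dir.is_dir() else set())
    return [n for n in REQUIRED_NODES if n not in installed]
```

Run it after installation and before loading a workflow, so missing-node errors surface as a readable list instead of red nodes in the graph.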
RTX 50 Series Optimization (NEW 2026)
With 32GB VRAM on RTX 5090, run most workflows without optimization. ComfyUI v0.8.1 adds major RTX 50 Series enhancements:
Launch flags: --highvram --fp8_e4m3fn-unet
NEW v0.8.1 Features:
- NVFP4/NVFP8 precision formats: 3x faster performance, 60% VRAM reduction on RTX 50 Series
- Weight streaming: Uses system RAM when VRAM exhausted, enables larger models on mid-range GPUs
- Enable tiled VAE for 8K+ upscaling
- Batch 4× 1024×1024 generations in parallel
- Run Wan 2.2 14B + LTX-2 natively
- Use FP8 quantization for FLUX (50% VRAM reduction)
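A launch-flag chooser can encode the advice above. Both flags are real ComfyUI CLI options mentioned in this section; the 24GB cutoff for `--highvram` is a judgment call in this sketch, not an official threshold:

```python
def launch_flags(vram_gb: int) -> list:
    """Suggest ComfyUI launch flags for a given VRAM budget.

    The 24GB threshold for --highvram is an assumption; the flag keeps
    models resident in VRAM, which only pays off on large cards.
    """
    flags = []
    if vram_gb >= 24:
        flags.append("--highvram")
    # FP8 e4m3fn UNet weights roughly halve UNet VRAM use (see FLUX note above)
    flags.append("--fp8_e4m3fn-unet")
    return flags
```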
Workflow Generation Process
When building a workflow for a user:
1. Clarify the goal: Image only? Video? With voice? What's the source material?
2. Select the pipeline pattern from above based on requirements
3. Generate the workflow following node configurations in references/workflows.md
4. Include model downloads with exact filenames and paths from references/models.md
5. Provide parameter recommendations specific to their hardware/use case
Reference Files
- references/research-log.md - Latest techniques: InfiniteYou, FLUX Kontext, PuLID Flux II, Wan 2.2 MoE, FramePack, FLUX.2, LTX-2.3, Wan 2.6, Qwen3-TTS, and more
- references/models.md - Complete model list with HuggingFace/Civitai links, file paths, and compatibility notes
- references/workflows.md - Detailed node-by-node workflow templates for each pattern
- references/lora-training.md - LoRA training guide with Kohya/AI-Toolkit parameters
- references/voice-synthesis.md - Voice cloning, TTS, and lip-sync pipeline details
- references/talking-head-workflows.md - Complete talking head workflows: Image→Talking Head (SadTalker, LivePortrait) and Video→Add Voice (Wav2Lip) with production scripts
- references/evolution.md - Update sources, changelog, and user-specific learnings
Skill Evolution
This skill is designed to evolve. When helping the user:
Before starting a workflow:
- Check if new models have dropped that might be better (search HuggingFace/Civitai if uncertain)
- Consider if user's past successes/failures inform the approach
After completing a workflow:
- Note what worked well or poorly for future reference
- If user discovers better settings, update the relevant reference file
Proactive updates:
- When the user mentions a new model or technique, research and integrate it
- Periodically suggest checking for updates to key dependencies
See references/evolution.md for monitoring sources and update protocols.
Example: 3D Render to Photorealistic Character
For converting stylized 3D renders (like game/VN characters) to photorealistic images:
Recommended approach: InstantID + IP-Adapter FaceID on FLUX
1. Load 3D render reference (best quality, front-facing)
2. Apply InstantID (extracts identity + facial keypoints)
3. Apply IP-Adapter FaceID Plus V2 (weight 0.7)
4. Use FLUX.1-dev checkpoint
5. Prompt: "photorealistic portrait, detailed skin texture, natural lighting, [character description]"
6. CFG: 4-5, Steps: 25-30
7. FaceDetailer pass (denoise 0.35)
8. Upscale with 4x-UltraSharp
This converts the stylized look to photorealism while preserving the core identity features.
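The eight steps above can be captured as a reusable preset. The keys here are this document's shorthand, not ComfyUI node class names, and the single values are picked from the middle of the ranges given:

```python
# Preset for the 3D-render-to-photorealistic recipe above.
PRESET_3D_TO_PHOTO = {
    "checkpoint": "FLUX.1-dev",
    "identity": {"instantid": True, "ip_adapter_faceid_plus_v2": 0.7},
    "prompt": ("photorealistic portrait, detailed skin texture, "
               "natural lighting, [character description]"),
    "cfg": 4.5,                     # stay within 4-5
    "steps": 28,                    # 25-30
    "face_detailer_denoise": 0.35,  # light pass to keep identity intact
    "upscaler": "4x-UltraSharp",
}

def validate_preset(p: dict) -> None:
    """Assert the preset stays within the ranges the recipe specifies."""
    assert 4.0 <= p["cfg"] <= 5.0, "CFG must stay in 4-5"
    assert 25 <= p["steps"] <= 30, "steps must stay in 25-30"
    assert 0.6 <= p["identity"]["ip_adapter_faceid_plus_v2"] <= 0.8

validate_preset(PRESET_3D_TO_PHOTO)
```

Keeping the ranges as assertions means a tweaked preset fails loudly instead of silently drifting into InstantID-burn territory.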