comfyui-lora-training
ComfyUI LoRA Training
Guide the user through dataset preparation, training configuration, and evaluation for character LoRAs.
When to Train vs Zero-Shot
| Scenario | Recommendation |
|---|---|
| Need absolute consistency across many images | Train LoRA |
| Building a character series or ongoing project | Train LoRA |
| Quick one-off generation | Use zero-shot (InstantID/PuLID) |
| Limited references (1-5 images) | Use zero-shot |
| Testing concepts | Use zero-shot first, train if committing |
Training Pipeline
1. DATASET PREP
|-- Collect/generate 15-30 reference images
|-- Preprocess (crop, resize, diversify styles)
|-- Caption with trigger word + descriptions
|
2. CONFIGURE TRAINING
|-- Select training tool (Kohya/AI-Toolkit/FluxGym)
|-- Set hyperparameters based on model type
|-- Configure checkpointing
|
3. TRAIN
|-- Monitor loss curve
|-- Save checkpoints every 250-500 steps
|
4. EVALUATE
|-- Test each checkpoint with identical prompts
|-- Check identity accuracy, flexibility, overfitting
|-- Select best checkpoint
|
5. INTEGRATE
|-- Copy to ComfyUI models/loras/
|-- Update character profile with trigger word + strength
|-- Test in full workflow (LoRA + identity method)
Dataset Preparation
Image Requirements
| Aspect | Minimum | Optimal | Maximum |
|---|---|---|---|
| Count | 10-15 | 20-30 | 50+ |
| Resolution | 512x512 | 1024x1024 | - |
| Format | PNG/high JPEG | PNG | - |
Content Diversity Checklist
- Multiple angles (front, 3/4, profile, back)
- Various expressions (neutral, smile, serious, laugh, etc.)
- Different lighting conditions (studio, natural, dramatic)
- Varied backgrounds (or transparent/solid)
- Multiple outfits/contexts
- Some close-ups, some medium shots
- If from 3D renders: include style variations (see below)
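Parts of this checklist can be automated before training. A minimal sketch (a hypothetical helper, not part of any training tool) that checks image count and caption pairing for a dataset folder, assuming PNG images with sidecar `.txt` captions:

```python
from pathlib import Path

def check_dataset(folder: Path, min_images: int = 15, optimal: int = 20) -> list[str]:
    """Return a list of warnings for a LoRA dataset folder of .png images."""
    issues = []
    images = sorted(folder.glob("*.png"))
    if len(images) < min_images:
        issues.append(f"only {len(images)} images; aim for {optimal}-30")
    for img in images:
        # Each image needs a same-named caption file (001.png -> 001.txt).
        caption = img.with_suffix(".txt")
        if not caption.exists():
            issues.append(f"missing caption file: {caption.name}")
    return issues
```

Angle, expression, and lighting diversity still have to be judged by eye; this only catches the mechanical mistakes.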
Preprocessing 3D Renders
Problem: Training directly on 3D renders bakes in the "3D" aesthetic.
Solution: Generate style variations first:
- Run each render through img2img with varied style prompts
- Mix: 60% style variations, 40% original renders
- This teaches identity, not style
Style prompts for variation:
```
"photorealistic portrait, dslr photo"
"oil painting portrait"
"digital illustration"
"pencil sketch"
"watercolor portrait"
```
Captioning Rules
Trigger word: ALWAYS use a unique token as the first word.
- Good: sage_character, ohwx_sage, sks_person
- Bad: woman, redhead, character (too generic)
Caption structure:
```
{trigger}, {subject type}, {clothing}, {pose}, {setting}, {lighting}, {style}
```
DO NOT describe face features (let the model learn them):
- Bad: "woman with green eyes, freckles, auburn hair, defined cheekbones"
- Good: "sage_character, woman, indoor portrait, wearing blue sweater"
DO describe everything else: clothing, pose, background, lighting, expression.
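These rules can be captured in a small helper that writes the sidecar caption file for each image. A sketch (a hypothetical function; the field order mirrors the caption structure above, with the trigger word guaranteed to come first):

```python
from pathlib import Path

def write_caption(image: Path, trigger: str, *parts: str) -> str:
    """Build a caption as '{trigger}, {subject type}, {clothing}, ...' and
    write it next to the image (001.png -> 001.txt). No face descriptions."""
    caption = ", ".join([trigger, *parts])
    image.with_suffix(".txt").write_text(caption)
    return caption
```

Usage: `write_caption(Path("001.png"), "sage_character", "woman", "wearing blue sweater", "standing", "indoor portrait", "soft lighting", "photorealistic")`.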
Folder Structure
```
dataset/{character_name}/{repeats}_{trigger_word}/
    001.png + 001.txt
    002.png + 002.txt
    ...
```
Folder naming: `10_sage_character` = each image repeated 10x per epoch.
Training Configurations
FLUX LoRA (AI-Toolkit) - Recommended
```yaml
network:
  type: lora
  linear: 16           # Rank (16-32 for characters)
  linear_alpha: 16     # Alpha = rank for FLUX
train:
  batch_size: 1
  gradient_accumulation_steps: 4
  steps: 1500          # FLUX converges faster
  lr: 4e-4             # Higher than SDXL
  optimizer: adamw8bit
  dtype: bf16
datasets:
  - resolution: [1024]
    caption_ext: "txt"
sample:
  sample_every: 250
  prompts:
    - "{trigger}, photorealistic portrait"
```
FLUX training notes:
- Converges 2-3x faster than SDXL
- 1000-2000 steps usually sufficient
- Watch for overfitting (quality plateaus early)
- 24GB VRAM for standard training, 9GB with NF4 quantization (SimpleTuner)
SDXL LoRA (Kohya_ss) - Proven
```yaml
pretrained_model: "RealVisXL_V5.0.safetensors"
network_dim: 32                # Rank (16-64)
network_alpha: 16              # Usually dim/2
resolution: "1024,1024"
train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0001          # 1e-4
lr_scheduler: "cosine_with_restarts"
lr_scheduler_num_cycles: 3
max_train_epochs: 10
optimizer_type: "AdamW8bit"
mixed_precision: "bf16"
enable_bucket: true
min_snr_gamma: 5
```
Step calculation:
```
total_steps = (images x repeats x epochs) / batch_size
Target: 1500-3000 steps for SDXL
Example: 20 images x 10 repeats x 5 epochs / 1 = 1000 steps
```
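The step formula in code, plus its inverse for picking an epoch count. Both functions are hypothetical conveniences, not part of Kohya_ss:

```python
def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    # total_steps = (images x repeats x epochs) / batch_size
    return (images * repeats * epochs) // batch_size

def epochs_for_target(images: int, repeats: int, target_steps: int = 2000,
                      batch_size: int = 1) -> int:
    # Invert the formula: choose an epoch count that lands near a step budget.
    return max(1, round(target_steps * batch_size / (images * repeats)))
```

With the worked example above, `total_steps(20, 10, 5)` gives 1000 steps; `epochs_for_target(20, 10, target_steps=2000)` suggests 10 epochs to reach the middle of the 1500-3000 target range.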
Low VRAM Training (FluxGym / SimpleTuner)
For 12-16GB VRAM:
```yaml
use_8bit_adam: true
gradient_checkpointing: true
cache_latents_to_disk: true
max_data_loader_n_workers: 0
train_batch_size: 1
gradient_accumulation_steps: 8
quantize_base_model: nf4   # SimpleTuner only
```
Evaluation Protocol
Test Each Checkpoint
Use identical prompts across all checkpoints:
```
Prompt 1: "{trigger}, photorealistic portrait, neutral expression"
Prompt 2: "{trigger}, photorealistic portrait, smiling, outdoor"
Prompt 3: "{trigger}, wearing formal suit, standing, office"
Prompt 4: "a person standing in a park"  (WITHOUT trigger - should NOT produce the character)
```
Quality Indicators
Good training:
- Character recognizable from trigger word alone
- Responds to different prompts/contexts
- Doesn't always produce same pose/expression
- Prompt 4 does NOT produce the character
Overfitting signs:
- Same exact pose/expression regardless of prompt
- Training backgrounds appearing in outputs
- Ignores clothing/setting prompts
- Prompt 4 produces the character (too strong)
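The comparison itself is visual, but the bookkeeping can be scripted. A sketch (entirely hypothetical: you score identity accuracy and prompt flexibility 0-10 per checkpoint after running the four test prompts, and ties break toward the earlier, less overfit checkpoint):

```python
def best_checkpoint(scores: dict[int, tuple[float, float]]) -> int:
    """scores maps step -> (identity_score, flexibility_score), each 0-10.
    Maximize the sum; on ties prefer the earlier (less overfit) step."""
    return min(scores, key=lambda step: (-sum(scores[step]), step))
```

Example: with `{250: (4, 9), 500: (8, 8), 750: (9, 7), 1000: (9, 5)}`, steps 500 and 750 tie at 16, so 500 wins.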
Best Epoch Selection
If using sample_every: 250 with 1500 steps:
- Checkpoint 250: Usually underfit
- Checkpoint 500-750: Often sweet spot for FLUX
- Checkpoint 1000-1500: May be overfitting
Compare visually and select the checkpoint with best identity + prompt flexibility balance.
Post-Training Integration
- Copy best checkpoint to {ComfyUI}/models/loras/
- Update character profile:
```yaml
lora:
  trained: true
  model_file: "sage_character_flux.safetensors"
  trigger_word: "sage_character"
  best_strength: 0.8
```
- Test in full workflow: LoRA (0.7-0.9) + PuLID/IP-Adapter (0.5-0.7)
- Record successful settings in character's generation_history
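The copy-and-record steps can be scripted. A minimal sketch (the function name, directory layout, and YAML snippet format are assumptions modeled on the profile example above):

```python
import shutil
from pathlib import Path

def install_lora(checkpoint: Path, comfyui_root: Path, trigger: str,
                 strength: float = 0.8) -> str:
    """Copy the best checkpoint into ComfyUI's models/loras/ and return a
    YAML snippet (hypothetical format) to paste into the character profile."""
    dest = comfyui_root / "models" / "loras" / checkpoint.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(checkpoint, dest)  # preserves file metadata
    return (
        "lora:\n"
        "  trained: true\n"
        f'  model_file: "{checkpoint.name}"\n'
        f'  trigger_word: "{trigger}"\n'
        f"  best_strength: {strength}\n"
    )
```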
Combining LoRA with Zero-Shot Methods
Best practice: LoRA as base identity, zero-shot for enhancement.
```
[Load Checkpoint] → [Load LoRA (0.7-0.9)] → [Apply PuLID/IP-Adapter (0.5-0.7)] → [Generate]
```
Lower weights on both prevent conflict while reinforcing identity.
Troubleshooting
| Issue | Solution |
|---|---|
| LoRA not activating | Check trigger word spelling, ensure loaded before KSampler |
| Identity drift at angles | Add more angle variety to dataset, reduce network_dim |
| Overfitting | Reduce epochs, increase dataset, lower network_dim |
| Style contamination | Better caption diversity, don't describe style in captions |
| Poor quality/artifacts | Check training images for compression, reduce LR |
Reference
- references/lora-training.md - Full parameter reference
- references/models.md - Training tool download links
- Character profiles in projects/ for trigger words and reference images