comfyui-lora-training


ComfyUI LoRA Training


Guide the user through dataset preparation, training configuration, and evaluation for character LoRAs.

When to Train vs Zero-Shot


| Scenario | Recommendation |
|---|---|
| Need absolute consistency across many images | Train LoRA |
| Building a character series or ongoing project | Train LoRA |
| Quick one-off generation | Use zero-shot (InstantID/PuLID) |
| Limited references (1-5 images) | Use zero-shot |
| Testing concepts | Use zero-shot first, train if committing |

Training Pipeline


1. DATASET PREP
   |-- Collect/generate 15-30 reference images
   |-- Preprocess (crop, resize, diversify styles)
   |-- Caption with trigger word + descriptions
   |
2. CONFIGURE TRAINING
   |-- Select training tool (Kohya/AI-Toolkit/FluxGym)
   |-- Set hyperparameters based on model type
   |-- Configure checkpointing
   |
3. TRAIN
   |-- Monitor loss curve
   |-- Save checkpoints every 250-500 steps
   |
4. EVALUATE
   |-- Test each checkpoint with identical prompts
   |-- Check identity accuracy, flexibility, overfitting
   |-- Select best checkpoint
   |
5. INTEGRATE
   |-- Copy to ComfyUI models/loras/
   |-- Update character profile with trigger word + strength
   |-- Test in full workflow (LoRA + identity method)

Dataset Preparation


Image Requirements


| Aspect | Minimum | Optimal | Maximum |
|---|---|---|---|
| Count | 10-15 | 20-30 | 50+ |
| Resolution | 512x512 | 1024x1024 | - |
| Format | PNG / high-quality JPEG | PNG | - |
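The count and resolution minimums above can be verified before training starts. A minimal stdlib sketch that reads dimensions straight from each PNG's IHDR header; the function names (`png_size`, `audit_dataset`) are illustrative, not part of any training tool:

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path):
    """Return (width, height) from the IHDR chunk at bytes 16-24 of a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != PNG_SIGNATURE:
        raise ValueError(f"{path}: not a PNG file")
    return struct.unpack(">II", header[16:24])

def audit_dataset(folder, min_count=15, min_side=512):
    """Flag datasets that miss the count or resolution minimums."""
    images = sorted(Path(folder).glob("*.png"))
    too_small = [p.name for p in images if min(png_size(p)) < min_side]
    return {
        "count": len(images),
        "enough_images": len(images) >= min_count,
        "below_min_resolution": too_small,
    }
```

Anything listed under `below_min_resolution` should be upscaled or replaced before captioning.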

Content Diversity Checklist


  • Multiple angles (front, 3/4, profile, back)
  • Various expressions (neutral, smile, serious, laugh, etc.)
  • Different lighting conditions (studio, natural, dramatic)
  • Varied backgrounds (or transparent/solid)
  • Multiple outfits/contexts
  • Some close-ups, some medium shots
  • If from 3D renders: include style variations (see below)

Preprocessing 3D Renders


Problem: Training directly on 3D renders bakes in the "3D" aesthetic.
Solution: Generate style variations first:
  1. Run each render through img2img with varied style prompts
  2. Mix: 60% style variations, 40% original renders
  3. This teaches identity, not style
Style prompts for variation:
"photorealistic portrait, dslr photo"
"oil painting portrait"
"digital illustration"
"pencil sketch"
"watercolor portrait"
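The 60/40 mix can be planned deterministically before running any img2img jobs. A minimal sketch under stated assumptions: the function name `plan_style_mix` and the fixed seed are illustrative, and actual img2img execution is left to your ComfyUI workflow:

```python
import random

# Style prompts from the list above; each selected render gets one.
STYLE_PROMPTS = [
    "photorealistic portrait, dslr photo",
    "oil painting portrait",
    "digital illustration",
    "pencil sketch",
    "watercolor portrait",
]

def plan_style_mix(renders, variation_ratio=0.6, seed=42):
    """Split renders into (to_stylize, keep_as_is) at roughly 60/40.

    Returns (render, style_prompt) pairs for img2img plus the untouched
    originals. Deterministic for a given seed, so the plan is repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(renders)
    rng.shuffle(shuffled)
    k = round(len(shuffled) * variation_ratio)
    to_stylize = [(r, STYLE_PROMPTS[i % len(STYLE_PROMPTS)])
                  for i, r in enumerate(shuffled[:k])]
    return to_stylize, shuffled[k:]
```

For 20 renders this yields 12 img2img jobs and 8 untouched originals, matching the 60/40 target.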

Captioning Rules


Trigger word: ALWAYS use a unique token as the first word.
  • Good: sage_character, ohwx_sage, sks_person
  • Bad: woman, redhead, character (too generic)
Caption structure:
{trigger}, {subject type}, {clothing}, {pose}, {setting}, {lighting}, {style}
DO NOT describe face features (let the model learn them):
  • Bad: "woman with green eyes, freckles, auburn hair, defined cheekbones"
  • Good: "sage_character, woman, indoor portrait, wearing blue sweater"
DO describe everything else: clothing, pose, background, lighting, expression.
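Following the structure above, captions can be assembled programmatically so the trigger word can never be omitted or misplaced. A minimal sketch; the helper name `build_caption` is illustrative:

```python
def build_caption(trigger, subject, clothing="", pose="", setting="",
                  lighting="", style=""):
    """Join caption fields in template order; the trigger always comes first.

    Empty fields are skipped. Face features are deliberately not a field,
    since the model should learn them on its own.
    """
    fields = [trigger, subject, clothing, pose, setting, lighting, style]
    return ", ".join(f for f in fields if f)
```

For example, `build_caption("sage_character", "woman", clothing="wearing blue sweater", setting="indoor portrait")` returns `"sage_character, woman, wearing blue sweater, indoor portrait"`.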

Folder Structure


dataset/{character_name}/{repeats}_{trigger_word}/
  001.png + 001.txt
  002.png + 002.txt
  ...
Folder naming: 10_sage_character = each image is repeated 10x per epoch.
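The layout above can be created and sanity-checked in a few lines of stdlib Python; the function names (`make_dataset_dir`, `missing_captions`) are illustrative:

```python
from pathlib import Path

def make_dataset_dir(root, character, repeats, trigger):
    """Create the Kohya-style {repeats}_{trigger_word} dataset folder."""
    folder = Path(root) / character / f"{repeats}_{trigger}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder

def missing_captions(folder):
    """Every NNN.png needs a matching NNN.txt caption file."""
    return sorted(p.name for p in Path(folder).glob("*.png")
                  if not p.with_suffix(".txt").exists())
```

Running `missing_captions` before launching training catches unpaired images, which otherwise train without a trigger word.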

Training Configurations


FLUX LoRA (AI-Toolkit) - Recommended


```yaml
network:
  type: lora
  linear: 16              # Rank (16-32 for characters)
  linear_alpha: 16        # Alpha = rank for FLUX

train:
  batch_size: 1
  gradient_accumulation_steps: 4
  steps: 1500             # FLUX converges faster
  lr: 4e-4                # Higher than SDXL
  optimizer: adamw8bit
  dtype: bf16

datasets:
  - resolution: [1024]
    caption_ext: "txt"

sample:
  sample_every: 250
  prompts:
    - "{trigger}, photorealistic portrait"
```
FLUX training notes:
  • Converges 2-3x faster than SDXL
  • 1000-2000 steps usually sufficient
  • Watch for overfitting (quality plateaus early)
  • 24GB VRAM for standard, 9GB with NF4 quantization (SimpleTuner)

SDXL LoRA (Kohya_ss) - Proven


```yaml
pretrained_model: "RealVisXL_V5.0.safetensors"
network_dim: 32            # Rank (16-64)
network_alpha: 16          # Usually dim/2
resolution: "1024,1024"
train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0001      # 1e-4
lr_scheduler: "cosine_with_restarts"
lr_scheduler_num_cycles: 3
max_train_epochs: 10
optimizer_type: "AdamW8bit"
mixed_precision: "bf16"
enable_bucket: true
min_snr_gamma: 5
```
Step calculation:
total_steps = (images x repeats x epochs) / batch_size
Target: 1500-3000 steps for SDXL
Example: 20 images x 10 repeats x 5 epochs / 1 = 1000 steps (below target; raise repeats or epochs to reach the 1500-3000 range)
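The step formula above in code, for checking a configuration before launch; the helper names are illustrative:

```python
def total_steps(images, repeats, epochs, batch_size=1):
    """total_steps = (images * repeats * epochs) / batch_size"""
    return images * repeats * epochs // batch_size

def in_sdxl_target(steps, low=1500, high=3000):
    """True when the step count falls inside the 1500-3000 SDXL target."""
    return low <= steps <= high
```

The worked example (20 x 10 x 5 / 1) gives 1000 steps, below the target range; bumping epochs from 5 to 8 lands at 1600.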

Low VRAM Training (FluxGym / SimpleTuner)


For 12-16GB VRAM:
```yaml
use_8bit_adam: true
gradient_checkpointing: true
cache_latents_to_disk: true
max_data_loader_n_workers: 0
train_batch_size: 1
gradient_accumulation_steps: 8
quantize_base_model: nf4    # SimpleTuner only
```

Evaluation Protocol


Test Each Checkpoint


Use identical prompts across all checkpoints:
Prompt 1: "{trigger}, photorealistic portrait, neutral expression"
Prompt 2: "{trigger}, photorealistic portrait, smiling, outdoor"
Prompt 3: "{trigger}, wearing formal suit, standing, office"
Prompt 4: "a person standing in a park"  (WITHOUT trigger - should NOT produce character)
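The protocol is easiest to run as a fixed checkpoint x prompt grid; a sketch (the function name is illustrative, and actual generation is left to your ComfyUI workflow):

```python
def evaluation_grid(checkpoints, trigger):
    """Pair every checkpoint with the four standard test prompts.

    The last prompt omits the trigger on purpose: a healthy LoRA
    should NOT produce the character for it.
    """
    prompts = [
        f"{trigger}, photorealistic portrait, neutral expression",
        f"{trigger}, photorealistic portrait, smiling, outdoor",
        f"{trigger}, wearing formal suit, standing, office",
        "a person standing in a park",  # negative control: no trigger
    ]
    return [(ckpt, prompt) for ckpt in checkpoints for prompt in prompts]
```

Keep the seed fixed per grid cell so the checkpoint is the only variable being compared.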

Quality Indicators


Good training:
  • Character recognizable from trigger word alone
  • Responds to different prompts/contexts
  • Doesn't always produce same pose/expression
  • Prompt 4 does NOT produce the character
Overfitting signs:
  • Same exact pose/expression regardless of prompt
  • Training backgrounds appearing in outputs
  • Ignores clothing/setting prompts
  • Prompt 4 produces the character (too strong)

Best Epoch Selection


If using sample_every: 250 with 1500 steps:
  • Checkpoint 250: Usually underfit
  • Checkpoint 500-750: Often sweet spot for FLUX
  • Checkpoint 1000-1500: May be overfitting
Compare visually and select the checkpoint with best identity + prompt flexibility balance.

Post-Training Integration


  1. Copy the best checkpoint to {ComfyUI}/models/loras/
  2. Update the character profile:
     ```yaml
     lora:
       trained: true
       model_file: "sage_character_flux.safetensors"
       trigger_word: "sage_character"
       best_strength: 0.8
     ```
  3. Test in the full workflow: LoRA (0.7-0.9) + PuLID/IP-Adapter (0.5-0.7)
  4. Record successful settings in the character's generation_history

Combining LoRA with Zero-Shot Methods


Best practice: LoRA as base identity, zero-shot for enhancement.
[Load Checkpoint] → [Load LoRA (0.7-0.9)] → [Apply PuLID/IP-Adapter (0.5-0.7)] → [Generate]
Lower weights on both prevent conflicts between the two methods while still reinforcing identity.

Troubleshooting


| Issue | Solution |
|---|---|
| LoRA not activating | Check trigger word spelling; ensure the LoRA loads before the KSampler |
| Identity drift at angles | Add more angle variety to the dataset; reduce network_dim |
| Overfitting | Reduce epochs, enlarge the dataset, lower network_dim |
| Style contamination | Improve caption diversity; don't describe style in captions |
| Poor quality/artifacts | Check training images for compression artifacts; reduce the learning rate |

Reference


  • references/lora-training.md - Full parameter reference
  • references/models.md - Training tool download links
  • Character profiles in projects/ for trigger words and reference images