
# Using LoRA for Fine-tuning

LoRA (Low-Rank Adaptation) enables efficient fine-tuning by freezing pretrained weights and injecting small trainable matrices into transformer layers. This reduces trainable parameters to ~0.1% of the original model while maintaining performance.

## Table of Contents

- Core Concepts
- Basic Setup
- Configuration Parameters
- QLoRA (Quantized LoRA)
- Training Patterns
- Saving and Loading
- Merging Adapters
- Best Practices
- References

## Core Concepts

### How LoRA Works

Instead of updating all weights during fine-tuning, LoRA decomposes weight updates into low-rank matrices:

W' = W + BA

Where:

- **W** is the frozen pretrained weight matrix (d × k)
- **B** is a trainable matrix (d × r)
- **A** is a trainable matrix (r × k)
- **r** is the rank, much smaller than d and k

The key insight: weight updates during fine-tuning have low intrinsic rank, so we can represent them efficiently with smaller matrices.
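The decomposition above can be sketched numerically. Dimensions here are hypothetical, chosen only to illustrate the parameter savings; B starts at zero so the adapted model initially matches the base model, mirroring LoRA's initialization:

```python
import numpy as np

# Toy illustration of the LoRA decomposition W' = W + BA.
d, k, r = 1024, 1024, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))   # frozen pretrained weight (d x k)
B = np.zeros((d, r))              # trainable, initialized to zero
A = rng.standard_normal((r, k))   # trainable

W_prime = W + B @ A               # effective weight after adaptation
                                  # (equals W at initialization, since B = 0)

full_params = d * k               # parameters updated by full fine-tuning
lora_params = d * r + r * k       # parameters LoRA actually trains
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")
```

With these shapes LoRA trains 32,768 parameters instead of 1,048,576 — about 3% — and the ratio shrinks further as d and k grow while r stays small.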

### Why Use LoRA

| Aspect | Full Fine-tuning | LoRA |
|---|---|---|
| Trainable params | 100% | ~0.1-1% |
| Memory usage | High | Low |
| Adapter size | Full model | ~3-100 MB |
| Training speed | Slower | Faster |
| Multiple tasks | Separate models | Swap adapters |

## Basic Setup

### Installation

```bash
pip install peft transformers accelerate
```

### Minimal Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType
import torch

# Load base model
model_name = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Output:

```
trainable params: 3,407,872 || all params: 1,238,300,672 || trainable%: 0.28%
```
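As a sanity check, the 3,407,872 figure can be reproduced from Llama-3.2-1B's architecture. The shapes below (16 decoder layers, hidden size 2048, grouped-query KV dimension 512) are assumptions from the model's published configuration, not something the PEFT output itself states:

```python
# Reproduce the trainable-parameter count for r=16 on q/k/v/o projections.
r = 16
hidden, kv_dim, layers = 2048, 512, 16  # assumed Llama-3.2-1B shapes

per_layer = 0
for d_in, d_out in [(hidden, hidden),   # q_proj
                    (hidden, kv_dim),   # k_proj
                    (hidden, kv_dim),   # v_proj
                    (hidden, hidden)]:  # o_proj
    # Each adapted layer adds A (r x d_in) and B (d_out x r)
    per_layer += r * d_in + d_out * r

print(layers * per_layer)  # → 3407872
```

Note that k_proj and v_proj contribute fewer parameters than q_proj and o_proj because grouped-query attention shrinks their output dimension.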

## Configuration Parameters

### LoraConfig Options

```python
from peft import LoraConfig, TaskType

config = LoraConfig(
    # Core parameters
    r=16,                          # Rank of update matrices
    lora_alpha=32,                 # Scaling factor (alpha/r applied to updates)
    target_modules=["q_proj", "v_proj"],  # Layers to adapt

    # Regularization
    lora_dropout=0.05,             # Dropout on LoRA layers
    bias="none",                   # "none", "all", or "lora_only"

    # Task configuration
    task_type=TaskType.CAUSAL_LM,  # CAUSAL_LM, SEQ_CLS, SEQ_2_SEQ_LM, TOKEN_CLS

    # Advanced
    modules_to_save=None,          # Additional modules to train (e.g., ["lm_head"])
    layers_to_transform=None,      # Specific layer indices to adapt
    use_rslora=False,              # Rank-stabilized LoRA scaling
    use_dora=False,                # Weight-Decomposed LoRA
)
```
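To build intuition for `lora_alpha` and `use_rslora`, here is a small standalone sketch (not part of PEFT's API) comparing the standard `alpha / r` scaling with the rank-stabilized `alpha / sqrt(r)` variant:

```python
import math

# Standard LoRA scales the update BA by alpha / r, so with the common
# alpha = 2r heuristic the scale stays constant as r grows.
# rsLoRA (use_rslora=True) scales by alpha / sqrt(r) instead, which
# avoids shrinking the effective update at high ranks.
for r, alpha in [(8, 16), (16, 32), (64, 128)]:
    standard = alpha / r
    rs = alpha / math.sqrt(r)
    print(f"r={r:3d} alpha={alpha:3d}  standard={standard:.2f}  rslora={rs:.2f}")
```

Under alpha = 2r, the standard scale is always 2.0, while the rsLoRA scale grows with r — one reason rsLoRA is worth trying when experimenting with ranks above ~32.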

### Target Modules by Architecture

```python
# Llama, Mistral, Qwen
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# GPT-2, GPT-J
target_modules = ["c_attn", "c_proj", "c_fc"]

# BERT, RoBERTa
target_modules = ["query", "key", "value", "dense"]

# Falcon
target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]

# Phi
target_modules = ["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"]
```

### Finding Target Modules

```python
import torch

# Print all linear layer names in the model
def find_target_modules(model):
    linear_modules = set()
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            # Get the last part of the name
            # (e.g., "q_proj" from "model.layers.0.self_attn.q_proj")
            layer_name = name.split(".")[-1]
            linear_modules.add(layer_name)
    return sorted(linear_modules)

print(find_target_modules(model))
```

## QLoRA (Quantized LoRA)

QLoRA combines 4-bit quantization with LoRA, enabling fine-tuning of large models on consumer GPUs.

### Setup

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
import torch

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # Normalized float 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,         # Nested quantization
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare for k-bit training
model = prepare_model_for_kbit_training(model)

# Apply LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)
```

### Memory Requirements

| Model Size | Full FT (16-bit) | LoRA (16-bit) | QLoRA (4-bit) |
|---|---|---|---|
| 7B | ~60 GB | ~16 GB | ~6 GB |
| 13B | ~104 GB | ~28 GB | ~10 GB |
| 70B | ~560 GB | ~160 GB | ~48 GB |

## Training Patterns

### With Hugging Face Trainer

```python
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
from datasets import load_dataset

# Prepare dataset
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def format_prompt(example):
    if example["input"]:
        text = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    else:
        text = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return {"text": text}

dataset = dataset.map(format_prompt)

def tokenize(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding=False,
    )

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Training arguments (note the higher learning rate)
training_args = TrainingArguments(
    output_dir="./lora-output",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,            # Higher than full fine-tuning
    bf16=True,
    logging_steps=10,
    save_steps=500,
    warmup_ratio=0.03,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()
```

### With SFTTrainer (TRL)

```python
from trl import SFTTrainer, SFTConfig

sft_config = SFTConfig(
    output_dir="./sft-lora",
    max_seq_length=1024,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=lora_config,      # Pass config directly; SFTTrainer applies it
    dataset_text_field="text",
)

trainer.train()
```

### Classification Task

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS,
    modules_to_save=["classifier"],  # Train classification head fully
)

model = get_peft_model(model, lora_config)
```

## Saving and Loading

### Save Adapter

```python
# Save only LoRA weights (small file)
model.save_pretrained("./my-lora-adapter")
tokenizer.save_pretrained("./my-lora-adapter")

# Push to Hub
model.push_to_hub("username/my-lora-adapter")
```

### Load Adapter

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "./my-lora-adapter")

# For inference
model.eval()
```

### Switch Between Adapters

```python
# Load multiple adapters
model.load_adapter("./adapter-1", adapter_name="task1")
model.load_adapter("./adapter-2", adapter_name="task2")

# Switch active adapter
model.set_adapter("task1")
output = model.generate(**inputs)

model.set_adapter("task2")
output = model.generate(**inputs)

# Disable adapter (use base model)
with model.disable_adapter():
    output = model.generate(**inputs)
```

## Merging Adapters

Merge LoRA weights into the base model for deployment without adapter overhead.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
    device_map="cpu",  # Merge on CPU to avoid memory issues
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "./my-lora-adapter")

# Merge and unload
merged_model = model.merge_and_unload()

# Save merged model
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")

# Push merged model to Hub
merged_model.push_to_hub("username/my-merged-model")
```
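What merging does to each adapted linear layer can be verified numerically. The sketch below (plain NumPy, hypothetical shapes) folds the scaled low-rank update into the base weight — the core of what `merge_and_unload()` performs per layer — and checks that both paths produce the same output:

```python
import numpy as np

# One linear layer with a LoRA update: adapter path vs merged path.
rng = np.random.default_rng(1)
d, k, r, alpha = 64, 32, 4, 8
W = rng.standard_normal((d, k))   # frozen base weight
A = rng.standard_normal((r, k))   # trained LoRA A
B = rng.standard_normal((d, r))   # trained LoRA B
scale = alpha / r

x = rng.standard_normal(k)

# Adapter path: base output plus the scaled low-rank correction
y_adapter = W @ x + scale * (B @ (A @ x))

# Merged path: a single dense matmul with the folded weight
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # → True
```

The merged model pays no extra matmuls at inference time, but it loses the ability to swap or disable adapters, so keep the unmerged adapter files around for further training.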

## Best Practices

1. **Start with r=16**: Scale up to 32 or 64 if the model underfits; down to 8 if overfitting or memory-constrained.
2. **Set lora_alpha = 2 × r**: A common heuristic; the effective scaling is alpha/r.
3. **Target all attention and MLP layers**: For best results on LLMs, include the gate/up/down projections:
   ```python
   target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
   ```
4. **Use a higher learning rate**: 2e-4 is typical for LoRA vs 2e-5 for full fine-tuning.
5. **Enable gradient checkpointing**: Reduces memory at the cost of ~20% slower training:
   ```python
   model.gradient_checkpointing_enable()
   ```
6. **Use QLoRA for large models**: Essential for fine-tuning 7B+ models on consumer GPUs.
7. **Keep dropout low**: 0.05 is usually sufficient; higher values may hurt performance.
8. **Save checkpoints frequently**: LoRA adapters are small, so save often.
9. **Evaluate on the base model too**: Ensure the adapter doesn't degrade base capabilities.
10. **Consider modules_to_save for task heads**: For classification, train the classifier fully:
    ```python
    modules_to_save=["classifier", "score"]
    ```

## References

See reference/ for detailed documentation:

- advanced-techniques.md - DoRA, rsLoRA, adapter composition, and debugging