dspy-finetune-bootstrap


DSPy BootstrapFinetune Optimizer


Goal


Distill a DSPy program into fine-tuned model weights for efficient production deployment.

When to Use


  • You have a working DSPy program with a large model
  • Need to reduce inference costs
  • Want faster responses (smaller model)
  • Deploying to resource-constrained environments

Inputs


| Input | Type | Description |
|---|---|---|
| `program` | `dspy.Module` | Teacher program to distill |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `callable` | Validation metric (optional) |
| `train_kwargs` | `dict` | Training hyperparameters |
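The `metric` input is any callable with the signature `metric(gold, pred, trace=None)`; BootstrapFinetune uses it to keep only the teacher traces the metric approves. A minimal sketch (the stand-in objects below are illustrative — real calls receive `dspy.Example` and `dspy.Prediction`, which also expose fields as attributes):

```python
from types import SimpleNamespace

def exact_match(gold, pred, trace=None):
    """Return True when the predicted answer matches the gold answer."""
    return gold.answer.strip().lower() == pred.answer.strip().lower()

# Stand-ins for dspy.Example / dspy.Prediction.
gold = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer=" paris ")
print(exact_match(gold, pred))  # True
```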

Outputs


| Output | Type | Description |
|---|---|---|
| `finetuned_program` | `dspy.Module` | Program with fine-tuned weights |
| `model_path` | `str` | Path to saved model |

Workflow


Phase 1: Prepare Teacher Program


```python
import dspy

# Configure with strong teacher model
dspy.configure(lm=dspy.LM("openai/gpt-4o"))

class TeacherQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)
```

Phase 2: Enable Experimental Features & Generate Training Traces


BootstrapFinetune is experimental and requires enabling the flag:
python
import dspy
from dspy.teleprompt import BootstrapFinetune
BootstrapFinetune属于实验性功能,需要启用对应标志:
python
import dspy
from dspy.teleprompt import BootstrapFinetune

Enable experimental features

Enable experimental features

dspy.settings.experimental = True
optimizer = BootstrapFinetune( metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(), train_kwargs={ 'learning_rate': 5e-5, 'num_train_epochs': 3, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.1 } )
undefined
dspy.settings.experimental = True
optimizer = BootstrapFinetune( metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(), train_kwargs={ 'learning_rate': 5e-5, 'num_train_epochs': 3, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.1 } )
undefined

Phase 3: Fine-tune Student Model


```python
finetuned = optimizer.compile(
    TeacherQA(),
    trainset=trainset
)
```

Phase 4: Deploy


```python
# Save the fine-tuned model (saves state-only by default)
finetuned.save("finetuned_qa_model.json")

# Load and use (must recreate architecture first)
loaded = TeacherQA()
loaded.load("finetuned_qa_model.json")
result = loaded(question="What is machine learning?")
```

Production Example


```python
import dspy
from dspy.teleprompt import BootstrapFinetune
from dspy.evaluate import Evaluate
import logging
import os

# Enable experimental features
dspy.settings.experimental = True

logger = logging.getLogger(__name__)

class ClassificationSignature(dspy.Signature):
    """Classify text into categories."""
    text: str = dspy.InputField()
    label: str = dspy.OutputField(desc="Category: positive, negative, neutral")

class TextClassifier(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict(ClassificationSignature)

    def forward(self, text):
        return self.classify(text=text)

def classification_metric(gold, pred, trace=None):
    """Exact label match."""
    gold_label = gold.label.lower().strip()
    pred_label = pred.label.lower().strip() if pred.label else ""
    return gold_label == pred_label

def finetune_classifier(trainset, devset, output_dir="./finetuned_model"):
    """Full fine-tuning pipeline."""
    # Configure teacher (strong model)
    dspy.configure(lm=dspy.LM("openai/gpt-4o"))

    teacher = TextClassifier()

    # Evaluate teacher
    evaluator = Evaluate(devset=devset, metric=classification_metric, num_threads=8)
    teacher_score = evaluator(teacher)
    logger.info(f"Teacher score: {teacher_score:.2%}")

    # Fine-tune (train_kwargs passed to constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 2e-5,
            'num_train_epochs': 3,
            'per_device_train_batch_size': 8,
            'gradient_accumulation_steps': 2,
            'warmup_ratio': 0.1,
            'weight_decay': 0.01,
            'logging_steps': 10,
            'save_strategy': 'epoch',
            'output_dir': output_dir
        }
    )

    finetuned = optimizer.compile(
        teacher,
        trainset=trainset
    )

    # Evaluate fine-tuned model
    student_score = evaluator(finetuned)
    logger.info(f"Student score: {student_score:.2%}")

    # Save (state-only as JSON)
    finetuned.save(os.path.join(output_dir, "final_model.json"))

    return {
        "teacher_score": teacher_score,
        "student_score": student_score,
        "model_path": os.path.join(output_dir, "final_model.json")
    }
```

For RAG fine-tuning:

```python
class RAGClassifier(dspy.Module):
    """RAG pipeline that can be fine-tuned."""
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.classify = dspy.ChainOfThought("context, text -> label")

    def forward(self, text):
        context = self.retrieve(text).passages
        return self.classify(context=context, text=text)

def finetune_rag_classifier(trainset, devset):
    """Fine-tune a RAG-based classifier."""
    # Configure retriever and LM
    colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    dspy.configure(
        lm=dspy.LM("openai/gpt-4o"),
        rm=colbert
    )

    rag = RAGClassifier()

    # Fine-tune (train_kwargs in constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 1e-5,
            'num_train_epochs': 5
        }
    )

    finetuned = optimizer.compile(
        rag,
        trainset=trainset
    )

    return finetuned
```

Training Arguments Reference


| Argument | Description | Typical Value |
|---|---|---|
| `learning_rate` | Learning rate | 1e-5 to 5e-5 |
| `num_train_epochs` | Training epochs | 3-5 |
| `per_device_train_batch_size` | Batch size | 4-16 |
| `gradient_accumulation_steps` | Gradient accumulation | 2-8 |
| `warmup_ratio` | Warmup proportion | 0.1 |
| `weight_decay` | L2 regularization | 0.01 |
| `max_grad_norm` | Gradient clipping | 1.0 |
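The two batch arguments interact: gradients from `gradient_accumulation_steps` micro-batches are accumulated before each optimizer step, so the effective batch size per device is their product. A quick check with the typical values above:

```python
# Typical values from the table above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2

# Gradients from this many examples are averaged before each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```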

Best Practices


  1. Strong teacher - Use GPT-4 or Claude as teacher
  2. Quality data - Teacher traces are only as good as training examples
  3. Validate improvement - Compare student to teacher on held-out set
  4. Start with more epochs - Fine-tuning often needs 3-5 epochs
  5. Monitor overfitting - Track validation loss during training
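Practice 3 boils down to a plain accuracy comparison on a held-out set. A sketch with hypothetical stand-ins (`teacher_predict` and `student_predict` substitute for compiled DSPy programs; in practice you would reuse `dspy.evaluate.Evaluate` as in the production example):

```python
def heldout_accuracy(predict, heldout):
    """Fraction of held-out examples the program answers exactly."""
    hits = sum(1 for ex in heldout if predict(ex["question"]) == ex["answer"])
    return hits / len(heldout)

heldout = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

# Hypothetical stand-ins for the teacher and fine-tuned student programs.
teacher_predict = lambda q: {"What is 2+2?": "4", "Capital of France?": "Paris"}[q]
student_predict = lambda q: {"What is 2+2?": "4", "Capital of France?": "Lyon"}[q]

teacher_acc = heldout_accuracy(teacher_predict, heldout)  # 1.0
student_acc = heldout_accuracy(student_predict, heldout)  # 0.5
print(f"teacher={teacher_acc:.0%} student={student_acc:.0%}")
```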

Limitations


  • Requires access to model weights (not API-only models)
  • Training requires GPU resources
  • Student may not match teacher quality on all inputs
  • Fine-tuning takes hours/days depending on data size
  • Model size reduction may cause capability loss

Official Documentation
