dspy-finetune-bootstrap


DSPy BootstrapFinetune Optimizer


Goal


Distill a DSPy program into fine-tuned model weights for efficient production deployment.

When to Use


  • You have a working DSPy program with a large model
  • Need to reduce inference costs
  • Want faster responses (smaller model)
  • Deploying to resource-constrained environments

Inputs


| Input | Type | Description |
|---|---|---|
| `program` | `dspy.Module` | Teacher program to distill |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `callable` | Validation metric (optional) |
| `train_kwargs` | `dict` | Training hyperparameters |
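The `metric` input is any callable with the signature `metric(gold, pred, trace=None)`; BootstrapFinetune uses it to keep only the teacher traces the metric approves. A minimal sketch (the stand-in objects below are illustrative — real calls receive `dspy.Example` and `dspy.Prediction`, which also expose fields as attributes):

```python
from types import SimpleNamespace

def exact_match(gold, pred, trace=None):
    """Return True when the predicted answer matches the gold answer."""
    return gold.answer.strip().lower() == pred.answer.strip().lower()

# Stand-ins for dspy.Example / dspy.Prediction.
gold = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer=" paris ")
print(exact_match(gold, pred))  # True
```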

Outputs


| Output | Type | Description |
|---|---|---|
| `finetuned_program` | `dspy.Module` | Program with fine-tuned weights |
| `model_path` | `str` | Path to saved model |

Workflow


Phase 1: Prepare Teacher Program


```python
import dspy

# Configure with strong teacher model
dspy.configure(lm=dspy.LM("openai/gpt-4o"))

class TeacherQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)
```

Phase 2: Enable Experimental Features & Generate Training Traces


BootstrapFinetune is experimental and requires enabling the flag:
python
import dspy
from dspy.teleprompt import BootstrapFinetune
BootstrapFinetune属于实验性功能,需要启用对应标志:
python
import dspy
from dspy.teleprompt import BootstrapFinetune

Enable experimental features

Enable experimental features

dspy.settings.experimental = True
optimizer = BootstrapFinetune( metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(), train_kwargs={ 'learning_rate': 5e-5, 'num_train_epochs': 3, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.1 } )
undefined
dspy.settings.experimental = True
optimizer = BootstrapFinetune( metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(), train_kwargs={ 'learning_rate': 5e-5, 'num_train_epochs': 3, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.1 } )
undefined

Phase 3: Fine-tune Student Model


```python
finetuned = optimizer.compile(
    TeacherQA(),
    trainset=trainset
)
```

Phase 4: Deploy


```python
# Save the fine-tuned model (saves state-only by default)
finetuned.save("finetuned_qa_model.json")

# Load and use (must recreate architecture first)
loaded = TeacherQA()
loaded.load("finetuned_qa_model.json")
result = loaded(question="What is machine learning?")
```

Production Example


```python
import dspy
from dspy.teleprompt import BootstrapFinetune
from dspy.evaluate import Evaluate
import logging
import os

# Enable experimental features
dspy.settings.experimental = True

logger = logging.getLogger(__name__)

class ClassificationSignature(dspy.Signature):
    """Classify text into categories."""
    text: str = dspy.InputField()
    label: str = dspy.OutputField(desc="Category: positive, negative, neutral")

class TextClassifier(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict(ClassificationSignature)

    def forward(self, text):
        return self.classify(text=text)

def classification_metric(gold, pred, trace=None):
    """Exact label match."""
    gold_label = gold.label.lower().strip()
    pred_label = pred.label.lower().strip() if pred.label else ""
    return gold_label == pred_label

def finetune_classifier(trainset, devset, output_dir="./finetuned_model"):
    """Full fine-tuning pipeline."""
    # Configure teacher (strong model)
    dspy.configure(lm=dspy.LM("openai/gpt-4o"))

    teacher = TextClassifier()

    # Evaluate teacher
    evaluator = Evaluate(devset=devset, metric=classification_metric, num_threads=8)
    teacher_score = evaluator(teacher)
    logger.info(f"Teacher score: {teacher_score:.2%}")

    # Fine-tune (train_kwargs passed to constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 2e-5,
            'num_train_epochs': 3,
            'per_device_train_batch_size': 8,
            'gradient_accumulation_steps': 2,
            'warmup_ratio': 0.1,
            'weight_decay': 0.01,
            'logging_steps': 10,
            'save_strategy': 'epoch',
            'output_dir': output_dir
        }
    )

    finetuned = optimizer.compile(
        teacher,
        trainset=trainset
    )

    # Evaluate fine-tuned model
    student_score = evaluator(finetuned)
    logger.info(f"Student score: {student_score:.2%}")

    # Save (state-only as JSON)
    finetuned.save(os.path.join(output_dir, "final_model.json"))

    return {
        "teacher_score": teacher_score,
        "student_score": student_score,
        "model_path": os.path.join(output_dir, "final_model.json")
    }
```

For RAG fine-tuning:

```python
class RAGClassifier(dspy.Module):
    """RAG pipeline that can be fine-tuned."""
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.classify = dspy.ChainOfThought("context, text -> label")

    def forward(self, text):
        context = self.retrieve(text).passages
        return self.classify(context=context, text=text)

def finetune_rag_classifier(trainset, devset):
    """Fine-tune a RAG-based classifier."""
    # Configure retriever and LM
    colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    dspy.configure(
        lm=dspy.LM("openai/gpt-4o"),
        rm=colbert
    )

    rag = RAGClassifier()

    # Fine-tune (train_kwargs in constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 1e-5,
            'num_train_epochs': 5
        }
    )

    finetuned = optimizer.compile(
        rag,
        trainset=trainset
    )

    return finetuned
```

Training Arguments Reference


| Argument | Description | Typical Value |
|---|---|---|
| `learning_rate` | Learning rate | 1e-5 to 5e-5 |
| `num_train_epochs` | Training epochs | 3-5 |
| `per_device_train_batch_size` | Batch size | 4-16 |
| `gradient_accumulation_steps` | Gradient accumulation | 2-8 |
| `warmup_ratio` | Warmup proportion | 0.1 |
| `weight_decay` | L2 regularization | 0.01 |
| `max_grad_norm` | Gradient clipping | 1.0 |
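The two batch arguments interact: gradients from `gradient_accumulation_steps` micro-batches are accumulated before each optimizer step, so the effective batch size per device is their product. A quick check with the typical values above:

```python
# Typical values from the table above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2

# Gradients from this many examples are averaged before each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```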

Best Practices


  1. Strong teacher - Use GPT-4 or Claude as teacher
  2. Quality data - Teacher traces are only as good as training examples
  3. Validate improvement - Compare student to teacher on held-out set
  4. Start with more epochs - Fine-tuning often needs 3-5 epochs
  5. Monitor overfitting - Track validation loss during training
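Practice 3 boils down to a plain accuracy comparison on a held-out set. A sketch with hypothetical stand-ins (`teacher_predict` and `student_predict` substitute for compiled DSPy programs; in practice you would reuse `dspy.evaluate.Evaluate` as in the production example):

```python
def heldout_accuracy(predict, heldout):
    """Fraction of held-out examples the program answers exactly."""
    hits = sum(1 for ex in heldout if predict(ex["question"]) == ex["answer"])
    return hits / len(heldout)

heldout = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

# Hypothetical stand-ins for the teacher and fine-tuned student programs.
teacher_predict = lambda q: {"What is 2+2?": "4", "Capital of France?": "Paris"}[q]
student_predict = lambda q: {"What is 2+2?": "4", "Capital of France?": "Lyon"}[q]

teacher_acc = heldout_accuracy(teacher_predict, heldout)  # 1.0
student_acc = heldout_accuracy(student_predict, heldout)  # 0.5
print(f"teacher={teacher_acc:.0%} student={student_acc:.0%}")
```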

Limitations


  • Requires access to model weights (not API-only models)
  • Training requires GPU resources
  • Student may not match teacher quality on all inputs
  • Fine-tuning takes hours/days depending on data size
  • Model size reduction may cause capability loss

Official Documentation
