ai-summarizing

Build an AI Summarizer

Guide the user through building AI that condenses long content into useful summaries. Uses DSPy to produce consistent, faithful summaries with controllable length and detail.

Step 1: Understand the task

Ask the user:
  1. What are you summarizing? (meeting transcripts, articles, support threads, documents, emails?)
  2. What format should the summary take? (bullet points, narrative paragraph, executive brief, action items?)
  3. How long should summaries be? (one sentence, a paragraph, 3-5 bullets, custom word limit?)
  4. Who reads the summaries? (executives, team members, customers, developers?)

Step 2: Build a basic summarizer

Simple text-to-summary

python
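import dspy

# Assumption: DSPy needs a configured LM before any module can run. The model
# name is a placeholder; substitute your own provider and model.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class Summarize(dspy.Signature):
    """Summarize the text concisely while preserving key information."""
    text: str = dspy.InputField(desc="The text to summarize")
    summary: str = dspy.OutputField(desc="A concise summary of the text")

# ChainOfThought adds a reasoning step before the summary field is produced
summarizer = dspy.ChainOfThought(Summarize)
result = summarizer(text="...")
print(result.summary)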

Audience-aware summary

Adapt the signature for specific audiences:
python
class SummarizeForAudience(dspy.Signature):
    """Summarize the text for the target audience."""
    text: str = dspy.InputField(desc="The text to summarize")
    audience: str = dspy.InputField(desc="Who will read this summary")
    summary: str = dspy.OutputField(desc="A summary tailored to the audience")

Step 3: Structured summaries

Extract multiple aspects from the same content at once:

Meeting transcript processor

python
from pydantic import BaseModel, Field

class MeetingSummary(BaseModel):
    tldr: str = Field(description="One-sentence overview of the meeting")
    decisions: list[str] = Field(description="Decisions that were made")
    action_items: list[str] = Field(description="Tasks assigned with owners if mentioned")
    key_points: list[str] = Field(description="Important facts or updates discussed")

class SummarizeMeeting(dspy.Signature):
    """Extract a structured summary from a meeting transcript."""
    transcript: str = dspy.InputField(desc="Meeting transcript")
    summary: MeetingSummary = dspy.OutputField()

summarizer = dspy.ChainOfThought(SummarizeMeeting)
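
Once the structured fields come back, they usually need to be rendered for a reader. The following is a minimal sketch, assuming the four fields of `MeetingSummary` shown above. The `MeetingNotes` dataclass is a hypothetical stand-in so the example runs without DSPy or Pydantic, and `render_meeting_summary` is an illustrative helper, not part of either library.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingNotes:
    """Hypothetical stand-in mirroring the fields of MeetingSummary."""
    tldr: str
    decisions: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    key_points: list[str] = field(default_factory=list)


def render_meeting_summary(notes: MeetingNotes) -> str:
    """Render the structured fields as plain text, skipping empty sections."""
    lines = [f"TL;DR: {notes.tldr}"]
    sections = [
        ("Decisions", notes.decisions),
        ("Action items", notes.action_items),
        ("Key points", notes.key_points),
    ]
    for title, items in sections:
        if not items:  # omit sections the model left empty
            continue
        lines.append("")
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


# Made-up example values, for illustration only
notes = MeetingNotes(
    tldr="Launch moves to May.",
    decisions=["Delay launch by two weeks"],
    action_items=["Ana: update the roadmap"],
)
print(render_meeting_summary(notes))
```

Because empty sections are skipped, a meeting with no recorded key points produces no dangling "Key points:" heading.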

Parallel multi-aspect extraction

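Give each aspect its own signature and call so that every prompt focuses on a single job. The calls are independent of one another, although the module below runs them one after another: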
python
class ExtractDecisions(dspy.Signature):
    """Extract decisions made in this meeting."""
    transcript: str = dspy.InputField()
    decisions: list[str] = dspy.OutputField(desc="Decisions that were made")

class ExtractActionItems(dspy.Signature):
    """Extract action items with assigned owners."""
    transcript: str = dspy.InputField()
    action_items: list[str] = dspy.OutputField(desc="Tasks with owners")

class ExtractKeyFacts(dspy.Signature):
    """Extract key facts and updates discussed."""
    transcript: str = dspy.InputField()
    key_facts: list[str] = dspy.OutputField(desc="Important facts and updates")

class MeetingSummarizer(dspy.Module):
    def __init__(self):
        self.tldr = dspy.ChainOfThought("transcript -> tldr")
        self.decisions = dspy.ChainOfThought(ExtractDecisions)
        self.actions = dspy.ChainOfThought(ExtractActionItems)
        self.facts = dspy.ChainOfThought(ExtractKeyFacts)

    def forward(self, transcript):
        return dspy.Prediction(
            tldr=self.tldr(transcript=transcript).tldr,
            decisions=self.decisions(transcript=transcript).decisions,
            action_items=self.actions(transcript=transcript).action_items,
            key_facts=self.facts(transcript=transcript).key_facts,
        )
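
Since the four extraction calls do not depend on each other, they could in principle run concurrently. Here is a minimal sketch of that idea using only the standard library. The `run_extractors` helper and the two stand-in extractors are hypothetical; in practice each callable would wrap one predictor call, and whether DSPy modules are safe to call from several threads in your installed version is an assumption to verify first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_extractors(transcript: str,
                   extractors: dict[str, Callable[[str], object]]) -> dict[str, object]:
    """Run every extractor on the same transcript concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(extractors))) as pool:
        futures = {name: pool.submit(fn, transcript)
                   for name, fn in extractors.items()}
        # .result() blocks until that extractor finishes and re-raises its errors
        return {name: future.result() for name, future in futures.items()}


# Stand-in extractors (hypothetical); real ones would call an LM-backed predictor.
def fake_decisions(transcript: str) -> list[str]:
    return [line for line in transcript.splitlines() if line.startswith("DECISION:")]


def fake_actions(transcript: str) -> list[str]:
    return [line for line in transcript.splitlines() if line.startswith("TODO:")]


transcript = "DECISION: ship v2\nTODO: Sam writes docs\nSmall talk"
results = run_extractors(transcript, {
    "decisions": fake_decisions,
    "action_items": fake_actions,
})
print(results)
```

Threads suit this case because LM calls spend most of their time waiting on the network rather than using the CPU.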

Step 4: Control length and detail

Word limit enforcement

python
class LengthControlledSummarizer(dspy.Module):
    def __init__(self):
        self.summarize = dspy.ChainOfThought(SummarizeWithLimit)

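    def forward(self, text, max_words=100):
        result = self.summarize(text=text, max_words=max_words)
        word_count = len(result.summary.split())
        # dspy.Assert is part of DSPy's assertions API, which not every DSPy
        # release ships; where it exists, retries with feedback need the module
        # to be wrapped via .activate_assertions().
        dspy.Assert(
            word_count <= max_words,
            f"Summary is {word_count} words but must be at most {max_words}. "
            "Make it more concise."
        )
        return result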

class SummarizeWithLimit(dspy.Signature):
    """Summarize the text within the word limit."""
    text: str = dspy.InputField()
    max_words: int = dspy.InputField(desc="Maximum number of words for the summary")
    summary: str = dspy.OutputField(desc="A concise summary within the word limit")
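
The retry-with-feedback behaviour that the length check relies on can be sketched without any library. This is an illustration of the control flow only, not DSPy's implementation: `summarize_within_limit`, its `generate` callback, and the `fake_generate` stand-in are all hypothetical names.

```python
from typing import Callable, Optional


def summarize_within_limit(
    generate: Callable[[str, Optional[str]], str],
    text: str,
    max_words: int,
    max_attempts: int = 3,
) -> str:
    """Call `generate` until its output fits the word limit or attempts run out."""
    feedback = None
    for _ in range(max_attempts):
        summary = generate(text, feedback)
        word_count = len(summary.split())
        if word_count <= max_words:
            return summary
        # Feed the failure back so the next attempt can correct itself
        feedback = (f"Summary is {word_count} words but must be at most "
                    f"{max_words}. Make it more concise.")
    raise ValueError(f"No summary within {max_words} words after {max_attempts} attempts")


# Hypothetical generator: verbose on the first try, shorter once it gets feedback.
def fake_generate(text: str, feedback: Optional[str]) -> str:
    words = text.split()
    return " ".join(words[:5] if feedback else words[:20])


text = " ".join(f"w{i}" for i in range(50))
print(summarize_within_limit(fake_generate, text, max_words=10))  # prints "w0 w1 w2 w3 w4"
```

Raising after the last attempt keeps an over-long summary from slipping through silently; a softer variant could return the shortest attempt instead.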

Detail level control

Use a detail parameter to control how much information to keep:
python
from typing import Literal

class SummarizeWithDetail(dspy.Signature):
    """Summarize the text at the specified detail level."""
    text: str = dspy.InputField()
    detail_level: Literal["brief", "standard", "detailed"] = dspy.InputField(
        desc="brief = 1-2 sentences, standard = short paragraph, detailed = comprehensive"
    )
    summary: str = dspy.OutputField()

class MultiDetailSummarizer(dspy.Module):
    def __init__(self):
        self.summarize = dspy.ChainOfThought(SummarizeWithDetail)

    def forward(self, text, detail_level="standard"):
        result = self.summarize(text=text, detail_level=detail_level)

        # Check against an approximate length budget; dspy.Suggest is a soft
        # constraint, so an over-long summary is still returned
        word_count = len(result.summary.split())
        limits = {"brief": 50, "standard": 150, "detailed": 400}
        max_words = limits[detail_level]

        dspy.Suggest(
            word_count <= max_words,
            f"Summary is {word_count} words for '{detail_level}' level, "
            f"aim for under {max_words}."
        )
        return result

Step 5: Handle long documents

When the input is too long for a single LM call, use chunked summarization.

Map-reduce pattern

Split → summarize each chunk → combine:
python
class SummarizeChunk(dspy.Signature):
    """Summarize this section of a larger document."""
    chunk: str = dspy.InputField(desc="A section of a larger document")
    chunk_summary: str = dspy.OutputField(desc="Key points from this section")

class CombineSummaries(dspy.Signature):
    """Combine section summaries into one coherent summary."""
    section_summaries: list[str] = dspy.InputField(desc="Summaries of each section")
    original_length: int = dspy.InputField(desc="Word count of the original document")
    summary: str = dspy.OutputField(desc="A unified summary of the full document")

class LongDocSummarizer(dspy.Module):
    def __init__(self, chunk_size=2000):
        self.chunk_size = chunk_size
        self.map_step = dspy.ChainOfThought(SummarizeChunk)
        self.reduce_step = dspy.ChainOfThought(CombineSummaries)

    def forward(self, text):
        chunks = self._split(text)

        # Map: summarize each chunk
        chunk_summaries = []
        for chunk in chunks:
            result = self.map_step(chunk=chunk)
            chunk_summaries.append(result.chunk_summary)

        # Reduce: combine into final summary
        return self.reduce_step(
            section_summaries=chunk_summaries,
            original_length=len(text.split()),
        )

    def _split(self, text):
        words = text.split()
        chunks = []
        for i in range(0, len(words), self.chunk_size):
            chunks.append(" ".join(words[i:i + self.chunk_size]))
        return chunks
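
Fixed word windows can cut a thought in half at a chunk boundary. One common mitigation is to let consecutive chunks overlap. The sketch below is a possible drop-in alternative to `_split`, under the assumption that repeating a few words across chunks is acceptable for your content; `split_with_overlap` and its `overlap` parameter are hypothetical and not part of the module above.

```python
def split_with_overlap(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into word windows where each window repeats the last
    `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end
            break
    return chunks


text = " ".join(str(i) for i in range(10))
print(split_with_overlap(text, chunk_size=4, overlap=1))
# prints ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

With `overlap=0` the function produces the same chunks as the plain word-window split.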

Hierarchical summarization

For very long documents, summarize chunks, then summarize the summaries:
python
class HierarchicalSummarizer(dspy.Module):
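    def __init__(self, chunk_size=2000, max_chunks_per_level=10):
        # With fewer than 2 summaries per group, the reduction loop in forward()
        # would never shrink the list and would not terminate.
        if max_chunks_per_level < 2:
            raise ValueError("max_chunks_per_level must be at least 2")
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks_per_level
        self.summarize_chunk = dspy.ChainOfThought(SummarizeChunk)
        self.combine = dspy.ChainOfThought(CombineSummaries)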

    def forward(self, text):
        chunks = self._split(text)
        summaries = [self.summarize_chunk(chunk=c).chunk_summary for c in chunks]

        # If still too many summaries, summarize again
        while len(summaries) > self.max_chunks:
            grouped = [summaries[i:i+self.max_chunks]
                       for i in range(0, len(summaries), self.max_chunks)]
            summaries = [
                self.combine(
                    section_summaries=group,
                    original_length=len(text.split()),
                ).summary
                for group in grouped
            ]

        return self.combine(
            section_summaries=summaries,
            original_length=len(text.split()),
        )

    def _split(self, text):
        words = text.split()
        return [" ".join(words[i:i+self.chunk_size])
                for i in range(0, len(words), self.chunk_size)]
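
Before running hierarchical summarization on a very long document, it can help to estimate how many LM calls it will take. The sketch below mirrors the loop in `forward` with plain arithmetic. `count_lm_calls` is a hypothetical helper, and it assumes one LM call per `ChainOfThought` invocation with no retries or caching.

```python
import math


def count_lm_calls(num_words: int, chunk_size: int = 2000,
                   max_chunks_per_level: int = 10) -> int:
    """Count the LM calls the hierarchical loop would make for a document."""
    if max_chunks_per_level < 2:
        raise ValueError("max_chunks_per_level must be at least 2")
    summaries = math.ceil(num_words / chunk_size)  # one call per chunk
    calls = summaries
    while summaries > max_chunks_per_level:
        # one combine call per group of up to max_chunks_per_level summaries
        summaries = math.ceil(summaries / max_chunks_per_level)
        calls += summaries
    return calls + 1  # the final combine call


print(count_lm_calls(500_000))  # 250 chunks -> 25 groups -> 3 groups -> 1 final: prints 279
```

The chunk-level pass dominates the total, so raising `chunk_size` (within the model's context limit) reduces cost far more than tuning `max_chunks_per_level`.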

Step 6: Multi-format output

Generate different summary formats from the same input:
python
class FlexibleSummarizer(dspy.Module):
    def __init__(self):
        self.bullets = dspy.ChainOfThought(BulletSummary)
        self.narrative = dspy.ChainOfThought(NarrativeSummary)
        self.executive = dspy.ChainOfThought(ExecutiveBrief)

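    def forward(self, text, format="bullets"):
        if format == "bullets":
            return self.bullets(text=text)
        elif format == "narrative":
            return self.narrative(text=text)
        elif format == "executive":
            # ExecutiveBrief returns context, key_findings, and recommendation
            # rather than a single summary field
            return self.executive(text=text)
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown format: {format!r}")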

class BulletSummary(dspy.Signature):
    """Summarize as a bulleted list of key points."""
    text: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="Bulleted list of key points")

class NarrativeSummary(dspy.Signature):
    """Summarize as a flowing narrative paragraph."""
    text: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="A narrative paragraph summary")

class ExecutiveBrief(dspy.Signature):
    """Create a brief executive summary with context, key findings, and recommendation."""
    text: str = dspy.InputField()
    context: str = dspy.OutputField(desc="One sentence of context")
    key_findings: list[str] = dspy.OutputField(desc="3-5 most important findings")
    recommendation: str = dspy.OutputField(desc="Suggested next step")
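
Unlike the bullet and narrative signatures, `ExecutiveBrief` has no `summary` output, so any code that reads `.summary` from the result will not find one. A minimal sketch of one way to bridge that gap is to flatten the three fields into a single string; `brief_to_text` is a hypothetical helper and the sample values are made up for illustration.

```python
def brief_to_text(context: str, key_findings: list[str], recommendation: str) -> str:
    """Flatten the three ExecutiveBrief output fields into one summary string."""
    parts = [context.strip()]
    parts.extend(f"- {finding.strip()}" for finding in key_findings)
    parts.append(f"Recommendation: {recommendation.strip()}")
    return "\n".join(parts)


summary = brief_to_text(
    context="Q3 churn rose after the pricing change.",
    key_findings=["Churn up 4 points", "Most losses on the basic tier"],
    recommendation="Pilot a discounted annual plan.",
)
print(summary)
```

The flattened string can then be measured or displayed the same way as the output of the other two formats.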

Step 7: Test and optimize

Faithfulness metric

Does the summary accurately reflect the source, with no fabricated claims?
python
class JudgeFaithfulness(dspy.Signature):
    """Judge whether the summary is faithful to the source text."""
    source_text: str = dspy.InputField()
    summary: str = dspy.InputField()
    is_faithful: bool = dspy.OutputField(desc="Does the summary only contain info from the source?")
    hallucinated_claims: list[str] = dspy.OutputField(desc="Claims not in the source, if any")

def faithfulness_metric(example, prediction, trace=None):
    judge = dspy.Predict(JudgeFaithfulness)
    result = judge(source_text=example.text, summary=prediction.summary)
    return result.is_faithful
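
An LM judge can be complemented by a cheap lexical check that needs no model call. The sketch below flags numbers that appear in the summary but not in the source, which catches one common kind of fabrication. It is a heuristic under stated assumptions (exact string match on digits, so "12%" and "twelve percent" are not treated as equal), and `unsupported_numbers` is a hypothetical helper, not a replacement for the judge.

```python
import re

# Digits with optional decimal or thousands separators, e.g. 12, 3.4, 1,000
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source_text: str, summary: str) -> list[str]:
    """Return numbers that appear in the summary but nowhere in the source."""
    source_numbers = set(_NUMBER.findall(source_text))
    flagged = []
    for number in _NUMBER.findall(summary):
        if number not in source_numbers and number not in flagged:
            flagged.append(number)
    return flagged


# Made-up example values, for illustration only
source = "Revenue grew 12% to 3.4 million in 2023."
summary = "Revenue grew 15% to 3.4 million in 2023."
print(unsupported_numbers(source, summary))  # prints ['15']
```

An empty list does not prove the summary is faithful; it only means this particular check found nothing.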

Key-point coverage metric

Does the summary capture the important points?
python
class JudgeCoverage(dspy.Signature):
    """Judge whether the summary covers the key points."""
    source_text: str = dspy.InputField()
    summary: str = dspy.InputField()
    reference_summary: str = dspy.InputField(desc="Gold-standard summary for comparison")
    coverage_score: float = dspy.OutputField(desc="0.0-1.0 how well key points are covered")

def coverage_metric(example, prediction, trace=None):
    judge = dspy.Predict(JudgeCoverage)
    result = judge(
        source_text=example.text,
        summary=prediction.summary,
        reference_summary=example.summary,
    )
    return result.coverage_score
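
The coverage score is described as a value from 0.0 to 1.0, but a model-produced number may occasionally fall outside that range. A defensive sketch, assuming you would rather coerce than fail: `clamp_score` is a hypothetical helper that could wrap `result.coverage_score` before it is returned.

```python
import math
from typing import Any


def clamp_score(value: Any, default: float = 0.0) -> float:
    """Coerce a judge's score to a float in [0.0, 1.0], falling back to `default`."""
    try:
        score = float(value)
    except (TypeError, ValueError):
        return default  # not a number at all
    if math.isnan(score):
        return default
    return min(1.0, max(0.0, score))


print(clamp_score(0.75), clamp_score("1.3"), clamp_score(-2), clamp_score("n/a"))
# prints 0.75 1.0 0.0 0.0
```

Falling back to 0.0 is the conservative choice here, since an unparseable score should not be rewarded during optimization.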

Combined metric

python
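def summary_metric(example, prediction, trace=None):
    faithful = faithfulness_metric(example, prediction, trace)  # bool from the judge
    coverage = coverage_metric(example, prediction, trace)      # float, 0.0-1.0
    # Concise means the summary is under 30% of the source's word count
    concise = len(prediction.summary.split()) < len(example.text.split()) * 0.3
    # Weighted score: 40% faithfulness, 40% coverage, 20% concision
    return (faithful * 0.4) + (coverage * 0.4) + (concise * 0.2)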
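
To see how the weights play out, the arithmetic can be worked through in isolation. The sketch below reproduces the weighted sum with plain values. `combine_scores` and `metric_value` are hypothetical helpers, and the 0.7 pass threshold is an assumed value: returning a boolean when `trace` is set is a pattern some DSPy metrics use during bootstrapping, not something this guide's metric does.

```python
def combine_scores(faithful: bool, coverage: float, concise: bool) -> float:
    """Weighted sum: 40% faithfulness, 40% coverage, 20% concision."""
    return (faithful * 0.4) + (coverage * 0.4) + (concise * 0.2)


def metric_value(score: float, trace=None, threshold: float = 0.7):
    """Return a pass/fail bool while bootstrapping (trace set), else the raw score."""
    return score >= threshold if trace is not None else score


# Faithful, three quarters of the key points covered, and short enough:
# 0.4 + 0.75 * 0.4 + 0.2 = 0.9
print(round(combine_scores(True, 0.75, True), 2))  # prints 0.9

# Full coverage cannot rescue an unfaithful summary: 0.0 + 0.4 + 0.2 = 0.6
print(round(combine_scores(False, 1.0, True), 2))  # prints 0.6
```

With these weights an unfaithful summary tops out at 0.6, so any pass threshold above that effectively makes faithfulness mandatory.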

Optimize

python
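# trainset is assumed to be a list of dspy.Example objects exposing the
# `text` and `summary` fields that the metrics above read.
optimizer = dspy.BootstrapFewShot(metric=summary_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(summarizer, trainset=trainset)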
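
The optimizer needs training examples, and it is worth holding some back to check that the optimized summarizer actually improved. A minimal sketch of a reproducible split over (text, reference summary) pairs follows. `split_examples` is a hypothetical helper; each training pair would then be wrapped as something like `dspy.Example(text=..., summary=...).with_inputs("text")` before being passed as `trainset`.

```python
import random


def split_examples(pairs: list[tuple[str, str]], dev_fraction: float = 0.2, seed: int = 0):
    """Shuffle (text, reference_summary) pairs and split them into train and dev lists."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    dev_size = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[dev_size:], shuffled[:dev_size]


# Placeholder data, for illustration only
pairs = [(f"document {i}", f"summary {i}") for i in range(10)]
train_pairs, dev_pairs = split_examples(pairs)
print(len(train_pairs), len(dev_pairs))  # prints 8 2
```

Scoring both the original and the optimized summarizer on the held-out pairs gives a fairer comparison than scoring on the data the optimizer saw.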

Key patterns

  • ChainOfThought for summaries — reasoning helps the model decide what's important to keep
  • Pydantic models for structured summaries — extract action items, decisions, key facts in one pass
  • Assert for length limits — enforce word counts; DSPy retries with feedback
  • Map-reduce for long docs — chunk, summarize each piece, combine results
  • Faithfulness metrics — always check that summaries don't fabricate claims
  • Detail levels — give users control over summary depth with a simple parameter

Additional resources

  • For worked examples (meetings, support threads, long docs), see examples.md
  • Need to extract structured fields instead of summaries? Use /ai-parsing-data
  • Need to answer questions about docs? Use /ai-searching-docs
  • Next: /ai-improving-accuracy to measure and improve your summarizer