accelerate
MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the context-management reference in the agent-workflow skill for window optimization and budget strategies.
Make the workflow faster and cheaper without sacrificing quality. Measure before and after.
Performance Audit
Measure current performance:
```text
Current metrics:
Latency (p50): ___ms
Latency (p95): ___ms
Cost per request: $___
Token usage (avg): ___ input / ___ output
Error rate: ___%
```
Acceleration Strategies
Reduce Token Usage
- Shorten system prompts (remove redundant instructions)
- Compress few-shot examples to minimum viable length
- Use structured output schemas instead of verbose text
- Summarize context instead of passing raw documents
- Reduce output length requirements
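As a rough illustration of the savings from swapping verbose instructions for a structured output schema, the comparison below approximates token counts by whitespace splitting (a real audit would use the model's own tokenizer); both instruction strings are made up for the example:

```python
# Compare a verbose free-text instruction against a compact JSON-schema
# instruction. Token counts are approximated by word count, which is
# crude but sufficient for a relative before/after comparison.

verbose_instruction = (
    "Please write a detailed explanation of the sentiment of the review, "
    "describe why you reached that conclusion, and then state the sentiment."
)
schema_instruction = (
    'Reply with JSON only: {"sentiment": "pos|neg|neutral", "confidence": 0-1}'
)

def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace words)."""
    return len(text.split())

savings = approx_tokens(verbose_instruction) - approx_tokens(schema_instruction)
print(f"approx tokens saved per request: {savings}")
```

Multiply the per-request saving by request volume to estimate the cost impact before committing to the change.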
Model Cascading
- Route simple tasks to cheaper/faster models
- Escalate only complex tasks to capable models
- Use classification to determine complexity
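A minimal cascade might look like the sketch below. The model names are placeholders and the keyword classifier is deliberately trivial; in practice the complexity check would be a small model or a learned heuristic:

```python
# Route simple tasks to a cheap model and escalate complex ones.
# Model names are hypothetical placeholders, not real model IDs.

CHEAP_MODEL = "small-fast-model"
CAPABLE_MODEL = "large-capable-model"

# Naive complexity markers; a real router would use a classifier model.
COMPLEX_MARKERS = ("analyze", "multi-step", "compare", "plan", "refactor")

def classify_complexity(task: str) -> str:
    """Return 'complex' if the task mentions any marker, else 'simple'."""
    lowered = task.lower()
    return "complex" if any(m in lowered for m in COMPLEX_MARKERS) else "simple"

def route(task: str) -> str:
    """Pick a model based on the classified complexity."""
    return CAPABLE_MODEL if classify_complexity(task) == "complex" else CHEAP_MODEL

print(route("Extract the date from this email"))       # cheap path
print(route("Analyze these logs and plan a refactor"))  # escalated
```

Measure the misroute rate: a cascade only pays off if the classifier rarely sends complex tasks down the cheap path.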
Caching
- Cache responses for identical or near-identical inputs
- Cache tool results with appropriate TTL
- Cache embeddings for frequently-queried documents
- Use semantic caching for similar (not identical) queries
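One way to sketch the TTL-based tool-result cache, keyed on an exact hash of the tool name and arguments (a semantic cache would replace the exact-hash key with an embedding nearest-neighbour lookup, omitted here; the `web_search` tool is a made-up example):

```python
import hashlib
import json
import time

class ToolCache:
    """Exact-match cache for tool results with a time-to-live.

    Do not use for tools whose output depends on real-time data.
    """

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def _key(self, tool: str, args: dict) -> str:
        # sort_keys makes the hash stable across argument orderings.
        payload = json.dumps([tool, args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, tool: str, args: dict):
        entry = self._store.get(self._key(tool, args))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None  # missing or expired

    def put(self, tool: str, args: dict, result) -> None:
        self._store[self._key(tool, args)] = (time.monotonic(), result)

cache = ToolCache(ttl_seconds=300)
cache.put("web_search", {"q": "python asyncio"}, ["result-1"])
print(cache.get("web_search", {"q": "python asyncio"}))
```

Pick the TTL per tool: a documentation lookup can be cached for hours, while anything time-sensitive should bypass the cache entirely.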
Parallelization
- Run independent tool calls in parallel
- Run independent agent steps in parallel
- Use streaming to start processing before the full response arrives
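Independent tool calls can be fanned out with `asyncio.gather`, as in this sketch; `fake_tool` stands in for a real I/O-bound tool call, with a sleep simulating network latency:

```python
import asyncio

async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound tool call."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_parallel() -> list[str]:
    # Both calls run concurrently, so total wall time is roughly
    # max(delay), not the sum of the delays.
    return await asyncio.gather(
        fake_tool("weather", 0.1),
        fake_tool("stocks", 0.1),
    )

results = asyncio.run(run_parallel())
print(results)
```

`gather` preserves argument order in its result list, so downstream code can rely on positions regardless of which call finished first.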
Context Optimization
- Retrieve less, retrieve better (improve retrieval precision)
- Use context compression techniques
- Implement sliding window for long conversations
Acceleration Report
For each optimization:
- What changed: Specific modification
- Before: Latency/cost/tokens before
- After: Latency/cost/tokens after
- Quality impact: Any quality change (verify with golden tests)
- Trade-off: What was sacrificed for the improvement
Acceleration Checklist
- Baseline metrics recorded before any changes
- Each optimization measured with before/after comparison
- Quality impact verified (golden tests still pass)
- Trade-offs documented for each change
- Cost/latency improvements quantified
Recommended Next Step
After optimization, run {{command_prefix}}evaluate to verify quality didn't degrade, or {{command_prefix}}iterate to set up continuous monitoring.
NEVER:
- Optimize without measuring first (you need a baseline)
- Sacrifice quality for speed without explicit user approval
- Cache outputs that depend on real-time data
- Skip the quality check after optimization
- Optimize prematurely (make it correct first, then make it fast)