accelerate

MANDATORY PREPARATION

Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first. Consult the context-management reference in the agent-workflow skill for window optimization and budget strategies.

Make the workflow faster and cheaper without sacrificing quality. Measure before and after.
Make the workflow faster and cheaper without sacrificing quality. Measure before and after.

Performance Audit

Measure current performance:

Current metrics:
  Latency (p50): ___ms
  Latency (p95): ___ms
  Cost per request: $___
  Token usage (avg): ___ input / ___ output
  Error rate: ___%
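The baseline template above can be filled mechanically from a batch of logged requests. The sketch below is one minimal way to do that, using a nearest-rank percentile and whitespace-free stdlib code; the per-1k-token prices are placeholders, not any provider's real rates.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[rank]


def baseline_report(latencies_ms, input_tokens, output_tokens, errors, requests,
                    price_in_per_1k=0.0005, price_out_per_1k=0.0015):
    """Summarize a batch of logged requests into the baseline metrics above.

    price_in_per_1k / price_out_per_1k are placeholder rates; substitute
    your model's actual pricing.
    """
    return {
        "latency_p50_ms": percentile(latencies_ms, 50),
        "latency_p95_ms": percentile(latencies_ms, 95),
        "cost_per_request": (
            statistics.mean(input_tokens) / 1000 * price_in_per_1k
            + statistics.mean(output_tokens) / 1000 * price_out_per_1k
        ),
        "avg_input_tokens": statistics.mean(input_tokens),
        "avg_output_tokens": statistics.mean(output_tokens),
        "error_rate": errors / requests,
    }
```

Record this dict once before touching anything, then again after each optimization, so every change in the report is backed by the same measurement code.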

Acceleration Strategies

Reduce Token Usage
  • Shorten system prompts (remove redundant instructions)
  • Compress few-shot examples to minimum viable length
  • Use structured output schemas instead of verbose text
  • Summarize context instead of passing raw documents
  • Reduce output length requirements
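Compressing few-shot examples to a token budget can be done greedily, as in this sketch. The whitespace token count is a rough approximation for illustration; in practice you would swap in your model's tokenizer, and the assumption that examples are pre-ordered by usefulness is mine, not a rule from this document.

```python
def trim_few_shot(examples, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the leading examples that fit under the token budget.

    Assumes `examples` is already ordered most-useful-first.
    count_tokens is a whitespace approximation; use a real tokenizer in practice.
    """
    kept, used = [], 0
    for example in examples:
        cost = count_tokens(example)
        if used + cost > budget_tokens:
            break  # stop at the first example that would exceed the budget
        kept.append(example)
        used += cost
    return kept
```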
Model Cascading
  • Route simple tasks to cheaper/faster models
  • Escalate only complex tasks to capable models
  • Use classification to determine complexity
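A cascading router can be as small as the sketch below. The keyword heuristic is a deliberately toy classifier for illustration; in practice the document's advice implies a small model or a trained classifier doing this step, and the marker words here are my own examples.

```python
def classify_complexity(task: str) -> str:
    """Toy heuristic classifier. In practice, use a small model or a
    trained classifier; these marker words are illustrative only."""
    hard_markers = ("analyze", "multi-step", "prove", "refactor")
    return "complex" if any(m in task.lower() for m in hard_markers) else "simple"


def route(task, cheap_model, capable_model):
    """Send simple tasks to the cheap/fast model, escalate the rest.

    Both models are passed in as callables taking the task string.
    """
    model = cheap_model if classify_complexity(task) == "simple" else capable_model
    return model(task)
```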
Caching
  • Cache responses for identical or near-identical inputs
  • Cache tool results with appropriate TTL
  • Cache embeddings for frequently-queried documents
  • Use semantic caching for similar (not identical) queries
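Caching tool results with a TTL might look like this minimal in-memory sketch. The key derivation (tool name plus sorted arguments) and the default TTL are my assumptions; a production cache would also need eviction and, for semantic caching, an embedding-similarity lookup instead of exact keys.

```python
import hashlib
import time


class TTLCache:
    """In-memory cache for tool results, expiring entries after ttl_seconds."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def _key(tool_name, args):
        # Exact-match key: tool name plus sorted kwargs. A semantic cache
        # would replace this with an embedding-similarity lookup.
        raw = f"{tool_name}:{sorted(args.items())!r}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_or_call(self, tool_name, args, call):
        """Return a fresh cached result, or invoke the tool and cache it."""
        key = self._key(tool_name, args)
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        result = call(**args)
        self._store[key] = (now, result)
        return result
```

Note the interaction with the NEVER list below: results that depend on real-time data must bypass this cache entirely, not merely get a short TTL.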
Parallelization
  • Run independent tool calls in parallel
  • Run independent agent steps in parallel
  • Use streaming to start processing before full response
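Running independent tool calls concurrently is a one-liner with `asyncio.gather`, as sketched here. The tool names and delays are stand-ins for real network-bound calls; the point is that total wall time is roughly the slowest call, not the sum.

```python
import asyncio


async def call_tool(name: str, delay: float) -> str:
    """Stand-in for an independent, network-bound tool call."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_tools_in_parallel():
    # Independent calls run concurrently: wall time ~= max(delay), not sum.
    return await asyncio.gather(
        call_tool("search", 0.05),
        call_tool("fetch_doc", 0.05),
        call_tool("lookup_user", 0.05),
    )


results = asyncio.run(run_tools_in_parallel())
```

Only calls with no data dependency between them qualify; if one tool's output feeds another's input, they must stay sequential.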
Context Optimization
  • Retrieve less, retrieve better (improve retrieval precision)
  • Use context compression techniques
  • Implement sliding window for long conversations
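A sliding window for long conversations can be sketched as below: always keep the system message, then admit the most recent turns that still fit the budget. The message dict shape (`role`/`content`) and the whitespace token count are simplifying assumptions for illustration.

```python
def sliding_window(messages, max_tokens,
                   count_tokens=lambda m: len(m["content"].split())):
    """Keep the system message plus the newest turns that fit max_tokens.

    Assumes messages are dicts with "role" and "content" keys;
    count_tokens is a whitespace approximation, not a real tokenizer.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    for msg in reversed(rest):  # walk newest-first
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order
```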

Acceleration Report

For each optimization:
  1. What changed: Specific modification
  2. Before: Latency/cost/tokens before
  3. After: Latency/cost/tokens after
  4. Quality impact: Any quality change (verify with golden tests)
  5. Trade-off: What was sacrificed for the improvement
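One way to keep the report honest is to hold each entry in a small record that computes improvements from the raw before/after numbers rather than hand-written percentages. This dataclass shape is my own suggestion mirroring the five fields above, not a format the workflow mandates.

```python
from dataclasses import dataclass


@dataclass
class OptimizationEntry:
    """One row of the acceleration report; fields mirror the template above."""
    what_changed: str
    latency_before_ms: float
    latency_after_ms: float
    cost_before: float
    cost_after: float
    quality_impact: str   # e.g. "golden tests: all pass"
    trade_off: str

    def latency_improvement_pct(self) -> float:
        return 100 * (self.latency_before_ms - self.latency_after_ms) / self.latency_before_ms

    def cost_improvement_pct(self) -> float:
        return 100 * (self.cost_before - self.cost_after) / self.cost_before
```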

Acceleration Checklist

  • Baseline metrics recorded before any changes
  • Each optimization measured with before/after comparison
  • Quality impact verified (golden tests still pass)
  • Trade-offs documented for each change
  • Cost/latency improvements quantified

Recommended Next Step

After optimization, run {{command_prefix}}evaluate to verify quality didn't degrade, or {{command_prefix}}iterate to set up continuous monitoring.
NEVER:
  • Optimize without measuring first (you need a baseline)
  • Sacrifice quality for speed without explicit user approval
  • Cache outputs that depend on real-time data
  • Skip the quality check after optimization
  • Optimize prematurely (make it correct first, then make it fast)