accelerate
MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the context-management reference in the agent-workflow skill for window optimization and budget strategies.
Make the workflow faster and cheaper without sacrificing quality. Measure before and after.
Performance Audit
Measure current performance:
```text
Current metrics:
Latency (p50): ___ms
Latency (p95): ___ms
Cost per request: $___
Token usage (avg): ___ input / ___ output
Error rate: ___%
```
Acceleration Strategies
Reduce Token Usage
- Shorten system prompts (remove redundant instructions)
- Compress few-shot examples to minimum viable length
- Use structured output schemas instead of verbose text
- Summarize context instead of passing raw documents
- Reduce output length requirements
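As a rough illustration of the savings from swapping verbose instructions for a structured output schema, the comparison below approximates token counts by whitespace splitting (a real audit would use the model's own tokenizer); both instruction strings are made up for the example:

```python
# Compare a verbose free-text instruction against a compact JSON-schema
# instruction. Token counts are approximated by word count, which is
# crude but sufficient for a relative before/after comparison.

verbose_instruction = (
    "Please write a detailed explanation of the sentiment of the review, "
    "describe why you reached that conclusion, and then state the sentiment."
)
schema_instruction = (
    'Reply with JSON only: {"sentiment": "pos|neg|neutral", "confidence": 0-1}'
)

def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace words)."""
    return len(text.split())

savings = approx_tokens(verbose_instruction) - approx_tokens(schema_instruction)
print(f"approx tokens saved per request: {savings}")
```

Multiply the per-request saving by request volume to estimate the cost impact before committing to the change.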
Model Cascading
- Route simple tasks to cheaper/faster models
- Escalate only complex tasks to capable models
- Use classification to determine complexity
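A minimal cascade might look like the sketch below. The model names are placeholders and the keyword classifier is deliberately trivial; in practice the complexity check would be a small model or a learned heuristic:

```python
# Route simple tasks to a cheap model and escalate complex ones.
# Model names are hypothetical placeholders, not real model IDs.

CHEAP_MODEL = "small-fast-model"
CAPABLE_MODEL = "large-capable-model"

# Naive complexity markers; a real router would use a classifier model.
COMPLEX_MARKERS = ("analyze", "multi-step", "compare", "plan", "refactor")

def classify_complexity(task: str) -> str:
    """Return 'complex' if the task mentions any marker, else 'simple'."""
    lowered = task.lower()
    return "complex" if any(m in lowered for m in COMPLEX_MARKERS) else "simple"

def route(task: str) -> str:
    """Pick a model based on the classified complexity."""
    return CAPABLE_MODEL if classify_complexity(task) == "complex" else CHEAP_MODEL

print(route("Extract the date from this email"))       # cheap path
print(route("Analyze these logs and plan a refactor"))  # escalated
```

Measure the misroute rate: a cascade only pays off if the classifier rarely sends complex tasks down the cheap path.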
Caching
- Cache responses for identical or near-identical inputs
- Cache tool results with appropriate TTL
- Cache embeddings for frequently-queried documents
- Use semantic caching for similar (not identical) queries
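One way to sketch the TTL-based tool-result cache, keyed on an exact hash of the tool name and arguments (a semantic cache would replace the exact-hash key with an embedding nearest-neighbour lookup, omitted here; the `web_search` tool is a made-up example):

```python
import hashlib
import json
import time

class ToolCache:
    """Exact-match cache for tool results with a time-to-live.

    Do not use for tools whose output depends on real-time data.
    """

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def _key(self, tool: str, args: dict) -> str:
        # sort_keys makes the hash stable across argument orderings.
        payload = json.dumps([tool, args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, tool: str, args: dict):
        entry = self._store.get(self._key(tool, args))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None  # missing or expired

    def put(self, tool: str, args: dict, result) -> None:
        self._store[self._key(tool, args)] = (time.monotonic(), result)

cache = ToolCache(ttl_seconds=300)
cache.put("web_search", {"q": "python asyncio"}, ["result-1"])
print(cache.get("web_search", {"q": "python asyncio"}))
```

Pick the TTL per tool: a documentation lookup can be cached for hours, while anything time-sensitive should bypass the cache entirely.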
Parallelization
- Run independent tool calls in parallel
- Run independent agent steps in parallel
- Use streaming to start processing before the full response arrives
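Independent tool calls can be fanned out with `asyncio.gather`, as in this sketch; `fake_tool` stands in for a real I/O-bound tool call, with a sleep simulating network latency:

```python
import asyncio

async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound tool call."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_parallel() -> list[str]:
    # Both calls run concurrently, so total wall time is roughly
    # max(delay), not the sum of the delays.
    return await asyncio.gather(
        fake_tool("weather", 0.1),
        fake_tool("stocks", 0.1),
    )

results = asyncio.run(run_parallel())
print(results)
```

`gather` preserves argument order in its result list, so downstream code can rely on positions regardless of which call finished first.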
Context Optimization
- Retrieve less, retrieve better (improve retrieval precision)
- Use context compression techniques
- Implement sliding window for long conversations
Acceleration Report
For each optimization:
- What changed: Specific modification
- Before: Latency/cost/tokens before
- After: Latency/cost/tokens after
- Quality impact: Any quality change (verify with golden tests)
- Trade-off: What was sacrificed for the improvement
Acceleration Checklist
- Baseline metrics recorded before any changes
- Each optimization measured with before/after comparison
- Quality impact verified (golden tests still pass)
- Trade-offs documented for each change
- Cost/latency improvements quantified
Recommended Next Step
After optimization, run {{command_prefix}}evaluate to verify quality didn't degrade, or {{command_prefix}}iterate to set up continuous monitoring.
NEVER:
- Optimize without measuring first (you need a baseline)
- Sacrifice quality for speed without explicit user approval
- Cache outputs that depend on real-time data
- Skip the quality check after optimization
- Optimize prematurely (make it correct first, then make it fast)