benchmark-optimization-loop

Benchmark Optimization Loop

Use this skill to convert requests like "make it 20x faster" or "try 50 recursive optimizations" into a bounded, measured loop that can actually improve a system.

Required Baseline

Do not optimize until these exist:

the operation being optimized;
the correctness gate that must stay green;
the metric, for example wall time, p95 latency, rows/sec, cost/run, memory, or error rate;
the current baseline;
the search budget: max variants, max time, max spend, max data impact.

If the user asks for an unrealistic target, keep the ambition but make the loop bounded and measurable.
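
One way to enforce the checklist above is to refuse to start until every item is filled in. The following is a minimal sketch, not a prescribed format: the `LoopSpec` class and all of its field names are hypothetical, chosen only to mirror the five requirements.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoopSpec:
    """Hypothetical record of what must exist before optimizing."""
    operation: Optional[str] = None         # command or code path being optimized
    correctness_gate: Optional[str] = None  # check that must stay green
    metric: Optional[str] = None            # e.g. "wall_time_s" or "p95_latency_ms"
    baseline: Optional[float] = None        # current measured value of the metric
    max_variants: Optional[int] = None      # search budget: number of variants
    max_minutes: Optional[float] = None     # search budget: total time
    max_spend_usd: Optional[float] = None   # search budget: money
    max_data_impact: Optional[str] = None   # search budget: e.g. "read-only"

    def missing(self) -> list[str]:
        """Return the names of fields that are still unset."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# A request that only names the job and its current timing is not ready yet.
spec = LoopSpec(operation="npm run job", metric="wall_time_s", baseline=120.0)
print(spec.missing())
```

An empty list from `missing()` would mean the loop is allowed to begin; anything else is a question to ask before touching the code.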

Loop

1. Measure the baseline.
2. Identify bottlenecks from evidence.
3. Generate variants that test one hypothesis each.
4. Run variants with the same input shape.
5. Reject variants that fail correctness, safety, or reproducibility.
6. Promote the fastest safe variant.
7. Codify the winning path in a script, command, test, config, or doc.
8. Rerun the baseline and winner to confirm the delta.
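
The measure, reject, and promote steps could be sketched as a small harness. This is an illustration under assumptions: `measure`, `pick_winner`, and the toy workloads are hypothetical names, the metric is median wall time, and safety and reproducibility checks are reduced to a single `is_correct` callback.

```python
import statistics
import time
from typing import Callable, Optional


def measure(run: Callable[[], object], repeats: int = 5) -> float:
    """Median wall time in seconds over `repeats` calls of `run`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def pick_winner(
    baseline: Callable[[], object],
    variants: dict[str, Callable[[], object]],
    is_correct: Callable[[object], bool],
    repeats: int = 5,
) -> tuple[Optional[str], dict[str, float]]:
    """Return (winner name or None, timings). None means the baseline stays."""
    timings = {"baseline": measure(baseline, repeats)}
    best_name, best_time = None, timings["baseline"]
    for name, run in variants.items():
        if not is_correct(run()):  # reject incorrect variants before timing them
            continue
        timings[name] = measure(run, repeats)
        if timings[name] < best_time:
            best_name, best_time = name, timings[name]
    return best_name, timings


# Toy workloads: every variant receives the same input shape (sum of 0..999).
def slow_sum() -> int:
    time.sleep(0.02)
    return sum(range(1000))


def closed_form() -> int:
    return 999 * 1000 // 2


def skip_work() -> int:
    return 0  # fast but wrong, so it must be rejected


winner, timings = pick_winner(
    slow_sum,
    {"closed-form": closed_form, "skip-work": skip_work},
    is_correct=lambda out: out == 499500,
    repeats=3,
)
print(winner, sorted(timings))
```

The incorrect variant never enters `timings`, so a fast but wrong result cannot be promoted by accident. Calling `pick_winner` a second time with only the winner is one way to carry out the final confirmation rerun.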

Variant Table

Track variants like this:

Variant | Hypothesis | Command | Time | Correct? | Notes
baseline | current path | npm run job | 120s | yes | stable
batch-500 | fewer round trips | npm run job -- --batch 500 | 42s | yes | winner
parallel-8 | more workers | npm run job -- --workers 8 | 31s | no | rate limited
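
If runs are recorded programmatically, the same table can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical `VariantRow` record whose fields mirror the columns above:

```python
from dataclasses import dataclass


@dataclass
class VariantRow:
    """Hypothetical record; one instance per row of the variant table."""
    variant: str
    hypothesis: str
    command: str
    time_s: float
    correct: bool
    notes: str = ""


def render(rows: list[VariantRow]) -> str:
    """Format rows in the pipe-separated layout used for variant tracking."""
    lines = ["Variant | Hypothesis | Command | Time | Correct? | Notes"]
    for r in rows:
        lines.append(" | ".join([
            r.variant, r.hypothesis, r.command,
            f"{r.time_s:g}s", "yes" if r.correct else "no", r.notes,
        ]))
    return "\n".join(lines)


def speedup(baseline: VariantRow, candidate: VariantRow) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline.time_s / candidate.time_s


rows = [
    VariantRow("baseline", "current path", "npm run job", 120, True, "stable"),
    VariantRow("batch-500", "fewer round trips", "npm run job -- --batch 500", 42, True, "winner"),
    VariantRow("parallel-8", "more workers", "npm run job -- --workers 8", 31, False, "rate limited"),
]
print(render(rows))
print(f"{speedup(rows[0], rows[1]):.2f}x")
```

With the figures from the table, the batch-500 winner is about 2.86 times faster than the baseline, while parallel-8 is excluded despite its lower time because it failed correctness.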

Recursive Search

For recursive or hyperparameter work:

persist every run to a ledger;
compare against the prior accepted winner, not only the previous run;
keep a holdout or replay check;
stop when improvement is within noise, correctness fails, cost exceeds the budget, or the search starts changing more variables than it can explain.

Use phrases like "best measured safe variant" instead of "global optimum" unless the search space was actually exhaustive.
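
The ledger and the stop rule might look like the sketch below. Everything here is an assumption made for illustration: the JSON Lines file layout, the record keys (`variant`, `time_s`, `correct`), and the function names. It covers three of the four stop conditions; judging whether the search has changed more variables than it can explain is left to the reviewer.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def append_run(ledger: Path, record: dict) -> None:
    """Persist one run as a single JSON line; earlier runs are never rewritten."""
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_runs(ledger: Path) -> list[dict]:
    """Read every recorded run, oldest first."""
    if not ledger.exists():
        return []
    with ledger.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def stop_reason(
    candidate: dict,
    accepted_winner: dict,
    noise_s: float,
    spent: float,
    budget: float,
) -> Optional[str]:
    """Return why the search should stop, or None to keep going.

    The candidate is compared against the prior accepted winner,
    not against whatever run happened to come just before it.
    """
    if not candidate["correct"]:
        return "correctness failed"
    if spent > budget:
        return "cost exceeds budget"
    if accepted_winner["time_s"] - candidate["time_s"] <= noise_s:
        return "improvement within noise"
    return None


with tempfile.TemporaryDirectory() as tmp:
    ledger = Path(tmp) / "runs.jsonl"
    append_run(ledger, {"variant": "baseline", "time_s": 120.0, "correct": True})
    append_run(ledger, {"variant": "batch-500", "time_s": 42.0, "correct": True})
    runs = load_runs(ledger)

# batch-500 beats the baseline by far more than 2s of noise, so the search continues.
print(stop_reason(runs[1], runs[0], noise_s=2.0, spent=0.0, budget=10.0))
```

The noise threshold should come from repeated baseline measurements rather than a guess; the value 2.0 is only a placeholder.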

Promotion Gate

A variant cannot become the new default until:

correctness tests pass;
the performance delta is reproduced on a rerun or explained;
rollback is obvious;
the change is encoded in source control or a durable runbook;
the final summary includes exact commands and measurements.
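
The gate can be expressed as an explicit checklist so that a promotion is blocked by name rather than by memory. A minimal sketch, with hypothetical field names standing in for the five conditions:

```python
from dataclasses import dataclass


@dataclass
class PromotionEvidence:
    """Hypothetical checklist; one flag per promotion condition."""
    tests_passed: bool
    delta_reproduced_or_explained: bool
    rollback_documented: bool
    change_recorded: bool  # in source control or a durable runbook
    summary_has_commands_and_measurements: bool


def blockers(evidence: PromotionEvidence) -> list[str]:
    """Names of the conditions that are not yet satisfied."""
    return [name for name, ok in vars(evidence).items() if not ok]


def can_promote(evidence: PromotionEvidence) -> bool:
    """A variant becomes the default only when nothing blocks it."""
    return not blockers(evidence)


evidence = PromotionEvidence(
    tests_passed=True,
    delta_reproduced_or_explained=True,
    rollback_documented=False,
    change_recorded=True,
    summary_has_commands_and_measurements=True,
)
print(can_promote(evidence), blockers(evidence))
```

Listing the failed conditions, instead of returning only a boolean, gives the final summary something concrete to report.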
