| File | Contents | Load When |
|---|---|---|
| | Pattern catalog for converting reasoning to rules | Always |

| Complexity Signal | Score | Distillation Action |
|---|---|---|
| Decision tree with 3+ branches | HIGH | Convert to explicit if/then lookup table |
| "Use judgment" or "consider context" | HIGH | Replace with concrete heuristic rules |
| Multi-step inference chain | HIGH | Break into numbered atomic steps |
| Reference to domain expertise | MED | Add explicit reference file with knowledge |
| Clear enumerated steps | LOW | Keep as-is |
| Concrete examples with expected output | LOW | Keep as-is |
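
The table above can be read as a lookup from a detected signal to a score and an action. A minimal sketch of that lookup follows; the signal keys (`decision_tree_3plus`, `vague_judgment`, and so on) and the function name `plan_distillation` are hypothetical labels chosen for illustration, not identifiers from the skill itself.

```python
# Hypothetical encoding of the complexity-signal table.
# Keys are illustrative labels; scores and actions are copied from the table.
RULES = {
    "decision_tree_3plus": ("HIGH", "Convert to explicit if/then lookup table"),
    "vague_judgment": ("HIGH", "Replace with concrete heuristic rules"),
    "multi_step_inference": ("HIGH", "Break into numbered atomic steps"),
    "domain_expertise": ("MED", "Add explicit reference file with knowledge"),
    "enumerated_steps": ("LOW", "Keep as-is"),
    "concrete_examples": ("LOW", "Keep as-is"),
}

# Sort order so that the most complex signals are handled first.
PRIORITY = {"HIGH": 0, "MED": 1, "LOW": 2}


def plan_distillation(signals):
    """Return (signal, score, action) tuples, highest complexity first."""
    rows = [(name, *RULES[name]) for name in signals]
    return sorted(rows, key=lambda row: PRIORITY[row[1]])


if __name__ == "__main__":
    for name, score, action in plan_distillation(
        ["enumerated_steps", "domain_expertise", "vague_judgment"]
    ):
        print(f"{score:<4} {name}: {action}")
```

Sorting by score is a design choice of this sketch, on the assumption that HIGH signals should be rewritten before lower ones; the table itself does not prescribe an order.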

| Source Pattern | Distilled Replacement |
|---|---|
| "Analyze the code and determine..." | "Check for these 5 specific patterns: [list]" |
| "Use appropriate formatting" | "Output as a markdown table with columns: [A, B, C]" |
| "Consider the context to decide..." | "If [condition A]: do X. If [condition B]: do Y. Default: Z" |
| "Apply best practices for..." | Reference file with explicit best practices enumerated |
| Multi-paragraph reasoning instruction | Numbered step list with single-sentence steps |
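
To illustrate the "If [condition A]: do X. If [condition B]: do Y. Default: Z" replacement, here is a small sketch of an ordered rule list with an explicit default. The conditions, the `kind` field, and the function name `decide` are placeholders invented for this example; a real distilled skill would substitute its own conditions and actions.

```python
def decide(context, rules, default):
    """Return the action of the first rule whose condition matches, else the default."""
    for condition, action in rules:
        if condition(context):
            return action
    return default


# Placeholder rules standing in for [condition A] -> X and [condition B] -> Y.
RULES = [
    (lambda ctx: ctx.get("kind") == "A", "X"),
    (lambda ctx: ctx.get("kind") == "B", "Y"),
]

if __name__ == "__main__":
    print(decide({"kind": "A"}, RULES, default="Z"))  # X
    print(decide({"kind": "B"}, RULES, default="Z"))  # Y
    print(decide({"kind": "other"}, RULES, default="Z"))  # Z
```

Because rules are checked in order and the default is always defined, every input maps to exactly one action, which is the property the distilled form is meant to give a smaller model.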

| Metric | Source (Opus + original) | Target (Haiku + distilled) | Delta |
|---|---|---|---|
| Assertions passed | N/M | N/M | ± |
| Weighted score | X.XX | X.XX | ± |
| Output completeness | % | % | ± |
| Format compliance | % | % | ± |

| Model | Assertions Passed | Weighted Score | Format Compliance |
|---|---|---|---|
| Opus | 7/7 | 1.00 | 100% |
| Sonnet | 6/7 | 0.92 | 100% |
| Haiku | 5/7 | 0.85 | 85% |
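
As a worked example of the Delta column, the sketch below subtracts the Opus row from the Haiku row of the results table. The `ModelResult` class and `delta` function are hypothetical helpers written for this illustration; only the numbers come from the table.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    model: str
    passed: int            # assertions passed
    total: int             # assertions run
    weighted_score: float
    format_pct: int        # format compliance, in percent


def delta(source: ModelResult, target: ModelResult) -> dict:
    """Target minus source per metric; a negative value means the target regressed."""
    return {
        "assertions_passed": target.passed - source.passed,
        "weighted_score": round(target.weighted_score - source.weighted_score, 2),
        "format_pct": target.format_pct - source.format_pct,
    }


# Rows taken from the results table.
opus = ModelResult("Opus", 7, 7, 1.00, 100)
sonnet = ModelResult("Sonnet", 6, 7, 0.92, 100)
haiku = ModelResult("Haiku", 5, 7, 0.85, 85)

if __name__ == "__main__":
    print(delta(opus, haiku))
    # {'assertions_passed': -2, 'weighted_score': -0.15, 'format_pct': -15}
```

The weighted-score difference is rounded to two decimals to match the precision used in the table and to avoid floating-point noise.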

| Error | Resolution |
|---|---|
| Source skill scores below 70% | Refuse distillation; recommend evolution via test-engineer |
| No execution traces available | Generate synthetic tasks and collect traces before proceeding |
| Target model fails all assertions | Skill may be too complex for target model; report with detail |
| Distilled skill longer than source | Review distillation; patterns may need consolidation |
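
The error conditions above could be checked with a handful of guards. This is a sketch under stated assumptions: scores are fractions in the range 0 to 1, skill length is measured in characters, and the function name `check_distillation` and its parameters are hypothetical. The source does not specify how any of these are represented.

```python
def check_distillation(source_score, trace_count, target_passed,
                       source_length, distilled_length):
    """Return the resolution message for every error condition that applies.

    Assumptions (not from the source): source_score is a fraction in [0, 1],
    and lengths are character counts.
    """
    resolutions = []
    if source_score < 0.70:
        resolutions.append(
            "Refuse distillation; recommend evolution via test-engineer")
    if trace_count == 0:
        resolutions.append(
            "Generate synthetic tasks and collect traces before proceeding")
    if target_passed == 0:
        resolutions.append(
            "Skill may be too complex for target model; report with detail")
    if distilled_length > source_length:
        resolutions.append(
            "Review distillation; patterns may need consolidation")
    return resolutions


if __name__ == "__main__":
    for message in check_distillation(
        source_score=0.65, trace_count=0, target_passed=3,
        source_length=4000, distilled_length=3200,
    ):
        print(message)
```

In practice the first two checks would run before distillation and the last two after it; they are combined here only to keep the example short.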