addon-deterministic-eval-suite
Add-on: Deterministic Eval Suite
Use this skill when a project needs reproducible, merge-blocking evaluation checks.
Compatibility
- Works with all `architect-*` scaffolds.
- Recommended default for `production-default` mode.
Inputs
Collect:
- `EVAL_SCOPE`: `skill-only` | `project-only` | `both` (default `both`).
- `BLOCK_ON_FAIL`: `yes` | `no` (default `yes`).
- `RUN_DOCKER_CHECKS`: `yes` | `no` (default `yes` for production-default).
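As a minimal sketch of how these inputs might be validated before any check runs, the helper below rejects values outside the allowed set. The function name `validate_choice` is hypothetical and not part of the skill; the allowed values and defaults are taken from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical input validation for the three add-on settings.

# validate_choice NAME VALUE ALLOWED...
# Prints VALUE if it is one of ALLOWED; otherwise reports the problem on
# stderr and returns 1 so the caller can stop early.
validate_choice() {
  local name="$1" value="$2" allowed
  shift 2
  for allowed in "$@"; do
    if [ "$value" = "$allowed" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  echo "invalid ${name}='${value}' (allowed: $*)" >&2
  return 1
}

# Example calls (defaults applied with ${VAR:-default}):
#   EVAL_SCOPE="$(validate_choice EVAL_SCOPE "${EVAL_SCOPE:-both}" skill-only project-only both)" || exit 1
#   BLOCK_ON_FAIL="$(validate_choice BLOCK_ON_FAIL "${BLOCK_ON_FAIL:-yes}" yes no)" || exit 1
```

Rejecting unknown values up front keeps a typo such as `BLOCK_ON_FAIL=ye` from silently disabling the merge gate.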
Integration Workflow
- Add deterministic eval artifacts:

```text
evals/deterministic/manifest.yaml
evals/deterministic/run.sh
evals/deterministic/checks/
.github/workflows/evals-deterministic.yml
```

- Baseline checks (always include):
  - file/contract existence checks
  - lint/type/test/build command checks
  - docker artifact checks (`Dockerfile`, `docker-compose.yml`, image build)
  - decision trace checks (`docs/DECISION_LOG.md`, `REVIEW_BUNDLE/DECISION_TRACE.md`)
  - non-zero exit on failure
  - for skills repositories: add repository-local checks that validate skill folder/frontmatter naming
  - for skills repositories: add repository-local checks that validate required decision-policy language
- Skill-specific checks:
  - one check file per selected skill
  - examples:
    - `check_nostr_profile.sh`
    - `check_rag_ingest_query.sh`
    - `check_review_bundle.sh`
    - `check_decision_trace.sh`
    - `check_skill_repo_policy.sh`
- Output summary:
  - write deterministic run summary to `REVIEW_BUNDLE/TEST_EVIDENCE.md`.
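The existence checks in the baseline list could be implemented along these lines. This is a sketch only: the function name `check_required_files` is an assumption, and the paths passed to it should come from the project's own contract rather than from this example.

```bash
#!/usr/bin/env bash
# Sketch of a baseline existence check (hypothetical helper, not the skill's
# official implementation).

# check_required_files ROOT PATH...
# Reports every PATH that does not exist under ROOT and returns non-zero if
# at least one is missing, so the calling check script exits with a failure.
check_required_files() {
  local root="$1"
  shift
  local missing=0 path
  for path in "$@"; do
    if [ ! -e "${root}/${path}" ]; then
      echo "MISSING: ${path}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call from a check script, using the paths named in this document:
#   check_required_files . Dockerfile docker-compose.yml \
#     docs/DECISION_LOG.md REVIEW_BUNDLE/DECISION_TRACE.md || exit 1
```

Reporting every missing path before returning, rather than stopping at the first one, gives a complete list in a single run.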
Required Template
evals/deterministic/manifest.yaml

```yaml
version: 1
checks:
  - id: contracts
    command: "bash evals/deterministic/checks/check_contracts.sh"
  - id: tests
    command: "bash evals/deterministic/checks/check_tests.sh"
  - id: build
    command: "bash evals/deterministic/checks/check_build.sh"
  - id: decision_trace
    command: "bash evals/deterministic/checks/check_decision_trace.sh"
```
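One way `evals/deterministic/run.sh` might consume this manifest is sketched below. It assumes the manifest keeps the flat layout shown above, with one double-quoted `command:` line per check, and extracts commands with `sed` instead of a YAML parser. The function names `list_commands` and `run_manifest` are illustrative.

```bash
#!/usr/bin/env bash
# Sketch of a manifest runner. Assumes one `command: "..."` line per check;
# a real YAML parser would be more robust.

# list_commands MANIFEST
# Prints each check command on its own line with the quotes removed.
list_commands() {
  sed -n 's/^[[:space:]]*command:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# run_manifest MANIFEST
# Runs every command, continues after failures so all results are reported,
# and returns non-zero if any check failed or no checks were found.
run_manifest() {
  local manifest="$1" failed=0 count=0 cmd
  if [ ! -f "$manifest" ]; then
    echo "FAIL: manifest not found: ${manifest}" >&2
    return 1
  fi
  while IFS= read -r cmd; do
    count=$((count + 1))
    # stdin is redirected so a check cannot consume the command list.
    if bash -c "$cmd" </dev/null; then
      echo "PASS: ${cmd}"
    else
      echo "FAIL: ${cmd}"
      failed=1
    fi
  done < <(list_commands "$manifest")
  if [ "$count" -eq 0 ]; then
    echo "FAIL: no checks found in ${manifest}" >&2
    return 1
  fi
  return "$failed"
}

# Example: run_manifest evals/deterministic/manifest.yaml || exit 1
```

Treating an empty or missing manifest as a failure keeps the runner consistent with a fail-closed merge gate.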
Guardrails
- Documentation contract for generated code:
  - Python: write module docstrings and docstrings for public classes, methods, and functions.
  - Next.js/TypeScript: write JSDoc for exported components, hooks, utilities, and route handlers.
  - Add concise rationale comments only for non-obvious logic, invariants, or safety constraints.
  - Apply this contract even when using template snippets below; expand templates as needed.
- Deterministic evals are source-of-truth merge gates.
- Avoid network-dependent assertions unless explicitly required.
- Keep commands idempotent and non-destructive.
- Fail closed: missing required checks must fail the run.
- Treat missing decision rationale artifacts as deterministic failure.
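The fail-closed guardrail can be illustrated with a small pre-flight check that refuses to proceed when a required check id is absent from the manifest. This is a sketch under assumptions: the helper name `require_check_ids` is hypothetical, ids are assumed to be plain words with no regex metacharacters, and the manifest is assumed to list each id on its own `- id: <name>` line.

```bash
#!/usr/bin/env bash
# Sketch of a fail-closed pre-flight check (hypothetical helper).

# require_check_ids MANIFEST ID...
# Returns non-zero if MANIFEST is missing or any required ID is not declared.
require_check_ids() {
  local manifest="$1"
  shift
  local missing=0 id
  if [ ! -f "$manifest" ]; then
    echo "FAIL: manifest not found: ${manifest}" >&2
    return 1
  fi
  for id in "$@"; do
    # Match a whole `- id: <name>` line so `tests` does not match `tests_extra`.
    if ! grep -Eq "^[[:space:]]*-[[:space:]]*id:[[:space:]]*${id}[[:space:]]*$" "$manifest"; then
      echo "FAIL: required check '${id}' is not declared" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_check_ids evals/deterministic/manifest.yaml \
#     contracts tests build decision_trace || exit 1
```

Running this before the checks themselves means that deleting a check from the manifest fails the run instead of quietly weakening the gate.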
Validation Checklist
- Confirm generated code includes required docstrings/JSDoc and rationale comments for non-obvious logic.

```bash
test -f evals/deterministic/manifest.yaml
test -f evals/deterministic/run.sh
test -f .github/workflows/evals-deterministic.yml
bash evals/deterministic/run.sh
```
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.
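A deterministic check for this rule could look for the expected sections in the decision log. The sketch below is only an approximation: the keywords `Justification`, `Alternatives`, `Tradeoffs`, and `Risks` are assumed wording rather than a format the skill prescribes, and a keyword match shows that a section exists, not that its content is adequate.

```bash
#!/usr/bin/env bash
# Sketch of a decision-log check. The keyword list is an assumption; align it
# with the headings the project actually uses.

# check_decision_log FILE
# Returns non-zero if FILE is missing, empty, or lacks any assumed keyword.
check_decision_log() {
  local file="$1" missing=0 keyword
  if [ ! -s "$file" ]; then
    echo "BLOCKER: decision log missing or empty: ${file}" >&2
    return 1
  fi
  for keyword in "Justification" "Alternatives" "Tradeoffs" "Risks"; do
    # Case-insensitive fixed-string match anywhere in the file.
    if ! grep -qiF -- "$keyword" "$file"; then
      echo "BLOCKER: no '${keyword}' entry in ${file}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_decision_log docs/DECISION_LOG.md || exit 1
```

Because the match is on fixed strings, a log that spells the section "Trade-offs" would fail this sketch, so the keyword list needs to follow the project's spelling.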