optimizespec-new


OptimizeSpec New


Create the first artifact for an OptimizeSpec self-improvement change. The default workflow directory is:
optimizespec/changes/<change-name>/

Workflow


  1. Derive or confirm a kebab-case change name.
  2. Read ../optimizespec-common/references/core/reference-contracts.md, then load only the proposal-phase core references it names: criteria-first, candidate surface, grader, evidence, and live eval runner. Load runtime-specific references only when repo evidence identifies the target runtime.
  3. Create optimizespec/changes/<change-name>/proposal.md.
  4. Use assets/templates/proposal.md as the structure.
  5. Inspect the repository enough to identify the target agent's likely runtime, code location, dependency boundary, import/package setup, existing eval/test/tooling folders, agent package-adjacent module options, tool wiring, environment needs, and command conventions.
  6. Keep all OptimizeSpec artifacts under the repo-root optimizespec/changes/<change-name>/ tree. In the proposal, record where the executable optimization-system code should be created or which existing folder should be reused, and how code in that location will import or invoke the real agent modules.
  7. Capture known details without inventing missing information.
  8. Start from plain-language user intent and examples, then draft the eval design for review.
  9. If the user has not provided enough information after repo inspection, ask three to five focused questions at most before drafting. Prefer questions like:
    • What agent should improve?
    • Where does that agent live in this repo?
    • Should the optimization code reuse an existing eval/test folder or create a new one?
    • What behavior should get better?
    • What are 2-3 representative tasks?
    • What would make an answer clearly bad?
    • Which concerns matter most: correctness, formatting, safety, cost, speed, or tool use?
  10. Draft the inferred runtime, runtime evidence and confidence, success criteria, scoring plan, grader strategy, evidence model, optimizer acceptance rules, and optimization-system location decision from the user's input and repo inspection. For Claude Managed Agents, define live rollouts as the eval primitive: candidate, eval case, real Session execution, final report/output, trace evidence, grader, ASI, and live-score optimization.
  11. Ask the user to confirm or correct the inferred eval contract and optimization-system location in the proposal so they can review primary metrics, diagnostics, guardrails, task distribution, grading, evidence persistence, promotion rules, and file layout from a concrete draft.
  12. If the agent, inferred runtime, criteria, scorer, examples, grader trust, evidence model, optimizer acceptance, optimization-system path, or import/runtime access plan are incomplete, record explicit unknowns and candidate discovery questions. Ask about runtime only when repo evidence remains ambiguous and the answer affects the artifacts.
  13. Keep proposal.md concise. Prefer short bullets and no more than 2-3 eval examples. Defer deeper runner mechanics, calibration details, ledger file layout, and implementation design to design.md unless they are required to confirm the eval contract or optimization-system location.
  14. Stop after creating proposal.md.
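
Steps 1, 3, and 4 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of OptimizeSpec itself: the helper names to_kebab_case and create_proposal are hypothetical, and the template path is supplied by the caller because the location that assets/templates/proposal.md resolves from depends on how the skill is installed.

```python
import re
import shutil
import tempfile
from pathlib import Path

# One or more lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def to_kebab_case(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        raise ValueError(f"cannot derive a change name from {text!r}")
    return "-".join(words)


def create_proposal(repo_root: Path, change_name: str, template: Path) -> Path:
    """Copy the template to optimizespec/changes/<change-name>/proposal.md.

    Hypothetical helper: it refuses non-kebab-case names and never
    overwrites an existing proposal.
    """
    if not KEBAB_CASE.fullmatch(change_name):
        raise ValueError(f"change name must be kebab-case: {change_name!r}")
    change_dir = repo_root / "optimizespec" / "changes" / change_name
    proposal = change_dir / "proposal.md"
    if proposal.exists():
        raise FileExistsError(f"{proposal} already exists")
    change_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, proposal)
    return proposal


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for a repo root.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        template = root / "proposal-template.md"
        template.write_text("# Proposal\n", encoding="utf-8")
        name = to_kebab_case("Improve Support Triage")
        print(create_proposal(root, name, template).relative_to(root))
```

Refusing to overwrite an existing proposal.md is a design choice of this sketch; the workflow itself only says to create the file and stop.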

Required Proposal Content


  • Agent and inferred runtime context, including evidence, confidence, and unknowns.
  • Optimization-system location decision: create or reuse, executable code path outside the OptimizeSpec artifact tree by default, rationale, import/runtime access plan, existing agent code to reuse, existing tools/skills/MCP/env/permissions to reuse, and run-output path.
  • Behavior to improve.
  • Candidate fields GEPA may mutate, if known.
  • Success criteria: user outcome, primary criterion, secondary criteria, guardrails, thresholds, non-goals, and blind spots.
  • Draft eval contract for user confirmation or correction.
  • Input examples and expected outputs or output shapes.
  • Numeric scoring intent, preferably 0.0 to 1.0.
  • Qualitative rubric.
  • Grading strategy: deterministic, code-based, LLM-based, human, or hybrid, plus why the grader can be trusted.
  • Optimizer acceptance: optimized live metric, diagnostic metrics, guardrails, selection rule, regression tolerance, and required evidence. Promotion or release decisions can be recorded separately, but they are not the Managed Agents core loop.
  • Evidence model: run manifest, candidate versions, rollout records, scoring records, judge records, ASI records, optimizer lineage, best-candidate evidence, and any optional promotion evidence at a high level.
  • Contract references that should guide design and apply work.
  • ASI fields needed for reflection.
  • Unknowns to resolve in design.
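
To make the scoring and grading bullets concrete, here is one possible shape for a hybrid score in the 0.0 to 1.0 range. Everything in it is an assumption made for illustration: the ScoringRecord fields, the hybrid_score function, and the equal default weighting are hypothetical and are not defined by the OptimizeSpec contracts.

```python
from dataclasses import dataclass

# Grading strategies named in the proposal content list.
GRADER_KINDS = {"deterministic", "code-based", "LLM-based", "human", "hybrid"}


def hybrid_score(checks: list[bool], rubric_score: float,
                 check_weight: float = 0.5) -> float:
    """Blend the pass rate of deterministic checks with a rubric score.

    Both inputs and the result stay within 0.0 to 1.0.
    """
    if not checks:
        raise ValueError("at least one deterministic check is required")
    if not 0.0 <= rubric_score <= 1.0:
        raise ValueError("rubric_score must be within 0.0 to 1.0")
    if not 0.0 <= check_weight <= 1.0:
        raise ValueError("check_weight must be within 0.0 to 1.0")
    pass_rate = sum(checks) / len(checks)
    return check_weight * pass_rate + (1.0 - check_weight) * rubric_score


@dataclass(frozen=True)
class ScoringRecord:
    """Hypothetical per-rollout scoring record."""
    case_id: str
    candidate_id: str
    score: float
    grader: str
    rationale: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")
        if self.grader not in GRADER_KINDS:
            raise ValueError(f"unknown grader kind: {self.grader!r}")


if __name__ == "__main__":
    score = hybrid_score([True, True, False, True], rubric_score=0.5)
    record = ScoringRecord("case-1", "candidate-a", score, "hybrid",
                           "3 of 4 checks passed; rubric 0.5")
    print(record)
```

Recording the rationale next to the number is one way to support the requirement that the proposal explain why the grader can be trusted.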
For workflow motivation, read ../optimizespec-common/references/core/workflow.md. For criteria-first eval design, read ../optimizespec-common/references/core/criteria-first-evals.md. For evidence expectations, read ../optimizespec-common/references/core/eval-system-evidence.md. For grader expectations, read ../optimizespec-common/references/core/grader-contract.md. For candidate boundaries, read ../optimizespec-common/references/core/candidate-surface.md. For ASI-first framing, read ../optimizespec-common/references/core/gepa-reflection.md.

Name ../optimizespec-common/references/core/live-eval-runner-contract.md as the contract source of truth for live optimization. When the proposal identifies Claude Managed Agents as the likely runtime, also name ../optimizespec-common/references/runtimes/claude-managed-agent/python-managed-agent-package/ as the concrete live Python runner implementation reference for later design and apply work. For other runtimes, record the missing runtime-specific reference coverage and the production adapter assumptions. The primary optimizer objective should be live rollout scoring.
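
Because the primary objective is live rollout scoring, an optimizer acceptance rule could look roughly like the sketch below. It is illustrative only: CandidateResult, accept, min_gain, and the 0.02 default tolerance are hypothetical names and values, and the actual selection rule belongs in the proposal and the live eval runner contract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CandidateResult:
    """Hypothetical summary of one candidate's live rollouts."""
    candidate_id: str
    live_score: float  # mean live rollout score, 0.0 to 1.0
    guardrails: dict[str, float] = field(default_factory=dict)  # higher is better


def accept(baseline: CandidateResult, challenger: CandidateResult,
           min_gain: float = 0.0,
           regression_tolerance: float = 0.02) -> tuple[bool, str]:
    """Accept the challenger only if the live score improves by more than
    min_gain and no guardrail drops by more than regression_tolerance."""
    gain = challenger.live_score - baseline.live_score
    if gain <= min_gain:
        return False, f"live score gain {gain:.3f} does not exceed {min_gain:.3f}"
    for name, base_value in baseline.guardrails.items():
        new_value = challenger.guardrails.get(name)
        if new_value is None:
            return False, f"missing guardrail metric: {name}"
        if base_value - new_value > regression_tolerance:
            return False, f"guardrail regressed: {name}"
    return True, "accepted"


if __name__ == "__main__":
    base = CandidateResult("baseline", 0.62, {"safety": 0.95, "format": 0.90})
    good = CandidateResult("candidate-a", 0.70, {"safety": 0.95, "format": 0.89})
    risky = CandidateResult("candidate-b", 0.80, {"safety": 0.80, "format": 0.90})
    print(accept(base, good))
    print(accept(base, risky))
```

Returning a reason string alongside the decision keeps the rejection cause available for the evidence records and for reflection.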