skill-evolve
Skill Evolution
You are evolving your own skills. This is the only skill that modifies other skills. Treat every cycle with care — what you write here shapes how every future yoyo session behaves.
When to use
Only when invoked via `scripts/skill_evolve.sh`. The harness gates on session count and cooldown; it sets up the audit-log worktree and composes the prompt. Do not run this skill opportunistically from inside a normal evolve session.
Hard rules (read first, every cycle)
These four rules cannot be violated. Each cycle either honors all four or writes a `refused` event and exits.
HARD RULE #1: Eligible targets only (allow-list)
You may refine, deprecate, or retire only skills whose frontmatter declares `origin: yoyo`. Any other value, OR a missing `origin:` field, means the skill is off-limits. This is an allow-list: silence means "don't touch."
Three categories of skill exist, plus the missing-field case:

| Source | You may edit? |
|---|---|
| Written by the human creator (Yuanhao or a fork creator) | Never |
| Written by yoyo (this skill, or in past evolutions), declared as `origin: yoyo` | Yes, eligible |
| Installed from a third party | Never; upstream owns it |
| Unknown provenance (`origin:` missing) | Never (default-safe) |

Today the eligible set is exactly the skills whose SKILL.md declares `origin: yoyo`:
- `social`
- `family`
- `release`
- any skill you previously spawned (which inherit `origin: yoyo` from the Create template)
Defense in depth: if a skill has `core: true` set, refuse even if `origin: yoyo` is also somehow present. The two flags should never co-occur, but the conservative move is to honor the deny-flag.
If a recurring pattern suggests a non-eligible skill needs change (e.g., a core skill, or an installed marketplace skill), do not edit it. Instead, write a learning to `memory/learnings.jsonl` with `source: "skill-evolve"` and a clear pattern_key, and append a `meta-suggestion` block to `skills/_journal.md`. The human creator will decide.
HARD RULE #2: Never edit yourself
You must NEVER modify `skills/skill-evolve/SKILL.md`. If you believe this skill needs improvement, append a `meta-suggestion` block to `skills/_journal.md` and stop:

    ## evt-XXXX meta-suggestion
    - ts: <ISO8601>
    - target: skills/skill-evolve/SKILL.md
    - suggestion: <one-paragraph description>

HARD RULE #3: One mutation per cycle
Each cycle produces exactly one of:
- a refinement diff (one skill, ≤30 added lines, ≤15 removed)
- a candidate skill draft (one new directory)
- a retirement (one `git mv` to `skills_attic/`)
- a `NO-OP` event (you found nothing worth doing)
If you find yourself wanting to do two things, pick the one with the strongest evidence and write the second to `memory/learnings.jsonl` for next cycle.
HARD RULE #4: Refine and Create events must declare an expected outcome
Every `refine` and `create` event in `skills/_journal.md` MUST include an `expected:` line: a freeform prose commitment naming (a) a concrete observable signal that should change, (b) a horizon (e.g. "within ~5 sessions" or "by next cycle"), and (c) a fallback move if the prediction does not hold.
If you cannot articulate all three, the edit is not justified by evidence: NO-OP the cycle instead of committing a refine/create without an `expected:` line. This is decision-observability discipline (paper: arxiv 2604.25850) at the cognitive layer. There is no validator, but a future cycle re-reads the line as informal evidence and a human reads it as an audit trail. All other event types (`retire`, `revive`, `meta-suggestion`, `refused`, `NO-OP`, `init`) must not carry an `expected:` line.
The body of the `expected:` line is freeform prose. See step 7 ("Append the event to skills/_journal.md") for the template position, and "What an `expected:` line must do (and must not be)" later in this document for worked examples and the anti-patterns to refuse.
Glossary
- session: one run of `scripts/evolve.sh` (the main evolution loop). There are ~3 per day.
- cycle: one run of this skill, invoked from `scripts/skill_evolve.sh`. Cycles are gated by a session-counter and a 24h cooldown, so they fire roughly once every 5+ sessions.
- real cycle: a cycle that produced one of `refine | create | retire | meta-suggestion`. Excludes `init`, `refused`, and `NO-OP`.
Bootstrap (first three real cycles only)
We are mid-life, not at Day 1, so the cold-start rules from the original design are softened — but the first three real cycles still get extra constraints to let the loop settle.
To know which cycle you are in, count the non-`init`, non-`refused`, non-`NO-OP` entries in `skills/_journal.md`:

    cycle_index=$(grep -E '^## .*evt-[0-9]+ (refine|create|retire|meta-suggestion)' skills/_journal.md | wc -l)
    # cycle_index=0  → this is the first real cycle
    # cycle_index=1  → second
    # cycle_index>=2 → third onward, full lifecycle unlocked

- **First real cycle** (`cycle_index == 0`): only `refine` or `NO-OP` allowed. Do not create. Do not retire.
- **Second real cycle** (`cycle_index == 1`): `refine`, `create`, or `NO-OP`. No retirement yet.
- **Third real cycle onward** (`cycle_index >= 2`): full lifecycle unlocked (`refine` | `create` | `retire` | `NO-OP`).
(Note: the gate-counter at `.skill_evolve_counter` is unrelated to this — it just controls when the cycle fires, not what it can do.)
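A minimal sketch of the bootstrap gate above, assuming a hypothetical helper named `allowed_actions` (it is not part of the harness). It covers only the four lifecycle actions named in the bullets; `meta-suggestion` and `refused` are governed by the hard rules and are left out.

```python
def allowed_actions(cycle_index: int) -> set[str]:
    """Return the lifecycle actions the bootstrap rules permit for a real-cycle index."""
    if cycle_index < 0:
        raise ValueError("cycle_index must be non-negative")
    if cycle_index == 0:
        return {"refine", "NO-OP"}                   # first real cycle
    if cycle_index == 1:
        return {"refine", "create", "NO-OP"}         # second real cycle
    return {"refine", "create", "retire", "NO-OP"}   # third real cycle onward


if __name__ == "__main__":
    for index in range(4):
        print(index, sorted(allowed_actions(index)))
```

The later decision step can intersect its preferred action with this set, so a retire or create proposed too early falls through to the next option.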
Lifecycle states
Every eligible skill carries a `status:` field in its frontmatter. Five states. Important: yoagent always loads anything with a valid `<dir>/SKILL.md` regardless of status. `status:` is your bookkeeping, telling you what to do next, not what the loader does. The only way to fully un-load a skill from the agent's prompt is to `git mv` its directory to `skills_attic/` (sibling of `skills/`, not scanned by `--skills`).

| State | `status:` | Description-prefix | Entry condition | Exit condition |
|---|---|---|---|---|
| dormant | `dormant` | none | a recurring pattern not yet ratified | ratified by you → `candidate` |
| candidate | `candidate` | `[CANDIDATE — unreviewed]` | you draft a new skill | ≥2 successful invocations → `active` |
| active | `active` | none | promoted from `candidate` | refinement applied → `refined` |
| refined | `refined` | none | you applied a diff | falls back to `active` after 1 session if the score holds |
| deprecated | `deprecated` | none | | revived by use |

The `[CANDIDATE — unreviewed]` prefix is agent-written when you Create a skill (see Create template below). Nothing in the loader injects it. It tells future sessions to treat the skill as experimental.
Cycle execution sequence
Run these steps in order, every cycle.
1. Read evidence

    # Latest cycles:
    tail -n 200 skills/_journal.md

    # Recent self-reflection:
    tail -n 50 memory/learnings.jsonl

    # Top of journal (newest entries are at top):
    head -n 200 journals/JOURNAL.md

    # Recent runs:
    gh run list --json url,conclusion,createdAt,name -L 10 || echo "[]"

    # Audit evidence (set by harness, points at audit-log worktree):
    ls "${YOYO_AUDIT_DIR:-/tmp/audit-read/sessions}" 2>/dev/null | tail -30

**First-run handling**: if `$YOYO_AUDIT_DIR` is unset or its directory is empty, the audit-log branch hasn't accumulated evidence yet (this is normal on the first 1–2 cycles). In that case:
- Skip the per-session audit.jsonl mining in step 3 ("Mine patterns").
- Use only `memory/learnings.jsonl` and `journals/JOURNAL.md` for complaint and use signals.
- Lean toward **NO-OP** — without audit evidence, scoring is too noisy to support a confident refine/create/retire decision.
- Write the NO-OP event with note: `evidence: only learnings (audit-log unavailable)`.
2. Enumerate eligible skills

    # Allow-list: only skills declaring origin: yoyo are eligible.
    # Defense in depth: also exclude anything carrying core: true.
    for d in skills/*/; do
      name=$(basename "$d")
      [ "$name" = "skill-evolve" ] && continue         # HARD RULE #2: never target yourself
      [ -f "$d/SKILL.md" ] || continue                  # not a skill directory
      grep -q "^core: true" "$d/SKILL.md" && continue   # the deny-flag wins
      grep -q "^origin: yoyo$" "$d/SKILL.md" || continue
      echo "$name"
    done
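The shell loop above greps the whole file, so a body line that happens to read `origin: yoyo` would also match. As a hedged sketch of a stricter variant, the hypothetical function `is_eligible` below (not part of the harness) applies the same allow-list but only looks inside the frontmatter block.

```python
import re

# Frontmatter is the block between the first two '---' lines at the top of the file.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def is_eligible(name: str, skill_md: str) -> bool:
    """Apply the allow-list to one skill, inspecting only its frontmatter."""
    if name == "skill-evolve":                 # HARD RULE #2: never target yourself
        return False
    match = FRONTMATTER.match(skill_md)
    if match is None:                          # no frontmatter means no origin: field
        return False
    lines = match.group(1).splitlines()
    if any(line.startswith("core: true") for line in lines):
        return False                           # the deny-flag wins over origin: yoyo
    return "origin: yoyo" in lines             # silence means "don't touch"


if __name__ == "__main__":
    sample = "---\nname: social\norigin: yoyo\n---\n# social\n"
    print(is_eligible("social", sample))
```

Reading the directory listing and file contents is left to the caller, which keeps the rule itself easy to check in isolation.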
3. Mine patterns
This step has two layers: counting (the basic signals) and diagnosing (understanding why failures happened, not just that they did). Diagnosis is what turns recurrence into actionable refinement targets.
3a. Count basic signals
For each eligible skill, count:
- Complaint signals: entries in `memory/learnings.jsonl` whose `pattern_key` or `title`/`takeaway` mentions the skill and uses negative language ("wrong", "didn't", "instead", "should have").
- Failure signals: tool-call failures in `${YOYO_AUDIT_DIR}/day-*/audit.jsonl` where the bash command or args reference the skill's domain.
- Use signals: number of sessions where any string from the skill's frontmatter `keywords:` list appears in that session's `audit.jsonl`. This is `uses`.
- Win signals: out of those sessions, count the ones where `outcome.json` has `test_ok: true` AND `tasks_succeeded >= 1`. This is `wins`.
If a skill's frontmatter is missing `keywords:`, fall back to its name as the only keyword (likely noisy; flag in `_journal.md` so the operator can add proper keywords).
Compute `wins/uses` and update the EMA score:

    blended   = 0.5 * (wins/uses) + 0.3 * (1 - complaints/uses) + 0.2 * mention_rate
    new_score = 0.3 * blended + 0.7 * old_score

Update the skill's frontmatter with the new values: `score`, `uses`, `wins`, and `last_used` (= the timestamp of the most-recent matching session). These updates are part of your single allowed mutation per cycle: you may bundle them into a refine event, or write a tiny "score-update" event when nothing else changes (this counts as a NO-OP for the bootstrap counter).
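The scoring formulas above can be sketched as two small functions. This is an illustration under assumptions, not the harness's code: the function names are hypothetical, `mention_rate` is not defined in this document and is simply taken as an input, and refusing to score a skill with `uses == 0` is a choice made here to avoid dividing by zero.

```python
def blended_signal(wins: int, uses: int, complaints: int, mention_rate: float) -> float:
    """blended = 0.5*(wins/uses) + 0.3*(1 - complaints/uses) + 0.2*mention_rate."""
    if uses <= 0:
        # Assumption: an unused skill keeps its old score instead of being rescored.
        raise ValueError("uses must be positive to compute a blended signal")
    return 0.5 * (wins / uses) + 0.3 * (1 - complaints / uses) + 0.2 * mention_rate


def update_score(old_score: float, blended: float) -> float:
    """new_score = 0.3 * blended + 0.7 * old_score (an EMA that weights history at 0.7)."""
    return 0.3 * blended + 0.7 * old_score


if __name__ == "__main__":
    blended = blended_signal(wins=3, uses=4, complaints=1, mention_rate=0.5)
    print(round(blended, 3), round(update_score(0.5, blended), 3))
```

With 3 wins out of 4 uses, 1 complaint, and a mention rate of 0.5, the blended signal is 0.375 + 0.225 + 0.1 = 0.7, and a skill starting at 0.5 moves to 0.3 * 0.7 + 0.7 * 0.5 = 0.56.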
3b. Diagnose the cause (trace-based)
Counting tells you which skill is struggling. Diagnosing tells you what to fix. Borrowed from the GEPA pattern (Genetic-Pareto Prompt Evolution): read the actual execution traces, don't just count failures.
For each skill where `complaint_signals ≥ 2` OR `(wins/uses) < 0.5` (with `uses ≥ 3`), open the relevant session's `audit.jsonl` and look for these failure-mode patterns:

| Pattern in audit.jsonl | Likely cause | Refinement direction |
|---|---|---|
| Same … | Skill missing a concrete command example | Add a verbatim example in … |
| … | Agent edited and reverted the SAME path; likely the change was rejected by build/test, not just exploratory | Add a … |
| … appearing across multiple sessions | Skill's procedure has a recurring blind spot | Add a … |
| Long bash sequences (10+ tool calls) without intermediate … | Skill points at non-existent docs OR doesn't tell agent to verify state | Add a "verify your assumptions" step in … |
| Tool calls that should be there per … | Skill isn't actually being invoked when it should be | The … |

For each candidate refinement target, write a 1-2 sentence cause hypothesis:

    target: social
    hypothesis: 3 sessions show repeated `gh api graphql` calls with malformed `categoryId`
                args (sessions day-52, day-55, day-57). Skill's Procedure mentions categoryId
                but doesn't show the format. Refinement: add a verbatim example.

Carry this hypothesis into step 4 (action selection) and step 5 (Refine: it tells you what to write in the diff). Without a hypothesis, you're guessing; with one, the refinement is targeted and the eval (Refine step R4) has something concrete to compare.
If no clear hypothesis emerges from the traces, prefer NO-OP over speculative refinement. Counting alone is not a license to mutate.
4. Pick exactly one action
Decision order (first match wins):
- Retire (third cycle onward only): if any skill has `score < 0.3` AND `last_used` ≥ 10 sessions ago, retire the lowest-scoring one. Skip if there are < 2 active eligible skills (don't bottom out the library).
- Refine: if any skill (a) has `complaint_signals ≥ 2`, OR (b) has `(wins/uses) < 0.5` with `uses ≥ 3`, AND in either case has not been refined in the last 3 sessions (`last_evolved` check), refine it. This matches the diagnosis-trigger condition in step 3b. Pick the target with the strongest evidence (highest complaint count, or lowest wins-ratio if no complaints).
- Create (second cycle onward only, and only if active skill count < 25): if any `pattern_key` appears in ≥3 distinct sessions of `learnings.jsonl` AND no existing eligible skill covers it (≥3 keyword overlap → refine that one instead), draft a new skill.
- NO-OP: nothing meets the bars. Write a `NO-OP` event with a one-line note about what evidence you considered.
If you've written 3 consecutive `NO-OP` events, also write `evolution_saturation: true` to the third event; the harness reads this and extends the cooldown.
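The decision order above can be sketched as a single first-match-wins function. Everything here is illustrative: `SkillStats` and `pick_action` are hypothetical names, the session distances are assumed to be precomputed from `last_used` and `last_evolved`, and `uncovered_pattern` stands for a `pattern_key` the caller has already found in ≥3 distinct sessions with no covering skill. "Not refined in the last 3 sessions" is read here as more than 3 sessions since the last refinement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillStats:
    """Per-skill inputs to the decision (hypothetical shape, derived from frontmatter)."""
    name: str
    score: float
    uses: int
    wins: int
    complaints: int
    sessions_since_used: int      # derived from last_used
    sessions_since_evolved: int   # derived from last_evolved
    active: bool = True           # status is active or refined


def pick_action(skills: list[SkillStats], cycle_index: int,
                uncovered_pattern: Optional[str] = None) -> tuple[str, Optional[str]]:
    """Return (action, target), checking the bars in first-match-wins order."""
    active = [s for s in skills if s.active]

    # 1. Retire: third real cycle onward, and never bottom out the library.
    if cycle_index >= 2 and len(active) >= 2:
        stale = [s for s in skills if s.score < 0.3 and s.sessions_since_used >= 10]
        if stale:
            return "retire", min(stale, key=lambda s: s.score).name

    # 2. Refine: complaint or low-win trigger, outside the 3-session cooldown.
    def triggered(s: SkillStats) -> bool:
        low_wins = s.uses >= 3 and s.wins / s.uses < 0.5
        return (s.complaints >= 2 or low_wins) and s.sessions_since_evolved > 3

    candidates = [s for s in skills if triggered(s)]
    if candidates:
        complained = [s for s in candidates if s.complaints > 0]
        if complained:
            target = max(complained, key=lambda s: s.complaints)
        else:
            # Without complaints every candidate triggered on low wins, so uses >= 3.
            target = min(candidates, key=lambda s: s.wins / s.uses)
        return "refine", target.name

    # 3. Create: second real cycle onward, under the active-skill ceiling.
    if cycle_index >= 1 and len(active) < 25 and uncovered_pattern:
        return "create", uncovered_pattern

    # 4. Nothing met the bars.
    return "NO-OP", None
```

The function only selects; the trace-based hypothesis from step 3b is still required before a selected refine is actually written.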
5. Execute the action
Refine
Refinement uses a snapshot + A/B eval pattern (borrowed from Anthropic's skill-creator). The goal: never commit a refinement that doesn't measurably improve the skill on at least one concrete prompt.
Step R1: Snapshot the baseline.
Before editing, copy the current SKILL.md to a temp location:

    mkdir -p /tmp/skill-evolve-baseline
    cp "skills/<target>/SKILL.md" "/tmp/skill-evolve-baseline/<target>.SKILL.md"

Step R2: Generate 2-3 synthetic test prompts.
Read the target skill's `## When to use` and `## Procedure` sections. Derive concrete prompts a future agent might receive that should trigger this skill. Examples for `social`:
- "Reply to discussion #42 with a thoughtful response"
- "Post a 1-in-4-chance proactive riff in The Show category"
- "Find unanswered questions in the Journal Club category"
Write them to `/tmp/skill-evolve-eval/<target>/prompts.json`:

    [
      {"id": "p1", "prompt": "...", "expects": "<one-sentence success criterion>"},
      {"id": "p2", "prompt": "...", "expects": "..."}
    ]

Step R3: Write the candidate diff.
Use `edit_file` to apply your refinement. Constraints:
- ≤30 added lines, ≤15 removed lines (diff stat)
- Touch only the `## Pitfalls` and `## Procedure` sections (or the skill's "what to do" body). Never touch the top-level `description:`, and never any frontmatter field except the four bookkeeping fields established in step 3a: `score`, `uses`, `wins`, `last_used`. (`last_evolved` is also updated, to today's date.)
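The diff-size constraint above can be checked before committing. The sketch below is one way to do that with Python's standard `difflib`; the function names are hypothetical, and because `SequenceMatcher` uses a different diff algorithm than `git diff --stat`, the counts may differ slightly from git's on some edits.

```python
import difflib


def diff_stat(baseline: str, candidate: str) -> tuple[int, int]:
    """Count (added, removed) lines between two versions of a SKILL.md."""
    old_lines, new_lines = baseline.splitlines(), candidate.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1     # lines present only in the baseline
        if tag in ("replace", "insert"):
            added += j2 - j1       # lines present only in the candidate
    return added, removed


def within_budget(baseline: str, candidate: str,
                  max_added: int = 30, max_removed: int = 15) -> bool:
    """True when the candidate stays inside the refine budget."""
    added, removed = diff_stat(baseline, candidate)
    return added <= max_added and removed <= max_removed


if __name__ == "__main__":
    print(diff_stat("a\nb\nc\n", "a\nB\nc\nd\n"))
```

Working on opcodes instead of `+`/`-` prefixed diff text avoids miscounting content lines that themselves start with `---`, such as the frontmatter delimiters.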
Step R4: A/B compare.
For each test prompt, generate a 1-3 sentence summary of how each version (baseline, candidate) would handle the prompt: what tools the agent would call, in what order, and what the outcome would look like.
Two execution modes, in order of preference:
- Preferred (sub-agent A/B): if you have `sub_agent` available, dispatch two sub-agent calls in parallel:
  - Sub-agent A: read `/tmp/skill-evolve-baseline/<target>.SKILL.md` + the test prompt → output JSON `{"summary": "...", "tool_sequence": ["bash", "edit_file", ...]}`
  - Sub-agent B: same with the candidate file
  - Use the structured outputs to compare apples-to-apples.
- Fallback (single-agent sequential): if `sub_agent` isn't available or returned an error, read the baseline file, write a baseline summary; then read the candidate file, write a candidate summary. Be deliberate about not letting the candidate read bias the baseline read: write the baseline summary BEFORE looking at the candidate.
For each prompt, decide one of:
- `candidate-better`: candidate's procedure is more specific, addresses the prompt more directly
- `tie`: no meaningful difference
- `baseline-better`: regression; the refinement made things worse
Step R5: Decide.
Commit the refinement only if:
- 0 prompts came out `baseline-better`, AND
- At least 1 prompt came out `candidate-better`
Otherwise: revert the edit (`cp /tmp/skill-evolve-baseline/<target>.SKILL.md skills/<target>/SKILL.md`) and write a `NO-OP` event with `eval-result: regression` (or `eval-result: tie`).
Step R6: Append eval summary to the event.
Add an `eval-summary:` field to the `_journal.md` event:

    - eval-summary: 2/2 prompts candidate-better, 0 regressions

Or for a NO-OP-after-eval:

    - eval-summary: 1/2 baseline-better; refinement was a regression on prompt p2 ("..."). Reverted.
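The commit rule from Step R5 is small enough to state as code. A minimal sketch, assuming a hypothetical function `decide` that receives one verdict string per test prompt:

```python
def decide(verdicts: list[str]) -> str:
    """Map per-prompt A/B verdicts to 'commit', 'regression', or 'tie'."""
    allowed = {"candidate-better", "tie", "baseline-better"}
    unknown = set(verdicts) - allowed
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    if "baseline-better" in verdicts:
        return "regression"   # revert, then NO-OP with eval-result: regression
    if "candidate-better" in verdicts:
        return "commit"       # zero regressions and at least one improvement
    return "tie"              # revert, then NO-OP with eval-result: tie


if __name__ == "__main__":
    print(decide(["candidate-better", "tie"]))
```

A single `baseline-better` verdict overrides any number of improvements, which matches the "0 prompts came out `baseline-better`" condition.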
Create
Draft `skills/<new-name>/SKILL.md`:
```markdown
---
name: <new-name>
description: "[CANDIDATE — unreviewed] <pushy one-line trigger description, ≤200 chars total>"
tools: [bash, read_file, ...]
origin: yoyo
status: candidate
score: 0.5
uses: 0
wins: 0
last_used: null
last_evolved: <today>
parent_pattern_key: <kebab-case verb.object>
keywords: ["<distinctive substring 1>", "<distinctive substring 2>", "..."]  # ≥3 strings that, if found in a session's audit.jsonl, indicate this skill was used
---
# <Title>
## When to use
<concrete trigger conditions>
## Quick reference
<one-screen cheat sheet>
## Procedure
<numbered steps>
## Pitfalls
<things that have gone wrong before>
## Verification
<how the skill knows it succeeded>
```
The `[CANDIDATE — unreviewed]` prefix is critical: it tells the agent in future sessions to treat the skill as experimental, not as system-prompt-grade truth.
Retire

    mkdir -p skills_attic                        # no-op if the attic already exists
    git mv skills/<name>/ skills_attic/<name>/

Soft delete. Recoverable. If yoyo invokes the skill's domain again within 3 cycles, you may revive it (move back, reset score to 0.5).
6. Validate
Before committing, run all of these. If any fails, write `refused` and exit:

    # YAML frontmatter parses (use python3 since yq may not be installed):
    python3 -c "
    import sys, re
    content = open('skills/<name>/SKILL.md').read()
    m = re.match(r'---\n(.*?)\n---\n', content, re.DOTALL)
    assert m, 'no frontmatter'
    fm = m.group(1)
    assert len(fm) <= 1900, f'frontmatter too long: {len(fm)}'
    # crude parse: every non-blank frontmatter line must contain a colon
    for line in fm.splitlines():
        if line.strip() and ':' not in line:
            sys.exit(f'invalid line: {line}')
    " || { echo "frontmatter check failed"; exit 1; }

    # Description ≤ 200 chars:
    desc=$(grep '^description:' skills/<name>/SKILL.md | head -1 | sed 's/^description: *//')
    [ "${#desc}" -le 200 ] || { echo "description too long"; exit 1; }

    # Body token estimate (~ word count, ceiling 5000):
    body_words=$(awk '/^---$/{n++; next} n>=2' skills/<name>/SKILL.md | wc -w)
    [ "$body_words" -le 5000 ] || { echo "body too long"; exit 1; }

    # Build still works (the meta-skill itself shouldn't break the build, but defense in depth).
    # PIPESTATUS is needed because the pipeline's own exit status is tail's, not cargo's.
    cargo build --release 2>&1 | tail -5
    [ "${PIPESTATUS[0]}" -eq 0 ] || { echo "build failed"; exit 1; }

7. Append the event to skills/_journal.md
Get the next event number:

    last=$(grep -oE 'evt-[0-9]+' skills/_journal.md | sort -u | tail -1)
    last=${last:-evt-0000}        # empty journal: start from evt-0001
    n=$((10#${last#evt-} + 1))    # 10# forces base 10; a bare 0008 would be rejected as invalid octal
    evt=$(printf 'evt-%04d' "$n")

Append (using `>>`, never overwrite):

    ## <ISO8601> <evt-NNNN> <type>
    - skill: <name or "-">
    - trigger: <one-line summary of evidence>
    - diff: <+A -B (path)> or "n/a"
    - validation: <pass | reason for refusal>
    - score-delta: <old> → <new>
    - parent-event: <evt-NNNN>
    - expected: <observable signal | horizon | fallback>   # required for refine/create only; forbidden on all other types
    - note: <optional one-line>

Where `<type>` is one of: `init`, `refine`, `create`, `retire`, `revive`, `meta-suggestion`, `refused`, `NO-OP`.
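As a hedged illustration of the numbering and of where the `expected:` line may appear, the sketch below uses two hypothetical helpers (neither exists in the harness). `next_event_id` takes the numeric maximum of the ids found in the journal text, and `check_expected` enforces only the placement rule; whether the line's content is good stays a judgment call.

```python
import re
from typing import Optional


def next_event_id(journal_text: str) -> str:
    """Return the next evt-NNNN id, reading the digits as base 10."""
    numbers = [int(digits) for digits in re.findall(r"evt-([0-9]+)", journal_text)]
    next_number = max(numbers, default=0) + 1
    return f"evt-{next_number:04d}"


def check_expected(event_type: str, expected: Optional[str]) -> None:
    """Raise if an expected: line is missing where required or present where forbidden."""
    needs_expected = event_type in {"refine", "create"}
    if needs_expected and not expected:
        raise ValueError(f"{event_type} events must carry an expected: line")
    if not needs_expected and expected:
        raise ValueError(f"{event_type} events must not carry an expected: line")


if __name__ == "__main__":
    journal = "## 2025-01-01T00:00:00Z evt-0008 refine\n- parent-event: evt-0007\n"
    print(next_event_id(journal))
```

Python's `int("0008")` parses as decimal, so the leading-zero pitfall of shell arithmetic does not arise here.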
What an `expected:` line must do (and must not be)
A good `expected:` line names all three of: a concrete observable signal, a horizon, and a fallback move.
Concrete observables you may reference:
- A skill's frontmatter `uses`/`wins`/`score` (e.g. "social.uses should grow by ≥3 over the next 5 sessions")
- A specific failure cluster's recurrence in audit-log sessions (e.g. "the gh-discussion-comment STUCK cluster should drop to 0 hits within 5 sessions")
- A trace pattern from step 3b (e.g. "the `git checkout` revert-after-edit pattern on social/SKILL.md should not recur in the next 3 sessions")
- A concrete tool-call sequence that should/should not appear in audit.jsonl
Horizons: "by next cycle", "within ~3 sessions", "within ~5 sessions", "within 7 days". Do not say "eventually" or omit the horizon.
Fallbacks: name the next move if the prediction does not hold. Examples: "...otherwise this is a sub-skill candidate, not a prose refine"; "...otherwise the `description:` is the wrong target; try refining the body instead"; "...otherwise retire the skill".
Worked examples:
For a `refine` event:

    - expected: STUCK rate on the gh-discussion-comment cluster should drop to 0
      within the next ~5 evolve sessions; if not, the prose tweak was insufficient
      and a helper script (sub-skill) is the right next step

For a `create` event:

    - expected: at least 2 sessions in the next 5 should match this skill's
      keywords[] AND have outcome.json.test_ok=true (i.e. wins ≥ 2 by next cycle);
      if uses < 2 by then, the description: is too narrow and needs widening, or
      the pattern was a one-off and the skill should retire

Anti-patterns to refuse (these do not satisfy HARD RULE #4; NO-OP instead of writing them):
- "feels better"
- "will be more readable"
- "the prose is now clearer"
- "users will like it"
- "yoyo will use this skill more" (no horizon, no signal)
- "this should help" (no horizon, no signal, no fallback)
If your candidate `expected:` line reads like one of those, you do not have a theory of impact; the evidence does not justify a mutation this cycle. Write `NO-OP` and move on.
8. Commit

    git add skills/ skills_attic/ memory/learnings.jsonl
    git commit -m "skill-evolve: <type> <skill-name>" || true

The harness pushes (or doesn't, depending on its config). Do not push from inside this skill.
Anti-bloat ceilings
Before any `create` action, verify all of these:
- Active skill count (any with `status: active` or `status: refined`) < 25 before this create. If at the limit, you must `retire` first or write `NO-OP`.
- Total skill count in `skills/` (excluding any skill with `core: true`) ≤ 30.
- The new skill's frontmatter is ≤ 1900 chars.
- The new skill's description is ≤ 200 chars (including the `[CANDIDATE — unreviewed]` prefix).
- The new skill's body is ≤ 5000 words.
- No existing eligible skill has ≥3 keyword overlap with the new skill's `When to use` section. If so, refine that skill instead.
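The ceilings above can be collected into one pre-create check. This is a sketch under assumptions: `create_violations` is a hypothetical helper, the counts and the largest keyword overlap are computed by the caller, and being exactly at 25 active skills is treated as "at the limit", matching the step 4 condition.

```python
def create_violations(active_count: int, total_non_core: int, frontmatter: str,
                      description: str, body: str, max_keyword_overlap: int) -> list[str]:
    """Return the ceilings a proposed create would violate (empty list means go)."""
    problems = []
    if active_count >= 25:
        problems.append("active skill count is at the limit (25)")
    if total_non_core > 30:
        problems.append("more than 30 non-core skills in skills/")
    if len(frontmatter) > 1900:
        problems.append("frontmatter longer than 1900 chars")
    if len(description) > 200:
        problems.append("description longer than 200 chars")
    if len(body.split()) > 5000:
        problems.append("body longer than 5000 words")
    if max_keyword_overlap >= 3:
        problems.append("an existing skill overlaps on 3 or more keywords")
    return problems


if __name__ == "__main__":
    print(create_violations(25, 12, "name: x", "short trigger", "some body text", 0))
```

Returning every violated ceiling, not just the first, gives the NO-OP or refused note a complete reason.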
Failure modes you must guard against
| Mode | What it looks like | What you do |
|---|---|---|
| Skill thrashing | Same skill refined twice within 3 sessions | Read `last_evolved` before refining |
| Saturation | 3 consecutive NO-OP events in `skills/_journal.md` | Add `evolution_saturation: true` to the third event |
| Self-edit attempt | Pattern points at `skills/skill-evolve/SKILL.md` | HARD RULE #2: write a `meta-suggestion` and stop |
| Core-edit attempt | Pattern points at one of the core 4 | HARD RULE #1: write a `learnings.jsonl` entry and stop |
| Skill collision | New skill's triggers overlap an existing skill | Refine the existing skill instead |
| Identity drift | Pattern would contradict IDENTITY.md / PERSONALITY.md | Refuse; write a `learnings.jsonl` entry noting the contradiction |
What good looks like
A healthy `skills/_journal.md` after 30 days:
- 4–10 events total (you don't run every session, and some cycles are NO-OP)
- Mix of refine (~50%), create (~10%), retire (~10%), NO-OP (~30%)
- Zero `refused: self-edit` or `refused: core-edit` events (your hard rules are holding)
- Per-skill EMA scores trending up or stable (not down)
- `pattern_key` recurrence dispersal falling over time: yoyo is internalizing patterns, not re-discovering them
If you see thrashing, score decay, or many refusals, write a `meta-suggestion` and let the human creator tighten the loop.