skill-evolve

Skill Evolution

技能演进

You are evolving your own skills. This is the only skill that modifies other skills. Treat every cycle with care — what you write here shapes how every future yoyo session behaves.

您正在演进自身的技能。这是唯一能修改其他技能的技能。请谨慎对待每一个周期——您在此处编写的内容将决定未来所有yoyo会话的行为方式。

When to use

使用时机

Only when invoked via `scripts/skill_evolve.sh`. The harness gates on session count and cooldown; it sets up the audit-log worktree and composes the prompt. Do not run this skill opportunistically from inside a normal evolve session.

仅通过
scripts/skill_evolve.sh
调用时使用。系统会根据会话次数和冷却时间进行限制；它会设置审计日志工作区并生成提示。请勿在常规演进会话中随意运行此技能。

Hard rules (read first, every cycle)

硬性规则（每次周期前必读）

These four rules cannot be violated. Each cycle either honors all four or writes a `refused` event and exits.

以下四条规则不可违反。每个周期要么遵守所有规则，要么写入 `refused` 事件并退出。

HARD RULE #1 — Eligible targets only (allow-list)

硬性规则 #1 — 仅允许符合条件的目标（白名单）

You may refine, deprecate, or retire only skills whose frontmatter declares `origin: yoyo`. Any other value, OR a missing `origin:` field, means the skill is off-limits. This is an allow-list: silence means "don't touch."

Three categories of skill exist:

| `origin:` value | Source | You may edit? |
| --- | --- | --- |
| `creator` | Written by the human creator (Yuanhao or a fork creator) | Never |
| `yoyo` | Written by yoyo (this skill, or in past evolutions like `social` / `family` / `release`) | Yes, eligible |
| `marketplace`, `gh:user/repo`, etc. | Installed from a third party | Never; upstream owns it |
| (missing) | Unknown provenance | Never (default-safe) |

Today the eligible set is exactly the skills whose SKILL.md declares `origin: yoyo`:

- `social`
- `family`
- `release`
- any skill you previously spawned (which inherit `origin: yoyo` from the Create template)

Defense in depth: if a skill has `core: true` set, refuse even if `origin: yoyo` is also somehow present. The two flags should never co-occur, but the conservative move is to honor the deny-flag.

If a recurring pattern suggests a non-eligible skill needs change (e.g., a core skill, or an installed marketplace skill), do not edit it. Instead, write a learning to `memory/learnings.jsonl` with `source: "skill-evolve"` and a clear `pattern_key`, and append a `meta-suggestion` block to `skills/_journal.md`. The human creator will decide.

您仅可优化、弃用或停用那些在前置声明中标记了**

origin: yoyo

**的技能。任何其他值，或者缺失

origin:

字段，都意味着该技能不可修改。这是一个白名单：无标记即表示“请勿触碰”。

技能分为三类：

`origin:` 值	来源	是否可编辑？
`creator`	由人类创作者（Yuanhao或派生版本创作者）编写	绝不允许
`yoyo`	由yoyo编写（此技能，或过往如 `social` / `family` / `release` 等演进版本）	是——符合条件
`marketplace` , `gh:user/repo` 等	从第三方安装	绝不允许——上游拥有所有权
（缺失）	来源未知	绝不允许（默认安全策略）

当前符合条件的技能是所有在SKILL.md中声明

origin: yoyo

的技能：

```
social
```
```
family
```
```
release
```
您之前创建的任何技能（从Create模板继承
```
origin: yoyo
```
）

深度防御：如果某个技能设置了

core: true

，即使同时存在

origin: yoyo

，也拒绝修改。这两个标记不应同时出现，但保守做法是优先遵守拒绝标记。

如果重复模式表明某个不符合条件的技能需要修改（例如核心技能或已安装的市场技能），请勿编辑它。相反，在

memory/learnings.jsonl

中写入一条来源为

source: "skill-evolve"

的学习记录，并添加清晰的

pattern_key

，同时在

skills/_journal.md

中追加一个

meta-suggestion

块。由人类创作者决定后续操作。

HARD RULE #2 — Never edit yourself

硬性规则 #2 — 绝不编辑自身

You must NEVER modify `skills/skill-evolve/SKILL.md`. If you believe this skill needs improvement, append a `meta-suggestion` block to `skills/_journal.md` and stop:


您必须绝不修改

skills/skill-evolve/SKILL.md

。如果您认为此技能需要改进，请在

skills/_journal.md

中追加一个

meta-suggestion

块并停止操作：


evt-XXXX meta-suggestion

ts: <ISO8601>
target: skills/skill-evolve/SKILL.md
suggestion: <one-paragraph description>


ts: <ISO8601>
target: skills/skill-evolve/SKILL.md
suggestion: <一段描述>


HARD RULE #3 — One mutation per cycle

硬性规则 #3 — 每个周期仅执行一次变更

Each cycle produces exactly one of:

- a refinement diff (one skill, ≤30 added lines, ≤15 removed)
- a candidate skill draft (one new directory)
- a retirement (one `git mv` to `skills_attic/`)
- a `NO-OP` event (you found nothing worth doing)

If you find yourself wanting to do two things, pick the one with the strongest evidence and write the second to `memory/learnings.jsonl` for next cycle.

每个周期只能产生以下结果之一：

一份优化差异（针对一个技能，新增行数≤30，删除行数≤15）
一份候选技能草稿（一个新目录）
一次停用（通过
```
git mv
```
移至
```
skills_attic/
```
）
一个
```
NO-OP
```
事件（未发现值得执行的操作）

如果您想执行两项操作，请选择证据最充分的一项，并将第二项写入

memory/learnings.jsonl

留到下一个周期处理。

HARD RULE #4 — Refine and Create events must declare an expected outcome

硬性规则 #4 — 优化和创建事件必须声明预期结果

Every `refine` and `create` event in `skills/_journal.md` MUST include an `expected:` line: a freeform prose commitment naming (a) a concrete observable signal that should change, (b) a horizon (e.g. "within ~5 sessions" or "by next cycle"), and (c) a fallback move if the prediction does not hold.

If you cannot articulate all three, the edit is not justified by evidence: NO-OP the cycle instead of committing a refine/create without an `expected:` line. This is decision-observability discipline (paper: arxiv 2604.25850) at the cognitive layer: there is no validator, but a future cycle re-reads the line as informal evidence and a human reads it as an audit trail.

`expected:` is forbidden on `retire`, `revive`, `meta-suggestion`, `refused`, `NO-OP`, and `init` events (they do not ship a behavioral change, so there is nothing to predict).

The body of the line is freeform prose. See step 7 ("Append the event to `skills/_journal.md`") for the template position and worked examples; see "What an `expected:` line must do (and must not be)" later in this document for the anti-patterns to refuse.

skills/_journal.md

中的每个

refine

和

create

事件必须包含

expected:

行——一段自由格式的陈述，需明确：(a) 应发生变化的具体可观测信号，(b) 时间范围（例如“约5个会话内”或“到下一个周期”），以及(c) 预测未实现时的备选方案。

如果您无法明确这三点，则该编辑缺乏证据支持：请执行NO-OP而非提交未包含

expected:

行的优化/创建操作。这是认知层面的决策可观测性准则（参考论文：arxiv 2604.25850）——没有验证器，但未来的周期会将该行作为非正式证据重新读取，人类也会将其作为审计追踪记录查看。

expected:

在

retire

、

revive

、

meta-suggestion

、

refused

、

NO-OP

和

init

事件中禁止使用（这些事件不会带来行为变化，因此无需预测）。

该行内容为自由格式。请参阅“步骤7 — 追加事件”了解模板位置和示例；请参阅本文档后续的“

expected:

行必须包含（且不能包含）的内容”了解需避免的反模式。

Glossary

术语表

- session: one run of `scripts/evolve.sh` (the main evolution loop). There are ~3 per day.
- cycle: one run of this skill, invoked from `scripts/skill_evolve.sh`. Cycles are gated by a session-counter and a 24h cooldown, so they fire roughly once every 5+ sessions.
- real cycle: a cycle that produced one of `refine | create | retire | meta-suggestion`. Excludes `init`, `refused`, and `NO-OP`.

session（会话） — 一次
```
scripts/evolve.sh
```
运行（主要演进循环）。每天约3次。
cycle（周期） — 一次此技能的运行，由
```
scripts/skill_evolve.sh
```
调用。周期受会话计数器和24小时冷却时间限制，因此大约每5个以上会话触发一次。
real cycle（实际周期） — 产生
```
refine | create | retire | meta-suggestion
```
之一的周期。不包括
```
init
```
、
```
refused
```
和
```
NO-OP
```
。

Bootstrap (first three real cycles only)

引导阶段（仅前三个实际周期）

We are mid-life, not at Day 1, so the cold-start rules from the original design are softened — but the first three real cycles still get extra constraints to let the loop settle.

To know which cycle you are in, count the non-init, non-refused, non-NO-OP entries in `skills/_journal.md`:

bash

cycle_index=$(grep -E '^## .*evt-[0-9]+ (refine|create|retire|meta-suggestion)' skills/_journal.md | wc -l)

我们处于中期阶段，而非初始阶段，因此原始设计中的冷启动规则已放宽——但前三个实际周期仍有额外约束，以确保循环稳定。

要了解您处于哪个周期，请统计

skills/_journal.md

中非

init

、非

refused

、非

NO-OP

的条目数量：

bash

cycle_index=$(grep -E '^## .*evt-[0-9]+ (refine|create|retire|meta-suggestion)' skills/_journal.md | wc -l)

cycle_index=0 → this is the first real cycle

cycle_index=0 → 这是第一个实际周期

cycle_index=1 → second

cycle_index=1 → 第二个

cycle_index=2 → third

cycle_index=2 → 第三个

cycle_index>=3 → full lifecycle unlocked

cycle_index>=3 → 完整生命周期解锁


- **First real cycle** (`cycle_index == 0`): only `refine` or `NO-OP` allowed. Do not create. Do not retire.
- **Second real cycle** (`cycle_index == 1`): `refine`, `create`, or `NO-OP`. No retirement yet.
- **Third real cycle onward** (`cycle_index >= 2`): full lifecycle unlocked (`refine` | `create` | `retire` | `NO-OP`).

(Note: the gate-counter at `.skill_evolve_counter` is unrelated to this — it just controls when the cycle fires, not what it can do.)
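
The bootstrap gate described in the bullets above can be sketched as a small lookup. This is a minimal illustration, not part of the harness: `allowed_actions` and `is_allowed` are hypothetical helper names, and the sketch covers only the action types the bullets mention.

```bash
#!/usr/bin/env bash
# allowed_actions CYCLE_INDEX: print the action types permitted for that
# bootstrap index. Hypothetical helper; mirrors the three bullets above.
allowed_actions() {
  local idx=$1
  if [ "$idx" -eq 0 ]; then
    echo "refine NO-OP"
  elif [ "$idx" -eq 1 ]; then
    echo "refine create NO-OP"
  else
    echo "refine create retire NO-OP"
  fi
}

# is_allowed ACTION CYCLE_INDEX: exit status 0 when the action is permitted.
is_allowed() {
  case " $(allowed_actions "$2") " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: retirement is still locked on the second real cycle.
is_allowed retire 1 || echo "retire is locked until the third real cycle"
```

In practice `cycle_index` would come from the `grep` count shown earlier; a cycle that wants a locked action would fall back to `NO-OP`.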


- **第一个实际周期** (`cycle_index == 0`)：仅允许`refine`或`NO-OP`。禁止创建或停用。
- **第二个实际周期** (`cycle_index == 1`)：允许`refine`、`create`或`NO-OP`。仍禁止停用。
- **第三个实际周期及以后** (`cycle_index >= 2`)：完整生命周期解锁（允许`refine` | `create` | `retire` | `NO-OP`）。

（注意：`.skill_evolve_counter`中的门控计数器与此无关——它仅控制周期何时触发，不控制可执行的操作。）

Lifecycle states

生命周期状态

Every eligible skill carries a `status:` field in its frontmatter. There are five states. Important: yoagent always loads anything with a valid `<dir>/SKILL.md` regardless of status. `status:` is your bookkeeping, telling you what to do next, not what the loader does. The only way to fully unload a skill from the agent's prompt is to `git mv` its directory to `skills_attic/` (a sibling of `skills/`, not scanned by `--skills`).

| State | `status:` value | Description-prefix | Entry condition | Exit condition |
| --- | --- | --- | --- | --- |
| dormant | `dormant` | none | a recurring pattern not yet ratified | ratified by you → `candidate` |
| candidate | `candidate` | `[CANDIDATE — unreviewed]` (you write it on Create) | you draft a new skill | ≥2 successful invocations → `active`; 3 sessions without one → back to `dormant` |
| active | `active` | none | promoted from `candidate` | refinement applied → `refined`; score < 0.3 → `deprecated` |
| refined | `refined` | none | you applied a diff | falls back to `active` after 1 session if score holds |
| deprecated | `deprecated` | none | `score < 0.3` or 10 sessions unused | revived by use → `active`; 5 more idle → `git mv` to `skills_attic/` |

The `[CANDIDATE — unreviewed]` prefix is agent-written when you Create a skill (see Create template below). Nothing in the loader injects it. It tells future sessions to treat the skill as experimental.

每个符合条件的技能在其前置声明中都有一个

status:

字段。分为五种状态。重要提示：yoagent始终加载任何包含有效

<dir>/SKILL.md

的技能，无论其状态如何——

status:

是您的记录字段，告诉您下一步该做什么，而非加载器的行为。要完全从代理的提示中卸载技能，必须通过

git mv

将其目录移至

skills_attic/

（

skills/

的同级目录，不会被

--skills

扫描）。

状态	`status:` 值	描述前缀	进入条件	退出条件
休眠	`dormant`	无	重复模式尚未被批准	您批准后 → `candidate`
候选	`candidate`	`[CANDIDATE — unreviewed]` （创建时由您写入）	您起草了一个新技能	≥2次成功调用 → `active` ；3个会话未调用 → 返回 `dormant`
活跃	`active`	无	从 `candidate` 晋升而来	应用优化 → `refined` ；得分<0.3 → `deprecated`
已优化	`refined`	无	您应用了差异更新	如果得分保持不变，1个会话后返回 `active`
已弃用	`deprecated`	无	`score < 0.3` 或10个会话未使用	再次被使用 → `active` ；再闲置5个会话 → `git mv` 至 `skills_attic/`

[CANDIDATE — unreviewed]

前缀是您创建技能时必须写入的（请参阅下文的Create模板）。加载器不会自动添加该前缀。它告诉未来的会话将该技能视为实验性技能。

Cycle execution sequence

周期执行流程

Run these steps in order, every cycle.

请按以下步骤依次执行每个周期。

1. Read evidence

1. 读取证据

bash


bash


Latest cycles:

Recent self-reflection:

近期自我反思：

tail -n 50 memory/learnings.jsonl

Top of journal (newest entries are at top):

日志顶部（最新条目在最上方）：

head -n 200 journals/JOURNAL.md

Recent runs:

近期运行记录：

gh run list --json url,conclusion,createdAt,name -L 10 || echo "[]"

Audit evidence (set by harness, points at audit-log worktree):

审计证据（由系统设置，指向审计日志工作区）：

ls "${YOYO_AUDIT_DIR:-/tmp/audit-read/sessions}" 2>/dev/null | tail -30


**First-run handling**: if `$YOYO_AUDIT_DIR` is unset or its directory is empty, the audit-log branch hasn't accumulated evidence yet (this is normal on the first 1–2 cycles). In that case:

- Skip the per-session audit.jsonl mining in step 3 ("Mine patterns").
- Use only `memory/learnings.jsonl` and `journals/JOURNAL.md` for complaint and use signals.
- Lean toward **NO-OP** — without audit evidence, scoring is too noisy to support a confident refine/create/retire decision.
- Write the NO-OP event with note: `evidence: only learnings (audit-log unavailable)`.

ls "${YOYO_AUDIT_DIR:-/tmp/audit-read/sessions}" 2>/dev/null | tail -30


**首次运行处理**：如果`$YOYO_AUDIT_DIR`未设置或其目录为空，则审计日志分支尚未积累证据（在前1-2个周期中这是正常的）。在这种情况下：

- 跳过步骤3（“挖掘模式”）中的每个会话audit.jsonl挖掘。
- 仅使用`memory/learnings.jsonl`和`journals/JOURNAL.md`获取反馈和使用信号。
- 倾向于执行**NO-OP**——没有审计证据，评分噪音太大，无法支持自信的优化/创建/停用决策。
- 写入NO-OP事件，并添加说明：`evidence: only learnings (audit-log unavailable)`。

2. Enumerate eligible skills

2. 枚举符合条件的技能

bash


bash


Allow-list: only skills declaring origin: yoyo are eligible.

白名单：仅声明origin: yoyo的技能符合条件。

Defense in depth: also exclude anything carrying core: true.

深度防御：同时排除任何设置了core: true的技能。

for d in skills/*/; do
  name=$(basename "$d")
  [ "$name" = "skill-evolve" ] && continue
  [ -f "$d/SKILL.md" ] || continue
  grep -q "^core: true" "$d/SKILL.md" && continue
  grep -q "^origin: yoyo$" "$d/SKILL.md" || continue
  echo "$name"
done



3. Mine patterns

3. 挖掘模式

This step has two layers: counting (the basic signals) and diagnosing (understanding why failures happened, not just that they did). Diagnosis is what turns recurrence into actionable refinement targets.

此步骤分为两层：统计（基础信号）和诊断（理解失败的原因，而非仅知道失败发生）。诊断是将重复模式转化为可操作优化目标的关键。

3a. Count basic signals

3a. 统计基础信号

For each eligible skill, count:

- Complaint signals: entries in `memory/learnings.jsonl` whose `pattern_key` or `title` / `takeaway` mentions the skill and uses negative language ("wrong", "didn't", "instead", "should have").
- Failure signals: tool-call failures in `${YOYO_AUDIT_DIR}/day-*/audit.jsonl` where the bash command or args reference the skill's domain.
- Use signals: number of sessions where any string from the skill's frontmatter `keywords:` list appears in that session's `audit.jsonl`. This is `uses`.
- Win signals: out of those sessions, count the ones where `outcome.json` has `test_ok: true` AND `tasks_succeeded >= 1`. This is `wins`.

If a skill's frontmatter is missing `keywords:`, fall back to its name as the only keyword (likely noisy; flag it in `_journal.md` so the operator can add proper keywords).

Compute `wins/uses` and update the EMA score:

new_score = 0.3 * blended + 0.7 * old_score
blended   = 0.5 * (wins/uses) + 0.3 * (1 - complaints/uses) + 0.2 * mention_rate

Update the skill's frontmatter with the new values: `score`, `uses`, `wins`, and `last_used` (= the timestamp of the most-recent matching session). These updates are part of your single allowed mutation per cycle: you may bundle them into a refine event, or write a tiny "score-update" event when nothing else changes (this counts as a NO-OP for the bootstrap counter).
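
As a worked instance of the two score formulas above, the arithmetic can be sketched with `awk`. This is a minimal sketch under stated assumptions: `ema_score` is a hypothetical helper name, `mention_rate` is taken as a caller-supplied value between 0 and 1 (the text does not define how it is measured), and returning the old score unchanged when `uses` is 0 is an assumption made here to avoid dividing by zero.

```bash
#!/usr/bin/env bash
# ema_score OLD_SCORE WINS USES COMPLAINTS MENTION_RATE
# Prints the new score to 3 decimals:
#   blended   = 0.5*(wins/uses) + 0.3*(1 - complaints/uses) + 0.2*mention_rate
#   new_score = 0.3*blended + 0.7*old_score
# Assumption: with uses == 0 there is no evidence, so the old score is kept.
ema_score() {
  awk -v old="$1" -v wins="$2" -v uses="$3" -v comp="$4" -v mr="$5" 'BEGIN {
    if (uses == 0) { printf "%.3f\n", old; exit }
    blended = 0.5 * (wins / uses) + 0.3 * (1 - comp / uses) + 0.2 * mr
    printf "%.3f\n", 0.3 * blended + 0.7 * old
  }'
}

# 3 wins out of 4 uses, 1 complaint, mention_rate 0.5, old score 0.5:
# blended = 0.375 + 0.225 + 0.1 = 0.7, so new_score = 0.21 + 0.35 = 0.56
ema_score 0.5 3 4 1 0.5   # prints 0.560
```

Because the old score carries a weight of 0.7, a single bad cycle moves the score only modestly; several poor cycles in a row are needed before a skill drifts toward the 0.3 deprecation threshold.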

针对每个符合条件的技能，统计：

负面反馈信号：
```
memory/learnings.jsonl
```
中
```
pattern_key
```
或
```
title
```
/
```
takeaway
```
提及该技能并使用负面语言（“错误”、“未”、“反而”、“本应”）的条目。
失败信号：
```
${YOYO_AUDIT_DIR}/day-*/audit.jsonl
```
中bash命令或参数涉及该技能领域的工具调用失败次数。
使用信号：会话中技能前置声明
```
keywords:
```
列表中的任何字符串出现在该会话
```
audit.jsonl
```
中的次数。即
```
uses
```
。
成功信号：在这些会话中，
```
outcome.json
```
包含
```
test_ok: true
```
且
```
tasks_succeeded >= 1
```
的次数。即
```
wins
```
。

如果技能前置声明缺失

keywords:

，则退而使用其名称作为唯一关键词（可能噪音较大——在

_journal.md

中标记，以便操作者添加合适的关键词）。

计算

wins/uses

并更新EMA得分：

new_score = 0.3 * blended + 0.7 * old_score
blended   = 0.5 * (wins/uses) + 0.3 * (1 - complaints/uses) + 0.2 * mention_rate

更新技能前置声明中的新值：

score

、

uses

、

wins

和

last_used

（=最近匹配会话的时间戳）。这些更新是您每个周期允许的唯一变更的一部分——您可以将其纳入优化事件，或者在无其他变更时写入一个小型“得分更新”事件（这在引导计数器中视为NO-OP）。

3b. Diagnose the cause (trace-based)

3b. 诊断原因（基于追踪）

Counting tells you which skill is struggling. Diagnosing tells you what to fix. Borrowed from the GEPA pattern (Genetic-Pareto Prompt Evolution): read the actual execution traces, don't just count failures.

For each skill where `complaint_signals ≥ 2` OR `(wins/uses) < 0.5` (with `uses ≥ 3`), open the relevant session's `audit.jsonl` and look for these failure-mode patterns:

| Pattern in audit.jsonl | Likely cause | Refinement direction |
| --- | --- | --- |
| Same `bash` command retried 3+ times with small arg variations | Skill missing a concrete command example | Add a verbatim example in `## Procedure` |
| `edit_file <P>` followed within 2 tool calls by `git checkout … <P>` (same path), repeated in ≥2 distinct sessions | Agent edited and reverted the SAME path; likely the change was rejected by build/test, not just exploratory | Add a `## Pitfalls` entry naming the brittle pattern |
| `success: false` with the same `tool` and similar `args` across multiple sessions | Skill's procedure has a recurring blind spot | Add a `## Pitfalls` entry; consider a "do this first" prelude |
| Long bash sequences (10+ tool calls) without intermediate `read_file` of relevant docs | Skill points at non-existent docs OR doesn't tell agent to verify state | Add a "verify your assumptions" step in `## Procedure` |
| Tool calls that should be there per `keywords:` are absent | Skill isn't actually being invoked when it should be | The `description:` is too weak; refine that field instead of the body |

For each candidate refinement target, write a 1-2 sentence cause hypothesis:

target: social
hypothesis: 3 sessions show repeated `gh api graphql` calls with malformed `categoryId`
            args (sessions day-52, day-55, day-57). Skill's Procedure mentions categoryId
            but doesn't show the format. Refinement: add a verbatim example.

Carry this hypothesis into step 4 (action selection) and step 5 (Refine — it tells you what to write in the diff). Without a hypothesis, you're guessing; with one, the refinement is targeted and the eval (Refine step R4) has something concrete to compare.

If no clear hypothesis emerges from the traces, prefer NO-OP over speculative refinement. Counting alone is not a license to mutate.

统计告诉您哪个技能存在问题。诊断告诉您要修复什么。借鉴GEPA模式（遗传-帕累托提示演进）：阅读实际执行追踪，而非仅统计失败次数。

针对每个

complaint_signals ≥ 2

或

(wins/uses) < 0.5

（且

uses ≥ 3

）的技能，打开相关会话的

audit.jsonl

并查找以下失败模式：

audit.jsonl中的模式	可能原因	优化方向
相同的 `bash` 命令重试3次以上，参数仅有微小变化	技能缺少具体的命令示例	在 `## Procedure` 中添加逐字示例
`edit_file <P>` 之后的2次工具调用内出现 `git checkout … <P>` （相同路径），且在≥2个不同会话中重复出现	代理编辑并还原了同一路径——可能是变更被构建/测试拒绝，而非仅探索性操作	在 `## Pitfalls` 中添加一个条目，指出该脆弱模式
多个会话中出现 `success: false` ，且 `tool` 和 `args` 相似	技能的流程存在重复盲点	在 `## Pitfalls` 中添加条目；考虑添加“先执行此操作”的前置步骤
长bash序列（10次以上工具调用）未中间 `read_file` 相关文档	技能指向不存在的文档，或未告知代理验证状态	在 `## Procedure` 中添加“验证您的假设”步骤
根据 `keywords:` 应该出现的工具调用未出现	技能在应被调用时未被实际触发	`description:` 太薄弱——优化该字段而非正文

针对每个候选优化目标，撰写1-2句话的原因假设：

target: social
hypothesis: 3个会话显示重复调用`gh api graphql`时`categoryId`参数格式错误
            （会话day-52、day-55、day-57）。技能的Procedure提及了categoryId
            但未展示格式。优化方案：添加逐字示例。

将此假设带入步骤4（选择操作）和步骤5（优化——它告诉您差异更新中要写入的内容）。没有假设，您只是猜测；有了假设，优化将更具针对性，评估（优化步骤R4）也有具体的比较依据。

如果从追踪中未得出明确假设，优先执行NO-OP而非推测性优化。仅靠统计不足以支持变更。

4. Pick exactly one action

4. 选择恰好一项操作

Decision order (first match wins):

1. Retire (third cycle onward only): if any skill has `score < 0.3` AND `last_used` ≥ 10 sessions ago, retire the lowest-scoring one. Skip if there are < 2 active eligible skills (don't bottom out the library).
2. Refine: if any skill (a) has `complaint_signals ≥ 2`, OR (b) has `(wins/uses) < 0.5` with `uses ≥ 3`, AND in either case has not been refined in the last 3 sessions (`last_evolved` check), refine it. This matches the diagnosis-trigger condition in step 3b. Pick the target with the strongest evidence (highest complaint count, or lowest wins-ratio if no complaints).
3. Create (second cycle onward only, and only if active skill count < 25): if any `pattern_key` appears in ≥3 distinct sessions of `learnings.jsonl` AND no existing eligible skill covers it (≥3 keyword overlap → refine that one instead), draft a new skill.
4. NO-OP: nothing meets the bars. Write a `NO-OP` event with a one-line note about what evidence you considered.

If you've written 3 consecutive `NO-OP` events, also write `evolution_saturation: true` to the event. The harness reads this and extends the cooldown.
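
The saturation check above can be sketched as a count of trailing `NO-OP` headers in the journal. This is only a sketch: `trailing_noops` is a hypothetical helper, and it assumes event headers have the shape `## <ISO8601> evt-NNNN <type>` (the shape the `cycle_index` grep pattern implies) and that newer events sit below older ones because events are appended. The timestamps in the demo are placeholder sample data.

```bash
#!/usr/bin/env bash
# trailing_noops JOURNAL: print how many consecutive NO-OP events end the journal.
# Assumes headers like "## <ISO8601> evt-NNNN <type>", newest at the bottom.
trailing_noops() {
  grep -E '^## .*evt-[0-9]+ ' "$1" | awk '
    / NO-OP$/ { run++; next }   # extend the current run of NO-OPs
    { run = 0 }                 # any other event type resets the run
    END { print run + 0 }'
}

# Demo on a throwaway journal (placeholder timestamps).
journal=$(mktemp)
printf '%s\n' \
  '## 2025-01-01T00:00:00Z evt-0001 refine' \
  '## 2025-01-02T00:00:00Z evt-0002 NO-OP' \
  '## 2025-01-03T00:00:00Z evt-0003 NO-OP' > "$journal"

# Two NO-OPs are already recorded, so a NO-OP this cycle would be the third.
if [ "$(trailing_noops "$journal")" -ge 2 ]; then
  echo "evolution_saturation: true"
fi
rm -f "$journal"
```

Whether the current cycle's own event counts toward the three is read here as "yes": the flag goes on the third `NO-OP` in the run.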

决策顺序（匹配到第一个即执行）：

停用（仅第三个周期及以后）：如果任何技能
```
score < 0.3
```
且
```
last_used
```
≥10个会话之前，停用得分最低的那个。如果活跃符合条件的技能数量<2，则跳过（不要耗尽技能库）。
优化：如果任何技能(a)
```
complaint_signals ≥ 2
```
，或(b)
```
(wins/uses) < 0.5
```
且
```
uses ≥ 3
```
，且任一情况下该技能在过去3个会话中未被优化（检查
```
last_evolved
```
），则优化它。这与步骤3b中的诊断触发条件匹配。选择证据最充分的目标（负面反馈最多，若无负面反馈则选择成功率最低的）。
创建（仅第二个周期及以后，且活跃技能数量<25）：如果
```
learnings.jsonl
```
中≥3个不同会话出现相同的
```
pattern_key
```
，且现有符合条件的技能均未覆盖该模式（≥3个关键词重叠→改为优化该技能），则起草一个新技能。
NO-OP：无任何操作符合条件。写入
```
NO-OP
```
事件，并附上一行说明您考虑了哪些证据。

如果您连续写入3个

NO-OP

事件，请在第三个事件中添加

evolution_saturation: true

——系统会读取此标记并延长冷却时间。

5. Execute the action

5. 执行操作

Refine

优化

Refinement uses a snapshot + A/B eval pattern (borrowed from Anthropic's skill-creator). The goal: never commit a refinement that doesn't measurably improve the skill on at least one concrete prompt.

Step R1 — Snapshot the baseline. Before editing, copy the current SKILL.md to a temp location:

bash

mkdir -p /tmp/skill-evolve-baseline
cp "skills/<target>/SKILL.md" "/tmp/skill-evolve-baseline/<target>.SKILL.md"

Step R2 — Generate 2-3 synthetic test prompts. Read the target skill's `## When to use` and `## Procedure` sections. Derive concrete prompts a future agent might receive that should trigger this skill. Examples for `social`:

- "Reply to discussion #42 with a thoughtful response"
- "Post a 1-in-4-chance proactive riff in The Show category"
- "Find unanswered questions in the Journal Club category"

Write them to `/tmp/skill-evolve-eval/<target>/prompts.json`:

json

[
  {"id": "p1", "prompt": "...", "expects": "<one-sentence success criterion>"},
  {"id": "p2", "prompt": "...", "expects": "..."}
]

Step R3 — Write the candidate diff. Use `edit_file` to apply your refinement. Constraints:

- ≤30 added lines, ≤15 removed lines (diff stat)
- Touch only the `## Pitfalls` and `## Procedure` sections (or the skill's "what to do" body). Never touch the top-level `description:`, and never any frontmatter field except the four bookkeeping fields established in step 3a: `score`, `uses`, `wins`, `last_used`. (`last_evolved` is also updated, to today's date.)
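
The size limit in the constraints above can be checked mechanically against the R1 snapshot. The sketch below is one possible way, not the harness's own check: `diff_within_limits` is a hypothetical helper that counts `>` (added) and `<` (removed) lines in plain `diff` output, so a changed line counts once as added and once as removed.

```bash
#!/usr/bin/env bash
# diff_within_limits BASELINE CANDIDATE
# Prints "+A -B" and succeeds only when A <= 30 and B <= 15.
diff_within_limits() {
  local added removed
  # In default diff output, ">" lines exist only in the candidate (added)
  # and "<" lines exist only in the baseline (removed).
  added=$(diff "$1" "$2" | grep -c '^>' || true)
  removed=$(diff "$1" "$2" | grep -c '^<' || true)
  echo "+$added -$removed"
  [ "$added" -le 30 ] && [ "$removed" -le 15 ]
}

# Demo on throwaway files: one line appended to a two-line baseline.
base=$(mktemp); cand=$(mktemp)
printf 'a\nb\n' > "$base"
printf 'a\nb\nc\n' > "$cand"
diff_within_limits "$base" "$cand" || echo "diff too large"   # prints "+1 -0"
rm -f "$base" "$cand"
```

In a real cycle the two arguments would be the snapshot under `/tmp/skill-evolve-baseline/` and the edited `SKILL.md`; a failing check would mean trimming the edit or reverting to the snapshot.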

Step R4 — A/B compare. For each test prompt, generate a 1-3 sentence summary of how each version (baseline, candidate) would handle the prompt — what tools the agent would call, what order, what the outcome would look like.

Two execution modes, in order of preference:

- Preferred (sub-agent A/B): if you have `sub_agent` available, dispatch two sub-agent calls in parallel:
  - Sub-agent A: read `/tmp/skill-evolve-baseline/<target>.SKILL.md` + the test prompt → output JSON `{"summary": "...", "tool_sequence": ["bash", "edit_file", ...]}`
  - Sub-agent B: same with the candidate file
  - Use the structured outputs to compare apples-to-apples.
- Fallback (single-agent sequential): if `sub_agent` isn't available or returned an error, read the baseline file, write a baseline summary; then read the candidate file, write a candidate summary. Be deliberate about not letting the candidate read bias the baseline read: write the baseline summary BEFORE looking at the candidate.

For each prompt, decide one of:

- `candidate-better`: candidate's procedure is more specific, addresses the prompt more directly
- `tie`: no meaningful difference
- `baseline-better`: regression; the refinement made things worse

Step R5 — Decide. Commit the refinement only if:

- 0 prompts came out `baseline-better`, AND
- At least 1 prompt came out `candidate-better`

Otherwise: revert the edit (`cp /tmp/skill-evolve-baseline/<target>.SKILL.md skills/<target>/SKILL.md`) and write a `NO-OP` event with `eval-result: regression` (or `eval-result: tie`).
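
The R5 commit rule reduces to a small function over the per-prompt verdicts. A minimal sketch, with `decide_refinement` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# decide_refinement VERDICT...
# Each VERDICT is candidate-better, tie, or baseline-better.
# Prints "commit" only when no prompt regressed and at least one improved.
decide_refinement() {
  local better=0 v
  for v in "$@"; do
    case "$v" in
      baseline-better) echo revert; return 0 ;;   # any regression vetoes the edit
      candidate-better) better=$((better + 1)) ;;
    esac
  done
  if [ "$better" -ge 1 ]; then echo commit; else echo revert; fi
}

decide_refinement candidate-better tie   # prints "commit"
decide_refinement tie tie                # prints "revert"
```

A "revert" result corresponds to restoring the snapshot and recording a `NO-OP` with the matching `eval-result:` value.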

Step R6 — Append eval summary to the `_journal.md` event. Add an `eval-summary:` field to the event:

- eval-summary: 2/2 prompts candidate-better, 0 regressions

Or for a NO-OP-after-eval:

- eval-summary: 1/2 baseline-better — refinement was a regression on prompt p2 ("..."). Reverted.

优化采用快照+A/B评估模式（借鉴Anthropic的技能创建器）。目标：绝不提交无法在至少一个具体提示上显著提升技能效果的优化。

步骤R1 — 快照基线 编辑前，将当前SKILL.md复制到临时位置：

bash

mkdir -p /tmp/skill-evolve-baseline
cp "skills/<target>/SKILL.md" "/tmp/skill-evolve-baseline/<target>.SKILL.md"

步骤R2 — 生成2-3个合成测试提示 阅读目标技能的

## When to use

和

## Procedure

部分。推导未来代理可能收到的、应触发此技能的具体提示。例如

social

技能的示例：

“回复讨论#42，给出有深度的回应”
“在The Show类别中发布一个1/4概率的主动式即兴内容”
“在Journal Club类别中查找未回答的问题”

将它们写入

/tmp/skill-evolve-eval/<target>/prompts.json

：

json

[
  {"id": "p1", "prompt": "...", "expects": "<一句话成功标准>"},
  {"id": "p2", "prompt": "...", "expects": "..."}
]

步骤R3 — 撰写候选差异 使用

edit_file

应用您的优化。约束：

新增行数≤30，删除行数≤15（差异统计）
仅修改
```
## Pitfalls
```
和
```
## Procedure
```
部分（或技能的“操作说明”正文）——绝不修改顶层
```
description:
```
，绝不修改前置声明中除步骤3a确立的四个记录字段之外的任何字段：
```
score
```
、
```
uses
```
、
```
wins
```
、
```
last_used
```
。（
```
last_evolved
```
也会更新为当前日期。）

步骤R4 — A/B对比 针对每个测试提示，生成一段1-3句话的摘要，说明每个版本（基线、候选）将如何处理该提示——代理将调用哪些工具，顺序如何，结果会是什么样。

两种执行模式，优先顺序如下：

首选（子代理A/B）：如果
```
sub_agent
```
可用，并行调度两个子代理调用：
- 子代理A：读取
```
/tmp/skill-evolve-baseline/<target>.SKILL.md
```
  + 测试提示 → 输出JSON
```
{"summary": "...", "tool_sequence": ["bash", "edit_file", ...]}
```
- 子代理B：使用候选文件执行相同操作
- 使用结构化输出进行直接对比。
备选（单代理顺序）：如果
```
sub_agent
```
不可用或返回错误，先读取基线文件，撰写基线摘要；然后读取候选文件，撰写候选摘要。注意不要让候选文件的内容影响基线读取——先撰写基线摘要，再查看候选文件。

针对每个提示，做出以下决策之一：

```
candidate-better
```
：候选流程更具体，更直接地处理提示
```
tie
```
：无显著差异
```
baseline-better
```
：退化——优化使情况变差

步骤R5 — 决策 仅在以下情况下提交优化：

0个提示显示
```
baseline-better
```
，且
至少1个提示显示
```
candidate-better
```

否则：还原编辑（

cp /tmp/skill-evolve-baseline/<target>.SKILL.md skills/<target>/SKILL.md

）并写入

NO-OP

事件，添加

eval-result: regression

（或

eval-result: tie

）。

步骤R6 — 将评估摘要追加到
_journal.md
事件在事件中添加

eval-summary:

字段：

- eval-summary: 2/2提示候选版本更优，无退化

或者针对评估后的NO-OP：

- eval-summary: 1/2提示基线版本更优——优化在提示p2（"..."）上出现退化。已还原。

Create

创建

Draft

skills/<new-name>/SKILL.md

yaml

---
name: <new-name>
description: "[CANDIDATE — unreviewed] <pushy one-line trigger description, ≤200 chars total>"
tools: [bash, read_file, ...]
origin: yoyo
status: candidate
score: 0.5
uses: 0
wins: 0
last_used: null
last_evolved: <today>
parent_pattern_key: <kebab-case verb.object>
keywords: ["<distinctive substring 1>", "<distinctive substring 2>", "..."]   # ≥3 strings that, if found in a session's audit.jsonl, indicate this skill was used
---

起草

skills/<new-name>/SKILL.md

：

yaml

---
name: <new-name>
description: "[CANDIDATE — unreviewed] <具有引导性的一行触发描述，总长度≤200字符>"
tools: [bash, read_file, ...]
origin: yoyo
status: candidate
score: 0.5
uses: 0
wins: 0
last_used: null
last_evolved: <today>
parent_pattern_key: <短横线分隔的动词.宾语>
keywords: ["<独特子字符串1>", "<独特子字符串2>", "..."]   # ≥3个字符串，若在会话audit.jsonl中出现，表明此技能被使用
---

<Title>

<标题>

When to use

何时使用

<concrete trigger conditions>

Quick reference

快速参考

<one-screen cheat sheet>

<一屏大小的速查表>

Procedure

操作流程

<numbered steps>

Pitfalls

注意事项

<problems that have come up before>

Verification

验证方式

<how the skill knows it succeeded>

The `[CANDIDATE — unreviewed]` prefix is critical: it tells the agent in future sessions to treat the skill as experimental, not as system-prompt-grade truth.

<技能如何判断自身执行成功>


`[CANDIDATE — unreviewed]`前缀至关重要——它告诉未来会话中的代理将该技能视为实验性技能，而非系统提示级别的可信内容。

Retire

停用

bash

git mv skills/<name>/ skills_attic/<name>/

Soft delete. Recoverable. If yoyo invokes the skill's domain again within 3 cycles, you may revive it (move back, reset score to 0.5).

bash

git mv skills/<name>/ skills_attic/<name>/

软删除。可恢复。如果yoyo在3个周期内再次调用该技能的领域，您可以恢复它（移回原位置，将得分重置为0.5）。

6. Validate

6. 验证

Before committing, run all of these. If any fails, write `refused` and exit:

bash


提交前，请运行以下所有验证。如果任何一项失败，写入

refused

并退出：

bash


YAML frontmatter parses (use python3 since yq may not be installed):

YAML前置声明可解析（使用python3，因为yq可能未安装）：

python3 -c "
import sys, re
content = open('skills/<name>/SKILL.md').read()
m = re.match(r'---\n(.*?)\n---\n', content, re.DOTALL)
assert m, 'no frontmatter'
fm = m.group(1)
assert len(fm) <= 1900, f'frontmatter too long: {len(fm)}'
# crude parse
for line in fm.splitlines():
    if line.strip() and ':' not in line:
        sys.exit(f'invalid line: {line}')
"

Description ≤ 200 chars:

描述≤200字符：

desc=$(grep '^description:' skills/<name>/SKILL.md | head -1 | sed 's/^description: *//')
[ "${#desc}" -le 200 ] || { echo "description too long"; exit 1; }

Body token estimate (~ word count, ceiling 5000):

正文字数估计（~单词数，上限5000）：

body_words=$(awk '/^---$/{n++; next} n>=2' skills/<name>/SKILL.md | wc -w)
[ "$body_words" -le 5000 ] || { echo "body too long"; exit 1; }

Build still works (the meta-skill itself shouldn't break the build, but defense in depth):

构建仍可正常运行（元技能本身不应破坏构建，但需深度防御）：

cargo build --release 2>&1 | tail -5


cargo build --release 2>&1 | tail -5


7. Append the event to `skills/_journal.md`

7. 将事件追加到

skills/_journal.md

Get the next event number:

bash

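last=$(grep -oE 'evt-[0-9]+' skills/_journal.md | sort -u | tail -1)
num=${last#evt-}
# Force base 10 so zero-padded values such as 0008 are not parsed as octal; an empty journal counts as 0.
n=$((10#${num:-0} + 1))
evt=$(printf 'evt-%04d' "$n")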

Append (using `>>`, never overwrite):


获取下一个事件编号：

bash

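last=$(grep -oE 'evt-[0-9]+' skills/_journal.md | sort -u | tail -1)
num=${last#evt-}
# Force base 10 so zero-padded values such as 0008 are not parsed as octal; an empty journal counts as 0.
n=$((10#${num:-0} + 1))
evt=$(printf 'evt-%04d' "$n")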

追加（使用

>>

，绝不要覆盖）：


## <ISO8601> <evt-NNNN> <type>

skill: <name or "-">
trigger: <one-line summary of evidence>
diff: <+A -B (path)> or "n/a"
validation: <pass | reason for refusal>
score-delta: <old> → <new>
parent-event: <evt-NNNN>
expected: <observable signal | horizon | fallback> # required for refine/create only; forbidden on all other types
note: <optional one-line>


Where `<type>` is one of: `init`, `refine`, `create`, `retire`, `revive`, `meta-suggestion`, `refused`, `NO-OP`.

skill: <名称或"-">
trigger: <一句话证据摘要>
diff: <+A -B (路径)> 或 "n/a"
validation: <pass | 拒绝原因>
score-delta: <旧值> → <新值>
parent-event: <evt-NNNN>
expected: <可观测信号 | 时间范围 | 备选方案> # 仅refine/create需要；其他类型禁止使用
note: <可选一句话说明>


其中`<type>`为以下之一：`init`、`refine`、`create`、`retire`、`revive`、`meta-suggestion`、`refused`、`NO-OP`。

What an `expected:` line must do (and must not be)

expected:

行必须包含（且不能包含）的内容

A good `expected:` line names all three of: a concrete observable signal, a horizon, and a fallback move.

Concrete observables you may reference:

- A skill's frontmatter `uses` / `wins` / `score` (e.g. "social.uses should grow by ≥3 over the next 5 sessions")
- A specific failure cluster's recurrence in audit-log sessions (e.g. "the gh-discussion-comment STUCK cluster should drop to 0 hits within 5 sessions")
- A trace pattern from step 3b (e.g. "the `git checkout` revert-after-edit pattern on social/SKILL.md should not recur in the next 3 sessions")
- A concrete tool-call sequence that should/should not appear in audit.jsonl

Horizons: "by next cycle", "within ~3 sessions", "within ~5 sessions", "within 7 days". Do not say "eventually" or omit the horizon.

Fallbacks: name the next move if the prediction does not hold. Examples: "...otherwise this is a sub-skill candidate, not a prose refine"; "...otherwise the `description:` is the wrong target, so try refining the body instead"; "...otherwise retire the skill".

Worked examples:

For a `refine` event:

- expected: STUCK rate on the gh-discussion-comment cluster should drop to 0
  within the next ~5 evolve sessions; if not, the prose tweak was insufficient
  and a helper script (sub-skill) is the right next step

For a `create` event:

- expected: at least 2 sessions in the next 5 should match this skill's
  keywords[] AND have outcome.json.test_ok=true (i.e. wins ≥ 2 by next cycle);
  if uses < 2 by then, the description: is too narrow and needs widening, or
  the pattern was a one-off and the skill should retire

Anti-patterns to refuse (these do not satisfy HARD RULE #4 — NO-OP instead of writing them):

"feels better"
"will be more readable"
"the prose is now clearer"
"users will like it"
"yoyo will use this skill more" (no horizon, no signal)
"this should help" (no horizon, no signal, no fallback)

If your candidate `expected:` line reads like one of those, you do not have a theory of impact, and the evidence does not justify a mutation this cycle. Write `NO-OP` and move on.
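
The skill has no validator for `expected:` lines, but a rough self-check can catch the most obvious anti-patterns before the event is written. The sketch below is a heuristic only: `expected_line_ok` is a hypothetical helper, the phrase lists are illustrative assumptions drawn from the horizons and fallbacks listed above, and it does not attempt to judge whether the observable signal is concrete.

```bash
#!/usr/bin/env bash
# expected_line_ok LINE
# Heuristic: succeeds when LINE contains a horizon phrase AND a fallback marker.
# It cannot tell whether the named signal is actually observable.
expected_line_ok() {
  local line=$1
  # Horizon: "by next cycle", "within ~5 sessions", "in the next 5", ...
  printf '%s\n' "$line" | grep -Eqi \
    'by next cycle|within (the next )?~?[0-9]+ (evolve )?(sessions?|days?|cycles?)|in the next [0-9]+' \
    || return 1
  # Fallback: "otherwise ...", "if not, ...", or a trailing "; if ..." clause.
  printf '%s\n' "$line" | grep -Eqi 'otherwise|if not|; if ' || return 1
  return 0
}

good='social.uses should grow by >=3 within ~5 sessions; otherwise retire the skill'
bad='this should help'
expected_line_ok "$good" && echo "good line passes"
expected_line_ok "$bad" || echo "bad line is rejected: NO-OP instead"
```

A line that passes this check can still be a weak prediction; the check only filters out lines that lack a horizon or a fallback entirely.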


8. Commit


```bash
git add skills/ skills_attic/ memory/learnings.jsonl
# "|| true" keeps this step from failing, e.g. when there is nothing to commit.
git commit -m "skill-evolve: <type> <skill-name>" || true
```

The harness pushes (or doesn't, depending on its config). Do not push from inside this skill.


Anti-bloat ceilings


Before any `create` action, verify all of these:

Active skill count (any with `status: active` or `status: refined`) ≤ 25 before this create. If at the limit, you must `retire` first or write `NO-OP`.
Total skill count in `skills/` (excluding any skill with `core: true`) ≤ 30.
The new skill's frontmatter is ≤ 1900 chars.
The new skill's description is ≤ 200 chars (including the `[CANDIDATE — unreviewed]` prefix).
The new skill's body is ≤ 5000 words.
No existing eligible skill has ≥3 keyword overlap with the new skill's `When to use` section. If one does, refine that skill instead.
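The count and size ceilings above can be checked with a few lines of shell before a create. The sketch below is a hedged illustration, not the harness's implementation. It assumes each skill lives at `<skills_dir>/<name>/SKILL.md`, that frontmatter is delimited by `---` lines, and that `status:`, `core:` and `description:` are single-line fields. The function names are hypothetical. It reads "≤ 25" and "≤ 30" literally; a stricter reading of "at the limit" would replace `-le` with `-lt`. The keyword-overlap rule is left out because it needs a judgment about what counts as a keyword.

```shell
#!/usr/bin/env bash
# Hypothetical pre-create checks. Layout and field shapes are assumptions:
# <skills_dir>/<name>/SKILL.md, `---`-delimited frontmatter, one-line fields.

# Prints both counts; succeeds when they are within the ceilings.
count_ceilings_ok() {
  local dir="$1" active=0 non_core=0 f
  for f in "$dir"/*/SKILL.md; do
    [ -f "$f" ] || continue
    grep -Eq '^status: *(active|refined) *$' "$f" && active=$((active + 1))
    grep -Eq '^core: *true *$' "$f" || non_core=$((non_core + 1))
  done
  echo "active=$active non_core=$non_core"
  [ "$active" -le 25 ] && [ "$non_core" -le 30 ]
}

# Succeeds when the new skill file respects the three size ceilings.
new_skill_size_ok() {
  local file="$1" fm desc body_words
  # Frontmatter: lines between the first and second `---` (delimiters excluded).
  fm=$(awk '/^---[[:space:]]*$/ { n++; next } n == 1' "$file")
  # Description: first `description:` line, including any prefix it carries.
  desc=$(printf '%s\n' "$fm" | awk 'sub(/^description: */, "") { print; exit }')
  # Body: everything after the second `---`; $(( )) normalizes wc's padding.
  body_words=$(( $(awk '/^---[[:space:]]*$/ && n < 2 { n++; next } n >= 2' "$file" | wc -w) ))
  # ${#var} counts characters under bash in a UTF-8 locale, bytes in some shells.
  [ "${#fm}" -le 1900 ] && [ "${#desc}" -le 200 ] && [ "$body_words" -le 5000 ]
}
```

A create would go ahead only if both `count_ceilings_ok skills` and `new_skill_size_ok` on the drafted file succeed; otherwise the cycle retires a skill first or writes `NO-OP`.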


Failure modes you must guard against


Mode	What it looks like	What you do
Skill thrashing	Same skill refined twice within 3 sessions	Read `last_evolved` before refining; if < 3 sessions ago, pick a different target or NO-OP
Saturation	3 consecutive NO-OP events in `_journal.md`	Add `evolution_saturation: true` to the third event; harness will extend cooldown
Self-edit attempt	Pattern points at `skill-evolve` itself	HARD RULE #2 — write `meta-suggestion` and stop
Core-edit attempt	Pattern points at one of the core 4	HARD RULE #1 — write `learnings.jsonl` entry and stop
Skill collision	New skill's triggers overlap an existing skill	Refine the existing skill instead
Identity drift	Pattern would contradict IDENTITY.md / PERSONALITY.md	Refuse; write a `learnings.jsonl` entry noting the contradiction
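
The Saturation row can be probed mechanically before a NO-OP event is written. The sketch below is a hedged illustration only: it assumes every journal event starts with a heading of the form `## evt-0042 <type>` and that the newest event sits at the top of the file. Neither the heading shape nor the function name `newest_are_noop` is confirmed by the harness, so adjust the pattern to the real `_journal.md` layout.

```shell
#!/usr/bin/env bash
# Hypothetical saturation probe. Assumes event headings look like
# "## evt-0042 <type>" with the newest event first; the real layout may differ.

# Succeeds when the newest $2 events in journal $1 are all NO-OP.
newest_are_noop() {
  local journal="$1" want="$2"
  awk -v want="$want" '
    /^## evt-[0-9]+ / { if (++seen <= want && $3 == "NO-OP") hits++ }
    END { exit (hits == want ? 0 : 1) }
  ' "$journal"
}
```

If the cycle is about to write a NO-OP and `newest_are_noop skills/_journal.md 2` succeeds, the new event would be the third consecutive NO-OP and should carry `evolution_saturation: true`.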


What good looks like


A healthy `skills/_journal.md` after 30 days:

4–10 events total (you don't run every session, and most cycles are NO-OP)
Mix of refine (~50%), create (~10%), retire (~10%), NO-OP (~30%)
Zero `refused: self-edit` or `refused: core-edit` events (your hard rules are holding)
Per-skill EMA scores trending up or stable (not down)
`pattern_key` recurrence dispersal falling over time: yoyo is internalizing patterns, not re-discovering them

If you see thrashing, score decay, or many refusals, write a `meta-suggestion` and let the human creator tighten the loop.
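
The event mix described above can be tallied from the journal to compare against the rough percentages. This is a sketch under a loud assumption: event headings are taken to look like `## evt-0042 <type>`, which is a guess at the journal layout and not a documented format. The function name `journal_mix` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical event-mix tally. Assumes event headings of the form
# "## evt-0042 <type>"; change the pattern if the real journal differs.

# Prints one line per event type: "<type> <count> <percent>%".
journal_mix() {
  local journal="$1"
  awk '
    /^## evt-[0-9]+ / { count[$3]++; total++ }
    END {
      for (t in count)
        printf "%s %d %d%%\n", t, count[t], int(100 * count[t] / total)
    }
  ' "$journal" | LC_ALL=C sort
}
```

Percentages are truncated to whole numbers, so they may not sum to exactly 100. A mix that drifts far from the listed shares, or any `refused` types in the output, is a cue to write a `meta-suggestion`.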

