writing-great-skills

A skill exists to wrangle determinism out of a stochastic system. Predictability — the agent taking the same process every run, not producing the same output — is the root virtue; every lever below serves it.

Bold terms are defined in `GLOSSARY.md`; look them up there for the full meaning.

Invocation

Two choices, trading different costs:

A model-invoked skill keeps a description, so the agent can fire it autonomously and other skills can reach it (you can still type its name too). It contributes to context load — the description sits in the window every turn. Mechanics: omit `disable-model-invocation`, and write a model-facing description with rich trigger phrasing ("Use when the user wants…, mentions…").
A user-invoked skill strips the description from the agent's reach: only you, typing its name, can invoke it — and no other skill can. Zero context load, but it spends cognitive load: you are the index that must remember it exists. Mechanics: set `disable-model-invocation: true`; the `description` becomes human-facing — a one-line summary, trigger lists stripped.

Pick model-invocation only when the agent must reach the skill on its own, or another skill must. If it only ever fires by hand, make it user-invoked and pay no context load.

When user-invoked skills multiply past what you can remember, that piled-up cognitive load is cured by a router skill: one user-invoked skill that names the others and when to reach for each.
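
The trade between the two invocation modes can be sketched in a few lines of Python. This is an illustration under assumptions, not the skill system's actual loader: the `name` key, the two sample skills, and the flat `key: value` parsing are all hypothetical, and word count stands in loosely for context load. Only `disable-model-invocation` and `description` come from the text above.

```python
# Hypothetical frontmatter for two skills. Only `description` and
# `disable-model-invocation` are named in the text; `name` is assumed.
MODEL_INVOKED = """\
name: tdd
description: Test-first development. Use when the user wants to build a feature with TDD.
"""

USER_INVOKED = """\
name: release-notes
description: Draft release notes from merged changes.
disable-model-invocation: true
"""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines (a toy stand-in for a real YAML parser)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def is_model_invoked(fields: dict[str, str]) -> bool:
    """A skill is model-invoked unless the flag is explicitly set to true."""
    return fields.get("disable-model-invocation", "false").lower() != "true"


def context_load(skills: list[str]) -> int:
    """Count description words that would sit in the window every turn."""
    total = 0
    for text in skills:
        fields = parse_frontmatter(text)
        if is_model_invoked(fields):
            total += len(fields.get("description", "").split())
    return total
```

Under these assumptions, `context_load([MODEL_INVOKED, USER_INVOKED])` counts only the first skill's description, because the user-invoked skill contributes nothing to the window.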

Writing the description

A model-invoked description does two jobs — state what the skill is, and list the branches that should trigger it. Every word increases context load, so a description earns even harder pruning than the body:

Front-load the skill's leading word — the description is where it does its invocation work.
One trigger per branch. Synonyms that rename a single branch are duplication — "build features using TDD … asks for test-first development" is one branch written twice. Collapse them; keep only genuinely distinct branches.
Cut identity that's already in the body. Keep the description to triggers, plus any "when another skill needs…" reach clause.
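
The "one trigger per branch" rule can be made concrete with a small sketch. Deciding which phrases belong to the same branch is a judgment call the author makes; the branch labels below, and the third trigger, are hypothetical. The code only shows the collapse once that labelling exists.

```python
def one_trigger_per_branch(triggers: list[tuple[str, str]]) -> list[str]:
    """Keep the first trigger phrase seen for each branch, preserving order."""
    seen = set()
    kept = []
    for branch, phrase in triggers:
        if branch not in seen:
            seen.add(branch)
            kept.append(phrase)
    return kept


# (branch label, trigger phrase); the labels are assigned by hand.
triggers = [
    ("tdd", "build features using TDD"),
    ("tdd", "asks for test-first development"),  # same branch, renamed
    ("refactor", "wants to refactor under test"),  # hypothetical second branch
]

kept = one_trigger_per_branch(triggers)
```

Here `kept` holds two phrases: the duplicate TDD trigger is dropped and the genuinely distinct branch survives.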

Information hierarchy

A skill is built from two content types — steps and reference — that mix freely: a skill can be all steps, all reference, or both. The core decision is which to use and where each sits on the information hierarchy, a ladder ranked by how immediately the agent needs the material:

In-skill step — an ordered action in `SKILL.md`, the primary tier: what the agent does, in order. Each step ends on a completion criterion, the condition that tells the agent the work is done. Make it checkable (can the agent tell done from not-done?) and, where it matters, exhaustive ("every modified model accounted for", not "produce a change list") — a vague criterion invites premature completion.
In-skill reference — a definition, rule, or fact in `SKILL.md`, consulted on demand. Often a legitimately flat peer-set (every rule of a review on one rung) — a fine arrangement, not a smell. This skill is all reference.
External reference — reference pushed out of `SKILL.md` into a separate file, reached by a context pointer, loaded only when the pointer fires. (Spans disclosed reference — a sibling file like `GLOSSARY.md`, still part of the skill — through fully external reference that lives outside the skill system and any skill can point at.)

A demanding completion criterion drives thorough legwork — the digging the agent does within the work — whether the skill has steps or not, since "every rule applied" binds flat reference just as "every step done" binds a sequence.
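
The difference between a vague and an exhaustive completion criterion can be shown as a check. This is a minimal sketch; the model names and both function names are made up for illustration.

```python
def unaccounted(modified: set[str], change_list: set[str]) -> set[str]:
    """Return the modified models that the change list does not mention."""
    return modified - change_list


def vague_done(change_list: set[str]) -> bool:
    """'Produce a change list': satisfied by any non-empty list."""
    return len(change_list) > 0


def exhaustive_done(modified: set[str], change_list: set[str]) -> bool:
    """'Every modified model accounted for': satisfied only when none is missing."""
    return not unaccounted(modified, change_list)


# Hypothetical run: three models changed, the agent listed only two.
modified = {"User", "Invoice", "Payment"}
change_list = {"User", "Invoice"}
```

With this data the vague criterion already passes, while the exhaustive one fails and names `Payment` as the missing item, which is what keeps the agent digging.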

Push too little down and the top bloats; push too much and you hide material the agent actually needs. That tension is the whole decision.

Progressive disclosure is the move down the ladder — out of `SKILL.md` into a linked file — so the top stays legible. Mechanics: a linked `.md` file in the skill folder, named for what it holds (this skill discloses its full definitions to `GLOSSARY.md`). Some skills are used in more than one way, and each distinct way is a branch — different runs taking different paths through the skill. Branching is the cleanest disclosure test: inline what every branch needs, and push behind a pointer what only some branches reach. A context pointer's wording, not its target, decides when and how reliably the agent reaches the material.

Where the ladder decides how far down a piece sits, co-location decides what sits beside it once there: keep a concept's definition, rules, and caveats under one heading rather than scattered, so reading one part brings its neighbours with it.
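
The branching test for disclosure reduces to set arithmetic. The sketch below assumes you can list, per branch, which pieces of material that branch reaches; the branch and material names are hypothetical.

```python
def plan_disclosure(needs: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split material into (inline, behind_pointer).

    `needs` maps each branch to the pieces of material it reaches.
    Material every branch needs stays inline; the rest goes behind a pointer.
    """
    if not needs:
        return set(), set()
    everything = set().union(*needs.values())
    inline = set.intersection(*needs.values())
    return inline, everything - inline


# Hypothetical skill with two branches.
needs = {
    "write": {"core-definitions", "templates"},
    "review": {"core-definitions", "checklist"},
}

inline, behind_pointer = plan_disclosure(needs)
```

In this example only `core-definitions` stays in `SKILL.md`, while `templates` and `checklist` each move to a linked file that only one branch loads.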

When to split

Granularity is how finely you divide skills, and each cut spends one of the two loads, so split only when the cut earns it. Two cuts:

By invocation — split off a model-invoked skill when you have a distinct leading word that should trigger it on its own, or another skill must reach it. You pay context load for the new always-loaded description, so that independent reach has to be worth it.
By sequence — split a run of steps when the steps still ahead (a step's post-completion steps) tempt the agent to rush the one in front of it (premature completion). Keeping them out of view encourages the agent to do more legwork on the current task.
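
The sequence cut can be pictured as slicing a step list just after the step that gets rushed, so its post-completion steps live in a separate skill. A minimal sketch, with hypothetical step names and function name:

```python
def split_by_sequence(steps: list[str], rushed_step: int) -> tuple[list[str], list[str]]:
    """Cut after the rushed step, moving its post-completion steps out of view."""
    cut = rushed_step + 1
    return steps[:cut], steps[cut:]


steps = ["reproduce the bug", "write the fix", "open the pull request"]

# Suppose the agent is observed rushing step 0 to get to the fix.
first_skill, second_skill = split_by_sequence(steps, rushed_step=0)
```

After the cut, the first skill shows only the reproduction step, so nothing ahead of it competes for the agent's attention.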

Pruning

Keep each meaning in a single source of truth: one authoritative place, so changing the behaviour is a one-place edit.

Check every line for relevance: does it still bear on what the skill does?

Then hunt no-ops sentence by sentence, not just line by line: run the no-op test on each sentence in isolation, and when one fails, delete the whole sentence rather than trim words from it. Be aggressive — most prose that fails should go, not be rewritten.
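
The sentence-by-sentence hunt might be sketched as follows. The no-op test itself is a judgment about model behaviour, so it is passed in as a predicate; the keyword check used in the example is only a placeholder for that judgment, and the regex sentence splitter is deliberately naive.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows ., ! or ? (a naive splitter)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def prune(text: str, changes_behaviour: Callable[[str], bool]) -> str:
    """Delete every whole sentence that fails the no-op test."""
    return " ".join(s for s in split_sentences(text) if changes_behaviour(s))


body = "Be helpful. Run the test suite before every commit. Write clear code."

# Placeholder predicate: pretend only the test-suite rule changes behaviour.
pruned = prune(body, lambda sentence: "test suite" in sentence)
```

Sentences that fail are removed whole rather than trimmed, so `pruned` keeps only the one sentence that passed.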

Leading words

A leading word is a compact concept already living in the model's pretraining that the agent thinks with while running the skill (e.g. lesson, fog of war, tracer bullets). It is usually repeated throughout the text, though a strong leading word may be needed only once. Through that repetition it accumulates a distributed definition and anchors a whole region of behaviour in the fewest tokens, by recruiting priors the model already holds.

It serves predictability twice. In the body it anchors execution: the agent reaches for the same behaviour every time the word appears. In the description it anchors invocation: when the same word lives in your prompts, docs, and code, the agent links that shared language to the skill and fires it more reliably.

Hunt for opportunities to refactor skills to use leading words. A triad spelled out at three sites (duplication), a description spending a sentence to gesture at one idea — each is a passage begging to collapse into a single token. Examples include:

"fast, deterministic, low-overhead" -> tight — collapses one quality restated across a phase into a single pretrained word (a tight loop).
"a loop you believe in" -> red — converts a fuzzy gate into a binary observable state (the loop goes red on the bug, or it doesn't).

You win twice over: fewer tokens, and a sharper hook for the agent to hang its thinking on. Assume every skill is carrying restatements that leading words retire — go find them.
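
As a rough illustration of the refactor, the sketch below swaps a restated triad for its leading word and compares lengths. Whitespace-separated word count is only a crude proxy for tokens, and the sample sentence and function names are made up; the triad-to-word mapping is the one from the example above.

```python
# Restatement -> leading word, taken from the example in the text.
LEADING_WORDS = {"fast, deterministic, low-overhead": "tight"}


def retire_restatements(text: str, leading_words: dict[str, str]) -> str:
    """Replace each spelled-out restatement with its leading word."""
    for phrase, word in leading_words.items():
        text = text.replace(phrase, word)
    return text


def word_count(text: str) -> int:
    """Whitespace word count, a crude stand-in for token count."""
    return len(text.split())


before = "Build a fast, deterministic, low-overhead loop."
after = retire_restatements(before, LEADING_WORDS)
```

The rewritten sentence is shorter by this measure; whether the word actually lands depends on the priors the model holds for it.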

Failure modes

Use these to diagnose issues the user may be having with the skill.

Premature completion — ending a step before it's genuinely done, attention slipping to being done. Defence, in order: sharpen the completion criterion first (cheap, local); only if it is irreducibly fuzzy and you observe the rush, hide the post-completion steps by splitting (the sequence cut).
Duplication — the same meaning in more than one place. Costs maintenance and tokens, and inflates a meaning's prominence on the ladder past its real rank.
Sediment — stale layers that settle because adding feels safe and removing feels risky. The default fate of any skill without a pruning discipline.
Sprawl — a skill simply too long, even when every line is live and unique. Hurts readability and maintainability and wastes tokens. The cure is the ladder: disclose reference behind pointers, and split by branch or sequence so each path carries only what it needs.
No-op — a line the model already obeys by default, so you pay load to say nothing. The test: does it change behaviour versus the default? A weak leading word (be thorough when the agent is already thorough-ish) is a no-op; the fix is a stronger word (relentless), not a different technique.
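
Of these failure modes, duplication is the easiest to screen for mechanically, at least in its crudest form. The sketch below flags sentences that appear near-verbatim in more than one file of a skill; it will not catch the same meaning reworded, which still needs a human read. The file contents and function names are hypothetical.

```python
import re
from collections import defaultdict


def normalise(sentence: str) -> str:
    """Lowercase and collapse punctuation so near-verbatim copies match."""
    return re.sub(r"\W+", " ", sentence.lower()).strip()


def find_duplicates(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each repeated sentence to the sorted list of files containing it."""
    sites = defaultdict(set)
    for name, text in files.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            key = normalise(sentence)
            if key:
                sites[key].add(name)
    return {key: sorted(names) for key, names in sites.items() if len(names) > 1}


files = {
    "SKILL.md": "Run the tests first. Then commit.",
    "GLOSSARY.md": "Run the tests first! A lesson is a reusable finding.",
}

duplicates = find_duplicates(files)
```

Each hit is a candidate for a single source of truth: keep the sentence in one place and point to it from the other.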
