Writing Skills

Overview

Writing skills means applying Test-Driven Development (TDD) to process documentation.

Personal skills are stored in agent-specific directories (Claude Code uses `~/.claude/skills`, Codex uses `~/.agents/skills/`).

You write test cases (stress scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch the tests pass (agent follows rules), then refactor (plug loopholes).

Core Principle: If you don't observe the agent failing without the skill, you don't know if the skill teaches the right thing.

Required Background: Before using this skill, you must understand superpowers:test-driven-development. That skill defines the basic red-green-refactor cycle. This skill adapts TDD to documentation writing.

Official Guidelines: For Anthropic's official best practices for skill writing, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement this skill's TDD-oriented approach.

What is a Skill?

A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective methods.

Skills are: Reusable techniques, patterns, tools, reference guides

Skills are NOT: Narratives about how you solved a problem once

TDD Mapping to Skills

TDD Concept	Skill Creation
Test Case	Stress scenario with subagents
Production Code	Skill documentation (SKILL.md)
Test Failure (Red)	Agent violates rules without the skill (baseline)
Test Pass (Green)	Agent follows rules with the skill
Refactor	Plug loopholes while maintaining compliance
Write Test First	Run baseline scenarios before writing the skill
Observe Failure	Record the exact rationalizations the agent uses
Minimal Code	Write the skill specifically for those violations
Observe Pass	Verify the agent now follows the rules
Refactor Cycle	Discover new rationalizations → plug → re-verify

The entire skill creation process follows red-green-refactor.

When to Create a Skill

Create when:

The technique isn't intuitively obvious to you
You'll reference it repeatedly across different projects
The pattern has broad applicability (not project-specific)
Others will also benefit

Don't create:

One-off solutions
Standard practices that are well-documented elsewhere
Project-specific conventions (put in CLAUDE.md)
Mechanical constraints (automate if you can enforce with regex/validation – documentation is for scenarios requiring judgment)
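To make the last point concrete, suppose a project forbids committing focused tests (`it.only(`, `describe.only(`, `test.only(`). The rule and the helper name are hypothetical, chosen for illustration. A regex captures this rule completely, so a check like this minimal Python sketch enforces it on every run. A skill could only remind the agent.

```python
import re

# Hypothetical mechanical rule: no focused tests (it.only / describe.only / test.only)
# may be committed. A regex enforces it every time; a skill would only remind the agent.
FOCUSED_TEST = re.compile(r"\b(?:it|describe|test)\.only\(")


def find_focused_tests(source: str) -> list[int]:
    """Return the 1-based line numbers that contain a focused test."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FOCUSED_TEST.search(line)
    ]


if __name__ == "__main__":
    sample = 'describe("api", () => {\n  it.only("returns 200", () => {});\n});\n'
    print(find_focused_tests(sample))  # [2]
```

In practice you would run a check like this from CI or a pre-commit hook and fail when the list is non-empty.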

Skill Types

Technical

Methods with specific steps (condition-based-waiting, root-cause-tracing)

Pattern

Ways to think about problems (flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (office docs)

Directory Structure

skills/
  skill-name/
    SKILL.md              # Main reference document (required)
    supporting-file.*     # Only when needed

Flat Namespace - All skills live in a single searchable namespace
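A small index script is one way to keep the flat layout honest. Below is a minimal Python sketch. It assumes skills sit one directory level below a `skills/` root, and `index_skills` and `demo` are hypothetical helpers. Because the namespace is flat, the directory name serves as the skill's identifier. A directory without `SKILL.md` is reported rather than silently skipped.

```python
import tempfile
from pathlib import Path


def index_skills(root: Path) -> tuple[dict[str, Path], list[str]]:
    """Map skill name -> SKILL.md path, and list skill directories missing SKILL.md.

    Assumes the flat layout skills/<skill-name>/SKILL.md. Only direct children of
    `root` are considered, so supporting folders such as scripts/ inside a skill
    are never mistaken for skills.
    """
    skills: dict[str, Path] = {}
    missing: list[str] = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # stray files at the top level are not skills
        skill_md = entry / "SKILL.md"
        if skill_md.is_file():
            skills[entry.name] = skill_md
        else:
            missing.append(entry.name)
    return skills, missing


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway skills/ tree and index it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "condition-based-waiting").mkdir()
        (root / "condition-based-waiting" / "SKILL.md").write_text("---\n", encoding="utf-8")
        (root / "draft-skill").mkdir()  # no SKILL.md yet
        (root / "README.md").write_text("not a skill\n", encoding="utf-8")
        skills, missing = index_skills(root)
        return list(skills), missing
```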

When to separate files:

Large reference content (100+ lines) - API documentation, comprehensive syntax explanations
Reusable tools - Scripts, utilities, templates

Keep inline:

Principles and concepts
Code patterns (< 50 lines)
Everything else

SKILL.md Structure

Frontmatter (YAML):

Two required fields: `name` and `description` (see agentskills.io/specification for all supported fields)
Maximum 1024 characters total
`name`: Use only letters, numbers, and hyphens (no parentheses or special characters)
`description`: Third-person, only describes when to use (not what it does)
- Start with "Use when...", focus on trigger conditions
- Include specific symptoms, scenarios, and context
- Never summarize the skill's process or workflow (see CSO section for why)
- Try to keep it under 500 characters

```markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific trigger conditions and symptoms]
---

# Skill Name

## Overview
What is it? Explain the core principle in 1-2 sentences.

## When to Use
[Use small inline flowcharts if the decision isn't obvious]

- Bullet list of symptoms and use cases
- Scenarios where it doesn't apply

## Core Pattern (Technical/Pattern Skills)
Before/after code comparison

## Quick Reference
Table or bullets for quick scanning of common operations

## Implementation
Inline code for simple patterns
Link to separate files for large reference or reusable tools

## Common Mistakes
Common issues + fixes

## Practical Outcomes (Optional)
Specific results
```
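The mechanical frontmatter rules listed above can be checked before any agent testing. Below is a minimal Python sketch, with several assumptions. It uses a simplified `key: value` parser instead of a YAML library. It treats "1024 characters total" as the length of the text between the `---` lines. The function names are hypothetical. agentskills.io/specification remains the authority on the actual limits.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")  # letters, digits, hyphens only
MAX_FRONTMATTER_CHARS = 1024  # assumption: measured over the text between the --- lines


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (fields, raw_block) for the leading --- ... --- block.

    Deliberately simplified: understands single-line `key: value` pairs only,
    not full YAML (no lists, quoting rules, or multi-line strings).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is not closed with '---'")
    block = lines[1:end]
    fields: dict[str, str] = {}
    for line in block:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields, "\n".join(block)


def check_frontmatter(text: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    fields, raw = parse_frontmatter(text)
    problems: list[str] = []
    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"missing required field: {required}")
    name = fields.get("name", "")
    if name and not NAME_PATTERN.fullmatch(name):
        problems.append(f"name may only use letters, numbers, and hyphens: {name!r}")
    if len(raw) > MAX_FRONTMATTER_CHARS:
        problems.append(f"frontmatter is {len(raw)} characters (max {MAX_FRONTMATTER_CHARS})")
    return problems
```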

Claude Search Optimization (CSO)

Discovery is Critical: Future Claude instances need to find your skills.

1. Rich Description Field

Purpose: Claude reads descriptions to decide which skills to load for the current task. Let it answer: "Should I read this skill right now?"

Format: Start with "Use when...", focus on trigger conditions

Key: Description = When to use, not what the skill does

The description should only describe trigger conditions. Never summarize the skill's process or workflow in the description.

Why this matters: Testing shows that when descriptions summarize the skill's workflow, Claude may follow the description instead of reading the full skill content. A description stating "Perform code reviews between tasks" caused Claude to do only one review, even though the skill's flowchart clearly showed two reviews (first spec compliance, then code quality).

When the description was changed to only "Use when executing implementation plans with independent tasks in the current session" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

Pitfall: Descriptions that summarize workflows create shortcuts Claude will take. The skill body becomes documentation Claude skips.

```yaml
# Wrong: Summarizes workflow - Claude may follow description instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# Wrong: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# Correct: Only trigger conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# Correct: Only trigger conditions
description: Use when implementing any feature or bugfix, before writing implementation code
```

Content:

Use specific trigger conditions, symptoms, and scenarios to indicate when this skill applies
Describe the problem (race conditions, inconsistent behavior) rather than language-specific symptoms (setTimeout, sleep)
Keep trigger conditions technology-agnostic unless the skill itself is technology-specific
If the skill is technology-specific, clearly state that in the trigger conditions
Write in third person (for injection into system prompts)
Never summarize the skill's process or workflow

```yaml
# Wrong: Too abstract, vague, no when-to-use context
description: For async testing

# Wrong: First person
description: I can help you with async tests when they're flaky

# Wrong: Mentions technology but skill isn't technology-specific
description: Use when tests use setTimeout/sleep and are flaky

# Correct: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# Correct: Technology-specific skill with clear trigger conditions
description: Use when using React Router and handling authentication redirects
```
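A few of these rules are mechanical enough to spot-check. Below is a minimal Python sketch with a hypothetical `check_description` helper. It tests the "Use when" prefix, flags first-person words, and warns past 500 characters. It cannot judge whether a description summarizes a workflow or names a technology the skill doesn't depend on. That still takes a careful read.

```python
import re

# Matches standalone first-person words. "I" is case-sensitive so "i.e." is not flagged.
FIRST_PERSON = re.compile(r"\bI\b|\b(?:[Mm]e|[Mm]y|[Ww]e|[Oo]ur)\b")
MAX_DESCRIPTION_CHARS = 500  # soft target from the frontmatter rules


def check_description(description: str) -> list[str]:
    """Heuristic checks only; they cannot tell whether the text summarizes a workflow."""
    problems: list[str] = []
    if not description.startswith("Use when"):
        problems.append('does not start with "Use when"')
    if FIRST_PERSON.search(description):
        problems.append("uses first person; write in third person")
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(f"{len(description)} characters; aim for under {MAX_DESCRIPTION_CHARS}")
    return problems
```

Run against the examples above, the "Correct" race-condition description produces no problems. "I can help you with async tests when they're flaky" is flagged twice.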

2. Keyword Coverage

Use words Claude will search for:

Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
Symptoms: "flaky", "hanging", "zombie", "pollution"
Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
Tools: Actual commands, library names, file types
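To audit coverage, list the terms you expect people to search for and check which ones the skill never mentions. Below is a minimal Python sketch. The helper name is hypothetical, and a case-insensitive substring test is crude: it won't catch inflections or paraphrases.

```python
def missing_keywords(skill_text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that never appear in the skill text.

    Matching is a case-insensitive substring test: crude, but enough to spot a
    skill that never mentions the error message or symptom people search for.
    """
    text = skill_text.lower()
    return [keyword for keyword in keywords if keyword.lower() not in text]
```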

3. Descriptive Naming

Use active voice, verb-first:

✅ `creating-skills` instead of `skill-creation`

✅ `condition-based-waiting` instead of `async-test-helpers`

4. Token Efficiency (Critical)

Problem: getting-started and frequently referenced skills load into every conversation. Every token matters.

Target word counts:

getting-started workflows: <150 words each
Frequently loaded skills: <200 words total
Other skills: <500 words (still be concise)

Techniques:

Move details to tool help:

```bash
# Wrong: List all parameters in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# Correct: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```

Use cross-references:

```markdown
# Wrong: Repeat workflow details
When searching, dispatch subagents with templates...
[20 lines of repeated instructions]

# Correct: Reference other skills
Always use subagents (saves 50-100x context). Required: Use [other-skill-name] workflow.
```

Compress examples:

```markdown
# Wrong: Verbose example (33 words)
Your partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# Correct: Concise example (17 words)
Partner: "How did we handle authentication errors in React Router before?"
You: Searching...
[Dispatch subagent → integrate]
```

Eliminate redundancy:

Don't repeat content already in cross-referenced skills
Don't explain what's obvious from the command
Don't provide multiple examples for the same pattern

Validation:

```bash
wc -w skills/path/SKILL.md
# getting-started workflows: target <150 each
# other frequently loaded: target <200 total
```
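If you want the same check in a script, for example in CI, here is a minimal Python sketch. The helper names and category keys are hypothetical. `over_budget` sums every text you pass. Pass one file for a per-skill target, or several files for a shared total. Like `wc -w`, it counts whitespace-separated tokens.

```python
# Word targets from the list above, keyed by hypothetical category names.
WORD_TARGETS = {
    "getting-started": 150,
    "frequently-loaded": 200,
    "other": 500,
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, roughly what `wc -w` reports."""
    return len(text.split())


def over_budget(texts: list[str], target: int) -> int:
    """Return how many words the combined texts exceed `target` by (0 if within)."""
    return max(0, sum(word_count(text) for text in texts) - target)
```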

Naming

Name after what you do or the core insight:

✅ `condition-based-waiting` instead of `async-test-helpers`

✅ `using-skills` instead of `skill-usage`

✅ `flatten-with-flags` instead of `data-structure-refactoring`

✅ `root-cause-tracing` instead of `debugging-techniques`

Gerunds (-ing) work well for processes:

`creating-skills`

`testing-skills`

`debugging-with-logs`

They are active and describe what you're doing.

5. Cross-Reference Other Skills

When writing documentation that references other skills:

Only use the skill name with clear required markers:

✅ Good:

**Required Subskill:** Use superpowers:test-driven-development

✅ Good:

**Required Background:** You must understand superpowers:systematic-debugging

❌ Poor:

See skills/testing/test-driven-development

(unclear if required)

❌ Poor:

`@skills/testing/test-driven-development/SKILL.md`

(forces immediate loading, wastes context)

Why no @ links: The `@` syntax immediately forces file loading, consuming 200k+ context before you need it.

Flowchart Usage

```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Might I make a mistake in decision?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Might I make a mistake in decision?" [label="Yes"];
    "Might I make a mistake in decision?" -> "Small inline flowchart" [label="Yes"];
    "Might I make a mistake in decision?" -> "Use markdown" [label="No"];
}
```

Only use flowcharts when:

Non-obvious decision points
Process loops where you might stop early
Deciding "when to use A vs B"

Never use flowcharts for:

Reference material → tables, lists
Code examples → Markdown code blocks
Linear instructions → numbered lists
Labels with no semantic meaning (step1, helper2)

See @graphviz-conventions.dot for graphviz style rules.

Visualize for your partner: Use `render-graphs.js` in this directory to render your skill's flowcharts as SVG:

```bash
./render-graphs.js ../some-skill           # Render each chart separately
./render-graphs.js ../some-skill --combine # Combine all charts into one SVG
```

Code Examples

One excellent example beats multiple mediocre ones

Choose the most relevant language:

Testing techniques → TypeScript/JavaScript
System debugging → Shell/Python
Data processing → Python

Good examples:

Fully runnable
Well-commented, explains why
From real scenarios
Clearly demonstrates the pattern
Can be adapted directly (not generic templates)

Don't:

Implement in more than 5 languages
Create fill-in-the-blank templates
Write artificially constructed examples

You're good at language porting – one excellent example is enough.

File Organization

Self-Contained Skill

defense-in-depth/
  SKILL.md    # All content inline

Applicable when: All content fits, no large reference needed

Skill with Reusable Tools

condition-based-waiting/
  SKILL.md    # Overview + pattern
  example.ts  # Adaptable working code

Applicable when: Tools are reusable code, not just narrative

Skill with Large Reference

pptx/
  SKILL.md       # Overview + workflow
  pptxgenjs.md   # 600-line API reference
  ooxml.md       # 500-line XML structure
  scripts/       # Executable tools

Applicable when: Reference material is too large to inline

Iron Rule (Same as TDD)

No skill without a failing test

This applies to new skills and edits to existing skills.

Wrote the skill first then tested? Delete it. Start over. Edited a skill without testing? Same violation.

No exceptions:

Doesn't apply to "simple additions"
Doesn't apply to "just adding a section"
Doesn't apply to "documentation updates"
Don't keep untested changes as "reference"
Don't "tweak" while running tests
Delete means delete

Required Background: The superpowers:test-driven-development skill explains why this is important. The same principles apply to documentation.

Testing All Skill Types

Different skill types require different testing approaches:

Discipline-Enforcing Skills (Rules/Requirements)

Examples: TDD, verify before completion, design before coding

How to test:

Academic questions: Do they understand the rules?
Stress scenarios: Do they follow under pressure?
Combined multiple stresses: Time + sunk cost + fatigue
Identify rationalizations and add explicit rebuttals

Success criteria: Agent follows rules under maximum pressure
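Enumerating combinations helps ensure the combined-stress scenarios don't cover only the easy pairings. Below is a minimal Python sketch. The stress names come from the testing-methodology list later in this document (time, sunk cost, authority, fatigue). The minimum of three mirrors the checklist's "3+ combined stresses", and the helper name is hypothetical. Each tuple is a seed for one hand-written scenario, not a scenario in itself.

```python
from itertools import combinations

# Stress types named in this document; extend with your own.
STRESSES = ["time", "sunk cost", "authority", "fatigue"]


def combined_stress_sets(stresses: list[str], min_size: int = 3) -> list[tuple[str, ...]]:
    """Return every combination of at least `min_size` distinct stresses."""
    return [
        combo
        for size in range(min_size, len(stresses) + 1)
        for combo in combinations(stresses, size)
    ]
```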

Technical Skills (How-To Guides)

Examples: condition-based-waiting, root-cause-tracing, defensive-programming

How to test:

Application scenarios: Can they apply the technique correctly?
Variant scenarios: Can they handle edge cases?
Missing information test: Do they indicate if something is missing?

Success criteria: Agent successfully applies the technique to new scenarios

Pattern Skills (Mental Models)

Examples: reducing-complexity, information-hiding concepts

How to test:

Recognition scenarios: Can they identify when the pattern applies?
Application scenarios: Can they use the mental model?
Counterexamples: Do they know when not to apply it?

Success criteria: Agent correctly identifies when/how to apply the pattern

Reference Skills (Documentation/API)

Examples: API docs, command references, library guides

How to test:

Retrieval scenarios: Can they find the correct information?
Application scenarios: Can they correctly use what they found?
Coverage tests: Are common use cases covered?

Success criteria: Agent finds and correctly applies reference information

Common Rationalizations for Skipping Tests

Rationalization	Reality
"The skill is obviously clear"	Clear to you ≠ clear to other agents. Test it.
"This is just reference material"	Reference material can have gaps and ambiguities. Test retrieval.
"Testing is overkill"	Untested skills always have issues. 15 minutes of testing saves hours.
"I'll test if there's a problem"	Problem = agent can't use the skill. Test before deployment.
"Testing is too tedious"	Testing is less tedious than debugging bad skills in production.
"I'm confident it's good"	Overconfidence guarantees problems. Test anyway.
"Academic review is enough"	Reading ≠ using. Test application scenarios.
"I don't have time to test"	Deploying an untested skill costs more time in later fixes than testing it now.

All of these mean: Test before deployment. No exceptions.

Making Skills Resistant to Rationalization

Discipline-enforcing skills (like TDD) need to resist rationalization. Agents are smart and will find loopholes under pressure.

Psychology Note: Understanding why persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundations (Cialdini, 2021; Meincke et al., 2025), covering authority, commitment, scarcity, social proof, and belonging principles.

Explicitly Plug Each Loophole

Don't just state the rule – prohibit specific workarounds:

<Bad>

```markdown
Wrote code before tests? Delete it.
```

</Bad>

<Good>

```markdown
Wrote code before tests? Delete it. Start over.

No exceptions:

- Don't keep it as "reference"
- Don't "tweak" it while writing tests
- Don't look at it
- Delete means delete
```

</Good>

Address "Spirit vs Letter" Debates

Add a foundational principle upfront:

```markdown
**Violating the letter of the rule is violating the spirit of the rule.**
```

This cuts off an entire category of "I followed the spirit" rationalizations.

Build a Rationalization Table

Capture rationalizations from baseline tests (see testing section below). Every excuse the agent uses goes into the table:

```markdown
| Rationalization | Reality |
|------|------|
| "Too simple to test" | Simple code still breaks. Testing takes 30 seconds. |
| "I'll test later" | Tests written afterward pass immediately, which proves nothing. |
| "Writing tests later works just as well" | Writing tests later = "What does this do?" Writing tests first = "What should this do?" |
```
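If you log rationalizations for each test iteration, merging them into this table is mechanical. Below is a minimal Python sketch. The input shape (a list of iterations, each a list of `(rationalization, reality)` pairs) and the function names are assumptions. Writing the rebuttal in the Reality column is still your job.

```python
def _cell(text: str) -> str:
    """Escape pipes so a cell cannot break the Markdown table."""
    return text.strip().replace("|", r"\|")


def build_rationalization_table(iterations: list[list[tuple[str, str]]]) -> str:
    """Merge (rationalization, reality) pairs recorded across test iterations.

    Duplicate rationalizations are matched case-insensitively; the first
    rebuttal recorded for a given rationalization is kept.
    """
    rows: dict[str, tuple[str, str]] = {}
    for iteration in iterations:
        for excuse, reality in iteration:
            key = excuse.strip().lower()
            if key not in rows:
                rows[key] = (_cell(excuse), _cell(reality))
    lines = ["| Rationalization | Reality |", "|------|------|"]
    lines += [f'| "{excuse}" | {reality} |' for excuse, reality in rows.values()]
    return "\n".join(lines)
```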

Create a Red Line List

Make it easy for agents to self-check if they're rationalizing:

```markdown
## Red Lines - Stop and Start Over

- Wrote code before tests
- "I've tested manually"
- "Writing tests later works just as well"
- "The spirit matters more than the ritual"
- "This case is different because..."

**All of these mean: Delete the code. Start over with TDD.**
```

Update CSO to Include Violation Symptoms

Add symptoms that signal you're about to violate the rule to the description:

```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```

Red-Green-Refactor for Skills

Follow the TDD cycle:

Red: Write Failing Tests (Baseline)

Run stress scenarios without the skill. Record behavior verbatim:

What choices did they make?
What rationalizations did they use (exact wording)?
Which stresses triggered violations?

This is "observing test failure" – you must see what the agent naturally does before writing the skill.
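If you record each baseline run as a structured entry, you can answer "Which stresses triggered violations?" by counting. Below is a minimal Python sketch. `BaselineRun` and its fields are hypothetical. You fill in the records by hand from transcripts; the sketch does not run any agent. Counts only show correlation: a stress that appears in many scenarios will also appear in many violations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BaselineRun:
    """One stress scenario run WITHOUT the skill, recorded verbatim."""
    scenario: str
    stresses: tuple[str, ...]
    choice: str           # what the agent actually did
    rationalization: str  # exact wording copied from the transcript, not a paraphrase
    violated: bool        # did the agent break the rule?


def stresses_in_violations(runs: list[BaselineRun]) -> Counter:
    """Count how often each stress appears in runs that ended in a violation."""
    return Counter(stress for run in runs if run.violated for stress in run.stresses)
```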

Green: Write Minimal Skill

Write the skill specifically for those rationalizations. Don't add extra content for hypothetical cases.

Run the same scenarios with the skill. The agent should now comply.

Refactor: Plug Loopholes

Did the agent find new rationalizations? Add explicit rebuttals. Retest until it's bulletproof.
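Tracking outcomes per scenario makes "retest until it's bulletproof" concrete. Below is a minimal Python sketch. It assumes you record, for each scenario, whether the agent complied (True) or violated the rule (False), once in the baseline and once with the skill. The function name is hypothetical.

```python
def classify(baseline: dict[str, bool], with_skill: dict[str, bool]) -> dict[str, list[str]]:
    """Bucket scenarios by outcome. True means the agent complied with the rule.

    Only scenarios present in both runs are compared.
    """
    buckets: dict[str, list[str]] = {
        "fixed": [],          # failed without the skill, passes with it
        "still_failing": [],  # a loophole the skill hasn't closed yet
        "regressed": [],      # passed before, fails with the skill
        "always_passed": [],  # never a failing test in the first place
    }
    for scenario in sorted(baseline.keys() & with_skill.keys()):
        before, after = baseline[scenario], with_skill[scenario]
        if not before and after:
            buckets["fixed"].append(scenario)
        elif not before:
            buckets["still_failing"].append(scenario)
        elif not after:
            buckets["regressed"].append(scenario)
        else:
            buckets["always_passed"].append(scenario)
    return buckets
```

Scenarios in `still_failing` or `regressed` need another refactor pass. Scenarios in `always_passed` never failed without the skill. By this document's core principle, they say nothing about whether the skill teaches the right thing.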

Testing Methodology: See @testing-skills-with-subagents.md for complete testing methods:

How to write stress scenarios
Types of stress (time, sunk cost, authority, fatigue)
Systematically plugging loopholes
Meta-testing techniques

Anti-Patterns

Narrative Examples

"In the session on 2025-10-03, we discovered empty projectDir caused..."

Why bad: Too specific, not reusable

Multi-Language Dilution

example-js.js, example-py.py, example-go.go

Why bad: Mediocre quality, maintenance burden

Code in Flowcharts

```dot
step1 [label="import fs"];
step2 [label="read file"];
```

Why bad: Can't copy-paste, hard to read

Generic Labels

helper1, helper2, step3, pattern4

Why bad: Labels should have semantic meaning
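This anti-pattern is easy to scan for. Below is a minimal Python sketch that searches DOT source for generic names like the ones shown above. The prefix list (step, helper, pattern) is an assumption taken from that example, and the function name is hypothetical. It will miss generic names of other shapes, so an empty result means "no obvious problems". It does not prove the labels are good.

```python
import re

# Flags identifiers such as step1, helper2, step3, pattern4 anywhere in DOT source.
GENERIC_LABEL = re.compile(r"\b(?:step|helper|pattern)\d+\b")


def generic_labels(dot_source: str) -> list[str]:
    """Return generic node names/labels in order of first appearance, without repeats."""
    found: list[str] = []
    for match in GENERIC_LABEL.finditer(dot_source):
        if match.group() not in found:
            found.append(match.group())
    return found
```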

Stop: Before Moving to Next Skill

After writing any skill, you must stop and complete the deployment process.

Don't:

Batch-create multiple skills without testing each one
Move to the next skill before validating the current one
Skip testing because "batch processing is more efficient"

The deployment checklist below is mandatory for every skill.

Deploying untested skills = deploying untested code. This is a violation of quality standards.

Skill Creation Checklist (TDD-Adapted)

Important: Use TodoWrite to create todos for each checklist item below.

Red Phase - Write Failing Tests:

Create stress scenarios (3+ combined stresses for discipline skills)
Run scenarios without the skill - record baseline behavior verbatim
Identify patterns in rationalizations

Green Phase - Write Minimal Skill:

Name uses only letters, numbers, hyphens (no parentheses/special characters)
YAML frontmatter includes the required `name` and `description` fields (max 1024 characters; see spec)
Description starts with "Use when..." and includes specific trigger conditions/symptoms
Description is in third person
Full text includes search keywords (errors, symptoms, tools)
Clear overview with core principles
Addresses specific baseline failures identified in red phase
Code is inline or linked to separate files
One excellent example (not multi-language)
Run scenarios with the skill - verify agent now complies

Refactor Phase - Plug Loopholes:

Identify new rationalizations from testing
Add explicit rebuttals (for discipline skills)
Build rationalization table from all test iterations
Create red line list
Retest until bulletproof

Quality Check:

Only use small flowcharts when decisions aren't obvious
Quick reference table
Common mistakes section
No narrative stories
Supporting files only for tools or large reference

Deployment:

Commit skill to git and push to your fork (if configured)
Consider contributing back via PR (if widely applicable)

Discovery Workflow

How future Claude instances find your skills:

Encounter a problem ("Tests are flaky")
Find skill (description matches)
Scan overview (Is this relevant?)
Read pattern (Quick reference table)
Load example (Only when implementing)

Optimize for this flow: put searchable terms upfront and throughout.

Summary

Creating skills is TDD for process documentation.

Same iron rule: No skill without a failing test. Same cycle: Red (baseline) → Green (write skill) → Refactor (plug loopholes). Same benefits: Higher quality, fewer surprises, bulletproof results.

If you follow TDD for code, you should follow it for skills. It's the same discipline applied to documentation.
