writing-skills

Original：🇨🇳 Chinese

Translated

1 scripts

Use when creating new skills, editing existing skills, or verifying if skills are valid before deployment

2installs

Sourcejnmetacode/superpowers-zh

Added on2026-04-19

NPX Install

npx skill4agent add jnmetacode/superpowers-zh writing-skills

SKILL.md Content (Chinese)

View Translation Comparison →

Writing Skills

Overview

Writing skills applies test-driven development to process documentation.

Personal skills are stored in agent-specific directories (Claude Code uses
~/.claude/skills
, Codex uses
~/.agents/skills/
).

You write test cases (stress scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agent follows rules), then refactor (plug gaps).

Core Principle: If you don't observe the agent failing without the skill, you don't know if the skill teaches the right thing.

Required Background: Before using this skill, you must understand superpowers:test-driven-development. That skill defines the basic red-green-refactor cycle. This skill adapts TDD to documentation writing.

Official Guidelines: For Anthropic's official best practices for writing skills, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement this skill's TDD-oriented approach.

What is a Skill?

A Skill is a reference guide for validated techniques, patterns, or tools. Skills help future Claude instances find and apply effective methods.

Skills are: Reusable techniques, patterns, tools, reference guides

Skills are not: Narratives about how you solved a problem once

TDD Mapped to Skills

TDD Concept	Skill Creation
Test Case	Stress scenario with subagents
Production Code	Skill documentation (SKILL.md)
Test Failure (Red)	Agent violates rules without skill (baseline)
Test Pass (Green)	Agent follows rules with skill
Refactor	Plug gaps while maintaining compliance
Write Tests First	Run baseline scenarios before writing skill
Observe Failure	Record exact rationalizations the agent uses
Minimal Code	Write skill targeted at those specific violations
Observe Pass	Verify agent now follows rules
Refactor Cycle	Discover new rationalizations → plug → re-verify

The entire skill creation process follows red-green-refactor.

When to Create a Skill

Create when:

The technique isn't intuitively obvious to you
You'll reference it repeatedly across different projects
The pattern has broad applicability (not project-specific)
Others will also benefit

Don't create:

One-off solutions
Standard practices already well-documented elsewhere
Project-specific conventions (put in CLAUDE.md)
Mechanical constraints (automate if you can enforce with regex/validation — documentation is for scenarios requiring judgment)

Skill Types

Technical

Methods with specific steps (condition-based-waiting, root-cause-tracing)

Pattern

Ways of thinking about problems (flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (office docs)

Directory Structure

skills/
  skill-name/
    SKILL.md              # Primary reference document (required)
    supporting-file.*     # Only if needed

Flat Namespace - All skills exist in a single searchable namespace

When to separate files:

Large reference content (100+ lines) - API docs, comprehensive syntax explanations
Reusable tools - Scripts, utilities, templates

Keep inline:

Principles and concepts
Code patterns (< 50 lines)
Everything else

SKILL.md Structure

Frontmatter (YAML):

Two required fields:
```
name
```
and
```
description
```
(see agentskills.io/specification for full supported fields)
Maximum 1024 characters total
```
name
```
: Use only letters, numbers, and hyphens (no brackets, special characters)
```
description
```
: Third person, describes only when to use (not what it does)
- Start with "Use when...", focus on trigger conditions
- Include specific symptoms, scenarios, and context
- Never summarize the skill's process or workflow (see CSO section for why)
- Try to keep under 500 characters

markdown

---
name: Skill-Name-With-Hyphens
description: Use when [specific trigger conditions and symptoms]
---

# Skill Name

## Overview
What is it? Explain core principles in 1-2 sentences.

## When to Use
[Use small inline flowcharts if decisions aren't obvious]

Bullet list of symptoms and use cases
Scenarios where it doesn't apply

## Core Pattern (Technical/Pattern Types)
Before-and-after code comparison

## Quick Reference
Table or bullet points for quick browsing of common operations

## Implementation
Simple patterns inline code
Large reference or reusable tools link to files

## Common Mistakes
Common issues + fixes

## Practical Outcomes (Optional)
Specific results

Claude Search Optimization (CSO)

Discovery is critical: Future Claude instances need to find your skills

1. Rich Description Field

Purpose: Claude reads descriptions to decide which skills to load for the current task. Let it answer: "Should I read this skill right now?"

Format: Start with "Use when...", focus on trigger conditions

Key: Description = when to use, not what the skill does

Descriptions should only describe trigger conditions. Do not summarize the skill's process or workflow in the description.

Why this matters: Testing shows that when descriptions summarize the skill's workflow, Claude may follow the description instead of reading the full skill. A description stating "conduct code reviews between tasks" caused Claude to only do one review, even though the skill's flowchart clearly showed two reviews (first spec compliance then code quality).

When the description was changed to only "Use when executing implementation plans with independent tasks in the current session" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

Pitfall: Descriptions that summarize workflows create shortcuts Claude will take. The skill body becomes documentation Claude skips.

yaml

# Bad: Summarizes workflow - Claude may follow description instead of skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# Bad: Too many process details
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# Good: Only trigger conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# Good: Only trigger conditions
description: Use when implementing any feature or bugfix, before writing implementation code

Content:

Use specific trigger conditions, symptoms, and scenarios to indicate when this skill applies
Describe problems (race conditions, inconsistent behavior) not language-specific symptoms (setTimeout, sleep)
Keep trigger conditions technology-agnostic unless the skill itself is technology-specific
If the skill is technology-specific, clearly state that in the trigger conditions
Write in third person (injected into system prompts)
Never summarize the skill's process or workflow

yaml

# Bad: Too abstract, vague, no when-to-use
description: For async testing

# Bad: First person
description: I can help you with async tests when they're flaky

# Bad: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# Good: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# Good: Technology-specific skill with clear trigger conditions
description: Use when using React Router and handling authentication redirects

2. Keyword Coverage

Use terms Claude will search for:

Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
Symptoms: "flaky", "hanging", "zombie", "pollution"
Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
Tools: Actual commands, library names, file types

3. Descriptive Naming

Use active voice, verb-first:

✅
```
creating-skills
```
instead of
```
skill-creation
```

✅

condition-based-waiting

instead of

async-test-helpers

4. Token Efficiency (Critical)

Problem: getting-started and frequently referenced skills load into every conversation. Every token matters.

Target word counts:

getting-started workflows: <150 words each
Frequently loaded skills: <200 words total
Other skills: <500 words (still be concise)

Tips:

Move details to tool help:

bash

# Bad: List all parameters in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# Good: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.

Use cross-references:

markdown

# Bad: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# Good: Reference other skills
Always use subagents (saves 50-100x context). Required: Use [other-skill-name] workflow.

Compress examples:

markdown

# Bad: Verbose example (42 words)
Your partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# Good: Streamlined example (20 words)
Partner: "How did we handle authentication errors in React Router before?"
You: Searching...
[Dispatch subagent → integrate]

Eliminate redundancy:

Don't repeat content already in cross-referenced skills
Don't explain things obvious from commands
Don't provide multiple examples for the same pattern

Validation:

bash

wc -w skills/path/SKILL.md
# getting-started workflows: target <150 each
# Other frequently loaded: target <200 total

Name by what you do or core insight:

✅

condition-based-waiting

async-test-helpers

✅
```
using-skills
```
instead of
```
skill-usage
```

✅

flatten-with-flags

data-structure-refactoring

✅
```
root-cause-tracing
```
>
```
debugging-techniques
```

Gerunds (-ing) are good for processes:

creating-skills

testing-skills

debugging-with-logs

Active, describes what you're doing

4. Cross-Reference Other Skills

When writing documentation that references other skills:

Only use the skill name with clear required marking:

✅ Good:

**Required Subskill:** Use superpowers:test-driven-development

✅ Good:

**Required Background:** You must understand superpowers:systematic-debugging

❌ Bad:

See skills/testing/test-driven-development

(unclear if required)

❌ Bad:
```
@skills/testing/test-driven-development/SKILL.md
```
(forces immediate loading, wastes context)

Why no @ links:

syntax forces immediate loading of the file, consuming 200k+ context before you need it.

Flowchart Usage

dot

digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Might I make a mistake in decision?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Might I make a mistake in decision?" [label="Yes"];
    "Might I make a mistake in decision?" -> "Small inline flowchart" [label="Yes"];
    "Might I make a mistake in decision?" -> "Use markdown" [label="No"];
}

Only use flowcharts for:

Non-obvious decision points
Process loops where you might stop early
"When to use A vs B" decisions

Never use flowcharts for:

Reference material → tables, lists
Code examples → Markdown code blocks
Linear instructions → numbered lists
Labels with no semantic meaning (step1, helper2)

See @graphviz-conventions.dot for graphviz style rules.

Visualize for your partner: Use

render-graphs.js

in this directory to render your skill's flowcharts as SVG:

bash

./render-graphs.js ../some-skill           # Render each chart separately
./render-graphs.js ../some-skill --combine # Combine all charts into one SVG

Code Examples

One excellent example beats multiple mediocre ones

Choose the most relevant language:

Testing techniques → TypeScript/JavaScript
System debugging → Shell/Python
Data processing → Python

Good examples:

Fully runnable
Well-commented, explains why
From real scenarios
Clearly demonstrates the pattern
Can be adapted directly (not generic templates)

Don't:

Implement in more than 5 languages
Create fill-in-the-blank templates
Write artificially constructed examples

You're good at language porting — one excellent example is enough.

File Organization

Self-Contained Skill

defense-in-depth/
  SKILL.md    # All content inline

Use when: All content fits, no need for large references

Skill with Reusable Tools

condition-based-waiting/
  SKILL.md    # Overview + pattern
  example.ts  # Adaptable working code

Use when: Tool is reusable code, not just narrative

Skill with Large Reference

pptx/
  SKILL.md       # Overview + workflow
  pptxgenjs.md   # 600-line API reference
  ooxml.md       # 500-line XML structure
  scripts/       # Executable tools

Use when: Reference material is too large to inline

Iron Rule (Same as TDD)

Don't write a skill without a failing test

This applies to new skills and edits to existing skills.

Wrote the skill first then tested? Delete it. Start over. Edited a skill without testing? Also a violation.

No exceptions:

Doesn't apply to "simple additions"
Doesn't apply to "just adding a section"
Doesn't apply to "documentation updates"
Don't keep untested changes as "reference"
Don't "adjust" while running tests
Delete means delete

Required Background: The superpowers:test-driven-development skill explains why this matters. The same principles apply to documentation.

Testing All Skill Types

Different skill types require different testing approaches:

Discipline-Enforcing Skills (Rules/Requirements)

Examples: TDD, pre-completion validation, design before coding

How to test:

Academic questions: Do they understand the rules?
Stress scenarios: Do they follow them under pressure?
Multiple stress combinations: Time + sunk cost + fatigue
Identify rationalizations and add explicit rebuttals

Success criteria: Agent follows rules under maximum pressure

Technical Skills (How-To Guides)

Examples: condition-based-waiting, root-cause-tracing, defensive-programming

How to test:

Application scenarios: Can they apply the technique correctly?
Variant scenarios: Can they handle edge cases?
Missing information tests: Do they indicate when something is missing?

Success criteria: Agent successfully applies technique to new scenarios

Pattern Skills (Mental Models)

Examples: reducing-complexity, information-hiding concepts

How to test:

Recognition scenarios: Can they identify when the pattern applies?
Application scenarios: Can they use the mental model?
Counterexamples: Do they know when not to apply it?

Success criteria: Agent correctly identifies when/how to apply pattern

Reference Skills (Documentation/API)

Examples: API docs, command references, library guides

How to test:

Retrieval scenarios: Can they find the correct information?
Application scenarios: Can they use the found content correctly?
Coverage tests: Are common use cases all covered?

Success criteria: Agent finds and correctly applies reference information

Common Rationalizations for Skipping Tests

Rationalization	Reality
"The skill is obviously clear"	Clear to you ≠ clear to other agents. Test it.
"It's just reference material"	Reference material can have gaps, unclear parts. Test retrieval.
"Testing is overkill"	Untested skills always have issues. 15 minutes of testing saves hours.
"Test when there's a problem"	Problem = agent can't use the skill. Test before deployment.
"Testing is too tedious"	Testing is less tedious than debugging bad skills in production.
"I'm confident it's good"	Overconfidence guarantees problems. Test anyway.
"Academic review is enough"	Reading ≠ using. Test application scenarios.
"No time to test"	Deploying untested skills wastes more time than fixing later.

All of the above mean: Test before deployment. No exceptions.

Making Skills Resist Rationalization

Discipline-enforcing skills (like TDD) need to resist rationalization. Agents are smart and will find loopholes under pressure.

Psychology note: Understanding why persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundations (Cialdini, 2021; Meincke et al., 2025), covering principles of authority, commitment, scarcity, social proof, and belonging.

Explicitly Plug Each Loophole

Don't just state the rule — ban specific workarounds:

<Bad> ```markdown Wrote code before tests? Delete it. ``` </Bad> <Good> ```markdown Wrote code before tests? Delete it. Start over.

No exceptions:

Don't keep it as "reference"
Don't "adjust" it while writing tests
Don't look at it
Delete means delete

</Good>

### Address "Spirit vs Letter" Debates

Add foundational principle upfront:

```markdown
**Violating the letter of the rule is violating the spirit of the rule.**

This cuts off an entire category of "I followed the spirit" rationalizations.

Build a Rationalization Table

Capture rationalizations from baseline tests (see testing section below). Every excuse the agent uses goes into the table:

markdown

| Rationalization | Reality |
|------|------|
| "Too simple to test" | Simple code still breaks. Testing takes 30 seconds. |
| "I'll test later" | Tests passing immediately proves nothing. |
| "Testing after works just as well" | Testing after = "What does this do?" Testing before = "What should this do?" |

Create a Red Line List

Make it easy for agents to self-audit for rationalization:

markdown

## Red Lines - Stop and Start Over

- Wrote code before tests
- "I've manually tested this"
- "Testing after works just as well"
- "The spirit matters more than the ritual"
- "This case is different because..."

**All of the above mean: Delete code. Restart with TDD.**

Update CSO to Include Violation Symptoms

Add to description: Symptoms that you're about to violate the rule:

yaml

description: use when implementing any feature or bugfix, before writing implementation code

Red-Green-Refactor for Skills

Follow the TDD cycle:

Red: Write Failing Test (Baseline)

Run stress scenarios without the skill. Record behavior verbatim:

What choices did they make?
What exact rationalizations did they use?
Which stresses triggered violations?

This is "observing test failure" — you must see how agents naturally behave before writing the skill.

Green: Write Minimal Skill

Write a skill targeted at those specific rationalizations. Don't add extra content for hypothetical cases.

Run the same scenarios with the skill. Agents should now comply.

Refactor: Plug Loopholes

Agents found new rationalizations? Add explicit rebuttals. Retest until unbreakable.

Testing Methodology: See @testing-skills-with-subagents.md for complete testing methods:

How to write stress scenarios
Types of stress (time, sunk cost, authority, fatigue)
Systematically plugging loopholes
Meta-testing techniques

Anti-Patterns

Narrative Examples

"In the 2025-10-03 session, we discovered empty projectDir caused..." Why bad: Too specific, not reusable

Multi-Language Dilution

example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden

Code in Flowcharts

dot

step1 [label="import fs"];
step2 [label="read file"];

Why bad: Can't copy-paste, hard to read

Generic Labels

helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning

Stop: Before Moving to Next Skill

After writing any skill, you must stop and complete the deployment process.

Don't:

Batch-create multiple skills without testing each one
Move to the next skill before validating the current one
Skip testing because "batch processing is more efficient"

The deployment checklist below is mandatory for every skill.

Deploying an untested skill = deploying untested code. This is a violation of quality standards.

Skill Creation Checklist (TDD-Adapted)

Important: Use TodoWrite to create todos for each checklist item below.

Red Phase - Write Failing Test:

Create stress scenarios (3+ combined stresses for discipline skills)
Run scenarios without skill - record baseline behavior verbatim
Identify patterns in rationalizations

Green Phase - Write Minimal Skill:

Name uses only letters, numbers, hyphens (no brackets/special characters)
YAML frontmatter includes required
```
name
```
and
```
description
```
fields (max 1024 characters; see spec)
Description starts with "Use when..." and includes specific trigger conditions/symptoms
Description uses third person
Full text includes search keywords (errors, symptoms, tools)
Clear overview with core principles
Addresses specific baseline failures identified in Red Phase
Code inline or linked to separate files
One excellent example (not multi-language)
Run scenarios with skill - verify agents now comply

Refactor Phase - Plug Loopholes:

Identify new rationalizations from testing
Add explicit rebuttals (discipline skills)
Build rationalization table from all test iterations
Create red line list
Retest until unbreakable

Quality Check:

Only use small flowcharts when decisions aren't obvious
Quick reference table
Common mistakes section
No narrative stories
Supporting files only for tools or large reference

Deployment:

Commit skill to git and push to your fork (if configured)
Consider contributing back via PR (if broadly useful)

Discovery Workflow

How future Claude instances find your skills:

Encounter problem ("Tests are flaky")
Find skill (description matches)
Scan overview (Is this relevant?)
Read pattern (Quick reference table)
Load example (Only when implementing)

Optimize for this flow - Put searchable terms upfront and throughout.

Summary

Creating skills is TDD for process documentation.

Same iron rule: Don't write a skill without a failing test. Same cycle: Red (baseline) → Green (write skill) → Refactor (plug loopholes). Same benefits: Higher quality, fewer surprises, unbreakable results.

If you follow TDD for code, you should follow it for skills too. It's the same discipline applied to documentation.

writing-skills

NPX Install

Tags

SKILL.md Content (Chinese)

Writing Skills

Overview

What is a Skill?

TDD Mapped to Skills

When to Create a Skill

Skill Types

Technical

Pattern

Reference

Directory Structure

SKILL.md Structure

Claude Search Optimization (CSO)

1. Rich Description Field

2. Keyword Coverage

3. Descriptive Naming

4. Token Efficiency (Critical)

4. Cross-Reference Other Skills

Flowchart Usage

Code Examples

File Organization

Self-Contained Skill

Skill with Reusable Tools

Skill with Large Reference

Iron Rule (Same as TDD)

Testing All Skill Types

Discipline-Enforcing Skills (Rules/Requirements)

Technical Skills (How-To Guides)

Pattern Skills (Mental Models)

Reference Skills (Documentation/API)

Common Rationalizations for Skipping Tests

Making Skills Resist Rationalization

Explicitly Plug Each Loophole

Build a Rationalization Table

Create a Red Line List

Update CSO to Include Violation Symptoms

Red-Green-Refactor for Skills

Red: Write Failing Test (Baseline)

Green: Write Minimal Skill

Refactor: Plug Loopholes

Anti-Patterns

Narrative Examples

Multi-Language Dilution

Code in Flowcharts

Generic Labels

Stop: Before Moving to Next Skill

Skill Creation Checklist (TDD-Adapted)

Discovery Workflow

Summary