writing-skills

Original🇨🇳 Chinese
Translated
1 scripts

Use when creating new skills, editing existing skills, or verifying if skills are valid before deployment

2installs
Added on

NPX Install

npx skill4agent add jnmetacode/superpowers-zh writing-skills

SKILL.md Content (Chinese)

View Translation Comparison →

Writing Skills

Overview

Writing skills applies test-driven development to process documentation.
Personal skills are stored in agent-specific directories (Claude Code uses
~/.claude/skills
, Codex uses
~/.agents/skills/
).
You write test cases (stress scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agent follows rules), then refactor (plug gaps).
Core Principle: If you don't observe the agent failing without the skill, you don't know if the skill teaches the right thing.
Required Background: Before using this skill, you must understand superpowers:test-driven-development. That skill defines the basic red-green-refactor cycle. This skill adapts TDD to documentation writing.
Official Guidelines: For Anthropic's official best practices for writing skills, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement this skill's TDD-oriented approach.

What is a Skill?

A Skill is a reference guide for validated techniques, patterns, or tools. Skills help future Claude instances find and apply effective methods.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are not: Narratives about how you solved a problem once

TDD Mapped to Skills

TDD ConceptSkill Creation
Test CaseStress scenario with subagents
Production CodeSkill documentation (SKILL.md)
Test Failure (Red)Agent violates rules without skill (baseline)
Test Pass (Green)Agent follows rules with skill
RefactorPlug gaps while maintaining compliance
Write Tests FirstRun baseline scenarios before writing skill
Observe FailureRecord exact rationalizations the agent uses
Minimal CodeWrite skill targeted at those specific violations
Observe PassVerify agent now follows rules
Refactor CycleDiscover new rationalizations → plug → re-verify
The entire skill creation process follows red-green-refactor.

When to Create a Skill

Create when:
  • The technique isn't intuitively obvious to you
  • You'll reference it repeatedly across different projects
  • The pattern has broad applicability (not project-specific)
  • Others will also benefit
Don't create:
  • One-off solutions
  • Standard practices already well-documented elsewhere
  • Project-specific conventions (put in CLAUDE.md)
  • Mechanical constraints (automate if you can enforce with regex/validation — documentation is for scenarios requiring judgment)

Skill Types

Technical

Methods with specific steps (condition-based-waiting, root-cause-tracing)

Pattern

Ways of thinking about problems (flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (office docs)

Directory Structure

skills/
  skill-name/
    SKILL.md              # Primary reference document (required)
    supporting-file.*     # Only if needed
Flat Namespace - All skills exist in a single searchable namespace
When to separate files:
  1. Large reference content (100+ lines) - API docs, comprehensive syntax explanations
  2. Reusable tools - Scripts, utilities, templates
Keep inline:
  • Principles and concepts
  • Code patterns (< 50 lines)
  • Everything else

SKILL.md Structure

Frontmatter (YAML):
  • Two required fields:
    name
    and
    description
    (see agentskills.io/specification for full supported fields)
  • Maximum 1024 characters total
  • name
    : Use only letters, numbers, and hyphens (no brackets, special characters)
  • description
    : Third person, describes only when to use (not what it does)
    • Start with "Use when...", focus on trigger conditions
    • Include specific symptoms, scenarios, and context
    • Never summarize the skill's process or workflow (see CSO section for why)
    • Try to keep under 500 characters
markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific trigger conditions and symptoms]
---

# Skill Name

## Overview
What is it? Explain core principles in 1-2 sentences.

## When to Use
[Use small inline flowcharts if decisions aren't obvious]

Bullet list of symptoms and use cases
Scenarios where it doesn't apply

## Core Pattern (Technical/Pattern Types)
Before-and-after code comparison

## Quick Reference
Table or bullet points for quick browsing of common operations

## Implementation
Simple patterns inline code
Large reference or reusable tools link to files

## Common Mistakes
Common issues + fixes

## Practical Outcomes (Optional)
Specific results

Claude Search Optimization (CSO)

Discovery is critical: Future Claude instances need to find your skills

1. Rich Description Field

Purpose: Claude reads descriptions to decide which skills to load for the current task. Let it answer: "Should I read this skill right now?"
Format: Start with "Use when...", focus on trigger conditions
Key: Description = when to use, not what the skill does
Descriptions should only describe trigger conditions. Do not summarize the skill's process or workflow in the description.
Why this matters: Testing shows that when descriptions summarize the skill's workflow, Claude may follow the description instead of reading the full skill. A description stating "conduct code reviews between tasks" caused Claude to only do one review, even though the skill's flowchart clearly showed two reviews (first spec compliance then code quality).
When the description was changed to only "Use when executing implementation plans with independent tasks in the current session" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.
Pitfall: Descriptions that summarize workflows create shortcuts Claude will take. The skill body becomes documentation Claude skips.
yaml
# Bad: Summarizes workflow - Claude may follow description instead of skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# Bad: Too many process details
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# Good: Only trigger conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# Good: Only trigger conditions
description: Use when implementing any feature or bugfix, before writing implementation code
Content:
  • Use specific trigger conditions, symptoms, and scenarios to indicate when this skill applies
  • Describe problems (race conditions, inconsistent behavior) not language-specific symptoms (setTimeout, sleep)
  • Keep trigger conditions technology-agnostic unless the skill itself is technology-specific
  • If the skill is technology-specific, clearly state that in the trigger conditions
  • Write in third person (injected into system prompts)
  • Never summarize the skill's process or workflow
yaml
# Bad: Too abstract, vague, no when-to-use
description: For async testing

# Bad: First person
description: I can help you with async tests when they're flaky

# Bad: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# Good: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# Good: Technology-specific skill with clear trigger conditions
description: Use when using React Router and handling authentication redirects

2. Keyword Coverage

Use terms Claude will search for:
  • Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
  • Symptoms: "flaky", "hanging", "zombie", "pollution"
  • Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
  • Tools: Actual commands, library names, file types

3. Descriptive Naming

Use active voice, verb-first:
  • creating-skills
    instead of
    skill-creation
  • condition-based-waiting
    instead of
    async-test-helpers

4. Token Efficiency (Critical)

Problem: getting-started and frequently referenced skills load into every conversation. Every token matters.
Target word counts:
  • getting-started workflows: <150 words each
  • Frequently loaded skills: <200 words total
  • Other skills: <500 words (still be concise)
Tips:
Move details to tool help:
bash
# Bad: List all parameters in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# Good: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
Use cross-references:
markdown
# Bad: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# Good: Reference other skills
Always use subagents (saves 50-100x context). Required: Use [other-skill-name] workflow.
Compress examples:
markdown
# Bad: Verbose example (42 words)
Your partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# Good: Streamlined example (20 words)
Partner: "How did we handle authentication errors in React Router before?"
You: Searching...
[Dispatch subagent → integrate]
Eliminate redundancy:
  • Don't repeat content already in cross-referenced skills
  • Don't explain things obvious from commands
  • Don't provide multiple examples for the same pattern
Validation:
bash
wc -w skills/path/SKILL.md
# getting-started workflows: target <150 each
# Other frequently loaded: target <200 total
Name by what you do or core insight:
  • condition-based-waiting
    >
    async-test-helpers
  • using-skills
    instead of
    skill-usage
  • flatten-with-flags
    >
    data-structure-refactoring
  • root-cause-tracing
    >
    debugging-techniques
Gerunds (-ing) are good for processes:
  • creating-skills
    ,
    testing-skills
    ,
    debugging-with-logs
  • Active, describes what you're doing

4. Cross-Reference Other Skills

When writing documentation that references other skills:
Only use the skill name with clear required marking:
  • ✅ Good:
    **Required Subskill:** Use superpowers:test-driven-development
  • ✅ Good:
    **Required Background:** You must understand superpowers:systematic-debugging
  • ❌ Bad:
    See skills/testing/test-driven-development
    (unclear if required)
  • ❌ Bad:
    @skills/testing/test-driven-development/SKILL.md
    (forces immediate loading, wastes context)
Why no @ links:
@
syntax forces immediate loading of the file, consuming 200k+ context before you need it.

Flowchart Usage

dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Might I make a mistake in decision?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Might I make a mistake in decision?" [label="Yes"];
    "Might I make a mistake in decision?" -> "Small inline flowchart" [label="Yes"];
    "Might I make a mistake in decision?" -> "Use markdown" [label="No"];
}
Only use flowcharts for:
  • Non-obvious decision points
  • Process loops where you might stop early
  • "When to use A vs B" decisions
Never use flowcharts for:
  • Reference material → tables, lists
  • Code examples → Markdown code blocks
  • Linear instructions → numbered lists
  • Labels with no semantic meaning (step1, helper2)
See @graphviz-conventions.dot for graphviz style rules.
Visualize for your partner: Use
render-graphs.js
in this directory to render your skill's flowcharts as SVG:
bash
./render-graphs.js ../some-skill           # Render each chart separately
./render-graphs.js ../some-skill --combine # Combine all charts into one SVG

Code Examples

One excellent example beats multiple mediocre ones
Choose the most relevant language:
  • Testing techniques → TypeScript/JavaScript
  • System debugging → Shell/Python
  • Data processing → Python
Good examples:
  • Fully runnable
  • Well-commented, explains why
  • From real scenarios
  • Clearly demonstrates the pattern
  • Can be adapted directly (not generic templates)
Don't:
  • Implement in more than 5 languages
  • Create fill-in-the-blank templates
  • Write artificially constructed examples
You're good at language porting — one excellent example is enough.

File Organization

Self-Contained Skill

defense-in-depth/
  SKILL.md    # All content inline
Use when: All content fits, no need for large references

Skill with Reusable Tools

condition-based-waiting/
  SKILL.md    # Overview + pattern
  example.ts  # Adaptable working code
Use when: Tool is reusable code, not just narrative

Skill with Large Reference

pptx/
  SKILL.md       # Overview + workflow
  pptxgenjs.md   # 600-line API reference
  ooxml.md       # 500-line XML structure
  scripts/       # Executable tools
Use when: Reference material is too large to inline

Iron Rule (Same as TDD)

Don't write a skill without a failing test
This applies to new skills and edits to existing skills.
Wrote the skill first then tested? Delete it. Start over. Edited a skill without testing? Also a violation.
No exceptions:
  • Doesn't apply to "simple additions"
  • Doesn't apply to "just adding a section"
  • Doesn't apply to "documentation updates"
  • Don't keep untested changes as "reference"
  • Don't "adjust" while running tests
  • Delete means delete
Required Background: The superpowers:test-driven-development skill explains why this matters. The same principles apply to documentation.

Testing All Skill Types

Different skill types require different testing approaches:

Discipline-Enforcing Skills (Rules/Requirements)

Examples: TDD, pre-completion validation, design before coding
How to test:
  • Academic questions: Do they understand the rules?
  • Stress scenarios: Do they follow them under pressure?
  • Multiple stress combinations: Time + sunk cost + fatigue
  • Identify rationalizations and add explicit rebuttals
Success criteria: Agent follows rules under maximum pressure

Technical Skills (How-To Guides)

Examples: condition-based-waiting, root-cause-tracing, defensive-programming
How to test:
  • Application scenarios: Can they apply the technique correctly?
  • Variant scenarios: Can they handle edge cases?
  • Missing information tests: Do they indicate when something is missing?
Success criteria: Agent successfully applies technique to new scenarios

Pattern Skills (Mental Models)

Examples: reducing-complexity, information-hiding concepts
How to test:
  • Recognition scenarios: Can they identify when the pattern applies?
  • Application scenarios: Can they use the mental model?
  • Counterexamples: Do they know when not to apply it?
Success criteria: Agent correctly identifies when/how to apply pattern

Reference Skills (Documentation/API)

Examples: API docs, command references, library guides
How to test:
  • Retrieval scenarios: Can they find the correct information?
  • Application scenarios: Can they use the found content correctly?
  • Coverage tests: Are common use cases all covered?
Success criteria: Agent finds and correctly applies reference information

Common Rationalizations for Skipping Tests

RationalizationReality
"The skill is obviously clear"Clear to you ≠ clear to other agents. Test it.
"It's just reference material"Reference material can have gaps, unclear parts. Test retrieval.
"Testing is overkill"Untested skills always have issues. 15 minutes of testing saves hours.
"Test when there's a problem"Problem = agent can't use the skill. Test before deployment.
"Testing is too tedious"Testing is less tedious than debugging bad skills in production.
"I'm confident it's good"Overconfidence guarantees problems. Test anyway.
"Academic review is enough"Reading ≠ using. Test application scenarios.
"No time to test"Deploying untested skills wastes more time than fixing later.
All of the above mean: Test before deployment. No exceptions.

Making Skills Resist Rationalization

Discipline-enforcing skills (like TDD) need to resist rationalization. Agents are smart and will find loopholes under pressure.
Psychology note: Understanding why persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundations (Cialdini, 2021; Meincke et al., 2025), covering principles of authority, commitment, scarcity, social proof, and belonging.

Explicitly Plug Each Loophole

Don't just state the rule — ban specific workarounds:
<Bad> ```markdown Wrote code before tests? Delete it. ``` </Bad> <Good> ```markdown Wrote code before tests? Delete it. Start over.
No exceptions:
  • Don't keep it as "reference"
  • Don't "adjust" it while writing tests
  • Don't look at it
  • Delete means delete
</Good>

### Address "Spirit vs Letter" Debates

Add foundational principle upfront:

```markdown
**Violating the letter of the rule is violating the spirit of the rule.**
This cuts off an entire category of "I followed the spirit" rationalizations.

Build a Rationalization Table

Capture rationalizations from baseline tests (see testing section below). Every excuse the agent uses goes into the table:
markdown
| Rationalization | Reality |
|------|------|
| "Too simple to test" | Simple code still breaks. Testing takes 30 seconds. |
| "I'll test later" | Tests passing immediately proves nothing. |
| "Testing after works just as well" | Testing after = "What does this do?" Testing before = "What should this do?" |

Create a Red Line List

Make it easy for agents to self-audit for rationalization:
markdown
## Red Lines - Stop and Start Over

- Wrote code before tests
- "I've manually tested this"
- "Testing after works just as well"
- "The spirit matters more than the ritual"
- "This case is different because..."

**All of the above mean: Delete code. Restart with TDD.**

Update CSO to Include Violation Symptoms

Add to description: Symptoms that you're about to violate the rule:
yaml
description: use when implementing any feature or bugfix, before writing implementation code

Red-Green-Refactor for Skills

Follow the TDD cycle:

Red: Write Failing Test (Baseline)

Run stress scenarios without the skill. Record behavior verbatim:
  • What choices did they make?
  • What exact rationalizations did they use?
  • Which stresses triggered violations?
This is "observing test failure" — you must see how agents naturally behave before writing the skill.

Green: Write Minimal Skill

Write a skill targeted at those specific rationalizations. Don't add extra content for hypothetical cases.
Run the same scenarios with the skill. Agents should now comply.

Refactor: Plug Loopholes

Agents found new rationalizations? Add explicit rebuttals. Retest until unbreakable.
Testing Methodology: See @testing-skills-with-subagents.md for complete testing methods:
  • How to write stress scenarios
  • Types of stress (time, sunk cost, authority, fatigue)
  • Systematically plugging loopholes
  • Meta-testing techniques

Anti-Patterns

Narrative Examples

"In the 2025-10-03 session, we discovered empty projectDir caused..." Why bad: Too specific, not reusable

Multi-Language Dilution

example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden

Code in Flowcharts

dot
step1 [label="import fs"];
step2 [label="read file"];
Why bad: Can't copy-paste, hard to read

Generic Labels

helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning

Stop: Before Moving to Next Skill

After writing any skill, you must stop and complete the deployment process.
Don't:
  • Batch-create multiple skills without testing each one
  • Move to the next skill before validating the current one
  • Skip testing because "batch processing is more efficient"
The deployment checklist below is mandatory for every skill.
Deploying an untested skill = deploying untested code. This is a violation of quality standards.

Skill Creation Checklist (TDD-Adapted)

Important: Use TodoWrite to create todos for each checklist item below.
Red Phase - Write Failing Test:
  • Create stress scenarios (3+ combined stresses for discipline skills)
  • Run scenarios without skill - record baseline behavior verbatim
  • Identify patterns in rationalizations
Green Phase - Write Minimal Skill:
  • Name uses only letters, numbers, hyphens (no brackets/special characters)
  • YAML frontmatter includes required
    name
    and
    description
    fields (max 1024 characters; see spec)
  • Description starts with "Use when..." and includes specific trigger conditions/symptoms
  • Description uses third person
  • Full text includes search keywords (errors, symptoms, tools)
  • Clear overview with core principles
  • Addresses specific baseline failures identified in Red Phase
  • Code inline or linked to separate files
  • One excellent example (not multi-language)
  • Run scenarios with skill - verify agents now comply
Refactor Phase - Plug Loopholes:
  • Identify new rationalizations from testing
  • Add explicit rebuttals (discipline skills)
  • Build rationalization table from all test iterations
  • Create red line list
  • Retest until unbreakable
Quality Check:
  • Only use small flowcharts when decisions aren't obvious
  • Quick reference table
  • Common mistakes section
  • No narrative stories
  • Supporting files only for tools or large reference
Deployment:
  • Commit skill to git and push to your fork (if configured)
  • Consider contributing back via PR (if broadly useful)

Discovery Workflow

How future Claude instances find your skills:
  1. Encounter problem ("Tests are flaky")
  2. Find skill (description matches)
  3. Scan overview (Is this relevant?)
  4. Read pattern (Quick reference table)
  5. Load example (Only when implementing)
Optimize for this flow - Put searchable terms upfront and throughout.

Summary

Creating skills is TDD for process documentation.
Same iron rule: Don't write a skill without a failing test. Same cycle: Red (baseline) → Green (write skill) → Refactor (plug loopholes). Same benefits: Higher quality, fewer surprises, unbreakable results.
If you follow TDD for code, you should follow it for skills too. It's the same discipline applied to documentation.