speckit-checklist


Spec Kit Checklist Skill


When to Use


  • You need a requirements-quality checklist tailored to a feature or domain.

Inputs


  • The user's request describing the checklist focus and scope.
  • Existing artifacts (spec/plan/tasks) for context when available.
If the request is empty or unclear, ask a targeted question before continuing.

Checklist Purpose: "Unit Tests for English"


CRITICAL CONCEPT: Checklists are UNIT TESTS FOR REQUIREMENTS WRITING - they validate the quality, clarity, and completeness of requirements in a given domain.
NOT for verification/testing:
  • ❌ NOT "Verify the button clicks correctly"
  • ❌ NOT "Test error handling works"
  • ❌ NOT "Confirm the API returns 200"
  • ❌ NOT checking if code/implementation matches the spec
FOR requirements quality validation:
  • ✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
  • ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
  • ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
  • ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
  • ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)
Metaphor: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.

Workflow


  1. Setup: Run .specify/scripts/bash/check-prerequisites.sh --json from repo root and parse its JSON output for FEATURE_DIR and the AVAILABLE_DOCS list.
    • All file paths must be absolute.
    • For single quotes in args like "I'm Groot", use escape syntax: e.g., 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
  2. Clarify intent (dynamic): Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST:
    • Be generated from the user's phrasing + extracted signals from spec/plan/tasks
    • Only ask about information that materially changes checklist content
    • Be skipped individually if already unambiguous in the user's request
    • Prefer precision over breadth
    Generation algorithm:
    1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts").
    2. Cluster signals into candidate focus areas (max 4) ranked by relevance.
    3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit.
    4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria.
    5. Formulate questions chosen from these archetypes:
      • Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?")
      • Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?")
      • Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?")
      • Audience framing (e.g., "Will this be used by the author only or peers during PR review?")
      • Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?")
      • Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?")
    Question formatting rules:
    • If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters
    • Limit to A–E options maximum; omit table if a free-form answer is clearer
    • Never ask the user to restate what they already said
    • Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope."
    Defaults when interaction impossible:
    • Depth: Standard
    • Audience: Reviewer (PR) if code-related; Author otherwise
    • Focus: Top 2 relevance clusters
    Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow-ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more.
  3. Understand user request: Combine the user's request + clarifying answers:
    • Derive checklist theme (e.g., security, review, deploy, ux)
    • Consolidate explicit must-have items mentioned by user
    • Map focus selections to category scaffolding
    • Infer any missing context from spec/plan/tasks (do NOT hallucinate)
  4. Load feature context: Read from FEATURE_DIR:
    • spec.md: Feature requirements and scope
    • plan.md (if exists): Technical details, dependencies
    • tasks.md (if exists): Implementation tasks
    Context Loading Strategy:
    • Load only necessary portions relevant to active focus areas (avoid full-file dumping)
    • Prefer summarizing long sections into concise scenario/requirement bullets
    • Use progressive disclosure: add follow-on retrieval only if gaps detected
    • If source docs are large, generate interim summary items instead of embedding raw text
  5. Generate checklist - Create "Unit Tests for Requirements":
    • Create the FEATURE_DIR/checklists/ directory if it doesn't exist
    • Generate unique checklist filename:
      • Use a short, descriptive name based on domain (e.g., ux.md, api.md, security.md)
      • Format: [domain].md
      • If the file already exists, append to it rather than overwriting
    • Number items sequentially starting from CHK001
    • Each checklist run creates a new file (or appends when the name already exists); it never overwrites existing checklist content
    CORE PRINCIPLE - Test the Requirements, Not the Implementation: Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for:
    • Completeness: Are all necessary requirements present?
    • Clarity: Are requirements unambiguous and specific?
    • Consistency: Do requirements align with each other?
    • Measurability: Can requirements be objectively verified?
    • Coverage: Are all scenarios/edge cases addressed?
    Category Structure - Group items by requirement quality dimensions:
    • Requirement Completeness (Are all necessary requirements documented?)
    • Requirement Clarity (Are requirements specific and unambiguous?)
    • Requirement Consistency (Do requirements align without conflicts?)
    • Acceptance Criteria Quality (Are success criteria measurable?)
    • Scenario Coverage (Are all flows/cases addressed?)
    • Edge Case Coverage (Are boundary conditions defined?)
    • Non-Functional Requirements (Performance, Security, Accessibility, etc. - are they specified?)
    • Dependencies & Assumptions (Are they documented and validated?)
    • Ambiguities & Conflicts (What needs clarification?)
    HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English":
    WRONG (Testing implementation):
    • "Verify landing page displays 3 episode cards"
    • "Test hover states work on desktop"
    • "Confirm logo click navigates home"
    CORRECT (Testing requirements quality):
    • "Are the exact number and layout of featured episodes specified?" [Completeness]
    • "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity]
    • "Are hover state requirements consistent across all interactive elements?" [Consistency]
    • "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
    • "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
    • "Are loading states defined for asynchronous episode data?" [Completeness]
    • "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
    ITEM STRUCTURE: Each item should follow this pattern:
    • Question format asking about requirement quality
    • Focus on what's WRITTEN (or not written) in the spec/plan
    • Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.]
    • Reference the spec section [Spec §X.Y] when checking existing requirements
    • Use the [Gap] marker when checking for missing requirements
    EXAMPLES BY QUALITY DIMENSION:
    Completeness:
    • "Are error handling requirements defined for all API failure modes? [Gap]"
    • "Are accessibility requirements specified for all interactive elements? [Completeness]"
    • "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
    Clarity:
    • "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
    • "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
    • "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
    Consistency:
    • "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
    • "Are card component requirements consistent between landing and detail pages? [Consistency]"
    Coverage:
    • "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
    • "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
    • "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
    Measurability:
    • "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
    • "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
    Scenario Classification & Coverage (Requirements Quality Focus):
    • Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
    • For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
    • If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]"
    • Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]"
    Traceability Requirements:
    • MINIMUM: ≥80% of items MUST include at least one traceability reference
    • Each item should reference a spec section [Spec §X.Y], or use markers: [Gap], [Ambiguity], [Conflict], [Assumption]
    • If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
    Surface & Resolve Issues (Requirements Quality Problems): Ask questions about the requirements themselves:
    • Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]"
    • Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]"
    • Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
    • Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
    • Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
    Content Consolidation:
    • Soft cap: If raw candidate items > 40, prioritize by risk/impact
    • Merge near-duplicates checking the same requirement aspect
    • If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]"
    🚫 ABSOLUTELY PROHIBITED - These make it an implementation test, not a requirements test:
    • ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior
    • ❌ References to code execution, user actions, system behavior
    • ❌ "Displays correctly", "works properly", "functions as expected"
    • ❌ "Click", "navigate", "render", "load", "execute"
    • ❌ Test cases, test plans, QA procedures
    • ❌ Implementation details (frameworks, APIs, algorithms)
    ✅ REQUIRED PATTERNS - These test requirements quality:
    • ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
    • ✅ "Is [vague term] quantified/clarified with specific criteria?"
    • ✅ "Are requirements consistent between [section A] and [section B]?"
    • ✅ "Can [requirement] be objectively measured/verified?"
    • ✅ "Are [edge cases/scenarios] addressed in requirements?"
    • ✅ "Does the spec define [missing aspect]?"
  6. Structure Reference: Generate the checklist following the canonical template in .specify/templates/checklist-template.md for title, meta section, category headings, and ID formatting. If the template is unavailable, use: an H1 title, purpose/created meta lines, and ## category sections containing - [ ] CHK### <requirement item> lines with globally incrementing IDs starting at CHK001.
  7. Report: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize:
    • Focus areas selected
    • Depth level
    • Actor/timing
    • Any explicit user-specified must-have items incorporated
Important: Each checklist run creates a checklist file using a short, descriptive name unless the file already exists. This allows:
  • Multiple checklists of different types (e.g., ux.md, test.md, security.md)
  • Simple, memorable filenames that indicate checklist purpose
  • Easy identification and navigation in the checklists/ folder
To avoid clutter, use descriptive types and clean up obsolete checklists when done.
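The prohibited/required patterns and the ≥80% traceability minimum in the generation step are mechanical enough to lint automatically. A minimal Python sketch of such a check; the lint_checklist helper and the "- [ ] CHK### - <item>" line shape (taken from the anti-examples below) are assumptions for illustration, not part of Spec Kit:

```python
import re

# Prohibited pattern: items must not open with implementation-test verbs.
IMPLEMENTATION_VERBS = re.compile(r"^(Verify|Test|Confirm|Check)\b", re.IGNORECASE)
# A traceability reference: a spec section citation or one of the quality markers.
TRACEABILITY = re.compile(r"Spec §|\b(Gap|Ambiguity|Conflict|Assumption)\b")
# Checklist item line shape, as used in the anti-examples (hypothetical separator).
ITEM = re.compile(r"^- \[ \] CHK(\d{3}) - (.+)$")

def lint_checklist(lines):
    """Return a list of quality problems found in checklist item lines."""
    problems, items, traced = [], 0, 0
    for line in lines:
        match = ITEM.match(line.strip())
        if not match:
            continue  # headings, meta lines, blanks
        items += 1
        chk_id, text = match.groups()
        if IMPLEMENTATION_VERBS.match(text):
            problems.append(f"CHK{chk_id}: implementation test, not a requirements test")
        if TRACEABILITY.search(text):
            traced += 1
    if items and traced / items < 0.8:  # the >=80% traceability minimum
        problems.append(f"traceability {traced}/{items} below the 80% minimum")
    return problems

print(lint_checklist([
    "- [ ] CHK001 - Are loading states defined for asynchronous episode data? [Gap]",
    "- [ ] CHK002 - Verify the button clicks correctly [Spec §FR-1]",
]))  # → ['CHK002: implementation test, not a requirements test']
```

A real implementation would also need to dedupe near-identical items and enforce the 40-item soft cap, which require judgment rather than regexes.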

Example Checklist Types & Sample Items


UX Requirements Quality (ux.md):
Sample items (testing the requirements, NOT the implementation):
  • "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]"
  • "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]"
  • "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]"
  • "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]"
  • "Is fallback behavior defined when images fail to load? [Edge Case, Gap]"
  • "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]"
API Requirements Quality (api.md):
Sample items:
  • "Are error response formats specified for all failure scenarios? [Completeness]"
  • "Are rate limiting requirements quantified with specific thresholds? [Clarity]"
  • "Are authentication requirements consistent across all endpoints? [Consistency]"
  • "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]"
  • "Is versioning strategy documented in requirements? [Gap]"
Performance Requirements Quality (performance.md):
Sample items:
  • "Are performance requirements quantified with specific metrics? [Clarity]"
  • "Are performance targets defined for all critical user journeys? [Coverage]"
  • "Are performance requirements under different load conditions specified? [Completeness]"
  • "Can performance requirements be objectively measured? [Measurability]"
  • "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]"
Security Requirements Quality (security.md):
Sample items:
  • "Are authentication requirements specified for all protected resources? [Coverage]"
  • "Are data protection requirements defined for sensitive information? [Completeness]"
  • "Is the threat model documented and requirements aligned to it? [Traceability]"
  • "Are security requirements consistent with compliance obligations? [Consistency]"
  • "Are security failure/breach response requirements defined? [Gap, Exception Flow]"
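Combining the canonical structure from the workflow's Structure Reference step with the ux.md samples above, a generated checklist file might look like the following (headings, meta values, and spec section numbers are illustrative, not prescribed by Spec Kit):

```markdown
# UX Requirements Quality Checklist

**Purpose**: Validate completeness and clarity of UX requirements for this feature
**Created**: 2025-10-08

## Requirement Completeness

- [ ] CHK001 - Are accessibility requirements specified for all interactive elements? [Coverage, Gap]
- [ ] CHK002 - Is fallback behavior defined when images fail to load? [Edge Case, Gap]

## Requirement Clarity

- [ ] CHK003 - Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]
- [ ] CHK004 - Can "prominent display" be objectively measured? [Measurability, Spec §FR-4]
```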

Anti-Examples: What NOT To Do


❌ WRONG - These test implementation, not requirements:

```markdown
- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001]
- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003]
- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010]
- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005]
```

✅ CORRECT - These test requirements quality:

```markdown
- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001]
- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003]
- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010]
- [ ] CHK004 - Are the selection criteria for related episodes documented? [Gap, Spec §FR-005]
- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
```
Key Differences:
  • Wrong: Tests if the system works correctly
  • Correct: Tests if the requirements are written correctly
  • Wrong: Verification of behavior
  • Correct: Validation of requirement quality
  • Wrong: "Does it do X?"
  • Correct: "Is X clearly specified?"

Outputs


  • specs/<feature>/checklists/<domain>.md (new checklist file per run)