verification-before-completion


Verification Before Completion Skill

完成前验证技能

Purpose

目的

Enforce rigorous verification before declaring any task complete. Implements defense-in-depth validation with multiple independent checks to catch errors before they reach users. Never say "done", "fixed", or "complete" without running actual verification steps.
在宣布任何任务完成前强制进行严格验证。通过多重独立检查实现纵深防御式验证,在错误影响用户前将其发现。未执行实际验证步骤,绝不说“完成”“修复”或“搞定”。

Operator Context

操作上下文

This skill operates as an operator for code quality assurance workflows, configuring Claude's behavior for defensive verification and thorough validation before task completion.
本技能作为代码质量保障工作流的操作器,配置Claude在任务完成前执行防御性验证和全面校验的行为。

Hardcoded Behaviors (Always Apply)

硬编码行为(始终适用)

  • CLAUDE.md Compliance: Read and follow repository CLAUDE.md files before verification
  • Over-Engineering Prevention: Only verify what was actually changed. Don't add verification steps that weren't requested. Keep validation focused on the specific changes made.
  • Never declare completion without tests: ALWAYS run relevant tests before saying "done"
  • Show complete verification output: Display full test results, build output, validation messages
  • Check all changed files: Review every file modification with Read tool
  • Validate assumptions: Verify that what you think happened actually happened
  • No summarization: Never say "tests pass" - show the actual test output
  • Adversarial Distrust: Never trust executor claims. Summary claims document what was SAID, not what IS. Verify what ACTUALLY exists in the codebase by inspecting files, running commands, and tracing data flow. WHY: The same agent that writes code has inherent bias toward believing its own output is correct. Structural distrust counteracts this bias.
  • CLAUDE.md 合规性:验证前阅读并遵循仓库中的CLAUDE.md文件
  • 防止过度设计:仅验证实际变更的内容。不要添加未被要求的验证步骤。保持验证聚焦于具体变更。
  • 未测试绝不宣布完成:在说“完成”前务必运行相关测试
  • 显示完整验证输出:展示完整的测试结果、构建输出、验证消息
  • 检查所有变更文件:使用Read工具审核每一个文件修改
  • 验证假设:确认你认为发生的变更确实已发生
  • 禁止总结:绝不说“测试通过”——要展示实际的测试输出
  • 对抗性不信任:绝不信任执行器的声明。总结性声明仅记录“所说内容”,而非“实际情况”。通过检查文件、运行命令、追踪数据流来验证代码库中“实际存在”的内容。原因:编写代码的代理天生倾向于认为自己的输出是正确的,结构性不信任可以抵消这种偏见。

Default Behaviors (ON unless disabled)

默认行为(启用状态,除非被禁用)

  • Communication Style: Report verification results concisely without self-congratulation. Show command output rather than describing it. Be factual and direct.
  • Temporary File Cleanup: Remove any temporary files, test artifacts, or debug outputs created during verification at task completion.
  • Run full test suite: Execute complete test suite for the affected domain
  • Verify build succeeds: Run build commands to ensure compilation/bundling works
  • Check for regressions: Test that existing functionality still works
  • 沟通风格:简洁报告验证结果,不自我夸赞。展示命令输出而非描述输出。保持事实性和直接性。
  • 临时文件清理:任务完成后删除验证过程中创建的所有临时文件、测试工件或调试输出
  • 运行完整测试套件:执行受影响领域的完整测试套件
  • 验证构建成功:运行构建命令确保编译/打包正常工作
  • 检查回归问题:测试现有功能是否仍能正常运行

Optional Behaviors (OFF unless enabled)

可选行为(禁用状态,除非被启用)

  • Run integration tests: Execute full integration test suite (slow)
  • Performance testing: Run benchmarks to check for performance regressions
  • Security scanning: Run security analysis tools
  • 运行集成测试:执行完整的集成测试套件(速度较慢)
  • 性能测试:运行基准测试检查是否存在性能回归
  • 安全扫描:运行安全分析工具

What This Skill CAN Do

本技能可完成的工作

  • Run domain-specific test suites (Python, Go, JavaScript)
  • Verify build/compilation succeeds
  • Check for unintended changes via git diff
  • Validate changed files by reading them
  • Detect debug statements and sensitive data left in code
  • Perform 4-level artifact verification (EXISTS, SUBSTANTIVE, WIRED, DATA FLOWS)
  • Apply goal-backward framing to decompose completion into verifiable conditions
  • Run automated stub detection and anti-pattern scans against changed files
  • 运行特定领域的测试套件(Python、Go、JavaScript等)
  • 验证构建/编译是否成功
  • 通过git diff检查意外变更
  • 通过读取文件验证变更内容
  • 检测代码中遗留的调试语句和敏感数据
  • 执行4级工件验证(EXISTS、SUBSTANTIVE、WIRED、DATA FLOWS)
  • 应用目标倒推框架将完成条件分解为可验证项
  • 对变更文件执行自动存根检测和反模式扫描

What This Skill CANNOT Do

本技能不可完成的工作

  • Declare task complete without running tests
  • Summarize test output (must show full output)
  • Skip verification for "simple" changes
  • Ignore test failures as "pre-existing"
  • Mark complete when any verification step fails
  • 未运行测试就宣布任务完成
  • 总结测试输出(必须展示完整输出)
  • 跳过“简单”变更的验证
  • 将测试失败视为“预先存在”而忽略
  • 任何验证步骤失败时标记为完成

Instructions

操作步骤

Step 1: Identify What Changed

步骤1:识别变更内容

Before verification, understand the scope of changes:

```bash
# For git repositories
git status --short
git diff --name-only
```

Use `git status --short` (not just `git diff`) to capture both modified AND untracked (new) files. New files created during the session are easy to miss in status summaries.

For each changed file:
- Read the file with the Read tool
- Summarize what changed
- Identify affected systems/modules and dependencies

Report separately:
- **New files**: [files with `??` or `A` status in git]
- **Modified files**: [files with `M` status]

验证前,先了解变更范围:

```bash
# 针对git仓库
git status --short
git diff --name-only
```

使用`git status --short`(而非仅`git diff`)来捕获已修改和未跟踪(新增)的文件。会话期间创建的新文件很容易在状态总结中被遗漏。

对于每个变更文件:
- 使用Read工具读取文件
- 总结变更内容
- 识别受影响的系统/模块和依赖项

分开报告:
- **新增文件**:[git中状态为`??`或`A`的文件]
- **修改文件**:[git中状态为`M`的文件]
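The new-vs-modified split above can be sketched as a small parser over `git status --short` output (a minimal sketch; the file names in the demo are illustrative, not from a real repository):

```python
def split_changes(status_output: str) -> dict:
    """Split `git status --short` output into new vs modified files.

    Each line is `XY<space>path`, where XY is the two-character status code.
    """
    new_files, modified_files = [], []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        code, path = line[:2], line[3:]
        if code.strip() in ("??", "A"):   # untracked or newly added
            new_files.append(path)
        elif "M" in code:                  # modified (staged or unstaged)
            modified_files.append(path)
    return {"new": new_files, "modified": modified_files}

if __name__ == "__main__":
    # In a real repo this string would come from running `git status --short`.
    sample = "?? scoring/calculator.py\n M pipeline/pr_check.py\nA  tests/test_scoring.py\n"
    print(split_changes(sample))
```

The returned dict maps directly onto the "Report separately" section of the step.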

Step 2: Run Domain-Specific Tests

步骤2:运行特定领域测试

Run the appropriate test suite and show complete output (not summaries):
| Language | Test Command | Build Command | Lint Command |
|---|---|---|---|
| Python | `pytest -v` | `python -m py_compile {files}` | `ruff check {files}` |
| Go | `go test ./... -v -race` | `go build ./...` | `golangci-lint run ./...` |
| JavaScript | `npm test` | `npm run build` | `npm run lint` |
| TypeScript | `npm test` | `npx tsc --noEmit` | `npm run lint` |
| Rust | `cargo test` | `cargo build` | `cargo clippy` |
Output Requirements:
  • Show COMPLETE test output (not "X tests passed")
  • Display all test names that ran
  • Show any warnings or deprecation notices
  • Include execution time
运行合适的测试套件并展示完整输出(而非总结):
| 语言 | 测试命令 | 构建命令 | lint命令 |
|---|---|---|---|
| Python | `pytest -v` | `python -m py_compile {files}` | `ruff check {files}` |
| Go | `go test ./... -v -race` | `go build ./...` | `golangci-lint run ./...` |
| JavaScript | `npm test` | `npm run build` | `npm run lint` |
| TypeScript | `npm test` | `npx tsc --noEmit` | `npm run lint` |
| Rust | `cargo test` | `cargo build` | `cargo clippy` |
输出要求:
  • 展示完整的测试输出(而非“X个测试通过”)
  • 显示所有运行的测试名称
  • 展示任何警告或弃用通知
  • 包含执行时间

Step 3: Verify Build/Compilation

步骤3:验证构建/编译

Run the build command from the table above and show the full output. Confirm:
  • Build completes without errors
  • No new warnings introduced
  • Output artifacts are created (if applicable)

```bash
# Example: Go project
go build ./...

# Example: Python - check syntax of changed files
python -m py_compile path/to/changed_file.py

# Example: JavaScript/TypeScript
npm run build
```

If the build fails, stop immediately. Fix build issues before proceeding to any other verification step. Re-run from Step 1 after fixing.

运行上表中的构建命令并展示完整输出。确认:
  • 构建完成且无错误
  • 未引入新警告
  • 生成了输出工件(如适用)

```bash
# 示例:Go项目
go build ./...

# 示例:Python - 检查变更文件的语法
python -m py_compile path/to/changed_file.py

# 示例:JavaScript/TypeScript
npm run build
```

如果构建失败,立即停止。修复构建问题后再进行其他验证步骤。修复后从步骤1重新运行验证。

Step 4: Validate Changed Files

步骤4:验证变更文件

For each changed file, use the Read tool to inspect the actual file contents. Do not rely on memory of what you wrote -- re-read the file to confirm.
For each file verify:
  1. Syntax is correct (no unterminated strings, mismatched brackets)
  2. Logic makes sense (no inverted conditions, off-by-one errors)
  3. Formatting is consistent with surrounding code
  4. Imports/dependencies are present and correct
  5. No leftover artifacts (commented-out code, placeholder values, TODO markers)
对于每个变更文件,使用Read工具检查实际文件内容。不要依赖对已编写内容的记忆——重新读取文件以确认。
对于每个文件,验证:
  1. 语法正确(无未终止字符串、不匹配的括号)
  2. 逻辑合理(无反转条件、差一错误)
  3. 格式与周围代码一致
  4. 导入/依赖存在且正确
  5. 无遗留工件(注释掉的代码、占位符值、TODO标记)
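For Python files, the syntax check in item 1 can be automated with the standard `ast` module (a minimal sketch; the snippets in the demo are illustrative):

```python
import ast

def check_python_syntax(source: str, filename: str = "<changed file>") -> list:
    """Return a list of syntax problems; an empty list means the file parses."""
    try:
        ast.parse(source, filename=filename)
        return []
    except SyntaxError as err:
        return [f"{filename}:{err.lineno}: {err.msg}"]

if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b:\n    return a + b\n"   # unbalanced parenthesis
    print(check_python_syntax(good))  # []
    print(check_python_syntax(bad, "auth.py"))
```

This only covers item 1; items 2-5 (logic, formatting, imports, leftovers) still require reading the file.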

Step 5: Check for Unintended Changes

步骤5:检查意外变更

```bash
# Check git diff for unexpected changes
git diff

# Look for debug code that should be removed
grep -r "console.log\|print(\|fmt.Println\|debugger\|pdb.set_trace" {changed_files}

# Check for TODO/FIXME comments that should be resolved
grep -r "TODO\|FIXME\|HACK\|XXX" {changed_files}

# Verify no sensitive data
grep -r "password\|secret\|api_key\|token" {changed_files}
```

If `git diff` shows changes to files you didn't intend to modify, investigate before proceeding. Unintended changes are a red flag for accidental side effects.

```bash
# 检查git diff中的意外变更
git diff

# 查找应移除的调试代码
grep -r "console.log\|print(\|fmt.Println\|debugger\|pdb.set_trace" {changed_files}

# 检查应解决的TODO/FIXME注释
grep -r "TODO\|FIXME\|HACK\|XXX" {changed_files}

# 验证无敏感数据
grep -r "password\|secret\|api_key\|token" {changed_files}
```

如果`git diff`显示了你无意修改的文件,先调查再继续。意外变更是意外副作用的危险信号。

Step 6: Review Verification Checklist

步骤6:审查验证检查清单

Core Verification (Required):
  • Tests pass (actual output shown)
  • Build succeeds (actual output shown)
  • Changed files reviewed (Read tool used)
  • No unintended changes (diff checked)
  • No debug/console statements left
  • No sensitive data exposed
Extended Verification (Recommended):
  • Documentation updated if needed
  • No new warnings introduced
  • Error handling adequate
  • Backwards compatibility maintained
核心验证(必填):
  • 测试通过(展示实际输出)
  • 构建成功(展示实际输出)
  • 变更文件已审查(使用了Read工具)
  • 无意外变更(已检查diff)
  • 无遗留调试/console语句
  • 无敏感数据暴露
扩展验证(推荐):
  • 必要时已更新文档
  • 未引入新警告
  • 错误处理充分
  • 保持了向后兼容性

Step 7: Final Verification Statement

步骤7:最终验证声明

ONLY AFTER all checks pass, provide verification statement:
Verification Complete

**Tests Run:**
{paste actual test output}

**Build Status:**
{paste actual build output}

**Files Verified:**
- {file1}: Reviewed, syntax valid, logic correct
- {file2}: Reviewed, syntax valid, logic correct

**Checklist Status:** X/X core checks passed

Test if this addresses the issue.
NEVER say:
  • "Should be fixed now"
  • "This is working"
  • "All done"
  • "Tests pass" (without showing output)
ALWAYS say:
  • "Test if this addresses the issue"
  • "Please verify the changes work for your use case"

仅在所有检查通过后,提供验证声明:
验证完成

**已运行测试:**
{粘贴实际测试输出}

**构建状态:**
{粘贴实际构建输出}

**已验证文件:**
- {file1}:已审查,语法有效,逻辑正确
- {file2}:已审查,语法有效,逻辑正确

**检查清单状态:** X/X项核心检查已通过

请测试此变更是否解决了问题。
绝不说:
  • "现在应该修复了"
  • "这个可以运行了"
  • "全部完成"
  • "测试通过"(不展示输出)
必须说:
  • "请测试此变更是否解决了问题"
  • "请验证变更是否符合你的使用场景"

Adversarial Verification Methodology

对抗性验证方法论

Core Principle: Never trust executor claims. The verification question is not "did the executor say it's done?" but "does the codebase prove it's done?"
Steps 1-7 above verify that tests pass, builds succeed, and files contain what you expect. The adversarial methodology below goes deeper: it verifies that artifacts are real implementations (not stubs), actually integrated (not orphaned), and processing real data (not hardcoded empties). Apply this methodology after Steps 1-7 pass, focusing on artifacts that are part of the stated goal.
核心原则:绝不信任执行器的声明。验证的问题不是“执行器说完成了吗?”,而是“代码库能证明完成了吗?”
上述步骤1-7验证了测试通过、构建成功以及文件包含预期内容。以下对抗性方法论更深入:它验证工件是真实实现(而非存根)、已实际集成(而非孤立)、并处理真实数据(而非硬编码空值)。在步骤1-7通过后应用此方法论,重点关注与目标相关的工件。

Goal-Backward Framing

目标倒推框架

Do NOT ask: "Were all tasks completed?" Instead ask: "What must be TRUE for the goal to be achieved?"
This framing matters because task-forward verification invites the executor to confirm its own narrative. Goal-backward verification derives conditions independently from the goal itself, then checks whether the codebase satisfies them.
Procedure:
  1. State the goal as a testable condition: Express what the user asked for as a concrete, verifiable outcome.
    • Example: "Users can create a PR with quality scoring that blocks merges below threshold"
  2. Decompose into must-be-true conditions: Break the goal into independent conditions that must ALL hold.
    • "A scoring function exists" (L1)
    • "It contains real scoring logic, not stubs" (L2)
    • "It is called by the PR pipeline" (L3)
    • "It receives actual PR data and its score affects the merge gate" (L4)
  3. Verify each condition independently at the appropriate level using the 4-Level system below.
  4. Report unverified conditions as blockers -- not "you missed a task" but "this condition is not yet true in the codebase."
不要问: "所有任务都完成了吗?" 而是问: "要实现目标,哪些条件必须为真?"
这种框架很重要,因为任务向前的验证会让执行器倾向于确认自己的叙事,而目标倒推的验证从目标本身独立推导条件,然后检查代码库是否满足这些条件。
流程:
  1. 将目标表述为可测试条件:将用户需求转化为具体、可验证的结果。
    • 示例:"用户可以创建带有质量评分的PR,评分低于阈值时会阻止合并"
  2. 分解为必须为真的条件:将目标分解为所有必须满足的独立条件。
    • "存在评分函数"(L1)
    • "包含真实评分逻辑,而非存根"(L2)
    • "被PR流水线调用"(L3)
    • "接收实际PR数据,其评分会影响合并 gate"(L4)
  3. 使用以下4级系统,在适当级别独立验证每个条件
  4. 报告未验证的条件作为阻塞项——不说“你遗漏了任务”,而是说“代码库中此条件尚未成立”。
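The decomposition can be tracked as data: one record per must-be-true condition, with the verification level it requires and the level independently confirmed so far (a hypothetical sketch; the condition texts are the PR-scoring example above):

```python
from dataclasses import dataclass

@dataclass
class Condition:
    text: str
    required_level: int      # 1=EXISTS .. 4=DATA FLOWS
    verified_level: int = 0  # highest level independently confirmed so far

    @property
    def is_blocker(self) -> bool:
        return self.verified_level < self.required_level

def blockers(conditions):
    """Report each unverified condition as 'not yet true in the codebase'."""
    return [f"Not yet true in the codebase: {c.text} "
            f"(verified to L{c.verified_level}, needs L{c.required_level})"
            for c in conditions if c.is_blocker]

if __name__ == "__main__":
    goal = [
        Condition("A scoring function exists", 1, verified_level=1),
        Condition("It contains real scoring logic, not stubs", 2, verified_level=2),
        Condition("It is called by the PR pipeline", 3, verified_level=3),
        Condition("Its score affects the merge gate with real PR data", 4, verified_level=3),
    ]
    for line in blockers(goal):
        print(line)   # only the Level-4 condition is still a blocker
```

Note the report wording matches step 4: a blocker is a condition not yet true, not a "missed task".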

4-Level Artifact Verification

4级工件验证

Each artifact produced during the task is verified at four progressively deeper levels. Higher levels subsume lower ones -- an artifact at Level 4 has passed Levels 1-3 by definition.
WHY four levels: Existence checks (L1) catch forgotten writes. Substance checks (L2) catch stubs. Wiring checks (L3) catch orphaned files. Data flow checks (L4) catch integration that exists structurally but passes no real data. Each level catches a distinct class of premature-completion failure.
任务中生成的每个工件都要经过四个逐步深入的验证级别。高级别包含低级别——Level4的工件默认已通过Level1-3的验证。
为什么是4级? 存在性检查(L1)捕获遗忘的写入操作。实质性检查(L2)捕获存根。连接检查(L3)捕获孤立文件。数据流检查(L4)捕获结构上存在但未传递真实数据的集成。每个级别捕获一类不同的提前完成错误。

Level 1: EXISTS

Level1:EXISTS(存在性)

The file is present on disk.
Check: Use Glob or Bash (`ls`, `test -f`) to confirm the file exists.
What this catches: Claims about files that were never created (forgotten Write calls, planned-but-not-executed steps).
What this misses: Everything else. Existence is necessary but nowhere near sufficient.
文件实际存在于磁盘上。
检查方式:使用Glob或Bash命令(`ls`、`test -f`)确认文件存在。
能捕获的问题:声称已创建但实际未创建的文件(遗忘的Write调用、计划但未执行的步骤)。
无法捕获的问题:所有其他问题。存在性是必要条件,但远远不够。

Level 2: SUBSTANTIVE

Level2:SUBSTANTIVE(实质性)

The file contains real logic, not placeholder implementations.
Check: Scan for stub indicators using Grep against changed files. See the Stub Detection Patterns table below.
What this catches: Files that exist but contain no real implementation -- the most common form of premature completion claim.
What this misses: Code that has logic but wrong logic, or logic that handles only the happy path.
文件包含真实逻辑,而非占位实现。
检查方式:使用Grep扫描变更文件中的存根标识。见下方存根检测模式表。
能捕获的问题:存在但无真实实现的文件——最常见的提前完成声明形式。
无法捕获的问题:有逻辑但逻辑错误,或仅处理正常路径的代码。

Level 3: WIRED

Level3:WIRED(已连接)

The artifact is imported AND used by other code in the codebase.
Check:
  1. Search for import/require statements referencing the artifact
  2. Verify the imported symbols are actually called (not just imported)
  3. Check that the call sites pass real arguments (not empty objects or nil)
```bash
# Example: Check if scoring.py is imported anywhere
grep -r "from .*scoring import\|import .*scoring" --include="*.py" .

# Example: Check if the imported function is actually called
grep -r "calculate_score\|score_package" --include="*.py" .
```

**What this catches**: Orphaned files that were created but never integrated.

**What this misses**: Circular or dead-end wiring where the integration exists but the code path is never reached at runtime.

工件已被代码库中的其他代码导入并使用。
检查方式:
  1. 搜索引用该工件的import/require语句
  2. 验证导入的符号实际被调用(而非仅导入)
  3. 检查调用点传递的是真实参数(而非空对象或nil)

```bash
# 示例:检查scoring.py是否在任何地方被导入
grep -r "from .*scoring import\|import .*scoring" --include="*.py" .

# 示例:检查导入的函数是否实际被调用
grep -r "calculate_score\|score_package" --include="*.py" .
```

**能捕获的问题**:已创建但从未集成的孤立文件。

**无法捕获的问题**:循环或死路连接,即集成存在但运行时永远不会到达该代码路径。
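The two grep passes can be combined into one Python sketch that distinguishes "imported" from "actually called" (a regex heuristic under the same assumptions as the grep commands; the `scoring`/`calculate_score` names are the running example, not real files):

```python
import re

def wiring_status(sources: dict, module: str, symbol: str) -> str:
    """Classify an artifact as ORPHANED, IMPORTED-ONLY, or WIRED.

    sources maps filename -> file contents.
    """
    import_pat = re.compile(rf"(from\s+\S*{module}\s+import|import\s+\S*{module})")
    call_pat = re.compile(rf"{symbol}\s*\(")
    imported = any(import_pat.search(text) for text in sources.values())
    called = any(call_pat.search(text) for text in sources.values())
    if imported and called:
        return "WIRED"          # L3 passes; L4 still requires tracing real data
    if imported:
        return "IMPORTED-ONLY"  # dead import: wired in name only
    return "ORPHANED"           # exists but nothing references it

if __name__ == "__main__":
    repo = {"pipeline/pr_check.py": "from scoring import calculate_score\n"
                                    "score = calculate_score(pr_data.files)\n"}
    print(wiring_status(repo, "scoring", "calculate_score"))  # WIRED
```

"IMPORTED-ONLY" is exactly the Level 3 failure mode the step describes: the symbol is imported but never invoked.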

Level 4: DATA FLOWS

Level4:DATA FLOWS(数据流)

Real data reaches the artifact and real results come out.
Check:
  1. Trace the call chain from entry point to the artifact
  2. Verify inputs are not hardcoded empty values (`[]`, `{}`, `""`, `0`)
  3. Verify outputs are consumed by downstream code (not discarded)
  4. If tests exist, verify test inputs exercise meaningful cases (not just empty-input tests)
What this catches: Integration that exists structurally but passes no real data -- functions wired in but fed empty arrays, handlers registered but never triggered.
What this misses: Semantic correctness (the data flows but produces wrong results). That is the domain of testing, not verification.
真实数据到达工件,且有真实结果输出。
检查方式:
  1. 从入口点到工件追踪调用链
  2. 验证输入不是硬编码的空值(`[]`、`{}`、`""`、`0`)
  3. 验证输出被下游代码消费(而非丢弃)
  4. 如果存在测试,验证测试输入覆盖了有意义的场景(而非仅空输入测试)
能捕获的问题:结构上存在但未传递真实数据的集成——已连接但仅接收空数组的函数、已注册但从未触发的处理器。
无法捕获的问题:语义正确性(数据流正常但产生错误结果),这属于测试领域,而非验证领域。
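A first-pass check for item 2 (hardcoded empty inputs at a call site) can be sketched with a regex (heuristic only; real Level 4 verification still requires reading the call chain, and the function names in the demo are illustrative):

```python
import re

# Matches a call whose first argument is a hardcoded empty value: [], {}, "", '', or 0
EMPTY_ARG = re.compile(r"\w+\s*\(\s*(\[\]|\{\}|\"\"|''|0)\s*[,)]")

def empty_call_sites(source: str):
    """Return line numbers of calls that pass a hardcoded empty first argument."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if EMPTY_ARG.search(line)]

if __name__ == "__main__":
    code = ("score = calculate_score(pr.changed_files)\n"
            "backup = calculate_score([])\n")
    print(empty_call_sites(code))  # [2]
```

A hit here means the wiring exists but feeds no real data -- exactly the failure class this level targets.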

Stub Detection Patterns

存根检测模式

Scan changed files for these patterns. A match does not automatically mean failure -- `return []` is sometimes correct -- but each match requires investigation to confirm the empty return or placeholder is intentional.

| Pattern | Language | Indicates |
|---|---|---|
| `return []` | Python, JS/TS | Empty list return -- may be stub if function should compute results |
| `return {}` | Python, JS/TS | Empty dict/object return -- may be stub if function should build a structure |
| `return None` | Python | Sole return in non-optional function -- likely stub |
| `return nil, nil` | Go | Returning no value and no error -- likely stub |
| `return nil` | Go | Single nil return in a function expected to produce a value |
| `pass` (as sole body) | Python | Empty function body -- definite stub |
| `...` (Ellipsis as body) | Python | Protocol/abstract stub -- should not appear in concrete implementations |
| `() => {}` | JS/TS | Empty arrow function -- no-op handler |
| `onClick={() => {}}` | JSX/TSX | Empty click handler -- UI wired but non-functional |
| `throw new Error("not implemented")` | JS/TS | Explicit "not done" marker |
| `panic("not implemented")` | Go | Explicit "not done" marker |
| `raise NotImplementedError` | Python | Explicit "not done" marker |
| `TODO`, `FIXME`, `HACK`, `XXX` | Any | Markers for incomplete work (in non-test files) |
| `PLACEHOLDER`, `stub`, `mock` | Any | Self-described placeholder code (in non-test files) |
| "coming soon", "not yet implemented" | Any | Placeholder UI/API text |

Automated scan command (run against files changed in the current task):

```bash
# Get changed files relative to base branch (adjust base branch as needed)
changed_files=$(git diff --name-only main...HEAD)

# Scan for stub patterns
grep -n -E '(return \[\]|return \{\}|return None|return nil|pass$|raise NotImplementedError|panic\("not implemented"\)|throw new Error\("not implemented"\)|TODO|FIXME|HACK|XXX|PLACEHOLDER)' $changed_files
```

Review each match. If the pattern is intentional (e.g., a function that genuinely returns an empty list), note it in the verification report. If it is a stub, flag it as a blocker.

扫描变更文件中的以下模式。匹配并不自动意味着失败——`return []`有时是正确的——但每个匹配都需要调查以确认空返回或占位符是有意为之。

| 模式 | 语言 | 指示内容 |
|---|---|---|
| `return []` | Python、JS/TS | 返回空列表——如果函数应计算结果,则可能是存根 |
| `return {}` | Python、JS/TS | 返回空字典/对象——如果函数应构建结构,则可能是存根 |
| `return None` | Python | 非可选函数中唯一的返回语句——很可能是存根 |
| `return nil, nil` | Go | 返回无值和无错误——很可能是存根 |
| `return nil` | Go | 预期产生值的函数中仅返回nil |
| `pass`(作为函数体唯一内容) | Python | 空函数体——明确的存根 |
| `...`(作为函数体的省略号) | Python | 协议/抽象存根——不应出现在具体实现中 |
| `() => {}` | JS/TS | 空箭头函数——无操作处理器 |
| `onClick={() => {}}` | JSX/TSX | 空点击处理器——UI已连接但无功能 |
| `throw new Error("not implemented")` | JS/TS | 明确的“未实现”标记 |
| `panic("not implemented")` | Go | 明确的“未实现”标记 |
| `raise NotImplementedError` | Python | 明确的“未实现”标记 |
| `TODO`、`FIXME`、`HACK`、`XXX` | 任何语言 | 未完成工作的标记(非测试文件中) |
| `PLACEHOLDER`、`stub`、`mock` | 任何语言 | 自描述的占位符代码(非测试文件中) |
| "coming soon"、"not yet implemented" | 任何语言 | 占位符UI/API文本 |

自动扫描命令(针对当前任务中变更的文件):

```bash
# 获取相对于基准分支的变更文件(根据需要调整基准分支)
changed_files=$(git diff --name-only main...HEAD)

# 扫描存根模式
grep -n -E '(return \[\]|return \{\}|return None|return nil|pass$|raise NotImplementedError|panic\("not implemented"\)|throw new Error\("not implemented"\)|TODO|FIXME|HACK|XXX|PLACEHOLDER)' $changed_files
```

审查每个匹配项。如果模式是有意为之(例如函数确实应返回空列表),则在验证报告中注明。如果是存根,则标记为阻塞项。
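The same scan can run in pure Python when grep is unavailable (a sketch whose patterns mirror the table above; triaging each hit is still a manual step):

```python
import re

STUB_PATTERNS = [
    r"return \[\]", r"return \{\}", r"return None", r"return nil",
    r"^\s*pass\s*$", r"raise NotImplementedError",
    r'panic\("not implemented"\)', r'throw new Error\("not implemented"\)',
    r"TODO", r"FIXME", r"HACK", r"XXX", r"PLACEHOLDER",
]
STUB_RE = re.compile("|".join(STUB_PATTERNS))

def scan_for_stubs(source: str, filename: str = "<changed file>"):
    """Return (file, line number, line) for every stub-pattern match."""
    return [(filename, i, line.strip())
            for i, line in enumerate(source.splitlines(), start=1)
            if STUB_RE.search(line)]

if __name__ == "__main__":
    sample = "def calculate_score(files):\n    pass\n"
    for hit in scan_for_stubs(sample, "scoring.py"):
        print(hit)   # each hit still needs manual triage: intentional or blocker?
```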

Anti-Pattern Scan

反模式扫描

Beyond stub detection, scan for patterns that correlate with premature completion claims:

**Log-only functions** -- functions whose entire body is a log/print statement with no real logic:

```bash
# Python: functions that only log
grep -A2 "def " $changed_files | grep -B1 "logging\.\|print(" | grep "def "
```

**Empty handlers** -- event handlers that prevent default but do nothing else:

```bash
grep -n "onSubmit.*preventDefault" $changed_files
grep -n "handler.*{\s*}" $changed_files
```

**Placeholder text** in non-test files:

```bash
grep -n -i -E "(placeholder|example data|test data|lorem ipsum)" $changed_files
```

**Dead imports** -- modules imported but never used:

```bash
# Python: imported but not referenced later in the file
# (manual check -- read the file and verify each import is used)
```

除存根检测外,扫描与提前完成声明相关的模式:

**仅日志函数**——函数体仅包含日志/打印语句,无真实逻辑:

```bash
# Python:仅包含日志的函数
grep -A2 "def " $changed_files | grep -B1 "logging\.\|print(" | grep "def "
```

**空处理器**——阻止默认行为但不执行其他操作的事件处理器:

```bash
grep -n "onSubmit.*preventDefault" $changed_files
grep -n "handler.*{\s*}" $changed_files
```

**非测试文件中的占位文本**:

```bash
grep -n -i -E "(placeholder|example data|test data|lorem ipsum)" $changed_files
```

**无用导入**——已导入但从未使用的模块:

```bash
# Python:已导入但文件后续未引用
# (手动检查——读取文件并验证每个导入都被使用)
```
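The dead-import check can be partially automated for Python with the standard `ast` module (a sketch under stated limits: it handles plain `import x` and `from m import x`, not star imports or names used only in strings):

```python
import ast

def unused_imports(source: str):
    """Return imported names that are never referenced in the module body."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`
                imported.add(alias.asname or alias.name.split(".")[0])
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)

if __name__ == "__main__":
    code = "import os\nimport json\nprint(json.dumps({}))\n"
    print(unused_imports(code))  # ['os']
```

A non-empty result is the same signal the manual check looks for: a module that was wired in by name but never exercised.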

Verification Report Format

验证报告格式

After completing 4-level verification, produce a structured report. This replaces the simpler verification statement in Step 7 when adversarial verification applies.
完成4级验证后,生成结构化报告。当应用对抗性验证时,此报告替代步骤7中的简单验证声明。

Verification Report

验证报告

Goal

目标

[Stated goal as a testable condition]
[可测试条件形式的目标陈述]

Conditions

条件

| Condition | L1 | L2 | L3 | L4 | Status |
|---|---|---|---|---|---|
| [condition 1] | Y/N | Y/N | Y/N | Y/N/- | VERIFIED / INCOMPLETE -- [reason] |
| [condition 2] | Y/N | Y/N | Y/N | Y/N/- | VERIFIED / INCOMPLETE -- [reason] |

| 条件 | L1 | L2 | L3 | L4 | 状态 |
|---|---|---|---|---|---|
| [条件1] | 是/否 | 是/否 | 是/否 | 是/否/- | 已验证 / 未完成 -- [原因] |
| [条件2] | 是/否 | 是/否 | 是/否 | 是/否/- | 已验证 / 未完成 -- [原因] |

Blockers

阻塞项

  • [Any condition not verified at the required level]
  • [任何未在要求级别验证的条件]

Stub Scan Results

存根扫描结果

  • [N matches found, M confirmed intentional, K flagged as blockers]
  • [发现N个匹配项,M个确认是有意为之,K个标记为阻塞项]

Verdict

结论

COMPLETE / NOT COMPLETE -- [summary]

Use `-` in a level column when that level does not apply (e.g., a configuration file does not need L3 wiring checks).
已完成 / 未完成 -- [总结]

当某级别不适用时,在该级别列中使用`-`(例如配置文件不需要L3连接检查)。

When to Apply Each Level

各级别适用场景

Not every artifact needs Level 4 verification. Applying deep verification to trivial changes wastes resources.
| Artifact Type | Minimum Level | Rationale |
|---|---|---|
| Core feature code (new modules, handlers, logic) | Level 4 | Must prove data flows end-to-end |
| Configuration files, YAML, env | Level 1 | Existence is sufficient -- content verified by build/tests |
| Test files | Level 2 | Must be substantive (not empty test stubs), but wiring is implicit |
| Documentation, README, comments | Level 1 | Existence check only |
| Integration glue (imports, routing, wiring) | Level 3 | Must be wired, but data flow verified through the module it connects |
| Bug fixes to existing code | Level 2 + tests | Substance verified, plus tests must cover the fix |
并非所有工件都需要Level4验证。对微小变更应用深度验证会浪费资源。
| 工件类型 | 最低要求级别 | 理由 |
|---|---|---|
| 核心功能代码(新模块、处理器、逻辑) | Level 4 | 必须证明端到端数据流正常 |
| 配置文件、YAML、环境变量 | Level 1 | 存在性已足够——内容由构建/测试验证 |
| 测试文件 | Level 2 | 必须是实质性的(非空测试存根),但连接是隐含的 |
| 文档、README、注释 | Level 1 | 仅需存在性检查 |
| 集成胶水代码(导入、路由、连接) | Level 3 | 必须已连接,但数据流通过其连接的模块验证 |
| 现有代码的Bug修复 | Level 2 + 测试 | 已验证实质性,且测试必须覆盖修复内容 |
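The table above can be encoded as a lookup so the verifier fails fast when an artifact's confirmed level is below its minimum (a sketch; the category keys are illustrative, and classifying a given file into a category is still a judgment call):

```python
MIN_LEVEL = {
    "core_feature": 4,      # must prove data flows end-to-end
    "config": 1,            # existence is sufficient
    "test_file": 2,         # must be substantive
    "documentation": 1,     # existence check only
    "integration_glue": 3,  # must be wired
    "bug_fix": 2,           # substance, plus covering tests
}

def verdict(artifact_type: str, verified_level: int) -> str:
    required = MIN_LEVEL[artifact_type]
    if verified_level >= required:
        return "VERIFIED"
    return f"INCOMPLETE -- verified to L{verified_level}, needs L{required}"

if __name__ == "__main__":
    print(verdict("core_feature", 3))  # INCOMPLETE -- verified to L3, needs L4
    print(verdict("config", 1))        # VERIFIED
```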

Error Handling

错误处理

Error: "Tests failed after changes"
  • DO NOT declare task complete
  • Show full test failure output
  • Analyze what went wrong
  • Fix issues and re-run full verification
Error: "Build failed"
  • Stop immediately
  • Show complete build error output
  • Fix build issues before proceeding
  • Re-run verification from Step 1
Error: "No tests exist for changed code"
  • Acknowledge lack of test coverage
  • Recommend writing tests (but don't require unless user requests)
  • Perform extra manual validation
  • Document that changes are untested
Error: "Cannot run tests (missing dependencies)"
  • Document what's missing
  • Attempt alternative verification (syntax checks, manual review)
  • Be explicit about verification limitations
Error: "Stub patterns detected in changed files"
  • Review each match individually -- some stubs are intentional (e.g., `return []` when empty list is the correct result)
  • For confirmed stubs: flag as blocker, DO NOT declare task complete
  • For intentional patterns: document in verification report with rationale
  • If unsure: treat as stub (false positive is safer than false negative)
Error: "Artifact exists but is not wired (Level 3 failure)"
  • Identify what should import/reference the artifact
  • Check if the wiring was planned but not executed (common in multi-step tasks)
  • Flag as blocker with specific guidance: "File X exists but is not imported by Y"
Error: "Data flow gap detected (Level 4 failure)"
  • Trace the call chain to identify where real data stops flowing
  • Common cause: function called with hardcoded `[]` or `{}` instead of computed values
  • Flag as blocker: "Function X is called but receives empty data at call site Y"
错误:“变更后测试失败”
  • 绝不宣布任务完成
  • 展示完整的测试失败输出
  • 分析问题原因
  • 修复问题并重新运行完整验证
错误:“构建失败”
  • 立即停止
  • 展示完整的构建错误输出
  • 修复构建问题后再继续
  • 从步骤1重新运行验证
错误:“变更代码无对应测试”
  • 承认测试覆盖率不足
  • 建议编写测试(但除非用户要求,否则不强制)
  • 执行额外的手动验证
  • 记录变更未经过测试
错误:“无法运行测试(缺少依赖)”
  • 记录缺少的内容
  • 尝试替代验证方式(语法检查、手动审查)
  • 明确说明验证限制
错误:“变更文件中检测到存根模式”
  • 单独审查每个匹配项——有些存根是有意为之(例如空列表是正确结果时的`return []`)
  • 对于确认的存根:标记为阻塞项,绝不宣布任务完成
  • 对于有意的模式:在验证报告中记录并说明理由
  • 不确定时:视为存根(假阳性比假阴性更安全)
错误:“工件存在但未连接(Level3失败)”
  • 识别应导入/引用该工件的位置
  • 检查连接是否已规划但未执行(多步骤任务中常见)
  • 标记为阻塞项并提供具体指导:“文件X存在但未被Y导入”
错误:“检测到数据流缺口(Level4失败)”
  • 追踪调用链以识别真实数据停止流动的位置
  • 常见原因:函数被调用时传入硬编码的`[]`或`{}`而非计算值
  • 标记为阻塞项:“函数X被调用,但在调用点Y接收空数据”

Common Anti-Patterns

常见反模式

Anti-Pattern 1: Claiming Success Without Running Tests

反模式1:未运行测试就声称成功

Bad:
User: Did you fix the bug?
Assistant: Yes, I've fixed the authentication issue. The code looks good now.
No actual verification performed. Assumptions about correctness without evidence. The user will discover the issue wasn't actually fixed.
Good:
User: Did you fix the bug?
Assistant: I've made changes to address the authentication issue. Let me verify:

$ pytest tests/test_auth.py -v
============================= test session starts ==============================
collected 12 items

tests/test_auth.py::test_login_success PASSED                            [  8%]
tests/test_auth.py::test_login_failure PASSED                            [ 16%]
[... complete output ...]
============================= 12 passed in 2.43s ===============================

Verification Complete - all authentication tests pass.
Test if this addresses the issue.
错误示例:
用户:你修复Bug了吗?
助手:是的,我修复了认证问题。现在代码看起来没问题。
未执行实际验证。无证据假设正确性,用户会发现问题实际未修复。
正确示例:
用户:你修复Bug了吗?
助手:我已做出变更以解决认证问题。让我验证一下:

$ pytest tests/test_auth.py -v
============================= test session starts ==============================
collected 12 items

tests/test_auth.py::test_login_success PASSED                            [  8%]
tests/test_auth.py::test_login_failure PASSED                            [ 16%]
[... 完整输出 ...]
============================= 12 passed in 2.43s ===============================

验证完成 - 所有认证测试均通过。
请测试此变更是否解决了问题。

Anti-Pattern 2: Summarizing Test Results

反模式2:总结测试结果

Bad:
I ran the tests and they all passed! Everything is working now.
No evidence provided. Can't verify which tests actually ran. Hides important warnings or partial failures. The user cannot validate the verification.
Good: Show the complete, unabbreviated test output:
$ go test ./... -v
=== RUN   TestUserCreate
--- PASS: TestUserCreate (0.01s)
=== RUN   TestUserUpdate
--- PASS: TestUserUpdate (0.02s)
=== RUN   TestUserDelete
--- PASS: TestUserDelete (0.01s)
PASS
ok      github.com/example/users    0.123s
错误示例:
我运行了测试,全部通过!一切正常。
未提供证据。无法验证实际运行了哪些测试,隐藏了重要警告或部分失败。用户无法验证验证过程。
正确示例: 展示完整、未缩写的测试输出:
$ go test ./... -v
=== RUN   TestUserCreate
--- PASS: TestUserCreate (0.01s)
=== RUN   TestUserUpdate
--- PASS: TestUserUpdate (0.02s)
=== RUN   TestUserDelete
--- PASS: TestUserDelete (0.01s)
PASS
ok      github.com/example/users    0.123s

Anti-Pattern 3: Ignoring Failed Tests

反模式3:忽略测试失败

Bad:
The tests mostly passed, with just 2 failures in unrelated tests.
Those were probably already failing, so the changes are good to go.
Assumes failures are pre-existing without verification. Could be introducing new failures. Degrades test suite quality over time.
Good: Investigate ALL failures before claiming completion:
$ pytest -v
...
tests/test_api.py::test_create_user FAILED
tests/test_api.py::test_update_user FAILED
[10 passed, 2 failed]

Verification FAILED - 2 tests are failing.

Let me check if these are pre-existing:
$ git stash
$ pytest tests/test_api.py::test_create_user -v
PASSED

These failures were introduced by my changes. Fixing before completion.
错误示例:
测试大部分通过,仅2个无关测试失败。
这些可能之前就失败了,所以变更可以提交。
未验证就假设失败是预先存在的,可能引入新失败,长期会降低测试套件质量。
正确示例: 声称完成前调查所有失败:
$ pytest -v
...
tests/test_api.py::test_create_user FAILED
tests/test_api.py::test_update_user FAILED
[10 passed, 2 failed]

验证失败 - 2个测试失败。

让我检查这些是否是预先存在的:
$ git stash
$ pytest tests/test_api.py::test_create_user -v
PASSED

这些失败是我的变更引入的,修复后再完成任务。

Anti-Pattern 4: Not Reading Changed Files

反模式4:未读取变更文件

Bad:
I've updated the authentication module. Tests pass. Done!
Didn't verify what was actually written to the file. Could have syntax errors tests don't catch, debug statements, or incomplete changes.
Good: Always read modified files to confirm changes:
I've updated the authentication module. Let me verify what was changed:

$ Read /home/project/auth.py
[Shows file contents - verify changes are correct]

Changes confirmed:
- Added password hashing with bcrypt
- Implemented token expiration check

Now running tests:
$ pytest tests/test_auth.py -v
[... complete test output ...]

Verification Complete
- Changed files reviewed: auth.py (syntax valid, logic correct)
- Tests: 15 passed in 1.23s

Test if this addresses the issue.
错误示例:
我已更新认证模块。测试通过。完成!
未验证实际写入文件的内容,可能存在测试未捕获的语法错误、调试语句或不完整变更。
正确示例: 始终读取修改后的文件以确认变更:
我已更新认证模块。让我验证变更内容:

$ Read /home/project/auth.py
[展示文件内容 - 验证变更正确]

变更已确认:
- 使用bcrypt添加了密码哈希
- 实现了令牌过期检查

现在运行测试:
$ pytest tests/test_auth.py -v
[... 完整测试输出 ...]

验证完成
- 已审查变更文件:auth.py(语法有效,逻辑正确)
- 测试:15个测试通过,耗时1.23s

请测试此变更是否解决了问题。

Anti-Pattern 5: Trusting Executor Summary Claims

反模式5:信任执行器的总结声明

Bad:
Executor: "I implemented the scoring module, added tests, and wired it into the pipeline."
Verifier: Great, scoring module is done. Moving on.
Accepted the claim at face value. Never checked whether the file exists, contains real logic, is imported, or processes real data. Summary claims document what was SAID, not what IS.
Good: Independently verify every claim at the appropriate level:
Executor: "I implemented the scoring module, added tests, and wired it into the pipeline."
Verifier: Let me verify each claim independently.

L1 EXISTS: ls scoring/calculator.py -- file exists, confirmed.
L2 SUBSTANTIVE: Reading file... contains calculate_score() with real logic (weighted average across metrics). No stubs detected.
L3 WIRED: grep -r "from scoring" . -- imported in pipeline/pr_check.py. Checking usage... calculate_score() called on line 47. Confirmed.
L4 DATA FLOWS: Tracing call site... calculate_score(pr_data.files, pr_data.comments) -- receives real PR data, not empty values. Return value used in gate check on line 52. Confirmed.

All 4 levels verified for scoring module.
错误示例:
执行器:“我实现了评分模块,添加了测试,并将其连接到流水线中。”
验证者:好的,评分模块已完成,继续下一步。
表面接受声明,未检查文件是否存在、是否包含真实逻辑、是否被导入或是否处理真实数据。总结声明记录的是“所说内容”,而非“实际情况”。
正确示例: 在适当级别独立验证每个声明:
执行器:“我实现了评分模块,添加了测试,并将其连接到流水线中。”
验证者:让我独立验证每个声明。

L1 存在性:ls scoring/calculator.py -- 文件存在,已确认。
L2 实质性:读取文件... 包含带有真实逻辑的calculate_score()(指标加权平均)。未检测到存根。
L3 已连接:grep -r "from scoring" . -- 在pipeline/pr_check.py中已导入。检查使用情况... 第47行调用了calculate_score()。已确认。
L4 数据流:追踪调用点... calculate_score()接收真实PR数据而非空值。返回值用于第52行的gate检查。已确认。

评分模块已通过所有4级验证。

Anti-Pattern 6: Existence-Only Verification

反模式6:仅验证存在性

Bad:
Verifier: Let me check -- scoring.py exists, tests exist, pipeline file exists. All artifacts present. Verified!
Checked Level 1 (EXISTS) only. Files can exist but be stubs, orphaned, or processing no real data. Existence is necessary but nowhere near sufficient.
Good: Verify at the appropriate depth per artifact type:
Verifier: scoring.py exists (L1). Checking substance...
- Reading file: contains calculate_score() with weighted scoring logic (L2 pass)
- grep for imports: imported in pr_pipeline.py line 12 (L3 pass)
- Tracing data flow: called with pr.changed_files on line 45 (L4 pass)
错误示例:
验证者:让我检查一下 -- scoring.py存在,测试存在,流水线文件存在。所有工件都已存在。验证通过!
仅检查了Level1(存在性)。文件可能存在但为存根、孤立或不处理真实数据。存在性是必要条件但远远不够。
正确示例: 根据工件类型在适当深度验证:
验证者:scoring.py存在(L1)。检查实质性...
- 读取文件:包含带有加权评分逻辑的calculate_score()(L2通过)
- grep导入:在pr_pipeline.py第12行已导入(L3通过)
- 追踪数据流:第45行调用时传入pr.changed_files(L4通过)

References

参考资料

This skill uses these shared patterns:
  • Anti-Rationalization - Prevents shortcut rationalizations
  • Verification Checklist - Pre-completion checks
  • Adversarial Verification - 4-level artifact verification methodology (reusable)
本技能使用以下共享模式:
  • 反合理化 - 防止捷径式合理化
  • 验证检查清单 - 完成前检查
  • 对抗性验证 - 4级工件验证方法论(可复用)

Domain-Specific Anti-Rationalization

特定领域反合理化

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Tests pass" (without showing output) | Claim without evidence is unverifiable | Show complete test output |
| "Simple change, no need to verify" | Simple changes cause complex bugs | Run full verification regardless |
| "Those failures were pre-existing" | Assumption without verification | Check with git stash to confirm |
| "Code looks correct" | Looking correct ≠ being correct | Run tests and read changed files |
| "I implemented X" (executor claim) | Summary claims document what was SAID, not what IS | Verify independently at L1-L4 |
| "File exists, so it's done" | Existence (L1) is necessary but not sufficient | Check substance (L2), wiring (L3), data flow (L4) |
| "It's imported, so it works" | Import without invocation is dead code | Verify the symbol is called with real arguments |
| "Stubs are fine for now" | Stubs in goal-critical artifacts mean the goal is not achieved | Flag as blocker unless explicitly scoped out |
| 合理化理由 | 错误原因 | 要求操作 |
|---|---|---|
| “测试通过”(不展示输出) | 无证据的声明无法验证 | 展示完整测试输出 |
| “简单变更,无需验证” | 简单变更可能导致复杂Bug | 无论变更大小,都运行完整验证 |
| “那些失败是预先存在的” | 未验证就假设 | 使用git stash确认 |
| “代码看起来正确” | 看起来正确≠实际正确 | 运行测试并读取变更文件 |
| “我实现了X”(执行器声明) | 总结声明记录的是“所说内容”,而非“实际情况” | 在L1-L4级别独立验证 |
| “文件存在,所以已完成” | 存在性(L1)是必要但不充分条件 | 检查实质性(L2)、连接(L3)、数据流(L4) |
| “已导入,所以可以运行” | 仅导入未调用是死代码 | 验证符号是否被传入真实参数调用 |
| “存根现在没问题” | 目标关键工件中的存根意味着目标未实现 | 除非明确排除范围,否则标记为阻塞项 |