verify-this
Verify This
Verification is not a recap. It proves or disproves a specific claim with repeatable evidence.
When To Use
- The user asks "verify this", "prove it works", "did this fix it", or "show me the evidence".
- A bug fix needs a before/after repro.
- A UI, CLI, API, performance, or memory claim needs measurement.
- A test passes but the user-visible behavior still needs confirmation.
Do not use this for vague claims like "the code is cleaner". Ask for a measurable claim first.
Workflow
- Restate the claim in falsifiable form: condition, metric, and threshold.
- Pick the smallest local surface that can disprove it.
- Capture a baseline from the old state: merge base, parent commit, failing branch, or current broken repro.
- Capture treatment from the changed state with the same command, data, warmup, and environment.
- Compare raw artifacts: numbers, screenshots, terminal transcripts, HTTP responses, profiles, heap snapshots, or test output.
- Return exactly one verdict: VERIFIED, NOT VERIFIED, or INCONCLUSIVE.
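The baseline and treatment steps could be sketched as below: one helper runs the identical command in two directories and keeps the raw transcripts for comparison. The helper names (`capture`, `capture_pair`), the stand-in directories, and the repro command are assumptions made for illustration; a real run would point at something like a parent-commit checkout and the working tree.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def capture(command: list[str], workdir: Path) -> dict:
    """Run one command in one checkout and keep the raw transcript."""
    result = subprocess.run(
        command, cwd=workdir, capture_output=True, text=True, check=False
    )
    return {
        "command": command,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


def capture_pair(command: list[str], baseline_dir: Path, treatment_dir: Path) -> dict:
    """Run the identical command against the old and the changed state."""
    return {
        "baseline": capture(command, baseline_dir),
        "treatment": capture(command, treatment_dir),
    }


# Two temporary directories stand in for the old and changed checkouts
# (hypothetical data, only to make the sketch runnable).
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    Path(old, "value.txt").write_text("broken\n")
    Path(new, "value.txt").write_text("fixed\n")
    repro = [sys.executable, "-c", "print(open('value.txt').read().strip())"]
    pair = capture_pair(repro, Path(old), Path(new))

print(pair["baseline"]["stdout"].strip(), "->", pair["treatment"]["stdout"].strip())
```

Passing the same `command` list to both runs is the point of the design: any difference in the transcripts then comes from the state, not from the invocation.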
Local Surfaces
- Code behavior: focused unit/integration tests or a minimal repro script.
- CLI/TUI behavior: control-cli, terminal transcript, or demo recording.
- UI behavior: control-ui, screenshots, accessibility snapshots, or browser traces.
- API behavior: local HTTP/RPC request and response diff.
- Performance: same-machine baseline/treatment timings or CPU profiles.
- Memory: heap snapshots before and after the suspected operation.
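For the performance surface, a same-machine timing comparison might look like the sketch below. It discards warmup runs and reports the median, which tolerates a single noisy run better than the mean. The two workload functions are hypothetical stand-ins for the old and changed code paths, and the warmup and run counts are arbitrary defaults, not recommendations from the original.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> list[float]:
    """Return per-run wall-clock seconds, after discarding warmup runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> dict:
    """Keep min and max next to the median so the spread stays visible."""
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "runs": len(samples),
    }


# Hypothetical stand-ins for the old and changed code paths.
def baseline_impl() -> int:
    return sum([i * i for i in range(20_000)])


def treatment_impl() -> int:
    return sum(i * i for i in range(20_000))


baseline = summarize(time_runs(baseline_impl))
treatment = summarize(time_runs(treatment_impl))
print("baseline:", baseline)
print("treatment:", treatment)
```

The absolute numbers depend on the machine, so only a baseline and treatment measured in the same session are comparable.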
Artifact Layout
When safe to write artifacts:
```text
/tmp/verify-this/<claim-slug>/
├── claim.md
├── timeline.md
├── baseline/
├── treatment/
├── diff/
└── verdict.md
```
If artifacts may contain sensitive code, prompts, screenshots, HTTP bodies, or heap data, keep only the minimal inline evidence unless the user agrees to disk storage.
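When writing to disk is acceptable, the layout above could be created with a short helper like this one. The `slugify` rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption, since the original does not define how `<claim-slug>` is derived, and the example claim is made up. A temporary directory stands in for `/tmp/verify-this`.

```python
import re
import tempfile
from pathlib import Path


def slugify(claim: str) -> str:
    """Lowercase the claim and collapse non-alphanumerics into hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", claim.lower()).strip("-")


def create_layout(root: Path, claim: str) -> Path:
    """Create <root>/<claim-slug>/ with the files and directories of the layout."""
    claim_dir = root / slugify(claim)
    for sub in ("baseline", "treatment", "diff"):
        (claim_dir / sub).mkdir(parents=True, exist_ok=True)
    (claim_dir / "claim.md").write_text(claim + "\n", encoding="utf-8")
    for name in ("timeline.md", "verdict.md"):
        (claim_dir / name).touch()
    return claim_dir


# A temporary directory stands in for /tmp/verify-this in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    claim_dir = create_layout(Path(tmp), "Login p95 latency drops below 200 ms")
    created = sorted(p.name for p in claim_dir.iterdir())

print(claim_dir.name)
print(created)
```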
Verdict Rules
- VERIFIED: baseline and treatment differ in the predicted direction, by the claimed threshold, with no obvious confound.
- NOT VERIFIED: the behavior is unchanged, moves the wrong way, or misses the threshold.
- INCONCLUSIVE: no valid baseline, noisy signal, failed measurement, or an environment difference invalidates the comparison.
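For a single numeric metric, these rules could be encoded roughly as follows. This is one possible reading, not a definitive rule set: the `noise` parameter and the idea that a result is inconclusive when the noise band straddles the threshold are assumptions of the sketch, as are the function and parameter names.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VERIFIED = "VERIFIED"
    NOT_VERIFIED = "NOT VERIFIED"
    INCONCLUSIVE = "INCONCLUSIVE"


def decide(
    baseline: Optional[float],
    treatment: Optional[float],
    threshold: float,
    lower_is_better: bool = True,
    noise: float = 0.0,
    comparable: bool = True,
) -> Verdict:
    """Map one metric pair onto the three verdicts.

    threshold:  minimum improvement claimed, in the metric's own units.
    noise:      observed run-to-run spread (assumed to be measured separately).
    comparable: False when environment, data, or command differed between runs.
    """
    # No valid baseline, failed measurement, or an invalid comparison.
    if baseline is None or treatment is None or not comparable:
        return Verdict.INCONCLUSIVE
    improvement = (baseline - treatment) if lower_is_better else (treatment - baseline)
    if improvement - noise >= threshold:
        return Verdict.VERIFIED
    if improvement + noise < threshold:
        return Verdict.NOT_VERIFIED
    # The noise band straddles the threshold, so the signal cannot be called.
    return Verdict.INCONCLUSIVE


# Made-up latency numbers in milliseconds, claimed improvement of at least 100 ms.
print(decide(480.0, 310.0, threshold=100.0, noise=15.0).value)  # VERIFIED
print(decide(480.0, 470.0, threshold=100.0, noise=15.0).value)  # NOT VERIFIED
print(decide(480.0, 385.0, threshold=100.0, noise=15.0).value)  # INCONCLUSIVE
```

A wrong-direction change yields a negative `improvement`, so it falls into NOT VERIFIED without a separate branch.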
Output
Use this shape:
```text
VERIFIED | NOT VERIFIED | INCONCLUSIVE
Claim: <falsifiable claim>
Evidence:
<metric/artifact>: baseline=<...>, treatment=<...>, delta=<...>, threshold=<...>
Reasoning:
<one tight paragraph naming the evidence and any confounds>
```
Do not soften a negative result. A clear NOT VERIFIED is useful.
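A small formatter can keep reports in this shape and reject any verdict outside the three allowed values. The sketch below is illustrative: the `Evidence` dataclass, the `render_report` name, and the sample numbers are all made up, and computing `delta` as treatment minus baseline is an assumption.

```python
from dataclasses import dataclass

ALLOWED_VERDICTS = ("VERIFIED", "NOT VERIFIED", "INCONCLUSIVE")


@dataclass
class Evidence:
    name: str          # metric or artifact label
    baseline: float
    treatment: float
    threshold: str     # free text, e.g. "<= 380"


def render_report(verdict: str, claim: str, evidence: list[Evidence], reasoning: str) -> str:
    """Render the report; the verdict line comes first and is never softened."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"verdict must be one of {ALLOWED_VERDICTS}, got {verdict!r}")
    lines = [verdict, f"Claim: {claim}", "Evidence:"]
    for item in evidence:
        delta = item.treatment - item.baseline  # assumed convention: treatment minus baseline
        lines.append(
            f"{item.name}: baseline={item.baseline:g}, treatment={item.treatment:g}, "
            f"delta={delta:+g}, threshold={item.threshold}"
        )
    lines += ["Reasoning:", reasoning]
    return "\n".join(lines)


# Made-up numbers for a negative result.
report = render_report(
    "NOT VERIFIED",
    "Login p95 latency drops to 380 ms or less under the same load script",
    [Evidence("p95 latency (ms)", 480.0, 470.0, "<= 380")],
    "Treatment improved p95 by 10 ms against a claimed 100 ms; same machine, same data.",
)
print(report)
```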