verify-this

Verify This

Verification is not a recap. It proves or disproves a specific claim with repeatable evidence.

When To Use

  • The user asks "verify this", "prove it works", "did this fix it", or "show me the evidence".
  • A bug fix needs a before/after repro.
  • A UI, CLI, API, performance, or memory claim needs measurement.
  • A test passes but the user-visible behavior still needs confirmation.
Do not use this for vague claims like "the code is cleaner". Ask for a measurable claim first.

Workflow

  1. Restate the claim in falsifiable form: condition, metric, and threshold.
  2. Pick the smallest local surface that can disprove it.
  3. Capture a baseline from the old state: merge base, parent commit, failing branch, or current broken repro.
  4. Capture the treatment from the changed state using the same command, data, warmup, and environment.
  5. Compare raw artifacts: numbers, screenshots, terminal transcripts, HTTP responses, profiles, heap snapshots, or test output.
  6. Return exactly one verdict: `VERIFIED`, `NOT VERIFIED`, or `INCONCLUSIVE`.
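Step 1 can be made concrete with a small record type that refuses claims it cannot falsify. This is a minimal sketch in Python. It assumes the threshold is a relative change (0.20 for 20%). `Claim` and `render` are hypothetical names, not part of any tool this document references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A falsifiable claim: when `condition` holds, `metric` moves in
    `direction` by at least `threshold` (a relative change, 0.20 = 20%)."""

    condition: str    # phrased as a clause, e.g. "the CLI starts cold"
    metric: str       # what is measured, e.g. "median wall-clock time"
    direction: str    # predicted direction: "decrease" or "increase"
    threshold: float  # minimum relative change that counts as success

    def __post_init__(self) -> None:
        # Reject claims that cannot be falsified as written.
        if self.direction not in ("decrease", "increase"):
            raise ValueError("direction must be 'decrease' or 'increase'")
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")

    def render(self) -> str:
        """One-sentence form, e.g. for claim.md or the report's Claim: line."""
        pct = round(self.threshold * 100)
        return (
            f"When {self.condition}, {self.metric} will "
            f"{self.direction} by at least {pct}%."
        )


claim = Claim("the CLI starts cold", "median wall-clock time", "decrease", 0.20)
print(claim.render())
# When the CLI starts cold, median wall-clock time will decrease by at least 20%.
```

The constructor enforces the falsifiable form. A claim with no direction or no threshold fails immediately, so nothing is ever measured against it.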

Local Surfaces

  • Code behavior: focused unit/integration tests or a minimal repro script.
  • CLI/TUI behavior: `control-cli`, terminal transcript, or demo recording.
  • UI behavior: `control-ui`, screenshots, accessibility snapshots, or browser traces.
  • API behavior: local HTTP/RPC request and response diff.
  • Performance: same-machine baseline/treatment timings or CPU profiles.
  • Memory: heap snapshots before and after the suspected operation.
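For the performance surface, the rule "same command, data, warmup, and environment" can be sketched as a small in-process timing harness. This is a minimal sketch in Python under three assumptions:

- both states can be invoked as zero-argument callables in one process;
- `time_runs` and `compare_timings` are hypothetical helpers;
- wall-clock medians from `time.perf_counter` stand in for a real benchmarking tool or CPU profiler.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> list[float]:
    """Call `fn` `warmup` times untimed, then `runs` times timed.

    Returns wall-clock seconds for each timed run."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def compare_timings(baseline_fn: Callable[[], object],
                    treatment_fn: Callable[[], object],
                    warmup: int = 3, runs: int = 20) -> dict[str, float]:
    """Time both states with identical warmup and run counts on this machine."""
    base = time_runs(baseline_fn, warmup, runs)
    treat = time_runs(treatment_fn, warmup, runs)
    b_med = statistics.median(base)
    t_med = statistics.median(treat)
    if b_med == 0:
        raise ValueError("baseline median is zero; increase the work per run")
    return {
        "baseline_median_s": b_med,
        "treatment_median_s": t_med,
        "relative_delta": (t_med - b_med) / b_med,  # negative means faster
    }


# Hypothetical stand-ins for the old and new code paths.
result = compare_timings(lambda: sum(range(300_000)), lambda: sum(range(3_000)))
print(result)
```

For a CLI, the same harness can wrap `subprocess.run` with identical arguments for both states. Alternating baseline and treatment runs is a common refinement when machine load drifts during the measurement.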

Artifact Layout

When it is safe to write artifacts to disk, use this layout:
```text
/tmp/verify-this/<claim-slug>/
├── claim.md
├── timeline.md
├── baseline/
├── treatment/
├── diff/
└── verdict.md
```
If artifacts may contain sensitive code, prompts, screenshots, HTTP bodies, or heap data, keep only the minimal inline evidence unless the user agrees to disk storage.
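The following is a minimal sketch that creates the layout above with `pathlib`. `slugify` and `make_layout` are hypothetical helpers. The slug rule is an assumption: lowercase, runs of non-alphanumerics collapsed to `-`, truncated to 48 characters. Per the storage note, it should only run when writing to disk is safe.

```python
import re
from pathlib import Path


def slugify(claim: str, max_len: int = 48) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', trim, and truncate."""
    slug = re.sub(r"[^a-z0-9]+", "-", claim.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "claim"


def make_layout(claim: str, root: Path = Path("/tmp/verify-this")) -> Path:
    """Create <root>/<claim-slug>/ with the files and folders of the layout.

    Only call this when the artifacts are safe to store on disk."""
    base = root / slugify(claim)
    for sub in ("baseline", "treatment", "diff"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    (base / "claim.md").write_text(claim + "\n", encoding="utf-8")
    for name in ("timeline.md", "verdict.md"):
        (base / name).touch(exist_ok=True)
    return base


print(slugify("CLI cold start drops by 20%"))  # cli-cold-start-drops-by-20
```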

Verdict Rules

  • `VERIFIED`: baseline and treatment differ in the predicted direction, by at least the claimed threshold, with no obvious confound.
  • `NOT VERIFIED`: the behavior is unchanged, moves the wrong way, or misses the threshold.
  • `INCONCLUSIVE`: there is no valid baseline, the signal is too noisy, the measurement failed, or an environment difference invalidates the comparison.
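The three verdict rules can be sketched as a function over baseline and treatment samples. This is a minimal sketch under several assumptions:

- the threshold is a relative change of the median;
- "noisy signal" is approximated by a hypothetical `max_noise` cutoff (default 5%) on the standard deviation relative to the baseline median;
- the caller reports whether the environments matched.

A crude noise gate like this is not a substitute for a proper statistical test.

```python
import statistics


def verdict(baseline: list[float], treatment: list[float], direction: str,
            threshold: float, max_noise: float = 0.05,
            same_environment: bool = True) -> tuple[str, float]:
    """Return (verdict, relative change of the median).

    direction: predicted direction, "decrease" or "increase".
    threshold: minimum relative change, e.g. 0.20 for 20%.
    max_noise: largest acceptable stdev relative to the baseline median
               (an assumed cutoff, not a statistical test).
    """
    if direction not in ("decrease", "increase"):
        raise ValueError("direction must be 'decrease' or 'increase'")
    # No valid baseline, a failed measurement, or mismatched environments.
    if not same_environment or len(baseline) < 2 or len(treatment) < 2:
        return "INCONCLUSIVE", float("nan")
    b = statistics.median(baseline)
    t = statistics.median(treatment)
    if b == 0:
        return "INCONCLUSIVE", float("nan")
    delta = (t - b) / abs(b)
    # Noisy signal: the spread of either sample set is large relative to the baseline.
    noise = max(statistics.stdev(baseline), statistics.stdev(treatment)) / abs(b)
    if noise > max_noise:
        return "INCONCLUSIVE", delta
    # Positive gain means movement in the predicted direction.
    gain = -delta if direction == "decrease" else delta
    return ("VERIFIED" if gain >= threshold else "NOT VERIFIED"), delta
```

Inconclusive checks run before the threshold check on purpose. A noisy or mismatched comparison should never be reported as `NOT VERIFIED`.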

Output

Use this shape:
```text
VERIFIED | NOT VERIFIED | INCONCLUSIVE
Claim: <falsifiable claim>

Evidence:
<metric/artifact>: baseline=<...>, treatment=<...>, delta=<...>, threshold=<...>

Reasoning:
<one tight paragraph naming the evidence and any confounds>
```
Do not soften a negative result. A clear `NOT VERIFIED` is useful.
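The following is a minimal sketch of a formatter for the output shape above. `format_report` and the evidence dictionary keys are hypothetical, chosen to mirror the template's fields. It accepts only the three allowed verdicts.

```python
VERDICTS = ("VERIFIED", "NOT VERIFIED", "INCONCLUSIVE")


def format_report(verdict: str, claim: str, evidence: list[dict[str, str]],
                  reasoning: str) -> str:
    """Render the report: verdict line, claim, one line per evidence item, reasoning."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {VERDICTS}")
    lines = [verdict, f"Claim: {claim}", "", "Evidence:"]
    for item in evidence:
        lines.append(
            f"{item['metric']}: baseline={item['baseline']}, "
            f"treatment={item['treatment']}, delta={item['delta']}, "
            f"threshold={item['threshold']}"
        )
    lines += ["", "Reasoning:", reasoning]
    return "\n".join(lines)
```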