prove-it

Prove It

When to use

  • The user asserts certainty: “always”, “never”, “guaranteed”, “optimal”, “cannot fail”, “no downside”, “100%”.
  • The user asks for a devil’s advocate or proof.
  • The claim feels too clean for the domain.

Round cadence (mandatory)

  • Definition: one "turn" means one assistant reply.
  • Default: autoloop (no approvals). Run exactly one gauntlet round per assistant turn, publish results, then continue on the next turn until Oracle synthesis.
  • In default mode, after each round, publish:
    • Round Ledger
    • Knowledge Delta
  • If confidence remains low after Oracle synthesis, continue with additional rounds (11+) and publish an updated Oracle synthesis.
  • Do not ask for permission to continue. In default mode, do not wait for "next" between rounds. Pause only when you must ask the user a question or the user says "stop".
  • Step mode (explicit): if the user asks to "pause" / "step" / "one round at a time", run one round then wait for "next".
  • Full auto mode (explicit): if the user asks for "full auto" / "fast mode", run rounds 1-10 + Oracle synthesis in one assistant turn while still reporting each round in order.

Mode invocation

| Mode | Default? | How to invoke | Cadence |
|------|----------|---------------|---------|
| Autoloop | yes | (no phrase) | 1 round/turn; auto-continue until Oracle |
| Step mode | no | "step mode" / "pause each round" / "pause" / "step" / "one round at a time" | 1 round/turn; wait for "next" |
| Full auto | no | "full auto" / "fast mode" | rounds 1-10 + Oracle in one turn; publish Round Ledger + Knowledge Delta after each round |
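
The invocation rules above could be sketched in code roughly as follows. This is only an illustration, not part of the skill itself: the `Mode` enum, the `select_mode` function, and the whole-word matching are assumptions made for the example, and only the phrases come from the table.

```python
import re
from enum import Enum


class Mode(Enum):
    AUTOLOOP = "autoloop"    # default: one round per turn, auto-continue
    STEP = "step mode"       # one round per turn, wait for "next"
    FULL_AUTO = "full auto"  # all rounds plus Oracle synthesis in one turn


# Phrases copied from the mode table. Matching them as whole words is an
# assumption of this sketch, not something the skill specifies.
FULL_AUTO_PHRASES = ("full auto", "fast mode")
STEP_PHRASES = ("step mode", "pause each round", "pause", "step",
                "one round at a time")


def _mentions(text: str, phrases) -> bool:
    """Return True if any phrase appears in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def select_mode(user_message: str) -> Mode:
    """Pick a cadence mode; fall back to autoloop when no phrase is present."""
    text = user_message.lower()
    if _mentions(text, FULL_AUTO_PHRASES):
        return Mode.FULL_AUTO
    if _mentions(text, STEP_PHRASES):
        return Mode.STEP
    return Mode.AUTOLOOP
```

Whole-word matching keeps a message such as "what are the next steps" from switching into step mode by accident; a real implementation may want something less literal than phrase matching.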

Quick start

  1. Restate the claim and its scope.
  2. Default to autoloop. If the user explicitly requests "step mode" or "full auto", use that instead.
  3. Run round 1 and publish the Round Ledger + Knowledge Delta.
  4. Continue automatically with one round per turn until round 10 (Oracle synthesis).
  5. If confidence remains low, run additional rounds (11+) and publish an updated Oracle synthesis.

Ten-round gauntlet

  1. Counterexamples: smallest concrete break.
  2. Logic traps: missing quantifiers/premises.
  3. Boundary cases: zero/one/max/empty/extreme scale.
  4. Adversarial inputs: worst-case distributions/abuse.
  5. Alternative paradigms: different model flips the conclusion.
  6. Operational constraints: latency/cost/compliance/availability.
  7. Probabilistic uncertainty: variance, tail risk, sampling bias.
  8. Comparative baselines: “better than what?”, which metric?
  9. Meta-test: fastest disproof experiment.
  10. Oracle synthesis: tightest surviving claim with boundaries. If confidence is still low, repeat rounds 1-9 as needed, then re-run Oracle synthesis.
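
One way to picture the round sequence is the sketch below. It is a hedged illustration only: `ROUND_FOCI`, `focus_for_round`, and `should_continue` are hypothetical names, and the cycling of rounds 11+ through the same ten foci is one assumed reading of "repeat rounds 1-9 as needed, then re-run Oracle synthesis". The skill defines no numeric confidence threshold, so confidence is passed in as a plain boolean.

```python
# Round foci in gauntlet order; index 0 is round 1.
ROUND_FOCI = [
    "Counterexamples",
    "Logic traps",
    "Boundary cases",
    "Adversarial inputs",
    "Alternative paradigms",
    "Operational constraints",
    "Probabilistic uncertainty",
    "Comparative baselines",
    "Meta-test",
    "Oracle synthesis",
]


def focus_for_round(round_number: int) -> str:
    """Map a round number to its focus.

    Assumption: rounds 11+ cycle through the same ten foci, so round 11
    revisits Counterexamples and round 20 is another Oracle synthesis.
    """
    if round_number < 1:
        raise ValueError("round numbers start at 1")
    return ROUND_FOCI[(round_number - 1) % len(ROUND_FOCI)]


def should_continue(round_number: int, confidence_is_low: bool) -> bool:
    """Keep going until an Oracle synthesis round ends with adequate confidence."""
    if focus_for_round(round_number) != "Oracle synthesis":
        return True
    return confidence_is_low
```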

Round self-prompt bank (pick exactly 1)

Internal self-prompts for selecting round focus. Do not ask the user unless blocked.
  • Counterexamples: What is the smallest input that breaks this?
  • Logic traps: What unstated assumption must hold?
  • Boundary cases: Which boundary is most likely in real use?
  • Adversarial: What does worst-case input look like?
  • Alternative paradigm: What objective makes the opposite true?
  • Operational: Which dependency/policy is a hard stop?
  • Uncertainty: What distribution shift flips the result?
  • Baseline: Better than what, on which metric?
  • Meta-test: What experiment would change your mind fastest?
  • Oracle: What explicit boundaries keep this honest?

Core artifacts

Argument map

Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Disproof tests:
- T1:
Refined claim:

Round Ledger (update every round)

Round: <1-10 (or 11+)>
Focus:
Claim scope:
New evidence:
New counterexample:
Remaining gaps:
Next round:
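
If the ledger is kept programmatically, a small record type can hold the template fields and render them in the same order. The sketch below is an assumption-laden illustration: the `RoundLedger` class and its attribute names are hypothetical, and only the field labels are taken from the template.

```python
from dataclasses import dataclass


@dataclass
class RoundLedger:
    """One ledger entry; attribute names are illustrative, labels match the template."""
    round_number: int
    focus: str = ""
    claim_scope: str = ""
    new_evidence: str = ""
    new_counterexample: str = ""
    remaining_gaps: str = ""
    next_round: str = ""

    def render(self) -> str:
        """Render the entry in the template's 'Label: value' layout."""
        rows = [
            ("Round", str(self.round_number)),
            ("Focus", self.focus),
            ("Claim scope", self.claim_scope),
            ("New evidence", self.new_evidence),
            ("New counterexample", self.new_counterexample),
            ("Remaining gaps", self.remaining_gaps),
            ("Next round", self.next_round),
        ]
        # Empty fields render as a bare "Label:" line, as in the blank template.
        return "\n".join(f"{label}: {value}".rstrip() for label, value in rows)


ledger = RoundLedger(round_number=1, focus="Counterexamples")
print(ledger.render())
```

Creating a fresh entry each round, rather than mutating one, keeps earlier rounds available when writing the confidence trail in the Oracle synthesis.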

Knowledge Delta (publish every round)

- New:
- Updated:
- Invalidated:

Claim boundary table

| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale         |           |              |             |           |
| Data quality  |           |              |             |           |
| Environment   |           |              |             |           |
| Adversary     |           |              |             |           |

Next-tests plan

| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|

Domain packs

Performance

Use when the claim is about speed, latency, throughput, or resources.
  • Clarify: median vs tail latency vs throughput.
  • Identify workload shape (spiky vs steady) and bottleneck resource.

Product

Use when the claim is about user impact, adoption, or behavior.
  • Clarify user segment and success metric.
  • State the baseline/counterfactual.
  • Name the likely unintended behavior/tradeoff.

Oracle synthesis template (round 10 / as needed)

Original claim:
Refined claim:
Boundaries:
- Valid when:
- Invalid when:
Confidence trail:
- Evidence:
- Gaps:
Next tests:
- ...

Deliverable format (per turn)

  • Round number + focus.
  • Round Ledger + Knowledge Delta.
  • At most one question for the user (only when blocked).
  • In default autoloop, run one round in that turn and continue to the next round in the next turn.
  • In step mode, run one round and wait for "next".
  • In full auto (or "fast mode"), run rounds 1-10 + Oracle synthesis in one turn (repeat the above per round).

Activation cues

  • "always" / "never" / "guaranteed" / "optimal" / "cannot fail" / "no downside" / "100%"
  • "prove it" / "devil's advocate" / "stress test" / "rigor"
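
As a rough illustration of how these cues might be detected in a message, the sketch below scans for each phrase as a whole word. The function name `find_cues`, the two tuples, and the matching strategy are assumptions of the example; the skill itself only lists the phrases.

```python
import re
from typing import List

# Cue phrases copied from the list above.
CERTAINTY_CUES = ("always", "never", "guaranteed", "optimal",
                  "cannot fail", "no downside", "100%")
REQUEST_CUES = ("prove it", "devil's advocate", "stress test", "rigor")


def find_cues(message: str) -> List[str]:
    """Return the cues present in message, in the order they are listed above."""
    # Normalise case and curly apostrophes so "Devil’s advocate" still matches.
    text = message.lower().replace("\u2019", "'")
    found = []
    for cue in CERTAINTY_CUES + REQUEST_CUES:
        # Lookarounds instead of \b so that "100%" (ending in a non-word
        # character) is handled the same way as ordinary words.
        pattern = r"(?<!\w)" + re.escape(cue) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(cue)
    return found
```

Literal matching of this kind will miss paraphrases ("there is no way this breaks"), so it is better treated as a first-pass trigger than as the full activation logic.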