prove-it
Prove It
When to use
- The user asserts certainty: “always”, “never”, “guaranteed”, “optimal”, “cannot fail”, “no downside”, “100%”.
- The user asks for a devil’s advocate or proof.
- The claim feels too clean for the domain.
Round cadence (mandatory)
- Definition: one "turn" means one assistant reply.
- Default: autoloop (no approvals). Run exactly one gauntlet round per assistant turn, publish results, then continue on the next turn until Oracle synthesis.
- In default mode, after each round, publish:
  - Round Ledger
  - Knowledge Delta
- If confidence remains low after Oracle synthesis, continue with additional rounds (11+) and publish an updated Oracle synthesis.
- Do not ask for permission to continue. In default mode, do not wait for "next" between rounds. Pause only when you must ask the user a question or the user says "stop".
- Step mode (explicit): if the user asks to "pause" / "step" / "one round at a time", run one round then wait for "next".
- Full auto mode (explicit): if the user asks for "full auto" / "fast mode", run rounds 1-10 (round 10 is Oracle synthesis) in one assistant turn while still reporting each round in order.
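The cadence rules above can be sketched as a small planning function. This is only an illustration: `Mode`, `plan_turn`, and `ORACLE_ROUND` are hypothetical names that do not appear in the skill itself, and the handling of rounds past 10 in full auto is an assumption.

```python
from enum import Enum


class Mode(Enum):
    AUTOLOOP = "autoloop"    # default: one round per turn, auto-continue
    STEP = "step"            # one round per turn, then wait for "next"
    FULL_AUTO = "full_auto"  # all remaining rounds in a single turn


ORACLE_ROUND = 10  # round 10 is Oracle synthesis


def plan_turn(mode: Mode, next_round: int) -> tuple[list[int], bool]:
    """Return (rounds to run in this turn, whether to wait for "next" afterwards)."""
    if mode is Mode.FULL_AUTO:
        # Run everything up to Oracle synthesis in one turn, reported in order.
        # Assumption: past round 10, full auto runs one extra round per turn.
        last = max(ORACLE_ROUND, next_round)
        return list(range(next_round, last + 1)), False
    # Autoloop and step mode both run exactly one round per turn;
    # only step mode waits for the user to say "next".
    return [next_round], mode is Mode.STEP
```

For example, `plan_turn(Mode.STEP, 3)` yields round 3 only and a wait flag, while `plan_turn(Mode.FULL_AUTO, 1)` yields rounds 1 through 10 with no wait.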
Mode invocation
| Mode | Default? | How to invoke | Cadence |
|---|---|---|---|
| Autoloop | yes | (no phrase) | 1 round/turn; auto-continue until Oracle |
| Step mode | no | "step mode" / "pause each round" / "pause" / "step" / "one round at a time" | 1 round/turn; wait for "next" |
| Full auto | no | "full auto" / "fast mode" | rounds 1-10 (ending with Oracle synthesis) in one turn; publish Round Ledger + Knowledge Delta after each round |
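One way to read the table is as a phrase lookup with autoloop as the fallback. The sketch below assumes plain case-insensitive substring matching and assumes that full auto wins if phrases for both modes appear; neither rule is stated in the table, and `select_mode` is a hypothetical helper.

```python
# Phrases copied from the "How to invoke" column.
STEP_PHRASES = ("step mode", "pause each round", "pause", "step", "one round at a time")
FULL_AUTO_PHRASES = ("full auto", "fast mode")


def select_mode(user_text: str) -> str:
    """Map a user message to a mode name; autoloop needs no phrase."""
    text = user_text.lower()
    # Assumption: full auto takes priority when both kinds of phrase appear.
    if any(phrase in text for phrase in FULL_AUTO_PHRASES):
        return "full auto"
    if any(phrase in text for phrase in STEP_PHRASES):
        return "step mode"
    return "autoloop"
```

Substring matching is deliberately naive here: a word such as "footstep" would also trigger step mode, so a real matcher would probably want word boundaries.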
Quick start
- Restate the claim and its scope.
- Default to autoloop. If the user explicitly requests "step mode" or "full auto", use that instead.
- Run round 1 and publish the Round Ledger + Knowledge Delta.
- Continue automatically with one round per turn until round 10 (Oracle synthesis).
- If confidence remains low, run additional rounds (11+) and publish an updated Oracle synthesis.
Ten-round gauntlet
1. Counterexamples: smallest concrete break.
2. Logic traps: missing quantifiers/premises.
3. Boundary cases: zero/one/max/empty/extreme scale.
4. Adversarial inputs: worst-case distributions/abuse.
5. Alternative paradigms: different model flips the conclusion.
6. Operational constraints: latency/cost/compliance/availability.
7. Probabilistic uncertainty: variance, tail risk, sampling bias.
8. Comparative baselines: “better than what?”, which metric?
9. Meta-test: fastest disproof experiment.
10. Oracle synthesis: tightest surviving claim with boundaries. If confidence is still low, repeat rounds 1-9 as needed, then re-run Oracle synthesis.
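As an illustration of the Counterexamples round ("smallest concrete break"), the sketch below checks a toy claim against candidates in increasing order and stops at the first failure. The claim and the helper name `smallest_counterexample` are made up for this example.

```python
from typing import Callable, Iterable, Optional


def smallest_counterexample(
    claim: Callable[[int], bool], candidates: Iterable[int]
) -> Optional[int]:
    """Return the first candidate that breaks the claim, or None if none does."""
    for x in candidates:  # candidates are assumed to be ordered smallest first
        if not claim(x):
            return x
    return None


def square_is_larger(x: int) -> bool:
    """Toy claim: "x * x is always greater than x"."""
    return x * x > x


print(smallest_counterexample(square_is_larger, range(0, 100)))  # prints 0
print(smallest_counterexample(square_is_larger, range(2, 100)))  # prints None
```

The second call shows the kind of narrowing Oracle synthesis aims for: the claim survives once its scope is restricted to integers of at least 2 (within the range tested).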
Round self-prompt bank (pick exactly 1)
Internal self-prompts for selecting round focus. Do not ask the user unless blocked.
- Counterexamples: What is the smallest input that breaks this?
- Logic traps: What unstated assumption must hold?
- Boundary cases: Which boundary is most likely in real use?
- Adversarial: What does worst-case input look like?
- Alternative paradigm: What objective makes the opposite true?
- Operational: Which dependency/policy is a hard stop?
- Uncertainty: What distribution shift flips the result?
- Baseline: Better than what, on which metric?
- Meta-test: What experiment would change your mind fastest?
- Oracle: What explicit boundaries keep this honest?
Core artifacts
Argument map
Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Disproof tests:
- T1:
Refined claim:
Round Ledger (update every round)
Round: <1-10 (or 11+)>
Focus:
Claim scope:
New evidence:
New counterexample:
Remaining gaps:
Next round:
Knowledge Delta (publish every round)
- New:
- Updated:
- Invalidated:
Claim boundary table
| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale | | | | |
| Data quality | | | | |
| Environment | | | | |
| Adversary | | | | |
Next-tests plan
| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|
Domain packs
Performance
Use when the claim is about speed, latency, throughput, or resources.
- Clarify: median vs tail latency vs throughput.
- Identify workload shape (spiky vs steady) and bottleneck resource.
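A quick illustration of why the median vs tail distinction matters: the latency samples below are made up, and `percentile` is a hypothetical nearest-rank helper rather than part of the skill.

```python
import statistics


def percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


# Made-up workload: 98 fast requests and 2 slow ones, in milliseconds.
latencies_ms = [10] * 98 + [900, 1200]

median_ms = statistics.median(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
print(median_ms, p99_ms)
```

On this data the median is 10 ms while the 99th percentile is 900 ms, so a claim like "latency is 10 ms" holds for the median and fails badly at the tail.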
Product
Use when the claim is about user impact, adoption, or behavior.
- Clarify user segment and success metric.
- State the baseline/counterfactual.
- Name the likely unintended behavior/tradeoff.
Oracle synthesis template (round 10 / as needed)
Original claim:
Refined claim:
Boundaries:
- Valid when:
- Invalid when:
Confidence trail:
- Evidence:
- Gaps:
Next tests:
- ...
Deliverable format (per turn)
- Round number + focus.
- Round Ledger + Knowledge Delta.
- At most one question for the user (only when blocked).
- In default autoloop, run one round in that turn and continue to the next round in the next turn.
- In step mode, run one round and wait for "next".
- In full auto (or "fast mode"), run rounds 1-10 (ending with Oracle synthesis) in one turn (repeat the above per round).
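If the per-turn deliverable is assembled programmatically, it might look like the sketch below. The field names mirror the Round Ledger and Knowledge Delta templates; the class names and the `render_turn` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RoundLedger:
    round: int
    focus: str
    claim_scope: str = ""
    new_evidence: str = ""
    new_counterexample: str = ""
    remaining_gaps: str = ""
    next_round: str = ""


@dataclass
class KnowledgeDelta:
    new: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)
    invalidated: list[str] = field(default_factory=list)


def render_turn(ledger: RoundLedger, delta: KnowledgeDelta) -> str:
    """Render one round's ledger and delta using the template labels."""
    lines = [
        f"Round: {ledger.round}",
        f"Focus: {ledger.focus}",
        f"Claim scope: {ledger.claim_scope}",
        f"New evidence: {ledger.new_evidence}",
        f"New counterexample: {ledger.new_counterexample}",
        f"Remaining gaps: {ledger.remaining_gaps}",
        f"Next round: {ledger.next_round}",
        f"- New: {'; '.join(delta.new)}",
        f"- Updated: {'; '.join(delta.updated)}",
        f"- Invalidated: {'; '.join(delta.invalidated)}",
    ]
    return "\n".join(lines)
```

In full auto, the same renderer would simply be called once per round and the results concatenated in order.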
Activation cues
- "always" / "never" / "guaranteed" / "optimal" / "cannot fail" / "no downside" / "100%"
- "prove it" / "devil's advocate" / "stress test" / "rigor"
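A minimal sketch of cue detection, assuming simple case-insensitive substring matching (the skill does not specify how cues are matched, and `matched_cues` is a hypothetical helper):

```python
# Cue phrases copied from the two lists above.
CERTAINTY_CUES = ("always", "never", "guaranteed", "optimal", "cannot fail", "no downside", "100%")
REQUEST_CUES = ("prove it", "devil's advocate", "stress test", "rigor")


def matched_cues(message: str) -> list[str]:
    """Return the activation cues found in the message, in list order."""
    # Normalize case and curly apostrophes so "devil’s advocate" also matches.
    text = message.lower().replace("’", "'")
    return [cue for cue in CERTAINTY_CUES + REQUEST_CUES if cue in text]


print(matched_cues("This cache is always faster and cannot fail."))
```

Substring matching over-triggers ("rigor" also matches "rigorous", "optimal" matches "suboptimal"), which may or may not be desirable; treat the result as a hint to consider the skill, not a hard trigger.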