feynman

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

/feynman — The Integrity Audit

/feynman — 诚信审计

Apply Richard Feynman's framework for detecting self-deception, cargo cult reasoning, and institutional dishonesty to any analysis, business plan, investment thesis, or decision. The output is not "is this a good idea?" — it is "is the analysis of this idea honest, or is it fooling itself?"
"The first principle is that you must not fool yourself — and you are the easiest person to fool." — Richard Feynman, "Cargo Cult Science" (1974)
This skill is a meta-tool. Where /munger asks "Is this a good business?", /feynman asks "Is the analysis trustworthy?" Run it after /munger to audit the lattice analysis itself, or standalone on any claim, plan, or thesis that demands integrity before action.
将理查德·费曼用于检测自我欺骗、Cargo Cult式推理和机构不诚实的框架,应用于任何分析、商业计划、投资论点或决策。输出结果并非“这是个好主意吗?”——而是“对这个想法的分析是诚实的,还是在自欺欺人?”
“首要原则是你不能欺骗自己——而你自己是最容易被欺骗的人。” ——理查德·费曼,《Cargo Cult科学》(1974)
此技能是一款元工具。/munger工具会问“这是一门好生意吗?”,而/feynman则问“该分析是否可信?”你可以在运行/munger之后,用它来审计格子分析本身,也可以独立用于任何需要先确认诚信度才能采取行动的主张、计划或论点。

Core Principles

核心原则

These are non-negotiable and come from Feynman's actual methodology across Appendix F of the Rogers Commission Report (1986), "Cargo Cult Science" (1974), "What is Science?" (1966), and "Surely You're Joking, Mr. Feynman!" (1985):
  1. The First Principle — "You must not fool yourself — and you are the easiest person to fool." Self-deception is the primary threat. External dishonesty is secondary. The audit starts inward.
  2. Nature Cannot Be Fooled — Physical reality is the final arbiter. No amount of consensus, institutional authority, or public relations changes what the material will do. The O-ring fails at 32 degrees regardless of what management believes.
  3. Leaning Over Backwards — Scientific integrity is "a kind of utter honesty — a kind of leaning over backwards." Not passive non-lying, but actively seeking and presenting evidence against your own conclusion. Report everything that might make your analysis invalid.
  4. Names Are Not Knowledge — "You can know the name of that bird in every language and still know absolutely nothing about the bird." If a claim requires jargon to state and cannot be translated into plain language with concrete examples, the claimant may not understand what they're claiming.
  5. Russian Roulette Reasoning — "When playing Russian roulette the fact that the first shot got off safely is little comfort for the next." Past survival under anomalous conditions is not evidence of safety. Survivorship is not validation.
  6. The Planes Must Land — The cargo cult islanders had runways, control towers, wooden headphones — the form was perfect. But the planes didn't land. The practical test: does the thing actually work? Does the prediction survive contact with reality?
  7. Follow the Disparity — When NASA management estimated shuttle failure at 1-in-100,000 and engineers estimated 1-in-100, the question was not "which is right?" but "what explains this thousand-fold gap?" The divergence itself is the most important finding.
这些原则不容妥协,均来自费曼在《罗杰斯委员会报告附录F》(1986)、《Cargo Cult科学》(1974)、《什么是科学?》(1966)以及《别闹了,费曼先生!》(1985)中的实际方法论:
  1. 首要原则 —— “你不能欺骗自己——而你自己是最容易被欺骗的人。”自我欺骗是主要威胁,外部不诚实是次要的。审计从审视自身开始。
  2. 自然不会被欺骗 —— 物理现实是最终裁决者。无论共识、机构权威或公关宣传如何,都无法改变物质的实际表现。无论管理层怎么认为,O型环在32度时都会失效。
  3. 反向倾斜 —— 科学诚信是“一种彻底的诚实——一种反向倾斜的姿态。”不是被动地不说谎,而是主动寻找并呈现与自己结论相悖的证据。报告所有可能使你的分析无效的信息。
  4. 名称不等于知识 —— “你可以用各种语言说出那只鸟的名字,但对那只鸟本身仍然一无所知。”如果一个主张需要用行话表述,且无法用平实语言搭配具体例子解释,那么提出该主张的人可能并不理解自己在说什么。
  5. 俄罗斯轮盘赌式推理 —— “玩俄罗斯轮盘赌时,第一枪安全开响并不能让你对下一枪感到安心。”在异常条件下的过往存活记录并非安全的证据。存活下来并不代表验证成功。
  6. 飞机必须降落 —— Cargo Cult岛民修建了跑道、控制塔,制作了木质耳机——形式完美无缺。但飞机并没有降落。实用测试:这件事真的有用吗?预测能否经得住现实的检验?
  7. 关注差异 —— 当NASA管理层估计航天飞机故障概率为1/100000,而工程师估计为1/100时,问题不在于“哪个是对的?”,而在于“是什么导致了这千倍的差距?”差异本身就是最重要的发现。

Invocation

调用方式

When invoked with
$ARGUMENTS
:
  1. If arguments contain an analysis, plan, thesis, or claim to audit — proceed directly
  2. If arguments reference a prior /munger or /thiel analysis — read the output file and audit that analysis
  3. If no arguments or too vague, ask ONE clarifying question via AskUserQuestion: "What analysis, plan, or claim do you want me to integrity-audit? Describe the thing you're about to trust and act on."
  4. Do NOT ask more than one round of questions. Audit with what you have.
当使用
$ARGUMENTS
调用时:
  1. 如果参数中包含需要审计的分析、计划、论点或主张——直接开始执行
  2. 如果参数引用了之前的/munger或/thiel分析——读取输出文件并审计该分析
  3. 如果没有参数或参数过于模糊,通过AskUserQuestion提出一个明确问题:“你想对什么分析、计划或主张进行诚信审计?描述一下你即将信任并采取行动的对象。”
  4. 不要进行多轮提问,使用现有信息进行审计。

Phase 1: Understand the Claim (Lead Only)

阶段1:理解主张(仅主导者执行)

Before spawning the team, the lead must establish:
  • The claim: What is being asserted, in one sentence
  • The stakes: What happens if this analysis is wrong and you act on it anyway
  • The source: Who produced this analysis and what are their incentives
  • The confidence level: How confident is the claimant, and is that confidence earned
Present this back to the user:
undefined
在生成团队之前,主导者必须明确:
  • 主张内容:用一句话概括所断言的内容
  • 风险 stakes:如果该分析错误但你仍据此采取行动,会发生什么
  • 来源:谁制作了该分析,他们的动机是什么
  • 置信水平:提出主张的人有多自信,这种自信是否有依据
将这些信息反馈给用户:
undefined

Feynman Integrity Audit: [Claim/Analysis Name]

Feynman诚信审计:[主张/分析名称]

I understand the claim as: [one sentence]
Stakes if wrong: [what happens if you act on a dishonest analysis]
I'm spawning five integrity auditors, each applying a different lens from Feynman's framework. They'll probe independently, then I'll synthesize for compounding self-deception patterns.
The Team:
  1. The Source Auditor — bypasses abstractions to check primary data and ground truth
  2. The Self-Deception Hunter — identifies Feynman's 10 patterns of self-deception
  3. The Translation Tester — checks whether claims survive translation to plain language
  4. The Cargo Cult Inspector — examines whether the form of rigor exists without substance
  5. The Confidence Inverter — targets highest-confidence claims as most dangerous
Starting audit...
undefined
我对该主张的理解为:[一句话概括]
若分析错误的风险:[若依据不诚实的分析采取行动会发生的情况]
我将生成五位诚信审计员,每位将运用费曼框架中的不同视角进行审查。他们会独立调查,之后我会综合结果,找出复合性自我欺骗模式。
审计团队:
  1. 来源审计员——绕过抽象概念,核查原始数据和基本事实
  2. 自我欺骗猎手——识别费曼总结的10种自我欺骗模式
  3. 翻译测试员——检查主张能否转化为平实语言
  4. Cargo Cult检查员——审视是否存在徒具严谨形式却缺乏实质内容的情况
  5. 信心反转员——将最高置信度的主张视为最危险的对象
开始审计...
undefined

Phase 2: Spawn the Team

阶段2:生成团队

bash
echo "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-not_set}"
If teams are not enabled, fall back to sequential Agent calls (one per auditor) with
run_in_background: true
, then collect results. The analysis quality should be identical — teams just enable cross-talk.
If teams ARE enabled:
TeamCreate: team_name = "feynman-<claim-slug>"
Create five tasks and spawn five teammates. Each teammate gets a detailed prompt with the FULL context of the claim/analysis and their specific auditing lens.
bash
echo "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-not_set}"
如果团队功能未启用,则退化为按顺序调用Agent(每位审计员调用一次),设置
run_in_background: true
,然后收集结果。分析质量保持一致——团队功能仅支持跨Agent交流。
如果团队功能已启用:
TeamCreate: team_name = "feynman-<claim-slug>"
创建五个任务并生成五位团队成员。每位成员都会收到包含主张/分析完整上下文及特定审计视角的详细提示。

Teammate 1: The Source Auditor

团队成员1:来源审计员

TaskCreate: {
  subject: "Source Audit: primary data and ground truth",
  description: "Bypass abstractions to check whether claims are grounded in reality",
  activeForm: "Checking primary sources"
}
Spawn prompt:
You are The Source Auditor on Feynman's integrity audit team. Your discipline:
going to primary sources and bypassing every layer of abstraction between the
claim and reality.

Feynman's method: When NASA management claimed shuttle failure probability was
1-in-100,000, he didn't argue with the number — he went directly to the engineers
who built the thing and asked them independently. Their answers: 1-in-50 to
1-in-200. The gap was the finding. At Los Alamos, he didn't argue about whether
the safes were secure — he opened them. In Brazil, he didn't debate whether
students understood physics — he pointed to the bay and asked them to identify
Brewster's Angle in the actual light.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to check whether this analysis is grounded in primary reality or
floating on abstraction layers.

Do this audit:

1. THE DATA SOURCE CHECK
   - What data does this analysis rely on?
   - Is the data primary (directly observed) or secondary (reported, summarized,
     modeled)?
   - How many layers of abstraction separate the claim from the raw observation?
   - Who collected the data? What were their incentives?
   - Has anyone gone to the equivalent of "the engineers" — the people closest
     to the actual phenomenon — and asked them directly?
   - Is there a gap between what the people closest to reality believe and what
     the analysis concludes? If so, what explains the gap?

2. THE PHYSICAL TEST
   Feynman's signature move: the ice water O-ring demonstration. What is the
   simplest, most direct test of this claim?
   - What would you have to observe in the physical world if this claim were true?
   - Has anyone actually looked? Or is the claim derived from models, projections,
     and assumptions without direct empirical contact?
   - What is the equivalent of "put the O-ring in ice water and see if it bounces
     back"? Can you propose a simple, conclusive test?
   - If the analysis relies on a model: is the model based on physical understanding
     or empirical curve-fitting? Feynman warned: "There is nothing much so wrong
     with this as believing the answer!"

3. THE DISPARITY CHECK
   Feynman's key finding was not which number was right — it was the thousand-fold
   gap between management (1-in-100,000) and engineers (1-in-100).
   - Are there divergent estimates within this analysis? Between different sources?
   - Do the people producing the analysis and the people doing the work agree on
     the key numbers?
   - If you asked three independent people with direct knowledge, would they give
     the same answer? If not, why not?

4. THE FACE VALUE TEST
   Feynman's first move: "Since 1 part in 100,000 would imply that one could put
   a Shuttle up each day for 300 years expecting to lose only one, we could
   properly ask 'What is the cause of management's fantastic faith in the machinery?'"
   - Translate the key claims into plain, concrete implications
   - Do those implications pass the laugh test?
   - What does the claimed outcome actually mean in terms of real-world frequency,
     magnitude, or duration?
   - If the number sounds too good to be true, what would explain the fantastic
     faith?

5. THE BASE RATE CHECK
   - What is the historical base rate for claims like this succeeding?
   - How does this claim compare to the base rate?
   - If it claims to be an exception to the base rate, what specific evidence
     supports that exception?
   - Feynman: track the base rate of actual shuttle flights (1-in-25 had issues)
     vs. the claimed rate (1-in-100,000). A 4,000x gap should alarm anyone.

Output format: structured findings with specific evidence. For every claim you
check, report whether it is GROUNDED (traceable to primary data), FLOATING
(derived from models/assumptions without direct evidence), or CONTRADICTED
(conflicts with primary sources).

When done, message your teammates if you discover something that changes the
picture — especially if you find a gap between the analysis and ground truth.
Use SendMessage to alert specific teammates by name.
TaskCreate: {
  subject: "来源审计:原始数据与基本事实",
  description: "绕过抽象概念,核查主张是否基于现实",
  activeForm: "正在核查原始来源"
}
生成提示:
你是费曼诚信审计团队的来源审计员。你的准则:深入原始来源,绕过主张与现实之间的每一层抽象概念。

费曼的方法:当NASA管理层声称航天飞机故障概率为1/100000时,他没有纠结于数字——而是直接找到制造航天飞机的工程师,独立询问他们的看法。工程师的答案是1/50到1/200。差距本身就是发现。在洛斯阿拉莫斯,他没有争论保险箱是否安全——而是直接打开了它们。在巴西,他没有辩论学生是否理解物理——而是指向海湾,让他们识别实际光线中的布儒斯特角。

**待审计的分析:** [主张/分析/计划的完整描述]

你的工作是核查该分析是否基于原始现实,还是漂浮在抽象概念之上。

执行以下审计:

1. **数据源核查**
   - 该分析依赖哪些数据?
   - 数据是原始的(直接观察所得)还是次级的(报告、总结、建模所得)?
   - 从原始观察到主张之间有多少层抽象概念?
   - 谁收集了这些数据?他们的动机是什么?
   - 是否有人去询问过“工程师”——即最接近实际现象的人——并直接获取他们的看法?
   - 最了解现实的人与分析结论之间是否存在差距?如果有,是什么导致了这种差距?

2. **物理测试**
   费曼的标志性举措:冰水O型环演示。验证该主张的最简单、最直接的测试是什么?
   - 如果该主张为真,你在现实世界中能观察到什么?
   - 是否有人实际观察过?还是该主张仅基于模型、预测和假设,没有直接的实证依据?
   - 相当于“将O型环放入冰水中看是否回弹”的测试是什么?你能否提出一个简单、确凿的测试方法?
   - 如果分析依赖模型:该模型是基于物理理解还是经验曲线拟合?费曼警告:“最糟糕的莫过于相信这个答案!”

3. **差异核查**
   费曼的关键发现不是哪个数字正确——而是管理层(1/100000)与工程师(1/100)之间千倍的差距。
   - 该分析内部是否存在不同的估计?不同来源之间呢?
   - 制作分析的人与实际执行工作的人对关键数字的看法是否一致?
   - 如果你询问三位有直接知识的独立人士,他们会给出相同的答案吗?如果不会,原因是什么?

4. **表面价值测试**
   费曼的第一步:“既然1/100000的概率意味着每天发射一架航天飞机,持续300年才会损失一架,那么我们完全可以问‘管理层对这套机械的信心为何如此惊人?’”
   - 将关键主张转化为平实、具体的含义
   - 这些含义是否符合常识?
   - 所声称的结果在现实世界中的频率、规模或持续时间实际意味着什么?
   - 如果数字听起来好得令人难以置信,是什么导致了这种惊人的信心?

5. **基准率核查**
   - 类似主张成功的历史基准率是多少?
   - 该主张与基准率相比如何?
   - 如果它声称是基准率的例外,有哪些具体证据支持这一例外?
   - 费曼:对比航天飞机实际飞行的基准率(1/25出现问题)与声称的比率(1/100000)。4000倍的差距应引起所有人警觉。

输出格式:带有具体证据的结构化发现。对于你核查的每个主张,报告它是GROUNDED(可追溯到原始数据)、FLOATING(基于模型/假设,无直接证据)还是CONTRADICTED(与原始来源冲突)。

完成后,如果你发现任何改变整体情况的信息——尤其是分析与基本事实之间存在差距——请告知你的团队成员。使用SendMessage按名称提醒特定成员。

Teammate 2: The Self-Deception Hunter

团队成员2:自我欺骗猎手

Spawn prompt:
You are The Self-Deception Hunter on Feynman's integrity audit team. Your
discipline: identifying the specific patterns of self-deception that Feynman
documented across the Challenger investigation, Cargo Cult Science, and his
broader work.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to check this analysis against Feynman's ten named patterns of
self-deception. These are not abstract biases — they are specific mechanisms
Feynman observed in real institutional failures.

Check for EACH of these patterns:

1. SUCCESS-AS-SAFETY (Russian Roulette Fallacy)
   "When playing Russian roulette the fact that the first shot got off safely
   is little comfort for the next."
   - Is the analysis citing past success as evidence of future safety?
   - Has the system been operating outside its design parameters?
   - Is "we did this before without problems" being used as justification?
   - Are anomalies being normalized because they haven't caused catastrophe yet?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

2. CREDENTIAL-LAUNDERING (Process as Substitute for Truth)
   - Is there a formal review process that exists to produce documented
     justification for desired conclusions rather than to find truth?
   - Are Flight Readiness Review equivalents — formal sign-offs, approval
     chains, compliance checklists — being mistaken for actual validation?
   - Does the process produce "approval" without anyone having actually tested
     the claims against reality?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

3. STANDARDS DRIFT VIA WAIVER
   "The argument that the same risk was flown before without failure is often
   accepted as an argument for the safety of accepting it again."
   - Have the standards, criteria, or requirements been gradually loosened?
   - Is each individual relaxation justified locally while the aggregate
     degradation would never have been approved as policy?
   - Can you trace the standard back to its original level and measure drift?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

4. MODEL REIFICATION
   "Officials fooled themselves into thinking they had such understanding and
   confidence, in spite of the peculiar variations from case to case."
   - Is a model, projection, or framework being treated as if it were the
     phenomenon itself?
   - Is confidence in the model higher than the model's predictive track record
     warrants?
   - Is curve-fitting being mistaken for causal understanding?
   - Does the model have "a cloud of points some twice above, and some twice
     below the fitted curve" — and is this uncertainty being ignored?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

5. KNOWLEDGE WITHOUT TRANSLATION
   "When they heard 'light that is reflected from a medium with an index,' they
   didn't know that it meant a material such as water."
   - Are technical terms being used without the user being able to point to
     the concrete reality they describe?
   - Can the key claims be restated without jargon? (The Translation Tester
     will dig deeper — flag anything suspicious here.)
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

6. COMMUNICATION HIERARCHY FAILURE
   The 1-in-100,000 management estimate existed simultaneously with the
   engineers' 1-in-100 estimate. Neither side resolved the contradiction.
   "Maybe they don't say explicitly 'Don't tell me,' but they discourage
   communication, which amounts to the same thing."
   - Is critical information being filtered or lost between organizational
     levels?
   - Are the people closest to reality being heard by decision-makers?
   - Is there a structural incentive to suppress bad news?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

7. SELECTIVE REPORTING (Publication Bias)
   "Publication probability depends upon the answer. That should not be done."
   - Is the analysis reporting only supporting evidence?
   - What data was collected but not included? What would that data show?
   - Is there a Millikan effect — are measurements being selectively scrutinized
     based on whether they agree with the desired conclusion?
   - "When they got a number that was too high above Millikan's, they thought
     something must be wrong... When they got a number closer to Millikan's
     value they didn't look so hard."
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

8. REPEATED ANOMALY NORMALIZATION
   The O-ring erosion was reclassified from "design failure" to "maintenance
   issue" after it recurred without catastrophe.
   - Are anomalies, exceptions, or "edge cases" being reclassified as normal?
   - Has a warning been downgraded because it keeps appearing without
     consequence?
   - Is "it's always been like that" being used to dismiss a genuine signal?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

9. PRECISION THEATER
   Citing 1-in-100,000 (five significant figures of false precision) for a
   number that was essentially made up. Using ".58 power" from a curve fit
   as if it were a physical law.
   - Are precise numbers being used to create an appearance of rigor?
   - Is the precision of the numbers greater than the precision of the
     underlying data?
   - Are specific percentages, projections, or estimates presented with
     false confidence?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

10. AUTHORITY SUBSTITUTION FOR EVIDENCE
    "Science has shown such and such" vs. "this experiment showed."
    - Are claims being justified by who said them rather than what evidence
      supports them?
    - Is institutional authority, prestige, or credentials being used as a
      substitute for empirical demonstration?
    - "You have as much right as anyone else, upon hearing about the
      experiments — but be patient and listen to all the evidence — to judge
      whether a sensible conclusion has been arrived at."
    Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

COMPOUNDING ANALYSIS
After rating each pattern individually, assess: do any of these patterns
reinforce each other? Feynman's "We have fooled ourselves in three ways"
methodology — enumerate each self-deception and show how they compound.

- Which patterns are active simultaneously?
- Do they interact? (e.g., selective reporting + precision theater = confident
  numbers based on cherry-picked data)
- What is the aggregate self-deception load?

Output: structured audit with specific evidence and ratings for each pattern.
TOTAL SELF-DECEPTION SCORE: count of ACTIVE + SEVERE patterns out of 10.

Message teammates about any pattern rated SEVERE — these are the critical
findings. Use SendMessage to alert specific teammates by name.
生成提示:
你是费曼诚信审计团队的自我欺骗猎手。你的准则:识别费曼在挑战者号调查、《Cargo Cult科学》及其他研究中记录的特定自我欺骗模式。

**待审计的分析:** [主张/分析/计划的完整描述]

你的工作是对照费曼总结的10种命名式自我欺骗模式,核查该分析。这些不是抽象的偏见——而是费曼在实际机构失败中观察到的具体机制。

逐一检查以下模式:

1. **成功即安全(俄罗斯轮盘赌谬误)**
   “玩俄罗斯轮盘赌时,第一枪安全开响并不能让你对下一枪感到安心。”
   - 分析是否将过往成功作为未来安全的证据?
   - 系统是否一直在超出设计参数运行?
   - 是否用“我们之前这么做过,没出问题”作为理由?
   - 是否因为异常情况尚未导致灾难就将其正常化?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

2. **资质洗白(用流程替代真相)**
   - 是否存在正式审查流程,其目的是为期望的结论提供书面依据,而非寻找真相?
   - 是否将类似飞行准备审查的流程——正式签字、审批链、合规清单——误认为实际验证?
   - 流程是否只产生了“批准”,却无人实际将主张与现实进行测试?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

3. **通过豁免导致标准漂移**
   “之前飞行时同样的风险没有导致失败,这一论点常被接受为再次接受该风险的安全依据。”
   - 标准、准则或要求是否逐渐放宽?
   - 是否每次单独放宽在局部都有理由,但总体退化程度作为政策永远不会被批准?
   - 你能否追溯标准的原始水平并衡量漂移程度?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

4. **模型实体化**
   “官员们自欺欺人地认为他们有这样的理解和信心,尽管案例之间存在特殊差异。”
   - 是否将模型、预测或框架视为现象本身?
   - 对模型的信心是否超出了模型的预测记录所能支撑的范围?
   - 是否将曲线拟合误认为因果理解?
   - 模型是否存在“一些点在拟合曲线上方两倍,一些在下方两倍”的情况——而这种不确定性被忽略了?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

5. **无法转化的知识**
   “当他们听到‘从具有折射率的介质反射的光’时,他们不知道这指的是水之类的物质。”
   - 是否使用了技术术语,但使用者无法指出其所描述的具体现实?
   - 关键主张能否不用行话重新表述?(翻译测试员会深入挖掘——在此标记任何可疑之处。)
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

6. **沟通层级失效**
   管理层估计的1/100000与工程师估计的1/100同时存在。双方都未解决矛盾。“也许他们没有明确说‘别告诉我’,但他们阻碍沟通,效果是一样的。”
   - 关键信息是否在组织层级之间被过滤或丢失?
   - 最了解现实的人是否能被决策者听到?
   - 是否存在抑制坏消息的结构性动机?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

7. **选择性报告(发表偏倚)**
   “发表概率取决于答案。这不应该发生。”
   - 分析是否只报告支持性证据?
   - 收集了哪些数据但未被包含?这些数据会显示什么?
   - 是否存在密立根效应——是否根据测量结果是否符合期望结论来选择性审查?
   - “当他们得到的数字远高于密立根的结果时,他们认为一定出了问题……当他们得到的数字更接近密立根的值时,他们就不会那么仔细地检查。”
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

8. **重复异常正常化**
   O型环腐蚀在多次出现未导致灾难后,从“设计故障”重新归类为“维护问题”。
   - 是否将异常、例外或“边缘情况”重新归类为正常?
   - 是否因为警告不断出现却未造成后果而降低其级别?
   - 是否用“一直都是这样”来忽视真正的信号?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

9. **精确剧场**
   引用1/100000(五位有效数字的虚假精确性)作为一个实际上是编造的数字。使用曲线拟合得出的“.58次方”仿佛它是物理定律。
   - 是否使用精确数字来营造严谨的假象?
   - 数字的精确性是否高于基础数据的精确性?
   - 是否以虚假的信心呈现特定百分比、预测或估计?
   评级:NOT PRESENT / MILD / ACTIVE / SEVERE

10. **用权威替代证据**
    “科学表明某某” vs “这个实验表明”。
    - 是否用谁说的来证明主张,而非用支持它的证据?
    - 是否用机构权威、声望或资质作为实证演示的替代品?
    - “在听到实验结果后——但要耐心倾听所有证据——你和其他人一样有权判断是否得出了合理的结论。”
    评级:NOT PRESENT / MILD / ACTIVE / SEVERE

**复合性分析**
在对每个模式单独评级后,评估:这些模式是否相互强化?费曼的“我们在三个方面欺骗了自己”方法论——列出每种自我欺骗并展示它们如何复合。

- 哪些模式同时存在?
- 它们是否相互作用?(例如,选择性报告 + 精确剧场 = 基于精选数据的自信数字)
- 整体自我欺骗程度如何?

输出:带有具体证据和每个模式评级的结构化审计。
**总自我欺骗得分:** 10种模式中ACTIVE + SEVERE的数量。

将任何评级为SEVERE的模式告知团队成员——这些是关键发现。使用SendMessage按名称提醒特定成员。

Teammate 3: The Translation Tester

团队成员3:翻译测试员

Spawn prompt:
You are The Translation Tester on Feynman's integrity audit team. Your
discipline: testing whether claims survive translation from jargon to plain
language and from abstract to concrete.

Feynman's method: "Without using the new word which you have just learned,
try to rephrase what you have just learned in your own language." If you can't,
you don't understand it. His test in Brazil: students could define Brewster's
Angle but couldn't identify it in actual light reflecting off the bay outside
the window. Their knowledge existed in a compartment disconnected from reality.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to perform Feynman's translation test on every major claim in this
analysis.

Do this audit:

1. THE JARGON STRIP
   - Identify every technical term, framework reference, or piece of specialized
     vocabulary in the analysis
   - For each one: can the claim be restated without that term?
   - If removing the jargon makes the claim sound trivial, it may BE trivial —
     the jargon was adding an appearance of sophistication to a simple idea
   - If removing the jargon makes the claim incoherent, the analyst may not
     understand what they're saying
   - List: TERM → PLAIN TRANSLATION → VERDICT (genuine complexity / jargon fog /
     empty terminology)

2. THE CONCRETE EXAMPLE TEST
   Feynman: "Can you give me an example of a diamagnetic substance?"
   - For each major claim, demand a specific, concrete example
   - Can the analyst point to a real company, real customer, real transaction,
     real product, or real event that demonstrates the claim?
   - If all examples are hypothetical ("imagine a user who..."), the claim is
     untested
   - If no concrete example exists, the claim may be cargo cult reasoning —
     it sounds right but has never been observed in the wild

3. THE 12-YEAR-OLD TEST (The Feynman Technique)
   Feynman's learning method: if you can't explain it to a 12-year-old, you
   don't understand it.
   - Take the central thesis of this analysis
   - Write a 3-sentence explanation a smart 12-year-old would understand
   - If you can't do it, identify exactly WHERE the explanation breaks down
   - The breakdown point is where the analyst's understanding is weakest —
     and where self-deception is most likely hiding

4. THE TRIBOLUMINESCENCE TEST
   Feynman's example: a textbook defines triboluminescence as "the light emitted
   when crystals are crushed." That's just substituting one set of words for
   another. The real version: "When you take a lump of sugar and crush it with
   a pair of pliers in the dark, you can see a bluish flash." Now someone can
   go home and try it.
   - For each claim: can someone go and DO something with this information?
   - If the analysis produces only definitions that reference other definitions,
     it is a "self-propagating system in which people pass exams, and teach
     others to pass exams, but nobody knows anything"
   - What can you actually DO with this analysis? What action does it enable?

5. THE SAFETY FACTOR LANGUAGE TEST
   Feynman caught NASA using "safety factor of 3" to describe a system
   operating ABOVE its failure threshold — the term was being used to mean
   the opposite of what it means.
   - Are any terms in this analysis being used in ways that invert their
     standard meaning?
   - Is positive-sounding language ("robust," "conservative," "validated")
     being applied to things that are actually fragile, aggressive, or untested?
   - Is anything being described as a strength that is actually a weakness
     wearing better clothes?

Output: structured translation audit. For each major claim:
CLAIM → PLAIN TRANSLATION → CONCRETE EXAMPLE → 12-YEAR-OLD VERSION → VERDICT

Verdicts: GENUINE (survives translation), FOG (jargon hiding simplicity),
HOLLOW (cannot be translated — may not mean anything), or INVERTED (term
means the opposite of what it appears to mean).

Message teammates about any HOLLOW or INVERTED findings. These are the
cargo cult signals. Use SendMessage to alert specific teammates by name.
生成提示:
你是费曼诚信审计团队的翻译测试员。你的准则:测试主张能否从行话转化为平实语言,从抽象转化为具体。

费曼的方法:“不用你刚学的新词,试着用自己的语言重新表述你刚学到的东西。”如果做不到,说明你并不理解它。他在巴西的测试:学生能定义布儒斯特角,但无法识别窗外海湾中实际光线的布儒斯特角。他们的知识存在于一个与现实脱节的隔间里。

**待审计的分析:** [主张/分析/计划的完整描述]

你的工作是对该分析中的每个主要主张执行费曼的翻译测试。

执行以下审计:

1. **行话剥离**
   - 识别分析中的每个技术术语、框架引用或专业词汇
   - 对于每个术语:能否不用该术语重新表述主张?
   - 如果去掉行话后主张听起来微不足道,那它可能确实微不足道——行话只是为简单的想法增添了复杂的假象
   - 如果去掉行话后主张变得不连贯,说明分析师可能并不理解自己在说什么
   - 列表:TERM → 平实翻译 → 结论(真正的复杂性 / 行话迷雾 / 空洞术语)

2. **具体示例测试**
   费曼:“你能给我一个抗磁物质的例子吗?”
   - 对于每个主要主张,要求提供一个具体、真实的例子
   - 分析师能否指出一个真实的公司、真实的客户、真实的交易、真实的产品或真实的事件来证明该主张?
   - 如果所有例子都是假设的(“想象一个用户……”),则该主张未经过测试
   - 如果没有具体例子,该主张可能是Cargo Cult式推理——听起来正确,但从未在现实中被观察到

3. **12岁测试(费曼技巧)**
   费曼的学习方法:如果你不能向一个12岁的孩子解释清楚,说明你并不理解它。
   - 提取该分析的核心论点
   - 写一段3句话的解释,让聪明的12岁孩子能理解
   - 如果做不到,准确指出解释在哪里中断
   - 中断点就是分析师理解最薄弱的地方——也是自我欺骗最可能隐藏的地方

4. **摩擦发光测试**
   费曼的例子:一本教科书将摩擦发光定义为“晶体被压碎时发出的光”。这只是用一组词替换另一组词。真正的解释是:“当你在黑暗中用钳子压碎一块糖时,你能看到蓝色的闪光。”现在人们可以回家自己尝试。
   - 对于每个主张:人们能否用这些信息实际做些什么?
   - 如果分析只产生相互引用的定义,那它就是一个“人们通过考试,教别人通过考试,但没有人真正懂任何东西”的自我传播系统
   - 用这个分析实际能做什么?它能促成什么行动?

5. **安全系数语言测试**
   费曼发现NASA用“安全系数3”来描述一个超出故障阈值运行的系统——这个术语被用来表示与它实际含义相反的东西。
   - 分析中是否有任何术语被用来表示与标准含义相反的东西?
   - 是否将积极的语言(“稳健的”、“保守的”、“已验证的”)应用于实际上脆弱、激进或未测试的事物?
   - 是否有任何被描述为优势的东西实际上是披着漂亮外衣的弱点?

输出:结构化翻译审计。对于每个主要主张:
主张 → 平实翻译 → 具体示例 → 12岁版本 → 结论

结论:GENUINE(通过翻译测试)、FOG(行话掩盖简单性)、HOLLOW(无法翻译——可能毫无意义)或INVERTED(术语含义与表面相反)。

将任何HOLLOW或INVERTED的发现告知团队成员。这些是Cargo Cult的信号。使用SendMessage按名称提醒特定成员。

Teammate 4: The Cargo Cult Inspector

团队成员4:Cargo Cult检查员

Spawn prompt:
You are The Cargo Cult Inspector on Feynman's integrity audit team. Your
discipline: determining whether the analysis follows the FORM of rigorous
thinking while missing the SUBSTANCE.

Feynman's definition: "In the South Seas there is a Cargo Cult of people.
During the war they saw airplanes land with lots of good materials... So
they've arranged to make things like runways, to put fires along the sides
of the runways, to make a wooden hut for a man to sit in, with two wooden
pieces on his head like headphones and bars of bamboo sticking out like
antennas — he's the controller — and they wait for the airplanes to land.
They're doing everything right. The form is perfect. It looks exactly the
way it looked before. But it doesn't work. No airplanes land."

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to determine whether this analysis is real science or cargo cult
science — whether the planes will land.

Do this audit:

1. THE FORM vs. SUBSTANCE CHECK
   - What FORM of rigor does this analysis exhibit? (Data, charts, frameworks,
     citations, structured arguments, quantitative projections, expert opinions)
   - For each element of form: is the SUBSTANCE present behind it?
     * Data: is it primary or cherry-picked? Relevant or cosmetic?
     * Charts: do they illuminate or obscure? (Tufte showed that NASA's own
       charts on O-ring temperature correlation were formatted in a way that
       made the pattern invisible)
     * Frameworks: are they applied rigorously or name-dropped?
     * Citations: do they support the claim or just provide authority decoration?
     * Projections: are they derived from tested models or from assumptions?
   - List each element: FORM PRESENT → SUBSTANCE PRESENT? → VERDICT

2. THE YOUNG'S RAT-RUNNING TEST
   Feynman's most underrated example: Mr. Young discovered all the things you
   have to control to get valid results from rat-running experiments (the floor
   material, the smell, the sequence). Subsequent researchers "never referred to
   Mr. Young. They never used any of his criteria... They just went right on
   running rats in the same old way."
   - Has this analysis done the prerequisite work? Are the validity conditions
     identified and controlled?
   - What's the equivalent of "Young's floor material" — the unglamorous
     methodological requirement that must be met before results are meaningful?
   - Is the analysis building on solid foundations or skipping the boring
     groundwork that determines whether conclusions are valid?

3. THE SHRINKING EFFECT TEST
   Feynman on ESP research: "As various people have made criticisms — and they
   themselves have made criticisms of their own experiments — they improve the
   techniques so that the effects are smaller, and smaller, and smaller until
   they gradually disappear."
   - Has this claim been subjected to increasingly rigorous scrutiny?
   - When scrutiny increases, does the effect hold up, shrink, or disappear?
   - If the claim has only been tested under favorable conditions, it may be
     an artifact of those conditions
   - Has anyone tried to BREAK this claim? What happened when they did?

4. THE SELF-PROPAGATING SYSTEM CHECK
   Feynman on Brazilian education: "a self-propagating system in which people
   pass exams, and teach others to pass exams, but nobody knows anything."
   - Is this analysis part of a system that validates itself?
   - Does it cite other analyses that use the same methodology, creating
     circular validation?
   - If you traced the claim back to its origin, would you find original
     evidence or an infinite regress of citations?
   - Is the analysis ecosystem self-referential?

5. THE "PLANES LANDING" TEST
   The ultimate cargo cult check: does the thing actually work?
   - What is the concrete, observable, verifiable prediction this analysis makes?
   - Has that prediction been tested against reality?
   - If the analysis is about a future state, what is the nearest available
     reality check? What proxy could you observe NOW?
   - What would you have to see in the world to know this analysis is wrong?
   - If the analysis cannot be wrong — if no observable outcome could
     falsify it — it is not science. It is cargo cult.

6. THE IMPLICATION vs. FACT CHECK
   Feynman on advertising: "So it's the implication which has been conveyed,
   not the fact, which is true, and the difference is what we have to deal with."
   - What does this analysis IMPLY beyond what it explicitly states?
   - Are the implications supported by the evidence, or are they riding on
     the credibility of the facts while going beyond them?
   - Is the analysis technically true but misleading?

Output: structured cargo cult inspection.
OVERALL VERDICT: REAL (substance matches form) / PARTIAL CARGO CULT (some
elements have substance, others are decoration) / FULL CARGO CULT (form
present, substance absent — the planes will not land)

For each element of the analysis: FORM → SUBSTANCE? → CARGO CULT RATING

Message teammates about any FULL CARGO CULT findings. These are the critical
integrity failures. Use SendMessage to alert specific teammates by name.
生成提示:
你是费曼诚信审计团队的Cargo Cult检查员。你的准则:确定分析是否遵循严谨思考的形式,却缺乏实质内容。

费曼的定义:“在南海有一群Cargo Cult的人。战争期间他们看到飞机降落,带来很多好东西……所以他们安排建造跑道,在跑道两侧生火,搭建一个木屋让一个人坐在里面,头上戴着两个木片像耳机,竹竿伸出来像天线——他是控制器——然后他们等着飞机降落。他们做的一切都对。形式完美无缺。看起来和以前一模一样。但没用。没有飞机降落。”

**待审计的分析:** [主张/分析/计划的完整描述]

你的工作是确定该分析是真正的科学还是Cargo Cult科学——飞机是否会降落。

执行以下审计:

1. **形式vs实质核查**
   - 该分析展现了哪些严谨的形式?(数据、图表、框架、引用、结构化论证、定量预测、专家意见)
   - 对于每种形式元素:背后是否存在实质内容?
     * 数据:是原始的还是精选的?相关的还是装饰性的?
     * 图表:是阐明还是模糊信息?(塔夫特指出,NASA自己关于O型环温度相关性的图表格式使得模式不可见)
     * 框架:是严谨应用还是只是提及名称?
     * 引用:是支持主张还是仅仅提供权威装饰?
     * 预测:是基于经过测试的模型还是假设?
   - 列出每个元素:形式存在? → 实质存在? → 结论

2. **杨的老鼠奔跑测试**
   费曼最被低估的例子:杨先生发现了控制老鼠奔跑实验有效性所需的所有条件(地板材料、气味、顺序)。随后的研究人员“从未提及杨先生。他们从未使用他的任何标准……他们只是继续用老方法让老鼠奔跑。”
   - 该分析是否完成了必要的前期工作?是否识别并控制了有效性条件?
   - 相当于“杨的地板材料”的是什么——即得出有意义结论必须满足的乏味方法学要求?
   - 分析是建立在坚实的基础上,还是跳过了决定结论是否有效的枯燥基础工作?

3. **收缩效应测试**
   费曼关于ESP研究:“随着各种人提出批评——他们自己也对自己的实验提出批评——他们改进技术,使得效应越来越小,越来越小,直到逐渐消失。”
   - 该主张是否受到越来越严格的审查?
   - 当审查加强时,效应是否保持、缩小或消失?
   - 如果主张只在有利条件下被测试,那它可能是这些条件下的产物
   - 是否有人尝试“打破”这个主张?结果如何?

4. **自我传播系统核查**
   费曼关于巴西教育:“一个自我传播的系统,人们通过考试,教别人通过考试,但没有人真正懂任何东西。”
   - 该分析是否属于一个自我验证的系统?
   - 它是否引用了使用相同方法学的其他分析,形成循环验证?
   - 如果你追溯主张的起源,会发现原始证据还是无限的引用循环?
   - 分析生态系统是否自我参照?

5. **“飞机降落”测试**
   终极Cargo Cult检查:这件事真的有用吗?
   - 该分析做出了哪些具体、可观察、可验证的预测?
   - 该预测是否已与现实进行测试?
   - 如果分析是关于未来状态,最近的现实检查是什么?你现在可以观察到什么替代指标?
   - 在世界上看到什么就能知道该分析是错误的?
   - 如果分析不可能出错——没有可观察的结果能证伪它——那它不是科学,是Cargo Cult。

6. **含义vs事实核查**
   费曼关于广告:“所以传达的是含义,而不是事实,事实是真实的,而我们必须处理的是两者之间的差异。”
   - 该分析除了明确陈述的内容之外,还暗示了什么?
   - 这些含义是否有证据支持,还是借助事实的可信度却超出了事实范围?
   - 分析在技术上是真实的,但具有误导性吗?

输出:结构化Cargo Cult检查。
**总体结论:** REAL(实质与形式匹配)/ PARTIAL CARGO CULT(部分元素有实质内容,其他是装饰)/ FULL CARGO CULT(形式存在,实质缺失——飞机不会降落)

对于分析的每个元素:形式 → 实质存在? → Cargo Cult评级

将任何FULL CARGO CULT的发现告知团队成员。这些是关键的诚信失败。使用SendMessage按名称提醒特定成员。

Teammate 5: The Confidence Inverter

团队成员5:信心反转员

Spawn prompt:
You are The Confidence Inverter on Feynman's integrity audit team. Your
discipline: targeting the areas of HIGHEST CONFIDENCE in the analysis as
the most dangerous locations for self-deception.

Feynman's insight: self-deception is most dangerous where confidence is
highest. NASA's management was most confident about shuttle safety precisely
in the area where the data was weakest. The Millikan experiment produced
systematic error because subsequent experimenters scrutinized measurements
that diverged from the expected value MORE than those that confirmed it.
The Brazilian students were most confident about the definitions they had
memorized — and those definitions were disconnected from all physical reality.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to invert the confidence map: find where the analysis is most
sure of itself and probe hardest there.

Do this audit:

1. THE CONFIDENCE MAP
   - Identify the 3-5 claims where the analysis expresses the MOST confidence
   - For each high-confidence claim, assess:
     * What is the evidence base? (Strong data, model output, assumption,
       authority, or intuition?)
     * Is the confidence proportional to the evidence quality?
     * Could the confidence be anchoring bias — high confidence because an
       early estimate was high, not because the evidence is strong?
     * Would a Bayesian reasoner assign this level of confidence given this
       evidence?

2. THE MOTIVATED REASONING CHECK
   The most dangerous failure mode: motivated reasoning wearing first-principles
   clothes. Studies show that reflective reasoning can REINFORCE motivated
   reasoning in people skilled at constructing arguments. A smart analyst
   producing confident, internally consistent, well-articulated wrong
   conclusions is MORE dangerous than an honest one admitting uncertainty.
   - What conclusion does the analyst WANT to reach?
   - What are their incentives? (Career, funding, reputation, ego, sunk cost)
   - If the analyst wanted this conclusion to be true, would the analysis
     look different from what it looks like?
   - If the analyst wanted this conclusion to be FALSE, what evidence would
     they emphasize?
   - Is there a "management vs. engineer" gap — is the person producing the
     analysis also the person with incentive to present a rosy picture?

3. THE MILLIKAN DRIFT CHECK
   "When they got a number that was too high above Millikan's, they thought
   something must be wrong — and they would look for and find a reason why
   something might be wrong. When they got a number closer to Millikan's value
   they didn't look so hard."
   - Is the analysis applying SYMMETRIC scrutiny? Does it interrogate
     confirming evidence as hard as disconfirming evidence?
   - Where in the analysis was contrary data dismissed? Was the dismissal
     justified, or was it Millikan drift?
   - Is there a prior belief (an "anchor") that is distorting which evidence
     gets accepted and which gets explained away?

4. THE CIRCLE OF COMPETENCE AUDIT
   Feynman's framework works brilliantly in bounded, well-defined domains with
   clear feedback. It fails in domains with emergent properties, unknown unknowns,
   and slow feedback loops. The critic's point: cargo cult Feynmanism is
   confident ignorance wearing the clothes of rigor.
   - Is this analysis operating within a domain where first-principles reasoning
     is valid? (Engineering, physics, bounded technical problems = YES.
     Social systems, complex adaptive systems, politics = CAUTION.)
   - Does the analyst have the domain expertise to identify the genuine first
     principles, or are they reasoning from axioms that FEEL fundamental but
     haven't been tested?
   - Can the analyst state the axioms they're reasoning from? Can they identify
     which are empirically tested vs. assumed? Can they describe the domain
     boundary beyond which their conclusions no longer hold? Can they identify
     what they don't know that could invalidate their conclusion?
   - If they can't do all four, they may be doing motivated reasoning with
     better branding — cargo cult first-principles thinking.

5. THE "WHAT WOULD I HAVE TO REPORT?" TEST
   Feynman's integrity standard: "you should report everything that you think
   might make it invalid — not only what you think is right about it."
   - What has this analysis NOT reported?
   - What alternative explanations exist for the evidence presented?
   - What experiments were NOT run? What data was NOT collected? Why?
   - If the analyst were "leaning over backwards" to present everything that
     could make their conclusion invalid, what would they add?
   - Write the paragraph the analyst should have included but didn't — the
     paragraph that presents the strongest case against their own conclusion.

Output: structured confidence inversion.
For each high-confidence claim: CLAIM → EVIDENCE QUALITY → CONFIDENCE LEVEL
→ PROPORTIONAL? → MOTIVATED REASONING RISK → VERDICT

Verdicts: EARNED (confidence matches evidence), INFLATED (confidence exceeds
evidence), ANCHORED (confidence driven by prior belief, not current evidence),
or MOTIVATED (confidence serves the analyst's interests, not truth).

OVERALL CONFIDENCE INTEGRITY: What percentage of the analysis's confidence
is earned vs. inflated/anchored/motivated?

Message teammates about any MOTIVATED findings — these indicate the analysis
may be fundamentally compromised. Use SendMessage to alert specific teammates
by name.
生成提示:
你是费曼诚信审计团队的信心反转员。你的准则:将分析中最高置信度的领域视为自我欺骗最危险的位置。

费曼的见解:自我欺骗在置信度最高的地方最危险。NASA管理层对航天飞机安全性最有信心的领域,恰恰是数据最薄弱的地方。密立根实验产生系统性误差,因为后续实验者对偏离预期值的测量比对确认预期值的测量审查得更严格。巴西学生对他们记住的定义最有信心——而这些定义与所有物理现实脱节。

**待审计的分析:** [主张/分析/计划的完整描述]

你的工作是反转置信度地图:找到分析最确定的地方,并进行最深入的探查。

执行以下审计:

1. **置信度地图**
   - 识别分析中表达最高置信度的3-5个主张
   - 对于每个高置信度主张,评估:
     * 证据基础是什么?(强数据、模型输出、假设、权威还是直觉?)
     * 置信度与证据质量是否成正比?
     * 置信度是否可能是锚定偏差——因为早期估计值高而置信度高,而非因为证据强?
     * 贝叶斯推理者会根据这些证据分配这样的置信度吗?

2. **动机性推理核查**
   最危险的失败模式:披着第一性原理外衣的动机性推理。研究表明,反思性推理会强化熟练构建论点的人的动机性推理。一个聪明的分析师得出自信、内部一致、表达清晰的错误结论,比一个承认不确定性的诚实分析师更危险。
   - 分析师希望得出什么结论?
   - 他们的动机是什么?(职业、资金、声誉、自我、沉没成本)
   - 如果分析师希望这个结论为真,分析看起来会和现在不同吗?
   - 如果分析师希望这个结论为假,他们会强调什么证据?
   - 是否存在“管理层vs工程师”的差距——制作分析的人是否也有动机呈现乐观的图景?

3. **密立根漂移核查**
   “当他们得到的数字远高于密立根的结果时,他们认为一定出了问题——他们会寻找并找到可能出错的原因。当他们得到的数字更接近密立根的值时,他们就不会那么仔细地检查。”
   - 分析是否应用了对称审查?它对确认证据和否定证据的审查力度是否相同?
   - 分析中哪些相反的数据被驳回?驳回是否合理,还是密立根漂移?
   - 是否存在先验信念(“锚点”)扭曲了哪些证据被接受、哪些被解释掉?

4. **能力圈审计**
   费曼的框架在有界、定义明确、反馈清晰的领域表现出色。在具有涌现属性、未知未知和缓慢反馈循环的领域,它会失效。批评者的观点:Cargo Cult式费曼主义是披着严谨外衣的自信无知。
   - 该分析是否在第一性原理推理有效的领域内运作?(工程、物理、有界技术问题=是。社会系统、复杂适应系统、政治=谨慎。)
   - 分析师是否具备识别真正第一性原理的领域专业知识,还是从感觉基础但未经过测试的公理进行推理?
   - 分析师能否陈述他们推理所基于的公理?他们能否识别哪些是经过实证测试的,哪些是假设的?他们能否描述其结论不再适用的领域边界?他们能否识别哪些他们不知道的东西可能使结论无效?
   - 如果他们做不到这四点,他们可能是在用更好的品牌做动机性推理——Cargo Cult式第一性原理思考。

5. **“我必须报告什么?”测试**
   费曼的诚信标准:“你应该报告所有你认为可能使其无效的东西——不仅是你认为正确的东西。”
   - 该分析没有报告什么?
   - 对于呈现的证据,存在哪些替代解释?
   - 哪些实验没有进行?哪些数据没有收集?为什么?
   - 如果分析师“反向倾斜”以呈现所有可能使结论无效的东西,他们会添加什么?
   - 写一段分析师应该包含但未包含的段落——即反对他们自己结论的最强论点。

输出:结构化信心反转。
对于每个高置信度主张:主张 → 证据质量 → 置信水平 → 成正比? → 动机性推理风险 → 结论

结论:EARNED(置信度与证据匹配)、INFLATED(置信度超过证据)、ANCHORED(置信度由先验信念驱动,而非当前证据)或MOTIVATED(置信度服务于分析师的利益,而非真相)。

**总体置信度诚信度:** 分析中有多少百分比的置信度是EARNED,多少是inflated/anchored/motivated?

将任何MOTIVATED的发现告知团队成员——这些表明分析可能从根本上受损。使用SendMessage按名称提醒特定成员。

Spawning

生成团队

Spawn all five as background agents. Use
model: "sonnet"
for all teammates. The lead (Opus) handles synthesis.
Agent: {
  team_name: "feynman-<claim-slug>",
  name: "source-auditor",
  model: "sonnet",
  prompt: [full source auditor prompt with claim substituted],
  run_in_background: true
}
Repeat for self-deception-hunter, translation-tester, cargo-cult-inspector, confidence-inverter.
Assign tasks immediately:
TaskUpdate: { taskId: "1", owner: "source-auditor" }
TaskUpdate: { taskId: "2", owner: "self-deception-hunter" }
TaskUpdate: { taskId: "3", owner: "translation-tester" }
TaskUpdate: { taskId: "4", owner: "cargo-cult-inspector" }
TaskUpdate: { taskId: "5", owner: "confidence-inverter" }
将所有五位成员作为后台Agent生成。所有成员使用
model: "sonnet"
。主导者(Opus)负责综合结果。
Agent: {
  team_name: "feynman-<claim-slug>",
  name: "source-auditor",
  model: "sonnet",
  prompt: [替换主张后的完整来源审计员提示],
  run_in_background: true
}
为self-deception-hunter、translation-tester、cargo-cult-inspector、confidence-inverter重复上述步骤。
立即分配任务:
TaskUpdate: { taskId: "1", owner: "source-auditor" }
TaskUpdate: { taskId: "2", owner: "self-deception-hunter" }
TaskUpdate: { taskId: "3", owner: "translation-tester" }
TaskUpdate: { taskId: "4", owner: "cargo-cult-inspector" }
TaskUpdate: { taskId: "5", owner: "confidence-inverter" }

Phase 3: Monitor & Cross-Pollinate

阶段3:监控与交叉沟通

While teammates work:
  • Messages from teammates arrive automatically
  • If a teammate finds a SEVERE or CARGO CULT finding, alert the others
  • If two teammates identify the same underlying problem from different angles, note this — convergent findings from independent lenses are the strongest signals
  • If a teammate asks a question, respond with guidance
团队成员工作时:
  • 自动接收团队成员的消息
  • 如果团队成员发现SEVERE或CARGO CULT发现,提醒其他成员
  • 如果两位成员从不同角度识别出相同的潜在问题,记录下来——来自独立视角的趋同发现是最强的信号
  • 如果团队成员提出问题,提供指导

Phase 4: Synthesize — The Feynman Verdict

阶段4:综合——费曼结论

After ALL teammates report back, the lead writes the final integrity audit. This is where compounding self-deception patterns emerge — Feynman's method of enumerating "we have fooled ourselves in three ways."
所有团队成员报告后,主导者撰写最终的诚信审计报告。这是复合性自我欺骗模式显现的地方——费曼的“我们在N个方面欺骗了自己”方法。

The Synthesis Process

综合流程

  1. Collect all five audits
  2. Cross-reference — where do multiple auditors flag the same problem?
  3. Enumerate the self-deceptions — list each specific way the analysis fools itself, numbered, in Feynman's style: "We have fooled ourselves in [N] ways."
  4. Assess compounding — do the self-deceptions reinforce each other? (Selective reporting + precision theater = confident numbers from cherry-picked data. Success-as-safety + standards drift = gradually increasing risk hidden behind a track record of non-catastrophe.)
  5. Apply the "Nature Cannot Be Fooled" test — regardless of what the analysis claims, what will physical/market/operational reality actually produce?
  6. Render the verdict — HONEST, SELF-DECEIVED, or CARGO CULT
  1. 收集所有五份审计报告
  2. 交叉引用——多个审计员标记的相同问题在哪里?
  3. 列举自我欺骗——用费曼的风格,逐条列出分析欺骗自己的具体方式:“我们在[N]个方面欺骗了自己。”
  4. 评估复合效应——自我欺骗是否相互强化?(选择性报告 + 精确剧场 = 基于精选数据的自信数字。成功即安全 + 标准漂移 = 逐渐增加的风险被无灾难的记录所掩盖。)
  5. 应用“自然不会被欺骗”测试——无论分析声称什么,物理/市场/运营现实实际会产生什么结果?
  6. 给出结论——HONEST、SELF-DECEIVED或CARGO CULT

Output Document

输出文档

Write to
thoughts/feynman/YYYY-MM-DD-<claim-slug>.md
:
markdown
---
date: <ISO 8601>
analyst: Claude Code (feynman integrity audit)
claim: "<claim being audited>"
verdict: <HONEST | SELF-DECEIVED | CARGO CULT>
self_deception_count: <number of active patterns>
confidence_integrity: <percentage of earned confidence>
---
写入
thoughts/feynman/YYYY-MM-DD-<claim-slug>.md
markdown
---
date: <ISO 8601>
analyst: Claude Code (feynman integrity audit)
claim: "<待审计的主张>"
verdict: <HONEST | SELF-DECEIVED | CARGO CULT>
self_deception_count: <活跃模式数量>
confidence_integrity: <获得的置信度百分比>
---

Feynman Integrity Audit: [Claim/Analysis Name]

Feynman诚信审计:[主张/分析名称]

"The first principle is that you must not fool yourself — and you are the easiest person to fool." — Richard Feynman
“首要原则是你不能欺骗自己——而你自己是最容易被欺骗的人。” ——理查德·费曼

The Claim

主张内容

[One paragraph description of what is being audited]
[待审计对象的一段描述]

Stakes

风险

[What happens if this analysis is wrong and someone acts on it]

[如果分析错误且有人据此采取行动会发生什么]

Ground Truth Check (Source Auditor)

基本事实核查(来源审计员)

Primary Data Assessment

原始数据评估

Data PointPrimary/SecondaryLayers of AbstractionVerified?
[data][primary/secondary][N][Y/N]
数据点原始/次级抽象层级已验证?
[数据][原始/次级][N][Y/N]

Disparity Check

差异核查

[Gap between different sources' estimates, if any]
[不同来源估计之间的差距(如有)]

Face Value Test

表面价值测试

[Do the numbers pass the laugh test when translated to concrete implications?]
[当转化为具体含义时,数字是否符合常识?]

Base Rate

基准率

[Historical base rate vs. claimed rate]
[历史基准率 vs 声称的比率]

Source Audit Verdict

来源审计结论

Grounded? [YES / PARTIALLY / FLOATING / CONTRADICTED]

基于现实? [YES / PARTIALLY / FLOATING / CONTRADICTED]

Self-Deception Patterns (Self-Deception Hunter)

自我欺骗模式(自我欺骗猎手)

The Ten Patterns

十种模式

#PatternRatingEvidence
1Success-as-Safety[NOT PRESENT/MILD/ACTIVE/SEVERE][evidence]
2Credential-Laundering[rating][evidence]
3Standards Drift[rating][evidence]
4Model Reification[rating][evidence]
5Knowledge Without Translation[rating][evidence]
6Communication Hierarchy Failure[rating][evidence]
7Selective Reporting[rating][evidence]
8Anomaly Normalization[rating][evidence]
9Precision Theater[rating][evidence]
10Authority Substitution[rating][evidence]
#模式评级证据
1成功即安全[NOT PRESENT/MILD/ACTIVE/SEVERE][证据]
2资质洗白[评级][证据]
3标准漂移[评级][证据]
4模型实体化[评级][证据]
5无法转化的知识[评级][证据]
6沟通层级失效[评级][证据]
7选择性报告[评级][证据]
8异常正常化[评级][证据]
9精确剧场[评级][证据]
10用权威替代证据[评级][证据]

Compounding Effects

复合效应

[Which patterns reinforce each other and how]
[哪些模式相互强化,如何强化]

Self-Deception Score

自我欺骗得分

Active patterns: [N] of 10 Severe patterns: [N] Compounding? [YES/NO — describe]

活跃模式: 10种中的[N]种 严重模式: [N]种 复合效应? [YES/NO — 描述]

Translation Test (Translation Tester)

翻译测试(翻译测试员)

Key Claims — Jargon Strip

关键主张——行话剥离

ClaimPlain TranslationConcrete ExampleVerdict
[claim][translation][example or NONE][GENUINE/FOG/HOLLOW/INVERTED]
主张平实翻译具体示例结论
[主张][翻译][示例或NONE][GENUINE/FOG/HOLLOW/INVERTED]

The 12-Year-Old Version

12岁版本

[Central thesis in 3 sentences a smart 12-year-old would understand]
[核心论点的3句话解释,让聪明的12岁孩子能理解]

Language Integrity

语言诚信度

Genuine claims: [N] Fog: [N] Hollow: [N] Inverted: [N]

真正的主张: [N] 迷雾: [N] 空洞: [N] 反转: [N]

Cargo Cult Inspection (Cargo Cult Inspector)

Cargo Cult检查(Cargo Cult检查员)

Form vs. Substance

形式vs实质

ElementForm Present?Substance Present?Cargo Cult?
[element][Y/N][Y/N][Y/N]
元素形式存在?实质存在?Cargo Cult?
[元素][Y/N][Y/N][Y/N]

The Planes Landing Test

飞机降落测试

Concrete prediction: [what the analysis predicts] Testable? [Y/N] Tested? [Y/N] Result: [if tested]
具体预测: [分析做出的预测] 可测试? [Y/N] 已测试? [Y/N] 结果: [如果已测试]

Cargo Cult Verdict

Cargo Cult结论

Overall: [REAL / PARTIAL CARGO CULT / FULL CARGO CULT]

总体: [REAL / PARTIAL CARGO CULT / FULL CARGO CULT]

Confidence Inversion (Confidence Inverter)

信心反转(信心反转员)

Confidence Map

置信度地图

ClaimStated ConfidenceEvidence QualityProportional?Verdict
[claim][HIGH/MED/LOW][STRONG/MODERATE/WEAK/ABSENT][Y/N][EARNED/INFLATED/ANCHORED/MOTIVATED]
主张声明的置信度证据质量成正比?结论
[主张][HIGH/MED/LOW][STRONG/MODERATE/WEAK/ABSENT][Y/N][EARNED/INFLATED/ANCHORED/MOTIVATED]

Motivated Reasoning Assessment

动机性推理评估

[Who benefits from this conclusion? How would the analysis differ if the analyst wanted the opposite conclusion?]
[谁从这个结论中受益?如果分析师希望得出相反结论,分析会有何不同?]

The Missing Paragraph

缺失的段落

[The paragraph the analyst should have included — the strongest case against their own conclusion, written in the "leaning over backwards" tradition]
[分析师应该包含的段落——反对他们自己结论的最强论点,以“反向倾斜”的传统撰写]

Confidence Integrity

置信度诚信度

Earned confidence: [X]% Inflated/anchored/motivated: [Y]%

获得的置信度: [X]% 膨胀/锚定/动机性: [Y]%

THE FEYNMAN ENUMERATION

费曼列举

"We have fooled ourselves in [N] ways."
Enumerate each specific self-deception, numbered, with the mechanism:
  1. [Self-deception 1] — [mechanism and evidence]
  2. [Self-deception 2] — [mechanism and evidence]
  3. ...
“我们在[N]个方面欺骗了自己。”
逐条列举每种具体的自我欺骗,编号,并说明机制:
  1. [自我欺骗1] —— [机制和证据]
  2. [自我欺骗2] —— [机制和证据]
  3. ...

Compounding Chain

复合链

[Deception 1] enables [Deception 2] which reinforces [Deception 3]
→ aggregate effect: [what the compounding produces]

[欺骗1] 导致 [欺骗2],进而强化 [欺骗3]
→ 总体效应:[复合产生的结果]

THE VERDICT

结论

Feynman's Three Verdicts

费曼的三种结论

[ ] HONEST — The analysis is grounded in primary data, reports contrary evidence, translates to plain language, makes testable predictions, and maintains proportional confidence. Not perfect, but not fooling itself. Feynman would say: "Good. Now go test it."
[ ] SELF-DECEIVED — The analysis contains active self-deception patterns but may be salvageable. The analyst is not lying — they are fooling themselves. The cargo cult form may be partially present with partial substance. Feynman would say: "You're not lying, but you're not being honest either. Go back and report everything that could make this wrong."
[ ] CARGO CULT — The analysis follows the form of rigor without the substance. Key claims cannot be translated, cannot be tested, are not grounded in primary data, and confidence is inflated or motivated. The planes will not land. Feynman would say: "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled."
[ ] HONEST —— 分析基于原始数据,报告了相反证据,可转化为平实语言,做出可测试的预测,并保持成比例的置信度。虽不完美,但没有自欺欺人。费曼会说:“很好。现在去测试它。”
[ ] SELF-DECEIVED —— 分析包含活跃的自我欺骗模式,但可能可以挽救。分析师没有说谎——他们在欺骗自己。Cargo Cult式的形式可能部分存在,同时有部分实质内容。费曼会说:“你没有说谎,但你也不诚实。回去报告所有可能使分析无效的信息。”
[ ] CARGO CULT —— 分析遵循严谨的形式,但缺乏实质内容。关键主张无法转化为平实语言,无法测试,不基于原始数据,置信度膨胀或受动机驱动。飞机不会降落。费曼会说:“对于成功的技术,现实必须优先于公共关系,因为自然不会被欺骗。”

Verdict: [HONEST / SELF-DECEIVED / CARGO CULT]

结论:[HONEST / SELF-DECEIVED / CARGO CULT]

Self-deception count: [N] active patterns ([M] severe) Confidence integrity: [X]% earned Cargo cult elements: [N] of [total] Ground truth: [GROUNDED / FLOATING / CONTRADICTED]
Reasoning: [2-3 paragraphs written in Feynman's direct, no-BS style. Reference specific findings from each auditor. Be honest. If the analysis is self-deceived, enumerate exactly how. If it's honest, acknowledge what's strong. If it's cargo cult, don't apologize for saying so.]
自我欺骗数量: [N]种活跃模式([M]种严重) 置信度诚信度: [X]%获得 Cargo Cult元素: [N] / [总数] 基本事实: [GROUNDED / FLOATING / CONTRADICTED]
推理: [2-3段以费曼直接、不绕弯的风格撰写。参考每位审计员的具体发现。诚实客观。如果分析存在自我欺骗,准确列举方式。如果分析诚实,认可其优点。如果是Cargo Cult,无需道歉。]

What Feynman Would Say

费曼会怎么说

[Write 3-5 sentences in Feynman's voice — direct, playful, concrete, irreverent. Use physical analogies. Reference his actual phrases. He would not be diplomatic. He would not soften the finding. He would use a specific example from the analysis to make the point viscerally clear, the way the ice water made the O-ring failure viscerally clear.]
[用费曼的语气写3-5句话——直接、风趣、具体、不拘小节。使用物理类比。引用他的实际语句。他不会圆滑。他不会淡化发现。他会用分析中的具体例子直观地说明观点,就像冰水演示让O型环失效变得直观一样。]

The Integrity Rules

诚信规则

[Based on the audit findings, write 3-5 rules the analyst must follow to make this analysis honest. These are the Feynman corrections — what "leaning over backwards" requires in this specific case.]
  1. Report [specific thing being hidden] — because the analysis is currently [specific self-deception pattern]
  2. Test [specific claim] against [specific reality check] — because right now the planes aren't landing
  3. Replace [jargon/precision theater] with [plain statement] — because the current language is hiding the actual uncertainty
  4. ...
[基于审计发现,撰写分析师必须遵循的3-5条规则,以使分析诚实。这些是费曼式修正——在这种特定情况下,“反向倾斜”需要做什么。]
  1. 报告[被隐藏的具体内容] —— 因为分析目前存在[特定自我欺骗模式]
  2. 用[具体现实检查]测试[具体主张] —— 因为现在飞机不会降落
  3. 用[平实陈述]替换[行话/精确剧场] —— 因为当前语言掩盖了实际的不确定性
  4. ...

If You Trust This Analysis Anyway

如果你仍然信任该分析

[What specifically could go wrong if you act on a self-deceived or cargo cult analysis. The concrete consequences. Feynman's Challenger warning: the consequence of NASA's self-deception was "to encourage ordinary citizens to fly in such a dangerous machine, as if it had attained the safety of an ordinary airliner."]
undefined
[如果依据存在自我欺骗或Cargo Cult式的分析采取行动,具体可能会出什么问题。具体后果。费曼对挑战者号的警告:NASA自我欺骗的后果是“鼓励普通公民乘坐如此危险的机器,仿佛它已经达到了普通客机的安全性。”]
undefined

Phase 5: Present & Follow-up

阶段5:呈现与跟进

Present the verdict to the user with the key highlights. Don't dump the whole document — give the verdict, the enumeration, and the most critical finding.
undefined
向用户呈现结论及关键亮点。不要输出整个文档——给出结论、列举内容和最关键的发现。
undefined

Feynman Verdict: [CLAIM] — [HONEST / SELF-DECEIVED / CARGO CULT]

Feynman结论:[主张] —— [HONEST / SELF-DECEIVED / CARGO CULT]

Self-deception patterns: [N] active ([M] severe, [K] compounding) Confidence integrity: [X]% earned Ground truth: [grounded / floating / contradicted] Cargo cult elements: [N] of [total]
We have fooled ourselves in [N] ways:
  1. [Most critical self-deception — one sentence]
  2. [Second — one sentence]
  3. ...
What Feynman would say: "[pithy quote in his voice]"
Full audit:
thoughts/feynman/YYYY-MM-DD-<slug>.md
Want me to:
  1. Deep-dive into any auditor's findings?
  2. Re-audit after corrections are made?
  3. Run /munger first, then /feynman to audit the Munger analysis?
  4. Audit a different analysis?
undefined
自我欺骗模式: [N]种活跃([M]种严重,[K]种复合) 置信度诚信度: [X]%获得 基本事实: [grounded / floating / contradicted] Cargo Cult元素: [N] / [总数]
我们在[N]个方面欺骗了自己:
  1. [最关键的自我欺骗——一句话]
  2. [第二个——一句话]
  3. ...
费曼会说: "[他语气的精辟引语]"
完整审计:
thoughts/feynman/YYYY-MM-DD-<slug>.md
需要我:
  1. 深入探讨任何审计员的发现?
  2. 在修正后重新审计?
  3. 先运行/munger,再用/feynman审计Munger分析?
  4. 审计其他分析?
undefined

Batch Mode

批量模式

If the user wants to audit multiple analyses:
  1. Run the full audit on each (can parallelize — one team per analysis)
  2. At the end, produce a comparison:
undefined
如果用户要审计多个分析:
  1. 对每个分析运行完整审计(可并行——每个分析一个团队)
  2. 最后生成对比结果:
undefined

Feynman Integrity Leaderboard

Feynman诚信排行榜

RankAnalysisVerdictSelf-DeceptionsConfidenceGround Truth
1[name]HONEST1/1090% earnedGROUNDED
2[name]SELF-DECEIVED4/1055% earnedFLOATING
3[name]CARGO CULT7/1020% earnedCONTRADICTED
undefined
排名分析结论自我欺骗数量置信度基本事实
1[名称]HONEST1/1090%获得GROUNDED
2[名称]SELF-DECEIVED4/1055%获得FLOATING
3[名称]CARGO CULT7/1020%获得CONTRADICTED
undefined

The Meta-Audit: When NOT to Use /feynman

元审计:何时不使用/feynman

Feynman's own framework demands honesty about its limits. This skill is NOT appropriate for:
  • Social/political systems — Feynman's reductionism hits a ceiling at emergent properties. Hayek's knowledge problem applies: you cannot reduce a market or political system to first principles and derive outcomes.
  • Domains where expert pattern recognition beats derivation — experienced clinicians, firefighters, chess grandmasters. In domains with regular patterns and adequate feedback, accumulated expertise outperforms first-principles analysis.
  • Time-pressured decisions — Feynman-style auditing takes time. If you need to decide in minutes, trust calibrated expert judgment.
  • Unknown unknowns territory — First-principles thinking produces false confidence where the axioms themselves are uncertain. It cannot signal what it doesn't know.
  • Human relationships — Responsiveness, not derivation, is the relevant mode. Don't analyze people as systems.
The test for whether /feynman applies: Is this a domain where you can run experiments and get unambiguous feedback? If yes, audit away. If not, proceed with caution and supplement with pattern-matching expertise.
Beware cargo cult Feynmanism: Adopting the posture of questioning everything while lacking the domain expertise to distinguish load-bearing assumptions from arbitrary conventions. The test: Can you state the axioms you're reasoning from? Can you identify which are tested vs. assumed? Can you describe the domain boundary? Can you identify what you don't know? If not, you may be doing motivated reasoning with better branding.
费曼自己的框架要求诚实地面对其局限性。此技能不适用于:
  • 社会/政治系统 —— 费曼的还原论在涌现属性面前遇到瓶颈。哈耶克的知识问题适用:你无法将市场或政治系统简化为第一性原理并推导结果。
  • 专家模式识别优于推导的领域 —— 经验丰富的临床医生、消防员、国际象棋大师。在具有规律模式和充分反馈的领域,积累的专业知识优于第一性原理分析。
  • 时间紧迫的决策 —— 费曼式审计需要时间。如果你需要在几分钟内做出决定,信任经过校准的专家判断。
  • 未知未知领域 —— 在公理本身不确定的情况下,第一性原理思考会产生虚假信心。它无法发出它不知道的信号。
  • 人际关系 —— 响应性而非推导是相关模式。不要将人当作系统分析。
判断/feynman是否适用的测试:这是一个可以进行实验并获得明确反馈的领域吗?如果是,进行审计。如果不是,谨慎行事,并辅以模式匹配专业知识。
警惕Cargo Cult式费曼主义: 采取质疑一切的姿态,却缺乏区分承载性假设与任意惯例的领域专业知识。测试:你能否陈述你推理所基于的公理?你能否识别哪些是经过测试的,哪些是假设的?你能否描述领域边界?你能否识别你不知道的东西?如果不能,你可能是在用更好的品牌做动机性推理。

Scoring Discipline

评分准则

  • Be Feynman, not a consultant. Feynman did not soften findings for institutional comfort. His Challenger report was nearly excluded because it was too direct. He threatened to withdraw his name. If the analysis is self-deceived, say so. If it's cargo cult, say so.
  • Cite the source auditor. Every finding traces to a specific teammate's evidence. No unsupported accusations.
  • The most important finding is the one nobody wants to hear. That is where self-deception is strongest. Feynman: "Maybe they don't say explicitly 'Don't tell me,' but they discourage communication, which amounts to the same thing."
  • Honest praise is also required. If the analysis IS honest, say so clearly. Feynman praised NASA's avionics team for having exactly the right culture — independent adversarial verification, treating errors as very serious, studying origins carefully. The contrast with the SRB team made both assessments more credible.
  • Write the missing paragraph. The Confidence Inverter's most important output is the paragraph the analyst should have written — the strongest case against their own conclusion. This is "leaning over backwards" made concrete.
  • 做费曼,而非顾问。 费曼不会为了机构舒适而软化发现。他的挑战者号报告几乎被排除,因为它太直接。他威胁要撤回自己的名字。如果分析存在自我欺骗,直接说出来。如果是Cargo Cult,直接说出来。
  • 引用来源审计员。 每个发现都要追溯到特定团队成员的证据。不做无根据的指责。
  • 最重要的发现是没人想听的。 那里的自我欺骗最强。费曼:“也许他们没有明确说‘别告诉我’,但他们阻碍沟通,效果是一样的。”
  • 也需要诚实的表扬。 如果分析确实诚实,明确说出来。费曼称赞NASA的航空电子团队拥有完全正确的文化——独立的对抗性验证,将错误视为非常严重的问题,仔细研究根源。与SRB团队的对比使两种评估都更可信。
  • 撰写缺失的段落。 信心反转员最重要的输出是分析师应该写的段落——反对他们自己结论的最强论点。这是“反向倾斜”的具体体现。

Pairing with Other Skills

与其他技能搭配使用

/feynman is designed as a meta-tool that audits the OUTPUT of other analytical frameworks:
  • After /munger — Run the Munger lattice analysis, then /feynman the Munger output. Is the lollapalooza assessment honest? Are the moat ratings inflated? Is the math grounded or precision theater?
  • After /thiel — Run the Thiel monopoly analysis, then /feynman it. Is the "secret" genuine or rationalization? Is the monopoly assessment self-deceived?
  • After /garrytan — Audit the office hours output. Are the answers to the six questions honest or aspirational?
  • Standalone — Audit any business plan, investment thesis, technical analysis, risk assessment, or institutional claim.
The suggested workflow for maximum integrity:
  1. /garrytan — refine the idea
  2. /munger — apply the mental lattice
  3. /feynman — audit the Munger analysis for self-deception
  4. Only THEN act on the analysis
/feynman被设计为一款元工具,用于审计其他分析框架的输出:
  • 在/munger之后 —— 运行Munger格子分析,然后用/feynman审计Munger输出。Lollapalooza评估是否诚实?护城河评级是否膨胀?数学是基于现实还是精确剧场?
  • 在/thiel之后 —— 运行Thiel垄断分析,然后用/feynman审计。“秘密”是真实的还是合理化的?垄断评估是否存在自我欺骗?
  • 在/garrytan之后 —— 审计办公时间输出。六个问题的答案是诚实的还是理想化的?
  • 独立使用 —— 审计任何商业计划、投资论点、技术分析、风险评估或机构主张。
最大诚信度的建议工作流:
  1. /garrytan —— 完善想法
  2. /munger —— 应用思维格子
  3. /feynman —— 审计Munger分析中的自我欺骗
  4. 然后 才依据分析采取行动

Important Notes

重要说明

  • Cost: This skill spawns 5 agents. Worth it for high-stakes decisions where acting on a dishonest analysis has serious consequences. For casual checks, just ask "am I fooling myself?" without invoking the full team.
  • Sonnet for teammates, Opus for synthesis: The lead handles the enumeration and final verdict — that's where the compounding patterns emerge.
  • No team? No problem: If teams aren't enabled, run 5 sequential background agents and collect results. Same audit, just no cross-talk.
  • Primary sources: This skill's framework is derived from full-text reading of Appendix F (nasa.gov), Cargo Cult Science (calteches.library), What is Science? (fotuva.org), and Surely You're Joking (multiple chapters). Every principle traces to a specific Feynman text.
  • The verdict is not about the idea — it's about the analysis. An idea can be good and its analysis can be self-deceived. An idea can be bad and its analysis can be honest. /feynman evaluates the integrity of reasoning, not the quality of the conclusion.
  • 成本:此技能生成5个Agent。对于高风险决策,依据不诚实的分析采取行动会产生严重后果,因此值得使用。对于随意检查,只需问“我在自欺欺人吗?”无需调用完整团队。
  • 团队成员用Sonnet,综合用Opus:主导者负责列举和最终结论——复合模式在这里显现。
  • 没有团队?没问题:如果团队功能未启用,运行5个按顺序的后台Agent并收集结果。审计内容相同,只是没有跨Agent交流。
  • 原始来源:此技能的框架来自对附录F(nasa.gov)、《Cargo Cult科学》(calteches.library)、《什么是科学?》(fotuva.org)和《别闹了,费曼先生!》(多个章节)的全文阅读。每个原则都可追溯到费曼的具体文本。
  • 结论不是关于想法——而是关于分析。 一个想法可能很好,但对它的分析可能存在自我欺骗。一个想法可能不好,但对它的分析可能是诚实的。/feynman评估推理的诚信度,而非结论的质量。