Multi-Model Synthesis
Combine outputs from multiple AI models into a verified, comprehensive assessment by cross-referencing claims against the actual codebase.
Core Principle
Models hallucinate and contradict each other. The source code is the source of truth. Every significant claim must be verified before inclusion in the final assessment.
Process
1. Extract Claims
Parse each model's output and extract discrete claims:
- Factual assertions about the code ("function X does Y", "there's no error handling in Z")
- Recommendations ("should add validation", "refactor this pattern")
- Identified issues ("bug in line N", "security vulnerability")
Tag each claim with its source model.
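A minimal sketch of what an extracted claim might look like as a record; the `Claim` class, its field names, and the model names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

# Kinds of claim extracted from a model's output.
FACT, RECOMMENDATION, ISSUE = "fact", "recommendation", "issue"

@dataclass
class Claim:
    text: str          # the claim as the model stated it
    kind: str          # FACT, RECOMMENDATION, or ISSUE
    sources: list = field(default_factory=list)  # models that made this claim

claims = [
    Claim("function X does Y", FACT, ["Claude"]),
    Claim("should add validation", RECOMMENDATION, ["GPT-4"]),
    Claim("bug in line N", ISSUE, ["Claude", "GPT-4"]),
]
```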
2. Deduplicate
Group semantically equivalent claims:
- "Lacks input validation" = "No sanitization" = "User input not checked"
- "Should use async/await" = "Convert to promises" = "Make asynchronous"
Create canonical phrasing. Track which models mentioned each.
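One way to sketch the grouping step, assuming the canonical-phrasing table is built by hand (in practice the equivalence judgment is itself a semantic call, or an embedding-similarity step):

```python
from collections import defaultdict

# Hand-built mapping from raw phrasings to one canonical phrasing each.
# The phrasings here are illustrative examples, not a fixed vocabulary.
CANONICAL = {
    "Lacks input validation": "Missing input validation",
    "No sanitization": "Missing input validation",
    "User input not checked": "Missing input validation",
    "Should use async/await": "Convert to async/await",
    "Convert to promises": "Convert to async/await",
}

def deduplicate(raw_claims):
    """Group (phrasing, model) pairs under the canonical phrasing, tracking sources."""
    grouped = defaultdict(set)
    for text, model in raw_claims:
        grouped[CANONICAL.get(text, text)].add(model)
    return {canon: sorted(models) for canon, models in grouped.items()}

merged = deduplicate([
    ("Lacks input validation", "Claude"),
    ("No sanitization", "GPT-4"),
    ("User input not checked", "Gemini"),
])
# merged == {"Missing input validation": ["Claude", "GPT-4", "Gemini"]}
```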
3. Verify Against Source
For each factual claim or identified issue:
CLAIM: "The auth middleware doesn't check token expiry"
VERIFY: Read the auth middleware file
FINDING: [Confirmed | Refuted | Partially true | Cannot verify]
EVIDENCE: [Quote relevant code or explain why claim is wrong]

Use Grep, Glob, and Read tools to locate and examine relevant code. Do not trust model claims without verification.
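The verification template above can be carried as a small record; a sketch only, with an illustrative file reference (`auth.js:34` is hypothetical):

```python
from dataclasses import dataclass

# The four outcomes named in the verification template.
CONFIRMED, REFUTED, PARTIAL, UNVERIFIABLE = (
    "Confirmed", "Refuted", "Partially true", "Cannot verify")

@dataclass
class Verification:
    claim: str
    finding: str   # one of the four outcomes above
    evidence: str  # code quote, file:line reference, or why it can't be checked

v = Verification(
    claim="The auth middleware doesn't check token expiry",
    finding=CONFIRMED,
    evidence="auth.js:34 (hypothetical): token decoded without expiry check",
)
```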
4. Resolve Conflicts
When models contradict each other:
- Identify the specific disagreement
- Examine the actual code
- Determine which model (if any) is correct
- Document the resolution with evidence
CONFLICT: Model A says "uses SHA-256", Model B says "uses MD5"
INVESTIGATION: Read crypto.js lines 45-60
RESOLUTION: Model B is correct - line 52 shows MD5 usage
EVIDENCE: `const hash = crypto.createHash('md5')`
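The resolution steps above can be sketched as a helper that records which position the code actually supports; the function name and record shape are illustrative:

```python
def resolve_conflict(topic, positions, observed):
    """Record a conflict resolution: which model (if any) matches the code.

    positions: {model_name: claimed_value}
    observed:  the value actually found by reading the source.
    """
    winners = [m for m, claim in positions.items() if claim == observed]
    return {
        "topic": topic,
        "verdict": winners[0] if winners else "neither",
        "evidence": observed,
    }

resolution = resolve_conflict(
    "hash algorithm in crypto.js",
    {"Model A": "SHA-256", "Model B": "MD5"},
    observed="MD5",  # what reading crypto.js line 52 actually showed
)
# resolution["verdict"] == "Model B"
```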
5. Synthesize Assessment
Produce a final document that:
- States verified facts (not model opinions)
- Cites evidence for significant claims
- Notes where verification wasn't possible
- Preserves valuable insights that don't require verification (e.g., design suggestions)
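As one small illustration of the synthesis step, verified findings can be rendered mechanically into the output table below; this renderer and the sample row are illustrative, not part of the process definition:

```python
def confirmed_issues_table(rows):
    """Render verified issues as the markdown table used in the output format."""
    lines = ["| Issue | Severity | Evidence | Models |", "|---|---|---|---|"]
    for issue, severity, evidence, models in rows:
        lines.append(f"| {issue} | {severity} | {evidence} | {', '.join(models)} |")
    return "\n".join(lines)

table = confirmed_issues_table([
    # Hypothetical verified finding carried over from the conflict example.
    ("MD5 used for hashing", "High", "crypto.js:52", ["Claude", "GPT"]),
])
```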
Output Format
Synthesized Assessment: [Topic]
Summary
[2-3 sentences describing the verified findings]
Verified Findings
Confirmed Issues
| Issue | Severity | Evidence | Models |
|---|---|---|---|
| [Issue] | High/Med/Low | [file:line or quote] | Claude, GPT |
Refuted Claims
| Claim | Source | Reality |
|---|---|---|
| [What model said] | GPT-4 | [What code actually shows] |
Unverifiable Claims
| Claim | Source | Why Unverifiable |
|---|---|---|
| [Claim] | Claude | [Requires runtime testing / external system / etc.] |
Consensus Recommendations
[Items where 2+ models agree AND verification supports the suggestion]
Unique Insights Worth Considering
[Valuable suggestions from single models that weren't contradicted]
Conflicts Resolved
| Topic | Model A | Model B | Verdict | Evidence |
|---|---|---|---|---|
| [Topic] | [Position] | [Position] | [Which is correct] | [Code reference] |
Action Items
Critical (Verified, High Impact)
- [Item] — Evidence: [file:line]
Important (Verified, Medium Impact)
- [Item] — Evidence: [file:line]
Suggested (Unverified but Reasonable)
- [Item] — Source: [Models]
Verification Guidelines
Always verify:
- Bug reports and security issues
- Claims about what code does or doesn't do
- Assertions about missing functionality
- Performance or complexity claims
Trust but note source:
- Style and readability suggestions
- Architectural recommendations
- Best practice suggestions
Mark as unverifiable:
- Runtime behavior claims (without tests)
- Performance benchmarks (without profiling)
- External API behavior
- User experience claims
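The three guideline buckets above amount to a routing rule for each claim category; a sketch, with illustrative category labels:

```python
# Illustrative category labels for the three buckets named above.
MUST_VERIFY = {"bug", "security", "code-behavior", "missing-functionality", "complexity"}
TRUST_WITH_SOURCE = {"style", "architecture", "best-practice"}

def triage(category):
    """Route a claim category to the handling the guidelines prescribe."""
    if category in MUST_VERIFY:
        return "verify against source"
    if category in TRUST_WITH_SOURCE:
        return "trust, note source"
    # runtime behavior, benchmarks, external APIs, UX, and anything else
    return "mark unverifiable"
```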
Anti-Patterns
- Blindly merging model outputs without checking code
- Treating model consensus as proof (all models can be wrong)
- Omitting refuted claims (document what was wrong - it's valuable)
- Skipping verification because claims "sound right"