Multi-Model Synthesis
Combine outputs from multiple AI models into a verified, comprehensive assessment by cross-referencing claims against the actual codebase.
Core Principle
Models hallucinate and contradict each other. The source code is the source of truth. Every significant claim must be verified before inclusion in the final assessment.
Process
1. Extract Claims
Parse each model's output and extract discrete claims:
- Factual assertions about the code ("function X does Y", "there's no error handling in Z")
- Recommendations ("should add validation", "refactor this pattern")
- Identified issues ("bug in line N", "security vulnerability")
Tag each claim with its source model.
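A minimal sketch of what an extracted claim might look like as a record; the `Claim` class, its field names, and the model names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

# Kinds of claim extracted from a model's output.
FACT, RECOMMENDATION, ISSUE = "fact", "recommendation", "issue"

@dataclass
class Claim:
    text: str          # the claim as the model stated it
    kind: str          # FACT, RECOMMENDATION, or ISSUE
    sources: list = field(default_factory=list)  # models that made this claim

claims = [
    Claim("function X does Y", FACT, ["Claude"]),
    Claim("should add validation", RECOMMENDATION, ["GPT-4"]),
    Claim("bug in line N", ISSUE, ["Claude", "GPT-4"]),
]
```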
2. Deduplicate
Group semantically equivalent claims:
- "Lacks input validation" = "No sanitization" = "User input not checked"
- "Should use async/await" = "Convert to promises" = "Make asynchronous"
Create canonical phrasing. Track which models mentioned each.
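One way to sketch the grouping step, assuming the canonical-phrasing table is built by hand (in practice the equivalence judgment is itself a semantic call, or an embedding-similarity step):

```python
from collections import defaultdict

# Hand-built mapping from raw phrasings to one canonical phrasing each.
# The phrasings here are illustrative examples, not a fixed vocabulary.
CANONICAL = {
    "Lacks input validation": "Missing input validation",
    "No sanitization": "Missing input validation",
    "User input not checked": "Missing input validation",
    "Should use async/await": "Convert to async/await",
    "Convert to promises": "Convert to async/await",
}

def deduplicate(raw_claims):
    """Group (phrasing, model) pairs under the canonical phrasing, tracking sources."""
    grouped = defaultdict(set)
    for text, model in raw_claims:
        grouped[CANONICAL.get(text, text)].add(model)
    return {canon: sorted(models) for canon, models in grouped.items()}

merged = deduplicate([
    ("Lacks input validation", "Claude"),
    ("No sanitization", "GPT-4"),
    ("User input not checked", "Gemini"),
])
# merged == {"Missing input validation": ["Claude", "GPT-4", "Gemini"]}
```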
3. Verify Against Source
For each factual claim or identified issue:
CLAIM: "The auth middleware doesn't check token expiry"
VERIFY: Read the auth middleware file
FINDING: [Confirmed | Refuted | Partially true | Cannot verify]
EVIDENCE: [Quote relevant code or explain why claim is wrong]

Use Grep, Glob, and Read tools to locate and examine relevant code. Do not trust model claims without verification.
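The verification template above can be carried as a small record; a sketch only, with an illustrative file reference (`auth.js:34` is hypothetical):

```python
from dataclasses import dataclass

# The four outcomes named in the verification template.
CONFIRMED, REFUTED, PARTIAL, UNVERIFIABLE = (
    "Confirmed", "Refuted", "Partially true", "Cannot verify")

@dataclass
class Verification:
    claim: str
    finding: str   # one of the four outcomes above
    evidence: str  # code quote, file:line reference, or why it can't be checked

v = Verification(
    claim="The auth middleware doesn't check token expiry",
    finding=CONFIRMED,
    evidence="auth.js:34 (hypothetical): token decoded without expiry check",
)
```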
4. Resolve Conflicts
When models contradict each other:
- Identify the specific disagreement
- Examine the actual code
- Determine which model (if any) is correct
- Document the resolution with evidence
CONFLICT: Model A says "uses SHA-256", Model B says "uses MD5"
INVESTIGATION: Read crypto.js lines 45-60
RESOLUTION: Model B is correct - line 52 shows MD5 usage
EVIDENCE: `const hash = crypto.createHash('md5')`
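The resolution steps above can be sketched as a helper that records which position the code actually supports; the function name and record shape are illustrative:

```python
def resolve_conflict(topic, positions, observed):
    """Record a conflict resolution: which model (if any) matches the code.

    positions: {model_name: claimed_value}
    observed:  the value actually found by reading the source.
    """
    winners = [m for m, claim in positions.items() if claim == observed]
    return {
        "topic": topic,
        "verdict": winners[0] if winners else "neither",
        "evidence": observed,
    }

resolution = resolve_conflict(
    "hash algorithm in crypto.js",
    {"Model A": "SHA-256", "Model B": "MD5"},
    observed="MD5",  # what reading crypto.js line 52 actually showed
)
# resolution["verdict"] == "Model B"
```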
5. Synthesize Assessment
Produce a final document that:
- States verified facts (not model opinions)
- Cites evidence for significant claims
- Notes where verification wasn't possible
- Preserves valuable insights that don't require verification (e.g., design suggestions)
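As one small illustration of the synthesis step, verified findings can be rendered mechanically into the output table below; this renderer and the sample row are illustrative, not part of the process definition:

```python
def confirmed_issues_table(rows):
    """Render verified issues as the markdown table used in the output format."""
    lines = ["| Issue | Severity | Evidence | Models |", "|---|---|---|---|"]
    for issue, severity, evidence, models in rows:
        lines.append(f"| {issue} | {severity} | {evidence} | {', '.join(models)} |")
    return "\n".join(lines)

table = confirmed_issues_table([
    # Hypothetical verified finding carried over from the conflict example.
    ("MD5 used for hashing", "High", "crypto.js:52", ["Claude", "GPT"]),
])
```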
Output Format
Synthesized Assessment: [Topic]
Summary
[2-3 sentences describing the verified findings]
Verified Findings
Confirmed Issues
| Issue | Severity | Evidence | Models |
|---|---|---|---|
| [Issue] | High/Med/Low | [file:line or quote] | Claude, GPT |
Refuted Claims
| Claim | Source | Reality |
|---|---|---|
| [What model said] | GPT-4 | [What code actually shows] |
Unverifiable Claims
| Claim | Source | Why Unverifiable |
|---|---|---|
| [Claim] | Claude | [Requires runtime testing / external system / etc.] |
Consensus Recommendations
[Items where 2+ models agree AND verification supports the suggestion]
Unique Insights Worth Considering
[Valuable suggestions from single models that weren't contradicted]
Conflicts Resolved
| Topic | Model A | Model B | Verdict | Evidence |
|---|---|---|---|---|
| [Topic] | [Position] | [Position] | [Which is correct] | [Code reference] |
Action Items
Critical (Verified, High Impact)
- [Item] — Evidence: [file:line]
Important (Verified, Medium Impact)
- [Item] — Evidence: [file:line]
Suggested (Unverified but Reasonable)
- [Item] — Source: [Models]
Verification Guidelines
Always verify:
- Bug reports and security issues
- Claims about what code does or doesn't do
- Assertions about missing functionality
- Performance or complexity claims
Trust but note source:
- Style and readability suggestions
- Architectural recommendations
- Best practice suggestions
Mark as unverifiable:
- Runtime behavior claims (without tests)
- Performance benchmarks (without profiling)
- External API behavior
- User experience claims
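The three guideline buckets above amount to a routing rule for each claim category; a sketch, with illustrative category labels:

```python
# Illustrative category labels for the three buckets named above.
MUST_VERIFY = {"bug", "security", "code-behavior", "missing-functionality", "complexity"}
TRUST_WITH_SOURCE = {"style", "architecture", "best-practice"}

def triage(category):
    """Route a claim category to the handling the guidelines prescribe."""
    if category in MUST_VERIFY:
        return "verify against source"
    if category in TRUST_WITH_SOURCE:
        return "trust, note source"
    # runtime behavior, benchmarks, external APIs, UX, and anything else
    return "mark unverifiable"
```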
Anti-Patterns
- Blindly merging model outputs without checking code
- Treating model consensus as proof (all models can be wrong)
- Omitting refuted claims (document what was wrong - it's valuable)
- Skipping verification because claims "sound right"