cross-verified-research

Cross-Verified Research

Systematic research engine with anti-hallucination safeguards and source quality tiering.

Rules (Absolute)

  1. Never fabricate sources. No fake URLs, no invented papers, no hallucinated statistics.
  2. Confidence gate. If confidence < 90% on a factual claim, do NOT present it as fact. State uncertainty explicitly.
  3. No speculation as fact. Do not present unverified claims using hedging language as if they were findings. Banned patterns: "아마도", "~인 것 같습니다", "~로 보입니다", "~수도 있습니다", "probably", "I think", "seems like", "appears to be", "likely". If a claim is not verified, label it explicitly as Unverified or Contested — do not soften it with hedging.
  4. BLUF output. Lead with conclusion, follow with evidence. Never bury the answer.
  5. Minimum effort floor. At least 5 distinct search queries per research task. At least 5 verified sources in the final output.
  6. Cross-verify. Every key claim must appear in 2+ independent sources before presenting as fact.

Pipeline

Execute these 4 stages sequentially. Do NOT skip stages.

Stage 1: Deconstruct

Break the research question into atomic sub-questions.
Input: "Should we use Bun or Node.js for our backend?"
Decomposed:
  1. Runtime performance benchmarks (CPU, memory, startup)
  2. Ecosystem maturity (npm compatibility, native modules)
  3. Production stability (known issues, enterprise adoption)
  4. Developer experience (tooling, debugging, testing)
  5. Long-term viability (funding, community, roadmap)
  • Identify what requires external verification vs. internal knowledge
  • Flag any sub-question where confidence < 90%
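As a sketch, the flagging logic above can be modeled with a small data structure. The field names, example sub-questions, and confidence values here are illustrative, not part of the skill:

```python
from dataclasses import dataclass

@dataclass
class SubQuestion:
    text: str
    needs_external_verification: bool  # external search vs. internal knowledge
    confidence: float                  # self-assessed, 0.0-1.0

    @property
    def flagged(self) -> bool:
        # Rule 2: anything below the 90% confidence gate gets flagged
        return self.confidence < 0.90

subs = [
    SubQuestion("Runtime performance benchmarks", True, 0.60),
    SubQuestion("Ecosystem maturity", True, 0.85),
    SubQuestion("Developer experience", False, 0.95),
]
print([s.text for s in subs if s.flagged])
```

Flagged sub-questions are exactly the ones Stage 2 must resolve through external search before anything is stated as fact.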

Stage 2: Search & Collect

For each sub-question requiring verification:
  1. Formulate diverse queries — vary keywords, include year filters, try both English and Korean
  2. Use WebSearch for broad discovery, WebFetch for specific page analysis
  3. Classify every source by tier immediately (see Source Tiers below)
  4. Extract specific data points — numbers, dates, versions, quotes with attribution
  5. Record contradictions — when sources disagree, note both positions
Minimum search pattern:
Query 1: [topic] + "benchmark" or "comparison"
Query 2: [topic] + "production" or "enterprise"
Query 3: [topic] + [current year] + "review"
Query 4: [topic] + "issues" or "problems" or "limitations"
Query 5: [topic] + site:github.com (issues, discussions)
Fallback when WebSearch is unavailable or returns no results:
  1. Use WebFetch to directly access known authoritative URLs (official docs, GitHub repos, Wikipedia)
  2. Rely on internal knowledge but label all claims as Unverified (no external search available)
  3. Ask the user to provide source URLs or documents for verification
  4. Reduce the minimum source requirement but maintain cross-verification where possible

Stage 3: Cross-Verify

For each key finding:
  • Does it appear in 2+ independent Tier S/A sources? → Verified
  • Does it appear in only 1 source? → Unverified (label it)
  • Do sources contradict? → Contested (present both sides with tier labels)
Build a verification matrix:
| Claim | Source 1 (Tier) | Source 2 (Tier) | Status |
|-------|----------------|----------------|--------|
| Bun 3x faster startup | benchmarks.dev (A) | bun.sh/blog (B) | Verified (note: Bun's own blog = biased) |
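The Stage 3 decision rule can be sketched in code. This is a minimal illustration: it approximates "independent" as "distinct source names" and assumes contradiction detection has already happened upstream:

```python
def classify_claim(sources: list[tuple[str, str]], contradicted: bool = False) -> str:
    """Classify a claim from its (source name, tier) pairs (sketch)."""
    if contradicted:
        # Sources disagree: present both sides with tier labels
        return "Contested"
    independent_high = {name for name, tier in sources if tier in ("S", "A")}
    if len(independent_high) >= 2:
        # 2+ independent Tier S/A sources agree
        return "Verified"
    # Single-source or low-tier claims must be labelled explicitly
    return "Unverified"

print(classify_claim([("benchmarks.dev", "A"), ("nodejs.org", "A")]))  # Verified
```

Note the asymmetry: a claim defaults to Unverified; only corroboration can promote it, and any contradiction demotes it regardless of tier.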

Stage 4: Synthesize

Produce the final report in BLUF format.

Output Format

Research: [Topic]

Conclusion (BLUF)

[1-3 sentence definitive answer or recommendation]

Key Findings

[Numbered findings, each with inline source tier labels]
  1. [Finding] — [evidence summary] Sources: 🏛️ [source1], 🛡️ [source2]
  2. [Finding] — [evidence summary] Sources: 🛡️ [source1], 🛡️ [source2]

Contested / Uncertain

[Any claims that couldn't be cross-verified or where sources conflict]
  • ⚠️ [claim] — Source A says X, Source B says Y

Verification Matrix

| Claim | Sources | Tier | Status |
|-------|---------|------|--------|
| ... | ... | ... | Verified/Unverified/Contested |

Sources

[All sources, grouped by tier]

🏛️ Tier S — Academic & Primary Research

  • Title — Journal/Org (Year)

🛡️ Tier A — Trusted Official

  • Title — Source (Year)

⚠️ Tier B — Community / Caution

  • Title — Platform (Year)

Tier C — General

  • Title

Source Tiers

Classify every source on discovery.
| Tier | Label | Trust Level | Examples |
|------|-------|-------------|----------|
| S | 🏛️ | Academic, peer-reviewed, primary research, official specs | Google Scholar, arXiv, PubMed, W3C/IETF RFCs, language specs (ECMAScript, PEPs) |
| A | 🛡️ | Government, .edu, major press, official docs | .gov/.edu, Reuters/AP/BBC, official framework docs, company engineering blogs (Google AI, Netflix Tech) |
| B | ⚠️ | Social media, forums, personal blogs, wikis — flag to user | Twitter/X, Reddit, StackOverflow, Medium, dev.to, Wikipedia, 나무위키 |
| C | (none) | General websites not fitting above categories | Corporate marketing, press releases, SEO content, news aggregators |

Tier Classification Rules

  • Company's own content about their product:
    • Official docs → Tier A
    • Feature announcements → Tier A (existence), Tier B (performance claims)
    • Marketing pages → Tier C
  • GitHub:
    • Official repos (e.g., facebook/react) → Tier A
    • Issues/Discussions with reproduction → Tier A (for bug existence)
    • Random user repos → Tier B
  • Benchmarks:
    • Independent, reproducible, methodology disclosed → Tier S
    • Official by neutral party → Tier A
    • Vendor's own benchmarks → Tier B (note bias)
  • StackOverflow: Accepted answers with high votes = borderline Tier A; non-accepted = Tier B
  • Tier B sources must never be cited alone — corroborate with Tier S or A
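A first-pass version of these rules could be sketched as a URL heuristic. The host lists are illustrative examples drawn from the tier table; the judgement calls above (vendor bias, reproduction steps, accepted answers) still require a human or model decision:

```python
from urllib.parse import urlparse

# Illustrative host lists, not exhaustive
S_HOSTS = ("arxiv.org", "pubmed.ncbi.nlm.nih.gov", "www.w3.org", "www.ietf.org")
A_HOSTS = ("reuters.com", "apnews.com", "bbc.com")
B_HOSTS = ("twitter.com", "x.com", "reddit.com", "stackoverflow.com",
           "medium.com", "dev.to", "wikipedia.org")

def classify_source(url: str) -> str:
    """Guess a tier from the URL alone (heuristic sketch)."""
    host = urlparse(url).netloc.lower()

    def matches(hosts: tuple[str, ...]) -> bool:
        # Match the host exactly or as a subdomain (en.wikipedia.org, etc.)
        return any(host == h or host.endswith("." + h) for h in hosts)

    if matches(S_HOSTS):
        return "S"
    if matches(A_HOSTS) or host.endswith(".gov") or host.endswith(".edu"):
        return "A"
    if matches(B_HOSTS):
        return "B"  # Tier B: flag to user, never cite alone
    return "C"     # default: general website
```

Defaulting unknown hosts to Tier C (rather than A or B) keeps the heuristic conservative: an unrecognized source earns trust only after manual reclassification.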

When to Use

  • Technology evaluation or comparison
  • Fact-checking specific claims
  • Architecture decision research
  • Market/competitor analysis
  • "Is X true?" verification tasks
  • Any question where accuracy matters more than speed

When NOT to Use

  • Creative writing or brainstorming (use creativity-sampler)
  • Code implementation (use search-first for library discovery)
  • Simple questions answerable from internal knowledge with high confidence
  • Opinion-based questions with no verifiable answer

Integration Notes

  • With brainstorming: Can be invoked during brainstorming's "Explore context" phase for fact-based inputs
  • With search-first: search-first finds tools/libraries to USE; this skill VERIFIES factual claims. Different purposes.
  • With adversarial-review: Research findings can feed into adversarial review for stress-testing conclusions