scientific-critical-thinking

Scientific Critical Thinking

Overview

Critical thinking is a systematic process for evaluating scientific rigor. Assess methodology, experimental design, statistical validity, biases, confounding, and evidence quality using GRADE and Cochrane ROB frameworks. Apply this skill for critical analysis of scientific claims.

When to Use This Skill

This skill should be used when:

Evaluating research methodology and experimental design
Assessing statistical validity and evidence quality
Identifying biases and confounding in studies
Reviewing scientific claims and conclusions
Conducting systematic reviews or meta-analyses
Applying GRADE or Cochrane risk of bias assessments
Providing critical analysis of research papers

Visual Enhancement with Scientific Schematics

When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.

If your document does not already contain schematics or diagrams:

Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
Simply describe your desired diagram in natural language
Nano Banana Pro will automatically generate, review, and refine the schematic

For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.

How to generate schematics:

```bash
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
```

The AI will automatically:

Create publication-quality images with proper formatting
Review and refine through multiple iterations
Ensure accessibility (colorblind-friendly, high contrast)
Save outputs in the figures/ directory

When to add schematics:

Critical thinking framework diagrams
Bias identification decision trees
Evidence quality assessment flowcharts
GRADE assessment methodology diagrams
Risk of bias evaluation frameworks
Validity assessment visualizations
Any complex concept that benefits from visualization

For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.

Core Capabilities

1. Methodology Critique

Evaluate research methodology for rigor, validity, and potential flaws.

Apply when:

Reviewing research papers
Assessing experimental designs
Evaluating study protocols
Planning new research

Evaluation framework:

Study Design Assessment
- Is the design appropriate for the research question?
- Can the design support causal claims being made?
- Are comparison groups appropriate and adequate?
- Consider whether experimental, quasi-experimental, or observational design is justified
Validity Analysis
- Internal validity: Can we trust the causal inference?
  - Check randomization quality
  - Evaluate confounding control
  - Assess selection bias
  - Review attrition/dropout patterns
- External validity: Do results generalize?
  - Evaluate sample representativeness
  - Consider ecological validity of setting
  - Assess whether conditions match target application
- Construct validity: Do measures capture intended constructs?
  - Review measurement validation
  - Check operational definitions
  - Assess whether measures are direct or proxy
- Statistical conclusion validity: Are statistical inferences sound?
  - Verify adequate power/sample size
  - Check assumption compliance
  - Evaluate test appropriateness
Control and Blinding
- Was randomization properly implemented (sequence generation, allocation concealment)?
- Was blinding feasible and implemented (participants, providers, assessors)?
- Are control conditions appropriate (placebo, active control, no treatment)?
- Could performance or detection bias affect results?
Measurement Quality
- Are instruments validated and reliable?
- Are measures objective when possible, or subjective with acknowledged limitations?
- Is outcome assessment standardized?
- Are multiple measures used to triangulate findings?
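
To make the "sequence generation" check under Control and Blinding more concrete, here is a minimal sketch of permuted-block randomization. The function name, arm labels, block size, and seed are illustrative assumptions, not part of this skill; a real trial would generate and conceal the sequence through an independent system.

```python
import random

def permuted_block_sequence(n_participants, arms=("treatment", "control"),
                            block_size=4, seed=2024):
    """Return an allocation list built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm  # each block holds every arm equally often
        rng.shuffle(block)            # order within the block is unpredictable
        sequence.extend(block)
    return sequence[:n_participants]

allocation = permuted_block_sequence(12)
print(allocation)
```

When reviewing a paper, the questions to ask are whether a comparable procedure was described and whether recruiters were prevented from seeing upcoming assignments (allocation concealment).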

Reference: See references/scientific_method.md for detailed principles and references/experimental_design.md for a comprehensive design checklist.

2. Bias Detection

Identify and evaluate potential sources of bias that could distort findings.

Apply when:

Reviewing published research
Designing new studies
Interpreting conflicting evidence
Assessing research quality

Systematic bias review:

Cognitive Biases (Researcher)
- Confirmation bias: Are only supporting findings highlighted?
- HARKing: Were hypotheses stated a priori or formed after seeing results?
- Publication bias: Are negative results missing from literature?
- Cherry-picking: Is evidence selectively reported?
- Check for preregistration and analysis plan transparency
Selection Biases
- Sampling bias: Is sample representative of target population?
- Volunteer bias: Do participants self-select in systematic ways?
- Attrition bias: Is dropout differential between groups?
- Survivorship bias: Are only "survivors" visible in sample?
- Examine participant flow diagrams and compare baseline characteristics
Measurement Biases
- Observer bias: Could expectations influence observations?
- Recall bias: Are retrospective reports systematically inaccurate?
- Social desirability: Are responses biased toward acceptability?
- Instrument bias: Do measurement tools systematically err?
- Evaluate blinding, validation, and measurement objectivity
Analysis Biases
- P-hacking: Were multiple analyses conducted until significance emerged?
- Outcome switching: Were non-significant outcomes replaced with significant ones?
- Selective reporting: Are all planned analyses reported?
- Subgroup fishing: Were subgroup analyses conducted without correction?
- Check for study registration and compare to published outcomes
Confounding
- What variables could affect both exposure and outcome?
- Were confounders measured and controlled (statistically or by design)?
- Could unmeasured confounding explain findings?
- Are there plausible alternative explanations?
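
The confounding questions above can be illustrated with a small stratified comparison. This is a sketch using illustrative counts patterned on the widely cited kidney-stone example, where stone size influences both which treatment is chosen and how often it succeeds; the variable names are assumptions for illustration.

```python
# Illustrative counts as (successes, total); stone size is the confounder.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def pooled_rate(strata):
    """Crude success rate that ignores the stratifying variable."""
    successes = sum(s for s, _ in strata.values())
    total = sum(t for _, t in strata.values())
    return successes / total

crude = {arm: pooled_rate(strata) for arm, strata in data.items()}
stratified = {
    arm: {name: s / t for name, (s, t) in strata.items()}
    for arm, strata in data.items()
}

for arm in data:
    print(arm, f"crude={crude[arm]:.3f}",
          {name: round(r, 3) for name, r in stratified[arm].items()})
```

In these counts treatment B looks better in the crude comparison, while treatment A does better within each stratum. A reversal like this is a signal to ask whether the study measured and adjusted for the variable driving treatment assignment.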

Reference: See references/common_biases.md for a comprehensive bias taxonomy with detection and mitigation strategies.

3. Statistical Analysis Evaluation

Critically assess statistical methods, interpretation, and reporting.

Apply when:

Reviewing quantitative research
Evaluating data-driven claims
Assessing clinical trial results
Reviewing meta-analyses

Statistical review checklist:

Sample Size and Power
- Was a priori power analysis conducted?
- Is sample adequate for detecting meaningful effects?
- Is the study underpowered (common problem)?
- Do significant results from small samples raise flags for inflated effect sizes?
Statistical Tests
- Are tests appropriate for data type and distribution?
- Were test assumptions checked and met?
- Are parametric tests justified, or should non-parametric alternatives be used?
- Is the analysis matched to study design (e.g., paired vs. independent)?
Multiple Comparisons
- Were multiple hypotheses tested?
- Was correction applied (Bonferroni, FDR, other)?
- Are primary outcomes distinguished from secondary/exploratory?
- Could findings be false positives from multiple testing?
P-Value Interpretation
- Are p-values interpreted correctly (the probability of data at least as extreme as observed, assuming the null is true)?
- Is non-significance incorrectly interpreted as "no effect"?
- Is statistical significance conflated with practical importance?
- Are exact p-values reported, or only "p < .05"?
- Is there suspicious clustering just below .05?
Effect Sizes and Confidence Intervals
- Are effect sizes reported alongside significance?
- Are confidence intervals provided to show precision?
- Is the effect size meaningful in practical terms?
- Are standardized effect sizes interpreted with field-specific context?
Missing Data
- How much data is missing?
- Is missing data mechanism considered (MCAR, MAR, MNAR)?
- How is missing data handled (deletion, imputation, maximum likelihood)?
- Could missing data bias results?
Regression and Modeling
- Is the model overfitted (too many predictors, no cross-validation)?
- Are predictions made outside the data range (extrapolation)?
- Are multicollinearity issues addressed?
- Are model assumptions checked?
Common Pitfalls
- Correlation treated as causation
- Ignoring regression to the mean
- Base rate neglect
- Texas sharpshooter fallacy (pattern finding in noise)
- Simpson's paradox (a trend within subgroups reverses or vanishes when the subgroups are pooled)
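
For the Multiple Comparisons items in the checklist, the following sketch shows how two common corrections change which results count as significant. The p-values are hypothetical and the function names are illustrative; this is a plain-Python rendering of the Bonferroni and Benjamini-Hochberg (FDR) procedures, not code shipped with this skill.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up FDR procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

def familywise_error(m, alpha=0.05):
    """Chance of at least one false positive across m independent true nulls."""
    return 1 - (1 - alpha) ** m

p_values = [0.001, 0.012, 0.025, 0.038, 0.27]  # hypothetical study results
print("uncorrected:", [p < 0.05 for p in p_values])
print("bonferroni: ", bonferroni(p_values))
print("BH (FDR):   ", benjamini_hochberg(p_values))
print("FWER for 5 tests:", round(familywise_error(5), 3))
```

With these inputs four results pass an uncorrected 0.05 threshold, one survives Bonferroni, and four survive Benjamini-Hochberg. Which correction is appropriate depends on whether the analysis aims to control any false positive or the expected proportion of them.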

Reference: See references/statistical_pitfalls.md for detailed pitfalls and correct practices.

4. Evidence Quality Assessment

Evaluate the strength and quality of evidence systematically.

Apply when:

Weighing evidence for decisions
Conducting literature reviews
Comparing conflicting findings
Determining confidence in conclusions

Evidence evaluation framework:

Study Design Hierarchy
- Systematic reviews/meta-analyses (highest for intervention effects)
- Randomized controlled trials
- Cohort studies
- Case-control studies
- Cross-sectional studies
- Case series/reports
- Expert opinion (lowest)
Important: Higher-level designs aren't always better quality. A well-designed observational study can be stronger than a poorly conducted RCT.
Quality Within Design Type
- Risk of bias assessment (use appropriate tool: Cochrane ROB, Newcastle-Ottawa, etc.)
- Methodological rigor
- Transparency and reporting completeness
- Conflicts of interest
GRADE Considerations (if applicable)
- Start with design type (RCT = high, observational = low)
- Downgrade for:
  - Risk of bias
  - Inconsistency across studies
  - Indirectness (wrong population/intervention/outcome)
  - Imprecision (wide confidence intervals, small samples)
  - Publication bias
- Upgrade for:
  - Large effect sizes
  - Dose-response relationships
  - Plausible confounding that would reduce (not increase) the observed effect
Convergence of Evidence
- Stronger when:
  - Multiple independent replications
  - Different research groups and settings
  - Different methodologies converge on same conclusion
  - Mechanistic and empirical evidence align
- Weaker when:
  - Single study or research group
  - Contradictory findings in literature
  - Publication bias evident
  - No replication attempts
Contextual Factors
- Biological/theoretical plausibility
- Consistency with established knowledge
- Temporality (cause precedes effect)
- Specificity of relationship
- Strength of association
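
The GRADE starting points and adjustments listed above can be sketched as simple bookkeeping. This is a deliberately simplified illustration: the function name, domain keys, and the arithmetic itself are assumptions, and an actual GRADE rating is a structured judgment rather than a sum.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(design, downgrades=None, upgrades=None):
    """Start from the design type, then move down or up by whole levels.

    downgrades / upgrades map a domain name to the number of levels (1 or 2).
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    start = {"rct": 3, "observational": 1}[design]  # RCT = high, observational = low
    score = start - sum(downgrades.values()) + sum(upgrades.values())
    score = max(0, min(score, len(LEVELS) - 1))     # clamp to the four GRADE levels
    return LEVELS[score]

print(grade_certainty("rct", downgrades={"risk_of_bias": 1, "imprecision": 1}))
print(grade_certainty("observational", upgrades={"large_effect": 1}))
```

Recording the domain behind each adjustment keeps the reasoning auditable, which matters more than the final label.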

Reference: See references/evidence_hierarchy.md for the detailed hierarchy, GRADE system, and quality assessment tools.

5. Logical Fallacy Identification

Detect and name logical errors in scientific arguments and claims.

Apply when:

Evaluating scientific claims
Reviewing discussion/conclusion sections
Assessing popular science communication
Identifying flawed reasoning

Common fallacies in science:

Causation Fallacies
- Post hoc ergo propter hoc: "B followed A, so A caused B"
- Correlation = causation: Confusing association with causality
- Reverse causation: Mistaking cause for effect
- Single cause fallacy: Attributing complex outcomes to one factor
Generalization Fallacies
- Hasty generalization: Broad conclusions from small samples
- Anecdotal fallacy: Personal stories as proof
- Cherry-picking: Selecting only supporting evidence
- Ecological fallacy: Group patterns applied to individuals
Authority and Source Fallacies
- Appeal to authority: "Expert said it, so it's true" (without evidence)
- Ad hominem: Attacking person, not argument
- Genetic fallacy: Judging by origin, not merits
- Appeal to nature: "Natural = good/safe"
Statistical Fallacies
- Base rate neglect: Ignoring prior probability
- Texas sharpshooter: Finding patterns in random data
- Multiple comparisons: Not correcting for multiple tests
- Prosecutor's fallacy: Confusing P(E|H) with P(H|E)
Structural Fallacies
- False dichotomy: "Either A or B" when more options exist
- Moving goalposts: Changing evidence standards after they're met
- Begging the question: Circular reasoning
- Straw man: Misrepresenting arguments to attack them
Science-Specific Fallacies
- Galileo gambit: "They laughed at Galileo, so my fringe idea is correct"
- Argument from ignorance: "Not proven false, so true"
- Nirvana fallacy: Rejecting a workable solution because it is not perfect
- Unfalsifiability: Making untestable claims
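
Base rate neglect and the prosecutor's fallacy both come down to confusing P(E|H) with P(H|E). The sketch below applies Bayes' theorem to hypothetical numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) to show how far apart the two can be; the function and parameter names are illustrative.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) from Bayes' theorem for a binary hypothesis."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1 - prior)
    return numerator / evidence

# Hypothetical screening test: P(E|H) is 0.95, yet P(H|E) is far lower
p_h_given_e = posterior(prior=0.01, p_e_given_h=0.95, p_e_given_not_h=0.05)
print(f"P(E|H) = 0.95, P(H|E) = {p_h_given_e:.3f}")
```

Here a positive result corresponds to roughly a 16% probability that the hypothesis is true, because the low prior dominates. An argument that quotes only the 95% figure is committing the fallacy.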

When identifying fallacies:

Name the specific fallacy
Explain why the reasoning is flawed
Identify what evidence would be needed for valid inference
Note that fallacious reasoning doesn't prove the conclusion false—just that this argument doesn't support it

Reference: See references/logical_fallacies.md for a comprehensive fallacy catalog with examples and detection strategies.

6. Research Design Guidance

Provide constructive guidance for planning rigorous studies.

Apply when:

Helping design new experiments
Planning research projects
Reviewing research proposals
Improving study protocols

Design process:

Research Question Refinement
- Ensure question is specific, answerable, and falsifiable
- Verify it addresses a gap or contradiction in literature
- Confirm feasibility (resources, ethics, time)
- Define variables operationally
Design Selection
- Match design to question (causal → experimental; associational → observational)
- Consider feasibility and ethical constraints
- Choose between-subjects, within-subjects, or mixed designs
- Plan factorial designs if testing multiple factors
Bias Minimization Strategy
- Implement randomization when possible
- Plan blinding at all feasible levels (participants, providers, assessors)
- Identify and plan to control confounds (randomization, matching, stratification, statistical adjustment)
- Standardize all procedures
- Plan to minimize attrition
Sample Planning
- Conduct a priori power analysis (specify expected effect, desired power, alpha)
- Account for attrition in sample size
- Define clear inclusion/exclusion criteria
- Consider recruitment strategy and feasibility
- Plan for sample representativeness
Measurement Strategy
- Select validated, reliable instruments
- Use objective measures when possible
- Plan multiple measures of key constructs (triangulation)
- Ensure measures are sensitive to expected changes
- Establish inter-rater reliability procedures
Analysis Planning
- Prespecify all hypotheses and analyses
- Designate primary outcome clearly
- Plan statistical tests with assumption checks
- Specify how missing data will be handled
- Plan to report effect sizes and confidence intervals
- Consider multiple comparison corrections
Transparency and Rigor
- Preregister study and analysis plan
- Use reporting guidelines (CONSORT, STROBE, PRISMA)
- Plan to report all outcomes, not just significant ones
- Distinguish confirmatory from exploratory analyses
- Commit to data/code sharing
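
The Sample Planning steps (a priori power analysis, then accounting for attrition) can be sketched with the standard normal approximation for comparing two group means. The effect size, power, alpha, and attrition rate below are hypothetical inputs, and the function names are illustrative; exact t-based calculations from dedicated software give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

def inflate_for_attrition(n, attrition_rate):
    """Enroll enough participants that n remain after the expected dropout."""
    return math.ceil(n / (1 - attrition_rate))

n = n_per_group(effect_size=0.5)
n_enrolled = inflate_for_attrition(n, attrition_rate=0.15)
print(f"analyzable per group: {n}, to enroll per group: {n_enrolled}")
```

The expected effect size is the most consequential input and the hardest to justify, so a proposal should state where it comes from (prior studies, a minimal clinically important difference, or a pilot).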

Reference: See references/experimental_design.md for a comprehensive design checklist covering all stages from question to dissemination.

7. Claim Evaluation

Systematically evaluate scientific claims for validity and support.

Apply when:

Assessing conclusions in papers
Evaluating media reports of research
Reviewing abstract or introduction claims
Checking if data support conclusions

Claim evaluation process:

Identify the Claim
- What exactly is being claimed?
- Is it a causal claim, associational claim, or descriptive claim?
- How strong is the claim (proven, likely, suggested, possible)?
Assess the Evidence
- What evidence is provided?
- Is evidence direct or indirect?
- Is evidence sufficient for the strength of claim?
- Are alternative explanations ruled out?
Check Logical Connection
- Do conclusions follow from the data?
- Are there logical leaps?
- Is correlational data used to support causal claims?
- Are limitations acknowledged?
Evaluate Proportionality
- Is confidence proportional to evidence strength?
- Are hedging words used appropriately?
- Are limitations downplayed?
- Is speculation clearly labeled?
Check for Overgeneralization
- Do claims extend beyond the sample studied?
- Are population restrictions acknowledged?
- Is context-dependence recognized?
- Are caveats about generalization included?
Red Flags
- Causal language from correlational studies
- "Proves" or absolute certainty
- Cherry-picked citations
- Ignoring contradictory evidence
- Dismissing limitations
- Extrapolation beyond data
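
Some of the red flags above show up as wording, so a first pass can be a rough keyword scan. The sketch below is only a triage aid under that assumption: the pattern lists and function name are hypothetical, and a hit means "read this sentence closely", since causal language is appropriate when the design supports it.

```python
import re

# Hypothetical, non-exhaustive patterns; tune them for the field under review.
RED_FLAG_PATTERNS = {
    "absolute certainty": r"\b(proves?|proven|definitively|conclusively)\b",
    "causal language": r"\b(causes?|caused|leads? to|results? in)\b",
}

def flag_claim_language(text):
    """Return (label, matched phrase) pairs for wording that merits a closer look."""
    hits = []
    for label, pattern in RED_FLAG_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((label, match.group(0)))
    return hits

sample = "Our survey proves that coffee causes better memory."
flags = flag_claim_language(sample)
print(flags)
```

For each hit, check the study design: the same sentence would be a red flag for a cross-sectional survey and acceptable for a well-run randomized trial.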

Provide specific feedback:

Quote the problematic claim
Explain what evidence would be needed to support it
Suggest appropriate hedging language if warranted
Distinguish between data (what was found) and interpretation (what it means)

Application Guidelines

General Approach

Be Constructive
- Identify strengths as well as weaknesses
- Suggest improvements rather than just criticizing
- Distinguish between fatal flaws and minor limitations
- Recognize that all research has limitations
Be Specific
- Point to specific instances (e.g., "Table 2 shows..." or "In the Methods section...")
- Quote problematic statements
- Provide concrete examples of issues
- Reference specific principles or standards violated
Be Proportionate
- Match criticism severity to issue importance
- Distinguish between major threats to validity and minor concerns
- Consider whether issues affect primary conclusions
- Acknowledge uncertainty in your own assessments
Apply Consistent Standards
- Use same criteria across all studies
- Don't apply stricter standards to findings you dislike
- Acknowledge your own potential biases
- Base judgments on methodology, not results
Consider Context
- Acknowledge practical and ethical constraints
- Consider field-specific norms for effect sizes and methods
- Recognize exploratory vs. confirmatory contexts
- Account for resource limitations in evaluating studies

When Providing Critique

Structure feedback as:

Summary: Brief overview of what was evaluated
Strengths: What was done well (important for credibility and learning)
Concerns: Issues organized by severity
- Critical issues (threaten validity of main conclusions)
- Important issues (affect interpretation but not fatally)
- Minor issues (worth noting but don't change conclusions)
Specific Recommendations: Actionable suggestions for improvement
Overall Assessment: Balanced conclusion about evidence quality and what can be concluded
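
One way to keep critiques in this order is to hold them in a small data structure and render it. The class below is a hypothetical sketch, not part of this skill; the field names simply mirror the feedback structure above, and the example content is placeholder text.

```python
from dataclasses import dataclass, field

@dataclass
class Critique:
    """Container mirroring the feedback structure described above."""
    summary: str
    strengths: list = field(default_factory=list)
    critical: list = field(default_factory=list)
    important: list = field(default_factory=list)
    minor: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)
    overall: str = ""

    def render(self):
        lines = [f"Summary: {self.summary}"]
        sections = [
            ("Strengths", self.strengths),
            ("Critical issues", self.critical),
            ("Important issues", self.important),
            ("Minor issues", self.minor),
            ("Specific Recommendations", self.recommendations),
        ]
        for title, items in sections:
            lines.append(f"{title}:")
            # State "none identified" so an empty section is a visible decision
            lines.extend(f"- {item}" for item in (items or ["none identified"]))
        lines.append(f"Overall Assessment: {self.overall}")
        return "\n".join(lines)

report = Critique(
    summary="[what was evaluated]",
    strengths=["[what was done well]"],
    critical=["[issue that threatens the main conclusion]"],
    recommendations=["[actionable suggestion]"],
    overall="[what can and cannot be concluded]",
).render()
print(report)
```

Keeping severity tiers as separate fields makes it harder to bury a critical issue among minor ones.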

Use precise terminology:

Name specific biases, fallacies, and methodological issues
Reference established standards and guidelines
Cite principles from scientific methodology
Use technical terms accurately

When Uncertain

Acknowledge uncertainty: "This could be X or Y; additional information needed is Z"
Ask clarifying questions: "Was [methodological detail] done? This affects interpretation."
Provide conditional assessments: "If X was done, then Y follows; if not, then Z is a concern"
Note what additional information would resolve uncertainty

Reference Materials

This skill includes comprehensive reference materials that provide detailed frameworks for critical evaluation:

references/scientific_method.md
- Core principles of scientific methodology, the scientific process, critical evaluation criteria, red flags in scientific claims, causal inference standards, peer review, and open science principles
references/common_biases.md
- Comprehensive taxonomy of cognitive, experimental, methodological, statistical, and analysis biases with detection and mitigation strategies
references/statistical_pitfalls.md
- Common statistical errors and misinterpretations including p-value misunderstandings, multiple comparisons problems, sample size issues, effect size mistakes, correlation/causation confusion, regression pitfalls, and meta-analysis issues
references/evidence_hierarchy.md
- Traditional evidence hierarchy, GRADE system, study quality assessment criteria, domain-specific considerations, evidence synthesis principles, and practical decision frameworks
references/logical_fallacies.md
- Logical fallacies common in scientific discourse organized by type (causation, generalization, authority, relevance, structure, statistical) with examples and detection strategies
references/experimental_design.md
- Comprehensive experimental design checklist covering research questions, hypotheses, study design selection, variables, sampling, blinding, randomization, control groups, procedures, measurement, bias minimization, data management, statistical planning, ethical considerations, validity threats, and reporting standards

When to consult references:

Load references into context when detailed frameworks are needed
Use grep to search references for specific topics:
```
grep -r "pattern" references/
```
References provide depth; SKILL.md provides procedural guidance
Consult references for comprehensive lists, detailed criteria, and specific examples

Remember

Scientific critical thinking is about:

Systematic evaluation using established principles
Constructive critique that improves science
Confidence proportional to evidence strength
Transparency about uncertainty and limitations
Consistent application of standards
Recognition that all research has limitations
Balance between skepticism and openness to evidence

Always distinguish between:

Data (what was observed) and interpretation (what it means)
Correlation and causation
Statistical significance and practical importance
Exploratory and confirmatory findings
What is known and what is uncertain
Evidence against a claim and evidence for the null

Goals of critical thinking:

Identify strengths and weaknesses accurately
Determine what conclusions are supported
Recognize limitations and uncertainties
Suggest improvements for future work
Advance scientific understanding
