cs-methodology-evaluation

CS Methodology Evaluation

Use this for CS research where the scientific claim depends on experiments, benchmarks, data analysis, systems measurements, user studies, or software engineering evidence.

Read First

  • references/cs-methodology-evaluation-policy.md
  • references/experiment-policy.md
  • references/repository-contract.md

Workflow

  1. Identify claim type and evidence type: algorithmic, empirical, systems, human-subject, dataset, benchmark, tool, reproduction, or negative result.
  2. Define evaluation questions before running experiments.
  3. Select baselines, datasets, metrics, splits, seeds, hardware, and budgets.
  4. Add ablations and sensitivity checks that test the claimed mechanism.
  5. Identify leakage, confounding, overfitting, selection bias, and benchmark mismatch.
  6. Record the plan in docs/methodology/evaluation-plan.md.
  7. Record threats in docs/methodology/threats-to-validity.md.
  8. Connect runs to experiments/registry.csv and output folders.
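Step 8 can be sketched as a small helper that appends one row per run to experiments/registry.csv, so every run stays traceable to an evaluation question and an output folder. This is a minimal sketch under assumptions: the column names (`run_id`, `question_id`, `config_path`, `seed`, `output_dir`, `status`, `timestamp`) and the helpers `register_run` and `runs_for_question` are hypothetical; the real schema is whatever references/repository-contract.md specifies.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical registry schema; replace with the columns required by
# references/repository-contract.md.
FIELDS = ["run_id", "question_id", "config_path", "seed", "output_dir", "status", "timestamp"]


def register_run(registry_path, run_id, question_id, config_path, seed, output_dir, status="planned"):
    """Append one run to the registry CSV, writing the header if the file is new or empty."""
    is_new = not os.path.exists(registry_path) or os.path.getsize(registry_path) == 0
    parent = os.path.dirname(registry_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    row = {
        "run_id": run_id,
        "question_id": question_id,
        "config_path": config_path,
        "seed": seed,
        "output_dir": output_dir,
        "status": status,  # e.g. planned, done, failed, ambiguous
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(registry_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row


def runs_for_question(registry_path, question_id):
    """Return all registry rows (as string-valued dicts) linked to one evaluation question."""
    with open(registry_path, newline="", encoding="utf-8") as f:
        return [r for r in csv.DictReader(f) if r["question_id"] == question_id]
```

For example, `register_run("experiments/registry.csv", "run-001", "EQ1", "configs/base.yaml", 0, "outputs/run-001")` records a planned run. Updating a run's outcome by appending a new row with `status="failed"` rather than deleting the original keeps failed runs visible when the claim is assessed.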

Minimum Evaluation Standard

  • credible baseline or reason none exists
  • metric justified by research question
  • dataset provenance and split policy
  • repeatability details: seed, config, environment, hardware
  • negative cases or boundary conditions
  • threats to internal, external, construct, and conclusion validity
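The repeatability item can be met by saving a small record next to each run's outputs. The sketch below uses only the Python standard library; the function names and the record layout are hypothetical, and only Python's built-in `random` module is seeded (projects using NumPy, PyTorch, or other RNGs would seed those too).

```python
import hashlib
import json
import platform
import random
import sys


def set_seed(seed):
    """Seed Python's built-in RNG; extend this if the project uses other RNGs."""
    random.seed(seed)


def config_fingerprint(config):
    """Stable SHA-256 of a config dict; sorted keys make it independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def repeatability_record(seed, config):
    """Collect seed, config, environment, and hardware details for one run."""
    return {
        "seed": seed,
        "config": config,
        "config_sha256": config_fingerprint(config),
        "environment": {
            "python": sys.version.split()[0],
            "implementation": platform.python_implementation(),
            "os": platform.platform(),
        },
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
    }
```

Usage: call `set_seed(13)` before the run, then write `json.dumps(repeatability_record(13, config), indent=2)` into the run's output folder. The standard library does not report GPU model, driver, or package versions; if results may depend on them, add those fields explicitly.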

Do Not

  • Tune on test data.
  • Add baselines after seeing favorable results without recording when they were added.
  • Present exploratory runs as confirmed evidence.
  • Hide failed or ambiguous runs that affect the claim.
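To illustrate the first rule, the sketch below chooses a decision threshold (standing in for any hyperparameter) using the validation split only, then scores the chosen value on the test split exactly once. The toy data generator, the threshold rule, and the function names are hypothetical; the threshold rule has no training step, so no training split appears.

```python
import random


def make_data(n, seed):
    """Toy 1-D binary task: the label is 1 when a noisy copy of x exceeds 0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x + rng.gauss(0, 0.1) > 0.5)
        data.append((x, y))
    return data


def accuracy(threshold, data):
    """Fraction of examples where predicting 1 for x >= threshold matches the label."""
    return sum(int(x >= threshold) == y for x, y in data) / len(data)


def select_and_evaluate(val, test, candidates):
    """Pick the threshold on validation only; the test split is scored once, afterwards."""
    best = max(candidates, key=lambda t: accuracy(t, val))
    return best, accuracy(best, test)
```

For example, with `data = make_data(600, seed=0)`, `val, test = data[:300], data[300:]`, and a grid `[i / 20 for i in range(21)]`, the test accuracy is reported for the single validation-selected threshold. Choosing the threshold by test accuracy instead would make that number an optimistic, tuned estimate.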