cs-methodology-evaluation

Use when designing or auditing computer science experiments, evaluation plans, baselines, metrics, ablations, datasets, statistical tests, benchmarks, validity threats, or reproducibility claims.

9 installs

Source: vincenzoimp/academic-research-skills

Added on: 2026-05-20

NPX Install

npx skill4agent add vincenzoimp/academic-research-skills cs-methodology-evaluation

SKILL.md Content

CS Methodology Evaluation

Use this for CS research where the scientific claim depends on experiments, benchmarks, data analysis, systems measurements, user studies, or software engineering evidence.

Read First

`references/cs-methodology-evaluation-policy.md`
`references/experiment-policy.md`
`references/repository-contract.md`

Workflow

Identify claim type and evidence type: algorithmic, empirical, systems, human-subject, dataset, benchmark, tool, reproduction, or negative result.
Define evaluation questions before running experiments.
Select baselines, datasets, metrics, splits, seeds, hardware, and budgets.
Add ablations and sensitivity checks that test the claimed mechanism.
Identify leakage, confounding, overfitting, selection bias, and benchmark mismatch.
Record the plan in `docs/methodology/evaluation-plan.md`.
Record threats in `docs/methodology/threats-to-validity.md`.
Connect runs to `experiments/registry.csv` and their output folders.
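The final workflow step could be implemented with something like the minimal Python sketch below. It assumes a hypothetical column schema (`run_id`, `timestamp_utc`, `question`, `config`, `seed`, `output_dir`, `status`) and a hypothetical helper `register_run`. The real columns may be fixed by `references/repository-contract.md`. The helper appends one row per run, creates the run's output folder, and refuses duplicate run IDs so that each registry row points at exactly one folder.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical schema; adapt to whatever references/repository-contract.md specifies.
REGISTRY_FIELDS = ["run_id", "timestamp_utc", "question", "config", "seed", "output_dir", "status"]


def register_run(registry: Path, run_id: str, question: str, config: str,
                 seed: int, output_dir: Path, status: str = "exploratory") -> dict:
    """Append one run to the registry CSV and create its output folder."""
    is_new = not registry.exists() or registry.stat().st_size == 0
    if not is_new:
        # Refuse duplicate IDs so each registry row maps to one output folder.
        with registry.open(newline="") as f:
            if any(row["run_id"] == run_id for row in csv.DictReader(f)):
                raise ValueError(f"run_id {run_id!r} is already registered")

    row = {
        "run_id": run_id,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "question": question,
        "config": config,
        "seed": seed,
        "output_dir": str(output_dir),
        "status": status,
    }
    registry.parent.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    with registry.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=REGISTRY_FIELDS)
        if is_new:
            writer.writeheader()  # header only when the registry is created
        writer.writerow(row)
    return row
```

Defaulting `status` to `exploratory` is an illustrative choice. It keeps a run from being counted as confirmed evidence unless someone marks it otherwise on purpose.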

Minimum Evaluation Standard

credible baseline or reason none exists
metric justified by research question
dataset provenance and split policy
repeatability details: seed, config, environment, hardware
negative cases or boundary conditions
threats to internal, external, construct, and conclusion validity
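The sketch below shows one way to capture the repeatability details listed above, using only the Python standard library. The helper names `set_seed` and `repeatability_record` are hypothetical, and the fields are an illustrative subset. Projects that use NumPy, PyTorch, or GPUs would also need to seed those libraries and record device and driver versions. The config is hashed with sorted keys, so the fingerprint does not change when only key order differs. This makes it easier to match a run to the exact configuration it used.

```python
import hashlib
import json
import os
import platform
import random
import sys


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG (other libraries need their own seeding calls)."""
    random.seed(seed)


def repeatability_record(seed: int, config: dict) -> dict:
    """Collect seed, config fingerprint, environment, and hardware details for one run."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "seed": seed,
        "config": config,
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        # Environment
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        "os": platform.platform(),
        # Hardware
        "machine": platform.machine(),
        "processor": platform.processor(),
        "cpu_count": os.cpu_count(),
    }
```

Writing `json.dumps(repeatability_record(seed, config), indent=2)` into each run's output folder keeps these details next to the results they describe.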

Do Not

Tune on test data.
Add baselines after seeing results (particularly only those the method beats) without noting when they were added.
Present exploratory runs as confirmed evidence.
Hide failed or ambiguous runs that affect the claim.
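The first rule above is sketched below in Python. The sketch uses a toy 1-D k-nearest-neighbor classifier as a stand-in for a real model and an illustrative 60/20/20 train/validation/test split. `split_indices`, `knn_predict`, `accuracy`, and `select_then_test` are hypothetical names. A seed fixes the split, the hyperparameter `k` is chosen on the validation split only, and the test split is scored once after that choice is frozen.

```python
import random


def split_indices(n: int, seed: int, val_frac: float = 0.2, test_frac: float = 0.2):
    """Shuffle 0..n-1 with a fixed seed and cut into disjoint train/val/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def knn_predict(train_x, train_y, x, k):
    """Majority vote (binary 0/1 labels) among the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def accuracy(train_x, train_y, eval_x, eval_y, k):
    """Fraction of evaluation points the k-NN classifier labels correctly."""
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return correct / len(eval_y)


def select_then_test(xs, ys, k_candidates, seed=0):
    """Pick k on validation data only, then score the frozen choice once on test data."""
    train, val, test = split_indices(len(xs), seed)

    def subset(ids):
        return [xs[i] for i in ids], [ys[i] for i in ids]

    (tx, ty), (vx, vy), (sx, sy) = subset(train), subset(val), subset(test)
    # Model selection never sees the test split.
    best_k = max(k_candidates, key=lambda k: accuracy(tx, ty, vx, vy, k))
    # Single test evaluation, reported as the result.
    return best_k, accuracy(tx, ty, sx, sy, best_k)
```

If the procedure is repeated over several seeds, report the spread of test scores across all of them rather than the best seed. Picking the best seed would quietly reintroduce tuning on test data.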