Conduct statistical hypothesis testing including null/alternative hypothesis formulation, p-values, Type I/II errors, and test statistic selection. Use this skill when the user needs to determine whether a result is statistically significant, choose the right statistical test, interpret p-values correctly, or evaluate research findings — even if they say 'is this result significant', 'which statistical test should I use', or 'what does this p-value mean'.
npx skill4agent add asgard-ai-platform/skills stat-hypothesis-testing

IRON LAW: Statistical Significance ≠ Practical Significance
A p-value < 0.05 means data at least this extreme would be unlikely if the null hypothesis were true.
It does NOT mean the result is important, large, or practically meaningful.
With a large enough sample, a 0.1% conversion rate difference becomes
"statistically significant" but can still be practically worthless.
ALWAYS report effect size alongside p-value.

IRON LAW: State Hypotheses BEFORE Looking at Data
H₀ (null) and H₁ (alternative) must be defined before data analysis.
Choosing hypotheses after seeing the data = p-hacking: the p-value no longer controls the false-positive rate.
"We found an interesting pattern, let's test it on the same data" is invalid.

| Concept | Definition |
|---|---|
| H₀ (Null) | Default assumption: no effect, no difference |
| H₁ (Alternative) | What you want to show: there IS an effect/difference |
| p-value | Probability of seeing this result (or more extreme) IF H₀ is true |
| α (significance level) | Threshold for rejecting H₀ (typically 0.05) |
| Type I error (α) | Rejecting H₀ when it's actually true (false positive) |
| Type II error (β) | Failing to reject H₀ when H₁ is true (false negative) |
| Power (1-β) | Probability of detecting a real effect (target: ≥ 0.8) |
| Effect size | Magnitude of the difference (Cohen's d, odds ratio, R²) |
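
The definitions above, together with the first Iron Law, can be illustrated with a small sketch. It assumes the "0.1% conversion rate difference" means 10.0% versus 10.1% (0.1 percentage points), and all counts are hypothetical. The function name `two_proportion_z_test` is illustrative, not part of this skill. It uses a pooled two-proportion z-test and Cohen's h as the effect size, relying only on the Python standard library.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in proportions, with Cohen's h."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one rate, so pool them for the standard error.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: probability of |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Cohen's h: effect size for two proportions (about 0.2 is conventionally "small").
    h = 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))
    return {"diff": p_b - p_a, "z": z, "p_value": p_value, "cohens_h": h}


# Hypothetical data: the same 10.0% vs 10.1% rates at two sample sizes.
small = two_proportion_z_test(500, 5_000, 505, 5_000)
large = two_proportion_z_test(500_000, 5_000_000, 505_000, 5_000_000)

for label, r in (("n = 5,000 per group", small), ("n = 5,000,000 per group", large)):
    print(f"{label}: p = {r['p_value']:.3g}, Cohen's h = {r['cohens_h']:.4f}")
```

The effect size is identical in both runs; only the p-value changes with sample size, which is why the two should be reported together.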

| Data Type | Groups / Design | Test |
|---|---|---|
| Continuous, normal | 2 independent groups | Independent t-test |
| Continuous, normal | 2 paired groups (before/after) | Paired t-test |
| Continuous, normal | 3+ independent groups | One-way ANOVA |
| Continuous, non-normal | 2 independent groups | Mann-Whitney U |
| Categorical | 2+ groups | Chi-square test |
| Continuous, relationship | 2 variables | Pearson correlation (normal) / Spearman (non-normal) |
| Binary outcome | Predictors | Logistic regression |

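
The selection table above could be encoded as a simple lookup. This is a minimal sketch: the function `choose_test`, the dictionary `TEST_TABLE`, and the string keys are all hypothetical names chosen for illustration. Combinations the table does not cover raise an error rather than guessing a test.

```python
# Hypothetical lookup mirroring the test-selection table; keys are illustrative.
TEST_TABLE = {
    # (outcome, design, normal) -> test name
    ("continuous", "two_independent", True): "Independent t-test",
    ("continuous", "two_paired", True): "Paired t-test",
    ("continuous", "three_plus_independent", True): "One-way ANOVA",
    ("continuous", "two_independent", False): "Mann-Whitney U",
    ("continuous", "relationship", True): "Pearson correlation",
    ("continuous", "relationship", False): "Spearman correlation",
}


def choose_test(outcome, design, normal=True):
    """Return the test the table suggests, or raise if the table has no entry."""
    if outcome == "categorical":
        return "Chi-square test"
    if outcome == "binary":
        return "Logistic regression"
    try:
        return TEST_TABLE[(outcome, design, normal)]
    except KeyError:
        raise ValueError(
            f"Table has no entry for outcome={outcome!r}, design={design!r}, normal={normal}"
        ) from None


print(choose_test("continuous", "two_independent"))
print(choose_test("continuous", "two_independent", normal=False))
```

A lookup like this only suggests a candidate; the assumptions listed in the report template (normality, independence, equal variance) still need to be checked against the actual data.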
# Hypothesis Test: {Research Question}
## Hypotheses
- H₀: {null — no effect/difference}
- H₁: {alternative — there IS an effect/difference}
- α = {0.05 or other}
## Test Selection
- Test: {name}
- Rationale: {why this test fits the data}
- Assumptions checked: {normality, independence, equal variance}
## Results
- Test statistic: {value}
- p-value: {value}
- Effect size: {value and interpretation}
- 95% CI: [{lower}, {upper}]
## Decision
{Reject / Fail to reject H₀}
## Interpretation
{What this means in practical terms, with effect size context}

references/sample-size.md
references/nonparametric-tests.md