Conduct Exploratory Data Analysis (EDA) using descriptive statistics, visualizations, and data quality checks. Use this skill when the user has a dataset and needs to understand its structure, find patterns, detect anomalies, or prepare data for further analysis — even if they say 'what does this data look like', 'find interesting patterns', 'clean this data', or 'summarize this dataset'.
npx skill4agent add asgard-ai-platform/skills stat-eda

IRON LAW: Perform EDA Only AFTER Train/Test Split, Or You Leak the Future
Agents know "do EDA first." But they almost always do EDA on the FULL
dataset before splitting. This is information leakage: you've seen the
test set's distributions, outliers, and correlations, and your subsequent
modeling choices (feature scaling, outlier treatment, imputation strategy)
are now informed by data the model shouldn't see. Split first, then EDA
only on the training set. Apply the same transformations to the test set
without re-examining it.
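
The split-first rule can be sketched as follows. This is a minimal illustration using only the Python standard library, not the skill's own implementation: the helper names (`train_test_split_rows`, `fit_preprocessing`, `apply_preprocessing`) and the toy column are hypothetical, and median imputation with mean/std scaling is just one assumed set of preprocessing choices.

```python
import random
import statistics

def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it (hypothetical helper)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_preprocessing(train_values):
    """Learn imputation and scaling parameters from the training split only."""
    observed = [v for v in train_values if v is not None]
    return {
        "median": statistics.median(observed),
        "mean": statistics.fmean(observed),
        "std": statistics.pstdev(observed),
    }

def apply_preprocessing(values, params):
    """Impute with the training median, then scale with the training mean/std."""
    filled = [params["median"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]

# Toy numeric column with two missing values (illustrative data only).
values = [float(i) for i in range(1, 21)]
values[3] = None
values[11] = None

# 1. Split first.
train, test = train_test_split_rows(values, test_fraction=0.25, seed=42)

# 2. Derive every data-dependent choice from the training split.
params = fit_preprocessing(train)

# 3. Apply the same fitted transformation to both splits,
#    without recomputing anything from the test split.
train_scaled = apply_preprocessing(train, params)
test_scaled = apply_preprocessing(test, params)
```

The point of the design is that `fit_preprocessing` never receives test rows, so the test split cannot influence the imputation value or the scaling constants.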
Exception: data quality checks (nulls, dtypes, duplicates) CAN run on
the full dataset since they don't inform model hyperparameters.

references/missing-data.md

# EDA Report: {Dataset Name}
## Dataset Overview
- Rows: {N}, Columns: {N}
- Date range: {if applicable}
- Key columns: {description}
## Data Quality
| Issue | Columns Affected | Count/% | Action |
|-------|-----------------|---------|--------|
| Missing values | {cols} | {N / %} | {drop / impute / investigate} |
| Outliers | {cols} | {N} | {cap / remove / keep} |
| Duplicates | — | {N} | {remove} |
## Key Statistics
| Variable | Mean | Median | Std | Min | Max | Distribution |
|----------|------|--------|-----|-----|-----|-------------|
| {var} | ... | ... | ... | ... | ... | {normal/skewed/bimodal} |
## Key Findings
1. {insight with supporting data}
2. {insight}
3. {insight}
## Recommendations
- {next analysis step or data issue to resolve}
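
One way the Data Quality and Key Statistics tables of the template might be filled in is sketched below, using only the Python standard library. The function names, the toy rows, and the 1.5 × IQR outlier rule are assumptions made for illustration; the template itself does not prescribe an outlier definition. In line with the rule above, the statistics and outlier counts would be computed on the training split, while the missing-value and duplicate counts may use the full dataset.

```python
import statistics

def quality_summary(rows):
    """Count missing cells per column and fully duplicated rows."""
    columns = list(rows[0].keys())
    missing = {c: sum(1 for r in rows if r[c] is None) for c in columns}
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"missing": missing, "duplicates": duplicates}

def key_statistics(values):
    """Descriptive statistics plus an outlier count (assumed 1.5 * IQR rule)."""
    observed = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.fmean(observed),
        "median": statistics.median(observed),
        "std": statistics.stdev(observed),
        "min": observed[0],
        "max": observed[-1],
        "outliers": sum(1 for v in observed if v < low or v > high),
    }

# Toy rows (illustrative only): one missing income, one duplicated row,
# and one income far above the rest.
ages = [28, 31, 35, 39, 39, 41, 44, 47, 52, 30]
incomes_k = [48.0, 50.0, 51.0, 52.0, 52.0, 53.0, 55.0, 56.0, 250.0, None]
rows = [{"age": a, "income_k": i} for a, i in zip(ages, incomes_k)]

quality = quality_summary(rows)                               # Data Quality table
income_stats = key_statistics([r["income_k"] for r in rows])  # Key Statistics row
```

Comparing `mean` against `median` in the result gives a quick hint for the Distribution column: a mean well above the median suggests right skew.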