cd ~/.claude/skills/csv-analyzer/scripts
export $(grep -v '^#' /path/to/project/.env | xargs 2>/dev/null)
python3 analyze_csv.py /path/to/data.csv

What is the user trying to understand?
│
├── "What does my data look like?" (Overview)
│   └── Run with defaults → overview_dashboard.png
│
├── "Is my data clean?" (Quality)
│   ├── Check: quality_score, missing_values, duplicates
│   └── Show: missing_values.png if problems exist
│
├── "What's the distribution?" (Single Variable)
│   ├── Numeric → numeric_distributions.png (histogram + KDE)
│   ├── Categorical → categorical_distributions.png (bar chart)
│   └── Time-based → time_series.png
│
├── "Are there outliers?" (Anomalies)
│   └── box_plots.png → points beyond whiskers are outliers
│
├── "How are variables related?" (Relationships)
│   ├── 2 numeric vars → correlation_heatmap.png
│   ├── 2-6 numeric vars → pairplot.png (scatter matrix)
│   ├── Numeric vs Categorical → violin_plot.png
│   └── All numeric → correlation_heatmap.png
│
└── "Can I predict X from Y?" (Predictive)
    └── correlation_heatmap.png → |r| > 0.5 suggests predictive power

| Score | Grade | What to Tell User |
|---|---|---|
| 90-100 | A | "Your data is excellent quality - ready for analysis" |
| 80-89 | B | "Good quality data with minor issues worth noting" |
| 70-79 | C | "Moderate quality - address missing values before critical analysis" |
| 60-69 | D | "Significant quality issues - recommend data cleaning first" |
| <60 | F | "Critical issues - data needs substantial cleaning" |
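
The score bands above can be sketched as a small lookup. This is a minimal illustration, not the script's own code: the function name `grade_quality` is hypothetical, and it assumes each band is a lower bound, so a fractional score such as 89.5 falls into B.

```python
def grade_quality(score):
    """Map a 0-100 quality score to (grade, message) using the bands in the table.

    Hypothetical helper for illustration; analyze_csv.py may compute grades differently.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    bands = [
        (90, "A", "Your data is excellent quality - ready for analysis"),
        (80, "B", "Good quality data with minor issues worth noting"),
        (70, "C", "Moderate quality - address missing values before critical analysis"),
        (60, "D", "Significant quality issues - recommend data cleaning first"),
    ]
    # Bands are ordered from highest to lowest, so the first match wins.
    for floor, grade, message in bands:
        if score >= floor:
            return grade, message
    return "F", "Critical issues - data needs substantial cleaning"
```

For example, `grade_quality(84)` returns grade B with the "minor issues" message.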

| \|r\| Value | Strength | What to Say |
|---|---|---|
| 0.9 - 1.0 | Very Strong | "X and Y are very strongly related - almost deterministic" |
| 0.7 - 0.9 | Strong | "X and Y have a strong relationship - X could help predict Y" |
| 0.5 - 0.7 | Moderate | "X and Y are moderately correlated - some predictive value" |
| 0.3 - 0.5 | Weak | "X and Y have a weak relationship - limited predictive power" |
| 0.0 - 0.3 | Negligible | "X and Y appear unrelated" |
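
As a rough sketch, the strength labels above can be applied to a correlation coefficient like this. The function name `describe_correlation` is hypothetical. Because adjacent bands in the table share their boundary values (0.9, 0.7, 0.5, 0.3), this sketch assumes a boundary value belongs to the stronger band.

```python
def describe_correlation(r):
    """Return the strength label from the table for a correlation coefficient r.

    Hypothetical helper; boundary values are assigned to the stronger band (an assumption).
    """
    strength = abs(r)  # the table is defined on |r|, so the sign is ignored
    if not strength <= 1:
        raise ValueError("r must be between -1 and 1")
    if strength >= 0.9:
        return "Very Strong"
    if strength >= 0.7:
        return "Strong"
    if strength >= 0.5:
        return "Moderate"
    if strength >= 0.3:
        return "Weak"
    return "Negligible"
```

A negative coefficient gets the same label as its positive counterpart. The direction of the relationship should still be mentioned separately when describing it to the user.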

| Skewness | Distribution Shape | Recommendation |
|---|---|---|
| < -1 | Heavy left tail | "Most values are high, with some very low outliers" |
| -1 to -0.5 | Mild left skew | "Slightly more low outliers than high" |
| -0.5 to 0.5 | Symmetric | "Nicely balanced distribution - good for most analyses" |
| 0.5 to 1 | Mild right skew | "Slightly more high outliers than low" |
| > 1 | Heavy right tail | "Most values are low, with some very high outliers. Consider log transform for modeling." |
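
To make the skewness bands concrete, here is a minimal sketch that computes a skewness value and maps it to the shape labels above. Both function names are hypothetical. The estimator is the simple moment-based coefficient (third central moment divided by the 1.5 power of the second); which estimator the script itself uses is not stated here, and bias-corrected estimators give slightly different values on small samples.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: treat as symmetric
    return m3 / m2 ** 1.5


def describe_skewness(skew):
    """Map a skewness value to the shape label from the table."""
    if skew < -1:
        return "Heavy left tail"
    if skew < -0.5:
        return "Mild left skew"
    if skew <= 0.5:
        return "Symmetric"
    if skew <= 1:
        return "Mild right skew"
    return "Heavy right tail"
```

For instance, `[1, 1, 1, 1, 10]` has a skewness of 1.5 under this estimator, which lands in the "Heavy right tail" band.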

"Your dataset has [rows] records and [cols] columns:
- [n] numeric columns: [list top 3]
- [n] categorical columns: [list top 3]
- Data quality score: [score]/100 ([grade])"

"I noticed some data quality concerns:
- [X]% missing values in [column] - [recommend: drop/impute/investigate]
- [N] duplicate rows detected - [recommend: keep first/remove all/investigate]"

"Interesting relationships I found:
- [col1] and [col2] are strongly correlated (r=[value]) - [interpretation]
- This suggests [actionable insight]"

"I detected outliers in [columns]:
- [column]: [n] values beyond normal range ([min outlier] to [max outlier])
- These could be [data errors / genuine extremes / worth investigating]"

"[Column] has a [right/left]-skewed distribution:
- Most values cluster around [median]
- But there are extreme values up to [max]
- For modeling, consider [log transform / robust methods]"

| Finding | Recommendation |
|---|---|
| Missing >20% in column | "Consider dropping this column or investigating why it's missing" |
| Missing <5% scattered | "Safe to impute with median (numeric) or mode (categorical)" |
| High correlation (>0.9) | "These columns may be redundant - consider keeping only one" |
| Many outliers | "Use robust statistics (median instead of mean) or investigate data collection" |
| Highly skewed | "Consider a log transform before linear modeling (suits right-skewed, positive values)" |
| Low quality score | "Prioritize data cleaning before analysis" |
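
Two of the rows above carry explicit numeric thresholds and can be sketched as a rule check. The function name `recommend` and its parameters are hypothetical. The table gives no guidance for columns with 5% to 20% missing values, so this sketch returns no missing-value advice in that range rather than guessing.

```python
def recommend(missing_pct=0.0, max_abs_corr=0.0):
    """Return the table's recommendations that apply to one column.

    missing_pct:  percentage of missing values in the column (0-100)
    max_abs_corr: largest |r| between this column and any other column
    Hypothetical helper; only the rows with explicit thresholds are encoded.
    """
    advice = []
    if missing_pct > 20:
        advice.append("Consider dropping this column or investigating why it's missing")
    elif 0 < missing_pct < 5:
        advice.append("Safe to impute with median (numeric) or mode (categorical)")
    # 5-20% missing: no rule in the table, so nothing is added here.
    if max_abs_corr > 0.9:
        advice.append("These columns may be redundant - consider keeping only one")
    return advice
```

A column with 3% missing values and a 0.95 correlation to another column would receive both the imputation and the redundancy recommendations.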

Then present charts in this order:
1. **overview_dashboard.png** - "Here's your data at a glance"
2. **correlation_heatmap.png** - "Key relationships between variables"
3. **numeric_distributions.png** - "How your numeric data is distributed"
4. **box_plots.png** - "Outlier analysis"
5. **categorical_distributions.png** - "Category breakdowns" (if applicable)
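
The presentation order above can be applied to whatever charts were actually generated. The sketch below is an illustration under assumptions: the helper name `charts_to_present` is hypothetical, and it assumes the PNG files sit directly in the output directory under the file names listed above.

```python
from pathlib import Path

# File names in the presentation order given in the list above.
CHART_ORDER = [
    "overview_dashboard.png",
    "correlation_heatmap.png",
    "numeric_distributions.png",
    "box_plots.png",
    "categorical_distributions.png",
]


def charts_to_present(output_dir):
    """Return paths of the charts that exist in output_dir, in presentation order."""
    directory = Path(output_dir)
    return [directory / name for name in CHART_ORDER if (directory / name).is_file()]
```

Charts that were not generated (for example, `categorical_distributions.png` when the data has no categorical columns) are skipped.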

python3 analyze_csv.py data.csv

python3 analyze_csv.py data.csv --format markdown --max-charts 10

python3 analyze_csv.py data.csv --no-charts

python3 analyze_csv.py huge.csv --sample 50000

python3 analyze_csv.py data.csv --date-columns created_at updated_at

python3 analyze_csv.py data.csv --format json --no-charts

python3 analyze_csv.py data.csv --output-dir /path/to/project/.tmp/analysis

| Chart | When to Show | How to Describe |
|---|---|---|
| overview_dashboard.png | Always for first look | "Here's a bird's eye view of your data" |
| missing_values.png | If missing data exists | "This shows where your data has gaps" |
| numeric_distributions.png | When exploring distributions | "This shows how your numeric values are spread out" |
| box_plots.png | When checking for outliers | "The dots beyond the whiskers are potential outliers" |
| correlation_heatmap.png | When exploring relationships | "Darker colors = stronger relationships" |
| categorical_distributions.png | For category analysis | "This shows the breakdown of your categories" |
| time_series.png | For temporal data | "Here's how your data changes over time" |
| pairplot.png | For multivariate exploration | "Each cell shows how two variables relate" |
| violin_plot.png | Comparing groups | "This shows how distributions differ across groups" |
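
When describing `box_plots.png`, it can help to state which values fall beyond the whiskers. The sketch below assumes the common convention of whiskers at 1.5 times the interquartile range (matplotlib's default). Whether `analyze_csv.py` uses the same rule is not confirmed here, and the function name `iqr_outliers` is hypothetical.

```python
import statistics


def iqr_outliers(values, whisker=1.5):
    """Return the values lying beyond whisker * IQR from the quartiles.

    Assumes the conventional 1.5 * IQR box-plot rule; the script's rule may differ.
    """
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]
```

For `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]` the quartiles are 3.25 and 7.75, the upper bound is 14.5, and only 100 is flagged.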

| User Says | Action |
|---|---|
| "Analyze this CSV" | Run full analysis, show overview + key insights |
| "Is my data clean?" | Focus on quality_score, missing values, duplicates |
| "Find patterns" | Show correlation_heatmap, highlight strong correlations |
| "Are there outliers?" | Show box_plots, list outlier counts per column |
| "Compare X across Y" | Generate violin_plot for numeric X vs categorical Y |
| "Show me trends" | Generate time_series if datetime column exists |
| "Create a dashboard" | Generate all charts, present organized summary |
| "What should I clean?" | List columns with missing >5%, duplicates, outliers |

~/.claude/skills/csv-analyzer/scripts/.tmp/csv_analysis/

--output-dir /path/to/project/.tmp/analysis

cp ~/.claude/skills/csv-analyzer/scripts/.tmp/csv_analysis/*.png /path/to/project/.tmp/csv_analysis/

pip install pandas matplotlib seaborn scipy numpy