data-analysis

Data Analysis - Statistical Computing & Insights

When to use this skill

Activate this skill when:
  • User mentions "数据分析", "统计", "计算指标", "数据洞察"
  • Need to analyze structured data (CSV, JSON, database)
  • Calculate statistics, trends, patterns
  • Financial analysis (returns, volatility, technical indicators)
  • Business analytics (sales, user behavior, KPIs)
  • Scientific data processing and hypothesis testing

Workflow

1. Get data

⚠️ IMPORTANT: File naming requirements
  • File names MUST NOT contain Chinese characters or non-ASCII characters
  • Use only English letters, numbers, underscores, and hyphens
  • Examples: `data.csv`, `sales_report_2025.xlsx`, `analysis_results.json`
  • ❌ Invalid: `销售数据.csv`, `数据文件.xlsx`, `報表.json`
  • This ensures compatibility across different systems and prevents encoding issues
If data already exists:
  • Read from file (CSV, JSON, Excel)
  • Query database if available
If file names contain Chinese characters:
  • Ask the user to rename the file to English/ASCII characters
  • Or rename the file when saving it to the agent directory
If no data:
  • Automatically activate the `data-base` skill
  • Scrape/collect required data
  • Save to structured format
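
The naming rule above can be checked mechanically before a file is saved to the agent directory. The following is a minimal sketch, not part of the skill itself: `is_ascii_safe` and `to_ascii_name` are hypothetical helper names, and the `data` fallback stem is an assumption for names that contain no ASCII characters at all.

```python
import re
import unicodedata
from pathlib import Path


def is_ascii_safe(filename: str) -> bool:
    """Return True if the name uses only letters, digits, '_', '-', and '.'."""
    return re.fullmatch(r"[A-Za-z0-9_.-]+", filename) is not None


def to_ascii_name(filename: str, fallback: str = "data") -> str:
    """Build an ASCII-only name, keeping the original extension.

    Non-ASCII characters are dropped rather than transliterated, so a fully
    Chinese stem collapses to `fallback` (a hypothetical default).
    """
    path = Path(filename)
    # Decompose accented letters, then discard anything outside ASCII.
    stem = unicodedata.normalize("NFKD", path.stem)
    stem = stem.encode("ascii", "ignore").decode("ascii")
    # Replace runs of disallowed characters with a single underscore.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    suffix = re.sub(r"[^A-Za-z0-9.]", "", path.suffix)
    return f"{stem or fallback}{suffix}"
```

Because dropping characters loses information, asking the user for a meaningful English name is still preferable when the original stem is entirely non-ASCII.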

2. Understand requirements

Ask the user:
  • What questions do you want to answer?
  • What metrics are important?
  • What format for results? (summary, chart, report)
  • Any specific statistical methods?

3. Analyze

General analysis:
  • Descriptive statistics (mean, median, std, percentiles)
  • Distribution analysis (histograms, box plots)
  • Correlation analysis
  • Group comparisons
Financial analysis:
  • Return calculation (simple, log, cumulative)
  • Risk metrics (volatility, VaR, Sharpe ratio)
  • Technical indicators (MA, RSI, MACD)
  • Portfolio analysis
Business analysis:
  • Trend analysis (growth rates, YoY, MoM)
  • Cohort analysis
  • Funnel analysis
  • A/B testing
Scientific analysis:
  • Hypothesis testing (t-test, chi-square, ANOVA)
  • Regression analysis
  • Time series analysis
  • Statistical significance

4. Output

Generate results in:
  • Summary statistics: Tables with key metrics
  • Charts: Save as PNG files
  • Report: Markdown with findings
  • Data: Processed CSV/JSON for further use

Python Environment

Auto-initialize the virtual environment if needed, then execute:
```bash
cd skills/data-analysis

if [ ! -f ".venv/bin/python" ]; then
    echo "Creating Python environment..."
    ./setup.sh
fi

.venv/bin/python your_script.py
```
The setup script auto-installs pandas, numpy, scipy, scikit-learn, and statsmodels, with Chinese font support.

Analysis scenarios

General data

```python
import pandas as pd

# Load and summarize
df = pd.read_csv('data.csv')
summary = df.describe()
# numeric_only=True skips text columns, which raise an error in pandas >= 2.0
correlations = df.corr(numeric_only=True)
```

Financial data

```python
# Assumes df has a 'price' column of daily closing prices, oldest row first

# Calculate returns
df['return'] = df['price'].pct_change()

# Risk metrics (annualized with 252 trading days; Sharpe assumes a zero risk-free rate)
volatility = df['return'].std() * (252 ** 0.5)
sharpe = df['return'].mean() / df['return'].std() * (252 ** 0.5)
```
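
The workflow also lists log returns, cumulative returns, and VaR, which the snippet above does not cover. A minimal sketch follows, assuming a hypothetical five-day price series and the historical (empirical quantile) method for VaR; other VaR definitions exist and may suit a given analysis better.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices; replace with your own 'price' column.
df = pd.DataFrame({"price": [100.0, 102.0, 101.0, 105.0, 103.0]})

df["return"] = df["price"].pct_change()                        # simple return
df["log_return"] = np.log(df["price"] / df["price"].shift(1))  # log return
df["cum_return"] = (1 + df["return"]).cumprod() - 1            # cumulative return

# Historical 95% one-day VaR: the loss at the 5th percentile of daily returns,
# reported as a positive number.
var_95 = -df["return"].quantile(0.05)
```

Log returns add up across days, so their sum equals the log of the total price ratio; this is a convenient consistency check against the cumulative simple return.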

Business data

```python
# Assumes df has 'category' and 'revenue' columns

# Group by category
grouped = df.groupby('category').agg({
    'revenue': ['sum', 'mean', 'count']
})

# Growth rate (period over period; rows must be sorted by time)
df['growth'] = df['revenue'].pct_change()
```
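
For the MoM and YoY growth rates named in the workflow, `pct_change` can compare against an earlier row via its `periods` argument. The sketch below assumes a hypothetical monthly revenue series with one row per month and no gaps; with missing months, the 12-row offset would no longer line up with the same month of the previous year.

```python
import pandas as pd

# Hypothetical monthly revenue: 24 months growing by 10 per month.
months = pd.date_range("2023-01-01", periods=24, freq="MS")
df = pd.DataFrame({"revenue": [100 + 10 * i for i in range(24)]}, index=months)

df["mom"] = df["revenue"].pct_change()            # vs. previous month
df["yoy"] = df["revenue"].pct_change(periods=12)  # vs. same month last year
```

The first 12 values of `yoy` are NaN because no prior-year month exists to compare against.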

Scientific data

```python
from scipy import stats
from sklearn.linear_model import LinearRegression

# T-test (group_a and group_b are 1-D arrays of observations)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Regression (X is a 2-D feature matrix, y is the target vector)
model = LinearRegression()
model.fit(X, y)
```
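
The snippet above leaves `group_a` and `group_b` undefined. A self-contained sketch with small, made-up samples is shown here, together with the chi-square test listed in the workflow. The data values are illustrative only, and `equal_var=False` (Welch's t-test) is a choice made for this example rather than something the skill prescribes.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups.
group_a = np.array([5.1, 4.9, 5.0, 5.2, 4.8])
group_b = np.array([6.1, 5.9, 6.0, 6.2, 5.8])

# Welch's t-test: does not assume the two groups share a variance.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square test of independence on a hypothetical 2x2 contingency table
# (rows: variant A/B, columns: converted / not converted).
table = np.array([[30, 10], [10, 30]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
```

A small p-value only indicates that the observed difference is unlikely under the null hypothesis; the assumptions behind each test still need to be checked.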

File path conventions

Temporary output (session-scoped)

Files written to the current directory are stored in the session directory:
```python
from datetime import datetime

import matplotlib.pyplot as plt

# Assumes df is a loaded DataFrame and a figure has already been drawn with plt

# Use timestamp for unique filenames (avoid conflicts)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

# Charts and temporary files
plt.savefig(f'analysis_{timestamp}.png')  # → $KODE_AGENT_DIR/analysis_20250115_143022.png
df.to_csv(f'results_{timestamp}.csv')     # → $KODE_AGENT_DIR/results_20250115_143022.csv
```

**Always use unique filenames** to avoid conflicts when running multiple analyses:
- Use timestamps: `analysis_20250115_143022.png`
- Use descriptive names + timestamps: `sales_report_q1_2025.csv`
- Use random suffix for scripts: `script_{random.randint(1000,9999)}.py`
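
Second-resolution timestamps can still collide when two analyses start within the same second. One possible way to combine the timestamp and random-suffix suggestions above is sketched here; `unique_name` is a hypothetical helper, and using `uuid` instead of `random.randint` is a substitution made for this example.

```python
import uuid
from datetime import datetime


def unique_name(prefix: str, ext: str) -> str:
    """Return '<prefix>_<YYYYmmdd_HHMMSS>_<8 hex chars>.<ext>'."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # guards against same-second collisions
    return f"{prefix}_{timestamp}_{suffix}.{ext.lstrip('.')}"
```

For example, `unique_name("analysis", "png")` yields an ASCII-only name of the form `analysis_<timestamp>_<suffix>.png`, which also satisfies the file naming requirements.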

User data (persistent)

Use `$KODE_USER_DIR` for persistent user data:
```python
import os
user_dir = os.getenv('KODE_USER_DIR')

# Save to user memory
memory_file = f"{user_dir}/.memory/facts/preferences.jsonl"

# Read from knowledge base
knowledge_dir = f"{user_dir}/.knowledge/docs"
```
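
Note that `os.getenv` returns `None` when the variable is unset, in which case the f-strings above would produce paths beginning with `None/`. A hedged sketch of a safer write to the memory file follows. `append_fact` is a hypothetical helper, and the record fields are placeholders, since the source does not specify the schema of `preferences.jsonl`.

```python
import json
import os
from pathlib import Path


def append_fact(fact: dict) -> Path:
    """Append one JSON object as a line to the preferences file."""
    user_dir = os.getenv("KODE_USER_DIR")
    if not user_dir:
        # Fail early instead of silently writing to a path under "None/".
        raise RuntimeError("KODE_USER_DIR is not set")
    memory_file = Path(user_dir) / ".memory" / "facts" / "preferences.jsonl"
    # Create the directories on first use.
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(fact, ensure_ascii=False) + "\n")
    return memory_file
```

Appending one JSON object per line keeps earlier records intact and lets the file be read back line by line.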

Environment variables

  • `KODE_AGENT_DIR`: Session directory for temporary output (charts, analysis results)
  • `KODE_USER_DIR`: User data directory for persistent storage (memory, knowledge, config)

Best practices

  • File names MUST be ASCII-only: No Chinese or non-ASCII characters in filenames
  • Always inspect data first: `df.head()`, `df.info()`, `df.describe()`
  • Handle missing values: Drop or impute based on context
  • Check assumptions: Normality, independence, etc.
  • Visualize: Charts reveal patterns tables hide
  • Document findings: Explain metrics and their implications
  • Use correct paths: Temporary outputs to current dir, persistent data to `$KODE_USER_DIR`
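
As an illustration of the "handle missing values" practice, the sketch below measures how much is missing and then shows both options. The column names and values are hypothetical, and median imputation is only one reasonable choice; the right strategy depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both a numeric and a categorical column.
df = pd.DataFrame({
    "revenue": [120.0, np.nan, 95.0, 110.0],
    "region": ["north", "south", None, "north"],
})

missing_share = df.isna().mean()  # fraction of missing values per column

# Impute: median for the numeric column, an explicit label for the category.
imputed = df.copy()
imputed["revenue"] = imputed["revenue"].fillna(imputed["revenue"].median())
imputed["region"] = imputed["region"].fillna("unknown")

# Or drop: keep only rows with no missing values.
complete = df.dropna()
```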

Quick reference

  • REFERENCE.md - pandas/numpy API reference
  • references/financial.md - Financial analysis recipes
  • references/business.md - Business analytics recipes
  • references/scientific.md - Statistical testing methods
  • references/templates.md - Code templates

Environment setup

This skill uses Python scripts. To set up the environment:
```bash
# Navigate to the skill directory
cd apps/assistant/skills/data-analysis

# Run the setup script (creates venv and installs dependencies)
./setup.sh

# Activate the environment
source .venv/bin/activate
```

The setup script will:
- Create a Python virtual environment in `.venv/`
- Install required packages (pandas, numpy, scipy, scikit-learn, statsmodels)

To run Python scripts with the skill environment:
```bash
# Use the virtual environment's Python
.venv/bin/python script.py

# Or activate first, then run normally
source .venv/bin/activate
python script.py
```