data-analysis

Data Analysis - Statistical Computing & Insights

When to use this skill

Activate this skill when:

The user mentions "数据分析" (data analysis), "统计" (statistics), "计算指标" (calculate metrics), or "数据洞察" (data insights)
Structured data needs to be analyzed (CSV, JSON, database)
Statistics, trends, or patterns need to be calculated
The task involves financial analysis (returns, volatility, technical indicators)
The task involves business analytics (sales, user behavior, KPIs)
The task involves scientific data processing or hypothesis testing

Workflow

1. Get data

⚠️ IMPORTANT: File naming requirements

File names MUST NOT contain Chinese characters or non-ASCII characters
Use only English letters, numbers, underscores, and hyphens

Examples:

✅ Valid: `data.csv`, `sales_report_2025.xlsx`, `analysis_results.json`

❌ Invalid: `销售数据.csv`, `数据文件.xlsx`, `報表.json`

This ensures compatibility across different systems and prevents encoding issues.

If data already exists:

Read from file (CSV, JSON, Excel)
Query database if available

If file names contain Chinese characters:

Ask the user to rename the file to English/ASCII characters
Or rename the file when saving it to the agent directory
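
The renaming rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the helper names `is_ascii_safe` and `to_ascii_name` and the `data` fallback are assumptions made for this example. Names written entirely in Chinese carry no ASCII information, so the sketch falls back to a generic stem; asking the user for a meaningful name remains the better option.

```python
import re
import unicodedata
from pathlib import Path


def is_ascii_safe(filename: str) -> bool:
    """True if the name uses only ASCII letters, digits, '_', '-' and '.'."""
    return re.fullmatch(r"[A-Za-z0-9_.-]+", filename) is not None


def to_ascii_name(filename: str, fallback: str = "data") -> str:
    """Return an ASCII-only version of filename (hypothetical helper).

    The extension is kept as-is and is assumed to be ASCII already.
    """
    path = Path(filename)
    # Decompose accented letters, then drop everything that is not ASCII.
    stem = unicodedata.normalize("NFKD", path.stem)
    stem = stem.encode("ascii", "ignore").decode("ascii")
    # Collapse any remaining disallowed characters into underscores.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return (stem or fallback) + path.suffix
```

For instance, `to_ascii_name("résumé 2025.csv")` yields `resume_2025.csv`, while `to_ascii_name("销售数据.csv")` falls back to `data.csv`.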

If no data:

Automatically activate the `data-base` skill
Scrape/collect required data
Save to structured format

2. Understand requirements

Ask the user:

What questions do you want to answer?
What metrics are important?
What format for results? (summary, chart, report)
Any specific statistical methods?

3. Analyze

General analysis:

Descriptive statistics (mean, median, std, percentiles)
Distribution analysis (histograms, box plots)
Correlation analysis
Group comparisons

Financial analysis:

Return calculation (simple, log, cumulative)
Risk metrics (volatility, VaR, Sharpe ratio)
Technical indicators (MA, RSI, MACD)
Portfolio analysis
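
As a rough illustration of the return types and one of the risk metrics listed above, the following sketch computes simple, log, and cumulative returns plus a historical VaR with pandas and numpy. The price values are made up for the example, and the 95% confidence level and the historical (quantile-based) VaR method are assumptions; other VaR definitions exist.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices; in practice these come from the loaded data.
prices = pd.Series([100.0, 110.0, 99.0, 108.9], name="price")

simple = prices.pct_change().dropna()            # P_t / P_{t-1} - 1
log = np.log(prices / prices.shift(1)).dropna()  # ln(P_t / P_{t-1})
cumulative = (1 + simple).prod() - 1             # total return over the window

# Historical VaR at 95% (assumed level): the loss at the 5th percentile
# of the observed simple returns, reported as a positive number.
var_95 = -simple.quantile(0.05)
```

Log returns are additive over time, so `np.exp(log.sum()) - 1` matches `cumulative`, which makes a convenient sanity check.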

Business analysis:

Trend analysis (growth rates, YoY, MoM)
Cohort analysis
Funnel analysis
A/B testing
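
Funnel analysis from the list above can be sketched without any third-party packages. The step names and counts below are invented for illustration, and `funnel_rates` is a hypothetical helper, not something the skill ships.

```python
# Hypothetical funnel: ordered (step name, users reaching that step).
funnel = [("visit", 1000), ("signup", 400), ("purchase", 100)]


def funnel_rates(steps):
    """Return (name, step conversion, overall conversion) for each step."""
    first = steps[0][1]
    rows = []
    previous = first
    for name, count in steps:
        # Conversion relative to the previous step and to the first step.
        step_rate = count / previous if previous else 0.0
        overall = count / first if first else 0.0
        rows.append((name, step_rate, overall))
        previous = count
    return rows


rates = funnel_rates(funnel)
```

Reporting both rates side by side helps separate a weak individual step from a funnel that is simply long.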

Scientific analysis:

Hypothesis testing (t-test, chi-square, ANOVA)
Regression analysis
Time series analysis
Statistical significance

4. Output

Generate results in:

Summary statistics: Tables with key metrics
Charts: Save as PNG files
Report: Markdown with findings
Data: Processed CSV/JSON for further use
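
One possible way to produce the summary table for a Markdown report is sketched below. The function name `metrics_to_markdown`, the two-column layout, and the sample values are assumptions for illustration only.

```python
def metrics_to_markdown(metrics: dict[str, float], digits: int = 4) -> str:
    """Render a {metric name: value} mapping as a Markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.{digits}f} |")
    return "\n".join(lines)


# Hypothetical values, for illustration only.
table = metrics_to_markdown({"mean": 12.5, "std": 3.25}, digits=2)
report = "## Findings\n\n" + table + "\n"
```

The resulting string can be written to a `.md` file next to the charts so the findings and figures stay together.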

Python Environment

Auto-initialize virtual environment if needed, then execute:

```bash
cd skills/data-analysis

# Create the virtual environment on first use
if [ ! -f ".venv/bin/python" ]; then
    echo "Creating Python environment..."
    ./setup.sh
fi

.venv/bin/python your_script.py
```

The setup script auto-installs: pandas, numpy, scipy, scikit-learn, statsmodels, with Chinese font support.

Analysis scenarios

General data

```python
import pandas as pd

# Load and summarize
df = pd.read_csv('data.csv')
summary = df.describe()
# Restrict to numeric columns; recent pandas versions raise on text columns otherwise
correlations = df.corr(numeric_only=True)
```

Financial data

```python
# Calculate returns (df is assumed to have a 'price' column)
df['return'] = df['price'].pct_change()

# Risk metrics, annualized assuming daily data and 252 trading days per year
volatility = df['return'].std() * (252 ** 0.5)
# Sharpe ratio with the risk-free rate taken as zero
sharpe = df['return'].mean() / df['return'].std() * (252 ** 0.5)
```

Business data

```python
# Group by category
grouped = df.groupby('category').agg({'revenue': ['sum', 'mean', 'count']})

# Growth rate (period-over-period change in revenue)
df['growth'] = df['revenue'].pct_change()
```

Scientific data

```python
from scipy import stats
from sklearn.linear_model import LinearRegression

# T-test (group_a and group_b are assumed to be numeric samples)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Regression (X is a 2-D feature array, y the target values)
model = LinearRegression()
model.fit(X, y)
```

File path conventions

Temporary output (session-scoped)

Files written to the current directory are stored in the session directory (`$KODE_AGENT_DIR`):

```python
from datetime import datetime

import matplotlib.pyplot as plt

# Use timestamp for unique filenames (avoid conflicts)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

# Charts and temporary files
plt.savefig(f'analysis_{timestamp}.png')  # → $KODE_AGENT_DIR/analysis_20250115_143022.png
df.to_csv(f'results_{timestamp}.csv')     # → $KODE_AGENT_DIR/results_20250115_143022.csv
```


**Always use unique filenames** to avoid conflicts when running multiple analyses:
- Use timestamps: `analysis_20250115_143022.png`
- Use descriptive names + timestamps: `sales_report_q1_20250115_143022.csv`
- Use random suffix for scripts: `script_{random.randint(1000,9999)}.py`
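
The naming advice above could be combined into a single helper. The sketch below is only one way to do it: `unique_name` is a hypothetical function, and using `secrets.token_hex` for the suffix (instead of `random.randint`) is a choice made for this example.

```python
import secrets
from datetime import datetime
from pathlib import Path


def unique_name(stem: str, ext: str) -> str:
    """Build '<stem>_<YYYYmmdd_HHMMSS>_<4 hex chars>.<ext>'."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = secrets.token_hex(2)  # guards against two runs in the same second
    return f"{stem}_{timestamp}_{suffix}.{ext.lstrip('.')}"


# Relative paths resolve against the current (session) directory.
chart_path = Path(unique_name("sales_report_q1", "png"))
```

The timestamp keeps files sortable by creation time, while the short random suffix covers the case where two analyses start within the same second.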

User data (persistent)

Use `$KODE_USER_DIR` for persistent user data:

```python
import os

user_dir = os.getenv('KODE_USER_DIR')

# Save to user memory
memory_file = f"{user_dir}/.memory/facts/preferences.jsonl"

# Read from knowledge base
knowledge_dir = f"{user_dir}/.knowledge/docs"
```
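
Note that `os.getenv` returns `None` when the variable is unset, in which case an f-string would silently build a path starting with `None/`. The following sketch shows one defensive way to resolve paths and append a record to the preferences file. The helpers `user_path` and `append_fact`, and the shape of the record, are assumptions for illustration; the source does not specify the JSONL schema.

```python
import json
import os
from pathlib import Path


def user_path(*parts: str) -> Path:
    """Resolve a path under $KODE_USER_DIR, failing loudly if it is unset."""
    user_dir = os.getenv("KODE_USER_DIR")
    if not user_dir:
        # Without this check an f-string would silently build "None/...".
        raise RuntimeError("KODE_USER_DIR is not set")
    return Path(user_dir).joinpath(*parts)


def append_fact(fact: dict) -> Path:
    """Append one JSON object as a line to the preferences file."""
    memory_file = user_path(".memory", "facts", "preferences.jsonl")
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    with memory_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(fact, ensure_ascii=False) + "\n")
    return memory_file
```

A call such as `append_fact({"key": "chart_style", "value": "dark"})` adds one line to the file and leaves earlier lines untouched.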

Environment variables

`KODE_AGENT_DIR`: Session directory for temporary output (charts, analysis results)
`KODE_USER_DIR`: User data directory for persistent storage (memory, knowledge, config)

Best practices

File names MUST be ASCII-only: No Chinese or non-ASCII characters in filenames
Always inspect data first: `df.head()`, `df.info()`, `df.describe()`
Handle missing values: Drop or impute based on context
Check assumptions: Normality, independence, etc.
Visualize: Charts reveal patterns tables hide
Document findings: Explain metrics and their implications
Use correct paths: Temporary outputs to current dir, persistent data to `$KODE_USER_DIR`
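
The "drop or impute" choice for missing values can be illustrated with a small pandas sketch. The data frame below is invented for the example, and per-category median imputation is just one reasonable strategy among several, not a recommendation from the source.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a gap; real data would come from pd.read_csv(...).
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "revenue": [10.0, np.nan, 30.0, 50.0],
})

missing = df.isna().sum()  # missing-value count per column

# Option 1: drop incomplete rows (reasonable when few rows are affected).
dropped = df.dropna(subset=["revenue"])

# Option 2: impute with the median of each category.
imputed = df.assign(
    revenue=df["revenue"].fillna(
        df.groupby("category")["revenue"].transform("median")
    )
)
```

Checking `missing` before choosing either option shows how much data each choice would affect.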

Quick reference

REFERENCE.md - pandas/numpy API reference
references/financial.md - Financial analysis recipes
references/business.md - Business analytics recipes
references/scientific.md - Statistical testing methods
references/templates.md - Code templates

Environment setup

This skill uses Python scripts. To set up the environment:

```bash
# Navigate to the skill directory
cd apps/assistant/skills/data-analysis

# Run the setup script (creates venv and installs dependencies)
./setup.sh

# Activate the environment
source .venv/bin/activate
```

The setup script will:
- Create a Python virtual environment in `.venv/`
- Install required packages (pandas, numpy, scipy, scikit-learn, statsmodels)

To run Python scripts with the skill environment:
```bash
# Use the virtual environment's Python
.venv/bin/python script.py

# Or activate first, then run normally
source .venv/bin/activate
python script.py
```