data-analysis
Data Analysis - Statistical Computing & Insights
When to use this skill
Activate this skill when:
- User mentions "数据分析", "统计", "计算指标", "数据洞察"
- Need to analyze structured data (CSV, JSON, database)
- Calculate statistics, trends, patterns
- Financial analysis (returns, volatility, technical indicators)
- Business analytics (sales, user behavior, KPIs)
- Scientific data processing and hypothesis testing
Workflow
1. Get data
⚠️ IMPORTANT: File naming requirements
- File names MUST NOT contain Chinese characters or non-ASCII characters
- Use only English letters, numbers, underscores, and hyphens
- Examples: `data.csv`, `sales_report_2025.xlsx`, `analysis_results.json`
- ❌ Invalid: `销售数据.csv`, `数据文件.xlsx`, `報表.json`
- This ensures compatibility across different systems and prevents encoding issues
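The naming rule above can be checked mechanically before a file is read or written. The following is a minimal sketch, not part of the skill itself: the function name `is_safe_filename` is hypothetical, and it assumes that dots are also allowed so that extensions such as `.csv` pass.

```python
import re

# Assumption: ASCII letters, digits, underscores, and hyphens, with dots
# permitted only as separators (for example before the file extension).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*")


def is_safe_filename(name: str) -> bool:
    """Return True if `name` contains only the allowed ASCII characters."""
    return _SAFE_NAME.fullmatch(name) is not None
```

With this sketch, `is_safe_filename("sales_report_2025.xlsx")` is true and `is_safe_filename("销售数据.csv")` is false, so a caller could ask the user to rename the file in the second case.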
If data already exists:
- Read from file (CSV, JSON, Excel)
- Query database if available
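Reading from a file usually comes down to picking the right pandas reader for the extension. A rough sketch, assuming a hypothetical helper named `load_table` (database queries are left out because they depend on the driver in use):

```python
from pathlib import Path

import pandas as pd


def load_table(path: str) -> pd.DataFrame:
    """Load a CSV, JSON, or Excel file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix in (".xlsx", ".xls"):
        # read_excel needs an Excel engine such as openpyxl to be installed
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Raising on unknown extensions, rather than guessing a format, keeps a wrongly named file from being silently misparsed.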
If file names contain Chinese characters:
- Ask the user to rename the file to English/ASCII characters
- Or rename the file when saving it to the agent directory
If no data:
- Automatically activate the `data-base` skill
- Scrape/collect the required data
- Save it in a structured format
2. Understand requirements
Ask the user:
- What questions do you want to answer?
- What metrics are important?
- What format for results? (summary, chart, report)
- Any specific statistical methods?
3. Analyze
General analysis:
- Descriptive statistics (mean, median, std, percentiles)
- Distribution analysis (histograms, box plots)
- Correlation analysis
- Group comparisons
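As an illustration of the general analysis items, the sketch below computes descriptive statistics, a correlation, and a per-group comparison with pandas. The data and the column names (`group`, `x`, `y`) are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
})

# Descriptive statistics, including selected percentiles
summary = df["x"].describe(percentiles=[0.25, 0.5, 0.75])

# Pearson correlation between two numeric columns
corr_xy = df["x"].corr(df["y"])

# Group comparison: mean and sample standard deviation of x per group
by_group = df.groupby("group")["x"].agg(["mean", "std"])
```

Distribution plots (histograms, box plots) would build on the same DataFrame but are omitted here to keep the sketch free of plotting dependencies.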
Financial analysis:
- Return calculation (simple, log, cumulative)
- Risk metrics (volatility, VaR, Sharpe ratio)
- Technical indicators (MA, RSI, MACD)
- Portfolio analysis
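The return and risk items can be sketched on a short price series. Everything numeric here is an assumption for illustration: the prices are invented, 252 trading days per year is a common convention, the 2% risk-free rate is hypothetical, and the VaR shown is the simple historical (empirical quantile) variant.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])

simple = prices.pct_change().dropna()                 # P_t / P_{t-1} - 1
log_ret = np.log(prices / prices.shift(1)).dropna()   # ln(P_t / P_{t-1})
cumulative = (1 + simple).cumprod() - 1               # growth since the first price

# Annualized volatility, assuming 252 trading days per year
ann_vol = simple.std() * np.sqrt(252)

# Sharpe ratio with an assumed (hypothetical) annual risk-free rate
risk_free_annual = 0.02
excess = simple - risk_free_annual / 252
sharpe = excess.mean() / excess.std() * np.sqrt(252)

# Historical one-day VaR at 95%: the loss exceeded on roughly 5% of days
var_95 = -simple.quantile(0.05)
```

Log returns add up across time, so `log_ret.sum()` equals the log of the total price ratio, which is a handy consistency check against `cumulative`.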
Business analysis:
- Trend analysis (growth rates, YoY, MoM)
- Cohort analysis
- Funnel analysis
- A/B testing
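Funnel analysis and A/B testing both reduce to simple arithmetic on counts. The sketch below uses only the standard library; the funnel steps, the counts, and the function name `two_proportion_z_test` are hypothetical, and the test relies on the normal approximation, which assumes reasonably large samples.

```python
from math import erfc, sqrt

# Hypothetical funnel counts, ordered from first step to last
funnel = [("visit", 1000), ("signup", 400), ("purchase", 100)]

# Step-to-step conversion rate for each consecutive pair of steps
step_rates = {
    f"{a}->{b}": n_b / n_a
    for (a, n_a), (b, n_b) in zip(funnel, funnel[1:])
}


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation
    return z, p_value


# Hypothetical A/B result: 100/1000 conversions (A) vs 130/1000 (B)
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

A positive `z` means variant B converted better than A; whether the difference counts as significant depends on the threshold agreed with the user.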
Scientific analysis:
- Hypothesis testing (t-test, chi-square, ANOVA)
- Regression analysis
- Time series analysis
- Statistical significance
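The hypothesis tests listed above are all available in `scipy.stats`. The following sketch runs them on synthetic data; the samples, the contingency table, and the 0.05 significance level are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical samples: two groups whose true means differ by 1.0
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=1.0, scale=1.0, size=200)

# Welch's t-test (does not assume equal variances)
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square test of independence on a hypothetical 2x2 contingency table
table = np.array([[30, 70], [60, 40]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA across three groups
group_c = rng.normal(loc=0.5, scale=1.0, size=200)
f_stat, p_f = stats.f_oneway(group_a, group_b, group_c)

alpha = 0.05  # conventional significance level, an assumption here
significant = {"t-test": p_t < alpha, "chi-square": p_chi < alpha, "anova": p_f < alpha}
```

Welch's variant is used for the t-test because it stays valid when the two groups have different variances; the classic Student's test is the default when `equal_var` is left unset.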
4. Output
Generate results in:
- Summary statistics: Tables with key metrics
- Charts: Save as PNG files
- Report: Markdown with findings
- Data: Processed CSV/JSON for further use
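A sketch of how the data and report outputs might be written is shown below. The DataFrame, the file names, and the temporary directory standing in for the session directory are all hypothetical; chart export is left out so the example does not depend on a plotting backend.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical processed data
df = pd.DataFrame({"category": ["a", "b"], "revenue": [120.0, 80.0]})

out_dir = Path(tempfile.mkdtemp())  # stand-in for the session directory

# Data: processed CSV and JSON for further use
df.to_csv(out_dir / "results.csv", index=False)
df.to_json(out_dir / "results.json", orient="records")

# Report: a small Markdown table built by hand (avoids optional dependencies)
lines = ["# Findings", "", "| category | revenue |", "| --- | --- |"]
lines += [f"| {row.category} | {row.revenue:.2f} |" for row in df.itertuples()]
(out_dir / "report.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Passing `index=False` keeps the row index out of the CSV, which avoids a stray unnamed column when the file is read back.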
Python Environment
Auto-initialize the virtual environment if needed, then execute:
```bash
cd skills/data-analysis
if [ ! -f ".venv/bin/python" ]; then
    echo "Creating Python environment..."
    ./setup.sh
fi
.venv/bin/python your_script.py
```
The setup script auto-installs pandas, numpy, scipy, scikit-learn, and statsmodels, with Chinese font support.
Analysis scenarios
General data
```python
import pandas as pd

# Load and summarize
df = pd.read_csv('data.csv')
summary = df.describe()
# numeric_only=True skips text columns, which pandas 2.x would otherwise reject
correlations = df.corr(numeric_only=True)
```
Financial data
```python
# Calculate returns
df['return'] = df['price'].pct_change()

# Risk metrics (annualized, assuming daily data and 252 trading days per year)
volatility = df['return'].std() * (252 ** 0.5)
# Sharpe ratio with the risk-free rate taken as zero
sharpe = df['return'].mean() / df['return'].std() * (252 ** 0.5)
```
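The technical indicators named in the workflow (MA, RSI, MACD) can be derived from the same kind of price column. This is a sketch under stated assumptions: the price series is an artificial steady uptrend, the window lengths (20, 12/26/9, 14) are common defaults rather than values taken from this skill, and the RSI shown is the simple rolling-average variant, not Wilder's smoothed version.

```python
import pandas as pd

# Hypothetical closing prices: a steady uptrend from 100 to 159
price = pd.Series([float(p) for p in range(100, 160)])

# Simple moving average over a 20-period window
sma_20 = price.rolling(window=20).mean()

# MACD: 12-period EMA minus 26-period EMA, with a 9-period signal line
ema_12 = price.ewm(span=12, adjust=False).mean()
ema_26 = price.ewm(span=26, adjust=False).mean()
macd = ema_12 - ema_26
signal = macd.ewm(span=9, adjust=False).mean()

# RSI over 14 periods, using simple rolling averages of gains and losses
delta = price.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta).clip(lower=0).rolling(14).mean()
rsi = 100 - 100 / (1 + gain / loss)
```

Because this series never falls, the average loss is zero and the RSI saturates at 100; real price data would give values spread between 0 and 100.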
Business data
```python
# Group by category
grouped = df.groupby('category').agg({
    'revenue': ['sum', 'mean', 'count']
})

# Growth rate (row over row; assumes rows are sorted by time)
df['growth'] = df['revenue'].pct_change()
```
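The growth-rate line compares each row with the one before it, so what it measures depends on how the rows are spaced. For monthly data, month-over-month and year-over-year growth differ only in the `periods` argument. A small sketch with invented monthly revenue:

```python
import pandas as pd

# Hypothetical monthly revenue: 24 months growing by 10 per month
idx = pd.date_range("2023-01-01", periods=24, freq="MS")
monthly = pd.DataFrame(
    {"revenue": [100.0 + 10 * i for i in range(24)]},
    index=idx,
)

# Month-over-month: compare each month with the previous one
monthly["mom"] = monthly["revenue"].pct_change(periods=1)

# Year-over-year: compare each month with the same month one year earlier
monthly["yoy"] = monthly["revenue"].pct_change(periods=12)
```

The first 12 `yoy` values are missing because there is no earlier year to compare against; this only works as intended when every month is present and the rows are in date order.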
Scientific data
```python
from scipy import stats
from sklearn.linear_model import LinearRegression

# T-test (group_a and group_b are array-likes of observations)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Regression (X is a 2-D feature matrix, y is the target vector)
model = LinearRegression()
model.fit(X, y)
```
File path conventions
Temporary output (session-scoped)
Files written to the current directory will be stored in the session directory:
```python
from datetime import datetime

import matplotlib.pyplot as plt

# Use timestamp for unique filenames (avoid conflicts)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

# Charts and temporary files
plt.savefig(f'analysis_{timestamp}.png')  # → $KODE_AGENT_DIR/analysis_20250115_143022.png
df.to_csv(f'results_{timestamp}.csv')     # → $KODE_AGENT_DIR/results_20250115_143022.csv
```
**Always use unique filenames** to avoid conflicts when running multiple analyses:
- Use timestamps: `analysis_20250115_143022.png`
- Use descriptive names + timestamps: `sales_report_q1_2025.csv`
- Use random suffix for scripts: `script_{random.randint(1000,9999)}.py`
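The timestamp and random-suffix ideas can be combined in one helper. This is only a sketch: the function name `unique_name` and the choice of a four-character hex suffix are assumptions, not part of the skill.

```python
import secrets
from datetime import datetime


def unique_name(prefix: str, ext: str) -> str:
    """Build an ASCII filename such as 'analysis_20250115_143022_1a2b.png'."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = secrets.token_hex(2)  # 4 hex characters to separate same-second runs
    return f"{prefix}_{timestamp}_{suffix}.{ext}"
```

A timestamp alone can collide when two analyses start within the same second, which is why the sketch appends a short random suffix as well.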
User data (persistent)
Use `$KODE_USER_DIR` for persistent user data:
```python
import os
user_dir = os.getenv('KODE_USER_DIR')

# Save to user memory
memory_file = f"{user_dir}/.memory/facts/preferences.jsonl"

# Read from knowledge base
knowledge_dir = f"{user_dir}/.knowledge/docs"
```
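One thing to watch for: `os.getenv` returns `None` when the variable is unset, and an f-string would then quietly produce a path starting with `None/`. The sketch below guards against that and appends a record to the preferences file. The helper names `user_path` and `append_fact`, and the shape of the record, are hypothetical; only the path layout comes from the text above.

```python
import json
import os
from pathlib import Path


def user_path(*parts: str) -> Path:
    """Resolve a path under $KODE_USER_DIR, failing loudly if it is unset."""
    user_dir = os.getenv("KODE_USER_DIR")
    if not user_dir:
        raise RuntimeError("KODE_USER_DIR is not set")
    return Path(user_dir).joinpath(*parts)


def append_fact(fact: dict) -> Path:
    """Append one JSON object per line (JSONL) to the preferences file."""
    memory_file = user_path(".memory", "facts", "preferences.jsonl")
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(fact, ensure_ascii=False) + "\n")
    return memory_file
```

Opening the file in append mode keeps earlier records intact, which suits a line-per-record format such as JSONL.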
Environment variables
- `KODE_AGENT_DIR`: Session directory for temporary output (charts, analysis results)
- `KODE_USER_DIR`: User data directory for persistent storage (memory, knowledge, config)
Best practices
- File names MUST be ASCII-only: No Chinese or non-ASCII characters in filenames
- Always inspect data first: `df.head()`, `df.info()`, `df.describe()`
- Handle missing values: Drop or impute based on context
- Check assumptions: Normality, independence, etc.
- Visualize: Charts reveal patterns tables hide
- Document findings: Explain metrics and their implications
- Use correct paths: Temporary outputs to current dir, persistent data to `$KODE_USER_DIR`
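The first two practices can be sketched on a small DataFrame with gaps. The data and column names are invented, and the imputation choices (median for numbers, a placeholder label for categories) are just one reasonable option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.0, 11.0],
    "region": ["north", "south", None, "north"],
})

# Inspect first: preview, dtypes, summary, and missing counts per column
preview = df.head()
df.info()
summary = df.describe()
missing = df.isna().sum()

# Option 1: drop rows with any missing value
dropped = df.dropna()

# Option 2: impute, here median for numeric and a placeholder for categorical
imputed = df.assign(
    price=df["price"].fillna(df["price"].median()),
    region=df["region"].fillna("unknown"),
)
```

Dropping is the safer default when few rows are affected; imputing keeps the sample size but can bias statistics, so the choice is worth stating in the report.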
Quick reference
- REFERENCE.md - pandas/numpy API reference
- references/financial.md - Financial analysis recipes
- references/business.md - Business analytics recipes
- references/scientific.md - Statistical testing methods
- references/templates.md - Code templates
Environment setup
This skill uses Python scripts. To set up the environment:
```bash
# Navigate to the skill directory
cd apps/assistant/skills/data-analysis

# Run the setup script (creates venv and installs dependencies)
./setup.sh

# Activate the environment
source .venv/bin/activate
```
The setup script will:
- Create a Python virtual environment in `.venv/`
- Install required packages (pandas, numpy, scipy, scikit-learn, statsmodels)
To run Python scripts with the skill environment:
```bash
# Use the virtual environment's Python
.venv/bin/python script.py

# Or activate first, then run normally
source .venv/bin/activate
python script.py
```