Agentic Data Scientist

Skill by ara.so — AI Agent Skills collection.

Agentic Data Scientist is an adaptive multi-agent framework that automates complex data science tasks using a sophisticated workflow with planning, execution, validation, and self-correction. Built on Google's Agent Development Kit (ADK) and Claude Agent SDK, it separates planning from execution and continuously validates work against success criteria.

What It Does

- Orchestrated Mode: Full multi-agent workflow with planning, iterative execution, validation, and adaptive replanning
- Simple Mode: Direct coding without planning overhead for quick tasks
- Multi-Agent Architecture: Specialized agents for planning, coding, reviewing, validation, and summarization
- Continuous Validation: Tracks progress against success criteria at every stage
- Self-Correcting: Adapts plans based on discoveries during execution
- MCP Integration: Access to tools via Model Context Protocol servers
- Claude Scientific Skills: 380+ advanced scientific computing skills available to the coding agent

Installation

```bash
# Install globally with uv
uv tool install agentic-data-scientist

# Or use directly with uvx (no installation)
uvx agentic-data-scientist --mode simple "your query"
```

Prerequisites

Required:

Claude Code CLI (for coding agent):

```bash
npm install -g @anthropic-ai/claude-code
```

API Keys (set as environment variables):

```bash
export OPENROUTER_API_KEY="your_openrouter_key"  # For planning/review agents
export ANTHROPIC_API_KEY="your_anthropic_key"    # For coding agent
```

Get keys from:

- OpenRouter: https://openrouter.ai/keys
- Anthropic: https://console.anthropic.com/

Optional:

```bash
# Disable network access (web search, URL fetching)
export DISABLE_NETWORK_ACCESS=true
```

Configuration

Create a `.env` file in your project directory:

```bash
# Required
OPENROUTER_API_KEY=your_openrouter_key
ANTHROPIC_API_KEY=your_anthropic_key

# Optional
DISABLE_NETWORK_ACCESS=false  # Set to true to disable web tools
```
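
The `.env` format used here is plain `KEY=value` lines with `#` comments. As a rough illustration of how such a file can be read, here is a minimal sketch using only the standard library. The helper names `parse_env_text` and `load_env_file` are hypothetical and are not part of agentic-data-scientist, whose own loading mechanism is not described in this document; the sketch also assumes that values never contain a ` #` sequence of their own.

```python
from pathlib import Path
from typing import Dict, MutableMapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: " #" starts an inline comment, as in the example file.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def load_env_file(path: str, environ: MutableMapping[str, str]) -> None:
    """Copy values from the file into environ without overriding existing ones."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for key, value in parse_env_text(env_path.read_text()).items():
        environ.setdefault(key, value)


sample = """
# Required
OPENROUTER_API_KEY=your_openrouter_key
ANTHROPIC_API_KEY=your_anthropic_key

# Optional
DISABLE_NETWORK_ACCESS=false  # Set to true to disable web tools
"""
print(parse_env_text(sample))
```

Calling `load_env_file(".env", os.environ)` (after `import os`) would make the keys visible to child processes while leaving already-exported variables untouched.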

Key Commands

Basic Usage

You must specify `--mode` for every command:

```bash
# Orchestrated mode: Full multi-agent workflow
agentic-data-scientist "Perform differential expression analysis" \
  --mode orchestrated \
  --files data.csv

# Simple mode: Direct coding, no planning
agentic-data-scientist "Write a CSV parser" \
  --mode simple
```

File Handling

```bash
# Single file
agentic-data-scientist "Analyze dataset" \
  --mode orchestrated \
  --files data.csv

# Multiple files
agentic-data-scientist "Compare datasets" \
  --mode orchestrated \
  -f data1.csv -f data2.csv -f metadata.json

# Directory upload (recursive)
agentic-data-scientist "Analyze all CSVs in folder" \
  --mode orchestrated \
  --files ./data_folder/
```
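
When the CLI is driven from a script, the repeated `-f`/`--files` form maps naturally onto a list of paths. The following is a minimal sketch that assumes only the flags documented in this section; `build_command` is a hypothetical helper and not part of the package.

```python
import shlex
from typing import List, Sequence


def build_command(query: str, mode: str, files: Sequence[str] = ()) -> List[str]:
    """Assemble an argument list for the agentic-data-scientist CLI."""
    if mode not in ("orchestrated", "simple"):
        raise ValueError(f"unknown mode: {mode!r}")
    command = ["agentic-data-scientist", query, "--mode", mode]
    for path in files:
        # One --files flag per path; a directory path is passed the same way.
        command += ["--files", path]
    return command


command = build_command(
    "Compare datasets", "orchestrated", ["data1.csv", "data2.csv", "metadata.json"]
)
# shlex.join quotes the query so the printed line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would launch the tool, provided it is installed and on `PATH`. Building a list rather than a single string avoids shell-quoting problems when the query contains spaces or quotes.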

Working Directory Options

```bash
# Default: ./agentic_output/ (preserved after completion)
agentic-data-scientist "Analyze data" \
  --mode orchestrated \
  --files data.csv

# Custom working directory
agentic-data-scientist "Generate report" \
  --mode orchestrated \
  --files data.csv \
  --working-dir ./my_analysis

# Temporary directory (auto-cleanup)
agentic-data-scientist "Quick exploration" \
  --mode simple \
  --files data.csv \
  --temp-dir

# Force keep files (override temp-dir cleanup)
agentic-data-scientist "Analysis" \
  --mode orchestrated \
  --files data.csv \
  --temp-dir \
  --keep-files
```
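
The interaction between these flags can be summarized as a small decision function. This is a sketch of the rules as this section states them, not the CLI's actual implementation. `resolve_workspace` is a hypothetical name, and two points are assumptions: that a custom `--working-dir` is preserved like the default directory, and that `--temp-dir` takes precedence if both are given.

```python
from typing import Optional, Tuple


def resolve_workspace(
    working_dir: Optional[str] = None,
    temp_dir: bool = False,
    keep_files: bool = False,
) -> Tuple[str, bool]:
    """Return (location, preserved_after_run) for a given flag combination."""
    if temp_dir:
        # --temp-dir cleans up automatically unless --keep-files overrides it.
        return ("<temporary directory>", keep_files)
    if working_dir is not None:
        # Assumption: a custom directory is preserved like the default one.
        return (working_dir, True)
    # Default location, preserved after completion.
    return ("./agentic_output/", True)


for flags in (
    {},
    {"working_dir": "./my_analysis"},
    {"temp_dir": True},
    {"temp_dir": True, "keep_files": True},
):
    print(flags, "->", resolve_workspace(**flags))
```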

Logging and Debugging

```bash
# Custom log file location
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --log-file ./analysis.log

# Verbose logging
agentic-data-scientist "Debug issue" \
  --mode simple \
  --verbose
```

Real-World Examples

Example 1: Complex Data Analysis (Orchestrated Mode)

```bash
# Comprehensive analysis with multiple stages
agentic-data-scientist \
  "Perform exploratory data analysis on sales data,
   identify trends, create visualizations,
   and build a predictive model for future sales" \
  --mode orchestrated \
  --files sales_2024.csv \
  --working-dir ./sales_analysis \
  --log-file analysis.log
```

**What happens:**
1. **Planning Phase**: Creates detailed plan with stages (EDA, visualization, modeling)
2. **Execution Phase**: Implements each stage iteratively with validation
3. **Validation**: Checks success criteria after each stage
4. **Adaptation**: Adjusts plan based on discoveries (e.g., data quality issues)
5. **Summary**: Generates comprehensive report with all findings

Example 2: Quick Scripting (Simple Mode)

```bash
# Fast coding without planning overhead
agentic-data-scientist \
  "Write a Python script that reads multiple CSV files,
   merges them on a common ID column,
   and exports to Excel with formatting" \
  --mode simple \
  -f data1.csv -f data2.csv -f data3.csv \
  --temp-dir
```

**What happens:**
- Direct execution with coding agent (no planning phase)
- Quick turnaround for straightforward tasks
- Temporary directory auto-cleanup

Example 3: Multi-File Statistical Analysis

```bash
# Compare multiple datasets
agentic-data-scientist \
  "Compare the distribution of features across treatment groups,
   perform statistical tests (t-test, ANOVA),
   and generate publication-ready plots" \
  --mode orchestrated \
  -f control.csv \
  -f treatment_a.csv \
  -f treatment_b.csv \
  --working-dir ./stats_analysis
```

Example 4: Directory-Based Analysis

```bash
# Process all files in a directory
agentic-data-scientist \
  "Analyze all patient data files in the folder,
   aggregate results, and create summary statistics" \
  --mode orchestrated \
  --files ./patient_data/ \
  --working-dir ./patient_analysis
```

Python API Usage

For programmatic access, use the Python API:

```python
from agentic_data_scientist.cli import main
import sys

# Prepare arguments
sys.argv = [
    'agentic-data-scientist',
    'Perform clustering analysis on customer data',
    '--mode', 'orchestrated',
    '--files', 'customers.csv',
    '--working-dir', './clustering_output'
]

# Run
main()
```

Or use the workflow directly:

```python
import asyncio
from pathlib import Path
from agentic_data_scientist.workflow import create_workflow

async def run_analysis():
    # Create workflow
    workflow = create_workflow(
        query="Analyze customer segments",
        mode="orchestrated",
        files=[Path("customers.csv")],
        working_dir=Path("./output"),
        disable_network=False
    )

    # Execute
    result = await workflow.execute()
    print(result)

asyncio.run(run_analysis())
```
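
Assigning to `sys.argv` directly changes it for the rest of the process. One way to contain that, sketched here with the standard library only, is a context manager that restores the original value afterwards. `patched_argv` is a hypothetical helper, not something the package provides; the commented-out call assumes the `main` entry point imported in the first example of this section.

```python
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def patched_argv(argv: List[str]) -> Iterator[None]:
    """Temporarily replace sys.argv, restoring it even if the body raises."""
    original = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = original


args = [
    "agentic-data-scientist",
    "Perform clustering analysis on customer data",
    "--mode", "orchestrated",
    "--files", "customers.csv",
]
with patched_argv(args):
    # from agentic_data_scientist.cli import main; main()
    print(sys.argv[1])
```

Because the restore happens in a `finally` clause, the original arguments come back even if the wrapped call exits with an exception.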

Common Patterns

Pattern 1: Iterative Data Exploration

```bash
# Start with simple mode for quick exploration
agentic-data-scientist \
  "Load dataset and show basic statistics" \
  --mode simple \
  --files data.csv

# Then use orchestrated mode for deep analysis
agentic-data-scientist \
  "Perform full statistical analysis including outlier detection,
   correlation analysis, and clustering" \
  --mode orchestrated \
  --files data.csv \
  --working-dir ./deep_analysis
```

Pattern 2: Pipeline Development

```bash
# Use orchestrated mode to develop a complete pipeline
agentic-data-scientist \
  "Create a data processing pipeline that: \
   1. Cleans and normalizes raw data \
   2. Engineers new features \
   3. Splits into train/test \
   4. Trains multiple models \
   5. Evaluates and selects best model \
   6. Exports model and metrics" \
  --mode orchestrated \
  --files raw_data.csv \
  --working-dir ./ml_pipeline
```

Pattern 3: Report Generation

```bash
# Generate comprehensive reports
agentic-data-scientist \
  "Analyze quarterly sales data and create an executive report
   with visualizations, key metrics, and recommendations" \
  --mode orchestrated \
  -f q1_sales.csv -f q2_sales.csv -f q3_sales.csv -f q4_sales.csv \
  --working-dir ./quarterly_report
```

Pattern 4: Debugging with Verbose Logs

```bash
# Enable verbose logging for troubleshooting
agentic-data-scientist \
  "Complex analysis task" \
  --mode orchestrated \
  --files data.csv \
  --verbose \
  --log-file debug.log \
  --keep-files
```

Multi-Agent Workflow Details

Agent Roles

- Plan Maker: Creates comprehensive plans with stages and success criteria
- Plan Reviewer: Validates plans are complete before execution
- Plan Parser: Converts plans to structured executable stages
- Stage Orchestrator: Manages execution cycle and adaptation
- Coding Agent: Implements stages (powered by Claude Code with 380+ scientific skills)
- Review Agent: Validates implementations against requirements
- Criteria Checker: Tracks progress against success criteria
- Stage Reflector: Adapts remaining stages based on learnings
- Summary Agent: Synthesizes work into final report

Workflow Phases

Planning Phase:

User Query → Plan Maker → Plan Reviewer → Plan Parser → Structured Plan

Execution Phase (per stage):

Stage → Coding Agent → Review Agent → Criteria Checker → Stage Reflector

Summary Phase:

All Completed Stages → Summary Agent → Final Report
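
The three phases can be read as a single control loop. The sketch below is illustrative only: it is not the framework's implementation, every function and parameter name is hypothetical, the agents are passed in as plain callables, the three planning agents are collapsed into one `plan` call, and the `max_attempts` retry limit is an assumption that the document does not specify.

```python
from typing import Callable, List


def run_orchestrated(
    query: str,
    plan: Callable[[str], List[str]],
    code: Callable[[str], str],
    review: Callable[[str, str], bool],
    check_criteria: Callable[[List[str]], None],
    reflect: Callable[[List[str], List[str]], List[str]],
    summarize: Callable[[List[str]], str],
    max_attempts: int = 3,
) -> str:
    """Drive the planning, per-stage execution, and summary phases in order."""
    # Planning phase: maker, reviewer, and parser are collapsed into one call.
    remaining = list(plan(query))
    results: List[str] = []
    while remaining:
        stage = remaining.pop(0)
        for _ in range(max_attempts):
            output = code(stage)
            if review(stage, output):
                break
        else:
            raise RuntimeError(f"stage {stage!r} failed review {max_attempts} times")
        results.append(output)
        check_criteria(results)  # track progress against success criteria
        remaining = reflect(results, remaining)  # adapt the stages still to run
    # Summary phase.
    return summarize(results)


# Stub agents standing in for the real LLM-backed ones.
report = run_orchestrated(
    "Analyze customer segments",
    plan=lambda q: ["EDA", "visualization", "modeling"],
    code=lambda stage: f"{stage} done",
    review=lambda stage, output: output.endswith("done"),
    check_criteria=lambda results: None,
    reflect=lambda results, remaining: remaining,
    summarize=lambda results: "; ".join(results),
)
print(report)
```

The `reflect` step is where self-correction would happen: it receives the results so far and may return a different list of remaining stages.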

Troubleshooting

API Key Errors

```bash
# Verify keys are set
echo $OPENROUTER_API_KEY
echo $ANTHROPIC_API_KEY

# Set them if missing
export OPENROUTER_API_KEY="your_key"
export ANTHROPIC_API_KEY="your_key"
```

Claude Code Not Found

```bash
# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code

# Verify installation
claude --version
```
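
The two most common setup problems, missing API keys and a missing Claude Code CLI, can be checked together before launching a long run. A minimal sketch, assuming the environment variable names from the Prerequisites section; `preflight` is a hypothetical helper rather than something the package ships.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

REQUIRED_KEYS = ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY")


def preflight(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    env = os.environ if env is None else env
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    # The @anthropic-ai/claude-code package installs an executable named "claude".
    if which("claude") is None:
        problems.append("Claude Code CLI ('claude') not found on PATH")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.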

Network Access Issues

```bash
# Disable network tools if causing problems
export DISABLE_NETWORK_ACCESS=true

# Or in .env file
echo "DISABLE_NETWORK_ACCESS=true" >> .env
```

File Upload Failures

```bash
# Verify file exists
ls -la data.csv

# Use absolute paths
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files /absolute/path/to/data.csv

# Check directory permissions for recursive upload
ls -la ./data_folder/
```

Working Directory Issues

```bash
# Ensure directory is writable
mkdir -p ./output
chmod 755 ./output

# Use temp directory if permission issues
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --temp-dir
```

Execution Hanging

```bash
# Use verbose mode to see what's happening
agentic-data-scientist "Query" \
  --mode orchestrated \
  --files data.csv \
  --verbose

# Try simple mode to isolate planning vs execution issues
agentic-data-scientist "Query" \
  --mode simple \
  --files data.csv
```

Output Not Preserved

```bash
# Default behavior preserves files in ./agentic_output/
ls -la ./agentic_output/

# Explicitly set working directory
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --working-dir ./my_output

# Use --keep-files to override temp-dir cleanup
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --temp-dir \
  --keep-files
```

Mode Selection Guide

Use Orchestrated Mode when:

- The task is complex with multiple stages
- You need thorough planning and validation
- Quality and completeness are critical
- The task requires iterative refinement
- You want a comprehensive final report

Use Simple Mode when:

- You are doing quick scripting or one-off tasks
- You need a simple question answered
- You are prototyping or exploring
- You want fast turnaround
- You don't need a multi-stage workflow

Advanced Configuration

Custom Prompts

Extend the framework by customizing agent prompts:

```python
from agentic_data_scientist.prompts import PLAN_MAKER_PROMPT

# Modify prompts for domain-specific needs
custom_prompt = PLAN_MAKER_PROMPT + """
Additional domain context:
- Focus on genomics data
- Use bioinformatics best practices
"""
```

MCP Server Integration

The framework supports Model Context Protocol (MCP) for custom tools. Configure MCP servers in your workflow, and agents automatically gain access to the tools those servers expose.

Access to Claude Scientific Skills

The coding agent has access to 380+ scientific computing skills including:

- Statistical analysis
- Machine learning
- Data visualization
- Bioinformatics
- Scientific computing libraries

These are automatically available during the execution phase.
