Agentic Data Scientist

Skill by ara.so — AI Agent Skills collection.

Agentic Data Scientist is an adaptive multi-agent framework that automates complex data science tasks through a workflow of planning, execution, validation, and self-correction. Built on Google's Agent Development Kit (ADK) and the Claude Agent SDK, it separates planning from execution and continuously validates work against success criteria.

What It Does

Orchestrated Mode: Full multi-agent workflow with planning, iterative execution, validation, and adaptive replanning
Simple Mode: Direct coding without planning overhead for quick tasks
Multi-Agent Architecture: Specialized agents for planning, coding, reviewing, validation, and summarization
Continuous Validation: Tracks progress against success criteria at every stage
Self-Correcting: Adapts plans based on discoveries during execution
MCP Integration: Access to tools via Model Context Protocol servers
Claude Scientific Skills: 380+ advanced scientific computing skills available to the coding agent

Installation

bash

# Install globally with uv
uv tool install agentic-data-scientist

# Or use directly with uvx (no installation)
uvx agentic-data-scientist --mode simple "your query"

Prerequisites

Required:

Claude Code CLI (for the coding agent):

bash

npm install -g @anthropic-ai/claude-code

API Keys (set as environment variables):

bash

export OPENROUTER_API_KEY="your_openrouter_key"  # For planning/review agents
export ANTHROPIC_API_KEY="your_anthropic_key"    # For coding agent

Get keys from:

OpenRouter: https://openrouter.ai/keys
Anthropic: https://console.anthropic.com/

Optional:

bash

# Disable network access (web search, URL fetching)
export DISABLE_NETWORK_ACCESS=true

Configuration

Create a .env file in your project directory:

bash

# Required
OPENROUTER_API_KEY=your_openrouter_key
ANTHROPIC_API_KEY=your_anthropic_key

# Optional
DISABLE_NETWORK_ACCESS=false  # Set to true to disable web tools
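This note is for readers who want to see how such a file maps to settings. The tool may load .env with its own logic, and how it does so is not documented here. The stdlib-only Python sketch below parses the example above into key/value pairs.

Assumptions:
- parse_env_file and env_flag are hypothetical helpers, not part of agentic-data-scientist.
- Treating true/1/yes as "enabled" is an assumed convention.

```python
from __future__ import annotations

import re


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        if len(value) >= 2 and value[0] in ('"', "'") and value[-1] == value[0]:
            value = value[1:-1]  # quoted value: keep its contents verbatim
        else:
            value = re.sub(r"\s+#.*$", "", value)  # drop a trailing "  # comment"
        values[key.strip()] = value
    return values


def env_flag(value: str | None) -> bool:
    """Assumed convention: true/1/yes (any case) switch a flag on."""
    return (value or "").strip().lower() in {"true", "1", "yes"}


example = """
# Required
OPENROUTER_API_KEY=your_openrouter_key
ANTHROPIC_API_KEY=your_anthropic_key

# Optional
DISABLE_NETWORK_ACCESS=false  # Set to true to disable web tools
"""

settings = parse_env_file(example)
print(settings["DISABLE_NETWORK_ACCESS"])            # false
print(env_flag(settings["DISABLE_NETWORK_ACCESS"]))  # False
```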

Key Commands

Basic Usage

You must specify --mode for every command:

bash

# Orchestrated mode: Full multi-agent workflow
agentic-data-scientist "Perform differential expression analysis" \
  --mode orchestrated \
  --files data.csv

# Simple mode: Direct coding, no planning
agentic-data-scientist "Write a CSV parser" \
  --mode simple
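When launching the CLI from a script, it helps to assemble the argument list in one place so the mandatory --mode is never forgotten. This Python sketch uses only flags that appear in this document.

Assumptions:
- build_command is a hypothetical helper.
- Emitting one --files flag per path mirrors the repeated -f usage under File Handling. It is not a confirmed CLI rule.

```python
from __future__ import annotations

VALID_MODES = {"orchestrated", "simple"}


def build_command(
    query: str,
    mode: str,
    files: list[str] | tuple[str, ...] = (),
    working_dir: str | None = None,
    temp_dir: bool = False,
    keep_files: bool = False,
    log_file: str | None = None,
    verbose: bool = False,
) -> list[str]:
    """Assemble an argv list for agentic-data-scientist from documented flags."""
    if mode not in VALID_MODES:
        # --mode is mandatory; fail early instead of letting the CLI reject it
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    cmd = ["agentic-data-scientist", query, "--mode", mode]
    for path in files:
        cmd += ["--files", path]  # one flag per file, like the repeated -f examples
    if working_dir:
        cmd += ["--working-dir", working_dir]
    if temp_dir:
        cmd.append("--temp-dir")
    if keep_files:
        cmd.append("--keep-files")
    if log_file:
        cmd += ["--log-file", log_file]
    if verbose:
        cmd.append("--verbose")
    return cmd


print(build_command("Write a CSV parser", "simple"))
# ['agentic-data-scientist', 'Write a CSV parser', '--mode', 'simple']
```

The resulting list can be handed to subprocess.run(cmd, check=True).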

File Handling

bash

# Single file
agentic-data-scientist "Analyze dataset" \
  --mode orchestrated \
  --files data.csv

# Multiple files
agentic-data-scientist "Compare datasets" \
  --mode orchestrated \
  -f data1.csv -f data2.csv -f metadata.json

# Directory upload (recursive)
agentic-data-scientist "Analyze all CSVs in folder" \
  --mode orchestrated \
  --files ./data_folder/
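The tool's own traversal logic is not documented here. To illustrate what "recursive" means for a directory argument, this Python sketch expands a folder into every regular file beneath it, including files in nested subfolders.

Assumptions:
- expand_inputs is a hypothetical helper.
- Whether the CLI skips hidden files or follows symlinks is not specified.

```python
import tempfile
from pathlib import Path


def expand_inputs(paths):
    """Expand directories into a sorted list of the regular files beneath them."""
    result = []
    for p in map(Path, paths):
        if p.is_dir():
            # rglob("*") walks every subdirectory, i.e. a recursive expansion
            result.extend(sorted(f for f in p.rglob("*") if f.is_file()))
        elif p.is_file():
            result.append(p)
        else:
            raise FileNotFoundError(p)
    return result


with tempfile.TemporaryDirectory() as root:
    data = Path(root, "data_folder")
    (data / "nested").mkdir(parents=True)
    (data / "a.csv").write_text("x\n1\n")
    (data / "nested" / "b.csv").write_text("x\n2\n")
    print([f.relative_to(data).as_posix() for f in expand_inputs([data])])
    # ['a.csv', 'nested/b.csv']
```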

Working Directory Options

bash

# Default: ./agentic_output/ (preserved after completion)
agentic-data-scientist "Analyze data" \
  --mode orchestrated \
  --files data.csv

# Custom working directory
agentic-data-scientist "Generate report" \
  --mode orchestrated \
  --files data.csv \
  --working-dir ./my_analysis

# Temporary directory (auto-cleanup)
agentic-data-scientist "Quick exploration" \
  --mode simple \
  --files data.csv \
  --temp-dir

# Force keep files (override temp-dir cleanup)
agentic-data-scientist "Analysis" \
  --mode orchestrated \
  --files data.csv \
  --temp-dir \
  --keep-files
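The interaction between --temp-dir and --keep-files can be modeled as a context manager. The directory is created up front and removed on exit unless keeping was requested.

This Python sketch only illustrates those documented semantics. scratch_dir and the agentic_ prefix are hypothetical, not the tool's implementation.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scratch_dir(keep_files: bool = False):
    """Yield a fresh temporary directory; delete it on exit unless keep_files."""
    path = Path(tempfile.mkdtemp(prefix="agentic_"))
    try:
        yield path
    finally:
        if not keep_files:
            shutil.rmtree(path, ignore_errors=True)


with scratch_dir() as tmp:
    (tmp / "result.txt").write_text("done")
print(tmp.exists())   # False: cleaned up, like --temp-dir alone

with scratch_dir(keep_files=True) as kept:
    (kept / "result.txt").write_text("done")
print(kept.exists())  # True: preserved, like --temp-dir --keep-files
shutil.rmtree(kept)   # tidy up the demo directory
```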

Logging and Debugging

bash

# Custom log file location
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --log-file ./analysis.log

# Verbose logging
agentic-data-scientist "Debug issue" \
  --mode simple \
  --verbose
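The two flags map naturally onto Python's standard logging module:
- --verbose lowers the threshold to DEBUG.
- --log-file adds a file destination alongside the console.

This is a sketch of that general pattern, not the tool's actual logging setup. configure_logging, the logger name, and the message format are hypothetical.

```python
from __future__ import annotations

import logging
import os
import tempfile


def configure_logging(verbose: bool = False, log_file: str | None = None) -> logging.Logger:
    """Console handler always; DEBUG level when verbose; optional file handler."""
    logger = logging.getLogger("analysis_demo")  # hypothetical logger name
    for old in list(logger.handlers):  # avoid duplicate handlers on repeated calls
        logger.removeHandler(old)
        old.close()
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers: list[logging.Handler] = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.gettempdir(), "analysis_demo.log")
log = configure_logging(verbose=True, log_file=log_path)
log.debug("only emitted with verbose=True")
log.info("written to the console and to %s", log_path)
```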

Real-World Examples

Example 1: Complex Data Analysis (Orchestrated Mode)

bash

# Comprehensive analysis with multiple stages
agentic-data-scientist \
  "Perform exploratory data analysis on sales data, \
   identify trends, create visualizations, \
   and build a predictive model for future sales" \
  --mode orchestrated \
  --files sales_2024.csv \
  --working-dir ./sales_analysis \
  --log-file analysis.log

What happens:

Planning Phase: Creates detailed plan with stages (EDA, visualization, modeling)
Execution Phase: Implements each stage iteratively with validation
Validation: Checks success criteria after each stage
Adaptation: Adjusts plan based on discoveries (e.g., data quality issues)
Summary: Generates comprehensive report with all findings

Example 2: Quick Scripting (Simple Mode)

bash

# Fast coding without planning overhead
agentic-data-scientist \
  "Write a Python script that reads multiple CSV files, \
   merges them on a common ID column, \
   and exports to Excel with formatting" \
  --mode simple \
  -f data1.csv -f data2.csv -f data3.csv \
  --temp-dir

What happens:

Direct execution with coding agent (no planning phase)
Quick turnaround for straightforward tasks
Temporary directory auto-cleanup

Example 3: Multi-File Statistical Analysis

bash

# Compare multiple datasets
agentic-data-scientist \
  "Compare the distribution of features across treatment groups, \
   perform statistical tests (t-test, ANOVA), \
   and generate publication-ready plots" \
  --mode orchestrated \
  -f control.csv \
  -f treatment_a.csv \
  -f treatment_b.csv \
  --working-dir ./stats_analysis

Example 4: Directory-Based Analysis

bash

# Process all files in a directory
agentic-data-scientist \
  "Analyze all patient data files in the folder, \
   aggregate results, and create summary statistics" \
  --mode orchestrated \
  --files ./patient_data/ \
  --working-dir ./patient_analysis

Python API Usage

For programmatic access, use the Python API:

python

from agentic_data_scientist.cli import main
import sys

# Prepare arguments
sys.argv = [
    'agentic-data-scientist',
    'Perform clustering analysis on customer data',
    '--mode', 'orchestrated',
    '--files', 'customers.csv',
    '--working-dir', './clustering_output'
]

# Run
main()
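Assigning to sys.argv as above has two side effects:
- It leaves the interpreter's arguments changed for the rest of the process.
- CLI entry points often finish by calling sys.exit(), which would also stop the calling script.

A small wrapper can scope both effects. This Python sketch assumes that main() reads sys.argv and may raise SystemExit. The snippet above implies this, but the document does not confirm it. call_cli, patched_argv, and fake_main are hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv, restoring the original afterwards."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


def call_cli(main, args):
    """Run a CLI entry point with args; return its exit code instead of exiting."""
    with patched_argv(["agentic-data-scientist", *args]):
        try:
            main()
        except SystemExit as exc:
            # sys.exit(None) means success; a non-int message means failure
            code = exc.code
            if code is None:
                return 0
            return code if isinstance(code, int) else 1
    return 0


def fake_main():
    # Stand-in for agentic_data_scientist.cli.main, used only for this demo
    print("argv seen by CLI:", sys.argv[1:])
    sys.exit(0)


print(call_cli(fake_main, ["Perform clustering analysis on customer data", "--mode", "orchestrated"]))
```

With the real entry point, the call would be call_cli(main, ['Perform clustering analysis on customer data', '--mode', 'orchestrated', '--files', 'customers.csv']).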

Or use the workflow directly:

python

import asyncio
from pathlib import Path
from agentic_data_scientist.workflow import create_workflow

async def run_analysis():
    # Create workflow
    workflow = create_workflow(
        query="Analyze customer segments",
        mode="orchestrated",
        files=[Path("customers.csv")],
        working_dir=Path("./output"),
        disable_network=False
    )
    
    # Execute
    result = await workflow.execute()
    print(result)

asyncio.run(run_analysis())

Common Patterns

Pattern 1: Iterative Data Exploration

bash

# Start with simple mode for quick exploration
agentic-data-scientist \
  "Load dataset and show basic statistics" \
  --mode simple \
  --files data.csv

# Then use orchestrated mode for deep analysis
agentic-data-scientist \
  "Perform full statistical analysis including outlier detection, \
   correlation analysis, and clustering" \
  --mode orchestrated \
  --files data.csv \
  --working-dir ./deep_analysis

Pattern 2: Pipeline Development

bash

# Use orchestrated mode to develop a complete pipeline
agentic-data-scientist \
  "Create a data processing pipeline that: \
   1) Cleans and normalizes raw data \
   2) Engineers new features \
   3) Splits into train/test \
   4) Trains multiple models \
   5) Evaluates and selects best model \
   6) Exports model and metrics" \
  --mode orchestrated \
  --files raw_data.csv \
  --working-dir ./ml_pipeline

Pattern 3: Report Generation

bash

# Generate comprehensive reports
agentic-data-scientist \
  "Analyze quarterly sales data and create an executive report \
   with visualizations, key metrics, and recommendations" \
  --mode orchestrated \
  -f q1_sales.csv -f q2_sales.csv -f q3_sales.csv -f q4_sales.csv \
  --working-dir ./quarterly_report

Pattern 4: Debugging with Verbose Logs

bash

# Enable verbose logging for troubleshooting
agentic-data-scientist \
  "Complex analysis task" \
  --mode orchestrated \
  --files data.csv \
  --verbose \
  --log-file debug.log \
  --keep-files

Multi-Agent Workflow Details

Agent Roles

Plan Maker: Creates comprehensive plans with stages and success criteria
Plan Reviewer: Validates that plans are complete before execution
Plan Parser: Converts plans to structured executable stages
Stage Orchestrator: Manages execution cycle and adaptation
Coding Agent: Implements stages (powered by Claude Code with 380+ scientific skills)
Review Agent: Validates implementations against requirements
Criteria Checker: Tracks progress against success criteria
Stage Reflector: Adapts remaining stages based on learnings
Summary Agent: Synthesizes work into final report

Workflow Phases

Planning Phase:

User Query → Plan Maker → Plan Reviewer → Plan Parser → Structured Plan

Execution Phase (per stage):

Stage → Coding Agent → Review Agent → Criteria Checker → Stage Reflector

Summary Phase:

All Completed Stages → Summary Agent → Final Report
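To make the control flow of these phases concrete, here is a toy Python sketch in which every agent is replaced by a plain function. It illustrates the loop described above:
- stage execution
- review with retries
- criteria checking
- reflection that can insert new stages

It is not the framework's code. Stage, RunState, the retry limit, and the "add a cleaning stage after EDA" rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    done: bool = False


@dataclass
class RunState:
    stages: list[Stage]
    criteria: dict[str, bool]                 # success criterion -> met?
    log: list[str] = field(default_factory=list)


def code_stage(stage: Stage) -> str:
    # Stand-in for the Coding Agent: pretend to implement the stage
    return f"implementation of {stage.name}"


def review(output: str) -> bool:
    # Stand-in for the Review Agent: accept any non-empty output
    return bool(output)


def check_criteria(state: RunState) -> None:
    # Stand-in for the Criteria Checker: a criterion is met once its stage is done
    done = {s.name for s in state.stages if s.done}
    for criterion in state.criteria:
        state.criteria[criterion] = criterion in done


def reflect(state: RunState, stage: Stage) -> None:
    # Stand-in for the Stage Reflector: EDA "discovers" dirty data, so add a stage
    if stage.name == "eda" and not any(s.name == "clean" for s in state.stages):
        state.stages.insert(state.stages.index(stage) + 1, Stage("clean"))


def run(state: RunState, max_attempts: int = 3) -> str:
    i = 0
    while i < len(state.stages):          # the list may grow during execution
        stage = state.stages[i]
        for _ in range(max_attempts):     # retry until the reviewer approves
            if review(code_stage(stage)):
                stage.done = True
                break
        state.log.append(f"{stage.name}: {'done' if stage.done else 'failed'}")
        check_criteria(state)
        reflect(state, stage)
        i += 1
    met = sum(state.criteria.values())    # stand-in for the Summary Agent
    return f"{met}/{len(state.criteria)} criteria met; " + ", ".join(state.log)


plan = RunState(stages=[Stage("eda"), Stage("model")], criteria={"eda": False, "model": False})
print(run(plan))
# 2/2 criteria met; eda: done, clean: done, model: done
```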

Troubleshooting

API Key Errors

bash

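# Verify keys are set (without echoing the secret values)
[ -n "$OPENROUTER_API_KEY" ] && echo "OPENROUTER_API_KEY is set" || echo "OPENROUTER_API_KEY is missing"
[ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY is set" || echo "ANTHROPIC_API_KEY is missing"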

# Set them if missing
export OPENROUTER_API_KEY="your_key"
export ANTHROPIC_API_KEY="your_key"

Claude Code Not Found

bash

# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code

# Verify installation (the installed command is claude)
claude --version
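Both of the checks above can be scripted without printing secret values. This stdlib-only Python sketch reports two things:
- which required keys are missing
- whether a claude executable is on PATH

Assumptions:
- preflight is a hypothetical helper.
- The executable name assumes the npm package installs a command called claude.

```python
import os
import shutil

REQUIRED_KEYS = ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY")


def preflight(environ=os.environ, which=shutil.which):
    """Return human-readable problems; an empty list means ready to run."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not environ.get(key)]
    if which("claude") is None:
        problems.append("Claude Code CLI ('claude') not found on PATH")
    return problems


for problem in preflight():
    print("preflight:", problem)
```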

Network Access Issues

bash

# Disable network tools if causing problems
export DISABLE_NETWORK_ACCESS=true

# Or in .env file
echo "DISABLE_NETWORK_ACCESS=true" >> .env

File Upload Failures

bash

# Verify file exists
ls -la data.csv

# Use absolute paths
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files /absolute/path/to/data.csv

# Check directory permissions for recursive upload
ls -la ./data_folder/
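The same checks can run in Python before the tool is launched, which turns a vague upload failure into a specific error. validate_inputs is a hypothetical helper. It does three things:
- resolves each path to absolute form, as suggested above
- checks read permission with os.access
- for directories, also checks traverse permission

```python
import os
import tempfile
from pathlib import Path


def validate_inputs(paths):
    """Resolve each path to an absolute one and fail early if it is unusable."""
    resolved = []
    for raw in paths:
        path = Path(raw).expanduser().resolve()
        if not path.exists():
            raise FileNotFoundError(f"input not found: {path}")
        if not os.access(path, os.R_OK):
            raise PermissionError(f"input not readable: {path}")
        if path.is_dir() and not os.access(path, os.X_OK):
            raise PermissionError(f"directory not traversable: {path}")
        resolved.append(path)
    return resolved


with tempfile.TemporaryDirectory() as root:
    csv = Path(root, "data.csv")
    csv.write_text("id,value\n1,2\n")
    print(validate_inputs([csv])[0].is_absolute())  # True
    try:
        validate_inputs([Path(root, "missing.csv")])
    except FileNotFoundError as err:
        print(err)
```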

Working Directory Issues

bash

# Ensure directory is writable
mkdir -p ./output
chmod 755 ./output

# Use temp directory if permission issues
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --temp-dir
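That fallback can also be automated when scripting runs. Try the preferred directory first, and drop back to a temporary one only if the preferred directory cannot be created or written.

Assumptions:
- choose_working_dir and the agentic_fallback_ prefix are hypothetical.
- The chosen path would then be passed via --working-dir.

```python
import os
import tempfile
from pathlib import Path


def choose_working_dir(preferred):
    """Use preferred if it can be created and written, else a fresh temp dir."""
    path = Path(preferred)
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError:
        pass  # e.g. read-only parent, or a file already occupies the path
    if path.is_dir() and os.access(path, os.W_OK | os.X_OK):
        return path
    return Path(tempfile.mkdtemp(prefix="agentic_fallback_"))


with tempfile.TemporaryDirectory() as root:
    print(choose_working_dir(Path(root, "output")).name)  # output
```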

Execution Hanging

bash

# Use verbose mode to see what's happening
agentic-data-scientist "Query" \
  --mode orchestrated \
  --files data.csv \
  --verbose

# Try simple mode to isolate planning vs execution issues
agentic-data-scientist "Query" \
  --mode simple \
  --files data.csv
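If verbose output shows no progress, a hard time limit keeps a scripted run from blocking forever. This Python sketch wraps subprocess.run with a timeout and uses a short sleep command as a stand-in.

Assumptions and limits:
- run_with_timeout is a hypothetical helper.
- The limit is arbitrary, and orchestrated runs can legitimately take a long time.
- subprocess.run kills only the direct child, so helper processes it started may need separate cleanup.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed and reaped the child


# Demo with a stand-in command that sleeps longer than the timeout
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, timeout_s=1))  # None
```

For a real run, pass a command such as ['agentic-data-scientist', 'Query', '--mode', 'orchestrated', '--files', 'data.csv', '--verbose'] with a generous limit.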

Output Not Preserved

bash

# Default behavior preserves files in ./agentic_output/
ls -la ./agentic_output/

# Explicitly set working directory
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --working-dir ./my_output

# Use --keep-files to override temp-dir cleanup
agentic-data-scientist "Analyze" \
  --mode orchestrated \
  --files data.csv \
  --temp-dir \
  --keep-files

Mode Selection Guide

Use Orchestrated Mode when:

The task is complex, with multiple stages
You need thorough planning and validation
Quality and completeness are critical
The task requires iterative refinement
You want a comprehensive final report

Use Simple Mode when:

The task is quick scripting or a one-off job
You have a simple question to answer
You are prototyping or exploring
You want fast turnaround
You don't need a multi-stage workflow

Advanced Configuration

Custom Prompts

Extend the framework by customizing agent prompts:

python

from agentic_data_scientist.prompts import PLAN_MAKER_PROMPT

# Modify prompts for domain-specific needs
custom_prompt = PLAN_MAKER_PROMPT + """
Additional domain context:
- Focus on genomics data
- Use bioinformatics best practices
"""

MCP Server Integration

The framework supports the Model Context Protocol (MCP) for custom tools:

python

# Configure MCP servers in your workflow
# Agents automatically gain access to tools

Access to Claude Scientific Skills

The coding agent has access to 380+ scientific computing skills including:

Statistical analysis
Machine learning
Data visualization
Bioinformatics
Scientific computing libraries

These are automatically available during the execution phase.
