Found 103 Skills
Use this skill when performing exploratory data analysis, statistical testing, data visualization, or building predictive models. Triggers on EDA, pandas, matplotlib, seaborn, hypothesis testing, A/B test analysis, correlation, regression, feature engineering, and any task requiring data analysis or statistical inference.
Data analysis expert for statistics, visualization, pandas, and exploration
Design ETL workflows with data validation using tools like Pandas, Dask, or PySpark. Use when building robust data processing systems in Python.
Guidelines for data analysis and Jupyter Notebook development with pandas, matplotlib, seaborn, and numpy.
Guide for modernizing legacy Python 2 scientific computing code to Python 3 with modern libraries. This skill should be used when migrating scientific scripts involving data processing, numerical computation, or analysis from Python 2 to Python 3, or when updating deprecated scientific computing patterns to modern equivalents (pandas, numpy, pathlib).
Data analysis best practices with pandas, numpy, matplotlib, seaborn, and Jupyter notebooks.
Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.
Analyzes CSV files, generates summary stats, and plots quick visualizations using Python and pandas.
Patterns for efficient ML data pipelines using Polars, Arrow, and ClickHouse. TRIGGERS - data pipeline, polars vs pandas, arrow format, clickhouse ml, efficient loading, zero-copy, memory optimization.
Python data analysis with pandas, numpy, and analytics libraries
Drop-in pandas replacement with ClickHouse performance. Use `import chdb.datastore as pd` (or `from datastore import DataStore`) and write standard pandas code — same API, 10-100x faster on large datasets. Supports 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats (Parquet, CSV, JSON, Arrow, ORC, etc.) with cross-source joins. Use this skill when the user wants to analyze data with pandas-style syntax, speed up slow pandas code, query remote databases or cloud storage as DataFrames, or join data across different sources — even if they don't explicitly mention chdb or DataStore. Do NOT use for raw SQL queries, ClickHouse server administration, or non-Python languages.