Loading...
Loading...
Found 91 Skills
Expert guidance for Dagster data orchestration including assets, resources, schedules, sensors, partitions, testing, and ETL patterns. Use when building or extending Dagster projects, writing assets, configuring automation, or integrating with dbt/dlt/Sling.
Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.
Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.
Design data pipelines covering ETL vs ELT architectures, data source integration, scheduling, quality checks, and warehouse design. Use this skill when the user needs to move data between systems, build a data warehouse, automate data processing, or improve data reliability — even if they say 'move data from X to Y', 'build an ETL pipeline', 'our data is a mess', or 'set up a data warehouse'.
Develops and executes Spark code on Dataproc Clusters and Serverless. Reads and writes data using BigLake Iceberg catalogs, BigQuery and Spanner. Debugs execution failures. Use when: - Writing Spark ETL pipelines on GCP. - Training or running inference with ML models with spark on GCP. - Managing Spark clusters, jobs, batches, and interactive sessions. Don't use when: - Writing generic Python scripts that don't use Spark. - Performing simple SQL queries that can be done directly in BigQuery.
Build ETL pipelines and analytics dashboards using the Harvard Art Museums API with Python, SQL, and Streamlit
Use when large data ingestion, backfill, export, ETL, warehouse loading, manifest catch-up, or table synchronization needs to become much faster while preserving data correctness.
Эксперт Airbyte. Используй для настройки ETL/ELT пайплайнов, коннекторов, синхронизации данных и data pipelines.
Create data analytics and data pipeline diagrams using PlantUML syntax with analytics/database stencil icons. Best for ETL pipelines, data lakes, real-time streaming, data warehousing, and BI dashboards. NOT for simple flowcharts (use mermaid) or general cloud infra (use cloud skill).
Builds data infrastructure — ETL/ELT pipelines, data warehousing, stream processing, data quality, orchestration (Airflow/Dagster), and analytics engineering (dbt). Use when the user asks to build data pipelines, set up ETL/ELT workflows, design a data warehouse, configure stream processing, or implement analytics engineering with dbt, Airflow, or Dagster.
Build ETL pipelines and analytics dashboards for Harvard Art Museums API data with MySQL storage and Streamlit visualization
Expert data engineering covering data pipelines, ETL/ELT, data warehousing, streaming, and data quality.