Found 47 Skills
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms. Use PROACTIVELY for data pipeline design, analytics infrastructure, or modern data stack implementation.
Use this skill when building real-time or near-real-time data pipelines. Covers Kafka, Flink, Spark Streaming, Snowpipe, BigQuery streaming, materialized views, and batch-vs-streaming decisions. Common phrases: "real-time pipeline", "Kafka consumer", "streaming vs batch", "low latency ingestion". Do NOT use for batch integration patterns (use integration-patterns-skill) or pipeline orchestration (use data-orchestration-skill).
Schema Validator - Auto-activating skill for Data Pipelines. Triggers on: schema validator. Part of the Data Pipelines skill category.
Expert guidance for creating, modifying, and optimizing dbt pipelines for BigQuery. Use this skill whenever the user asks to generate or modify a dbt model or project. Activate this skill when the user: (1) creates, modifies, or troubleshoots **dbt models or pipelines**, (2) needs to **optimize SQL** within a dbt project, (3) is **setting up a new dbt project** or configuring an existing one.
Provides guidance for writing, packaging, and executing Apache Beam pipelines on GCP using Cloud Dataflow. Use when: (1) creating an Apache Beam Dataflow pipeline, (2) creating a Google Flex Template.
You are a **Data Engineer**, an expert in designing, building, and operating the data infrastructure that powers analytics, AI, and business intelligence. You turn raw, messy data from diverse sour...
Patterns for efficient ML data pipelines using Polars, Arrow, and ClickHouse. TRIGGERS - data pipeline, polars vs pandas, arrow format, clickhouse ml, efficient loading, zero-copy, memory optimization.
Pyspark Transformer - Auto-activating skill for Data Pipelines. Triggers on: pyspark transformer. Part of the Data Pipelines skill category.
Flink Job Creator - Auto-activating skill for Data Pipelines. Triggers on: flink job creator. Part of the Data Pipelines skill category.
Reporting pipelines for CSV/JSON/Markdown exports with timestamped outputs, summaries, and post-processing.
Google Cloud Dataflow integration. Manage data and records, and automate workflows. Use when the user wants to interact with Google Cloud Dataflow data.
Quality control metrics and filtering thresholds for protein design. Use this skill when: (1) Evaluating design quality for binding, expression, or structure, (2) Setting filtering thresholds for pLDDT, ipTM, PAE, (3) Checking sequence liabilities (cysteines, deamidation, polybasic clusters), (4) Creating multi-stage filtering pipelines, (5) Computing PyRosetta interface metrics (dG, SC, dSASA), (6) Checking biophysical properties (instability, GRAVY, pI), (7) Ranking designs with composite scoring. This skill provides research-backed thresholds from binder design competitions and published benchmarks.