Found 11 Skills
Implement end-to-end Medallion Architecture (Bronze/Silver/Gold) lakehouse patterns in Microsoft Fabric using PySpark, Delta Lake, and Fabric Pipelines. Use when the user wants to: (1) design a Bronze/Silver/Gold data lakehouse, (2) set up multi-layer workspace with lakehouses for each tier, (3) build ingestion-to-analytics pipelines with data quality enforcement, (4) optimize Spark configurations per medallion layer, (5) orchestrate Bronze-to-Silver-to-Gold flows via notebooks. Triggers: "medallion architecture", "bronze silver gold", "lakehouse layers", "e2e data pipeline", "end-to-end lakehouse", "data lakehouse pattern", "multi-layer lakehouse", "build medallion", "setup medallion".
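A minimal sketch of one Bronze-to-Silver-to-Gold hop of the kind this skill automates, assuming Delta tables in a lakehouse; the source path, table names, and columns (order_id, order_ts, amount) are illustrative placeholders, not part of the skill itself:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw files as-is into a Delta table (path is illustrative).
raw = spark.read.json("Files/landing/orders/")
raw.write.format("delta").mode("append").saveAsTable("bronze_orders")

# Silver: deduplicate, enforce basic quality rules, and conform types.
silver = (
    spark.read.table("bronze_orders")
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_ts", F.to_timestamp("order_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver_orders")

# Gold: aggregate into an analytics-ready table.
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold_customer_sales")
```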
Pyspark Transformer - Auto-activating skill for Data Pipelines. Triggers on: "pyspark transformer". Part of the Data Pipelines skill category.
Analyze lakehouse data interactively using Fabric Livy sessions and PySpark/Spark SQL for advanced analytics, DataFrames, cross-lakehouse joins, Delta time-travel, and unstructured/JSON data. Use when the user explicitly asks for PySpark, Spark DataFrames, Livy sessions, or Python-based analysis — NOT for simple SQL queries. Triggers: "PySpark", "Spark SQL", "analyze with PySpark", "Spark DataFrame", "Livy session", "lakehouse with Python", "PySpark analysis", "PySpark data quality", "Delta time-travel with Spark".
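A small sketch of the Delta time-travel comparison this skill covers, assuming a recent Delta Lake runtime and a hypothetical table named sales at version 3:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compare the current table state against an earlier Delta version.
current = spark.read.table("sales")
previous = spark.read.option("versionAsOf", 3).table("sales")

# Spark SQL supports the same thing with the VERSION AS OF clause.
spark.sql("SELECT COUNT(*) AS rows_at_v3 FROM sales VERSION AS OF 3").show()

print(current.count() - previous.count(), "rows added since version 3")
```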
Use when reading from or writing to Neo4j with Apache Spark or Databricks using the Neo4j Connector for Apache Spark (org.neo4j:neo4j-connector-apache-spark). Covers SparkSession setup, DataFrame reads via labels/Cypher/relationship scan, DataFrame writes with SaveMode, node.keys for MERGE, relationship write mapping, partition and batch tuning, PySpark and Scala examples, Databricks cluster config, Databricks secrets for credentials, Delta Lake to Neo4j pipelines. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT handle the Python bolt driver — use neo4j-driver-python-skill. Does NOT handle GDS algorithms — use neo4j-gds-skill.
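A hedged PySpark sketch of the read/write pattern described above, using the connector's documented org.neo4j.spark.DataSource format; the URL, credentials, label, and key property are placeholders and the connector JAR is assumed to be on the classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details for a local Neo4j instance.
neo4j_opts = {
    "url": "neo4j://localhost:7687",
    "authentication.basic.username": "neo4j",
    "authentication.basic.password": "password",
}

# Read all :Person nodes as a DataFrame (label-based read).
people = (
    spark.read.format("org.neo4j.spark.DataSource")
    .options(**neo4j_opts)
    .option("labels", ":Person")
    .load()
)

# Write rows back as :Person nodes, MERGEd on the "id" property via node.keys.
(
    people.write.format("org.neo4j.spark.DataSource")
    .options(**neo4j_opts)
    .option("labels", ":Person")
    .option("node.keys", "id")
    .mode("Overwrite")
    .save()
)
```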
Manage the full lifecycle of Alibaba Cloud EMR Serverless Spark workspaces: create workspaces, submit jobs, run Kyuubi interactive queries, scale resource queues, and check job and workspace status. Use this Skill when users want to create Spark workspaces, submit Spark jobs, view job status and logs, execute SQL via Kyuubi, scale resource queues, or view workspace status. Also applicable when users say "create a Spark workspace", "submit Spark job", "run PySpark", "execute SQL via Kyuubi", "scale resource queue", "view job logs", etc.
Execute read-only T-SQL queries against Fabric Data Warehouse, Lakehouse SQL Endpoints, and Mirrored Databases via CLI. Default skill for any lakehouse data query (row counts, SELECT, filtering, aggregation) unless the user explicitly requests PySpark or Spark DataFrames. Use when the user wants to: (1) query warehouse/lakehouse data, (2) count rows or explore lakehouse tables, (3) discover schemas/columns, (4) generate T-SQL scripts, (5) monitor SQL performance, (6) export results to CSV/JSON. Triggers: "warehouse", "SQL query", "T-SQL", "query warehouse", "show warehouse tables", "show lakehouse tables", "query lakehouse", "lakehouse table", "how many rows", "count rows", "SQL endpoint", "describe warehouse schema", "generate T-SQL script", "warehouse performance", "export SQL data", "connect to warehouse", "lakehouse data", "explore lakehouse".
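The skill's own CLI is not shown here; as a rough stand-in sketch, the same kind of read-only T-SQL query can be run against a lakehouse SQL endpoint over ODBC with pyodbc. The server name, database, table, and interactive Azure AD authentication mode are all assumptions for illustration:

```python
import pyodbc

# Placeholder connection details; the real SQL endpoint string comes from the Fabric item settings.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=YourLakehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM dbo.sales ORDER BY order_date DESC;")
for row in cursor.fetchall():
    print(row)
conn.close()
```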
Develop Microsoft Fabric Spark/data engineering workflows with intelligent routing to specialized resources. Provides core workspace/lakehouse management and routes to: data engineering patterns, development workflow, or infrastructure orchestration. Use when the user wants to: (1) manage Fabric workspaces and resources, (2) develop notebooks and PySpark applications, (3) design data pipelines and orchestration, (4) provision infrastructure as code. Triggers: "develop notebook", "data engineering", "workspace setup", "pipeline design", "infrastructure provisioning", "Delta Lake patterns", "Spark development", "lakehouse configuration", "organize lakehouse tables", "create Livy session", "notebook deployment".
Design ETL workflows with data validation using tools like Pandas, Dask, or PySpark. Use when building robust data processing systems in Python.
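A small sketch of the validate-then-load step such a skill would produce, using Pandas; the file names, required columns, and rules are hypothetical:

```python
import pandas as pd

# Extract: load raw input (file name and columns are illustrative).
raw = pd.read_csv("orders_raw.csv")

# Validate: fail fast on missing columns, then drop rows that break basic rules.
required = ["order_id", "customer_id", "amount"]
missing_cols = [c for c in required if c not in raw.columns]
if missing_cols:
    raise ValueError(f"Missing required columns: {missing_cols}")

valid = raw.dropna(subset=required)
valid = valid[valid["amount"] >= 0]
rejected = len(raw) - len(valid)

# Transform and load.
valid["amount"] = valid["amount"].astype(float)
valid.to_parquet("orders_clean.parquet", index=False)
print(f"Loaded {len(valid)} rows, rejected {rejected}")
```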
Expert-level guidance on the Databricks platform: Apache Spark, Delta Lake, MLflow, notebooks, and cluster management.
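As one illustration of the MLflow side of this skill, a minimal tracking run; the experiment path, parameter, and metric values are placeholders, and on Databricks the tracking URI is normally picked up from the workspace:

```python
import mlflow

# Illustrative experiment path; on Databricks this lives in the workspace tree.
mlflow.set_experiment("/Shared/demo-experiment")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("rmse", 0.42)
    mlflow.log_dict({"features": ["a", "b", "c"]}, "features.json")
```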
Apache Spark distributed computing. Use for big data processing.
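A minimal sketch of the kind of distributed job this covers, a classic word count over a text dataset; the input path is a placeholder:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# Distributed word count over text files (path is illustrative).
words = (
    spark.read.text("hdfs:///data/books/*.txt")
    .select(F.explode(F.split(F.lower("value"), r"\s+")).alias("word"))
    .filter(F.col("word") != "")
)
counts = words.groupBy("word").count().orderBy(F.desc("count"))
counts.show(20)
```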
Apache Spark, Hadoop, distributed computing, and large-scale data processing for petabyte-scale workloads
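A hedged sketch of a large-scale batch job of this sort: reading a partitioned Parquet dataset from HDFS, pruning by a partition column, and writing aggregated output. The paths, column names, and shuffle-partition setting are illustrative assumptions, not tuned values:

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("large-scale-aggregation")
    # Shuffle partition count is workload-specific; this value is only illustrative.
    .config("spark.sql.shuffle.partitions", "2000")
    .getOrCreate()
)

# Read a partitioned Parquet dataset from HDFS and prune by the partition column.
events = (
    spark.read.parquet("hdfs:///warehouse/events/")
    .filter(F.col("event_date").between("2024-01-01", "2024-01-31"))
)

# Aggregate and write the results back, partitioned by date.
daily = events.groupBy("event_date", "event_type").count()
daily.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///warehouse/daily_counts/")
```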