Found 11 Skills
Implement end-to-end Medallion Architecture (Bronze/Silver/Gold) lakehouse patterns in Microsoft Fabric using PySpark, Delta Lake, and Fabric Pipelines. Use when the user wants to: (1) design a Bronze/Silver/Gold data lakehouse, (2) set up multi-layer workspace with lakehouses for each tier, (3) build ingestion-to-analytics pipelines with data quality enforcement, (4) optimize Spark configurations per medallion layer, (5) orchestrate Bronze-to-Silver-to-Gold flows via notebooks. Triggers: "medallion architecture", "bronze silver gold", "lakehouse layers", "e2e data pipeline", "end-to-end lakehouse", "data lakehouse pattern", "multi-layer lakehouse", "build medallion", "setup medallion".
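A minimal sketch of one Bronze-to-Silver-to-Gold hop of the kind this skill automates, assuming Delta tables in a lakehouse; the source path, table names, and columns (order_id, order_ts, amount) are illustrative placeholders, not part of the skill itself:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw files as-is into a Delta table (path is illustrative).
raw = spark.read.json("Files/landing/orders/")
raw.write.format("delta").mode("append").saveAsTable("bronze_orders")

# Silver: deduplicate, enforce basic quality rules, and conform types.
silver = (
    spark.read.table("bronze_orders")
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_ts", F.to_timestamp("order_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver_orders")

# Gold: aggregate into an analytics-ready table.
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold_customer_sales")
```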
Pyspark Transformer - Auto-activating skill for Data Pipelines. Triggers on: "pyspark transformer". Part of the Data Pipelines skill category.
Analyze lakehouse data interactively using Fabric Livy sessions and PySpark/Spark SQL for advanced analytics, DataFrames, cross-lakehouse joins, Delta time-travel, and unstructured/JSON data. Use when the user explicitly asks for PySpark, Spark DataFrames, Livy sessions, or Python-based analysis — NOT for simple SQL queries. Triggers: "PySpark", "Spark SQL", "analyze with PySpark", "Spark DataFrame", "Livy session", "lakehouse with Python", "PySpark analysis", "PySpark data quality", "Delta time-travel with Spark".
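A small sketch of the Delta time-travel comparison this skill covers, assuming a recent Delta Lake runtime and a hypothetical table named sales at version 3:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compare the current table state against an earlier Delta version.
current = spark.read.table("sales")
previous = spark.read.option("versionAsOf", 3).table("sales")

# Spark SQL supports the same thing with the VERSION AS OF clause.
spark.sql("SELECT COUNT(*) AS rows_at_v3 FROM sales VERSION AS OF 3").show()

print(current.count() - previous.count(), "rows added since version 3")
```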
Use when reading from or writing to Neo4j with Apache Spark or Databricks using the Neo4j Connector for Apache Spark (org.neo4j:neo4j-connector-apache-spark). Covers SparkSession setup, DataFrame reads via labels/Cypher/relationship scan, DataFrame writes with SaveMode, node.keys for MERGE, relationship write mapping, partition and batch tuning, PySpark and Scala examples, Databricks cluster config, Databricks secrets for credentials, Delta Lake to Neo4j pipelines. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT handle the Python bolt driver — use neo4j-driver-python-skill. Does NOT handle GDS algorithms — use neo4j-gds-skill.
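A hedged PySpark sketch of the read/write pattern described above, using the connector's documented org.neo4j.spark.DataSource format; the URL, credentials, label, and key property are placeholders and the connector JAR is assumed to be on the classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details for a local Neo4j instance.
neo4j_opts = {
    "url": "neo4j://localhost:7687",
    "authentication.basic.username": "neo4j",
    "authentication.basic.password": "password",
}

# Read all :Person nodes as a DataFrame (label-based read).
people = (
    spark.read.format("org.neo4j.spark.DataSource")
    .options(**neo4j_opts)
    .option("labels", ":Person")
    .load()
)

# Write rows back as :Person nodes, MERGEd on the "id" property via node.keys.
(
    people.write.format("org.neo4j.spark.DataSource")
    .options(**neo4j_opts)
    .option("labels", ":Person")
    .option("node.keys", "id")
    .mode("Overwrite")
    .save()
)
```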
Manage the full lifecycle of Alibaba Cloud EMR Serverless Spark workspaces: create workspaces, submit jobs, run Kyuubi interactive queries, scale resource queues, and check job and workspace status. Use this Skill when users want to create Spark workspaces, submit Spark jobs, view job status and logs, execute SQL via Kyuubi, scale resource queues, or view workspace status. Also applicable when users say "create a Spark workspace", "submit Spark job", "run PySpark", "execute SQL via Kyuubi", "scale resource queue", "view job logs", etc.
Execute read-only T-SQL queries against Fabric Data Warehouse, Lakehouse SQL Endpoints, and Mirrored Databases via CLI. Default skill for any lakehouse data query (row counts, SELECT, filtering, aggregation) unless the user explicitly requests PySpark or Spark DataFrames. Use when the user wants to: (1) query warehouse/lakehouse data, (2) count rows or explore lakehouse tables, (3) discover schemas/columns, (4) generate T-SQL scripts, (5) monitor SQL performance, (6) export results to CSV/JSON. Triggers: "warehouse", "SQL query", "T-SQL", "query warehouse", "show warehouse tables", "show lakehouse tables", "query lakehouse", "lakehouse table", "how many rows", "count rows", "SQL endpoint", "describe warehouse schema", "generate T-SQL script", "warehouse performance", "export SQL data", "connect to warehouse", "lakehouse data", "explore lakehouse".
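The skill's own CLI is not shown here; as a rough stand-in sketch, the same kind of read-only T-SQL query can be run against a lakehouse SQL endpoint over ODBC with pyodbc. The server name, database, table, and interactive Azure AD authentication mode are all assumptions for illustration:

```python
import pyodbc

# Placeholder connection details; the real SQL endpoint string comes from the Fabric item settings.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=YourLakehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM dbo.sales ORDER BY order_date DESC;")
for row in cursor.fetchall():
    print(row)
conn.close()
```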
Develop Microsoft Fabric Spark/data engineering workflows with intelligent routing to specialized resources. Provides core workspace/lakehouse management and routes to: data engineering patterns, development workflow, or infrastructure orchestration. Use when the user wants to: (1) manage Fabric workspaces and resources, (2) develop notebooks and PySpark applications, (3) design data pipelines and orchestration, (4) provision infrastructure as code. Triggers: "develop notebook", "data engineering", "workspace setup", "pipeline design", "infrastructure provisioning", "Delta Lake patterns", "Spark development", "lakehouse configuration", "organize lakehouse tables", "create Livy session", "notebook deployment".
Design ETL workflows with data validation using tools like Pandas, Dask, or PySpark. Use when building robust data processing systems in Python.
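A small sketch of the validate-then-load step such a skill would produce, using Pandas; the file names, required columns, and rules are hypothetical:

```python
import pandas as pd

# Extract: load raw input (file name and columns are illustrative).
raw = pd.read_csv("orders_raw.csv")

# Validate: fail fast on missing columns, then drop rows that break basic rules.
required = ["order_id", "customer_id", "amount"]
missing_cols = [c for c in required if c not in raw.columns]
if missing_cols:
    raise ValueError(f"Missing required columns: {missing_cols}")

valid = raw.dropna(subset=required)
valid = valid[valid["amount"] >= 0]
rejected = len(raw) - len(valid)

# Transform and load.
valid["amount"] = valid["amount"].astype(float)
valid.to_parquet("orders_clean.parquet", index=False)
print(f"Loaded {len(valid)} rows, rejected {rejected}")
```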
Expert-level guidance on the Databricks platform: Apache Spark, Delta Lake, MLflow, notebooks, and cluster management.
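As one illustration of the MLflow side of this skill, a minimal tracking run; the experiment path, parameter, and metric values are placeholders, and on Databricks the tracking URI is normally picked up from the workspace:

```python
import mlflow

# Illustrative experiment path; on Databricks this lives in the workspace tree.
mlflow.set_experiment("/Shared/demo-experiment")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("rmse", 0.42)
    mlflow.log_dict({"features": ["a", "b", "c"]}, "features.json")
```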
Apache Spark distributed computing. Use for big data processing.
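A minimal sketch of the kind of distributed job this covers, a classic word count over a text dataset; the input path is a placeholder:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# Distributed word count over text files (path is illustrative).
words = (
    spark.read.text("hdfs:///data/books/*.txt")
    .select(F.explode(F.split(F.lower("value"), r"\s+")).alias("word"))
    .filter(F.col("word") != "")
)
counts = words.groupBy("word").count().orderBy(F.desc("count"))
counts.show(20)
```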
Apache Spark, Hadoop, distributed computing, and large-scale data processing for petabyte-scale workloads
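A hedged sketch of a large-scale batch job of this sort: reading a partitioned Parquet dataset from HDFS, pruning by a partition column, and writing aggregated output. The paths, column names, and shuffle-partition setting are illustrative assumptions, not tuned values:

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("large-scale-aggregation")
    # Shuffle partition count is workload-specific; this value is only illustrative.
    .config("spark.sql.shuffle.partitions", "2000")
    .getOrCreate()
)

# Read a partitioned Parquet dataset from HDFS and prune by the partition column.
events = (
    spark.read.parquet("hdfs:///warehouse/events/")
    .filter(F.col("event_date").between("2024-01-01", "2024-01-31"))
)

# Aggregate and write the results back, partitioned by date.
daily = events.groupBy("event_date", "event_type").count()
daily.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///warehouse/daily_counts/")
```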