Resolve data lake and lakehouse asset references across Glue Data Catalog, S3, S3 Tables, and Redshift. Triggers on: find the table, where is our data, which table has, locate dataset, find data for, search catalog, what tables match, Redshift table, lakehouse table, data lake table, warehouse table, reverse lookup S3 path. Do NOT use for: full catalog audits (use exploring-data-catalog), running queries (use querying-data-lake), creating tables (use creating-data-lake-table).
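For illustration, a minimal sketch of the kind of reverse lookup this skill performs against the Glue Data Catalog, assuming boto3 credentials are already configured; the search text is a placeholder.

```python
import boto3

def find_tables(search_text: str, max_results: int = 10) -> None:
    """Search the Glue Data Catalog and print matching tables with their S3 locations."""
    glue = boto3.client("glue")
    resp = glue.search_tables(SearchText=search_text, MaxResults=max_results)
    for table in resp["TableList"]:
        location = table.get("StorageDescriptor", {}).get("Location", "n/a")
        print(f"{table['DatabaseName']}.{table['Name']} -> {location}")

find_tables("orders")  # e.g. "which table has order data?"
```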
Create managed Iceberg tables using Amazon S3 Tables (s3tables API namespace) with automatic compaction and snapshot management. Sets up table bucket, namespace, table, schema, Glue catalog registration, partitioning, IAM access control. Triggers on: create table, data lake table, analytics table, structured data storage, S3 Tables, Iceberg, Athena table, partitioning strategy, access permissions. Do NOT use for: importing files (use ingesting-into-data-lake), vector storage (use storing-and-querying-vectors), querying existing tables (use querying-data-lake), or locating an existing table (use finding-data-lake-assets).
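A hedged sketch of the core s3tables calls (bucket, namespace, table), assuming boto3 is configured; all names are placeholders, and the Glue registration, partitioning, and IAM steps the skill also covers are omitted here.

```python
import boto3

s3tables = boto3.client("s3tables")

# 1. Table bucket: the managed storage container for Iceberg tables.
bucket_arn = s3tables.create_table_bucket(name="analytics-lake")["arn"]

# 2. Namespace: logical grouping, analogous to a database.
s3tables.create_namespace(tableBucketARN=bucket_arn, namespace=["sales"])

# 3. Iceberg table; schema and partitioning are typically applied afterwards
#    through an Iceberg-aware engine such as Athena or Spark.
table = s3tables.create_table(
    tableBucketARN=bucket_arn,
    namespace="sales",
    name="daily_orders",
    format="ICEBERG",
)
print(table["tableARN"])
```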
Troubleshoots and debugs AWS Clean Rooms collaboration issues related to IAM roles, S3 bucket policies, KMS keys, Lake Formation permissions, and CloudWatch logging for custom ML model training and inference jobs. Use when a customer reports permission failures, access errors, or log publishing issues in Clean Rooms.
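One plausible diagnostic step for the permission failures described above is simulating the job role's identity policies with IAM; the role and bucket ARNs below are hypothetical, and resource-based policies (bucket policy, KMS key policy, Lake Formation grants) still have to be checked separately.

```python
import boto3

iam = boto3.client("iam")

# Hypothetical ARNs: the collaboration's ML job role and the S3 objects
# backing its input/output channels.
role_arn = "arn:aws:iam::111122223333:role/cleanrooms-ml-job-role"
objects_arn = "arn:aws:s3:::cleanrooms-training-data/*"

# Evaluates only the role's identity-based policies, not resource policies.
resp = iam.simulate_principal_policy(
    PolicySourceArn=role_arn,
    ActionNames=["s3:GetObject", "s3:PutObject"],
    ResourceArns=[objects_arn],
)
for result in resp["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])
```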
Convert laboratory instrument output files (PDF, CSV, Excel, TXT) to Allotrope Simple Model (ASM) JSON format or flattened 2D CSV. Use this skill when scientists need to standardize instrument data for LIMS systems, data lakes, or downstream analysis. Supports auto-detection of instrument types. Outputs include full ASM JSON, flattened CSV for easy import, and exportable Python code for data engineers. Common triggers include converting instrument files, standardizing lab data, preparing data for upload to LIMS/ELN systems, or generating parser code for production pipelines.
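A rough sketch of the flattened-CSV output path, assuming an ASM JSON file is already on disk; the nested key path is an illustrative assumption, since real ASM layouts vary by instrument technique.

```python
import json
import pandas as pd

with open("plate_reader_output.asm.json") as f:
    asm = json.load(f)

# Assumed location of the per-measurement records; actual ASM key paths
# differ by technique, so adjust for the file at hand.
measurements = asm["measurement aggregate document"]["measurement document"]

flat = pd.json_normalize(measurements, sep=".")
flat.to_csv("plate_reader_output_flat.csv", index=False)
```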
Bronze/Silver/Gold layer design patterns and templates for building scalable data lakehouse architectures. Includes incremental processing, data quality checks, and optimization strategies.
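A minimal bronze-to-silver sketch in PySpark illustrating the incremental-processing and data-quality ideas above; paths, columns, and the partition filter are placeholders, and a Spark environment with Delta Lake support is assumed.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw events landed as-is (path and columns are placeholders).
bronze = spark.read.format("delta").load("/lake/bronze/orders")

# Silver: one incremental slice, deduplicated and quality-checked.
silver = (
    bronze.filter(F.col("ingest_date") == "2024-06-01")
          .dropDuplicates(["order_id"])
          .filter(F.col("order_id").isNotNull() & (F.col("amount") > 0))
)

silver.write.format("delta").mode("append").save("/lake/silver/orders")
```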
Create data analytics and data pipeline diagrams using PlantUML syntax with analytics/database stencil icons. Best for ETL pipelines, data lakes, real-time streaming, data warehousing, and BI dashboards. NOT for simple flowcharts (use mermaid) or general cloud infra (use cloud skill).
This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.
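A minimal LaminDB sketch, assuming an instance has already been initialized (e.g. via `lamin init`) and that `ln.track()` / `Artifact.from_df` behave as in the current quickstart; the dataframe and key are placeholders.

```python
import lamindb as ln
import pandas as pd

ln.track()  # record this run so saved artifacts carry lineage

df = pd.DataFrame({"cell_id": ["c1", "c2"], "n_genes": [2450, 1980]})

# Save the dataframe as a versioned, queryable artifact.
artifact = ln.Artifact.from_df(
    df, key="scrna/qc_summary.parquet", description="per-cell QC summary"
)
artifact.save()
```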
Manage the full lifecycle of Alibaba Cloud E-MapReduce (EMR) ECS clusters—creation, scaling, renewal, and status queries. Use this Skill when users want to set up big data clusters, view cluster status, add nodes, release nodes, configure auto-scaling, check cluster and node states, or diagnose creation failures. Also applicable for scenarios like "create a Hadoop cluster", "data lake cluster", "running out of resources", "check my cluster", "renew", etc. NOTE: This Skill does NOT support cluster deletion, release, or termination under any circumstances. Any request to delete or terminate a cluster will be refused and redirected to the EMR console.
Execute authoring T-SQL (DDL, DML, data ingestion, transactions, schema changes) against Microsoft Fabric Data Warehouse and SQL endpoints from agentic CLI environments. Use when the user wants to: (1) create/alter/drop tables from terminal, (2) insert/update/delete/merge data via CLI, (3) run COPY INTO or OPENROWSET ingestion, (4) manage transactions or stored procedures, (5) perform schema evolution, (6) use time travel or snapshots, (7) generate ETL/ELT shell scripts, (8) create views/functions/procedures on Lakehouse SQLEP. Triggers: "create table in warehouse", "insert data via T-SQL", "load from ADLS", "COPY INTO", "run ETL with T-SQL", "alter warehouse table", "upsert with T-SQL", "merge into warehouse", "create T-SQL procedure", "warehouse time travel", "recover deleted warehouse data", "create warehouse schema", "deploy warehouse", "transaction conflict", "snapshot isolation error".
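The skill targets CLI execution, but the same authoring T-SQL can be driven from Python via pyodbc; the server name below follows the Fabric warehouse endpoint pattern and is a placeholder, as are the database and table.

```python
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-workspace.datawarehouse.fabric.microsoft.com;"  # placeholder endpoint
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE dbo.staging_orders (
            order_id   BIGINT NOT NULL,
            amount     DECIMAL(18, 2),
            order_date DATE
        );
    """)
    cur.execute(
        "INSERT INTO dbo.staging_orders (order_id, amount, order_date) "
        "VALUES (?, ?, ?);",
        1001, 19.99, "2024-06-01",
    )
    conn.commit()
```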
Analyze lakehouse data interactively using Fabric Livy sessions and PySpark/Spark SQL for advanced analytics, DataFrames, cross-lakehouse joins, Delta time-travel, and unstructured/JSON data. Use when the user explicitly asks for PySpark, Spark DataFrames, Livy sessions, or Python-based analysis — NOT for simple SQL queries. Triggers: "PySpark", "Spark SQL", "analyze with PySpark", "Spark DataFrame", "Livy session", "lakehouse with Python", "PySpark analysis", "PySpark data quality", "Delta time-travel with Spark".
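A small PySpark sketch of the Delta time-travel and data-quality checks mentioned above; in a Fabric notebook or Livy session `spark` is already provided, and the OneLake path is a placeholder.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created in a Fabric/Livy session

# Placeholder OneLake path to a lakehouse Delta table.
path = ("abfss://workspace@onelake.dfs.fabric.microsoft.com/"
        "SalesLakehouse.Lakehouse/Tables/orders")

orders = spark.read.format("delta").load(path)
orders_v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(orders.count() - orders_v0.count(), "rows added since version 0")

# Simple data-quality check via Spark SQL aggregations.
orders.select(
    F.count("*").alias("rows"),
    F.sum(F.col("order_id").isNull().cast("int")).alias("null_order_ids"),
).show()
```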
Use when reading from or writing to Neo4j with Apache Spark or Databricks using the Neo4j Connector for Apache Spark (org.neo4j:neo4j-connector-apache-spark). Covers SparkSession setup, DataFrame reads via labels/Cypher/relationship scan, DataFrame writes with SaveMode, node.keys for MERGE, relationship write mapping, partition and batch tuning, PySpark and Scala examples, Databricks cluster config, Databricks secrets for credentials, Delta Lake to Neo4j pipelines. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT handle the Python bolt driver — use neo4j-driver-python-skill. Does NOT handle GDS algorithms — use neo4j-gds-skill.
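A condensed PySpark read/write example using the connector's documented options; the URL and credentials are placeholders (in Databricks they would come from secrets), and the connector jar is assumed to be on the cluster classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
url = "neo4j+s://demo.databases.neo4j.io"  # placeholder

# Read all :Person nodes into a DataFrame.
people = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("url", url)
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "secret")
    .option("labels", "Person")
    .load()
)

# Write back, merging on the `id` property rather than creating duplicates.
(people.write.format("org.neo4j.spark.DataSource")
       .mode("Overwrite")
       .option("url", url)
       .option("authentication.basic.username", "neo4j")
       .option("authentication.basic.password", "secret")
       .option("labels", ":Person")
       .option("node.keys", "id")
       .save())
```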
Use when "data pipelines", "ETL", "data warehousing", "data lakes", or asking about "Airflow", "Spark", "dbt", "Snowflake", "BigQuery", "data modeling"