Search Results: data-ingestion

Found 35 Skills

DevOps & Cloud Services | microsoft/skills-for-fabr...

eventhouse-authoring-cli

Execute KQL management commands (table management, ingestion, policies, functions, materialized views) against Fabric Eventhouse and KQL Databases via CLI. Use when the user wants to:
1. Create or alter KQL tables, columns, or functions
2. Ingest data into an Eventhouse (inline, from storage, streaming)
3. Configure retention, caching, or partitioning policies
4. Create or manage materialized views and update policies
5. Manage data mappings for ingestion pipelines
6. Deploy KQL schema via scripts
Triggers: "create kql table", "kql ingestion", "ingest into eventhouse", "kql function", "materialized view", "kql retention policy", "eventhouse schema", "kql authoring", "create eventhouse table", "kql mapping"

Data Processing | aws/agent-toolkit-for-aws

ingesting-into-data-lake

Import data into the AWS data lake from S3 files, local uploads, JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS, Aurora), Amazon Redshift, Snowflake, BigQuery, DynamoDB, or existing Glue catalog tables (migration). Default target is S3 Tables; standard Iceberg on a general purpose bucket is supported where S3 Tables is not adopted. Handles one-time loads, recurring pipelines, migrations. Triggers on: import data, load data, ingest, sync database, migrate table, move data to AWS, set up pipeline, ETL, pull from Snowflake, query BigQuery into S3, export DynamoDB, CTAS, convert to Iceberg. Do NOT use for setting up or troubleshooting Glue connections (use connecting-to-data-source), creating empty tables (use creating-data-lake-table), running queries (use querying-data-lake), finding tables by fuzzy name (use finding-data-lake-assets), catalog audit (use exploring-data-catalog), or SaaS platforms like Salesforce, ServiceNow, SAP, MongoDB, Kafka.

Data Processing | untitled-data-company/dlt...

dlt-skill

Creates and maintains dlt (data load tool) pipelines from APIs, databases, and other sources. Use when the user wants to build or debug pipelines; use verified sources (e.g. Salesforce, GitHub, Stripe) or declarative REST API or custom Python; configure destinations (e.g. DuckDB, BigQuery, Snowflake); implement incremental loading; or edit .dlt config and secrets. Use when the user mentions data ingestion, dlt pipeline, dlt init, rest_api_source, incremental load, or pipeline dashboard.

5 scripts/Checked

AI & Machine Learning | garrytan/gbrain

cold-start

Day-one data bootstrapping for a new brain. Sequences the highest-leverage data sources to go from empty brain to useful brain in one session. Uses ClawVisor for safe credential handling — the agent never holds raw API keys. Covers Gmail import, calendar sync, contacts seeding, X/Twitter archive, conversation imports, and file archives. Use when a user has just finished gbrain setup and asks "now what?"

Data Processing | elastic/integration-skill...

input-configurations

Input template configuration for Elastic integrations. Covers agent stream templates (agent/stream/*.yml.hbs) for all non-CEL input types: HTTPJSON, AWS S3, CloudWatch, Azure Blob, Azure EventHub, GCS, GCP Pub/Sub, TCP, UDP, HTTP Endpoint, Filestream, Logfile, Journald, Winlog, and WebSocket. For CEL input programs, use the cel-programs skill instead.

Data Processing | databricks-solutions/ai-d...

databricks-zerobus-ingest

Build Zerobus Ingest clients for near real-time data ingestion into Databricks Delta tables via gRPC. Use when creating producers that write directly to Unity Catalog tables without a message bus, working with the Zerobus Ingest SDK in Python/Java/Go/TypeScript/Rust, generating Protobuf schemas from UC tables, or implementing stream-based ingestion with ACK handling and retry logic.

Data Processing | goldsky-io/goldsky-agent

turbo-pipelines

Goldsky Turbo pipeline YAML reference — the authoritative source for field names, required vs optional fields, and valid values. Use whenever the user asks about specific YAML fields: what does `start_at: earliest` vs `latest` do, what fields does a postgres/clickhouse/kafka sink require, what is the `from:` field in a sink, how does `checkpoint` work, what's the syntax for `batch_size` or `primary_key`. Also use for validation errors like 'unknown field' or 'missing required field'. For interactive pipeline building end-to-end, use /turbo-builder instead.

Data Processing | datahub-project/datahub-s...

datahub-connector-planning

Plans new DataHub connectors by classifying the source system, researching it using a dedicated agent or inline research, and generating a _PLANNING.md blueprint with entity mapping and architecture decisions. Use when building a new connector, researching a source system for DataHub, or designing connector architecture. Triggers on: "plan a connector", "new connector for X", "research X for DataHub", "design connector for X", "create planning doc", or any request to plan/research/design a DataHub ingestion source.

Data Processing | forcedotcom/sf-skills

preparing-datacloud

Salesforce Data Cloud Prepare phase. Use this skill when the user creates or manages Data Cloud data streams, DLOs, transforms, or Document AI configurations. TRIGGER when: user creates or manages Data Cloud data streams, DLOs, transforms, or Document AI configurations, or asks about ingestion into Data Cloud. DO NOT TRIGGER when: the task is connection setup only (use connecting-datacloud), DMOs and identity resolution (use harmonizing-datacloud), or query/search work (use retrieving-datacloud).

1 scripts/Checked

Data Processing | microsoftdocs/agent-skill...

azure-data-explorer

Expert knowledge for Azure Data Explorer development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when configuring ADX clusters, private endpoints, follower DBs, streaming ingestion, or Power BI integration, and other Azure Data Explorer related development tasks. Not for Azure Synapse Analytics (use azure-synapse-analytics), Azure Stream Analytics (use azure-stream-analytics), Azure HDInsight (use azure-hdinsight), Azure Databricks (use azure-databricks).

Platform Services | atlassian/forge-skills

forge-connector

Guides building and deploying Atlassian Forge Teamwork Graph connector apps that ingest external data into Atlassian's Teamwork Graph, making it searchable in Rovo Search and surfaced in Rovo Chat. Use when the user wants to build a Forge connector, ingest external data into Atlassian, connect a third-party tool (e.g. Google Drive, ServiceNow, Salesforce) to Atlassian, make external content searchable in Rovo, build a graph:connector module, use the @forge/teamwork-graph SDK, or implement onConnectionChange / validateConnection functions.

2 scripts/Checked

Data Processing | tiangong-ai/skills

kb-meta-fetch

Fetch journal articles from Crossref published after a user-specified date and insert them into PostgreSQL `journals` with DOI deduplication. Use when incrementally ingesting journal metadata from `journals_issn` into `journals`.

1 scripts/Checked
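
The incremental, DOI-deduplicated insert that kb-meta-fetch describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the skill's actual script: it uses an in-memory SQLite database as a stand-in for PostgreSQL (where the comparable statement would be `INSERT ... ON CONFLICT (doi) DO NOTHING`), the column names `title` and `published` are hypothetical, the sample records are made up rather than fetched from Crossref, and lower-casing DOIs before comparison is an assumed normalization step.

```python
import sqlite3


def ingest_articles(conn, articles, since):
    """Insert articles published after `since`, skipping DOIs already stored.

    Returns the number of rows actually added.
    """
    # ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings.
    fresh = [a for a in articles if a["published"] > since]
    before = conn.execute("SELECT COUNT(*) FROM journals").fetchone()[0]
    conn.executemany(
        # The PRIMARY KEY on doi makes OR IGNORE silently skip duplicate DOIs.
        "INSERT OR IGNORE INTO journals (doi, title, published) VALUES (?, ?, ?)",
        # Assumption: DOIs are normalized to lower case before deduplication.
        [(a["doi"].lower(), a["title"], a["published"]) for a in fresh],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM journals").fetchone()[0]
    return after - before


# SQLite stand-in for the PostgreSQL `journals` table (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journals (doi TEXT PRIMARY KEY, title TEXT, published TEXT)"
)

# Made-up sample records; a real run would take these from a Crossref response.
batch = [
    {"doi": "10.1000/A1", "title": "First", "published": "2024-03-01"},
    {"doi": "10.1000/a1", "title": "First (duplicate DOI)", "published": "2024-03-01"},
    {"doi": "10.1000/b2", "title": "Too old", "published": "2023-01-01"},
]

first_run = ingest_articles(conn, batch, since="2024-01-01")
second_run = ingest_articles(conn, batch, since="2024-01-01")
print(first_run, second_run)
```

Running the same batch twice adds rows only the first time, which is the property that makes repeated incremental runs safe.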