Connect Spice to data sources and query across them with federated SQL. Use when connecting to databases (Postgres, MySQL, DynamoDB), data lakes (S3, Delta Lake, Iceberg), warehouses (Snowflake, Databricks), files, APIs, or catalogs; configuring datasets; creating views; writing data; or setting up cross-source queries.
Install the skill:

```bash
npx skill4agent add spiceai/skills spice-connect-data
```

Define datasets from multiple sources in a spicepod, then query across them with plain SQL:

```yaml
datasets:
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${secrets:PG_USER}
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet
  - from: snowflake:analytics.sales
    name: sales
```

```sql
-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';
```

Every dataset follows the same schema:

```yaml
datasets:
  - from: <connector>:<identifier>
    name: <dataset_name>
    params:
      # connector-specific parameters
    acceleration:
      enabled: true  # optional: materialize locally (see spice-acceleration)
```

**Databases**

| Connector | Status |
|---|---|
| PostgreSQL | Stable (also Amazon Redshift) |
| MySQL | Stable |
| DuckDB | Stable |
| MS SQL Server | Beta |
| DynamoDB | Release Candidate |
| MongoDB | Alpha |
| ClickHouse | Alpha |
**Warehouses**

| Connector | Status |
|---|---|
| Snowflake | Beta |
| Databricks (Delta Lake) | Stable |
| Spark | Beta |
**Data lakes and files**

| Connector | Status |
|---|---|
| S3 | Stable |
| Delta Lake | Stable |
| Iceberg | Beta |
| Azure BlobFS | Alpha |
| File (local) | Stable |
**Other sources**

| Connector | Status |
|---|---|
| Spice.ai | Stable |
| Dremio | Stable |
| GitHub | Stable |
| GraphQL | Release Candidate |
| FlightSQL | Beta |
| ODBC | Beta |
| FTP/SFTP | Alpha |
| HTTP/HTTPS | Alpha |
| Kafka | Alpha |
| Debezium CDC | Alpha |
| SharePoint | Alpha |
| IMAP | Alpha |
A PostgreSQL table with credentials from environment variables, accelerated locally:

```yaml
datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: ${ env:PG_USER }
      pg_pass: ${ env:PG_PASS }
    acceleration:
      enabled: true
```

Parquet files in S3, accelerated with DuckDB:

```yaml
datasets:
  - from: s3://my-bucket/data/sales/
    name: sales
    params:
      file_format: parquet
      s3_region: us-east-1
    acceleration:
      enabled: true
      engine: duckdb
```

GitHub issues, refreshed incrementally once a day:

```yaml
datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${ secrets:GITHUB_TOKEN }
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 24h
      refresh_data_window: 14d
```

A local file:

```yaml
datasets:
  - from: file:./data/sales.parquet
    name: sales
```

Supported file formats:

| Format | Type |
|---|---|
| Apache Parquet | Structured |
| CSV | Structured |
| Markdown | Document |
| Text | Document |
| Microsoft Word | Document |
Document-type datasets expose each file's path and full text as `location` and `content` columns:

```yaml
datasets:
  - from: file:docs/decisions/
    name: my_documents
    params:
      file_format: md
```

Hive-style partitioned data prunes partitions at query time:

```yaml
datasets:
  - from: s3://bucket/data/
    name: partitioned_data
    params:
      file_format: parquet
      hive_partitioning_enabled: true
```

```sql
SELECT * FROM partitioned_data WHERE year = '2024' AND month = '01';
```

Dataset names map to SQL paths: a dataset with `name: foo` is queryable as `spice.public.foo`, while `name: myschema.foo` becomes `spice.myschema.foo`. Catalog tables are addressed as `<catalog>.<schema>.<table>`.

Note: Acceleration is not supported for catalog tables. Use datasets for accelerated access.
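The naming rules can be illustrated with two equivalent queries; `myschema.foo` here is a hypothetical dataset defined with `name: myschema.foo`:

```sql
-- Both forms resolve to the same dataset
SELECT COUNT(*) FROM myschema.foo;
SELECT COUNT(*) FROM spice.myschema.foo;
```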
Catalogs register an entire schema of tables at once:

```yaml
catalogs:
  - from: <connector>
    name: <catalog_name>
    params:
      # connector-specific parameters
    include:
      - 'schema.*'  # optional: filter with glob patterns
```

**Catalog connectors**

| Connector | Status |
|---|---|
| Unity Catalog | Stable |
| Databricks | Beta |
| Iceberg | Beta |
| Spice.ai | Beta |
| AWS Glue | Alpha |
```yaml
catalogs:
  - from: unity_catalog
    name: unity
    params:
      unity_catalog_endpoint: https://my-workspace.cloud.databricks.com
      databricks_token: ${ secrets:DATABRICKS_TOKEN }
    include:
      - 'my_schema.*'
```

```sql
SELECT * FROM unity.my_schema.customers LIMIT 10;
```

Views define reusable SQL over any connected source:

```yaml
views:
  - name: daily_sales
    sql: |
      SELECT DATE(created_at) as date, SUM(amount) as total, COUNT(*) as orders
      FROM orders
      GROUP BY DATE(created_at)
  - name: order_details
    sql: |
      SELECT o.id, c.name as customer, p.name as product, o.quantity
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
      JOIN products p ON o.product_id = p.id
```

Views can also be accelerated and refreshed on a schedule:

```yaml
views:
  - name: rankings
    sql: |
      SELECT product_id, SUM(quantity) as total_sold
      FROM orders GROUP BY product_id ORDER BY total_sold DESC LIMIT 100
    acceleration:
      enabled: true
      refresh_check_interval: 1h
```

Query a view like any table, e.g. `SELECT * FROM daily_sales`.

Writes use standard `INSERT INTO` statements against writable datasets:
```yaml
datasets:
  - from: iceberg:https://catalog.example.com/v1/namespaces/sales/tables/transactions
    name: transactions
    access: read_write  # required for writes
```

```sql
INSERT INTO transactions SELECT * FROM staging_transactions;
```

Secrets are referenced with the `${ store_name:KEY }` syntax:

```yaml
params:
  pg_user: ${ env:PG_USER }
  pg_pass: ${ secrets:PG_PASSWORD }
```
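The store names available in `${ store_name:KEY }` depend on the secret stores declared in the spicepod. As a sketch (the `env` store shown here is Spice's default environment-variable store, but confirm the syntax against your runtime version):

```yaml
# Hypothetical spicepod.yaml fragment declaring an env-var secret store
secrets:
  - from: env
    name: env
```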