# s3-explore
You are helping the user explore data on remote object storage using DuckDB.

URL: $0
Question: ${1:-list and describe what's there}

## Step 1 — Detect provider and set up credentials
Based on the URL or user context, prepend the appropriate secret configuration:

| Provider | URL patterns | Secret setup |
|---|---|---|
| AWS S3 | `s3://bucket/path` | `CREATE SECRET (TYPE s3, PROVIDER credential_chain);` |
| Cloudflare R2 | `r2://bucket/path` | `CREATE SECRET (TYPE r2, KEY_ID '...', SECRET '...', ACCOUNT_ID '...');` |
| GCS | `gs://bucket/path` | `CREATE SECRET (TYPE gcs, KEY_ID '...', SECRET '...');` |
| MinIO / custom | `s3://bucket/path` with a custom endpoint | `CREATE SECRET (TYPE s3, KEY_ID '...', SECRET '...', ENDPOINT '<host:port>', URL_STYLE 'path');` |

For R2, if the user provides an account ID, the endpoint is `<account_id>.r2.cloudflarestorage.com`. R2 URLs like `r2://bucket/path` should be rewritten to `s3://bucket/path` with the R2 secret.

For public buckets (e.g., Overture Maps, AWS open data), no secret is needed — skip this step.

Always prepend:

```sql
LOAD httpfs;
```
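As one concrete sketch, the prelude for a custom S3-compatible endpoint might look like the following. The endpoint, secret name, and credentials are hypothetical placeholders, not values from this skill; the syntax follows DuckDB's documented `CREATE SECRET` statement.

```sql
-- Hypothetical prelude for a custom S3-compatible endpoint (e.g., local MinIO).
-- Endpoint and credentials are placeholders; substitute the user's values.
LOAD httpfs;
CREATE SECRET minio_secret (
    TYPE s3,
    KEY_ID 'minioadmin',
    SECRET 'minioadmin',
    ENDPOINT 'localhost:9000',
    URL_STYLE 'path',
    USE_SSL false
);
```

With such a secret in place, `s3://bucket/path` URLs resolve against the custom endpoint rather than AWS.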
## Step 2 — Determine what the URL points to
If the URL looks like a directory or bucket (no file extension, or ends with `/`), list its contents with sizes:

```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT filename, (size / 1024 / 1024)::DECIMAL(10,1) AS size_mb, last_modified
FROM read_blob('<URL>/*')
ORDER BY filename
LIMIT 50;
"
```

Note: only select `filename`, `size`, `last_modified` — never select `content`, which would download the actual files.

If the URL points to a specific file or glob pattern (has a file extension or contains `*`), preview it:
```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
DESCRIBE FROM '<URL>';
SELECT count(*) AS row_count FROM '<URL>';
FROM '<URL>' LIMIT 20;
"
```

For Parquet files, get row counts and sizes from metadata (no data download):
```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT file_name,
       sum(row_group_num_rows) AS total_rows,
       (sum(row_group_compressed_bytes) / 1024 / 1024)::DECIMAL(10,1) AS compressed_mb
FROM parquet_metadata('<URL>')
GROUP BY file_name;
"
```

## Step 3 — Answer the question
Using the listing, schema, or sample data, answer:

${1:-list and describe what's there}

If the user asks an analytical question (e.g., "how many rows match X"), write and run the appropriate SQL query. DuckDB pushes predicates down into Parquet on S3, so filtering is efficient even on large remote datasets.
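For instance, a filtered aggregation over a remote Parquet glob might look like the following sketch. The bucket, column names, and filter are hypothetical; only the `<SECRET_SETUP>` placeholder follows this skill's convention.

```sql
-- Hypothetical analytical query; DuckDB fetches only the row groups and
-- columns the predicate and projection touch, not the whole files.
LOAD httpfs;
<SECRET_SETUP>
SELECT status, count(*) AS n
FROM 's3://my-bucket/logs/*.parquet'
WHERE ts >= TIMESTAMP '2024-01-01'
GROUP BY status
ORDER BY n DESC;
```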
## Error handling

- `duckdb: command not found` → delegate to `/duckdb-skills:install-duckdb`
- Access denied / 403 → suggest the user check credentials: `aws configure`, environment variables, or provide explicit key/secret
- Bucket not found / 404 → check the URL and region
- Timeout on large listing → suggest narrowing the glob pattern or adding a prefix