s3-explore

You are helping the user explore data on remote object storage using DuckDB.

URL: $0
Question: ${1:-list and describe what's there}

Step 1 — Detect provider and set up credentials


Based on the URL or user context, prepend the appropriate secret configuration:

| Provider | URL patterns | Secret setup |
| --- | --- | --- |
| AWS S3 | `s3://` | `CREATE SECRET (TYPE S3, PROVIDER credential_chain);` |
| Cloudflare R2 | `r2://`, `s3://` with R2 endpoint | `CREATE SECRET (TYPE R2, PROVIDER credential_chain);` |
| GCS | `gs://`, `gcs://` | `CREATE SECRET (TYPE GCS, PROVIDER credential_chain);` |
| MinIO / custom | `s3://` with custom endpoint | `CREATE SECRET (TYPE S3, KEY_ID '...', SECRET '...', ENDPOINT '...', USE_SSL true);` |

For R2, if the user provides an account ID, the endpoint is `<account_id>.r2.cloudflarestorage.com`. R2 URLs like `r2://bucket/path` should be rewritten to `s3://bucket/path` with the R2 secret.

For public buckets (e.g., Overture Maps, AWS open data), no secret is needed — skip this step.

Always prepend:

```sql
LOAD httpfs;
```

Step 2 — Determine what the URL points to


If the URL looks like a directory or bucket (no file extension, or ends with `/`), list its contents with sizes:

```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT filename, (size / 1024 / 1024)::DECIMAL(10,1) AS size_mb, last_modified
FROM read_blob('<URL>/*')
ORDER BY filename
LIMIT 50;
"
```

Note: only select `filename`, `size`, and `last_modified` — never select `content`, which would download the actual files.
If the URL points to a specific file or glob pattern (has a file extension or contains `*`), preview it:

```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
DESCRIBE FROM '<URL>';
SELECT count(*) AS row_count FROM '<URL>';
FROM '<URL>' LIMIT 20;
"
```
For Parquet files, get row counts and sizes from metadata (no data download):

```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT file_name,
       sum(row_group_num_rows) AS total_rows,
       (sum(row_group_compressed_bytes) / 1024 / 1024)::DECIMAL(10,1) AS compressed_mb
FROM parquet_metadata('<URL>')
GROUP BY file_name;
"
```
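The directory-versus-file heuristic used above (no extension or a trailing `/` means a listing; an extension or `*` means a preview) can be sketched as follows. The function name is a hypothetical label for this sketch:

```python
# Sketch of the URL-classification heuristic described above:
# decide whether to list a directory or preview a file/glob.
import posixpath

def classify_url(url: str) -> str:
    """Return 'preview' for a file or glob pattern, 'list' for a directory/bucket."""
    if "*" in url:
        return "preview"   # glob pattern: preview the matching files
    if url.endswith("/"):
        return "list"      # explicit directory
    # A file extension on the last path segment suggests a single file.
    _, ext = posixpath.splitext(url)
    return "preview" if ext else "list"

print(classify_url("s3://bucket/data/"))              # → list
print(classify_url("s3://bucket/data/part.parquet"))  # → preview
print(classify_url("s3://bucket/data/*.parquet"))     # → preview
```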

Step 3 — Answer the question


Using the listing, schema, or sample data, answer:

${1:-list and describe what's there}

If the user asks an analytical question (e.g., "how many rows match X"), write and run the appropriate SQL query. DuckDB pushes predicates down into Parquet on S3, so filtering is efficient even on large remote datasets.

Error handling


- `duckdb: command not found` → delegate to `/duckdb-skills:install-duckdb`
- Access denied / 403 → suggest the user check credentials: `aws configure`, environment variables, or provide an explicit key/secret
- Bucket not found / 404 → check the URL and region
- Timeout on large listing → suggest narrowing the glob pattern or adding a prefix