# s3-explore
You are helping the user explore data on remote object storage using DuckDB.

URL: $0
Question: ${1:-list and describe what's there}

## Step 1 — Detect provider and set up credentials
Based on the URL or user context, prepend the appropriate secret configuration:

| Provider | URL patterns | Secret setup |
|---|---|---|
| AWS S3 | `s3://bucket/path` | `CREATE SECRET (TYPE s3, PROVIDER credential_chain);` |
| Cloudflare R2 | `r2://bucket/path` | `CREATE SECRET (TYPE r2, KEY_ID '...', SECRET '...', ACCOUNT_ID '...');` |
| GCS | `gs://bucket/path` | `CREATE SECRET (TYPE gcs, KEY_ID '...', SECRET '...');` |
| MinIO / custom | `s3://bucket/path` with a custom endpoint | `CREATE SECRET (TYPE s3, KEY_ID '...', SECRET '...', ENDPOINT '<host:port>', URL_STYLE 'path');` |

For R2, if the user provides an account ID, the endpoint is `<account_id>.r2.cloudflarestorage.com`. R2 URLs like `r2://bucket/path` should be rewritten to `s3://bucket/path` with the R2 secret.

For public buckets (e.g., Overture Maps, AWS open data), no secret is needed — skip this step.

Always prepend:

```sql
LOAD httpfs;
```
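As one concrete sketch, the prelude for a custom S3-compatible endpoint might look like the following. The endpoint, secret name, and credentials are hypothetical placeholders, not values from this skill; the syntax follows DuckDB's documented `CREATE SECRET` statement.

```sql
-- Hypothetical prelude for a custom S3-compatible endpoint (e.g., local MinIO).
-- Endpoint and credentials are placeholders; substitute the user's values.
LOAD httpfs;
CREATE SECRET minio_secret (
    TYPE s3,
    KEY_ID 'minioadmin',
    SECRET 'minioadmin',
    ENDPOINT 'localhost:9000',
    URL_STYLE 'path',
    USE_SSL false
);
```

With such a secret in place, `s3://bucket/path` URLs resolve against the custom endpoint rather than AWS.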
## Step 2 — Determine what the URL points to
If the URL looks like a directory or bucket (no file extension, or ends with `/`), list its contents with sizes:

```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT filename, (size / 1024 / 1024)::DECIMAL(10,1) AS size_mb, last_modified
FROM read_blob('<URL>/*')
ORDER BY filename
LIMIT 50;
"
```

Note: only select `filename`, `size`, `last_modified` — never select `content`, which would download the actual files.

If the URL points to a specific file or glob pattern (has a file extension or contains `*`), preview it:
```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
DESCRIBE FROM '<URL>';
SELECT count(*) AS row_count FROM '<URL>';
FROM '<URL>' LIMIT 20;
"
```

For Parquet files, get row counts and sizes from metadata (no data download):
```bash
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT file_name,
       sum(row_group_num_rows) AS total_rows,
       (sum(row_group_compressed_bytes) / 1024 / 1024)::DECIMAL(10,1) AS compressed_mb
FROM parquet_metadata('<URL>')
GROUP BY file_name;
"
```

## Step 3 — Answer the question
Using the listing, schema, or sample data, answer:

${1:-list and describe what's there}

If the user asks an analytical question (e.g., "how many rows match X"), write and run the appropriate SQL query. DuckDB pushes predicates down into Parquet on S3, so filtering is efficient even on large remote datasets.
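For instance, a filtered aggregation over a remote Parquet glob might look like the following sketch. The bucket, column names, and filter are hypothetical; only the `<SECRET_SETUP>` placeholder follows this skill's convention.

```sql
-- Hypothetical analytical query; DuckDB fetches only the row groups and
-- columns the predicate and projection touch, not the whole files.
LOAD httpfs;
<SECRET_SETUP>
SELECT status, count(*) AS n
FROM 's3://my-bucket/logs/*.parquet'
WHERE ts >= TIMESTAMP '2024-01-01'
GROUP BY status
ORDER BY n DESC;
```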
## Error handling

- `duckdb: command not found` → delegate to `/duckdb-skills:install-duckdb`
- Access denied / 403 → suggest the user check credentials: `aws configure`, environment variables, or provide explicit key/secret
- Bucket not found / 404 → check the URL and region
- Timeout on large listing → suggest narrowing the glob pattern or adding a prefix