querying-aws-s3

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Query AWS S3 System Tables

查询AWS S3系统表

Overview

概述

Works best with the AWS MCP server for sandboxed execution and audit logging. All commands below use the AWS CLI and work in any environment with configured AWS credentials. Use IAM roles or temporary credentials; avoid long-lived access keys.
Amazon S3 Metadata provides continuously-updated Apache Iceberg tables that capture object-level metadata for general-purpose buckets. S3 Storage Lens exports aggregated storage and activity metrics as Iceberg tables. Both are read-only, stored in the AWS-managed
aws-s3
table bucket, and queryable via Amazon Athena.
System tables are preferred over raw S3 APIs (
list-objects-v2
,
head-object
) because:
  • list-objects-v2
    paginates at 1000 objects/page — inefficient for large buckets (millions or billions of objects). The inventory table answers
    SELECT COUNT(*)
    in seconds at any scale.
  • list-objects-v2
    cannot identify who uploaded an object, from which IP, or when something was deleted. Only the journal table has
    requester
    ,
    source_ip_address
    , and delete event tracking.
  • Filtering by tag requires
    get-object-tagging
    per object. The inventory table has
    object_tags
    as a queryable map column.
最佳搭配 AWS MCP server 使用,以实现沙箱执行和审计日志记录。以下所有命令均使用AWS CLI,可在任何配置了AWS凭证的环境中运行。建议使用IAM角色或临时凭证,避免使用长期访问密钥。
Amazon S3元数据提供持续更新的Apache Iceberg表,用于捕获通用存储桶的对象级元数据。S3 Storage Lens将聚合后的存储和活动指标导出为Iceberg表。两者均为只读,存储在AWS托管的
aws-s3
表存储桶中,可通过Amazon Athena进行查询。
系统表优于原生S3 API(
list-objects-v2
head-object
)的原因如下:
  • list-objects-v2
    每页分页1000个对象——对于大型存储桶(数百万或数十亿个对象)效率极低。inventory表可在数秒内完成任意规模的
    SELECT COUNT(*)
    查询。
  • list-objects-v2
    无法识别谁上传了对象、来自哪个IP地址,或对象何时被删除。只有journal表包含
    requester
    source_ip_address
    字段以及删除事件跟踪功能。
  • 按标签过滤需要对每个对象调用
    get-object-tagging
    。inventory表将
    object_tags
    作为可查询的映射列。

Decision Tree

决策树

User intentUse this skill?TableAlternative
How many objects in my bucketYesinventory
What was recently uploaded/deletedYesjournal
Who wrote/deleted objects (audit)Yesjournal (requester, source_ip)
Storage class breakdownYesinventory
Find objects by tag or user metadataYesinventory
Search annotation contentYesannotationSingle object → direct API
get-object-annotation
Write/update an annotationNoDirect API:
put-object-annotation
(tables are read-only)
Query data inside objectsNo
querying-data-lake
Bucket-level storage metrics/trendsYesStorage Lens tables
Enable metadata trackingYessee Enable section
用户意图是否使用该技能?替代方案
我的存储桶中有多少个对象inventory
最近上传/删除了哪些内容journal
谁写入/删除了对象(审计)journal(requester、source_ip)
存储类别拆分inventory
按标签或用户元数据查找对象inventory
搜索注释内容annotation单个对象 → 直接调用API
get-object-annotation
写入/更新注释直接调用API:
put-object-annotation
(表为只读)
查询对象内部的数据
querying-data-lake
存储桶级存储指标/趋势Storage Lens表
启用元数据跟踪查看启用部分

Common Tasks

常见任务

1. Check If Configured

1. 检查配置状态

Before querying, confirm S3 Metadata is enabled on the target bucket.
bash
aws s3api get-bucket-metadata-configuration --bucket <BUCKET> --region <REGION>
Interpret the response:
  • MetadataConfigurationNotFound
    error → not enabled. See Enable section below.
  • TableStatus: ACTIVE
    → ready to query.
  • TableStatus: BACKFILLING
    → queryable but inventory may be incomplete.
  • TableStatus: FAILED
    → check error field (usually IAM).
For Storage Lens:
bash
aws s3control get-storage-lens-configuration --account-id <ACCOUNT> --config-id <CONFIG_ID> --region <REGION>
Look for
DataExport.StorageLensTableDestination.IsEnabled: true
.
查询前,请确认目标存储桶已启用S3元数据。
bash
aws s3api get-bucket-metadata-configuration --bucket <BUCKET> --region <REGION>
响应解读:
  • MetadataConfigurationNotFound
    错误 → 未启用。请查看下方启用部分。
  • TableStatus: ACTIVE
    → 可随时查询。
  • TableStatus: BACKFILLING
    → 可查询,但inventory可能不完整。
  • TableStatus: FAILED
    → 检查错误字段(通常为IAM权限问题)。
对于Storage Lens:
bash
aws s3control get-storage-lens-configuration --account-id <ACCOUNT> --config-id <CONFIG_ID> --region <REGION>
查找
DataExport.StorageLensTableDestination.IsEnabled: true

2. Enable (if not configured)

2. 启用(若未配置)

Enable S3 Metadata on a bucket:
bash
aws s3api create-bucket-metadata-configuration \
  --bucket <BUCKET> \
  --region <REGION> \
  --metadata-configuration '{
    "JournalTableConfiguration": {"RecordExpiration": {"Expiration": "DISABLED"}},
    "InventoryTableConfiguration": {"ConfigurationState": "ENABLED"}
  }'
To also enable annotations (requires a service role):
bash
aws s3api create-bucket-metadata-configuration \
  --bucket <BUCKET> \
  --region <REGION> \
  --metadata-configuration '{
    "JournalTableConfiguration": {"RecordExpiration": {"Expiration": "ENABLED", "Days": 90}},
    "InventoryTableConfiguration": {"ConfigurationState": "ENABLED"},
    "AnnotationTableConfiguration": {"ConfigurationState": "ENABLED", "Role": "<ROLE_ARN>"}
  }'
Enable Storage Lens S3 Tables export:
bash
aws s3control put-storage-lens-configuration \
  --account-id <ACCOUNT> \
  --config-id <CONFIG_ID> \
  --region <REGION> \
  --storage-lens-configuration '{
    "Id": "<CONFIG_ID>",
    "IsEnabled": true,
    "AccountLevel": {"BucketLevel": {}},
    "DataExport": {
      "StorageLensTableDestination": {"IsEnabled": true}
    }
  }'
Register S3 Tables federated catalog in Glue (required for Athena access):
bash
aws glue create-catalog --region <REGION> --cli-input-json '{
  "Name": "s3tablescatalog",
  "CatalogInput": {
    "FederatedCatalog": {
      "Identifier": "arn:aws:s3tables:<REGION>:<ACCOUNT>:bucket/*",
      "ConnectionName": "aws:s3tables"
    }
  }
}'
For setup permissions and IAM role requirements, see Security Considerations below.
在存储桶上启用S3元数据:
bash
aws s3api create-bucket-metadata-configuration \
  --bucket <BUCKET> \
  --region <REGION> \
  --metadata-configuration '{
    "JournalTableConfiguration": {"RecordExpiration": {"Expiration": "DISABLED"}},
    "InventoryTableConfiguration": {"ConfigurationState": "ENABLED"}
  }'
若还需启用注释(需要服务角色):
bash
aws s3api create-bucket-metadata-configuration \
  --bucket <BUCKET> \
  --region <REGION> \
  --metadata-configuration '{
    "JournalTableConfiguration": {"RecordExpiration": {"Expiration": "ENABLED", "Days": 90}},
    "InventoryTableConfiguration": {"ConfigurationState": "ENABLED"},
    "AnnotationTableConfiguration": {"ConfigurationState": "ENABLED", "Role": "<ROLE_ARN>"}
  }'
启用Storage Lens S3表导出:
bash
aws s3control put-storage-lens-configuration \
  --account-id <ACCOUNT> \
  --config-id <CONFIG_ID> \
  --region <REGION> \
  --storage-lens-configuration '{
    "Id": "<CONFIG_ID>",
    "IsEnabled": true,
    "AccountLevel": {"BucketLevel": {}},
    "DataExport": {
      "StorageLensTableDestination": {"IsEnabled": true}
    }
  }'
在Glue中注册S3 Tables联合目录(Athena访问必需):
bash
aws glue create-catalog --region <REGION> --cli-input-json '{
  "Name": "s3tablescatalog",
  "CatalogInput": {
    "FederatedCatalog": {
      "Identifier": "arn:aws:s3tables:<REGION>:<ACCOUNT>:bucket/*",
      "ConnectionName": "aws:s3tables"
    }
  }
}'
关于设置权限和IAM角色要求,请查看下方安全注意事项

3. Verify Permissions

3. 验证权限

Querying requires:
  • Athena execution permissions
  • S3 Tables read permissions (see least-privilege policy in Security Considerations)
  • The S3 Tables federated catalog registered in Glue (
    s3tablescatalog
    )
  • Athena workgroup with SSE-KMS encryption configured on the output location
If
CATALOG_NOT_FOUND
errors occur, the Glue integration may not be enabled. See: Integrating S3 Tables with AWS analytics services
查询需要以下权限:
  • Athena执行权限
  • S3 Tables读取权限(请查看安全注意事项中的最小权限策略)
  • 在Glue中注册的S3 Tables联合目录(
    s3tablescatalog
  • 在输出位置配置了SSE-KMS加密的Athena工作组
若出现
CATALOG_NOT_FOUND
错误,可能未启用Glue集成。请查看: 将S3 Tables与AWS分析服务集成

4. Identify the Target Table

4. 确定目标表

S3 Metadata tables — namespace is
b_<bucket-name>
:
TableWhat it captures
journal
Event log — every CREATE, DELETE, UPDATE_METADATA, and annotation events. Near real-time.
inventory
Current state — one row per object (latest version). Updates within 1 hour.
annotation
Annotation payloads —
text_value
column holds the full content. Near real-time.
Storage Lens tables — namespace is
lens_<config-id>_exp
:
TableWhat it captures
default_storage_metrics
Per-bucket/prefix: object count, size, storage class breakdown. Daily.
default_activity_metrics
Per-bucket/prefix: GET/PUT/DELETE request counts. Daily.
bucket_property_metrics
Bucket config: versioning, encryption, lifecycle settings. Daily.
S3元数据表 — 命名空间为
b_<bucket-name>
捕获内容
journal
事件日志 — 记录所有CREATE、DELETE、UPDATE_METADATA和注释事件。近实时更新。
inventory
当前状态 — 每个对象对应一行(最新版本)。1小时内更新。
annotation
注释负载 —
text_value
列包含完整内容。近实时更新。
Storage Lens表 — 命名空间为
lens_<config-id>_exp
捕获内容
default_storage_metrics
按存储桶/前缀统计:对象数量、大小、存储类别拆分。每日更新。
default_activity_metrics
按存储桶/前缀统计:GET/PUT/DELETE请求次数。每日更新。
bucket_property_metrics
存储桶配置:版本控制、加密、生命周期设置。每日更新。

5. Query

5. 查询

Query syntax:
sql
"s3tablescatalog/aws-s3"."<namespace>"."<table>"
Constraints:
  • You MUST confirm workgroup and output location before executing
  • You MUST ensure the Athena workgroup enforces SSE-KMS encryption on query results
  • You MUST warn user that tables are read-only — no INSERT/UPDATE/DELETE
  • You SHOULD use the key columns documented in this skill to build queries. If you need the full schema (e.g., AWS has added new columns), run
    get-tables
    once on any single namespace — schemas are identical across all instances of the same table type:
    aws glue get-tables --catalog-id "<ACCOUNT>:s3tablescatalog/aws-s3" --database-name "<namespace>" --region <REGION>
Journal — audit who changed what:
sql
SELECT key, record_type, record_timestamp, requester, source_ip_address
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."journal"
WHERE record_type = 'DELETE'
  AND record_timestamp > current_timestamp - interval '24' hour
ORDER BY record_timestamp DESC;
Journal — track annotation events:
sql
SELECT key, record_type, annotation.name, record_timestamp
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."journal"
WHERE record_type IN ('CREATE_ANNOTATION', 'DELETE_ANNOTATION', 'UPDATE_ANNOTATION_METADATA')
ORDER BY record_timestamp DESC LIMIT 20;
Inventory — find objects by storage class:
sql
SELECT key, size, storage_class, last_modified_date
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."inventory"
WHERE storage_class = 'GLACIER'
ORDER BY size DESC LIMIT 50;
Inventory — find objects by tag:
sql
SELECT key, size, object_tags
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."inventory"
WHERE object_tags['environment'] = 'staging';
Annotation — search across payloads:
sql
SELECT object_key, name, text_value
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."annotation"
WHERE text_value LIKE '%error%';
Annotation — extract JSON fields:
sql
SELECT object_key, json_extract_scalar(text_value, '$.status') as status
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."annotation"
WHERE name = 'pipeline_status'
  AND json_extract_scalar(text_value, '$.status') = 'FAILED';
Storage Lens — storage distribution:
sql
SELECT *
FROM "s3tablescatalog/aws-s3"."lens_<config-id>_exp"."default_storage_metrics"
LIMIT 20;
查询语法:
sql
"s3tablescatalog/aws-s3"."<namespace>"."<table>"
约束条件:
  • 执行前必须确认工作组和输出位置
  • 必须确保Athena工作组对查询结果强制使用SSE-KMS加密
  • 必须提醒用户表为只读 — 不支持INSERT/UPDATE/DELETE
  • 应使用本技能中记录的关键字段构建查询。若需要完整 schema(例如AWS新增了列),可在任意单个命名空间上运行一次
    get-tables
    — 同一类型表的schema在所有实例中均相同:
    aws glue get-tables --catalog-id "<ACCOUNT>:s3tablescatalog/aws-s3" --database-name "<namespace>" --region <REGION>
Journal — 审计谁进行了哪些变更:
sql
SELECT key, record_type, record_timestamp, requester, source_ip_address
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."journal"
WHERE record_type = 'DELETE'
  AND record_timestamp > current_timestamp - interval '24' hour
ORDER BY record_timestamp DESC;
Journal — 跟踪注释事件:
sql
SELECT key, record_type, annotation.name, record_timestamp
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."journal"
WHERE record_type IN ('CREATE_ANNOTATION', 'DELETE_ANNOTATION', 'UPDATE_ANNOTATION_METADATA')
ORDER BY record_timestamp DESC LIMIT 20;
Inventory — 按存储类别查找对象:
sql
SELECT key, size, storage_class, last_modified_date
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."inventory"
WHERE storage_class = 'GLACIER'
ORDER BY size DESC LIMIT 50;
Inventory — 按标签查找对象:
sql
SELECT key, size, object_tags
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."inventory"
WHERE object_tags['environment'] = 'staging';
Annotation — 搜索负载内容:
sql
SELECT object_key, name, text_value
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."annotation"
WHERE text_value LIKE '%error%';
Annotation — 提取JSON字段:
sql
SELECT object_key, json_extract_scalar(text_value, '$.status') as status
FROM "s3tablescatalog/aws-s3"."b_<bucket>"."annotation"
WHERE name = 'pipeline_status'
  AND json_extract_scalar(text_value, '$.status') = 'FAILED';
Storage Lens — 存储分布:
sql
SELECT *
FROM "s3tablescatalog/aws-s3"."lens_<config-id>_exp"."default_storage_metrics"
LIMIT 20;

Routing: Athena vs Direct API

路由选择:Athena vs 直接API

ScenarioUse
Single known object + annotation nameDirect API:
get-object-annotation
Aggregate/count across many objectsAthena on annotation or inventory table
Full-text search across annotation payloadsAthena with
LIKE
or
json_extract_scalar
Write/update an annotationDirect API:
put-object-annotation
(table is read-only)
Feature not configured on bucketDirect API loop (
list-objects-v2
+
head-object
); suggest enabling S3 Metadata
场景使用方式
单个已知对象 + 注释名称直接API:
get-object-annotation
跨多个对象进行聚合/计数基于annotation或inventory表的Athena查询
跨注释负载进行全文搜索使用
LIKE
json_extract_scalar
的Athena查询
写入/更新注释直接API:
put-object-annotation
(表为只读)
存储桶未配置该功能直接API循环(
list-objects-v2
+
head-object
);建议启用S3元数据

Troubleshooting

故障排除

ErrorCauseFix
CATALOG_NOT_FOUND
S3 Tables not registered in GlueEnable integration: S3 console > Table buckets > Enable integration
Empty results from journalFeature just enabled; no events recorded yetUpload/delete an object and wait ~1 minute
Empty results from inventoryTable still
BACKFILLING
Check status; wait for ACTIVE (minutes to hours depending on object count)
AccessDenied
querying table
Missing
s3tables:GetTable
or
GetTableMetadataLocation
See Security Considerations below
Wrong namespaceBucket name has periodsPeriods are converted to underscores in namespace:
my.bucket
b_my_bucket
No Storage Lens dataFirst delivery takes up to 48 hoursWait; no historical backfill
错误原因解决方法
CATALOG_NOT_FOUND
S3 Tables未在Glue中注册启用集成:S3控制台 > 表存储桶 > 启用集成
journal返回空结果刚启用功能;尚未记录任何事件上传/删除一个对象,等待约1分钟
inventory返回空结果表仍处于
BACKFILLING
状态
检查状态;等待变为ACTIVE(根据对象数量需要数分钟至数小时)
查询表时出现
AccessDenied
缺少
s3tables:GetTable
GetTableMetadataLocation
权限
查看下方安全注意事项
命名空间错误存储桶名称包含句点句点在命名空间中会转换为下划线:
my.bucket
b_my_bucket
无Storage Lens数据首次交付最多需要48小时等待;无历史回填数据

Security Considerations

安全注意事项

Least-Privilege IAM Policy

最小权限IAM策略

Scope permissions to specific table bucket ARNs rather than using wildcards:
json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:GetTable",
        "s3tables:GetTableMetadataLocation",
        "s3tables:GetTableData",
        "s3tables:GetNamespace",
        "s3tables:ListTables",
        "s3tables:ListNamespaces",
        "s3tables:GetTableBucket"
      ],
      "Resource": [
        "arn:aws:s3tables:<REGION>:<ACCOUNT>:bucket/aws-s3",
        "arn:aws:s3tables:<REGION>:<ACCOUNT>:bucket/aws-s3/*"
      ]
    }
  ]
}
将权限范围限定为特定表存储桶ARN,而非使用通配符:
json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:GetTable",
        "s3tables:GetTableMetadataLocation",
        "s3tables:GetTableData",
        "s3tables:GetNamespace",
        "s3tables:ListTables",
        "s3tables:ListNamespaces",
        "s3tables:GetTableBucket"
      ],
      "Resource": [
        "arn:aws:s3tables:<REGION>:<ACCOUNT>:bucket/aws-s3",
        "arn:aws:s3tables:<REGION>:<ACCOUNT>:bucket/aws-s3/*"
      ]
    }
  ]
}

Data Sensitivity

数据敏感性

Journal query results may contain sensitive fields:
  • requester
    — AWS account ID or service principal that made the request
  • source_ip_address
    — IP address of the requester
Query results containing these fields should be stored in encrypted, access-controlled locations. Avoid logging or sharing raw query output that contains IP addresses or principal identifiers.
Journal查询结果可能包含敏感字段:
  • requester
    — 发起请求的AWS账户ID或服务主体
  • source_ip_address
    — 请求者的IP地址
包含这些字段的查询结果应存储在加密且受访问控制的位置。避免记录或分享包含IP地址或主体标识符的原始查询输出。

Encryption for Query Results

查询结果加密

Configure the Athena workgroup with
EncryptionConfiguration
to encrypt query results at rest:
json
{
  "ResultConfiguration": {
    "EncryptionConfiguration": {
      "EncryptionOption": "SSE_KMS",
      "KmsKey": "arn:aws:kms:<REGION>:<ACCOUNT>:key/<KEY_ID>"
    }
  }
}
为Athena工作组配置
EncryptionConfiguration
,对静态查询结果进行加密:
json
{
  "ResultConfiguration": {
    "EncryptionConfiguration": {
      "EncryptionOption": "SSE_KMS",
      "KmsKey": "arn:aws:kms:<REGION>:<ACCOUNT>:key/<KEY_ID>"
    }
  }
}

Audit Trail

审计跟踪

Enable CloudTrail logging for Athena (
StartQueryExecution
,
GetQueryResults
) and S3 Tables (
s3tables:GetTableData
) API calls to maintain an audit trail of who queried what metadata. Ensure CloudTrail logs are encrypted with SSE-KMS and stored in a bucket with access logging enabled.
为Athena(
StartQueryExecution
GetQueryResults
)和S3 Tables(
s3tables:GetTableData
)API调用启用CloudTrail日志记录,以维护谁查询了哪些元数据的审计跟踪。确保CloudTrail日志使用SSE-KMS加密,并存储在启用了访问日志记录的存储桶中。

Additional Resources

额外资源