aws-best-practice-research

AWS Best Practice Research (with Optional Live Assessment)

Research and compile comprehensive best-practice checklists for any AWS service using the aws knowledge mcp server documentation search tools. Optionally assess live AWS resources against the compiled checklist.

Prerequisites

This skill requires the following aws knowledge mcp server tools to be available:

- `aws___search_documentation`: search across AWS documentation topics
- `aws___read_documentation`: read full documentation pages
- `aws___recommend`: get related documentation recommendations

For the optional live assessment (Step 8):

- AWS CLI (`aws`): must be configured with credentials that have read access to the target service
- jq: for parsing JSON output from AWS CLI commands

Workflow

Step 1: Identify Target Service and Assessment Scope

Determine from user input:

- AWS Service: e.g., ElastiCache Redis, RDS MySQL, MSK, EKS, Aurora, DynamoDB
- Focus areas: HA/DR, security, or all (default: all)
- Live assessment info (optional): does the user provide any of the following?
  - AWS credentials (environment variables, profile, or credential file path)
  - AWS Region (e.g., us-west-2)
  - Resource identifiers (cluster name, instance ID, table name, etc.)

If the service is ambiguous, ask the user to clarify (e.g., "RDS MySQL or RDS PostgreSQL?").

Record whether a live assessment is requested:

- If the user provides credentials + region + resource info → run the live assessment (Step 8) once the checklist items are compiled
- If the user provides partial info → ask for the missing pieces before proceeding
- If the user provides no live resource info → skip the live assessment and produce the checklist only
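
The three cases above can be sketched as a small decision function. This is only an illustration: the `LiveAssessmentInput` type, its field names, and the returned labels are assumptions made for the example, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveAssessmentInput:
    """What the user supplied for the optional live assessment (hypothetical fields)."""
    credentials: Optional[str] = None   # env vars, profile name, or credential file path
    region: Optional[str] = None        # e.g. "us-west-2"
    resource_id: Optional[str] = None   # cluster name, instance ID, table name, ...


def missing_fields(info: LiveAssessmentInput) -> list[str]:
    """Return the names of the pieces the user has not provided yet."""
    fields = {
        "credentials": info.credentials,
        "region": info.region,
        "resource_id": info.resource_id,
    }
    return [name for name, value in fields.items() if not value]


def decide_path(info: LiveAssessmentInput) -> str:
    """Map the three cases from Step 1 to an action label."""
    missing = missing_fields(info)
    if not missing:
        return "live_assessment"    # everything provided
    if len(missing) == 3:
        return "checklist_only"     # nothing provided
    return "ask_for_missing"        # partial info: ask before proceeding
```

In the partial case, `missing_fields` gives the list of items to ask the user for.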

Step 2: Sequential Documentation Search

Run the following 5 search queries one at a time, sequentially, using `aws___search_documentation`. Do NOT run them in parallel: the aws knowledge mcp server has rate limits, and parallel requests will trigger "Too many requests" errors.

Wait for each query to return results before sending the next one. Replace `{SERVICE}` with the actual service name (e.g., "ElastiCache Redis", "Amazon RDS MySQL", "Amazon MSK").

Query 1: "{SERVICE} best practices high availability disaster recovery"
  topics: ["general"]
  limit: 10

Query 2: "{SERVICE} Well-Architected reliability resilience best practices"
  topics: ["general"]
  limit: 10

Query 3: "{SERVICE} replication multi-AZ failover cluster mode backup"
  topics: ["reference_documentation", "troubleshooting"]
  limit: 10

Query 4: "{SERVICE} security encryption authentication access control"
  topics: ["general"]
  limit: 10

Query 5: "{SERVICE} Well-Architected security best practices"
  topics: ["general"]
  limit: 10

Rate limit protection: If any query returns a "Too many requests" error, wait 5 seconds and retry once. If it fails again, skip that query and continue with the next one.
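
The sequential flow with a single retry can be sketched as follows. The query texts, topics, and limit come from the list above; everything else is an assumption for illustration: `search` stands in for however `aws___search_documentation` is invoked, and `TooManyRequests` is a hypothetical exception representing the "Too many requests" error.

```python
import time
from typing import Any, Callable

# The five query templates and topics from Step 2.
QUERY_TEMPLATES = [
    ("{SERVICE} best practices high availability disaster recovery", ["general"]),
    ("{SERVICE} Well-Architected reliability resilience best practices", ["general"]),
    ("{SERVICE} replication multi-AZ failover cluster mode backup",
     ["reference_documentation", "troubleshooting"]),
    ("{SERVICE} security encryption authentication access control", ["general"]),
    ("{SERVICE} Well-Architected security best practices", ["general"]),
]


class TooManyRequests(Exception):
    """Hypothetical error type standing in for a "Too many requests" response."""


def build_queries(service: str, limit: int = 10) -> list[dict]:
    """Fill the {SERVICE} placeholder in every template."""
    return [
        {"query": template.replace("{SERVICE}", service), "topics": topics, "limit": limit}
        for template, topics in QUERY_TEMPLATES
    ]


def run_sequentially(queries: list[dict], search: Callable[[dict], Any],
                     retry_delay: float = 5.0, sleep=time.sleep) -> list:
    """Send one query at a time; on a rate-limit error wait, retry once, then skip."""
    results = []
    for query in queries:
        outcome = None                      # stays None if the query is skipped
        for attempt in range(2):            # first try plus one retry
            try:
                outcome = search(query)
                break
            except TooManyRequests:
                if attempt == 0:
                    sleep(retry_delay)      # wait before the single retry
        results.append(outcome)
    return results
```

The returned list is aligned with the input queries, so a `None` entry marks a query that was skipped after its retry failed.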

Step 3: Read Key Documentation Pages

From the search results, identify the most important pages and read them one at a time, sequentially, using `aws___read_documentation`. Do NOT read multiple pages in parallel, to avoid rate limiting. Prioritize these document types:

- Well-Architected Lens pages for the service (Reliability, Security, Performance, Operational Excellence pillars)
- Official best practices page
- Resilience / disaster recovery page
- Overall best practices page

Read each with `max_length: 15000` to get comprehensive content. Typically 3-5 page reads are needed.

If a Well-Architected Lens exists for the service, it is the single most valuable source — always read it.

Step 4: Extract and Categorize Check Items

From all gathered documentation, extract individual check items and organize them into 5 mandatory categories (see `references/output-template.md` for the exact format):

Category 1: High Availability Architecture
Items about: cluster mode, replication, replicas per shard, Multi-AZ, AZ distribution, node types, quorum.

Category 2: Disaster Recovery
Items about: automatic/manual backups, retention policies, RPO/RTO documentation, Global Datastore / cross-region replication, failover testing, replication lag monitoring.

Category 3: Failover Planning
Items about: Test Failover API, FIS resilience testing, client timeout/topology config, SNS event notifications, graceful degradation, WAIT command.

Category 4: Security Configuration
Items about: encryption at-rest/in-transit, authentication (AUTH/RBAC/IAM), subnet groups, security groups, KMS keys, dangerous command renaming, RBAC metrics monitoring, IAM control plane policies.

Category 5: Others
Items not covered by the above 4 categories, including but not limited to: auto minor version upgrade, engine version, node type selection (Graviton), CloudWatch monitoring, reserved memory, connection pooling, read routing, expensive commands, slow log, IaC management, Auto Scaling, cost tags, client retry logic, performance tuning, operational best practices.

Scope Boundary for Container / Orchestration Platforms

When the target service is a container or orchestration platform (EKS, ECS, Fargate, App Runner, Elastic Beanstalk), this skill focuses exclusively on the AWS infrastructure layer. All check items must be verifiable through AWS APIs (`aws eks`, `aws ecs`, `aws ec2`, `aws iam`, etc.).

Do NOT include check items that require `kubectl`, ECS Exec, or any in-cluster / in-task inspection to verify. These belong to a dedicated workload-level assessment skill.

For Amazon EKS, the infrastructure layer scope includes:

| In Scope (AWS API verifiable) | Out of Scope (requires kubectl / workload context) |
| --- | --- |
| Control plane configuration (K8s version, platform version, API endpoint access, logging) | Pod Disruption Budgets (PDB) |
| Node group configuration (instance types, scaling, AMI, AZ distribution, disk size) | Topology Spread Constraints |
| Cluster networking (VPC, subnets, security groups, service CIDR) | Liveness / readiness / startup probes |
| Add-on presence and versions (VPC CNI, CoreDNS, kube-proxy, EBS CSI, etc.) | Container resource requests / limits |
| Secrets envelope encryption (KMS key) | Pod securityContext (runAsNonRoot, capabilities) |
| Authentication mode (ConfigMap vs API) and Access Entries | Pod Security Admission (PSA) namespace labels |
| Control plane audit logging | automountServiceAccountToken |
| Cluster deletion protection | Network Policies (K8s resource level) |
| Node auto-repair and node monitoring agent addon | Pod graceful termination (terminationGracePeriodSeconds, preStop) |
| Cluster tags and nodegroup tags | Workload-level Velero backups |
| Upgrade insights and deprecation warnings | Application health check paths |
| OIDC provider configuration (for IRSA) | Service mesh (mTLS) configuration |
| GuardDuty EKS protection (account-level) | OPA Gatekeeper / Kyverno policies |

For Amazon ECS / Fargate, apply the same principle: check cluster, capacity providers, service auto-scaling, task definition registration, VPC configuration, and IAM roles — but do NOT check container-level health checks, resource limits, or task-internal configuration.

After generating the checklist, append a Scope Notice (see `references/output-template.md` for the exact format) directing users to a workload-level skill for the items that are out of scope.

For each check item, record:

- ID: category prefix + sequential number + priority suffix (e.g., `HA-01-hi`, `DR-02-md`, `SEC-03-lo`)
  - Priority suffixes: `-hi` (High), `-md` (Medium), `-lo` (Low)
  - This embeds priority directly in the ID for quick visual scanning
- Check item name: concise, actionable
- Description: what to check and why, specific thresholds or values where applicable
- Source: which document/control it comes from (see source annotation rules below)
- Priority: High / Medium / Low (also kept as a separate column for filtering)
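
As a minimal sketch of the record described above, a check item could be modeled like this. The `CheckItem` class and its field names are illustrative assumptions; the two-digit sequence number is inferred from the example IDs (`HA-01-hi`, `DR-02-md`, `SEC-03-lo`).

```python
from dataclasses import dataclass

# Priority suffixes as defined for the ID format.
PRIORITY_SUFFIX = {"High": "hi", "Medium": "md", "Low": "lo"}


@dataclass
class CheckItem:
    category_prefix: str   # e.g. "HA", "DR", "SEC" (taken from the example IDs)
    number: int            # sequential number within the category
    name: str              # concise, actionable
    description: str       # what to check and why, with thresholds where applicable
    source: str            # source tag, e.g. "WA-REL" or "Official Docs"
    priority: str          # "High", "Medium", or "Low"

    @property
    def id(self) -> str:
        """Category prefix + two-digit sequence number + priority suffix."""
        suffix = PRIORITY_SUFFIX[self.priority]
        return f"{self.category_prefix}-{self.number:02d}-{suffix}"
```

Deriving the ID from the priority field keeps the suffix and the separate Priority column from drifting apart.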

Step 5: Compile Source Annotations

Use consistent source tags throughout the checklist:

| Tag | Meaning |
| --- | --- |
| `WA-REL` / `WA-RELn` | Well-Architected Lens: Reliability Pillar (question N) |
| `WA-SEC` / `WA-SECn` | Well-Architected Lens: Security Pillar |
| `WA-PE` / `WA-PEn` | Well-Architected Lens: Performance Efficiency Pillar |
| `WA-OE` / `WA-OEn` | Well-Architected Lens: Operational Excellence Pillar |
| `WA-CO` | Well-Architected Lens: Cost Optimization Pillar |
| `Security Hub [{Service}.N]` | AWS Security Hub CSPM control (e.g., `[ElastiCache.1]`) |
| `re:Post` | AWS re:Post knowledge center article |
| `Official Docs` | Service user guide / official documentation |
| `AWS Blog` | AWS Database Blog or other official blog |
| `Whitepaper` | AWS whitepaper |
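
To keep tags consistent, a checklist could be linted with a small validator along these lines. This is an illustrative sketch, not part of the skill: the function name and the exact patterns are assumptions derived from the tag table, and the rule that `WA-CO` takes no number simply mirrors the table as written.

```python
import re

# Patterns derived from the tag table; the validator itself is illustrative.
_WA_TAG = re.compile(r"^WA-(REL|SEC|PE|OE)\d*$|^WA-CO$")
_SECURITY_HUB_TAG = re.compile(r"^Security Hub \[[A-Za-z0-9]+\.\d+\]$")
_FIXED_TAGS = {"re:Post", "Official Docs", "AWS Blog", "Whitepaper"}


def is_valid_source_tag(tag: str) -> bool:
    """Return True when the tag matches one of the documented forms."""
    return bool(
        tag in _FIXED_TAGS
        or _WA_TAG.match(tag)
        or _SECURITY_HUB_TAG.match(tag)
    )
```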

Step 6: Generate Output (Conditional)

The output depends on whether the user provided live assessment info in Step 1:

If NO live assessment info was provided → Generate Checklist Only

Generate the checklist content using the exact format defined in `references/output-template.md`, then write it to a local markdown file using the Write tool.

File naming: `YYYY-mm-dd-HH-MM-SS-{SERVICE}-best-practice-checklist.md`

- Replace `YYYY-mm-dd-HH-MM-SS` with the current timestamp (e.g., `2025-07-15-14-30-00`)
- Replace `{SERVICE}` with a lowercase, hyphen-separated service name (e.g., `elasticache-redis`, `amazon-eks`)
- Example: `2025-07-15-14-30-00-elasticache-redis-best-practice-checklist.md`
- Save the file in the current working directory
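
The naming rule can be sketched as below. The helper names `slugify` and `checklist_filename` are assumptions for illustration; the timestamp layout and the lowercase, hyphen-separated service name follow the rule stated above.

```python
import re
from datetime import datetime
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def checklist_filename(service: str, now: Optional[datetime] = None) -> str:
    """Build YYYY-mm-dd-HH-MM-SS-{SERVICE}-best-practice-checklist.md."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    return f"{stamp}-{slugify(service)}-best-practice-checklist.md"
```

Passing `now` explicitly makes the result reproducible; leaving it out uses the current local time.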

The checklist output must include:

Title with service name
One table per category (5 tables)
Source annotation legend
Key reference links section

After writing the file, inform the user of the file path.

If live assessment info WAS provided → Skip Checklist, Proceed to Step 8

Do NOT generate a separate checklist file. The assessment report (Step 8) will include the full checklist with assessment results in a single, comprehensive document. Generating both would be redundant.

Proceed directly to Step 8.

Step 7: Offer Next Steps (Checklist-Only Path)

This step only applies if you generated a checklist in Step 6 (no live assessment).

After writing the checklist file, suggest:

"I can export this to a spreadsheet if you prefer."
"If you provide AWS credentials and resource identifiers, I can assess a live resource against this checklist."

If the user provided live assessment info in Step 1, skip this step entirely — you should already be proceeding to Step 8.

Step 8: Live Resource Assessment (Optional)

Only execute this step if the user has provided credentials, region, and resource identifiers. If none were provided, skip this step entirely.

See `references/assessment-workflow.md` for the detailed per-service assessment procedure. The general flow is:

8.1 Prepare Environment

If the user provided a credential file path (e.g., `env.sh`), source it in bash:

source <credential-file-path>

Verify access by running a simple describe command against the target service and region.

8.2 Collect Resource Configuration

Run the service-specific AWS CLI commands to gather the full configuration of the target resource. Execute independent commands in parallel to save time.

For ElastiCache Redis, the key commands are (see `references/assessment-workflow.md` for the full list):

- `aws elasticache describe-replication-groups`
- `aws elasticache describe-cache-clusters --show-cache-node-info`
- `aws elasticache describe-cache-subnet-groups`
- `aws elasticache describe-cache-parameters`
- `aws elasticache list-tags-for-resource`
- `aws elasticache describe-snapshots`
- `aws elasticache describe-events`

For other services, use the equivalent describe/list commands.
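
A rough sketch of running independent describe commands in parallel is shown below. It covers only three of the commands listed above; the function names, the result keys, and the choice of a thread pool are assumptions for illustration, and `references/assessment-workflow.md` remains the authoritative list.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def build_commands(replication_group_id: str, region: str) -> dict[str, list[str]]:
    """Build a subset of the ElastiCache commands; the dict keys are illustrative."""
    common = ["--region", region, "--output", "json"]
    return {
        "replication_group": ["aws", "elasticache", "describe-replication-groups",
                              "--replication-group-id", replication_group_id, *common],
        "cache_clusters": ["aws", "elasticache", "describe-cache-clusters",
                           "--show-cache-node-info", *common],
        "snapshots": ["aws", "elasticache", "describe-snapshots",
                      "--replication-group-id", replication_group_id, *common],
    }


def run_cli(command: list[str]) -> dict:
    """Run one AWS CLI command and parse its JSON output."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def collect(commands: dict[str, list[str]],
            runner: Callable[[list[str]], dict] = run_cli) -> dict[str, dict]:
    """Run independent commands in parallel and return results keyed by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {name: pool.submit(runner, cmd) for name, cmd in commands.items()}
        return {name: future.result() for name, future in futures.items()}
```

The `runner` parameter exists so the collection logic can be exercised without AWS access; with the default `run_cli`, a failing command raises `subprocess.CalledProcessError`.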

8.3 Map Configuration to Checklist

For each check item in the checklist, determine the assessment status:

| Status | Meaning |
| --- | --- |
| 🟢 PASS | The resource configuration meets or exceeds the recommendation |
| 🔴 FAIL | The resource configuration does not meet the recommendation |
| 🟡 WARN | Cannot be fully verified from infrastructure alone (e.g., client-side settings), or partially meets the recommendation |
| ⚪ N/A | The check does not apply to this resource (e.g., Global Datastore check when cross-region DR is not required) |

For each item, record:

The check ID and name (from the checklist)
The assessment status (PASS / FAIL / WARN / N/A)
A specific finding describing what was observed (include actual values)
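
As an illustration of mapping one configuration value to a status and a finding, consider an automatic-backup check for ElastiCache. The `Finding` type, the function name, and the check ID `DR-01-hi` are hypothetical; the sketch assumes the input is one replication group entry from `describe-replication-groups`, where a `SnapshotRetentionLimit` of 0 means automatic backups are off.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    WARN = "WARN"
    NA = "N/A"


@dataclass
class Finding:
    check_id: str
    check_name: str
    status: Status
    finding: str   # what was observed, including actual values


def assess_automatic_backups(replication_group: dict) -> Finding:
    """Example check: automatic backups count as enabled when retention is above 0 days."""
    retention = replication_group.get("SnapshotRetentionLimit")
    if retention is None:
        # The value could not be observed, so the item cannot be fully verified.
        status, text = Status.WARN, "SnapshotRetentionLimit not present in the collected configuration"
    elif retention > 0:
        status, text = Status.PASS, f"SnapshotRetentionLimit = {retention} day(s)"
    else:
        status, text = Status.FAIL, "SnapshotRetentionLimit = 0 (automatic backups disabled)"
    return Finding("DR-01-hi", "Automatic backups enabled", status, text)
```

Each finding carries the observed value, which is what the report's Finding column is meant to show.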

8.4 Generate Assessment Report

Generate the assessment results using the format defined in `references/output-template.md`, then write them to a local markdown file using the Write tool.

This is the ONLY output file when a target resource is provided. The assessment report is self-contained and includes all checklist information (Description, Source, Priority) alongside the assessment results. Do NOT generate a separate checklist file.

File naming: `YYYY-mm-dd-HH-MM-SS-{RESOURCE_ID}-assessment-report.md`

- Replace `YYYY-mm-dd-HH-MM-SS` with the current timestamp (e.g., `2025-07-15-14-30-00`)
- Replace `{RESOURCE_ID}` with the actual resource identifier, lowercase, hyphens for separators
- Example: `2025-07-15-14-30-00-my-redis-cluster-assessment-report.md`
- Save the file in the current working directory

The report must include:

Resource Summary — key properties of the assessed resource (engine, version, node type, topology, etc.)
Assessment Results by Category — one table per category with full checklist columns (Check Item, Description, Source, Priority) PLUS assessment columns (Status, Finding)
Assessment Summary — counts of PASS/FAIL/WARN/N/A per category
Critical Issues — list of all FAIL items with Priority=High, with specific remediation guidance
Recommendations — grouped by urgency (Immediate / Short-term / Medium-term)
Source Annotations — legend for source abbreviations
Key Reference Links — documentation pages used

After writing the file, inform the user of the file path.
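
The Assessment Summary and Critical Issues sections can be derived mechanically from the per-item results. The sketch below assumes each result is a dict with `id`, `category`, `status`, and `priority` keys; that row shape and the function names are illustrative, not prescribed by the output template.

```python
from collections import Counter, defaultdict

STATUSES = ("PASS", "FAIL", "WARN", "N/A")


def summarize(results: list[dict]) -> dict[str, dict[str, int]]:
    """Count statuses per category; every status appears, even with a count of 0."""
    counts = defaultdict(Counter)
    for row in results:
        counts[row["category"]][row["status"]] += 1
    return {
        category: {status: counter[status] for status in STATUSES}
        for category, counter in counts.items()
    }


def critical_issues(results: list[dict]) -> list[dict]:
    """Select FAIL items with Priority=High for the Critical Issues section."""
    return [r for r in results if r["status"] == "FAIL" and r["priority"] == "High"]
```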

8.5 Offer Remediation

After presenting the assessment results, suggest:

Which FAIL items can be fixed in-place (e.g., enabling backups, adding tags)
Which FAIL items require resource recreation (e.g., encryption at rest)
Whether you can help execute the remediation commands

Important Guidelines

Be comprehensive: Search broadly, read deeply. The value of this skill is completeness. It's better to include a check item and mark it as lower priority than to miss it.
Always cite sources: Every check item must have a source annotation. Users need to know where each recommendation comes from.
Always use sequential requests: All searches and page reads must be executed one at a time, sequentially. Never send multiple aws knowledge mcp server requests in parallel. The MCP server has rate limits that will reject concurrent requests with "Too many requests" errors. Sequential execution is slower but reliable.
Rate limit protection: If any MCP request returns a "Too many requests" error, wait 5 seconds and retry the same request once. If it fails a second time, skip that request and continue with the next step. Do not retry more than once per request.
Focus on actionable items: Each check item should be something the user can verify against their actual configuration. Avoid vague recommendations.
Include specific thresholds: When documentation specifies numbers (e.g., "at least 2 replicas", "reserved-memory-percent >= 25%"), include them in the check description.
Note service-specific nuances: If a check only applies under certain conditions (e.g., "only if cluster mode enabled"), note that in the description.
Live assessment is optional: Never fail or block if the user doesn't provide credentials. The checklist alone is a complete, valuable deliverable.
Respect language: Always output in the same language as the user's conversation.
